  1. Apr 2026
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. As the reviewer points out, the two cell lines need specific culture conditions to grow in an in vivo-like architecture, such as acinar structures for mammary tissue and a pseudostratified architecture for bronchial tissue, and it would certainly be interesting to grow the cell lines in these organotypic architectures and study the fate of oncogenic mutant cells. However, this would be an independent study in its own right and is beyond the scope of the current manuscript. Here, we intend to compare these two well-established epithelial lines from mammary and bronchial epithelial tissues, with distinct intrinsic mechanical and organisational properties, in minimal culture conditions, and to study how the context of two different sources of epithelial cells alone can change the fate of oncogenic cells present in the wild-type population. We have now also performed experiments with the MDCK cell line, which, unlike the BEAS2B line, has well-defined cell-cell adhesions [Supplementary figure. 4a] and epithelial morphology, and shown that the fate of HRasV<sup>12</sup> mutants differs here as well from that in the MCF10A cell line.

      (2) As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in the segregation of oncogenic cells.

      We agree with the reviewer and, in line with this suggestion, we have repeated the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. Our results show that even though MDCK cells have significantly distinct properties compared to BEAS2B cells (MDCK being more epithelial-like than BEAS2B), the dynamics of the HRasV<sup>12</sup> clusters in both these systems are similar [Supplementary figure. 4b], and distinctly different from those in the mammary epithelial cells (MCF10A). We did not observe the formation of an actin belt around HRasV<sup>12</sup> clusters in MDCK monolayers, whereas such a belt does form in MCF10A monolayers. Additionally, in MDCK cells, the HRasV<sup>12</sup> mutant clusters are not under compaction or jamming. Instead, they form protrusions similar to the ones seen in BEAS2B monolayers. These results solidify our hypothesis of tissue-specific differences in the mechanics of cancer initiation.

      (3) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a foundational formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions.

      While DITH in its original form emphasised segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues, our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells. These local interfacial dynamics, however, are related to the global contractility of cells. In our experiments with blebbistatin, we observed a loss in the efficiency of segregation upon reducing global contractility, which inhibits the formation of the interfacial actomyosin belt that serves as the source of the interfacial tension between healthy and mutant populations. Therefore, the differences in local interfacial mechanics stem from the intrinsic global contractility of the cells discussed here.

      We have also clarified this distinction in the discussion and have explicitly stated that, while DITH provided the foundation for conceptualizing tissue mechanics, our findings on transformed cell-healthy cell interactions specifically demonstrate that a higher efficiency of segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (4) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that a detailed visualisation of actomyosin distribution would strengthen our conclusions. We have now added a few more images of the interface to the Supplementary Data [Supplementary figure. 5], which show that cortical actin accumulates in individual cells at the wild-type cell-mutant cell interface, and that actin levels go up in both wild-type and mutant populations at the interface. This is also clear from the quantifications of the different regions of interest [Figure 2e], which were obtained by segmenting individual cells in these regions and quantifying actin intensity in each cell.

      (5) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      We thank the reviewer for raising this important point. While a direct experimental dissection of how HRasV<sup>12</sup> mutation affects actin levels in BEAS2B and MCF10A cells individually is beyond the scope of the present study, we do not rule out the possibility that a HRasV<sup>12</sup> mutation may exert cell-type-specific biochemical effects on actin regulation in these two epithelial systems.

      Although the difference in actin between the mutants and the wild-type cells has not been incorporated into the model presented in the manuscript, we have now shown how actin levels change in response to the interfacial tension formed between the mutant and wildtype cells by adding a mechanochemical feedback to the model. Rather than prescribing intrinsic differences in actin levels between mutant and wild-type cells, we asked whether the feedback between the actin cytoskeleton and mechanical stress alone is sufficient to generate the observed actin reorganization. To address this, we incorporate a mechanochemical feedback loop (MCFL-I), originally developed in our earlier work [35], into the vertex model framework. This feedback captures the experimentally observed coupling between cell shape, actomyosin organization, and mechanical stress (i.e., heterotypic interfacial tension), and has previously been shown to reproduce biologically realistic epithelial behaviours such as dynamic cell shapes and heterogeneous actomyosin distributions [35].

      In this framework, actin is not introduced as an explicit or intrinsic variable. Instead, changes in actomyosin organization emerge dynamically in response to mechanical stresses. Specifically, MCFL-I allows the preferred area and preferred perimeter of cells to evolve depending on cell shape and actomyosin binding, rather than remaining fixed. From these evolving parameters, we compute the normalized contractility, which we interpret as a proxy for bulk actin, and the normalized line tension, which we interpret as a proxy for junctional actin. These normalized quantities provide size-independent measures of actomyosin organization across the tissue.

      The equations for MCFL-I can be written as:

      Thus, with MCFLs, the vertex model does not have fixed 𝐴<sub>0</sub> and 𝑃<sub>0</sub>. The cells dynamically change these parameters depending on the vertex model dynamics. The constitutive relations for the and are given below [1]:

      Here, the fraction of myosin bound to actin is a function of cell area 𝐴. This nonlinear dependence arises from the load- or strain-dependent binding of myosin to actin, and the relation contains a model parameter that is proportional to the binding affinity of myosin to actin in the absence of any strain. We take this parameter to be the same for both mutant and wild-type cells. Importantly, both mutant and wild-type cells obey identical mechanochemical rules in the model. Differences in actin organization arise solely due to differences in mechanical stress generated by differential interfacial tension. Positive differential interfacial tension compresses mutant cells within clusters. This leads to different 𝐴<sub>0</sub> and 𝑃<sub>0</sub> across the monolayer via MCFL-I, and thus reduced bulk actin and increased junctional actin [Appendix figure. 4], consistent with experimental observations. Conversely, when differential interfacial tension is weak or negative, mutant and wild-type cells experience similar stresses, and the model predicts minimal differences in actin organization [Appendix figure. 5].

      Thus, while HRasV<sup>12</sup>-dependent biochemical effects may indeed differ between BEAS2B and MCF10A cells, our results demonstrate that mechanical interactions at mutant–wild-type interfaces are sufficient to generate distinct actin signatures in the two tissues, without invoking cell-type-specific actin regulation. We have added the details of the mechanochemical feedback loop in the model to the Appendix to emphasize that the model tests the sufficiency of mechanics-driven actin reorganization rather than excluding additional biochemical contributions.

      The normalized line tension appears to be negative even for Λ > 0. This is, however, just an artefact of the colorbar limits we used to allow comparison with the Λ < 0 case. If we plot with different colorbar limits, we see that the interface has a positive normalized line tension, as shown in Author response image 1.

      Author response image 1.

      Reviewer #2 (Public review):

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling.

      The reviewer is correct in stating that most of the observed behaviour is already captured by the differential tension, without the shape-tension coupling. However, the shape-tension coupling has been used here in accordance with the experimental observation that the cells at the interface are aligned and elongated along the interface [Fig. 2h], which cannot be captured without the shape-tension coupling. The difference between the shape indices of cells at the interface and away from the boundary is plotted against the interfacial tension for the case of no shape-tension coupling [Appendix figure 2]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (each group containing half of the total number of cells). At zero line tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line tension. The data without shape-tension coupling presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Appendix figure 3], a closer inspection of the simulation results shows that the cells are just squeezed and are aligned perpendicular to the interface, which is contrary to what is seen in experiments [Fig. 2h].

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling [Appendix figure 3] clearly shows that, with shape-tension coupling, the cells align and elongate along the interface as seen in experiments, as indicated by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling or biased edge tension has been used before to model cell elongation during embryo elongation [45], and here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension, which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Appendix.
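
      The alignment measure described above can be sketched as follows. This is only a minimal illustration, not the analysis code used for the manuscript: the function name `interface_alignment` and the input layout (one director vector and one interface-edge vector per interface cell) are assumptions made for this example.

```python
import math


def interface_alignment(directors, edges):
    """Mean |n . t| over interface cells (hypothetical helper).

    directors: list of (nx, ny) long-axis directions, one per interface cell
    edges:     list of (tx, ty) vectors along the interface edge of each cell
    Inputs are normalised here, so they need not be unit vectors.
    """
    if len(directors) != len(edges) or not directors:
        raise ValueError("need equally many directors and edges, at least one")
    total = 0.0
    for (nx, ny), (tx, ty) in zip(directors, edges):
        n_norm = math.hypot(nx, ny)
        t_norm = math.hypot(tx, ty)
        # Absolute value: a nematic director has no head or tail,
        # so n and -n describe the same orientation.
        total += abs(nx * tx + ny * ty) / (n_norm * t_norm)
    return total / len(directors)
```

      A value of 1 corresponds to cells elongated exactly along the interface and a value of 0 to cells elongated perpendicular to it.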

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The difference in shape indices between the interfacial and bulk cells in simulations has now been calculated over 11 different seed values. The observed differences in simulations, along with the standard deviations have been plotted in Figure 4b. This figure will be updated to include the standard deviations. The nonzero difference in shape index in the absence of differential line tension for low values of stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required in our model with shape-tension coupling. This has also been stated in section 4 of the paper. The importance of the shape-tension coupling has been stated in response to the previous point.
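
      A minimal sketch of how such a per-seed difference and its spread across seeds could be computed is given below. It assumes the common vertex-model definition of the shape index, perimeter divided by the square root of area, which is not restated in this response, and all function names are hypothetical.

```python
import math
import statistics


def shape_index(perimeter, area):
    """Shape index q = P / sqrt(A) (assumed definition)."""
    return perimeter / math.sqrt(area)


def shape_index_difference(interface_cells, bulk_cells):
    """Mean q of interface cells minus mean q of bulk cells.

    Each argument is a list of (perimeter, area) tuples for one simulation.
    """
    q_interface = statistics.fmean(shape_index(p, a) for p, a in interface_cells)
    q_bulk = statistics.fmean(shape_index(p, a) for p, a in bulk_cells)
    return q_interface - q_bulk


def summarise_over_seeds(differences):
    """Return (mean, sample standard deviation) of per-seed differences."""
    mean = statistics.fmean(differences)
    std = statistics.stdev(differences) if len(differences) > 1 else 0.0
    return mean, std
```

      Running `shape_index_difference` once per seed and passing the resulting list to `summarise_over_seeds` gives the mean and standard deviation that would be reported for each parameter value.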

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      The line tension for each edge can be calculated as .

      Although the line tension distributions look similar to the ones obtained from Bayesian Force Inference, a better comparison is between the normalized line tension and the actin seen in experiments, as discussed in our response to point (4) from Reviewer 1.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors claim that the negative tension Lambda<0 resembles the Beas2b phenotype. This is not consistent with the expression of actin in Figure 2f, which seems very similar in all four regions of interest (ROIs). Also, the segregation index data for Beas2b in Figure 1h looks very different from the demixing parameter in Figure 4f for the negative value of Lambda.

      In the model presented in the previous version of the manuscript, actin differences have not been incorporated. We have only added an interfacial line tension, which might arise only at the interface between cells. In response to comment (4) from Reviewer 1, we have considered a vertex model with mechanochemical feedback and interfacial line tension to understand how actin distribution in the tissue is affected by interfacial tension. The results presented match very well with experimental images.

      The reviewer has rightly pointed out that the segregation index (SI) data presented in Fig. 1h follow a different trend from those in Fig. 4f. However, it is essential to note that in the simulation, the initial condition is one in which the mutant cluster is already fully segregated at the initial time point. This is not the case in experiments at initial time points. Thus, the two plots are not directly comparable and only show how SI changes in our simulations. It is more effective to compare the final time points in Fig. 2f with those in Fig. 4e, where we observe that MCF10A has a higher SI compared to BEAS2B, and the case with Λ > 0 has a higher SI than the case with Λ < 0. This supports our claim that Λ < 0 resembles the BEAS2B phenotype and Λ > 0 resembles the MCF10A phenotype.

      (2) It is unclear how the threshold pressure Pi_0 is implemented for the shape-tension coupling in the vertex model. Is the value of the additional tension gamma_ij equal to 0 if the internal pressure is below that threshold?

      The stress threshold is implemented for the shape-tension in the vertex model in the following way. The line tension forces can be written as:

      where, and . If the stress on the cell is below the threshold, then for those cells.

      (3) In vertex model simulations, the authors use identical parameters for wild-type and mutant cells. This does not seem to be consistent with experimental observations in Figure 2, where the expression of actin is different, and also, cell shape indices are different for the wild-type and mutant cells. The authors should comment on how that choice affects their simulation results.

      We thank the reviewer for this comment. As noted in our response to comment 4 from Reviewer 1, we have now run our simulations after adding a mechanochemical feedback to the model. Here, both wild-type and mutant cells follow identical mechanochemical rules within the vertex model. This choice does not imply that the cells are mechanically identical in the tissue; rather, it allows us to test whether differences in cell shape and actin organization can emerge purely from mechanical interactions.

      By incorporating the mechanochemical feedback loop (MCFL-I), the model captures how heterotypic interfacial tension redistributes mechanical stresses between mutant and wild-type cells. These stresses lead to differences in cell area, perimeter, and shape, which are then translated via MCFL-I into distinct bulk and junctional actin signatures. Consequently, even though the intrinsic parameters are the same, the emergent mechanical environment reproduces the experimentally observed differences in actin intensity and cell shape indices (as shown in Figure 2).

      Thus, our approach demonstrates that the experimentally observed heterogeneity between mutant and wild-type cells can arise solely from interface-driven mechanical effects, without prescribing any cell-type-specific parameters in the model.

      (4) Also provide data for cell line tensions in the vertex model, which can then be compared with the experimental data in Figure 2. This is especially important because the differential cell line tension at the interface of mutants and wild-type cells seems to be playing a very important role.

      The cell tensions from the vertex model have been plotted in the response to main comment (3) from Reviewer 2. Since the interfacial tension has been included by hand as an extra term in the vertex model, it is not trivial to compare the line tensions from the vertex model directly to the experimental data. However, we can understand how the tensions are distributed by looking at the normalised tension and normalised contractility plotted in response to comment (4) from Reviewer 1. Those plots are from a vertex model with mechanochemical feedback, and they match well with experimental actin images.

      (5) In Figure 2j, the authors should report the relative cell pressure and line tension for all four ROIs. The data is only shown for the wild-type cells and for mutants in clusters, even though the figure caption states that the data is presented for all four ROIs. It would also be useful to report the cell tension at the interface between the mutant cells and wild-type cells since this is the key parameter for the vertex model simulations.

      We agree and have updated the graph [Figure 2j].

      (6) The tangential motion of cells around oncogenic clusters only shows up towards the end of Supplementary Video 3. It is unclear whether this is a transient effect or whether this tangential motion would persist for a longer time.

      We thank the reviewer for raising this point. In our experiments, tangential cell motion in the wild-type population along the boundary of the oncogenic cluster consistently emerges as the oncogenic cluster becomes compacted. We have plotted the tangential velocity in interfacial wild-type cells over time (Supplementary Fig. 6b) and show that this motion persists at the cluster-wild-type interface until the end of the time-lapse recordings in all cases.

      (7) It is very awkward that the authors are representing an integral of the tangential velocity over different loops in Figures 3c and 4i. Thus, it is very hard to separate how much of the increase in the integrated velocity is due to larger loops and how much is due to changes in the average tangential velocity. Since different loops have different perimeters, it would have been better to report the average tangential velocity by dividing the integrated tangential velocity by the perimeter length of each loop. In the methods, the authors state that the concentric circles go from the center to a point twice the radius of the mutant cluster, but this is not consistent with the image in Figure 3c, where the concentric circles seem to go only to the boundary of the mutant cluster.

      We thank the reviewer for raising the point regarding the dependence of the loop-integrated tangential velocity on the perimeter length. While the circulation (loop-integrated tangential velocity) indeed scales with loop size, it increases with radius only if tangential velocity components are directionally coherent along the loop.

      In our data, concentric-loop analysis centered on mutant clusters reveals a systematic increase in tangential motion with radius, with the largest values occurring at the outermost loops corresponding to the cluster–tissue interface. In contrast, applying the identical analysis to randomly selected wild-type regions does not yield any monotonic increase with radius, despite the increasing perimeter of the loops, and instead shows fluctuations around zero. This control demonstrates that the observed increase around mutant clusters is not a trivial geometric consequence of larger loop size but reflects the emergence of coherent tangential motion specifically at the mutant cluster boundary.

      To further address the reviewer’s concern, we additionally computed the mean tangential velocity by normalizing the loop-integrated tangential velocity by the loop perimeter. As shown in Supplementary figure. 6a, this normalization preserves the same qualitative trend: tangential motion peaks near the periphery of mutant clusters, whereas no such trend is observed in wild-type regions. We therefore conclude that both metrics capture the same physical phenomenon: enhanced tangential cell motion localized to the mutant cluster boundary, consistent with the behavior observed in the time-lapse videos.
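
      The relation between the two metrics can be illustrated with a short sketch. It is not the analysis pipeline used for the figures: the function name, the callable `velocity_at`, and the uniform sampling of a circular loop are assumptions chosen to keep the example self-contained.

```python
import math


def loop_tangential_velocity(velocity_at, center, radius, n_points=360):
    """Return (circulation, mean tangential velocity) on a circular loop.

    velocity_at: callable (x, y) -> (vx, vy), e.g. an interpolated PIV field
    center:      (cx, cy) of the loop, e.g. the centroid of a mutant cluster
    """
    cx, cy = center
    d_theta = 2.0 * math.pi / n_points
    circulation = 0.0
    for k in range(n_points):
        theta = k * d_theta
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        vx, vy = velocity_at(x, y)
        # Counter-clockwise unit tangent to the circle at this point.
        tx, ty = -math.sin(theta), math.cos(theta)
        # Tangential component times arc-length element r * d_theta.
        circulation += (vx * tx + vy * ty) * radius * d_theta
    perimeter = 2.0 * math.pi * radius
    return circulation, circulation / perimeter
```

      For a rigid rotation the circulation grows with the square of the radius while the mean tangential velocity grows only linearly, whereas a uniform drift gives zero for both, which is why the perimeter-normalised quantity separates coherent tangential motion from the purely geometric effect of larger loops.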

      Author response image 2.

      From simulation data

      (8) The authors should comment on how jamming and unjamming are related to shape indices because some readers may not be familiar with them.

      We have updated the same in the text of Results 2.

      (9) In the captions of Figure 3, the authors state that the bronchial epithelium gets kinetically arrested. This is not evident from the data in Figure 3d, where the velocity magnitude drops just a little bit for the bronchial epithelium, and it remains much higher compared to the mammary epithelium at long times.

      We agree with this comment and that describing BEAS2B cells as "kinetically arrested" is misleading, since their motion remains much higher even after the initial drop. We have updated the text in the caption accordingly.

      (10) It is unclear why the authors have used the segregation index for analyzing experiments and the demixing parameter for analyzing simulations. Both parameters are trying to quantify the same thing, so it would have been better to use the same quantity for both experiments and simulations to enable easier comparison.

      We agree that using the same quantity for both experiments and simulation would enable easier comparison. Thus, we have replaced the demixing parameter with segregation index in Figure 4. 

      (11) It is unclear what experimental data were used for shape indices in Figure 4c. Was it the data from Mcf10a or Beas2b? It is also unclear which ROIs were used because different ROIs have very different shape indices in experiments, according to Figure 2e,f.

      We have used the experimental ∆(𝑆ℎ𝑎𝑝𝑒 𝑖𝑛𝑑𝑒𝑥) = 0.75, which is a rough estimate of the difference between the shape indices for ROI 2 (interface), and ROI 1, ROI 3 and ROI 4 (away from interface) from Fig. 2e for MCF10A.

      (12) The authors find that the differences in shape indices are non-zero even for Lambda=0 for some threshold pressure parameters Pi_0 in Figure 4c. This should not happen because all the cells are identical in that case. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. How is this simulation data obtained? Is it from a single simulation, or is this averaged over a certain number of simulations? Authors should perform multiple simulations and report both the mean values and the standard deviation.

      We have addressed this in the response under main comments (1) and (2) from Reviewer 2.

      (13) It is unclear how the cell extrusion was simulated in the vertex model.

      Extrusion probability calculation: Simulations with just a single mutant cell were run for a range of differential interfacial line tension values (Λ = 0, 0.1, 0.4, 0.8, 1.2, 1.6) with shape tension coupling. The simulation was run till the area of the mutant cell fell below a threshold area = 0.1, after which we consider the mutant cell to be extruded. 9 different random initial seeds were run and analysed. Each seed gives a binary result – either extruded or not. This was used to calculate the extrusion probability. We have added this section to the Appendix.
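
      The counting step described above can be sketched as follows. The area values are purely illustrative and are not simulation output; the function name and the use of the smallest mutant-cell area reached in each seed as input are assumptions made for this example.

```python
def extrusion_probability(min_areas, threshold_area=0.1):
    """Fraction of seeds in which the mutant cell counts as extruded.

    min_areas: smallest mutant-cell area reached in each seed's simulation
    A seed is scored as extruded if that area fell below threshold_area.
    """
    if not min_areas:
        raise ValueError("need at least one seed")
    extruded = [area < threshold_area for area in min_areas]
    return sum(extruded) / len(extruded)


# Illustrative values for nine seeds at a single value of the line tension.
example_min_areas = [0.05, 0.42, 0.08, 0.61, 0.09, 0.55, 0.03, 0.47, 0.50]
example_probability = extrusion_probability(example_min_areas)
```

      Repeating this for each value of Λ gives one probability per line-tension value.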

      (14) The authors claim that HRas^V12 clusters in bronchial epithelium grew on top of one another, but it is not clear how this can be observed in Figure 2b or in any other Figure.

      We thank the reviewer for raising this point. Our original statement that cells were growing on top of each other was based on observations from the Z-stack images, which allowed us to resolve cell positions along the apico–basal axis. However, since these Z-stack data are not included in the current manuscript, we agree that this claim cannot be directly supported by the figures shown. We have therefore removed this statement from the text and restricted our conclusions to what is directly supported by the presented data.

      (15) In the main text, the authors state that bronchial epithelial cells exhibited higher F-actin intensities compared to mammary bronchial cells, but this difference is not statistically significant according to Figure 5e.

      We agree with the reviewer and have thus changed the text because even though the F-actin intensities seemed higher in bronchial epithelium visually, the difference was not statistically significant.

      (16) The definition of eccentricity is incorrect in the text. The authors state that the eccentricity is quantified as the ratio of the length of the minor axis to the major axis of an ellipse. According to this definition, the eccentricity would be 1 for a circle and not 0.

      We have updated the definition of eccentricity in the text to the correct one, including the correct equation.
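
      For reference, the standard geometric eccentricity of an ellipse, which is 0 for a circle and approaches 1 for a highly elongated shape, can be computed as in the sketch below. The revised manuscript text is not reproduced in this response, so the function is only an illustration of the standard definition, and its name and arguments are hypothetical.

```python
import math


def eccentricity(major_axis, minor_axis):
    """Eccentricity e = sqrt(1 - (b / a)**2) of an ellipse.

    major_axis (a) and minor_axis (b) may be full or semi-axis lengths,
    as long as both use the same convention; only their ratio matters.
    """
    if major_axis <= 0 or minor_axis <= 0:
        raise ValueError("axis lengths must be positive")
    if minor_axis > major_axis:
        raise ValueError("minor axis cannot exceed major axis")
    return math.sqrt(1.0 - (minor_axis / major_axis) ** 2)
```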

      (17) It is unclear whether the active force F_act is used in the vertex model simulations. The active force is defined, but then its value is never specified. Note that the motility force is also an active force, so it is unclear why the motility and active forces were separated.

      In our model, the line tension force arising from the shape-tension coupling is the active force. We agree that the motility force is also an active force. However, in the absence of any directional movement, as in the homeostatic tissues discussed here, we have discounted the role of the motility force in the model presented here.

      (18) The authors use inconsistent naming for different types of epithelia throughout the manuscript. Mcf10a cells are referred to as either mammary epithelium or breast epithelium, and Beas2b cells are referred to as either lung epithelium or bronchial epithelium. Because of the very broad spectrum of journal readers, it may not be obvious to all readers that different names refer to the same cell types.

      We have updated the text to keep the naming consistent throughout.

      (19) Many references to individual figure panels in the main text are incorrect. The authors should carefully check all the references to figures.

      We apologize for these errors. We have updated the incorrect references after carefully reviewing the entire manuscript.

      (20) In Figure 5, panel b is incorrectly labeled as d.

      We have corrected the same.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The aim of this work is to directly image collagen in tissue using a new MRI method with positive contrast. The work presents a new MRI method that allows very short, powerful radio frequency (RF) pulses and very short switching times between transmission and reception of radio frequency signals.

      Strengths:

      The experiments with and without the removal of 1H hydrogen, which is not firmly bound to collagen, on tissue samples from tendons and bones, are very well suited to prove the detection of direct hydrogen signals from collagen. The new method has great potential value in medicine, as it allows for better investigation of ageing processes and many degenerative diseases in which functional tissue is replaced by connective tissue (collagen).

      Weaknesses:

      It is clear that, due to the relatively long time intervals between RF excitation and signal readout, standard hardware in whole-body MRI systems can only be used to examine surrounding water and not hydrogen bound to collagen molecules.

      We agree that this is a regrettable situation (see also Discussion section). We are hoping that current and future efforts of MRI manufacturers towards improved hardware will eventually enable the technique for broader application.

      Reviewer #2 (Public review):

      Summary:

      This work presents direct magnetic resonance imaging (MRI) of collagen, which is not possible with conventional MRI or other tomographic imaging modalities.

      Strengths:

      The experimental work is impressive, and the presentation of results is clear and convincing. Through a series of thoughtfully prepared experiments, I found the evidence that the images reflect direct measurements of collagen to be highly compelling.

      Due to the technical demands, direct collagen imaging is unlikely to become widespread for routine clinical work, at least not anytime soon. That said, this work is nonetheless transformative and will likely be highly significant for research and perhaps clinical trials.

      Reviewer #3 (Public review):

      The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work, which is very largely technical in nature. Several clinical points are worth considering:

      (1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so), and so it would have been of interest to know what orientation the tendons had to B<sub>0</sub> (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.

      We have added arrows in the images showing the direction of the main magnetic field. For the in vivo case, the subject lay in the superman position, with B<sub>0</sub> pointing from the hand towards the shoulder.

      (2) The in vivo transverse image looks about mid-forearm, where tendons are not prominent. A transverse image of the lower forearm, where there is an abundance of tendons, might have been preferable.

      We have added a distal view of the forearm, where more tendon structures are observed.

      (3) The in vivo images show the interosseous membrane as a high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna, and thus potentially to B<sub>0</sub>. Collagen fibres may have been at an orientation towards the magic angle, and this may account for the high signal on the longer TE image and the low signal on the subtracted image.

      This is certainly an interesting take. While the magic angle effect is well established for collagen-bound water, the orientation effects on the macromolecular collagen signal are still to be investigated. Our initial experience so far suggests that the direct collagen signal is not as sensitive to orientation as the bound-water signal.

      Regarding the described observation for the interosseous membrane, we expect the high signal to come from collagen-bound water (yet not quite at the magic angle), which hardly decays between the two TEs because their difference is small compared to the T<sub>2</sub>* of this signal. Hence, this signal is removed in the subtraction image, and only the macromolecular collagen signal remains, which appears to be very low. Working with samples of the interosseous membrane may provide further insights into why this is the case.
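
      The cancellation of the bound-water signal in the subtraction image can be illustrated with a minimal mono-exponential sketch. The two echo times (10 and 50 μs) and the bound-water T<sub>2</sub>* (1000 μs) are hypothetical values chosen only for illustration; only the 10 μs T<sub>2</sub>* of the macromolecular collagen signal is taken from our recorded FIDs, and the dipolar oscillation is ignored.

```python
import math

def signal(te_us, amplitude, t2_star_us):
    """Mono-exponential signal at echo time te_us (all times in microseconds)."""
    return amplitude * math.exp(-te_us / t2_star_us)

def subtraction_signal(te_short_us, te_long_us, amplitude, t2_star_us):
    """Signal left after subtracting the long-TE image from the short-TE image."""
    return (signal(te_short_us, amplitude, t2_star_us)
            - signal(te_long_us, amplitude, t2_star_us))

# Hypothetical echo times, for illustration only.
TE_SHORT_US = 10.0
TE_LONG_US = 50.0

# Collagen T2* of 10 us as used in the response; bound-water T2* is an assumption.
collagen_left = subtraction_signal(TE_SHORT_US, TE_LONG_US, 1.0, 10.0)
bound_water_left = subtraction_signal(TE_SHORT_US, TE_LONG_US, 1.0, 1000.0)

print(f"collagen fraction in subtraction image:    {collagen_left:.3f}")
print(f"bound-water fraction in subtraction image: {bound_water_left:.3f}")
```

      Under these assumed values, most of the rapidly decaying component survives the subtraction, whereas the slowly decaying bound-water component largely cancels, regardless of how bright it appears on the individual TE images.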

      (4) Some of the signals attributed to the muscle may be from an attachment of the muscle to the aponeurosis.

      We have added the aponeurosis as a possible signal contributor in the muscle tissue.

      (5) There is significant collagen in subcutaneous tissues, so the designation "skin" may more correctly be "skin and subcutaneous tissue".

      We have updated the label accordingly.

      (6) Cortical bone is very heterogeneous, with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T<sub>2</sub>* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition.

      Solid cortical bone may also have an ultrashort T<sub>2</sub>* in its own right.

      The described effect is clearly of importance for bone water but has a negligible effect on the macromolecular signal. We would like to support this with a brief, coarse estimate. 𝑇<sub>2</sub>* can be approximated by 1/𝑇<sub>2</sub>* = 1/𝑇<sub>2</sub> + 1/𝑇<sub>2</sub>′, where 1/𝑇<sub>2</sub>′ = 𝛾∆𝐵 = 𝛾∆𝜒𝐵<sub>0</sub> (Ref. 1).

      The susceptibility difference reported for the interface between bone and water is ∆𝜒 = 2.5 ppm (Refs. 2 and 3), which at 3T leads to 𝑇<sub>2</sub>′ ≈ 3000 𝜇𝑠. From our recorded FIDs, we use a 𝑇<sub>2</sub>* of 10 𝜇𝑠 and thus obtain 𝑇<sub>2</sub> = 10.03 𝜇𝑠.

      As can be seen, the change in the transverse relaxation constant due to susceptibility is negligible compared to the intrinsic decay of the macromolecular collagen signal. Notably, this is not the case for the pore water signal where T<sub>2</sub>s are on the order of milliseconds (Ref. 2).
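
      The coarse estimate above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the proton gyromagnetic ratio is taken as 𝛾/2π ≈ 42.577 MHz/T, a choice that yields a 𝑇<sub>2</sub>′ close to the quoted 3000 𝜇𝑠; the function names are illustrative only.

```python
# Proton gyromagnetic ratio divided by 2*pi, in Hz per tesla (assumed convention).
GAMMA_BAR_HZ_PER_T = 42.577e6

def t2_prime_us(delta_chi_ppm, b0_tesla):
    """Susceptibility-induced decay constant T2' in microseconds."""
    delta_b_tesla = delta_chi_ppm * 1e-6 * b0_tesla
    rate_per_s = GAMMA_BAR_HZ_PER_T * delta_b_tesla  # 1/T2'
    return 1e6 / rate_per_s

def intrinsic_t2_us(t2_star_us, t2_prime_value_us):
    """Solve 1/T2* = 1/T2 + 1/T2' for T2 (all times in microseconds)."""
    return 1.0 / (1.0 / t2_star_us - 1.0 / t2_prime_value_us)

t2p = t2_prime_us(delta_chi_ppm=2.5, b0_tesla=3.0)
t2 = intrinsic_t2_us(t2_star_us=10.0, t2_prime_value_us=t2p)

print(f"T2' = {t2p:.0f} us")
print(f"T2  = {t2:.2f} us")
```

      The intrinsic 𝑇<sub>2</sub> differs from the measured 𝑇<sub>2</sub>* by well under one percent, which is the sense in which the susceptibility contribution is negligible for the macromolecular signal.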

      A footnote was added in the Introduction section regarding this topic.

      (7) It may be worth noting that in disease T<sub>2</sub>* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.

      This is an important point regarding image interpretation. For this reason, it is advantageous that the original anatomical images prior to subtraction are also available, as they will show such effects. They can be used in conjunction with the collagen-specific image to provide further insights regarding tissue disease. Increased T<sub>2</sub>* of diseased tissue has so far been reported for the bound water components due to a reduction of dipolar interactions between bound water and collagen (Ref. 4). A potential related change in T<sub>2</sub> for the macromolecular collagen component itself is certainly of interest and an avenue to explore in future work.

      (8) It may be worth distinguishing fibrous connective tissue (loose or dense), which may be normal or abnormal, from fibrosis, which is an abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T<sub>2</sub> initially and decreases its T<sub>2</sub>* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.

      We are aware of this important distinction. We therefore checked the manuscript for references to fibrosis, making sure that the meaning is as intended.

      Overall, the paper appears very well constructed and describes thoughtful and important work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It should be stated that various methods with very short echo times (e.g. SWIFT by Garwood et al.) have been described in the past. This work shows for the first time that direct signals from collagen can be systematically detected in tissue samples.

      We have expanded a sentence in the introduction and now reference selected publications studying short-T<sub>2</sub> water signal in collagen, including SWIFT.

      (2) It should be noted that the 1H atoms bound to collagen are located at different sites (at different amino acids of the protein) of the molecule and have different frequencies, and that further signal analyses are of interest.

      We have included additional information regarding distinct resonances of proton-binding sites of collagen in the introduction. The discrete observation of such signals requires advanced NMR methodology such as magic-angle spinning and RF decoupling, which is not a suitable approach for in vivo MRI. Without such methods, the broad lineshapes overlap strongly and are rather observed as a single decaying exponential with the dipolar oscillation as we observe in the FIDs.

      (3) Is it certain that the bump at 30 microseconds comes from 'dipolar coupling'? Is the development time probably too short for chemical shift-induced interference or J-coupling effects?

      30 microseconds is an extremely short interval in which to accumulate phase, and large resonance offsets are required to observe significant changes. To investigate the nature of the bump, we also collected data on a Bruker 7T NMR spectrometer (see Author response image 1). Overall, the same signal characteristics are observed as at 3T. In particular, the position of the bump is the same, which excludes chemical shift as a source. However, at the higher field strength, chemical shift becomes significant for the signal phase, as observed by the change in the phase behavior at 50 microseconds, when the collagen component has decayed.

      While J-coupling is independent of field strength, the typical ranges are single-digit to tens of Hertz. In contrast, dipolar coupling interacts on the order of thousands of Hertz, which coincides with the values extracted from our signal model.

      To clarify this point, we extended the respective sentence in the Results section.
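
      The orders of magnitude behind this argument can be checked with a short sketch of phase accumulation, φ = 360° · Δf · t. The specific offsets used here (a 3.5 ppm chemical shift, a 10 Hz J-coupling, and a 5 kHz dipolar coupling) are illustrative assumptions chosen to fall within the ranges named above, not values extracted from our signal model.

```python
# Proton gyromagnetic ratio divided by 2*pi, in Hz per tesla.
LARMOR_HZ_PER_T = 42.577e6

def chemical_shift_hz(shift_ppm, b0_tesla):
    """Frequency offset in Hz produced by a chemical shift at field strength B0."""
    return shift_ppm * 1e-6 * LARMOR_HZ_PER_T * b0_tesla

def accumulated_phase_deg(offset_hz, time_us):
    """Phase in degrees accumulated by a frequency offset over time_us microseconds."""
    return 360.0 * offset_hz * time_us * 1e-6

BUMP_TIME_US = 30.0  # position of the bump in the FID

# Illustrative offsets (assumptions, see lead-in).
cases = {
    "chemical shift, 3.5 ppm at 3T": chemical_shift_hz(3.5, 3.0),
    "chemical shift, 3.5 ppm at 7T": chemical_shift_hz(3.5, 7.0),
    "J-coupling, 10 Hz": 10.0,
    "dipolar coupling, 5 kHz": 5000.0,
}

for label, offset_hz in cases.items():
    phase = accumulated_phase_deg(offset_hz, BUMP_TIME_US)
    print(f"{label}: {offset_hz:8.1f} Hz -> {phase:6.2f} deg after 30 us")
```

      Only the kilohertz-range offset accumulates a substantial phase within 30 microseconds, and only the chemical-shift offset scales with field strength, which is why an unchanged bump position between 3T and 7T argues against chemical shift.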

      Author response image 1.

      (4) It should be noted that short RF pulses have a relatively high energy content, and whether there are any particular stresses on patients during the examination (SAR, nerve stimulation?).

      SAR is an important issue in ZTE MRI. Since imaging bandwidths are large and excitation is performed with the imaging gradient being on, broadband pulses are necessary. Hence, significant RF deposition occurs and in vivo the flip angle can often not be optimized for the maximum signal, but will be limited by the SAR limit. We have added an explanation in the Discussion section.

      Peripheral nerve stimulation is generated by rapid switching of strong gradients. However, ZTE sequences are usually operated without switching gradients on and off, but with only minor adjustments of the gradient direction between TR intervals. Therefore, PNS is not a relevant issue.

      (5) In the Results section, Part B, 'substantial signal intensity' should be written instead of 'substantial image intensity'.

      We have changed this as suggested.

      References

      (1) Chavhan GB, Babyn PS, Thomas B, Shroff MM, Haacke EM. Principles, techniques, and applications of T2*-based MR imaging and its special applications. Radiographics. 2009 Sep-Oct;29(5):1433-49. doi: 10.1148/rg.295095034. PMID: 19755604; PMCID: PMC2799958.

      (2) Seifert, AC, Wehrli, SL, and Wehrli, FW (2015), Bi-component T<sub>2</sub>* analysis of bound and pore bone water fractions fails at high field strengths. NMR Biomed., 28, 861– 872. doi: 10.1002/nbm.3305.

      (3) Hopkins JA, Wehrli FW. Magnetic susceptibility measurement of insoluble solids by NMR: magnetic susceptibility of bone. Magn Reson Med. 1997 Apr;37(4):494-500. doi: 10.1002/mrm.1910370404. PMID: 9094070.

      (4) Loegering IF, Denning SC, Johnson KM, Liu F, Lee KS, Thelen DG. Ultrashort echo time (UTE) imaging reveals a shift in bound water that is sensitive to sub-clinical tendinopathy in older adults. Skeletal Radiol. 2021 Jan;50(1):107-113. doi: 10.1007/s00256-020-03538-1. Epub 2020 Jul 8. PMID: 32642791; PMCID: PMC7677198.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Based on the effects observed with OC vs. Ntf3 cKO, it is unclear whether OC is indeed exerting its non-cell-autonomous effects via Ntf3. Knocking out both Ntf3 and OC and comparing the effects to those seen with just OC cKO alone could provide more insight on this point.

      In this study, we did not intend to demonstrate that Onecut transcription factors exert their non-cell-autonomous action on spinal interneuron development by regulating Ntf3 expression, and we do not state in the manuscript that this is the case. We only show that Onecut factors and Ntf3, the expression of which they regulate, contribute to the non-cell-autonomous regulation of spinal interneuron development by the motor neurons. We are convinced that Onecut factors could regulate multiple independent factors and pathways involved in extrinsic regulation of interneuron development, as supported by the regulation of the expression of multiple secreted factors and membrane proteins in motor neurons detected in the reported RNA-sequencing experiment (this manuscript and [1]). This possibly also includes, as demonstrated in cell culture for multiple homeoproteins including human Onecut factors [2], the intercellular transfer of the Onecut homeoproteins during spinal cord development, a process that we are currently investigating. Knocking out both OC and Ntf3 in the motor neurons, beyond being technically extremely challenging (1/64 probability of obtaining triple-mutant embryos), would not address this question, as it would simply result in the addition of two different defects.

      Also, a quantitative summary of the effects of Ntf3 overexpression in motor neurons in the chick is lacking.

      A quantitative summary of the effects of Ntf3 overexpression in the chicken embryonic spinal cord is provided in Figure S2.

      (2) How the authors assess changes in the spatial distribution of interneurons is unclear. In Figures 2 and 4, the control distributions (despite reporting the same populations in the same regions) look different, suggesting large sample-to-sample variance in distribution. Although the authors report that several sections in each level were taken from at least three animals for each condition, it's unclear how variance within WT or cKO sections was accounted for in the final statistical evaluation. It seems at a glance that a comparison between control samples in Figure 2 and Figure 4 could report statistically significant differences, which would be problematic. A more rigorous report of sample-to-sample variance and a more in-depth explanation of the statistical methods are needed.

      The experimental procedure to analyze the spatial distribution of spinal interneurons at different stages of development is described in detail in the “Statistical analyses” paragraph of the Materials and Methods section of the manuscript, and has been repeatedly used by ourselves [3,4] and by others (see for example [5-7]) to conduct similar analyses.

      We also noticed that the distribution of the different analyzed interneuron populations in the control embryos showed some differences between the cOc1/Oc2<sup>-/-</sup> and the cNtf3<sup>-/-</sup> lines. Several parameters can account for this observation. First, this study has been conducted over a period of 15 years, with different investigators each contributing to different steps of the analysis. Second, the genetic background of these two lines is not identical, impacting both the duration of the gestation (hence, the embryonic stage of the performed analyses, even if the embryos were collected on the same gestation day) and possibly the distribution of some interneuron populations. Third, because of changes in the availability of the primary antibodies used to label the interneuron populations of interest, the same antibodies were not used throughout the study, as stated in the Materials and Methods section, although the same antibody was used by the same investigator to label the same interneuron population in each mouse line at each developmental stage.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses, which takes into account variance within control or mutant samples, will be provided in the revised version of the manuscript.

      Reviewer #2 (Public review):

      (1) The study primarily quantifies interneuron numbers and distribution at different levels of the spinal cord and under different genetic manipulations. Experimental details are lacking, defining how many sections were analyzed (several are noted in the methods) and how the rostrocaudal levels of the spinal cord were precisely aligned.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses will be provided in the revised version of the manuscript. The rostrocaudal levels of the spinal cord were precisely aligned using the distribution of Foxp1 in the Lateral Motor Columns (LMCs) at brachial or lumbar levels of the spinal cord [8,9], which will also be indicated in the revised version.

      In different figures, the values and distributions shown for controls vary quite a lot. For example, in Figure 2B vs Figure 4B, the number of FoxP2+ V1 neurons at brachial levels is ~350 vs 125. Similarly, the control distributions in 2I and 4I are quite different. This makes it challenging to determine whether the conclusions regarding the impact of each genetic manipulation on interneuron numbers and distribution are valid.

      Multiple factors may explain these observations. First, this study spans a 15-year period, with different researchers contributing to various stages of the analysis. Second, the genetic backgrounds of the two mouse lines are not identical, affecting both gestation length (thus influencing the embryonic stage at which analyses were performed, even when embryos were collected on the same gestational day) and potentially the distribution of certain interneuron populations. Third, due to changes in the availability of primary antibodies used to label the targeted interneuron populations, the same antibodies were not consistently employed throughout the study, as noted in the Materials and Methods section, though each investigator used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) The relationship between OC and NT3 deletion data is not entirely clear. Both deletions presumably lead to changes in interneuron distribution, but is there any reverse relationship between the two that relates to relative changes in NT3 levels? The authors do not directly compare NT3 and OC KO IN distributions. Similarly, one might expect a decrease in interneuron numbers in OC mutants, which is only reported for V2c neurons. However, the image presented in Figure 2G shows an equal number of V2c INs in control and mutant.

      This study was not designed to demonstrate that Onecut transcription factors influence spinal interneuron development in a non-cell-autonomous manner through Ntf3 regulation, nor do we claim this in the manuscript. Instead, we show that Onecut factors and Ntf3, whose expression they control, contribute to the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We believe Onecut factors may regulate multiple independent factors and pathways involved in the extrinsic control of interneuron development. For instance, as noted earlier [2], we observed intercellular transfer of Onecut homeoproteins during spinal cord development, suggesting alternative mechanisms for non-cell-autonomous regulation.

      The two mouse lines studied here consist, on the one hand, of a combination of OC inactivation and increased Ntf3 expression and, on the other hand, of Ntf3 inactivation. Therefore, a reverse relationship between the changes in interneuron distribution is not expected. Furthermore, gain-of-function and loss-of-function experiments in mouse models frequently generate phenotypes that are not inverse to each other [10-13].

      (3) It is not clear that the behavioral phenotypes seen in the olig2-cre mediated deletion of NT3 can be attributed to changes in interneuron development. How about a role of NT3 in oligodendrocytes? There is a big gap between the embryonic changes shown here and behavior, with no in-between circuit-level changes in locomotor circuits shown.

      We agree: the motor behavior changes that we recorded in Ntf3 conditional mutant mice are, as stated, “consistent with the hypothesis that Ntf3 produced by MNs is required to generate locomotor circuits with properly coordinated activity” but do not demonstrate a direct causal relationship. However, investigating the intrinsic activity of the spinal locomotor circuits independently from, for example, the oligodendrocyte contribution may prove to be extremely challenging and was beyond the scope of this study. In addition, to the best of our knowledge, Ntf3 has not been shown to be expressed in healthy oligodendrocytes in vivo, and TrkC has not been reported to be expressed by these cells under the same conditions.

      A more restricted manipulation would be deleting TrkC from specific interneuron populations. Related to this, although TrkC is shown to be broadly expressed in ventral interneurons, it is not shown specifically to colocalize with any of the interneuron markers. The authors should validate that the receptor is expressed in the subsets that they are investigating.

      We agree that investigating the consequences of inactivating the TrkC receptor in specific interneuron populations would be extremely informative. However, this experiment is also very challenging to perform, as most of the driver lines available to target spinal interneuron populations additionally target multiple neuronal populations outside of the spinal cord that are also involved in the control of movements and could therefore induce confounding effects on motor behavior analyses [14-20].

      We thank the reviewer for suggesting a more detailed investigation of the interneuron populations that display TrkC receptors; this will be included in the revised version of the manuscript.

      (4) The rationale for following up on NT3 seems to be the chick electroporation experiments; however, no changes in distribution are shown in those experiments, and only a very minor decrease in Chx10 interneurons. Shouldn't NT3 overexpression lead to substantial decreases in IN numbers according to the authors' model? The "data not shown", which presumably refers to distribution, would be important to show here, to further support this rationale.

      Chicken spinal cord electroporation only allows spinal cord development to be studied within a limited time window, given the high mortality rate observed after longer incubation. At the stage at which we collected the electroporated embryos for analysis, interneuron migration has barely been initiated, and distribution cannot yet be studied. Consistently, we are not aware of any report of interneuron distribution analysis in electroporated chicken embryonic spinal cord, in contrast to mouse embryos [3-7].

      (5) The idea that NT3 downregulation causes an increase in IN numbers is not intuitive. Also, considering the DTA experiments in Figure 1, showing that MN ablation leads to a decrease in several IN subtypes and no changes in V2a neurons. It would be helpful for the reader if the authors could synthesize their results in the discussion and reconcile their experimental findings.

      We agree; this will be included in the revised version of the manuscript.

      Reviewer #3 (Public review):

      (1) The manuscript relies heavily on quantifying numbers and the spatial distribution of interneuron populations. However, these do not seem to be consistent in control animals across experiments, making it difficult to interpret any changes observed in genetic manipulations. Specifically, in Figures 2 and 4, the same markers are being used to quantify V1, V2a, V2b, and V2c interneurons in controls vs. OC (Figure 2) or Ntf3 (Figure 4) conditional knockouts, but the numbers of neurons and their distribution in control animals are variable between these two figures. For example, there seems to be a mean of >300 V1 neurons in E12.5 brachial sections of Fig. 2 controls, but this number is <150 in Fig. 4 controls. The cell distribution scoring is similarly variable between these controls without any explanation. The same is true for E14.5 controls used in Figure S1 vs. Figure S3.

      We indeed observed variations in the quantifications and distributions of the analyzed interneuron populations in control embryos between the cOc1/Oc2<sup>-/-</sup> and cNtf3<sup>-/-</sup> lines. Several factors may explain this discrepancy. First, the study was carried out over 15 years, with different investigators contributing to distinct stages of the analysis, meaning that interneuron distribution was not assessed by the same researchers in both lines. Second, the genetic backgrounds of the two lines differ, affecting gestation length (and thus the embryonic stage at analysis, even when embryos were collected on the same gestational day) as well as potentially altering the distribution of certain interneuron populations. Third, changes in the availability of primary antibodies targeting the interneuron populations of interest led to inconsistencies in antibody use across the study, as detailed in the Materials and Methods section. However, each investigator consistently used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) Neurotrophic factors generally promote neuronal survival. However, in this study, the loss of Ntf3 leads to increased numbers of interneurons. This finding is in disagreement with previous observations in slice cultures of spinal cords, as stated in the discussion. This discrepancy makes it even more important that the cell counts reported in the figures discussed above are robust.

      Considering that neurotrophic factors only support neuronal survival would neglect their important function in neuronal differentiation, which has been broadly demonstrated. Severe immunotoxic ablation of motor neurons or antiserum blockade of Ntf3 activity severely depleted inhibitory, but not excitatory, interneurons in a highly apoptosis-prone organotypic culture model of embryonic rat spinal cord slices, which was rescued by Ntf3 in the first model [21]. Opposite results were obtained in vivo by other researchers using mouse models lacking almost all MNs due to the elimination of skeletal muscles, where the number of spinal INs remained unaffected [22,23]. Combined with our results, these in vivo observations suggest that Ntf3 is involved in interneuron differentiation rather than in their survival. Consistently, Ntf3 has been shown to promote neuronal differentiation [24].

      (3) The claim that phenotypes are non-cell autonomously driven by motor neurons is not well supported. In Olig2-Cre conditional knockouts of Onecut and Ntf3, there is no confirmation that the loss of these factors is specific to motor neurons. Therefore, it cannot be ruled out that other cell populations may be mediating the phenotypes.

      Combined conditional inactivation of Oc1 and Oc2 has been reported in [1]. Conditional inactivation of Ntf3 only impacts motor neurons as it is the only cell population in the ventral spinal cord wherein this factor is produced (this study and [25-27]). Furthermore, Olig2-Cre has been shown to be active in motor neurons and in V3 interneurons (see for example [10]), which, for this reason, have not been studied in the frame of this project as stated in the manuscript.

      (4) The claim that interneuron development is regulated by OC control of Ntf3 expression in motor neurons is not well supported. The authors show that loss of OC1/2 leads to an increase in Ntf3 expression in motor neurons. If this pathway were controlling interneurons, loss of OC function and overexpression of Ntf3 would have the same phenotype, which is not the case. Additionally, it would also be expected that loss of OC function and loss of Ntf3 function would have inverse phenotypes, which is also not the case. The phenotypes from OC loss of function and Ntf3 loss of function seem distinct from one another. The authors state that too little and too much Ntf3 are both bad for interneuron development, but there is no data to support their claim that OC1/2 mutants have altered interneuron development because of higher Ntf3 expression.

      This study was not aimed at proving that Onecut transcription factors mediate their non-cell-autonomous effects on spinal interneuron development through Ntf3 regulation, nor do we make this claim in the manuscript. Rather, we demonstrate that Onecut factors and Ntf3, whose expression they control, participate in the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We propose that Onecut factors likely modulate multiple independent factors and pathways involved in the extrinsic regulation of interneuron development, as evidenced by the regulation of various secreted factors and membrane proteins in motor neurons observed in our RNA-sequencing data (this study and [1]). This may also involve intercellular transfer of Onecut homeoproteins during spinal cord development, a mechanism previously shown in cell culture for several homeoproteins, including human Onecut factors [2], which we are currently exploring.

      (5) It is not clear that interneurons being studied express the Ntf3 receptor TrkC, which makes it difficult to assess whether changes in Ntf3 signaling are directly responsible for the phenotype.

      The immunofluorescence experiment in Figure 3C shows that the TrkC receptor is present in cell populations surrounding motor neurons at e12.5, a stage at which only the pre-motor interneuron populations reported in the manuscript are present. However, we thank the reviewer for suggesting a more detailed investigation of the interneuron populations that display TrkC receptors; this will be included in the revised version of the manuscript.

      (6) While the behavioral phenotypes are consistent with Ntf3 playing a role in motor circuits, there is no evidence to suggest that Ntf3's influence on premotor interneurons being studied is driving or contributing to this phenotype, as discussed by the authors.

      We acknowledge that the motor behavior changes observed in Ntf3 conditional mutant mice are, as noted, “consistent with the hypothesis that MN-derived Ntf3 is necessary for the formation of locomotor circuits with properly coordinated activity,” but they do not establish a direct causal link. However, analyzing the intrinsic activity of spinal locomotor circuits was beyond the scope of this study.

      (1) Toch, M. et al. Onecut-dependent Nkx6.2 transcription factor expression is required for proper formation and activity of spinal locomotor circuits. Sci Rep 10, 996 (2020). https://doi.org/10.1038/s41598-020-57945-4

      (2) Lee, E. J. et al. Global Analysis of Intercellular Homeodomain Protein Transfer. Cell Rep 28, 712-722 e713 (2019). https://doi.org/10.1016/j.celrep.2019.06.056

      (3) Harris, A. et al. Onecut factors and Pou2f2 regulate the distribution of V2 interneurons in the mouse developing spinal cord. Front Cell Neurosci 13 (2019). https://doi.org/10.3389/fncel.2019.00184

      (4) Kabayiza, K. U. et al. The Onecut Transcription Factors Regulate Differentiation and Distribution of Dorsal Interneurons during Spinal Cord Development. Front Mol Neurosci 10, 157 (2017). https://doi.org/10.3389/fnmol.2017.00157

      (5) Deska-Gauthier, D. et al. Embryonic temporal-spatial delineation of excitatory spinal V3 interneuron diversity. Cell Rep 43, 113635 (2024). https://doi.org/10.1016/j.celrep.2023.113635

      (6) Bikoff, J. B. et al. Spinal Inhibitory Interneuron Diversity Delineates Variant Motor Microcircuits. Cell 165, 207-219 (2016). https://doi.org/10.1016/j.cell.2016.01.027

      (7) Hayashi, M. et al. Graded Arrays of Spinal and Supraspinal V2a Interneuron Subtypes Underlie Forelimb and Hindlimb Motor Control. Neuron 97, 869-884 e865 (2018). https://doi.org/10.1016/j.neuron.2018.01.023

      (8) Rousso, D. L., Gaber, Z. B., Wellik, D., Morrisey, E. E. & Novitch, B. G. Coordinated actions of the forkhead protein Foxp1 and Hox proteins in the columnar organization of spinal motor neurons. Neuron 59, 226-240 (2008). https://doi.org/10.1016/j.neuron.2008.06.025

      (9) Roy, A. et al. Onecut transcription factors act upstream of Isl1 to regulate spinal motoneuron diversification. Development 139, 3109-3119 (2012). https://doi.org/10.1242/dev.078501

      (10) Debrulle, S. et al. Vsx1 and Chx10 paralogs sequentially secure V2 interneuron identity during spinal cord development. Cell Mol Life Sci 77, 4117-4131 (2020). https://doi.org/10.1007/s00018-019-03408-7

      (11) Brunklaus, A. et al. in Brain Vol. 145 3816-3831 (2022).

      (12) Scekic-Zahirovic, J. et al. in EMBO J Vol. 35 1077-1097 (2016).

      (13) Wong, J. C. in Epilepsy Curr Vol. 25 347-349 (2025).

      (14) Hafler, B. P., Choi, M. Y., Shivdasani, R. A. & Rowitch, D. H. Expression and function of Nkx6.3 in vertebrate hindbrain. Brain Res 1222, 42-50 (2008). https://doi.org/10.1016/j.brainres.2008.04.072

      (15) Nardelli, J., Thiesson, D., Fujiwara, Y., Tsai, F. Y. & Orkin, S. H. Expression and genetic interaction of transcription factors GATA-2 and GATA-3 during development of the mouse central nervous system. Dev Biol 210, 305-321 (1999).

      (16) Bretzner, F. & Brownstone, R. M. in J Neurosci Vol. 33 14681-14692 (2013).

      (17) Chopek, J. W., Zhang, Y. & Brownstone, R. M. in J Neurophysiol Vol. 126 1978-1990 (2021).

      (18) Miyagi, S., Kato, H. & Okuda, A. in Cell Mol Life Sci Vol. 66 3675-3684 (2009).

      (19) French, C. A. et al. in Mol Psychiatry Vol. 24 447-462 (2019).

      (20) Khouri-Farah, N., Guo, Q., Perry, T. A., Dussault, R. & Li, J. Y. H. in Nat Neurosci Vol. 28 2022-2033 (2025).

      (21) Bechade, C., Mallecourt, C., Sedel, F., Vyas, S. & Triller, A. in J Neurosci Vol. 22 8779-8784 (2002).

      (22) Grieshammer, U., Lewandoski, M., Prevette, D., Oppenheim, R. W. & Martin, G. R. Muscle-specific cell ablation conditional upon Cre-mediated DNA recombination in transgenic mice leads to massive spinal and cranial motoneuron loss. Dev Biol 197, 234-247 (1998). https://doi.org/10.1006/dbio.1997.8859

      (24) Kablar, B. & Rudnicki, M. A. Development in the absence of skeletal muscle results in the sequential ablation of motor neurons from the spinal cord to the brain. Dev Biol 208, 93-109 (1999). https://doi.org/10.1006/dbio.1998.9184

      (25) Dutton, R., Yamada, T., Turnley, A., Bartlett, P. F. & Murphy, M. Regulation of spinal motoneuron differentiation by the combined action of Sonic hedgehog and neurotrophin 3. Clin Exp Pharmacol Physiol 26, 746-748 (1999). https://doi.org/10.1046/j.1440-1681.1999.03108.x

      (26) Buck, C. R., Seburn, K. L. & Cope, T. C. Neurotrophin expression by spinal motoneurons in adult and developing rats. J Comp Neurol 416, 309-318 (2000).

      (27) Henderson, C. E. et al. Neurotrophins promote motor neuron survival and are present in embryonic limb bud. Nature 363, 266-270 (1993). https://doi.org/10.1038/363266a0

      (28) Usui, N. et al. Role of motoneuron-derived neurotrophin 3 in survival and axonal projection of sensory neurons during neural circuit formation. Development 139, 1125-1132 (2012). https://doi.org/10.1242/dev.069997

    1. Author response:

      General Statements

      We would like to extend our gratitude to all reviewers for their supportive feedback, which acknowledges our study as well performed, of interest to investigators studying muscle development and diseases, and supportive of a role for the fly model in testing potential human disease gene variants. We also thank the reviewers for their valuable critical comments. We carefully considered all of them, performed additional experiments, and made the suggested text amendments.

      We believe these modifications substantially improve the quality of our results and enhance the general interest of our work.

      Point-by-point description of the revisions

      Reviewer #1:

      In this manuscript, Zmojdzian et al. provide an analysis of ryanodine receptor (RyR) expression and function in Drosophila. They also use CRISPR to engineer into flies a RyR variant of unknown significance (VUS) found in a human myopathy patient and demonstrate that it is likely a pathogenic mutation. From studies of RyR expression in embryonic and larval stages, and effects of RyR knockdown or overexpression in various muscle groups, the authors show that, in addition to its known actions in calcium-dependent excitation-contraction coupling, RyR promotes myogenesis during development.

      The key conclusions of the paper are convincing. I do not have suggestions for necessary additional experimental work, and my comments are minor. One conclusion, that RyR dysfunction may be involved in aging, is stated in multiple places, sometimes speculatively but once very forcefully. The latter is in the final paragraph of the Discussion, which states RyR "plays an instrumental anti-aging role in differentiated striated muscle". This conclusion must be tempered, as even if RyR knockdown phenotypes resemble some of those seen in aging flies, the study does not examine aged flies, and there is no mechanistic analysis that might link the two. I assume the authors would prefer to modify that sentence than initiate work with aging flies to prove the assertion.

      We thank the Reviewer for this comment and have removed the hypothetical anti-aging role of RyR from the concluding sentence. The modified sentence reads as follows:

      “To conclude, we report functional analysis of dRyR, the sole fruit fly RyR gene and show that in addition to ensuring contractile properties of differentiated striated muscle it plays a key pro-myogenic role during muscle development.”

      Finally, the use of CRISPR to test a VUS is excellent and suggests a good way for testing of additional RyR variants in the future.

      Minor comments:

      (1) Figure 1A: In the Introduction it is stated that non-mammalian vertebrates have two RyR genes, alpha and beta. In Fig. 1A, a single chicken and single frog gene are listed under names different than alpha or beta. The figure also focuses on RyR2 genes, yet the Introduction states that the non-mammalian vertebrate genes are homologous to RyR1 and RyR3 in mammals. The dichotomy between the text and the figure is confusing. Finally, the font used in Fig. 1A should be enlarged for better visibility.

      To avoid this dichotomy, we modified the sentence concerning the non-mammalian vertebrate RYR genes in the Introduction section. As indicated, there are two RYR genes in chicken and frog, one of which shares homology with vertebrate RYR2 and is represented in the phylogenetic tree (Fig. 1A). As requested by the reviewer, we enlarged the font in the revised Fig. 1A to ensure better visibility.

      (2) Figure 3G-I: IF to Kettin is used to reveal sarcomeres but is not mentioned in the text. This protein is not present in vertebrates (I believe) and may not be familiar to many readers. It should be described in the text when it is used.

      We are grateful to the Reviewer for reminding us to provide information about Kettin, which is the Drosophila counterpart of Titin. The following information has been added to the text on page 9: “…which in turn correlated with shortening of Kettin/D-Titin-labelled sarcomeres…”

      (3) Figure S2: The panels are labelled E, F, G. They should be A-D, as is used in the text.

      In the revised version of Fig. S2 panel labels were amended and the panel E view enlarged. We also provide an additional control context (C57>LacZ).

      (4) The dRyR16 allele is used in Figure 5 and S4. It is described as a hypomorph in the text on page 12 but as a null in the legend to Figure 5. Do the authors actually mean "homozygous" in the legend? The difference should be clarified.

      The dRyR<sup>16</sup> allele has previously been described as a hypomorph. In the legend of Fig. 5 we indeed described it as a “null” by mistake. As suggested by the Reviewer, we have changed this to “homozygous”.

      (5) The Met codon that is mutated in the variant studied in Figure S5 and Figure 6 is position 488 in humans. It is referred to that way in the fly version also. Is that true, the actual amino acid number is identical in humans and flies? In Figure S5B, it might be worth showing the primary amino acid sequence surrounding Met488 to reveal the degree of local conservation (beyond the orange domain in that panel).

      To provide more information about conservation, we have added to the revised Fig. S5 an alignment of the amino acid sequence surrounding the human RYR1 4881 variant position, which corresponds to position 4971 in Drosophila dRyR.

      Author response image 1 shows a snapshot of a larger portion of the alignment encompassing the variant position, showing high amino acid conservation around it:

      Author response image 1.

      (6) At least two references cited in the text are not listed in the References section (Hadiatullah et al. and Nishimura et al.).

      We double-checked the reference citations, and the two indicated references are now listed in the References section.

      Reviewer #1 (Significance):

      The paper is significant in that RyR is known to be a critical protein in calcium-dependent excitation-contraction coupling but its role in developmental myogenesis is poorly studied. This study demonstrates that it is expressed during, and is important for, embryonic and larval myogenesis in the fly. RyR is also understudied in this valuable model organism, even though a P element-based mutant has been available since 2000. The mechanistic basis for the functional observations is not explored here but the work is well performed and will be of interest to investigators studying muscle development (my own field) and diseases caused by RyR mutations.

      To reinforce the mechanistic/functional side of our study, we have added to the revised Fig. 5 a new panel G showing the promyogenic role of another major cellular calcium regulator, the ER calcium pump SERCA. Lms-targeted RNAi knockdown of SERCA impairs myotube growth, resulting in a thin muscle fiber phenotype. This indicates that both dRyR-regulated cytosolic and SERCA-regulated ER store calcium levels are required to promote muscle development.

      Reviewer #2:

      Summary:

      This paper presents data using the Drosophila model to analyze the effects of a rare human mutation in the gene encoding the ryanodine receptor (ryr). The authors present a nice, comprehensive phylogenetic analysis that shows the Drosophila version of Ryr to be most similar to human RYR2 and that the known "hot spots" for mutations in RYR2 coincide with highly conserved regions of the Drosophila Ryr. They characterize the functional effects of ryr knockdown and overexpression on both adult heart function and larval body wall muscle. They identified embryonic ryr expression in association with actin-stained muscle precursor cells and provide beautiful stains, which clearly showed that embryonic muscle cell development was disrupted in ryr mutants. In support of these findings, KD of Calmodulin in larva (an Ryr inhibitor) phenocopied Ryr OE. They recreated a human variant of unknown function (RyR1 p.Met4881Ile) in the conserved region of the fly gene and tested the effect on larval muscle. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants.

      Major comments:

      (1) Fig. 1: In G there is no data for the RNAi KD situation.

      We are grateful to the Reviewer for pointing this out. We initially did not include these data because of the large difference in the crawling capacity of dRyR RNAi larvae. The revised version of Fig. 1 now includes the dRyR RNAi larval crawling data. Because these larvae crawl inefficiently, the time scale in panel 1G was modified.

      (2) Fig. 2 Authors should include Diastolic Diameters; they mention dilated cardiomyopathy but don't show the dilation. The authors should also show staining in hearts with RYR OE and RNAi. It would be nice to have some kind of quantification of disorganized myofibrils.

      As requested, the revised Fig. 2 now includes diastolic diameter measurements. We also include a systolic interval graph to give a full picture of the cardiac parameters. We do not observe all signs of dilated cardiomyopathy in the dRyR RNAi context: the systolic diameter increases, but there is no significant change in diastolic diameter.

      We have modified our comments in the text accordingly (page 7):

      “…As the diastolic diameter remained unchanged, we conclude that cardiac dRyR knockdown affects cardiac performance without causing dilated cardiomyopathy…”

      Regarding the circular myofibril pattern, we do not observe irregular myofibril orientation but rather a fuzzy and less distinct sarcomeric pattern that is difficult to quantify. We specify this in the Figure 2 legend (page 8):

      “…circular fibers in Hand>dRyR RNAi (E) context showed a fuzzy pattern suggesting an affected sarcomeric organisation…”

      Author response image 2 shows the entire cardiac tube in the dRyR RNAi context (stained with phalloidin). Despite the less distinct circular myofibrils, no obvious differences from wt are observed.

      Author response image 2.

      (3) To evaluate and reproduce the data on the larva muscle parameters the authors should provide more details on how sarcomere length was quantified in each larva (replicates, ROI size, etc). Similarly, how were # of nuclei quantified / normalized? Importantly for these measurements, did the authors know what the contraction state of the muscles were when fixed?

      We have added the requested information to the Materials and Methods section:

      “Muscle characteristics measurements:

      All analyses of muscle length and sarcomere size were performed on fixed larval muscle preparations in a relaxed state. Acquired confocal images were analysed in Fiji using the line tool. The Analyze – Measure tool was then applied to obtain muscle length values and measurements were analysed with Prism. Sarcomere size and number were calculated using the Analyze – Plot profile Fiji tool. The sarcomere size was measured between peaks corresponding to Z-discs (revealed with a Z-line-specific marker) on approximately 100 µm of muscle length. Sarcomere measurements were then analysed with Prism.

      DAPI-stained nuclei were counted in Z-stacks of confocal views of VL3 larval muscle and data analysed with Prism. About 30 larval muscles from 6-8 larval filets were analysed for each measurement.

      Statistics

      All statistical analyses were performed using Prism (v9.5.1, GraphPad Software, La Jolla, CA, USA). The t test was used to compare control to variant context and one-way ANOVA tests were used for comparisons with more than two datasets. Bar plots represent the mean and the standard deviation. On the figures, statistical comparisons of sample vs control are indicated as ****: P ≤ 0.0001; ***: P ≤ 0.001; **: P ≤ 0.01; *: P ≤ 0.05; ns > 0.05.”
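
The sarcomere measurement quoted above (distance between Z-disc peaks along an intensity profile) can be illustrated with a minimal sketch. This is not the authors' Fiji workflow: the function names, the simple local-maximum peak rule, the `min_height` threshold, and the `um_per_pixel` calibration are all assumptions introduced here for illustration.

```python
def find_peaks(profile, min_height):
    """Return indices of local maxima in an intensity profile.

    A point counts as a peak if it reaches min_height, is strictly higher
    than its left neighbour, and is at least as high as its right neighbour.
    (Hypothetical rule; the response does not specify how peaks were picked.)
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        if (profile[i] >= min_height
                and profile[i] > profile[i - 1]
                and profile[i] >= profile[i + 1]):
            peaks.append(i)
    return peaks


def mean_sarcomere_length(profile, um_per_pixel, min_height):
    """Mean distance (in µm) between consecutive Z-disc peaks."""
    peaks = find_peaks(profile, min_height)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to measure a spacing")
    spacings = [(b - a) * um_per_pixel for a, b in zip(peaks, peaks[1:])]
    return sum(spacings) / len(spacings)


# Toy profile with three evenly spaced peaks (assumed 0.5 µm per pixel).
toy_profile = [0, 1, 5, 1, 0, 1, 5, 1, 0, 1, 5, 1, 0]
toy_length = mean_sarcomere_length(toy_profile, um_per_pixel=0.5, min_height=3)
```

In practice the profile would come from a line drawn along the muscle, and the threshold would need tuning to the staining intensity of the Z-line marker.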

      (4) Fig. 3, Are RNAi and OE in the same background? I only see one control in the graphs for the RNAi line background.

      We agree. To avoid potential bias between the RNAi and OE genetic contexts, the revised version of Fig. 3 now includes an additional OE control (C57>lacZ).

      Thus, two controls, one for RNAi and one for OE contexts are now included.

      (5) Fig. 3 How VL3 length was determined needs more detail, the Zhang ref is not adequate.

      We thank the Reviewer for this comment and now provide more details about the method used to calculate VL3 length (new paragraph in Materials and Methods); see also our answer to point 3. The Zhang et al. reference relates to the quantification of the mitochondrial pattern.

      (6) In order to be able to evaluate the data, the statistical tests used should be cited in the figure legends along with what *, ** ,*** stand for (or just provide p values).

      We have now added the information about the statistical tests to the figure legends, in addition to the specific paragraph in the Materials and Methods section (see our answer to point 3).
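
The star notation given in the Statistics paragraph of the response to point 3 maps P values to labels using fixed thresholds. A minimal sketch of that mapping follows; the function name is hypothetical, and the thresholds are taken directly from the quoted text.

```python
def significance_label(p_value):
    """Map a P value to the star notation quoted in the Statistics paragraph.

    Thresholds: **** P <= 0.0001; *** P <= 0.001; ** P <= 0.01;
    * P <= 0.05; ns otherwise.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p_value <= 0.0001:
        return "****"
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "ns"
```

Checking the thresholds from smallest to largest means each P value receives the strongest label it qualifies for.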

      Minor comments:

      (1) Need more detail in the figures, e.g. add what colors go with which stain to the picture.

      We provide this information in the revised version of the figure legends.

      (2) Page 13, (Fig. ?F, G).

      We apologize for this mistake and have added the missing figure number (Fig. 5).

      (3) Fig. 4 "partially co-localizing with actin".... this is confusing and probably an overstatement based on the staining pattern in a whole embryo and not on an optical section or a higher power image with a more restricted field of view.

      We agree and have removed this statement from the Fig. 4 legend.

      (4) Some of the graphs are a bit small, recommend reducing the statistical comparison brackets to straight lines, which eliminates a lot of white space and would allow the graphs to be enlarged.

      We increased the size of the graphs in the revised Fig. S2 and Fig. 5.

      Reviewer #2 (Significance):

      The authors nicely characterized the role of Ryr in muscle development and function and recreated a human variant of unknown function (RyR1 p.Met4881Ile) in the conserved region of the fly gene. Their data suggested that this variant was likely deleterious as it negatively affected most muscle parameters. This work supports a role for the fly model in testing potential human disease gene variants. The reviewer's field of expertise is in Drosophila genetics and in the use of the fly as a model system for understanding the genetic networks contributing to muscle structure and function at the cellular level.

      Reviewer #3:

      Summary

      This paper examines the Drosophila Ryanodine Receptor (RyR or dRyR). Ryanodine receptors are enormous channel proteins that mediate calcium efflux from the endoplasmic reticulum and sarcoplasmic reticulum. One goal of the work is to describe salient developmental features of Drosophila RyR (i.e., where it localizes in the cell and how it contributes to muscle development and function) and to refine knowledge from prior reports. Many of the analyses toward that goal are well done; this reviewer especially likes the examination of how muscles develop (Fig. 5).

      Another goal is to compare this information with what is known about mammalian RyRs. There seems to be a lot in common between Drosophila and mammalian RyRs. The paper finishes by taking a human ryanodine receptor variant of unknown significance and generating the corresponding amino-acid substitution in Drosophila RyR. The substitution has some phenotypic consequences for fly coordination, so the authors conclude that the human variant is likely to be pathogenic.

      In terms of investigation, a refined description of RyR biology is welcome. Ryanodine receptors are critical contributors/mediators of intracellular calcium signaling processes. Understanding their properties can help to contextualize the results of studies where calcium dynamics are at play. This is true for both Drosophila and non-Drosophila work. For this version of the paper, there are several statements that should be edited, both in terms of accuracy and in terms of reporting prior knowledge. Additionally, some experiments are missing controls or reagent verification. Importantly, the anti-RyR antibody needs supporting information regarding its specificity.

      Main Comments

      (1) The paper does not fully state what has been done before in terms of studying Drosophila ryanodine receptor expression. In comparing the work on ryanodine receptors in vertebrates versus Drosophila, the authors write, "By contrast, no systematic analyses have yet been performed to assess the expression of the sole Drosophila dRyR gene." I was a little surprised by this sentence, so I examined the literature. There are hundreds of Drosophila publications that mention the ryanodine receptor in some way, but they are not about gene expression. As stated, the sentence might depend on what the authors mean by "systematic analyses." Two early works are relevant here: the Hasan and Rosbash, 1992 paper and the Sullivan et al., 2000 paper. Both are cited in this study. And both of these early papers addressed RyR gene expression, so that fact should be acknowledged up front.

      We agree with the Reviewer that a large number of publications mention the Drosophila ryanodine receptor, and that the two identified by the Reviewer provide information about Drosophila RyR expression. We refer to both of them and follow the Reviewer’s suggestion to further acknowledge this work. The modified sentence in the text reads as follows:

      “…in spite of early works by Hasan and Rosbash (1992) and Sullivan et al., (2000) no systematic analyses have yet been performed to assess the developmental expression pattern of the sole Drosophila dRyR gene…”

      By “systematic analyses” we mean analyses of dRyR expression at both the transcript and protein levels during embryonic development and in differentiated muscles.

      (2) (Related) I examined those two early papers to cross-check the extent of analysis done previously. The text of Hasan and Rosbash reports in situ examination of RyR transcript using a digoxigenin probe (though the online version of that 1992 paper seems to have left out the relevant mesodermal and muscle images referenced in the paper, in favor of duplicating Figure 5 three times - I emailed Development to alert them). More relevant, several experiments executed in the Sullivan paper agree closely with the current paper. As such, it needs more complete referencing. The Sullivan paper showed short, round larvae in mutants (Fig. 1 of Sullivan); ubiquitous mRNA, strongly in muscle and mesoderm (Fig. 2 of Sullivan); impaired muscle function in mutants (Fig. 3 of Sullivan), and impaired larval heart rate (Fig. 4 of Sullivan).

      The Sullivan et al. paper is indeed a reference paper for Drosophila RyR. Our data are, however, largely novel and/or substantially extend those reported by Sullivan et al. Notably, we show for the first time the developmental dRyR protein expression pattern in embryos and in larval filets, we analyse the expression of dRyR isoform transcripts, and we provide for the first time embryonic muscle phenotype analyses that shed light on the so far under-investigated developmental function of dRyR.

      We follow the Reviewer’s suggestion and provide additional citations of this work in the revised version:

      “…attenuation of dRyR (C57>dRyR RNAi) led to a significantly reduced larva body length (Fig. 3B, M) compared to control (Fig. 3A, Q), an observation that correlates with previously observed (Sullivan et al., 2000) reduced body size of dRyR<sup>16</sup> mutant larvae…”.

      “…our data extend previous observations of affected muscle contractility in RyR mutants (Sullivan et al., 2000)…”

      “…Overall, observed dRyR loss-of-function heart phenotypes with a slow heart rate and increased arrhythmia correlate with impaired cardiac function in RyR mutant larvae (Sullivan et al., 2000)…”

      (3) Fig. 1B-D (antibody staining): There are puzzles with this experiment. The first is with the anti-Dlg channel. Dlg is a core component of the NMJ postsynaptic density, and the antibody reveals a bright cage of Dlg around the boutons. But with the muscle images in Figure 1B, there are no boutons apparent (unless they are so far out of focus as to be invisible).

      Indeed, Dlg also stains postsynaptic NMJs at the muscle surface. Fig. 1B shows more internal optical sections to reveal T tubules, so the Dlg-positive NMJs are out of focus.

      The second question centers on the dRyR antibody. The results state, "We first tested the expression of dRYR at the protein level." This sentence appears immediately after the sentence for gene expression from point 1. Technically, this antibody will help determine protein localization, not gene expression. But more importantly, there is no supporting/verifying information about this guinea pig anti-dRYR antibody. The methods state that it was provided by Robert Scott from NIMH. But there is no accompanying citation, no information about the antigen used to raise the antibody, and no negative control (either mutant or RNAi) to show that the staining is specific. If this is a published anti dRyR antibody that already meets the standards of specificity, that should be made clear, and the citation should be given. But if not, the information and data about the production of the antibody and the testing of its quality needs to be shared.

      We apologize for omitting this citation. The anti-dRyR antibody has been previously described and its specificity tested in Gao et al. (2013). The corresponding author of that paper, David J. Sandstrom, has left NIMH, and the anti-dRyR antibodies are currently curated by Rob Scott from Benjamin White’s lab at NIMH.

      He generously sent us a sample of this antibody. We have added this information to the Materials and Methods section.

      (4) Fig. S1: Similar to the antibody, is there a negative control probe that does not reveal this expression pattern? There are any number of probes or secondary antibodies that non-specifically label Drosophila muscles in patterns just like this.

      We are confident that the HCR probes are working properly, as they reveal a dRyR transcript expression pattern that is consistent with the dRyR protein expression pattern. In parallel, they show differential expression in embryos.

      Author response image 3 shows the control HCR ISH experiment with a probe that detects Apterous transcripts (specific for a subset of embryonic muscles and not present in L3 larval muscles).

      Author response image 3.

      A comparison between Ap HCR (A, A’) and dRyR Ex23 HCR (E, E’) signals.

      Minor Comments

      (1) "Overall, observed dRYR loss-of-function heart phenotypes...are reminiscent of those associated with aging (Nishimura et al., 2010), indicating that dRyR RNAi-induced impairment of Ca2+ homeostasis contributes to cardiac aging..." The conclusion of the sentence does not logically follow from the first part. This is because the tests conducted here were on rhythm, not on calcium homeostasis and cardiac aging.

      So, the tests cannot definitively say anything about those latter phenotypes.

      To address the reviewer’s comment, we have modified the concluding sentence as follows:

      “…We hypothesize that dRyR RNAi-induced impairment of Ca2+ homeostasis could contribute to cardiac aging, for which Drosophila is a recognized model (Nishimura et al., 2011).”

      (2) Fig. S2 (bar graph): "% of total" - Is this supposed to refer to the percentage of the total muscle area that is positive for ATP5a staining? That should be clarified.

      We provide clarification in the Fig. S2 legend: “% of total” means the percentage of the measured muscle area that is positive for ATP5a staining.

      (3) Fig. 3M, should say length

      Done

      (4) Fig. 5A legend - See Sullivan; that paper concluded that RyR[16] was hypomorphic instead of null, based on RyR[16]/Df comparison to RyR[16]/RyR[16]. Intuitively, I agree; a lesion that rips out the start site would likely be null. The antibody could help with classifying the allele, depending on the part of RyR used as the antigen.

      The RyR<sup>16</sup> mutants were indeed described by Sullivan et al. as hypomorphic and not null. In the Fig. 5 legend we have modified the comment to: “…homozygous dRyR<sup>16</sup> mutant embryo…”

      (5) Discussion: "This also suggests that all dRyR isoforms are collectively required for larval muscle function." That sentence does not logically follow the expression information. In order to test that idea, individual isoforms would need to be eliminated or knocked down.

      We agree with this comment and modify our sentence accordingly.

      “However, whether all dRyR isoforms are collectively required for larval muscle function requires further investigation.”

      Reviewer #3 (Significance):

      The idea that RyR is expressed in many kinds of muscle is put forth as a major conclusion. It is good that the authors report this fact, and the impacts on muscle development documented in Figure 5 are some of the best data in the paper. However, in terms of opening up a new understanding of RyR biology, the impact of this information seems modest. Prior Drosophila work and the work of others studying these channels show that ryanodine receptors are ubiquitous. The fact that there is only one Drosophila RyR gene would lead most scientists to hypothesize that it would be present on the ER surfaces of all kinds of tissues, including different types of muscle. Novel phenotypic information for Drosophila RyR is reported in the study, and this is good. But in terms of the model system, the strength of Drosophila is in using genetic combinations to make refined conclusions. That toolkit is not fully used here; therefore, the paper is mostly descriptive. This study is mostly a single-gene study (dRyR), with isolated exceptions, like Cam knockdown in Figure 5.

      To improve the functional/mechanistic aspect of the manuscript, the revised Fig. 5 now includes an analysis of the myogenic role of an additional calcium regulator, the ER calcium pump SERCA.

    1. Author response:

      General Statements

      We thank the reviewers for their careful and supportive reviews of our manuscript. We have addressed all the reviewers’ comments and extensively revised the manuscript accordingly.

      During our revisions, we discovered a bug in the code that calculated the linear genomic distance between the captured promoter regions (bait regions) and the promoter-interacting fragments (PIFs). The error inadvertently halved the distance measurements in the output tables. This has been corrected in the revised manuscript and has resulted in updates to Figure 1B and corrected values in the ‘interaction_distance’ and/or ‘interaction_type’ columns of Supplementary Tables 2, 3, 6 and 8. We thank the reviewers for the opportunity to correct this.
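
For readers unfamiliar with this kind of calculation, the following is a minimal sketch of how a linear genomic distance between a bait region and a PIF might be computed. It is not the authors' code: the tuple layout `(chrom, start, end)`, the midpoint-to-midpoint convention, and the function name are all assumptions made here for illustration, and the response does not say which convention the corrected pipeline uses.

```python
def interaction_distance(bait, pif):
    """Linear distance in bp between a bait region and a PIF.

    Each region is a hypothetical (chrom, start, end) tuple. Distance is
    measured midpoint to midpoint (an assumed convention). Regions on
    different chromosomes have no linear distance, so None is returned.
    """
    bait_chrom, bait_start, bait_end = bait
    pif_chrom, pif_start, pif_end = pif
    if bait_chrom != pif_chrom:
        return None
    bait_mid = (bait_start + bait_end) // 2
    pif_mid = (pif_start + pif_end) // 2
    return abs(pif_mid - bait_mid)


# Toy coordinates, not taken from the supplementary tables.
example_bait = ("chr1", 1000, 2000)
example_pif = ("chr1", 101000, 102000)
example_distance = interaction_distance(example_bait, example_pif)
```

A unit test with hand-computed coordinates like these is one simple way to catch a scaling error such as an unintended halving of the output.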

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      In this article, the authors conducted promoter-capture HiC experiments (pcHiC) in mouse cerebellar granule cell progenitors (GCps) and obtained a good 3D genome interaction map of protein-coding gene promoters. This dataset was later integrated with ATAC-seq and ChIP-seq experiments to identify putative enhancer regions within promoter-interacting regions, at higher base-pair resolution than is obtained by pcHiC experiments alone. This set of enhancers is then compared to and presented as being more reliable than those present in the VISTA enhancer database. In addition, ATAC-seq sites and RNA-seq datasets, both obtained in WT and CHD7 KO conditions, are integrated to correlate the expression of a set of genes with the chromatin accessibility of their distal enhancer(s), which is believed to be promoted by CHD7. The study is completed by a transcription factor motif analysis of CHD7-regulated enhancers, which shows an enrichment for proneural transcription factors, with special emphasis on Atoh1, found to be frequently co-recruited with CHD7. Data and methods are well detailed and correctly replicated and will be useful as a resource for the community. The overlap obtained between pcHiC experiments, which the authors themselves critique, is very common and expected in this kind of experiment. In general, the conclusions drawn in the article are convincing, but some aspects, such as the comparison to VISTA and the naming of 'enhancers', should be moderated.

      We thank the reviewer for their positive and constructive comments. We have amended the manuscript as indicated in detail below.

      (1) The comparison of pcHiC-identified enhancers vs. VISTA enhancers should be more balanced, as the two approaches have important conceptual differences. Although VISTA enhancers are based on functional annotation, their target genes might not necessarily be correctly assigned based on the distance. On the other hand, putative enhancer regions identified by pcHiC experiments do not rely on functional testing. So both types of information are useful but can be put in perspective.

      We thank the reviewer for making this point. We have amended the text to present a more balanced view e.g. “Using VISTA-designated hindbrain enhancers as an example, we identify the genes most likely regulated directly by these enhancers and update their annotation accordingly.”

      (2) To increase the strength of the paper, it would be preferable that authors include simple functional enhancer assays (e.g. CRISPR deletion of contacting enhancer, luciferase assay) to support their perspective since 3D conformation information in KO condition is lacking in the article. Although ideally these experiments should be better performed for a full demonstration, it would be acceptable to at least include a simple functional assay in the WT context to demonstrate that the regulatory regions obtained by crossing genomic data are real enhancers. This point is even more critical knowing that enhancers lacking classical histone marks (H3K27ac+H3K4me1) have been described. The same comment applies to promoter interacting fragments lacking these marks, that could be missing enhancers (i.e. enhancers without these marks).

      To address this point, we performed luciferase assays to show that putative enhancers identified with our integrated bioinformatic approach (pcHi-C + ATAC-seq + H3K4me1 + H3K27ac) do indeed exhibit enhancer activity. For these experiments, we tested the putative enhancer fragments in SHH-NPD, an immortalized GCp-derived cell line generated by the Fults laboratory (Jenkins et al. 2014). The results of these experiments are included as Suppl. Fig. 1 in the revised manuscript.

      Minor point

      - Figure 5B is lacking labels.

      We apologise for this oversight – labels have now been added.

      Reviewer #1 (Significance):

      This article, when completed with possible revision, will be useful for the community as a resource of experimentally determined putative enhancers in Cerebellar granule cell progenitors. It also provides some insights into the association of CHD7 and Atoh1 in distal regulation in these cells.

      We thank the reviewer for acknowledging the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, the authors aim to identify active, long-range regulatory interactions in cerebellar granule cell progenitors (GCps). As such, the authors perform promoter capture Hi-C to map long-range interactions for all gene promoters, using cells isolated from P7 mouse brain samples. While the resolution of these maps is limited by the relatively large fragment sizes generated from a 6-bp cutter, the authors combine these interactions with other available published datasets, including from their own previous work, (e.g. ATAC-seq and ChIP-seq) to more precisely map putative enhancers within the long-range interacting regions of captured promoters. The paper further focuses on the importance of transcription factor Atoh1 and chromatin remodeler CHD7 in regulation of these putative enhancers in GCps. The authors suggest a direct interaction between CHD7 and Atoh1 by overexpression and co-immunoprecipitation in human embryonic kidney cells.

      As stated by the authors, this study represents a valuable resource for researchers interested in the identification of enhancers in GCps cells, and their linked target genes. While broadly descriptive, the study does highlight some gene loci of interest and of biological relevance. For example, through integration of previously published datasets, the study resolves which putative regulatory elements at the Reln locus may regulate its activity.

      We thank the reviewer for their supportive comments.

      We provide a summary of our major and minor comments here.

      Major comments:

      (1) The main take-home messages of the manuscript could be more clearly stated in the introduction to help readers understand the main conclusions of the work.

      We have added a sentence to the Introduction to clarify the key take-home messages:

      “We report putative distal regulatory elements for >12,000 genes, identify CHD7- and Atoh1-regulated enhancer elements and show that these factors interact and likely co-regulate the expression of key genes in the GCp lineage.”

      (2) In the discussion, a previous Hi-C dataset is referred to "Reddy et al. annotated 5,175 promoter-enhancer interactions in GCps using Hi-C without enrichment (Reddy, Majidi et al. 2021)." It would be beneficial to compare the interactions identified previously with the current study (5,175 vs 46,428 interactions).

      To address this comment, we have performed an additional analysis and have included new text, Suppl. Figure 3 and Suppl. Table 13 to show the extent to which the two datasets overlap and diverge. We have also added text to the discussion to highlight the differences and technical considerations between the two approaches and how they complement each other.

      The 5,174 enhancer-promoter (E-P) interactions identified by Reddy et al were downloaded and intersected with the 46,428 promoter-accessible PIF regions identified in our study. The new Suppl. Figure 3A illustrates that 82% (843/1207) of the genes for which Reddy et al identify long-range interacting regions are represented in our pcHiC dataset. Our pcHiC data contain information on distal interacting regions and potential enhancer regions for an additional 11,511 protein-coding genes. Suppl. Figure 3B provides an overview of the Reddy et al E-P interactions that are, and are not, identified in the pcHiC data. We replicate 38% of Reddy et al’s E-P findings, whilst 53% of the 3229 interactions unique to the Reddy data would not be detected in the pcHiC data for technical reasons resulting from the capture design and analysis protocol. For the remaining interactions that are specific to the Reddy data, we identify other distal regions interacting with those same promoters. Suppl. Table 13 details the full comparison of Reddy et al’s E-P interactions that are found within our dataset.

      The differences between the two datasets, and the increased number of interactions detected in the pcHiC dataset, likely result from the enrichment for captured promoters, which enables the detection of interactions that would have been below the detection threshold of the HiC study. In addition, there are notable differences in the analysis strategies for the two datasets, which also contribute to differences in the regions detected. Reddy et al binned the HiC data into 10 kb regions to identify interacting regions and subsequently used chromatin marks to identify possible enhancer and promoter regions within these large regions. In contrast, we have used pcHiC and the CHiCAGO algorithm to identify individual HindIII restriction fragments that are proximal to targeted promoter regions (PIFs), and prioritised those that contain accessible regions. Rather than focusing solely on enhancers, this approach can capture various types of regions that play regulatory roles, such as enhancers, CTCF sites or facilitator regions, independent of their chromatin mark composition.

      (3) The authors identify an overlap with some of their identified enhancers with those from VISTA. Is this a fair comparison seeing as the enhancer reporters were tested during early embryonic development (e.g. E11.5 and E13.5) and seen to be active in the hindbrain, would these stages be relevant to GCps from P7? Can the authors identify ATAC-seq for example from hindbrain from embryonic stages and determine if the enhancer accessibility profile looks similar to that for the P7 GCps cells?

      We thank the reviewer for this important question regarding the developmental relevance of our VISTA comparison and acknowledge that direct comparison between the time points requires careful consideration. Firstly, to address the question of how similar the chromatin accessibility profiles are between the embryonic and P7 timepoints, we compared the ATAC-seq data from our paper to ENCODE data from the hindbrain. Of the 140 VISTA enhancers that were intersected with the pcHi-C dataset, 119 were identified from the lacZ studies as active in the hindbrain at E11.5, whilst 21 were identified as active at E12.5. We compared ENCODE ATAC-seq peaks from the E11.5 (ENCFF743IYX) and E12.5 (ENCFF198TLF) hindbrain to the GCps from P7, both across the entire genome (global accessibility) and specifically within +/- 3MB of the VISTA enhancer regions in the PIFs from the pcHi-C dataset, to assess the conservation of local accessibility profiles.

      When looking at the global accessibility profile of embryonic hindbrain versus P7 GCps across the whole genome there was a large degree of overlap with ~85% (E11.5) and ~88% (E12.5) of all ENCODE ATAC peaks overlapping with accessible ATAC summit regions from P7 GCps:

      Author response image 1.

      To determine whether this was consistent in the immediate chromatin environment of the VISTA enhancers themselves, we compared the accessibility profiles across timepoints in the local environment surrounding the VISTA enhancers. This local environment was defined as a region extending 3MB on either side of each VISTA enhancer position found in PIFs. 3MB was chosen because the longest interaction found for a single VISTA element was approximately 2.7MB. Consistent with the global analysis, a similarly high level of overlap of accessible regions between the timepoints was found for the local chromatin environment surrounding the VISTA enhancers within PIFs in the pcHiC dataset, with ~87% (E11.5) and ~89% (E12.5) of ENCODE-detected peaks overlapping with accessible ATAC summit regions from P7 GCps.

      Author response image 2.

      Regions +/-3MB of VISTA enhancers in PIFs

      Author response image 3.

      Regions +/-3MB of VISTA enhancers in PIFs
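
      The overlap fractions reported above could be computed along the following lines. This is a minimal sketch with toy coordinates, not the code used for the analysis: the half-open interval convention, the function names and the naive pairwise scan are assumptions for illustration, and a real analysis would typically use an interval tool designed for genome-scale data.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def group_by_chrom(regions):
    """Index (chrom, start, end) regions by chromosome."""
    grouped = defaultdict(list)
    for chrom, start, end in regions:
        grouped[chrom].append((start, end))
    return grouped


def flank(region, size=3_000_000):
    """Extend a region by `size` bp on either side, clipped at position 0."""
    chrom, start, end = region
    return (chrom, max(0, start - size), end + size)


def fraction_overlapping(query, reference, windows=None):
    """Fraction of query peaks that overlap any reference peak.

    If `windows` is given, only query peaks overlapping a window are counted.
    """
    ref = group_by_chrom(reference)
    win = group_by_chrom(windows) if windows is not None else None
    kept = hits = 0
    for chrom, start, end in query:
        if win is not None and not any(
            overlaps(start, end, ws, we) for ws, we in win.get(chrom, ())
        ):
            continue
        kept += 1
        if any(overlaps(start, end, rs, re) for rs, re in ref.get(chrom, ())):
            hits += 1
    return hits / kept if kept else float("nan")


# Toy data only; these are not real peaks or enhancer coordinates.
embryonic_peaks = [
    ("chr1", 100, 200),
    ("chr1", 5_000_000, 5_000_100),
    ("chr1", 9_000_000, 9_000_100),
    ("chr2", 50, 150),
]
p7_peaks = [("chr1", 150, 250), ("chr1", 9_000_050, 9_000_150), ("chr2", 500, 600)]
vista_in_pifs = [("chr1", 1_000_000, 1_001_000)]

global_fraction = fraction_overlapping(embryonic_peaks, p7_peaks)
local_fraction = fraction_overlapping(
    embryonic_peaks, p7_peaks, windows=[flank(v) for v in vista_in_pifs]
)
print(global_fraction, local_fraction)
```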

      Genome browser shots at the three example VISTA loci from Figure 1 further support this approach. In addition, we note that a recent study by Chen et al (2024; https://www.nature.com/articles/s41588-024-01681-2), in which capture Hi-C was performed at E11.5 for 935 VISTA enhancers across multiple tissues, confirmed that the majority of VISTA enhancer regions (61%) bypass adjacent genes, which is consistent with our nearest-gene comparison.

      (4) The co-IP experiment appears to support the conclusion that Atoh1 and CHD7 can interact, however there are bands in lanes where there should not be (i.e. Input lanes 1 and 4 for FLAG blot). It would be recommended to repeat this result at least once. [Expected time 2-4 weeks].

      This experiment has been repeated 3 times with the same result. It is normal for non-specific background bands to appear on Western blots of total cell lysates (inputs), as most antibodies have significant cross-reactivity. The anti-FLAG antibody clearly detects bands above background in lysates where FLAG-tagged CHD7 is expressed. Most critically, despite the presence of non-specific bands in the input lanes, FLAG-tagged CHD7 is only detected in immunoprecipitated samples where either FLAG-tagged proteins have been precipitated and FLAG-tagged CHD7 is expressed, or HA-tagged Atoh1 has been precipitated and both FLAG-tagged CHD7 and HA-tagged Atoh1 are expressed.

      (5) The methods section describes analysis of several datasets, however we could not access the code at the time of review. Do the authors intend to make this code available at the time of publication?

      Yes, once the publication is approved, all code will be made available along with conda environment yaml files to replicate the software environment in which the analysis was performed.

      (6) Page 7 "replicate one and two, respectively". Can the authors clarify the number of biological replicates performed for pcHi-C?

      Two biological replicates were performed for pcHiC, which were then bioinformatically combined into a ‘superset’ for CHiCAGO interaction calling, as is standard practice for pcHiC data (see e.g. Cairns et al, 2016). We have revised the text to make this clearer.

      Minor comments:

      (1) Page 3 "controlling the expression of 577 genes in GCps" - the authors do not provide evidence that these enhancers control gene expression directly, this should be reworded.

      Thank you. We have reworded to: “contacting the promoters of 577 genes” to indicate that these were identified using pcHi-C and not functional assays.

      (2) Page 5 "where transient amplifying divisions exponentially expand GCps" - at what stages of embryonic/postnatal development are GCps first detected, and when do they amplify and then differentiate?

      GCps that form the EGL are specified in the rhombic lip from E13.5 (Machold, 2005 and Wang, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie, 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata, 1999). We have amended the text to include this additional information: “GCps that form the EGL are specified in the rhombic lip from E13.5 (Ben-Arie et al, 1997; Machold & Fishell, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie et al., 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata et al, 1999).”

      (3) Page 7 "identified 164,387 unique and significant interactions" - how is an interaction defined, a single read, or evidenced by a certain number of reads. "promoter interacting fragments or PIFs" - is PIF referring to a single read evidencing an interaction?

      An interaction is defined by the CHiCAGO algorithm. The number of reads needed to score an interaction depends on both the distance of the PIF from the promoter (modelled using a distance-dependent component that accounts for the decay of contact frequency with genomic distance) and a component that models how sequence or other technical artifacts might bias the capture of some sequences compared to others. For each promoter, a background model is generated of the number of reads expected to be captured based on the above considerations, and if the number of reads for a region exceeds this background model by a certain threshold, the interaction is deemed significant using a p-value-like score. In practice, this means that regions further from the promoter will often require fewer reads to reach significance than regions that are much closer to the promoter. The significant PIFs in the dataset are all evidenced by a minimum of 3 reads in at least one biological replicate. We have included a short explanation of this in the methods of the revised manuscript for clarity.

      The maximum reads in a single replicate library for a specific PIF was 1557, and the median number of reads per PIF was 17.
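
      To illustrate why distal fragments can reach significance with fewer reads than proximal ones, the toy model below uses a power-law decay of the expected background with distance and a simple Poisson tail probability. This is a deliberately simplified stand-in and not the CHiCAGO implementation, whose background model and scoring are more elaborate; the decay parameters, the `bias` factor and the `alpha` cutoff are hypothetical values chosen for the example.

```python
import math


def expected_background(distance_bp, scale=2000.0, exponent=1.0):
    """Hypothetical expected read count, decaying as a power law of distance."""
    distance_kb = max(distance_bp, 1_000) / 1_000
    return scale * distance_kb ** (-exponent)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by direct summation."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def min_reads_for_significance(distance_bp, alpha=1e-5, bias=1.0):
    """Smallest read count whose tail probability falls at or below alpha.

    `bias` is a hypothetical multiplicative factor standing in for
    fragment-specific technical effects on capture efficiency.
    """
    lam = expected_background(distance_bp) * bias
    k = 1
    while poisson_sf(k, lam) > alpha:
        k += 1
    return k


near = min_reads_for_significance(10_000)     # fragment 10 kb from the bait
far = min_reads_for_significance(1_000_000)   # fragment 1 Mb from the bait
print(near, far)
```

      Under these assumptions the fragment at 10 kb needs many more reads than the fragment at 1 Mb to exceed its own background expectation, which is the behaviour described above.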

      (4) Page 8. What is the distinct between PIFs and "promoter interacting regions (PIRs)"? These could be better defined in the text.

      Thank you for picking up this discrepancy; we were using PIR and PIF interchangeably. We have amended the manuscript to refer to PIFs consistently throughout.

      (5) Figure 1C-F. Labels "Random" and "PIFs" don't line up well with the two bars.

      Thank you, this has been corrected.

      (6) Page 9. Could the authors show some representative images for the "VISTA hindbrain enhancers" (e.g. for Figure 1I-K).

      We have inserted representative images showing in vivo activity of these enhancers in mouse embryos from the VISTA enhancer site.

      (7) Fig 2G, Page 11 "The 12,354 genes that were linked to a PIF containing an ATAC-seq peak were found to have a higher median expression level than the 2,049 genes that had PIFs that did not coincide with ATAC-seq peaks" - is this significant?

      Apologies for this oversight. We have performed a two-sided t-test on the log-transformed TPMs between the two groups and have included the significance in the revised figure (p = 1.8e-40).
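
      A two-sided comparison of this kind can be sketched as follows. The sketch makes several assumptions that are not stated in the response: it uses the unequal-variance (Welch) form of the statistic, a log2(TPM + 1) transform, and a normal approximation to the t distribution, which is only reasonable for large groups such as the two gene sets compared here. The TPM values are invented for the example.

```python
import math
from statistics import mean, variance


def log_transform(tpms, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount is an assumption of this sketch."""
    return [math.log2(value + pseudocount) for value in tpms]


def two_sided_welch_test(group_a, group_b):
    """Welch t statistic with a two-sided p-value from the normal approximation."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    # For a standard normal Z, P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Invented TPM values, far fewer than the real gene sets.
tpm_with_atac_peak = [12.0, 30.5, 8.2, 45.1, 22.3, 17.8]
tpm_without_atac_peak = [1.5, 4.2, 0.8, 6.1, 2.9, 3.3]

t_stat, p_value = two_sided_welch_test(
    log_transform(tpm_with_atac_peak), log_transform(tpm_without_atac_peak)
)
print(t_stat, p_value)
```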

      (8) "Gene Ontology analysis of genes with accessible PIFs revealed a significant enrichment for 119 biological processes" - can you include the GO terms in a supplementary table? Is there a way to prioritise down the 12,354 genes to a shorter more significant list of genes, this seems a long list to include in GO analysis.

      We have included a supplementary table with this data in the revised manuscript (Suppl. Table 6). We included all 12,354 genes in this analysis as the point of this analysis was to demonstrate that developmental processes are enriched in the PIFs with accessible chromatin, compared to the genes where only PIFs without ATAC were identified.

      (9) Page 11 - "The chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum (Whittaker et al., 2017b) and deletion of Chd7 from GCps results in striking cerebellar hypoplasia and polymicrogyria (Feng et al., 2017; Reddy et al., 2021; Whittaker et al., 2017b). CHD7 haploinsufficiency is also sufficient to cause cerebellar hypoplasia and foliation defects both in mouse models and in the context of CHARGE syndrome in humans (Whittaker et al, 2017a; Yu et al, 2013)." - this appears more suitable for the introduction.

      Thank you, we have moved this text to the Introduction.

      (10) Page 12 "the majority of which (4,663/5,369) displayed decreased accessibility when Chd7 is depleted". This was difficult to understand initially - which are expected to be the direct effects? Increased or decreased accessibility? Perhaps it would be better to focus only on the decreased accessibility sites?

      We have previously shown that the majority of differentially accessible regions in Chd7-deficient GCps show decreased accessibility. Chromatin remodelling by CHD7 could conceptually reduce or increase accessibility of a particular locus and the only way to infer direct effects are by identifying regions to which CHD7 is recruited.

      Approximately 9% of the sites that decreased in accessibility overlapped with regions bound by CHD7 (464/4663), whilst ~2% of sites that increased in accessibility overlapped with regions of CHD7 binding (14/706). Whilst it is likely that the majority of directly regulated sites decrease in chromatin accessibility when CHD7 is removed, a small number of sites increase in accessibility, and these should be included for completeness.

      (11) The analysis in Fig 3A reveals that only a small number of CHD7-bound enhancers show differential accessibility and altered linked gene expression upon CHD7-knock down. This requires a little more discussion - why do so many sites change in accessibility compared to the number of sites which change accessibility or are associated with gene expression change?

      Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, the integration of this data with ATAC-seq accessibility, chromatin modification and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect. We have added the following text to the discussion to indicate this: “Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, integrating CHD7 ChIP-seq data with ATAC-seq accessibility, histone modification ChIP-seq and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect, as suggested by the data in Fig. 3A.”

      (12) Page 12 - "Over-representation analysis confirmed an enrichment of genes linked to nervous system development" - could this and the GO term analysis be included in a supplementary figure?

      We have included these results as Suppl. Table 7 in the revised manuscript.

      (13) Fig 3D - what does the arrow represent in the chromatin schematic?

      The arrow in the schematic indicates chromatin remodelling – we have clarified this in the figure legend and added headings to these panels to indicate the 3 different types of elements: Direct CHD7 targets, Indirect targets and CHD7-bound elements.

      (14) Fig 3G does not appear to be referenced in the text. The value of the Upset plots in the main figure 3 wasn't very clear, perhaps these could be moved to the supplement? Is there a clearer plot to support the conclusion "CHD7 primarily regulates enhancers".

      We apologise; the panels were mislabelled in the text. This has now been corrected. We hope that the amendments in response to point 13 above now clarify these findings, showing that direct CHD7 targets are characterised by active enhancer marks.

      (15) Page 14 "putative consensus sites for proneural bHLH TAL-family of proteins Neurog2, Neurod2, Neurod1, and, Atoh1 in elements" - HOCOMOCO motifs are only shown for Atoh1 and Nhlh1. It may be valuable to show the sites for all the listed TFs. What does white represent in the heatmap in Fig 3H? This plot is difficult to interpret, and also relatively small in the figure but appears important to conclusions. Perhaps Fig 3H could be made more prominent?

      Thank you for highlighting that the white boxes might be confusing. The white blocks indicate that these motifs do not pass the threshold for significant enrichment in the dataset based on the p and q values. This has now been clarified in the figure legend.

      We have enlarged panel H to make it more prominent.

      (16) Page 15 - "Myb was the only motif specific to CHD7 bound regions that changed in accessibility compared to those that exhibited accessibility changes without CHD7 binding or CHD7 binding without accessibility changes (Suppl. Fig. 1)." I couldn't interpret this sentence, requires clarifying.

      We agree that this description is confusing and since it is difficult to draw clear conclusions about the significance of enhancers with Myb motifs in this context, we have removed this sentence from the revised manuscript.

      (17) Page 16 and Fig 4B - a discussion of why both up and down regulated genes are detected for Atoh1 depletion? Which class of genes are expected to be directly regulated (the down-regulated genes)?

      Like most transcription factors, ATOH1 may be able to function as both a repressor and an activator depending on the context. Although the majority of genes are downregulated in Atoh1-deficient cells, suggesting that Atoh1 functions as an activator in most cases, our analysis has identified several up-regulated genes that contain Atoh1 ChIP-seq peaks in their cognate enhancers (see Suppl. Table 7), consistent with these also being direct Atoh1 targets.

      (18) Fig 5B - the genomic traces are not labelled in this figure.

      Thank you, labels have been added.

      (19) Page 17 - "Pathway enrichment analysis of the 22 genes compared to all genes that were expressed in GCps shows a significant enrichment of terms: Hypoplasia of the pons (HP:0012110 P=0.006) and Abnormal pons morphology (HP:0007361 P=0.016) from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2." - this analysis should be included in the supplementary tables.

      These results have been included as Suppl. Table 12 in the revised manuscript.

      (20) Do the authors have a suggestion for which domains of Atoh1 and CHD7 could be interacting? Could the authors design truncated constructs for overexpression in HEK cells to test this hypothesis? [Expected time 4-6 weeks, interesting but not essential to do experimental work here].

      We agree this is an interesting question. Our collaborator, Professor Peter Scambler (UCL) has performed a yeast two hybrid screen for CHD7 interacting proteins in a mouse E11.5 library using the CHD7 BRK domain (aa 2521-2708) as bait. The screen had a single hit, which encompassed the N-term 127aa of ATOH1 (personal communication). This observation supports our co-IP data and suggests that the N-terminus of ATOH1 interacts with the BRK domain of CHD7 but further validation will be needed to confirm this.

      (21) Page 28 "Differential accessibility analysis was performed using DESeq2 (v 1.22.1)" and Page 19 "Whereas chromatin accessibility at some of these enhancers were affected by Chd7-deficiency" - what were the cutoffs used for looking at differentially accessible regions? Complete loss of accessibility or a quantitative change?

      Quantitative change rather than complete loss was used. Thresholds based on adjusted p-values (padj<0.05) were used as indicated in the methods.
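
      As a sketch of how such a threshold separates regions by direction of change, the example below filters a DESeq2-style results table on `padj` and splits the significant regions by the sign of `log2FoldChange`. The rows are invented, the `region` key and function name are hypothetical, and the convention that negative fold changes mean reduced accessibility in Chd7-deficient cells is an assumption of this example.

```python
import math


def classify_regions(results, padj_cutoff=0.05):
    """Split significant regions into decreased and increased accessibility.

    Rows with a missing or NaN padj are skipped, since no adjusted p-value
    is available for them.
    """
    decreased, increased = [], []
    for row in results:
        padj = row["padj"]
        if padj is None or math.isnan(padj) or padj >= padj_cutoff:
            continue
        if row["log2FoldChange"] < 0:
            decreased.append(row["region"])
        elif row["log2FoldChange"] > 0:
            increased.append(row["region"])
    return decreased, increased


# Invented rows for illustration; not values from the study.
example_results = [
    {"region": "peak_1", "log2FoldChange": -1.2, "padj": 0.001},
    {"region": "peak_2", "log2FoldChange": 0.8, "padj": 0.030},
    {"region": "peak_3", "log2FoldChange": -0.4, "padj": 0.200},
    {"region": "peak_4", "log2FoldChange": -2.0, "padj": float("nan")},
    {"region": "peak_5", "log2FoldChange": -0.3, "padj": 0.049},
]

decreased, increased = classify_regions(example_results)
print(decreased, increased)
```

      Note that peak_5 is retained despite its small effect size, which reflects a quantitative change rather than a complete loss of accessibility.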

      Requested comments on referencing:

      - "Long-range" - how do the authors define long-range? Can this be referenced. CO? good reference here.- look to CHiCAGO paper

      - "When chromatin conformation or 3D organisation data is not available, studies typically assign regulatory elements to the nearest gene promoter" - needs referencing.

      - "Many of these 22 genes regulated by CHD7 and Atoh1 have established critical roles in cerebellar development, including Neurod2, Pax6 and Gli2 (Fig. 5B)" - needs referencing. "from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2" - needs referencing.

      Thank you, references have been added.

      - "active enhancers (H3K27ac+, H3K4me1+), promoters (H3K27ac+, H3K4me3+), regulatory elements (H3K27ac+, H3K4me1+, H3K4me3+), or poised enhancers (H3K4me1+)" - needs referencing.

      Thank you, references have been added.

      - Reference required in main text for VISTA (e.g. Visel et al., 2007)

      Thank you, reference added.

      Reviewer #2 (Significance):

      The strengths of this manuscript are the integrated approach to identify cell-type specific enhancers utilizing available epigenomic datasets, and leveraging 3D genome topology to directly link them to their target genes. For example for the Reln gene previously implicated in cerebellar phenotypes for CHD7 mutants. The pcHi-C dataset generated in this study provides a valuable reference for the community of enhancer-promoter pairs for a specific cell-type of interest with human disease relevance.

      We thank the reviewer for recognising the potential value of our work to the community.

      The limitations of the study are partially addressed in the text by the authors, including the resolution from the pcHi-C using a 6-bp cutter, the limitation of sequencing depth (more interactions may have been identified with more depth), and the limited correlation between replicates (likely due to undersampling the library). Page 9 "some additional interactions with the nearest gene promoters might be identified in our pcHi-C dataset with deeper sequencing".

      We thank the reviewer for highlighting our acknowledgements of the potential limitations of our work.

      Additional limitations include the use of the VISTA browser mouse LacZ embryos to validate some of their enhancers, the limitation here being that the VISTA browser tests enhancers at embryonic stages (focused at E11.5 and E13.5) while the GCps cells were collected at P7. The LacZ images from VISTA are also not shown. The HEK cells used for the co-IP could be seen as a limitation as these are not relevant cells for the cell state studied, the authors could clarify their use of these cells.

      We thank the reviewer for their careful assessment of the limitations of our study. We have now included images of the VISTA enhancers in Fig. 1I,J,K. Rather than a limitation, using irrelevant cells for co-IP might be seen as a better approach, as the chance that an interaction between the two proteins is indirect and mediated by a bridging complex is conceivably lower in an irrelevant cell type that might not contain such complexes. Either way, HEK293T cells are the standard laboratory model for co-IP studies as they can be transfected with ease.

      The study reported here is largely based on previous work from the authors (Whittaker et al 2017b). This study reported that the chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum and deletion of CHD7 from GCps resulted in the phenotype of cerebellar hypoplasia. This study also largely leverages previously published datasets from the Whittaker et al 2017b (e.g. CHD7 deletion data) and reanalyses it in the light of the new pcHi-C datasets.

      This manuscript will be of interest to researchers interested in analysing long-distance targets of as well as researchers trying to understand the precise gene regulation in cerebellar development. It may also be of interest to clinical geneticists to interpret novel putative non-coding disease mutations.

      We thank the reviewer for highlighting the wide interest of our manuscript.

      In assessing this manuscript, my expertise lies in models of human development and gene regulation, with a focus on enhancer function.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Riegman et al have explored the gene regulatory landscapes of cerebellar granule cell progenitors (GCps). They have generated promoter capture Hi-C data to identify regions that interact with promoters in these cells. In addition they generate ATACseq data in wild-type and CHD7 knock-out cells. They integrate these data to identify enhancers that potentially regulate genes in GCps. In addition, the authors identify an interaction between CHD7 and ATOH1, whose binding sites also overlap in the genome.

      The dataset can be potentially interesting for people studying cerebellar development.

      I have a few concerns regarding the paper. The most pressing one is that the authors seem to equate interactions in pcHi-C with regulation. This is problematic for two reasons. First whether interaction equates regulation is still debated and whether this can be detected with a low-resolution C-method (i.e. using HindIII) is a further point of contention.

      We thank the reviewer for pointing this out. We agree and apologise for not being clear in our manuscript. We have made the necessary amendments to indicate that pcHi-C by itself only assesses proximity in the nucleus, not function.

      We acknowledge the limitations of the pcHi-C method, including that resolution is limited by the use of a restriction enzyme. However, we (see e.g. Suppl. Fig. 1) and others (see e.g. Freire-Pritchett et al (2017) and Mifsud et al (2015)) have used this approach successfully to identify functional enhancer elements.

      The second issue has to do with the way the pcHi-C data is interpreted. What is detected as a significant interaction by Chicago are regions that have a contact frequency above background. This means that local regions with a (much) higher contact frequency may not be called as significant. When we follow the logic that contact frequency is related to gene activation (which may not necessarily be true), whether a fragment is more frequently contacted than the background should not matter (relative contact frequency); rather, it should be interpreted based on the absolute contact frequency.

      The reviewer is right that local regions will have a higher contact frequency and that local contacts aren’t always captured by the CHiCAGO model. However, the purpose of this study was to prioritise the identification of distal elements that are not captured by existing methods including nearest gene annotation.

      There are a number of reasons why absolute contact frequency might not be an appropriate measure to infer gene regulation: 1) Many factors can affect the absolute contact frequency, including the proportion of cells in a population that are actively transcribing at that time, especially if expression is limited to a small fraction of the population. 2) Absolute contact frequency assumes that more contact results in more regulation, which is not necessarily true and would depend on the combination of factors associated with that regulatory element. Figure 1 (Micro Capture-C) of https://www.nature.com/articles/s41596-023-00817-8 shows that regions with low absolute contact frequency compared to adjacent regions have the potential to regulate gene expression, as have other studies that have used CHiCAGO to identify regulatory elements. 3) The sequence of some fragments makes them more likely to be captured or enriched in the Hi-C protocol, which the relative contact frequency above background controls for.

      This becomes relevant because the authors claim that 80% of enhancers are wrongly annotated based on their metrics. The only way to correctly annotate an enhancer is to knock it out and check the effect on genes in the vicinity. Therefore, the claim that their method can correctly annotate enhancers is grossly overstated, particularly when considering the issues with contact frequency stated above. Claims such as 80% of enhancers being wrongly annotated should therefore be removed from the paper. The authors should discuss in the Discussion how to annotate enhancers and what the proper method for annotation is.

      We have amended the text to indicate that we do not suggest that VISTA enhancers are wrongly annotated but incompletely assigned. We apologise for making this suggestion in the first draft. There is, however, complementary evidence from Cheng et al (2024), now referenced in the revised manuscript, who also find that 60% of the VISTA enhancers skip their adjacent gene. It is also well established in the literature that the nearest gene is not always the one that is regulated.

      Other points:

      - The authors claim that PIFs have 2.14 and 2.69 fold enrichment of H3K4me1 and H3K27ac sites. Did the authors use the whole genome as background? If so, they should take into account that promoters are more likely in regions of high gene density, which are more dense in active marks. It would be better to perform local, circular permutation of the PIFs around the promoter.

      The reviewer is correct that a whole genome background is not an appropriate background for testing enrichment of active marks within PIFs. Fortunately, this is taken into account in the CHiCAGO enrichment test which selects the background from fragments that are matched to the same distance of the PIFs to account for the observation that promoters are more likely in regions of high gene density and are therefore more enriched for active chromatin modifications.
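
      The idea of a distance-matched background can be illustrated with a toy sketch. This is not the CHiCAGO implementation: the function name, the 50 kb bin size, and the input layout (one promoter distance and one 0/1 mark-overlap flag per fragment) are assumptions made purely for illustration.

```python
import random


def matched_enrichment(pif_distances, pif_marked, bg_distances, bg_marked,
                       bin_size=50_000, n_draws=200, seed=0):
    """Fold enrichment of a chromatin mark in PIFs over a distance-matched background.

    Hypothetical helper: distances are in bp from the baited promoter,
    *_marked are 0/1 flags saying whether a fragment overlaps the mark.
    """
    rng = random.Random(seed)

    # Group background fragments by distance bin.
    bins = {}
    for dist, mark in zip(bg_distances, bg_marked):
        bins.setdefault(dist // bin_size, []).append(mark)

    observed = sum(pif_marked)

    # For every PIF, repeatedly draw one background fragment from the same
    # distance bin and count how many of the drawn fragments carry the mark.
    totals = []
    for _ in range(n_draws):
        total = 0
        for dist in pif_distances:
            pool = bins.get(dist // bin_size)
            if pool:  # PIFs without any matched background are skipped
                total += rng.choice(pool)
        totals.append(total)

    expected = sum(totals) / n_draws
    return observed / expected if expected > 0 else float("inf")
```

      Because each PIF is compared only with background fragments at a similar distance from the promoter, a mark that is simply more common near promoters does not by itself inflate the ratio.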

      - The authors talk about "lead PIF", which is the fragment with the "most significant CHICAGO score". What does this mean? Something is significant or not, despite common misuse of the term there is no gradient of significance.

      The reviewer makes a good point here. We apologise for the imprecise wording and have corrected the text to specify that the lead PIF is the one with the highest CHiCAGO score.

      - In the GO analysis the categories with the lowest p-value are presented, but this biases for large categories. It would be more relevant to also select for and show the enrichment scores.

      We agree with the reviewer that a drawback of GO analysis is that it biases for large categories. If by ‘enrichment score’ the reviewer means the -log10(p-value), we have included it in the supplementary tables, which also list the size of each category and the number of genes detected in it.
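
      How -log10(p-value), category size, and number of detected genes relate to a fold-enrichment score can be sketched as follows. The sketch assumes a one-sided hypergeometric over-representation test, which may differ from the test actually used, and all function names and numbers are hypothetical.

```python
from math import comb, log10


def hypergeom_sf(k, n_total, n_category, n_selected):
    """P(X >= k) when n_selected genes are drawn from n_total genes,
    n_category of which belong to the GO category."""
    upper = min(n_category, n_selected)
    numerator = sum(
        comb(n_category, i) * comb(n_total - n_category, n_selected - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_selected)


def go_summary(k, n_total, n_category, n_selected):
    """Return (fold enrichment, -log10 p) for one GO category."""
    # Fold enrichment: observed fraction of hits over the expected fraction.
    fold = (k / n_selected) / (n_category / n_total)
    p_value = hypergeom_sf(k, n_total, n_category, n_selected)
    return fold, -log10(p_value)


# Hypothetical example: 100 selected genes out of 20,000.
small = go_summary(5, 20_000, 10, 100)      # 5 hits in a 10-gene category
large = go_summary(50, 20_000, 1_000, 100)  # 50 hits in a 1,000-gene category
```

      In this hypothetical example the small category has the higher fold enrichment (100 versus 10), while the large category has the smaller p-value, which is the bias the reviewer describes. Reporting both columns side by side lets readers rank categories either way.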

      Reviewer #3 (Significance):

      The study provides a dataset that may be interesting for people studying cerebellar development. In that sense the data is mostly interesting from a fundamental viewpoint. The data seem of good quality.

      The authors claim that a very sizeable fraction of enhancers are misannotated, but I do not believe that this is correct.

      We thank the reviewer for pointing this out. We apologise for creating the impression that VISTA enhancers are incorrectly annotated. We have amended the text to reflect that these are incompletely annotated.

      My expertise is 3D genome, bioinformatics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study concerns the propagation of waves in bacterial biofilms, bridging active matter physics and bacterial biophysics. While the experimental observations are solid, the theoretical interpretation and model validation are currently incomplete and require further refinement. This work will be of interest to microbiologists, biophysicists, and researchers studying collective behavior in biological systems.

      In the revised manuscript, we have added new experimental results that strengthen the connection between our observations and the modeling framework used to interpret the collective oscillations. We have not introduced a new theoretical model; rather, we employed established active matter models and sought to link the observed phenomena to these frameworks. In particular, our new data demonstrate that the transition between the motile and biofilm-forming states specifically modulates the elasticity and elasto active coupling of the bacterial structure. This behavior is in excellent agreement with the predictions of the active solid model. All the experimental details are given below. We believe that the revised version of the manuscript now establishes this connection more clearly and convincingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting paper. The authors have found multiple experimental knobs to perturb a mechanical wave behavior driven by pili feedback. The authors framed this as nonreciprocal interactions - while I can see how nonreciprocity could play a role - what about mechanical feedback? Phenomenological models are fine, but a lack of mechanistic understanding is a weakness. I think it will be more interesting to frame the model based on potential mechanochemical feedback to understand microscopic mechanisms. Regardless, more can be done to better constrain the model through finding knobs to explain experimental observations (in Figures 3, 4, 5, and 7).

      We thank the reviewer for the positive assessment and for highlighting this important point. The reviewer is correct that the phenomenological Kuramoto-based model does not explicitly show the detailed cell–cell interactions. However, the active solid model is formulated on detailed elastic couplings and active forces, which inherently represent mechanical feedback within the biofilm structure. In this framework, nonreciprocity emerges naturally from the tensorial nature of active forces between bacteria—a concept already well established in the active matter literature. Importantly, this mechanism is purely mechanical and closely parallels nonreciprocal hydrodynamic interactions among active particles, which also arise from tensorial couplings.

      In our system, elastic interactions within the biofilm matrix, combined with pilus-generated active forces, provide a natural origin for nonreciprocal interactions. To further validate this, we improved our imaging to record single-cell dynamics both at the colony edge and on the biofilm surface (new supplementary Video). These experiments show that motile bacteria at the leading edge of the biofilm structure do not generate waves, whereas stationary bacteria within the biofilm display local oscillations within the elastic network. This observation supports the view that collective oscillations are a property of the elastic biofilm state rather than of freely motile cells.

      Moreover, the main control parameter for these oscillations is the ratio between elastic strength and the active force generated by pili. In the active solid model, this ratio is captured by the parameter π and alpha terms. Experimentally, we can tune this ratio simply by adding or removing water from the biofilm, thereby modulating its elasto-active coupling. We further tested the controllability of this feature experimentally: we let the plate dry nonuniformly and observed that transitions between spiral, target, and plane waves could emerge spontaneously across the plate (see Figure 3a). This observation also underscores the importance of moisture in the biofilm. Starting from this point, we established the connection between experimental observation and modelling. In our new simulations, we also noticed that the transition from spiral to target waves is driven in particular by the merging of spiral pairs with opposite topological charges (±1). This critical point was also confirmed by modelling, which links the process to elasto-active coupling. We further supported our claim by imaging the edge and the biofilm structure. These new results clarify that the elastic structure of the biofilm is critically important (Supplementary Figure 3). We have clarified this mechanistic link in the revised manuscript and rewritten the relevant sections to make this connection explicit.

      Modification in the manuscript:

      “To gain deeper insight into the mechanisms underlying wave formation, we imaged the dynamics of individual bacteria from the fingering regions toward the center of the biofilm. This distinction is critical because, unlike the biofilm center, the edges do not generate waves. We observed that bacteria near the fingering regions remain motile and exhibit collective flow. In contrast, bacteria at the biofilm center are surface-attached and undergo periodic lifting motions. This behavior strongly resembles Mexican-wave dynamics.”

      “We further found that the central region of the biofilm is mechanically more elastic, whereas the edge regions—where wave formation is absent—are motile. These observations suggest that gradual biofilm maturation is a key factor that transforms motile bacteria into a periodically moving but spatially constrained state. Consistent with this picture, the PAO1 strain, which has a strong biofilm-forming capability, completely suppresses surface oscillations. In contrast, the PA14 strain exhibits intermediate behavior, sustaining a partial transition between motile and locally constrained dynamics. Remarkably, signatures of this transition and wave generation are already detectable at the earliest stages of finger formation.”

      Strengths:

      The report of mechanical waves in bacterial collectives. The mechanism has potential application in a multicellular context, such as morphogenesis.

      We thank the reviewer for the positive assessment and for highlighting this potential broad impact of our findings.

      Weaknesses:

      My most serious concern is about left-right symmetry breaking. I fail to see how the data in Figure 6 shows LR symmetry breaking. All they show is in-out directionality, which is a boundary condition. LR SM means breaking of mirror symmetry - the pattern cannot be superimposed on its mirror image using only rigid body transformations (translation and rotation) - as far as I am aware, this condition is not satisfied in this pattern-forming system.

      We thank the reviewer for pointing out this critical issue. We acknowledge that we overlooked the distinction between biological and physical definitions of left–right symmetry in our initial submission, and we agree that our terminology was confusing.

      In developmental biology, the term “left–right symmetry breaking” is often used to describe asymmetric flows generated by nodal cilia, which subsequently establish developmental asymmetry. This usage differs fundamentally from the physical definition of mirror symmetry breaking, which refers to chirality switching upon mirror reflection. As the reviewer correctly noted, our system does not exhibit mirror symmetry breaking in this strict physical sense.

      To avoid confusion, we have revised the manuscript and replaced the term left–right symmetry breaking with left–right asymmetry between the edge and the center of the biofilm. This asymmetry arises from frequency gradients across the biofilm and is not a trivial boundary effect. For circular colonies, this phenomenon is more accurately described as radial asymmetry. We have rewritten the relevant sections of the manuscript to clarify this distinction and prevent misinterpretation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Altin et al. examines the dynamics of bacterial assemblies, building on previously published work documenting mechanical spiral waves. The authors show that the emergent dynamics can be influenced by various factors, including the strain of bacteria and water content in the sample. While the topic of this paper would be of broad interest, and the preliminary results are certainly interesting, various aspects of this paper are underdeveloped and require further exploration.

      Strengths:

      One of the nice features of this system is the ability to transition between the different states based on the addition or withdrawal of water. The authors use a similar experimental model system and mathematical model to previously published work (Reference 49), but extend by showing that the behaviour can be modified through simple interventions. Specifically, the authors show that adding water droplets or drying the sample through heating can result in changes in the observed wave structure. This represents a possible way of controlling active matter.

      The mathematical model proposed in this paper involves a phase-oscillator model of Kuramoto-style coupling (similar to previously reported models). A non-reciprocal phase lag is introduced in order to facilitate the patterns seen in experiments. The qualitative agreement in the behaviour is quite striking, showing both spiral waves and travelling waves.

      We thank the reviewer for the positive assessment and for pointing out areas that required further development. The reviewer is correct that our work builds on previously reported bacterial spiral wave systems; however, there are several significant differences that we now emphasize more clearly in the revised manuscript.

      First, our study involves a different bacterial species and reveals a distinct dynamical process: the waves we report are strictly localized on the surface of the biofilm, in contrast to the bulk oscillations detected through density fluctuations in the earlier work (Ref. 49). The surface waves in our system resemble “Mexican wave”-like motions, in which surface bacteria periodically lift upward. To highlight this key distinction, we performed new imaging experiments that directly visualize this process (New Video 5 and 6, Author response image 1).

      Second, we systematically compared different bacterial strains, including pathogenic species such as P. aeruginosa PA14 and PAO1, alongside our BSL-1 strain. This comparative approach demonstrates that the observed phenomenon spans strains with different pathogenicity levels and genetic variations, while also showing that our strain provides a safer and more broadly usable model system for laboratory investigations.

      Third, the modeling frameworks differ. Whereas the referred study relied primarily on phase models similar to those used in cilia systems, we combine a delayed Kuramoto-style oscillator model with an active solid model. This combination provides both a phenomenological description and a physical interpretation of the collective dynamics. We acknowledge that, in the original submission, the physical interpretation of the model in relation to our experimental system was underdeveloped. In the revision, we have now established this link explicitly through the elasticity and elasto-active coupling of the biofilm. Specifically, we show that the transition from motile to biofilm states is accompanied by changes in elasticity, which directly influence the observed transitions between different types of wave defects. This connection is consistent with prior theoretical works and has so far been studied only in robotic active matter systems.

      Together, these clarifications and new results reinforce the novelty of our findings and establish a stronger connection between the experiments and the modeling framework.

      Author response image 1.

      Comparison between the elastic biofilm core and the motile colony edge. High-resolution video recordings revealing individual bacterial motion highlight the key physical differences driving wave generation. Time-lapse snapshots show that bacteria at the colony edge move freely and form fingering structures, whereas bacteria in the elastic central biofilm periodically lift vertically, producing a Mexican-wave–like collective motion across the surface. See new Video

      Weaknesses:

      The principal observation of the paper - that spiral waves emerge in these systems and can be controlled in various ways - is not linked to microscale dynamics at the cell level. It is recognised that hydrodynamics can introduce non-reciprocity, an essential ingredient of this model. However, in this work the authors have not identified a physical mechanism for the lag, e.g., either through steric interactions or hydrodynamic disturbances. This is also relevant in the phase oscillator modelling section. In low Reynolds number flows, dynamics are instantaneously determined. In this light, what does the phase lag term represent?

      The reviewer is correct that, at low Reynolds numbers, fluid dynamics are instantaneous and do not generate real temporal delays. However, nonreciprocity in hydrodynamic interactions can still emerge from the tensorial structure of the Blake–Oseen Green’s function. In this formalism, the effective asymmetry can be represented mathematically as a phase-lag–like term. This has been theoretically demonstrated in Ref. 40. While this is not a literal time delay, it functions analogously by breaking odd symmetry in the coupling.

      In our system, strong long-range hydrodynamic interactions are absent, as the bacteria are embedded in an elastic biofilm matrix. Instead, the dominant interactions are active elastic couplings mediated by pili and biofilm structure. The elastic solid model behaves in a way that is conceptually similar to the hydrodynamic case: pili-induced deformations of the elastic medium produce anisotropic stresses that play a role analogous to the tensorial hydrodynamic Green’s function. Thus, the phase-lag term in our Kuramoto-based model can be interpreted as an effective representation of these nonreciprocal elastic interactions.

      We have clarified this point in the revised manuscript by explicitly connecting the phenomenological phase-lag term to the underlying elastic coupling in biofilms.
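
      The role of a phase-lag term can be made concrete with a minimal sketch of a generic Sakaguchi-Kuramoto ring of identical oscillators. The ring geometry, the parameter names (`omega`, `coupling`, `lag`), and the explicit Euler integrator are illustrative assumptions, not the equations or parameter values of the manuscript.

```python
import math


def step(phases, omega, coupling, lag, dt):
    """One explicit Euler step for nearest-neighbour oscillators on a ring.

    Each oscillator i obeys
        dphi_i/dt = omega + coupling * sum_j sin(phi_j - phi_i - lag),
    where j runs over the two ring neighbours of i.
    """
    n = len(phases)
    updated = []
    for i, phi in enumerate(phases):
        left = phases[(i - 1) % n]
        right = phases[(i + 1) % n]
        drive = math.sin(left - phi - lag) + math.sin(right - phi - lag)
        updated.append(phi + dt * (omega + coupling * drive))
    return updated


def simulate(phases, omega=1.0, coupling=0.5, lag=0.3, dt=0.01, steps=1000):
    """Integrate the ring for a fixed number of Euler steps."""
    for _ in range(steps):
        phases = step(phases, omega, coupling, lag, dt)
    return phases


# Start from a fully synchronized state (all phases equal).
final = simulate([0.0] * 10)
```

      Starting from full synchrony, every oscillator advances at `omega - 2 * coupling * sin(lag)` rather than `omega`, so the lag changes the collective dynamics even though no literal time delay appears anywhere in the equations.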

      What is the origin of the coupling term, b? Can this be varied systematically or derived from experimental measurements or parameters?

      The term b represents the enhanced elasto-active coupling of the pili process. The length of the pili varies, and elongated pili have more potential to modulate the coupling between bacteria, which is known to depend on a critical threshold. This process resembles pinning dynamics and is driven by the activity of molecular motors within the pili machinery. However, the detailed mechanisms that set the effective coupling strength remain highly complex and are not yet fully understood.

      At present, we do not have a direct way to systematically manipulate b in experiments. A major technical limitation is the nanoscale nature of type IV pili: these protein assemblies are extremely small and difficult to monitor or manipulate directly. Even basic tools such as GFP-based labeling have proven challenging to implement, which restricts our ability to track the detailed dynamics of these structures in live biofilms.

      While we cannot currently derive b directly from experimental parameters, we emphasize in the revised manuscript that b should be understood as an effective parameter capturing the excitability of pili retractions. We also highlight this limitation and note that future advances in molecular imaging and manipulation of pili will be essential for quantitatively linking b to microscopic processes.

      Classification of wave properties is an important aspect of this paper, but is not accomplished in a quantitative sense. What is the method for distinguishing between travelling and spiral waves? There is a range of quantitative tools that could be used to investigate these dynamics (and also compare quantitatively with the models). For example, examining the correlation functions and order parameters could assist with the extraction of wave features (see extensive literature on oscillator models).

      We thank the reviewer for emphasizing this important point. In the revised manuscript, we have incorporated the classic Kuramoto order parameter (S) to characterize the dynamics in our model simulations. However, this metric is not directly applicable to our experimental system, because we cannot resolve the phase of individual bacteria at large scales.

      Instead, we have focused on a flux-based parameter, as previously used in Ref. 40, which can be measured experimentally from collective surface dynamics. Interestingly, we find that the directional flux extracted from our experimental movies closely matches the trends predicted by the model order parameter. We suspect that this similarity arises from the combination of our optical illumination method and the characteristic surface modulations of the biofilm. Because we currently lack a rigorous theoretical justification for this correspondence, we prefer to keep this discussion in the review document.

      In summary, we now use the classic Kuramoto order parameter in simulations and rely on the experimentally accessible flux measure for our experimental data. This dual approach allows us to compare model and experiment in a consistent manner.
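
      For reference, the global Kuramoto order parameter is the magnitude of the mean unit phasor, |Σ_j exp(iφ_j)|/N. A minimal sketch, in which the function name and the list-of-phases input are assumptions for illustration:

```python
import cmath
import math


def kuramoto_order(phases):
    """Magnitude of the mean unit phasor: 1 for full synchrony, near 0 for spread phases."""
    return abs(sum(cmath.exp(1j * phi) for phi in phases)) / len(phases)


# Identical phases versus phases spread evenly over one full cycle.
synced = kuramoto_order([0.7] * 5)
spread = kuramoto_order([2 * math.pi * k / 8 for k in range(8)])
```

      A value near 1 indicates global synchrony. A plane wave spanning a whole number of wavelengths gives a value near 0 even though neighbouring oscillators are tightly locked, which is one reason a single global number cannot by itself distinguish travelling-wave states.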

      Author response image 2.

      Critical order parameters of the coupled biofilm system. (a) The Kuramoto global order parameter increases continuously as the system becomes globally synchronized. In contrast, in the nonreciprocally coupled system the order parameter saturates at a critical level. (b) In the experimentally observed biofilm, however, the flux generated by the coupled oscillations provides a more appropriate measure of synchronization. Blue curves indicate directionally propagating planar waves, red curves correspond to spiral wave formation, and green curves represent the globally synchronized reciprocal system.

      Author response image 3.

      Comparison of flux profiles of the simulations with experimental measurements. Directional optical illumination enhances the flux term on the surface of the biofilm.

      The methodology of changing the dynamics through moisture content appears to be slightly underdeveloped, e.g., adding water involves a droplet, and removing water is accomplished by heating (which presumably could cause other effects). Could the dynamics not be controlled more directly by varying the humidity?

      We thank the reviewer for this valuable suggestion. Our results indicate that water content in the biofilm plays a key role in driving the transition to the biofilm state by modulating its elasticity. During the initial submission, we did not know how to systematically vary humidity without simultaneously altering temperature. Standard approaches typically involve water evaporation in controlled chambers, which inherently changes both parameters.

      Following the reviewer’s recommendation, we first measured the ambient moisture levels inside closed culture plates. To our surprise, the relative humidity was already ~98%, leaving virtually no room to increase it further. We then attempted to decrease humidity by flowing dry synthetic air, but even under these conditions we could not reduce it below ~85%, and achieving this required unrealistically high flow rates. Moreover, we noticed that in closed-lid NGM plates, evaporation is already substantial, and when the lid is left open the evaporation rate reaches ~1 µm/s. This rapid surface thinning severely limits the quality of long-term time-lapse imaging.

      Taken together, these technical constraints explain why we had to rely on localized perturbations such as water droplets and heating rather than global humidity control. We have clarified this point in the revised manuscript and now explicitly discuss both the challenges and limitations of humidity-based approaches.

      At the same time, the authors also mention that temperature itself plays a role in shaping the behaviour. What is the mechanism for this? Is it just through evaporation? Since the frequency increases with temperature, could it just be that activity increases with temperature?

      We thank the reviewer for raising this critical point. We believe that temperature has two distinct impacts operating on different timescales.

      Short timescale (~minutes): We observed that biofilm oscillations respond to temperature changes very rapidly and in a reversible manner. This timescale is too short to be explained by modulation of water content or bulk elasticity of the biofilm. Instead, we attribute the immediate frequency increase to enhanced biological activity of the bacteria at elevated temperatures.

      Long timescale (~tens of minutes to hours): During processes such as the transition from planar to spiral waves, prolonged heating can significantly alter the biofilm structure. These changes are not reversible and likely involve modifications of elasticity and other structural properties.

      In the modeling framework, the short-timescale effect is represented as an increase in the active force term, while the long-timescale effect is captured by concurrent changes in both the active force and the elastic properties of the biofilm. We have clarified this mechanism and its representation in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a novel investigation into unidirectionally propagating waves observed on the surface of Pseudomonas nitroreducens bacterial biofilms. The authors explore how these waves, initially spiral in form, transition into combinations of spiral, target, and planar patterns. The study identifies the periodic extension-retraction cycles of type IV pili as the driving mechanism for wave propagation, which preferentially moves from the colony's edge to its center. Furthermore, the manuscript proposes two theoretical models (a phase-oscillator model and a continuum active solid model) to reproduce these phenomena, and demonstrates how external manipulations (e.g., water droplets, temperature, PEG) can control wave patterns and direction, often correlating with oscillation frequency gradients. The work aims to bridge the fields of active-matter physics and bacterial biophysics by providing both experimental observations and theoretical frameworks for understanding these complex biological wave phenomena.

      We thank the reviewer for the positive assessment of our work and for highlighting both the novelty and the key contributions of our study.

      Strengths:

      The experimental discovery of unidirectionally propagating waves on bacterial biofilms is highly intriguing and represents a significant contribution to both microbiology and active-matter physics.

      The detailed observations of wave pattern transitions (spiral to target to planar) and their response to various environmental perturbations (water, temperature, PEG) provide valuable empirical data. The identification of type IV pili as the driving force offers a concrete biological mechanism. The observed correlation between frequency gradients and wave direction is a compelling finding with potential for broader implications in understanding biological pattern formation. This work has the potential to stimulate further research in the collective behavior of living systems and the physical principles underlying biological organization.

      We thank the reviewer once again for emphasizing the importance of wave directionality. We also believe that this phenomenon may provide insight into early symmetry-breaking processes observed in developmental biology, where oxygen or nutrient gradients in dense environments could play a similar role.

      Weaknesses:

      The manuscript attempts to link unidirectional wave propagation to non-reciprocal couplings but ultimately shows that the wave direction is determined by the gradient of the oscillation frequency. The couplings in the two theoretical models are both isotropic and thus cannot dictate the wave direction. A clear distinction should be made between non-reciprocity as a source of wave generation and non-uniformity as a controlling factor of wave direction.

      We greatly appreciate the reviewer’s careful evaluation, particularly for highlighting this important and often confusing distinction. The relationship between nonreciprocity, spontaneous symmetry breaking, and frequency gradients has also been a challenging concept for us and required significant effort to clarify.

      Recent theoretical studies have established that traveling wave formation requires nonreciprocity, which provides a framework for understanding phenomena ranging from spiral to target and planar waves. In our system, nonreciprocity arises between the displacement field (U) and the pili force vector (P): as a result, in the broken phase, U effectively “chases” P, breaking PT symmetry locally and thereby enabling the generation of local directional flux and traveling waves. In this sense, nonreciprocity is essential for travelling wave generation and spontaneous symmetry breaking in either direction.

      However, we now agree that global directionality (always from right to left, or edge to center) is set by an independent factor, namely the oscillation frequency gradient across the biofilm. Thus, while nonreciprocity determines whether waves can travel, frequency gradients determine the large-scale direction in which they propagate. Put differently, PT symmetry is already broken in spiral waves due to nonreciprocity, but global asymmetry (frequency gradients) is required to align the overall propagation in one direction.

      We have clarified this distinction in the revised manuscript, emphasizing that nonreciprocity is a necessary ingredient for travelling wave generation, whereas global asymmetry controls global wave direction.
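
      How a frequency gradient can orient a phase pattern may be seen in a toy chain of phase oscillators with a linear gradient in natural frequency. The chain length, coupling strength, gradient, and Euler integration below are illustrative assumptions and do not reproduce the manuscript's models; the phase lag is deliberately left out, so the sketch isolates the gradient and says nothing about what generates the waves in the first place.

```python
import math


def relax_chain(n=20, coupling=2.0, gradient=0.01, dt=0.02, steps=20000):
    """Relax an open chain of nearest-neighbour coupled phase oscillators.

    Natural frequency decreases linearly from site 0 (fastest) to site n - 1
    (slowest). Returns the phases after `steps` explicit Euler steps.
    """
    omega = [1.0 + gradient * (n - 1 - i) for i in range(n)]
    theta = [0.0] * n
    for _ in range(steps):
        updated = []
        for i in range(n):
            pull = 0.0
            if i > 0:
                pull += math.sin(theta[i - 1] - theta[i])
            if i < n - 1:
                pull += math.sin(theta[i + 1] - theta[i])
            updated.append(theta[i] + dt * (omega[i] + coupling * pull))
        theta = updated
    return theta


theta = relax_chain()
# Phase lead of each site over its slower neighbour.
leads = [theta[i] - theta[i + 1] for i in range(len(theta) - 1)]
```

      With these settings the phase decreases monotonically from the high-frequency end to the low-frequency end, so phase fronts run from the faster region toward the slower one. Reversing the sign of `gradient` reverses that ordering, which is the sense in which the gradient, rather than the coupling, selects the direction in this toy model.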

      Modification in the manuscript:

      “We should note that traveling waves indicate broken PT symmetry between these fields triggered by nonreciprocity, with spiral waves serving as a classic signature of this phenomenon. A further transition from spiral to planar waves reflects an overall asymmetry in the frequency profile, which is not directly related to PT-symmetry breaking.”

      The relationship between the phase oscillator model and the active solid model is unclear. Given that U and P are both dynamical variables evolving in three-dimensional space, defining the phase Φ precisely in the phase space spanned by U and P could be challenging. A graphical illustration of the definition of Φ would be beneficial. To ensure reproducibility of the numerical results, the parameter values used in the numerical simulations and an explicit definition of the elastic force in the active solid model should be provided.

      We agree with the reviewer that the relationship between the phase oscillator model and the active solid model can be confusing, but establishing this link is essential to connect different modeling approaches in the literature. As the reviewer notes, in a fully three-dimensional setting with freely moving bacteria, defining the oscillation phase (Φ) in the phase space spanned by U and P is indeed complicated.

      However, our recent imaging results show that bacteria within the biofilm do not undergo large translational motions but instead exhibit periodic “Mexican wave”-like oscillations. These oscillations are confined to a restricted phase space, which allows us to define Φ in a straightforward way. In this context, the phase oscillator model becomes a natural reduction of the dynamics.

      Similarly, in the active solid (or active gel) model, we can plot not only the displacement and force vectors but also the local phase, which shows strong agreement with the phenomenological Kuramoto-style model. To make this connection clearer, we have now included a schematic illustration in the revised manuscript that explicitly shows how Φ is defined in the reduced phase space, and we provide the parameter values used in the simulations as well as the explicit definition of the elastic force in the active solid model to ensure reproducibility.

      The link between the theoretical models and experimental results is weak. For example, the propagation of the kink from the lower to the higher part of the surface (Figure 1e) could be addressed within the framework of the active solid model. The mechanism of transition from spiral to target waves (Figure 3a, b) requires clarification, identifying which model parameter is crucial for inducing this transition. The wave propagation toward the lower frequency side is numerically demonstrated using the phase oscillator model, but a physical or intuitive explanation for this phenomenon is missing. Also, the wave transitions induced by the addition of water droplets and temperature rise are not linked to specific parameters in the theoretical models.

      We thank the reviewer for highlighting this important weakness, which was also consistently noted by the other reviewers. We fully agree that the link between our theoretical models and experimental results required significant strengthening.

      With improved imaging in the revised study, we were able to uncover additional connections that help establish this link more clearly. We acknowledge that our ability to measure detailed biofilm parameters is limited, which restricts us from providing fully quantitative mappings. Nonetheless, based on the reviewers’ suggestions, we carried out additional imaging and simulations to compare bacterial dynamics at the colony edge and within the biofilm surface. These data confirm that cells within the biofilm undergo restricted, “Mexican wave”-like oscillations, emphasizing the critical role of elasticity in governing the collective dynamics.

      Experimentally, we found that adding water or PEG, or alternatively inducing drying, strongly modulates the effective elasticity of the biofilm. Within the active solid framework, elasticity and the elasto-active coupling are the key parameters controlling the system. By tuning these parameters in simulations, we could reproduce the qualitative transitions observed experimentally. Specifically, we observed that:

      At low elasticity, topological defects are mobile and can move, merge, or annihilate, leading to the emergence of planar waves.

      At high elasticity, defects remain pinned across the biofilm surface and dominate the dynamics.

      These observations suggest that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves. Although we cannot independently manipulate each parameter in experiments, varying the moisture content provides an effective and experimentally accessible control.

      Finally, our simulations and new analyses reveal that spiral defect cores can move and merge to form target waves or annihilate entirely—processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions, and we believe it warrants further theoretical exploration. We have clarified this connection and its implications in the revised manuscript.

      First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in the review, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11).

      This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs.

      Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation. To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Figure 12). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Author response image 4) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs. We have updated the details of the defect dynamics in the revised manuscript to clarify the transition between these waves.

      Author response image 4.

      Experimental observation showing that small nonuniformities on the biofilm surface trigger the formation of closely separated defect pairs. Arrows indicate the positions of the nonuniformities.

      Modification in the manuscript:

      Defect dynamics controlling the transition between spiral and target waves

      “To better understand the dynamics of the transition between the different wave forms, we focused on numerical simulations. We noticed that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves; varying the moisture content provides an effective and experimentally accessible way to control this motility. Our analyses revealed that spiral defect cores can move and merge to form target waves or annihilate entirely, processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions. First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in Supplementary Figure 10, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11). This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs. Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation (Supplementary Video 9). To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Video 12-13). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Supplementary Video 9) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs.”

      All the recommended points have been addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      Sullivan and colleagues examined the modulation of reflexive visuomotor responses during collaboration between pairs of participants performing a joint reaching movement to a target. In their experiments, the players jointly controlled a cursor that they had to move towards narrow or wide targets. In each experimental block, each participant had a different type of target they had to move the joint cursor to. During the experiment, the authors used lateral perturbation of the cursor to test participants’ fast feedback responses to the different target types. The authors suggest participants integrate the target type and related cost of their partner into their own movements, which suggests that visuomotor gains are affected by the partner’s task.

      Strengths

      The topic of the manuscript is very interesting, and the authors are using well established methodology to test their hypothesis. They combine experimental studies with optimal control models to further support their work. Overall, the manuscript is very timely and shows important findings - that the feedback responses reflect both our and our partner’s tasks.

      We thank the reviewer for the positive comments regarding our work.

      Weaknesses

      However, in the current version of the manuscript, I believe the results could also be interpreted differently, which suggests that the authors should provide further support for their hypothesis and conclusions.

      Major Comments

      (1) Results of the relevant conditions:

      In addition to the authors’ explanation regarding the results, it is also possible that the results represent a simple modulation of the reflexive response to a scaled version of cursor movement. That is, when the cursor is partially controlled by a partner, which also contributes to reducing movement error, it can also be interpreted by the sensorimotor system as a scaling of hand-to-cursor movement. In this case, the reflexes are modulated according to a scaling factor (how much do I need to move to bring the cursor to the target). I believe that a single-agent simulation of an OFC model with a scaling factor in the lateral direction can generate the same predictions as those presented by the authors in this study. In other words, maybe the controller has learned about the nature of the perturbation in each specific context, that in some conditions I need to control strongly, whereas in others I do not (without having any model of the partner). I suggest that the authors demonstrate how they can distinguish their interpretation of the results from other explanations.

      We thank the reviewer for the thoughtful comment. While it is possible that the change in the visuomotor feedback responses arises simply from a scaling factor, this hypothesis could explain the difference between two of the conditions but would fail to explain the difference between two others. Specifically, it could explain a decrease in involuntary visuomotor feedback responses between partner-irrelevant/self-relevant and partner-relevant/self-relevant. Critically, it could not explain the difference between partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant. That is, there is no reason to scale a response to correct for a partner’s relevant target when your own target is irrelevant. However, our finding that there is a greater involuntary visuomotor feedback response in partner-relevant/self-irrelevant compared to partner-irrelevant/self-irrelevant is predicted by the notion that humans form a representation of others and consider their movement costs.

      We have added a paragraph in the discussion to justify our hypothesis over the scaling factor hypothesis.

      “Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses can parsimoniously explain all of our experimental findings. There are a few alternative hypotheses that could explain a subset of results. One alternative hypothesis is that participants simply learned the hand to center cursor mapping in each experimental condition. That is, instead of using a model of their partner, participants simply adapted to the dynamics of the center cursor. However, this hypothesis would not predict an increased involuntary visuomotor feedback response in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition. If participants did not form a model of their partner nor consider their partner’s costs, then they would not display an increased feedback response when they had an irrelevant target and their partner’s target was relevant. An increased feedback response to help a partner achieve their goal is captured by our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses.”

      (2) The effect of the partner target:

      The authors presented both self and partner targets together. While the effect of each target type, presented separately, is known, it is unclear how presenting both simultaneously affects individual response. That is, does a small target with a background of the wide target affect the reflexive response in the case of a single participant moving? The results of Experiment 2, comparing the case of partner- and self-relevant targets versus partner-irrelevant and self-relevant targets, may suggest that the system acted based on the relevant target, regardless of the presence and instructions regarding the self-target.

      We thank the reviewer for bringing up another valid point, which we discussed at length as a group when designing the experiment. The reviewer is correct in pointing out that the lack of difference in the involuntary epoch between the partner-relevant/self-relevant and partner-irrelevant/self-relevant conditions could suggest that the sensorimotor system acted based only on relevant targets, irrespective of whether the relevant target was the self-target or the partner target. While the effect of the simultaneous presentation of a narrow and a wide target on an individual’s response is unknown, comparing the differences between our other experimental conditions controls for this potential confound. Participants viewed a wide target and a narrow target on the screen in both the partner-irrelevant/self-relevant condition and the partner-relevant/self-irrelevant condition. Crucially, we found that the visuomotor feedback responses were greater in the partner-irrelevant/self-relevant condition compared to the partner-relevant/self-irrelevant condition in both Experiment 1 and 2. That is, participants were able to distinguish between the self-target and partner target and appropriately modify their feedback responses in both experiments, despite there being both a wide and a narrow target on the screen in both conditions. Given that we found different visuomotor feedback responses between the two conditions that had both a narrow and a wide target, this rules out the alternative hypothesis that the sensorimotor system acted based just on a relevant target being present. We have added to our discussion to clarify this point.

      “Another alternative hypothesis would be that the sensorimotor system was responding only to the relevant target displayed on the screen. Again, this hypothesis would only explain a subset of our results. In particular, this relevant target hypothesis cannot explain the observed feedback response differences between the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions in both Experiments 1 and 2.”

      (3) Experiment instructions:

      It is unclear what the general instructions were for the participants and whether the instructions provided set the proposed weighted cost, which could be altered with different instructions.

      Our instructions explicitly informed participants that their performance bonus was only based on them stabilizing within their own self-target within the time constraint. We have added the following in the methods to emphasize this instruction.

      “In other words, we ensured participants had a clear understanding that their performance in the task was only based on stabilizing the center cursor in their own self-target within the time constraint. Therefore, the instructions and timing constraints did not enforce participants to work together.”

      (4) Some work has shown that the gain of visuomotor feedback responses reflects the time to target and that this is updated online after a perturbation (Cesonis & Franklin, 2020, eNeuro; Cesonis and Franklin, 2021, NBDT; also related to Crevecoeur et al., 2013, J Neurophysiol). These models would predict different feedback gains depending on the distance remaining to the target for the participant and the time to correct for the jump, which is directly affected by the small or large targets. Could time to target instead be used to explain the results? I don’t believe that this is the case, but the authors should try to rule out other interpretations. This is maybe a minor point, but perhaps more important is the location (and time remaining) for each participant at the time of the jump. It appears from the figures that this might be affected by the condition (given the change in movement lengths - see Figure 3 B & C). If this is the case, then could some of the feedback gain be related to these parameters and not the model of the partner, as suggested? Some evidence to rule this out would be a good addition to the paper - perhaps the distance of each partner at the time of the perturbation, for example. In addition, please analyze the synchrony of the two partners’ movements.

      (1) Time to target and forward position

      The reviewer raises an interesting point. In our task, the cursor/target jump occurs once the center cursor crosses 6.25 cm from the start. We analyzed the time it took for the center cursor to intercept the targets from perturbation onset (Supplementary D). In Experiment 1, an ANOVA with center cursor time-to-target as the dependent variable showed no main effect of self-target (F[1,47] = 2.45, p = 0.124) or partner target (F[1,47] = 2.50, p = 0.120), nor any interaction (F[1,47] = 1.97, p = 0.166). In Experiment 2, an ANOVA with center cursor time-to-target as the dependent variable showed a significant interaction (F[1,47] = 5.87, p = 0.019). Post-hoc mean comparisons showed that only the difference between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant condition was significant (p = 0.006). Given that only one comparison in Experiment 2 showed a difference in time-to-target, we do not believe that time-to-target was a significant driver of the change in involuntary visuomotor feedback responses observed between conditions. While time-to-target is likely a metric around which the nervous system modifies feedback gains, our results suggest that the nervous system can also use a partner model to modify feedback gains. We have added a supplemental analysis on time-to-target:

      “Previous work by Česonis and Franklin (2020) showed that time-to-target is a key variable the sensorimotor system uses to modify feedback responses. In their experiment, they manipulated the time-to-target of the participant’s cursor, while controlling for other movement parameters (e.g., distance from goal) [1]. When compared to classical optimal feedback control models, they showed that a model that modifies feedback responses based on time-to-target best predicted their results. In our task, it’s possible that the time-to-target could have influenced visuomotor feedback responses, since the distance to the center of the target is greater for a narrow target than a wide target on perturbation trials.”

      “We calculated the time from perturbation onset to the center cursor reaching the forward position of the targets (Supplementary Fig. S5). In Experiment 1, an ANOVA with center cursor time-to-target as the dependent variable showed no main effect of self-target (F[1,47] = 2.45, p = 0.124) or partner target (F[1,47] = 2.50, p = 0.120), nor any interaction (F[1,47] = 1.97, p = 0.166). In Experiment 2, an ANOVA with center cursor time-to-target as the dependent variable showed a significant interaction (F[1,47] = 5.87, p = 0.019). Post-hoc mean comparisons showed that only the difference between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant condition was significant (p = 0.006). Although time-to-target and hand position are important variables for the control of movement [1,2,3], they are likely not driving factors of the differences in involuntary visuomotor feedback responses between our experimental conditions.”
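      For readers unfamiliar with the interaction statistic reported above, the following is a minimal sketch of how an interaction F can be computed in a 2×2 fully within-unit design. It relies on a standard identity: the interaction F(1, n − 1) equals the squared one-sample t statistic of each unit's difference-of-differences. The function name and the toy numbers are hypothetical, and the authors' actual analysis pipeline is not described in this response. The reported F[1,47] would be consistent with 48 units under this scheme, which is an inference from the degrees of freedom rather than something stated in the text.

```python
import math
from statistics import mean, stdev


def interaction_f(cells):
    """Interaction F for a 2x2 within-unit design.

    `cells` holds one tuple per unit, ordered as
    (a1b1, a1b2, a2b1, a2b2), e.g. factor A = self-target and
    factor B = partner target. Returns (F, df_num, df_den).
    """
    # Difference-of-differences contrast for each unit.
    contrasts = [(a1b1 - a1b2) - (a2b1 - a2b2) for a1b1, a1b2, a2b1, a2b2 in cells]
    n = len(contrasts)
    # One-sample t statistic of the contrast against zero.
    t = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t * t, 1, n - 1


# Toy data (hypothetical values, three units only).
toy = [(1.0, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0), (3.0, 0.0, 0.0, 0.0)]
f_value, df_num, df_den = interaction_f(toy)
print(f_value, df_num, df_den)
```

      A significant interaction under this contrast means the effect of one factor (for example, partner target relevance) differs depending on the level of the other (self-target relevance), which is the pattern the condition comparisons in this response rely on.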

      However, it is possible that the participant forward position at perturbation onset could also influence the involuntary feedback response. We show the forward positions at perturbation onset in Supplementary D. Statistical analysis of the forward positions in Experiment 1 showed a main effect of self-target (F[1,47] = 12.72, p < 0.001), main effect of partner target (F[1,47] = 12.82, p < 0.001), and no interaction (F[1,47] = 0.00, p = 0.991). We see the same trend in Experiment 2, showing a main effect of self-target (F[1,47] = 12.11, p < 0.001), main effect of partner target (F[1,47] = 12.04, p < 0.001), and no interaction (F[1,47] = 0.00, p = 0.986). The fact that there was no interaction implies that the results could not solely be due to forward position. Nevertheless, given there were main effects, we proceeded to run an ANCOVA on the involuntary visuomotor feedback responses with forward position as a covariate. For Experiment 1, we still observed a significant interaction between self and partner target (F[1,47] = 43.14, p < 0.001). Further, we also observed no significant main effect of forward position on the involuntary visuomotor feedback responses. The ANCOVA for Experiment 2 also showed that there was still a significant interaction of self and partner target on the involuntary visuomotor feedback responses (F[1,47] = 9.80, p = 0.002). However, here we did find a significant main effect of the forward position (F[1,47] = 5.06, p = 0.026). Therefore, we ran follow-up mean comparisons with the covariate adjusted means. We found the same statistical trend as reported in the main results. We found significant differences between the partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant conditions (p = 0.003), partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions (p < 0.001), partner-relevant/self-irrelevant and partner-relevant/self-relevant conditions (p < 0.001). 
We found no significant difference between the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions (p = 0.381). Given that there was no main effect of forward position in Experiment 1, and that our adjusted mean comparisons in Experiment 2 showed the same trends as the unadjusted mean comparisons in the main manuscript, our results show that the forward position of the participants is not a significant factor in explaining the differences in involuntary visuomotor feedback responses between conditions.

      “Supplementary Fig. 6 shows the participant hand forward position at perturbation onset time for Experiment 1 (A) and Experiment 2 (B). It is possible that the participant forward hand position at perturbation onset time could influence their visuomotor feedback responses. Therefore, we ran an ANCOVA with self-target and partner target as factors, and participant forward hand position at perturbation onset time as a covariate. In Experiment 1, we found no main effect of participant forward hand position on involuntary visuomotor feedback responses (F[1,47] = 1.466, p = 0.228). Further, when including the covariate, we still found a significant interaction between self-target and partner target on involuntary visuomotor feedback responses (F[1,47] = 43.2, p < 0.001).”

      “In Experiment 2, we found a significant main effect of participant forward hand position on involuntary visuomotor feedback responses (F[1,47] = 6.73, p = 0.010). We still found a significant interaction between self-target and partner target (F[1,47] = 9.78, p = 0.002). Since we found a main effect of participant forward hand position, we calculated the adjusted means of the involuntary visuomotor feedback responses. We then performed follow-up mean comparisons on the adjusted means of the involuntary visuomotor feedback responses (using emmeans in R). We found the same significant trends as the unadjusted means in the main manuscript. Specifically, we found involuntary visuomotor feedback responses to be: significantly greater in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition (p = 0.003), significantly greater in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-relevant condition (p < 0.001), significantly greater in the partner-relevant/self-relevant condition compared to the partner-relevant/self-irrelevant condition (p < 0.001), and not different between the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions (p = 0.824).”

      We have also included in the discussion how time-to-target and participant forward hand position are important control variables to consider, and their potential relationship to our findings.

      “Finally, we also considered whether time-to-target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G-H) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation. Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses parsimoniously accounts for the differences observed between all conditions.”

      (2) Synchrony

      In our task, participants’ movements were not self-initiated. We had them begin the movement as soon as they heard an audible tone so that they would begin their movements at as similar a time as possible. We have analyzed the movement onset synchrony between participants within a pair, shown in Supplementary F.

      Supplementary: “We calculated movement onset times at the time that the participants left the start target [8]. We then took the absolute value of the difference between the participants within a pair as a measure of movement onset synchrony. For Experiment 1, an ANOVA with movement onset synchrony as the dependent variable showed no main effect of self-target (F[1,47] = 1.38, p = 0.252), no main effect of partner target (F[1,47] = 0.057, p = 0.813), and no interaction (F[1,47] = 0.45, p = 0.508). For Experiment 2, an ANOVA with movement onset synchrony as the dependent variable showed no main effect of self-target (F[1,47] = 0.07, p = 0.788), no main effect of partner target (F[1,47] = 2.75, p = 0.111), and no interaction (F[1,47] = 2.31, p = 0.142).”
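      The synchrony measure described above can be sketched as follows. This is an illustration only: it assumes each participant's trial is available as sampled times and hand distances from the center of the start target, and the function names, start-target radius, and sample values are hypothetical rather than taken from the authors' code.

```python
def movement_onset(times, distances, start_radius):
    """Return the first sample time at which the hand is outside the start target.

    `distances` are hand distances from the start-target center, sampled at
    `times`. Returns None if the hand never leaves the start target.
    """
    for t, d in zip(times, distances):
        if d > start_radius:
            return t
    return None


def onset_asynchrony(onset_a, onset_b):
    """Absolute difference in movement onset between the two partners."""
    return abs(onset_a - onset_b)


# Hypothetical single trial: times in seconds, distances in cm, radius in cm.
times = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
partner_a = [0.0, 0.1, 0.4, 1.2, 2.5, 4.0]
partner_b = [0.0, 0.0, 0.2, 0.6, 1.5, 3.0]

onset_a = movement_onset(times, partner_a, start_radius=1.0)
onset_b = movement_onset(times, partner_b, start_radius=1.0)
asynchrony = onset_asynchrony(onset_a, onset_b)
print(onset_a, onset_b, asynchrony)
```

      Averaging this per-trial value within each condition would give one synchrony score per pair and condition, which is the kind of dependent variable the ANOVA above operates on.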

      Further, we have modified our methods to emphasize that participants within a pair generally began their movement at the same time.

      “Instead of self-initiating their movements, we specifically had participants move at the sound of a tone so that the movement onset between participants in a pair was as synchronous as possible (see Supplementary F for movement onset synchrony analysis).”

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 291-292: One study extensively examined cursor and target jump visuomotor onset times and found no difference (Franklin et al., 2016; J Neuroscience), which strongly argues against this interpretation.

      We thank the reviewer for pointing out this work. We have modified the following lines:

      “However, other work by Franklin and colleagues (2016) found no difference in visuomotor feedback response latencies between cursor and target jumps [6].”

      (2) Line 411: What were the instructions regarding partner performance in terms of the reward? Did you explain that individual performance alone will determine the reward?

      As addressed above, we have made the following changes to emphasize the instructions given to participants.

      “In other words, we ensured participants had a clear understanding that their performance in the task was only based on stabilizing the center cursor in their own self-target within the time constraint. Therefore, the instructions and timing constraints did not enforce participants to work together.”

      (3) Line 506: Ten probe trials in each direction is very low. Can this still be in the transition state of the feedback response, rather than at steady state? There are many studies done looking at the learning of visuomotor responses in which changes are still occurring after several hundred trials (e.g., Franklin et al., 2017 J Neurophysiol; Franklin et al., 2008; J Neuroscience). In this experiment, each block only lasts 151 trials total if my calculations are correct. How certain are you that the results are at a steady state and not continuously changing? Perhaps with further experimental experience, the feedback responses would approach the predictions of a different model.

      The reviewer raises an important point. We had run these analyses prior to submitting the manuscript and did not see any evidence of learning. However, we believe this information is important to include, since both we and the reviewer asked the same question. Specifically, we have analyzed the visuomotor feedback responses over the trials (Supplementary G), which shows little to no learning over time. Additionally, we found no difference in the visuomotor feedback response trends between the first and second half of trials in each condition (Supplementary H). Therefore, it appears that the sensorimotor system reached steady-state behaviour very quickly, and we do not believe that the feedback responses would approach the predictions of a different model if participants performed more trials. We have added the following:

      Supplementary: “Given there were 151 trials and 10 left/right probe trials for each experimental condition, it is possible that completing more trials may have led to different involuntary visuomotor feedback responses. Therefore, we analysed the involuntary visuomotor feedback responses over the course of each experimental condition. Visually, involuntary visuomotor feedback responses in neither Experiment 1 (Fig. S8) nor Experiment 2 (Fig. S9) show any consistent learning (see Fig. S10 for statistical analysis). Therefore, it appears participants rapidly formed a partner model based on knowledge of their movement goal to modify their involuntary visuomotor feedback responses.”

      Supplementary: “Supplementary Fig. S10 shows the involuntary visuomotor feedback responses in the first half (A,C) and second half (B,D) for each experimental condition. In Experiment 1, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 37.09, p < 0.001) and second half (F[1,47] = 48.68, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (see Fig. S10A-B).”

      Supplementary: “In Experiment 2, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 9.42, p = 0.004) and second half (F[1,47] = 17.40, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (Fig. S10C-D).”

      Supplementary: “Showing the same involuntary visuomotor feedback response trends across the experimental conditions for the first half, second half, and all trials suggests that the sensorimotor system quickly formed a model of a partner and considered their costs to modify rapid motor responses.”

      We have also added to the discussion:

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation.”

      (4) The authors should also discuss some of the prior work which is very relevant to the tasks studied: (Knill, Bondada & Chhabra, 2011, J Neuroscience). There may also be other papers that use this task for visuomotor feedback responses and therefore, should be included.

      We have included the Knill 2011 paper and also Cross 2019 in our discussion:

      “This modification of feedback responses based on a relevant/irrelevant task goal has also been shown in response to visual perturbations [7,8].”

      (5) Lines 301-303: The terms ’relevant’ and ’irrelevant’ here describe different concepts than the ones used in this study. I suggest making a distinction to avoid confusion for the reader.

      We thank the reviewer for pointing out that this is confusing. We’ve made the following changes to improve the clarity:

      “Further, Franklin and colleagues (2008) designed a visual perturbation to be relevant or irrelevant when reaching to the same target, showing greater involuntary visuomotor feedback responses to a relevant visual perturbation compared to an irrelevant visual perturbation [9].”

      (6) Line 459: The reaching movement was quite slow (25cm in about 1.2 seconds). Is this needed to ensure that both participants can complete the movements, given potentially very different start times? Please comment as this is different than many previous studies.

      Participants needed to stabilize the cursor for 500ms in their target within a time constraint of 1400 - 1600 ms. Therefore, they had to reach the target between 900 - 1100 ms (before stabilizing). Additionally, participants did not perform self-initiated movements, but were required to begin their movement as soon as they heard an audible tone. Given that reaction times are ~200ms, participants had ~700 - 900 ms to reach the target, which aligns with previous research (Franklin et al. (2008), Franklin et al. (2012), Nashed et al. (2012)). We have clarified the time constraints of the task in our Methods:

      “They therefore had 700 - 900 ms to first reach the target, since humans generally have response times ~200 ms, and they needed to stabilize within the target for 500 ms (i.e., 1400 - 200 - 500 = 700 ms and 1600 - 200 - 500 = 900 ms). Movement times of 700 - 900 ms are thus consistent with previous human reaching studies [4,9,10].”
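
      The timing arithmetic above can be checked with a minimal sketch. The function name and default values are illustrative; the 500 ms hold and the approximate 200 ms reaction time are taken from the response, and the reaction time is an assumed typical value rather than a measured one.

```python
def movement_time_ms(deadline_ms, hold_ms=500, reaction_ms=200):
    """Time left to reach the target after subtracting the assumed
    reaction time and the required stabilization (hold) period."""
    return deadline_ms - reaction_ms - hold_ms


# Trial time constraint reported in the response: 1400 to 1600 ms.
fastest = movement_time_ms(1400)
slowest = movement_time_ms(1600)
print(f"Movement time window: {fastest} to {slowest} ms")
```

      Under these assumptions the window is 700 to 900 ms, matching the values quoted in the Methods text.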

      (7) Reference [25] is incomplete

      Thank you for catching this.

      And thank you for the thoughtful and clear review. We feel it has greatly improved the quality and clarity of our manuscript!

      Reviewer #2 (Public review):

      Summary

      Sullivan and colleagues studied the fast, involuntary, sensorimotor feedback control in interpersonal coordination. Using a cleverly designed joint-reaching experiment that separately manipulated the accuracy demands for a pair of participants, they demonstrated that the rapid visuomotor feedback response of a human participant to a sudden visual perturbation is modulated by his/her partner’s control policy and cost. The behavioral results are well-matched with the predictions of the optimal feedback control framework implemented with the dynamic game theory model. Overall, the study provides an important and novel set of results on the fast, involuntary feedback response in human motor control, in the context of interpersonal coordination.

      We thank the reviewer for the kind words!

      Review:

      Sullivan and colleagues investigated whether fast, involuntary sensorimotor feedback control is modulated by the partner’s state (e.g., cost and control policy) during interpersonal coordination. They asked a pair of participants to make a reaching movement to control a cursor and hit a target, where the cursor’s position was a combination of each participant’s hand position. To examine fast visuomotor feedback response, the authors applied a sudden shift in either the cursor (experiment 1) or the target (experiment 2) position in the middle of movement. To test the involvement of partner’s information in the feedback response, they independently manipulated the accuracy demand for each participant by varying the lateral length of the target (i.e., a wider/narrower target has a lower/higher demand for correction when movement is perturbed). Because participants could also see their partner’s target, they could theoretically take this information (e.g., whether their partner would correct, whether their correction would help their partner, etc.) into account when responding to the sudden visual shift. Computationally, the task structure can be handled using dynamic game theory, and the partner’s feedback control policy and cost function are integrated into the optimal feedback control framework. As predicted by the model, the authors demonstrated that the rapid visuomotor feedback response to a sudden visual perturbation is modulated by the partner’s control policy and cost. When their partner’s target was narrow, they made rapid feedback corrections even when their own target was wide (no need for correction), suggesting integration of their partner’s cost function. Similarly, they made corrections to a lesser degree when both targets were narrower than when the partner’s target was wider, suggesting that the feedback correction takes the partner’s correction (i.e., feedback control policy) into account.

      The strength of the current paper lies in the combination of clever behavioral experiments that independently manipulate each participant’s accuracy demand and a sophisticated computational approach that integrates optimal feedback control and dynamic game theory. Both the experimental design and data analysis sound good. While the main claim is well-supported by the results, the only current weakness is the lack of discussion of limitations and an alternative explanation. Adding these points will further strengthen the paper.

      Reviewer #2 (Recommendations for the authors):

      (1) While the current version is already well-written, it would be helpful for readers to further discuss the relationship between the current study and some potentially relevant studies, such as Braun et al. (2009), Ganesh et al. (2014), and Takagi et al. (2017) (2019).

      Thank you for pointing out these papers that we missed, which we now cite appropriately in light of our own work. In particular, we have added the following to our discussion, including Braun et al. (2009) and Takagi et al. (2017, 2019). However, Beckers et al. (2020) showed results that conflict with Ganesh et al. (2014), and since these works are about learning, we feel they are outside the scope of our work.

      “Further, others have shown that the sensorimotor system modifies movement selection according to game-theoretic predictions [11], and that the sensorimotor system modifies movements using an estimate of the joint goal during human-human interactions [12,13].”

      (2) For an alternative interpretation of the results, one could consider, for instance, that the target’s visual appearance could have served as a contextual cue for learning different movement gains in the lateral direction (e.g., whether the partner corrects the shift might be approximated as a gain change). Although less likely, this alternative account could be tested by simulation and would strengthen the argument.

      This is a thoughtful comment, also brought up by Reviewer 1. Here we provide our previous response that addresses this concern. It is possible that the change in the visuomotor feedback responses could arise from a scaling factor alone. This hypothesis could explain the difference between two conditions, but would fail to explain differences between two other conditions. Specifically, this hypothesis could explain a decrease in involuntary visuomotor feedback responses between partner-irrelevant/self-relevant and partner-relevant/self-relevant. Critically, this hypothesis could not explain the difference between partner-irrelevant/self-irrelevant and partner-relevant/self-irrelevant. That is, there is no reason to scale a response to correct for a partner’s relevant target when your own target is irrelevant. However, our finding that there is a greater involuntary visuomotor feedback response in partner-relevant/self-irrelevant compared to partner-irrelevant/self-irrelevant is predicted by the notion that humans form a representation of others and consider their movement costs.

      We have added a paragraph in the discussion to justify our hypothesis over the scaling factor hypothesis.

      “Our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses can parsimoniously explain all of our experimental findings. There are a few alternative hypotheses that could explain a subset of results. One alternative hypothesis is that participants simply learned the hand to center cursor mapping in each experimental condition. That is, instead of using a model of their partner, participants simply adapted to the dynamics of the center cursor. However, this hypothesis would not predict an increased involuntary visuomotor feedback response in the partner-relevant/self-irrelevant condition compared to the partner-irrelevant/self-irrelevant condition. If participants did not form a model of their partner nor consider their partner’s costs, then they would not display an increased feedback response when they had an irrelevant target and their partner’s target was relevant. An increased feedback response to help a partner achieve their goal is captured by our hypothesis that the sensorimotor system uses a representation of a partner and considers the partner’s costs to modify involuntary visuomotor feedback responses.”

      (3) Another (maybe unlikely) alternative interpretation is that the targets’ visual appearances might have been confusing. One might find that the closed square is common to both targets for the “Partner Relevant Self Irrelevant” and the “Partner Relevant Self Relevant”, and that this might have elicited the response to perturbation in “Partner Relevant Self Irrelevant”. Related to this point, it would be informative to describe how the “cooperative” fast feedback response developed over the course of the experiment, for instance, by comparing behaviors across experimental blocks.

      We have partitioned this question into two responses, relating to visual appearance of the targets and the development (i.e., learning) of visuomotor feedback responses over the course of the experiments.

      (1) Participants confused by visual appearance of the targets.

      We were also concerned that participants might be confused by the targets, and therefore confirmed with participants after the experiment that they correctly understood that the light grey filled rectangle was their own target and the dark grey hollow rectangle was their partner’s. Furthermore, in the partner-relevant/self-irrelevant, partner-irrelevant/self-relevant, and partner-relevant/self-relevant conditions, there is a small square target in each of the conditions. However, we found that the partner-irrelevant/self-relevant and partner-relevant/self-relevant conditions both elicited significantly greater involuntary visuomotor feedback responses than the partner-relevant/self-irrelevant condition. Thus, participants’ involuntary visuomotor feedback responses suggest that they correctly formed different representations based on an accurate understanding of the self vs partner target. The other reviewer had related comments about the visual stimuli, which we also address within the discussion.

      “Another alternative hypothesis would be that the sensorimotor system was responding only to the relevant target displayed on the screen. Again, this hypothesis would only explain a subset of our results. In particular, this relevant target hypothesis cannot explain the observed differences between the partner-relevant/self-irrelevant and partner-irrelevant/self-relevant conditions in both Experiments 1 and 2.”

      (2) Comparing feedback responses over time

      We have included the visuomotor feedback responses over each experimental condition in Supplementary G. Notably, we did not find any learning effect, suggesting that the sensorimotor system quickly developed a model of a partner’s behaviour and used that model to modify feedback responses. We have also added a paragraph on learning to our discussion.

      We’ve addressed how learning did not play a role in this study:

      “Finally, we also considered whether time to target [1,2] (Supplementary D), participant forward hand position (Supplementary E), or learning [4] (Supplementary G-H) influenced feedback responses, but found that none impacted the observed differences between experimental conditions nor changed our interpretation.”

      Supplementary: “Given there were 151 trials and 10 left/right probe trials for each experimental condition, it is possible that completing more trials may have led to different involuntary visuomotor feedback responses. Therefore, we analysed the involuntary visuomotor feedback responses over the course of each experimental condition. Visually, involuntary visuomotor feedback responses in neither Experiment 1 (Fig. S8) nor Experiment 2 (Fig. S9) show any consistent learning (see Fig. S10 for statistical analysis). Therefore, it appears participants rapidly formed a partner model based on knowledge of their movement goal to modify their involuntary visuomotor feedback responses.”

      Supplementary: “Supplementary Fig. S10 shows the involuntary visuomotor feedback responses in the first half (A,C) and second half (B,D) for each experimental condition. In Experiment 1, we observed the same statistical results in the first half and second half of trials as the analysis of all trials. That is, we observed a significant interaction between self-target and partner target in the first half (F[1,47] = 37.09, p < 0.001) and second half (F[1,47] = 48.68, p < 0.001) of trials. Follow-up mean comparisons showed the same significant trends as our analysis of all trials in the main manuscript (see Fig. S10A-B).”

      Supplementary: “Showing the same involuntary visuomotor feedback response trends across the experimental conditions for the first half, second half, and all trials suggests that the sensorimotor system used a model of a partner based on their goals and considered their costs to modify rapid motor responses.”

      (4) It looks slightly counterintuitive (and therefore interesting) that the participant shows some amount of fast feedback responses in the “Partner Relevant Self Irrelevant” condition, since they were instructed to only consider the self-target. Based on the results, the authors suggest an altruistic feature of the motor system (lines 333-340). It would be helpful to clarify the basis for this interpretation, whether it is formally derived from the game-theoretic framework or represents a more conceptual interpretation. Providing additional explanation that translates the game-theoretic reasoning into more accessible, intuitive terms would help readers better understand and evaluate this claim.

      We are glad the reviewer also finds this result interesting. The reviewer raises an important point that there needs to be a clearer explanation for why we believe this result was found. We have made the following changes to the discussion:

      “Furthermore, this result is predicted by our dynamic game theory models that include the partner’s costs in the self cost function. In other words, a dynamic game theory model that selects feedback gains to minimize both the self and partner cost reflects an altruistic control policy.”
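
      As a generic illustration of the idea in the passage above, and not necessarily the cost used in the manuscript, one way to include a partner’s cost in the self cost function is a weighted sum. The symbols $J_{\text{self}}$, $J_{\text{partner}}$, $\alpha$, and $K_{\text{self}}$ are hypothetical names introduced here for the sketch.

```latex
% Assumed illustrative form: the self controller minimizes its own cost
% plus a weighted copy of the partner's cost.
\begin{equation}
  J_{\text{self}}^{\text{joint}} = J_{\text{self}} + \alpha \, J_{\text{partner}},
  \qquad \alpha \ge 0,
\end{equation}
% The feedback gains are then chosen to minimize this combined cost.
\begin{equation}
  K_{\text{self}}^{*} = \arg\min_{K_{\text{self}}} \; J_{\text{self}}^{\text{joint}} .
\end{equation}
```

      In this sketch, $\alpha = 0$ corresponds to a controller that considers only its own cost, while $\alpha > 0$ yields corrections that also reduce the partner’s cost, even when the self target is irrelevant.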

      (5) Please check whether all references are displayed correctly. Some of them (e.g., 25, 65) seemed not correctly shown in the References section.

      We have fixed the citation.

      We thank the reviewer for providing a clear and insightful review. Their comments have significantly improved the manuscript.

      References

      (1) Česonis, J., & Franklin, D. W. (2020). Time-to-Target Simplifies Optimal Control of Visuomotor Feedback Responses. eNeuro, 7 (2), ENEURO.0514–19.2020.

      (2) Česonis, J., & Franklin, D. W. (2022). Contextual Cues Are Not Unique for Motor Learning: Task-dependant Switching of Feedback Controllers. PLOS Computational Biology, 18 (6), e1010192.

      (3) Crevecoeur, F., Kurtzer, I., Bourke, T., & Scott, S. H. (2013). Feedback Responses Rapidly Scale with the Urgency to Correct for External Perturbations. Journal of Neurophysiology, 110 (6), 1323–1332.

      (4) Franklin, S., Wolpert, D. M., & Franklin, D. W. (2012). Visuomotor Feedback Gains Upregulate during the Learning of Novel Dynamics. Journal of Neurophysiology, 108 (2), 467–478.

      (5) Liu, Y., Leib, R., Dudley, W., Shafti, A., Faisal, A. A., & Franklin, D. W. (2025). Partner-Sourced Haptic Feedback Rather than Environmental Inputs Drives Coordination Improvement in Human Dyadic Collaboration. Scientific Reports, 15 (1), 40347.

      (6) Franklin, D. W., Reichenbach, A., Franklin, S., & Diedrichsen, J. (2016). Temporal Evolution of Spatial Computations for Visuomotor Control. The Journal of Neuroscience, 36 (8), 2329–2341.

      (7) Knill, D. C., Bondada, A., & Chhabra, M. (2011). Flexible, Task-Dependent Use of Sensory Feedback to Control Hand Movements. The Journal of Neuroscience, 31 (4), 1219–1237.

      (8) Cross, K. P., Cluff, T., Takei, T., & Scott, S. H. (2019). Visual Feedback Processing of the Limb Involves Two Distinct Phases. The Journal of Neuroscience, 39 (34), 6751–6765.

      (9) Franklin, D. W., & Wolpert, D. M. (2008). Specificity of Reflex Adaptation for Task-Relevant Variability. The Journal of Neuroscience, 28 (52), 14165–14175.

      (10) Nashed, J. Y., Crevecoeur, F., & Scott, S. H. (2012). Influence of the Behavioral Goal and Environmental Obstacles on Rapid Feedback Responses. Journal of Neurophysiology, 108 (4), 999–1009.

      (11) Braun, D. A., Ortega, P. A., & Wolpert, D. M. (2009). Nash Equilibria in Multi-Agent Motor Interactions. PLoS Computational Biology, 5 (8), e1000468.

      (12) Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M., & Burdet, E. (2017). Physically Interacting Individuals Estimate the Partner’s Goal to Enhance Their Movements. Nature Human Behaviour, 1 (3), 0054.

      (13) Takagi, A., Hirashima, M., Nozaki, D., & Burdet, E. (2019). Individuals Physically Interacting in a Group Rapidly Coordinate Their Movement by Estimating the Collective Goal. eLife, 8, e41328.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy-equal route averaging-quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.
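
      The dilution effect described in this summary can be illustrated with a minimal sketch. It assumes that each generation's pair route is the equal average of the carried-over route and the newcomer's preferred route, and it uses the inverse sum of squared weights as a stand-in for "effective group size". Both the function names and that definition are assumptions for illustration, not the authors' implementation.

```python
def chain_weights(n_generations):
    """Weight of each bird's solitary route in the final pair route,
    assuming equal (50/50) route averaging at every generation."""
    weights = [1.0]  # generation 0: a single bird flying alone
    for _ in range(n_generations):
        # The carried-over route is halved and the newcomer contributes half.
        weights = [w * 0.5 for w in weights] + [0.5]
    return weights


def effective_group_size(weights):
    """Assumed definition: inverse of the sum of squared weights."""
    return 1.0 / sum(w * w for w in weights)


weights = chain_weights(4)
print("Weights, oldest to newest bird:", weights)
print("Effective group size:", round(effective_group_size(weights), 2))
```

      With four replacements, five birds have contributed to the route, yet under these assumptions the effective group size stays below three because the earliest birds' weights shrink by half at every generation.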

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. Our code and data are available on figshare (https://doi.org/10.6084/m9.figshare.28950032.v1). We have now revised the manuscript to include a link to our dataset.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works: they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages (such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms), it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined from the navigational data alone. For instance, it remains unexplored whether pigeons adapt their behavior based only on social cues from their partners or also use other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues, or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with any confidence only in a meta sense. Considering this, we believe that our analysis is a top-down approach to describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. We have now added a paragraph: “It is also important to clarify that we use the terms…… that lead to these meta-mechanisms arising remain an open question.” found in lines 120-129 in our Introduction to make this clarification.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised-and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), in which individuals are able to compare strategies and select those resulting in higher individual fitness. This preferential selection of strategies, termed innovations, allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example, solving a puzzle box), and that the task needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which do not. Keeping this framework in mind, studies should clearly state which definition of CCE they are using, and they should be critically evaluated for their underlying task type and cognitive mechanisms before being deemed evidence of CCE. Considering these points, we have now expanded our Discussion to include a paragraph: “Our results highlight the need for more…..range of task types and cognitive abilities.” found in lines 420-433 to highlight these key questions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not have any major objections, but I am clarifying my points as major or minor depending on the effort required to address (mostly via rewriting and clarifications).

      Major comments:

      (1) A schematic summary of the original study: Since the current manuscript builds directly on Sasaki & Biro (2017), it would greatly help readers if you included a concise schematic figure summarizing the original experiment. For instance, a simple panel could depict the chain design (experienced + naïve replacements), the control treatments, and the key empirical findings (improvements in route efficiency across generations, and route similarity within vs. between chains). Presenting this visually would save readers the effort of reconstructing the design and main results from text alone, especially for those unfamiliar with the original paper. It would also clarify exactly what empirical patterns your simulations are intended to reproduce.

      We thank the reviewer for this comment. We have now revised the manuscript with a schematic illustration adapted from the original study by Sasaki and Biro (2017). We hope this clarifies the experimental design and results we aimed to highlight in our work.

      (2) Reproducibility: Code and data are only "available on request." I believe eLife has strong policies on open science; a lack of immediate open access to analysis would be a barrier. I find it jarring that a paper intending to reproduce and improvise a previously published paper does not make the codes and data available for peer review or to readers without an explicit request.

      We have taken the feedback into consideration and updated the Data Availability section with a link to our Figshare dataset.

      (3) One huge drawback of the current format of the manuscript, where Methods come after Results, is that one has to really struggle to understand and appreciate Figures 2 and 3. I would strongly urge authors to have a shorter methods section embedded either as a subsection before the Results, or within the results section, as described in each figure. Perhaps a lot of my confusion also comes from not having known the previous paper, but it may be true for other readers, too. More specifically, for Figure 3, how is social weight for the experiments inferred? Figure 3 caption talks of mean difference, but one has to check the manuscript at multiple places throughout to really understand what this difference is (the definition) and how it is computed.

      While we agree that our manuscript includes the Methods section at the end, we tried to structure our text to tell a story (as stated in our manuscript title). To this end, we organized the text into short titled subsections that briefly convey the relevant background, identify the knowledge gap and outline our approach. We chose this structure to reserve the in-depth details about model implementation and statistical analysis for the Methods.

      Additionally, we made sure to include references to methodological details in relevant segments of the Introduction and Results sections, so as not to bog the reader down with model complexities and to keep a coherent narrative that delivers the message of our study. To further address the background of our work, we have now added a schematic of the original study in response to a previous comment by the reviewer, which we hope helps the reader better understand our work. We hope this explanation clarifies the intention behind our writing choice and our decision to retain the current structure.

      (4) The introduction of the 'effective group size' concept is a potentially valuable and intuitive way to interpret chain dynamics, but the explanation is somewhat buried in the Results/Methods; I suggest highlighting it more prominently (e.g., in the Discussion or with a schematic in the Results) so readers can readily grasp this useful idea.

      We are glad that the reviewer found our concept of ‘effective group size’ useful. However, we do believe that we introduced the idea and rationale behind using this method in the Results: “We asked to what extent……to an equivalent group size” found in lines 305-314. We reserved the detailed description of this method for the Methods section. However, to further emphasize the importance of the concept, we have now added the text: “This is further supported….. slightly better than two individuals.” found in lines 389-394 in the Discussion.

      Minor comments:

      (1) Line 12: "what is the navigation mechanism(s)" - the (s) is a bit awkward. Either remove (s) or ask what the mechanisms are.

      We have fixed the typo to clarify the statement.

      (2) Line 78: "Such 'ratchet'-like improvements is referred to..." → "are referred to."

      We have fixed the typo to clarify the statement.

      (3) Figure 3 caption: "color scheme in the plots are same" → should be "is the same."

      We have fixed the typo to clarify the statement.

      (4) Clarification on reporting confidence intervals: The manuscript reports confidence intervals (CIs) for the model-based comparisons (e.g., Figures 2-3). This might seem unnecessary for simulation studies, since running more iterations can arbitrarily shrink uncertainty. However, in your case, the CIs are justified because the simulations are anchored to a finite empirical dataset (only 9 solo trajectories), sampled with replacement, and analyzed with mixed-effects models that incorporate bird identity as a random effect. Thus, the intervals reflect biological sample variability rather than simulation noise. This must be clarified.

      We have added a clarifying statement: “...and reflect the biological uncertainty in the empirical dataset, not simulation noise” found in lines 241 and 293 in the captions of Figures 2 and 3 in accordance with the reviewer’s comment. 

      (5) One part of the issue is that details of methods come much later in the manuscript, perhaps following journal style. Therefore, I recommend explicitly highlighting this rationale in the Results, so readers do not misinterpret the CIs as simply reflecting simulation error.

      We believe that the clarifying statements we have now added in the captions of Figures 2 and 3 should convey this interpretation of CIs and further changes in the Results may not be required.
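      The distinction drawn in comments (4) and (5), between variability in a finite empirical sample and simulation noise, can be illustrated with a percentile bootstrap. The following is only a minimal sketch under stated assumptions: the nine values, the `bootstrap_ci` helper and the choice of the mean as the statistic are hypothetical placeholders, and it does not reproduce the mixed-effects analysis described in the manuscript.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a small sample."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample the empirical data with replacement, keeping the sample size fixed.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical route-efficiency values for nine solo birds (illustrative only).
solo_efficiency = [0.71, 0.78, 0.80, 0.83, 0.85, 0.86, 0.88, 0.90, 0.93]
ci_low, ci_high = bootstrap_ci(solo_efficiency)
```

      Raising `n_boot` only stabilises the estimate of the interval; its width is set by the nine underlying observations, which is why it does not shrink arbitrarily with more iterations.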

      With these proposed changes, we hope that we have improved the clarity of our manuscript.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript reports a very interesting, novel and important research angle to add to the now enormous interest in how pesticides can be toxic to beneficial insects like the honey bee. Many studies have reported on how pesticides in standard use formulations show both lethality as well as sublethal negative effects on behavior and reproduction. The authors propose to use machine learning algorithms to identify new volatile compounds that can be tested for repellency. They use as input chemical structures that are derived from chemicals that have known repellent effects as identified in their initial behavioral assays.

      Strengths:

      The conclusion is that such chemicals specific to repelling bees and not pest insects (using the fruit fly as a model for the latter) can be identified using the ML approach. Having a list of such chemicals that can be rotated in any field application would be a benefit because of the honey bee's ability to learn its way around any kind of stimulus designed to keep it from nectar and pollen, even when they may be tainted by pesticide.

      Weaknesses:

      The use of machine learning seems well-executed and legitimate. But this is beyond my expertise. So other reviewers can maybe comment more on that.

      The behavioral data report on the use of a two-choice assay for bees in small Petri plates. Bees can feed from two small wells placed on filter paper impregnated with control or the control containing a chemical. The primary behavior, for example in Fig. 2C, is the first choice by one of the five bees in the plate of which well to feed from. For some chemical compounds, there seems to be a 50:50 choice, indicating no repellent effects. In other cases, the first bee making the choice chose the control, indicating possible repellent effects of the test chemical. Choices in this assay were validated in a free-flying assay.

      Concerns with the choice assay:

      50-70 microliters amounts to what one hungry bee will drink. Did the first bee drink most of it, such that measures of bait consumed reflect a single bee or multiple bees?

      The measure of lure consumed reflects multiple bees. We observed that the first bee did not empty the 70 µl of honey, allowing us to estimate honey consumption by several bees.

      How many bees were repelled to the control side? Was it just the one bee?

      All the bees in a group were repelled to the control side for repellents. Evaluating the lack of honey consumption also allowed us to assess repellency. For example, 100% honey consumption on the control side meant that the bees were hungry, while 0% honey consumption on the repellent side meant that the bees were not hungry enough to drink the honey on the repellent side.

      Were other measures considered? E.g. time to first approach; the number of bees feeding at different time points; the total number of bees observed feeding per unit time.

      Bees were cooled down to place them in the plates for the experiments. Therefore, time to first approach could also depend on how long it took the bees to warm up, which was not as relevant for our research question. Because bees can communicate to each other where to find food sources, we restricted ourselves to the first choice only, to get independent data points for each plate. However, we investigated whether the first cup the first bee chose was also the one it drank from, which was the case.

      Reviewer #2 (Public review):

      Summary:

      The search for new repellent odors for honey bees has significant practical implications. The authors developed an iterative pipeline through machine learning to predict honey bee-repellent odors based on molecular structures. By screening a large number of candidate compounds, they identified a series of novel repellents. Behavioral tests were then conducted to validate the effectiveness of these repellents. Both the discovery and the methodological approach hold value for related fields.

      Strengths:

      The study demonstrates that using molecular structures and a relatively small training dataset, the model could predict repellents with a reasonably high success rate. If the iterative approach works as described, it could benefit a wide range of olfaction-related fields.

      The effectiveness of the predicted repellents was validated through both laboratory and field behavioral tests.

      Weaknesses:

      The small size of the training dataset poses a common challenge for machine learning applications. However, the authors did not clearly explain how their iterative approach addresses this limitation in this study. Quantitative evidence demonstrating improvements achieved in the second round of training would strengthen their claims. For instance, details on whether the success rate of predictions or the identification of higher-affinity components would be helpful. Furthermore, given that only 15 new components were added for the second round of training, it is surprising that such a small dataset could result in significant improvements.

      The original repellency dataset was collected from multiple older studies, each using different assays for bee behavior and different delivery methods and chemical concentrations. Moreover, strong repellents were limited in number, and because they varied structurally from non-repellents in the dataset, the AUC appeared high. A smaller dataset results in unusual AI/ML model performance trends, as any algorithm is just a reflection of its training data. As a result, we found that the Round 1 predictions had a low success rate in behavior assays (~20%). Subsequently, even small amounts of data collected using one standard concentration and assay could dramatically change the quality of the dataset, not just for structures of repellents, but also for related structures that were not repellent. What we observe when adding just 15 chemicals is a more complete representation of how repellents and non-repellents are distributed. The prediction success of Round 2 more than doubled in repellent behavior assays, to >50%. The performance gains initially observed with even small additions to the training dataset will stabilize and ultimately plateau due to the limits of the ML algorithm and/or chemical featurization technique. A more complex model, trained on a large dataset, may not be expected to benefit from a handful of additional examples because its chemical feature distributions are already better approximations of the real world. Put simply, smaller datasets imply there is more to learn.

      It is also true that the size of the training dataset is important for AI/ML algorithms. Artificial neural networks, for instance, are highly sensitive to noise and generalize poorly with limited data; the noise is amplified in these cases, and the solution (reducing the complexity of the model) impedes learning. Many algorithms, like the decision trees and support vector machines featured in our paper, can handle noise more efficiently and are suitable for smaller datasets in that they can still make reasonably successful predictions.

      Reviewer #3 (Public review):

      The manuscript of Kowalewski et al. titled "Machine learning of honey bee olfactory behavior identifies repellent odorants in free flying bees in the field" used machine learning to predict potential candidates for honeybee repellents, which may keep foraging bees away from pesticides. This is pilot research with strong significance for research on olfactory behavior and for pest control. However, some major issues need to be addressed to enhance the manuscript's clarity, strength, and overall coherence.

      (1) Drosophila melanogaster is not considered as a true agricultural pest. The manuscript would be more compelling if using true pests, for example, Drosophila suzukii or others.

      Honeybees face a critical risk of lethal pesticide exposure when they drift from their designated orchards into adjacent blooming crops or honeydew-coated fields, where they encounter chemical treatments intended for insects like Citrus Thrips, Asian Citrus Psyllid, Alfalfa Weevil, Peach Twig Borer, Oriental Fruit Moth, Lygus Bugs, Cotton Aphids, Whiteflies, Corn Rootworm, Sunflower Head Moth, Vine Mealybug, Cucumber Beetles, and Sugarcane Aphids. Unfortunately, testing such pest species is outside the scope of this paper, but it deserves further research.

      (2) For the repellency test, the result relies on dosage. An attractant may become a repellent at high concentration. Test a range of concentrations for each chemical and compare responses between honeybees and pests.

      Testing freely flying honey bees in the field is an extremely challenging undertaking. Nevertheless, we added extra tests for two strong repellents, BR4.5 and BR3.81, at half dose of 0.05 mg/cm<sup>2</sup>. As expected, we found that there was a reduction in repellency. Testing more concentrations was not within the scope of this paper.

      (3) Be more clear about bee behavior data and their scores (as in Page 4 Results "184 training chemicals and later for 203 chemicals" and Page 10 Methods). I suggest that authors add a supplemental table with each chemical and its behavioral score, feature and reference - which ones were used for training, and which ones for testing. Also add your own behavioral test data (second input) to this table

      We have added the training chemical lists as Supplemental Tables S3 and S4.

      (4) The AUC in the first validation was 0.88 (Page 4), and in Page 5, "As expected, the computational validation results based on the AUC values, show an improvement." However, there were no other AUC values to show improvement.

      (5) Show plots of ROC AUC curves from Round 1 and Round 2.

      The round one ROC curve is shown in Figure 1. The round two ROC curves, obtained from 3 different approaches, are shown in Author response image 1. The manuscript shows direct behavioral validation of the chemicals identified, which is more important.

      Author response image 1.
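      For readers comparing the AUC values discussed in comments (4) and (5), the quantity can be read as the probability that a randomly chosen repellent receives a higher predicted score than a randomly chosen non-repellent. The sketch below computes it directly from that definition. The `roc_auc` helper, the labels and the scores are hypothetical and are not taken from the study's models or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = repellent, 0 = non-repellent.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

      With few strong repellents that are structurally distinct from the non-repellents, nearly every such pair is ordered correctly, which is one way a small dataset can yield a high AUC.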

      (6) In the Discussion, the authors mentioned olfactory receptors in honeybees. It would be useful to provide a general review of the current understanding of these receptors and their (potential) functions.

      We have expanded the discussion and pointed to a review on honey bee olfaction.

      (7) I suggest combining Fig. 1 and Fig. 3A as one pipeline for this work.

      (8) Figure 2C, some sample sizes are very small, such as 2-piperidone: 1 first-choice control vs 0 first-choice repellent? Increase sample size and do statistical analysis.

      Most compounds, except the one pointed out, have small sample sizes because of the low percentage of bees participating in the trials. Consequently, we improved the methods in round 2 and were able to increase participation from 68% to 81%, as described in the Methods. However, since the compound was included in the second round of training, we would like to report it anyway. This compound had the highest rate of non-participating plates compared to the others, and there is a possibility that it may affect both stimuli.

      (9) In general, to assist reviewers, include line numbers to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other factors about the newly identified chemicals:

      Is there a toxicity index for these chemicals that can be listed? This would be important obviously for any humans around the repellents

      While toxicity index determination is outside the scope of this manuscript, it is possible to predict rat LD50 values using the EPA Suite’s toxicity prediction tool. In a pilot test, the software predicted an average oral toxicity of ~3064 mg/kg for the 18 repellents in Round 2, which is considered “Practically non-toxic” by the EPA.

      Was there any indication of bees being behaviorally impaired or dying when exposed to the chemicals in a confined space? Even exposure to intense floral perfumes in a confined space can be toxic over a longer period.

      Fewer than 5% of the 2225 honey bees died after the experiments, and none of the compounds showed a significantly higher level of mortality, suggesting that this minor effect was not due to the chemicals but possibly to handling steps (starving, chilling, recovery, etc.).

      The 'plates not participating' measure indicates plates in which no bees fed on either choice. Is that correlated to the choice index? That is, when bees showed some repellency was it the case that often that led to no choice?

      Yes, non-participating plates were those in which the bees did not drink any honey at all. The reason for this could have been that the bees were too cold and unable to warm up enough to participate in the trials, or that the chemical was so repellent that the bees did not want to drink any honey at all. Because we were not able to distinguish between these two reasons, we excluded plates in which the bees did not drink any honey at all from our dataset.

      It is unclear why the McNemar test was used.

      The McNemar test is used for hypothesis testing on paired dichotomous data. In our data file, we created two columns to report our first-choice results: “Control side first” and “Repellent side first”. When the first bee in a plate drank from the control side first, we added a “1” to the “Control side first” column and a “0” to the “Repellent side first” column. Because one control and one repellent-side honey pot were in the same Petri dish, the first bee could choose only one side first and could not choose the other side at the same time. Consequently, our dataset consisted of paired samples, which were dependent on each other. We therefore split the dataset by repellent candidate and used paired-sample McNemar tests for non-parametric data. (Lachenbruch P.A. McNemar Test, Wiley StatsRef: Statistics Reference Online)
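      To make the paired first-choice coding concrete, the following minimal sketch computes an exact two-sided McNemar p-value from the counts of the two discordant outcomes. It assumes the exact (binomial) form of the test, which the response does not specify. The `mcnemar_exact` helper and the plate counts are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant-pair counts.

    b: plates coded (control first = 1, repellent first = 0)
    c: plates coded (control first = 0, repellent first = 1)
    Under the null hypothesis, each discordant plate is equally likely to be b or c.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # One-sided binomial tail probability P(X <= k) with X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts for one repellent candidate (illustrative only).
control_first, repellent_first = 12, 2
p_value = mcnemar_exact(control_first, repellent_first)
```

      Because every participating plate yields exactly one first choice, all pairs are discordant under this coding, and the test reduces to asking whether the split between the two sides departs from 50:50.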

      The statistical result is not discussed in the text, only shown in the figure. And it looks to be significant only for one chemical and DEET. Yet on page 4 the end of the second paragraph, the authors write "For many of the tested compounds the bees preferred to visit the honey-water pots on the control side versus the repellent side,". That implies that they are not really using the test as a meaningful means for showing differences. If they are arguing only from trends, then that should be clearer in the text.

      We reported the p-values for each test we had used in tables in Figure 2C and S2. In the methods section we report which statistical tests were used to evaluate the data.

      There is no mention of attractant chemicals:

      Slessor and Winston used queen pheromone to attract bees to fields and improve pollination. Honey bees use the Nasonov pheromone to attract other bees to feeding locations. Could the addition of their chemical features change ML outcomes? This should be at least discussed.

      We thank the referee for the suggestion; however, the focus of this manuscript is repellents and therefore we restricted the background to that area of knowledge.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Releasing the dataset and code will benefit the readers interested in this study.

      The behavioral data are reported within the figures, tables, and supplementary materials. The computational code will be available upon request from the corresponding author for non-commercial use.

      Figure 1, AUC curve, "AUC = 0.XX", should there be an actual value from the experiment?

      Added

      Page 4, "(Talbe S1)" should be placed in the next sentence, as "From the initial training set we identified 45 features that were considered important for predicting aversive valence (Table S1)."

      We have added this in the appropriate spot.

      Page 5, "As expected, the computational validation results based on the AUC values, show an improvement.". Please list the AUC values.

      Author response image 2.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Page 3: "they sense using a sophisticated olfactory system of >180 odorant receptor genes in the genome". In the cited Robertson & Wanner's paper, there are around 160 receptors, and 170 if pseudogenes are included.

      We thank the referee and have updated the numbers.

      (2) Page 4: "initially for 184 training chemicals and later for 203 chemicals (Table S1)." Table S1 is about features, not chemicals?

      We have moved the reference to an appropriate location.

      (3) Figure 2A: What is the control? Acetone or another solvent?

      Acetone, but it rapidly evaporates before the time of experiment.

      (4) Figure 2A: What do the asterisks mean?

      Statistically significant.

      (5) Figure 3: When you added your own testing data as a second input for Round 2, put details about these data: chemical names, preference scores... Also, were the Round 2 data (Round 1 plus your own) also split 90:10 into training and testing partitions?

      Yes, the validation was performed on the updated data set including the new chemicals.

      (6) Figure 3D: Is asterisk at correct location? What does it mean?

      It means that BR3.15 was significantly different from BR4.5.

      (7) Figure 4D: "4D" in legend is missing. Also, "... tested at the regular dose (0.1mg/cm2) and half dose (0.05mg/cm2)". In the panel, it is only 0.05mg/cm2.

      Added

      (8) Table S2 is the same as Fig. 2C? Remove one.

      We have deleted Table S2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) While the manuscript is written for a scientific audience, the authors are likely aware that findings like this will be of broad appeal to the field of neurology, where treatments for memory loss are desperately needed. For this reason, the authors could consider including a statement regarding an interpretation of this meta-analysis from a clinical standpoint. Statements such as 'safe and effective' imply a clinical indication, and yet the manuscript does not engage with clinical trials terminology such as blinding, parallel arm versus crossover design, and trial phase. While the authors might prefer not to engage with this terminology, it can be confusing when studies delivering intervention-like five days of consecutive TMS (e.g., Wang et al., 2014) are clustered with studies that delivered online rhythmic TMS, which tests target engagement (e.g., Hermiller et al., 2020). While the 'sessions' variable somewhat addresses the basic-science versus intervention-like approach, adding an explicit statement regarding this in the discussion might help the reader navigate the broad scope of approaches that are utilized in the meta-analysis.

      We appreciate the suggestion to enhance interpretability of our report by broader audiences. First, to avoid confusion, we have eliminated “safe” and “effective” descriptors from the main summary of findings in the Abstract (pg. 1) and Discussion (pg. 6). Second, we now describe that reviewed studies included those categorized as traditional clinical trials, as well as non-clinical studies that generally follow clinical trial designs (i.e., multi-day intervention-like studies), in addition to more basic-oriented studies that are geared towards target engagement (Introduction, pg. 2). Third, we now clarify that the Design and Control factors (Figure 3) correspond to fairly standard distinctions in the clinical trials literature and were intended to capture major study design choices that are used in both clinical-trial and non-trial studies (Methods, pg. 9; Table S1). Finally, we now clarify that future clinical trials would be needed to evaluate HITS for any specific indication, and that our findings motivate such investigations but do not conclusively indicate efficacy for any given indication (Abstract, pg. 1; Discussion, pg. 7).

      Reviewer #1 (Recommendations for the authors):

      (1) The color scheme of Figure 1 was a bit confusing. All of the colors used for the flagged regions were incredibly similar. At first glance, it looks like the hippocampus was targeted directly due to the subtle color difference. Could the authors use colors that are more different? Similarly, zooming into the specific locations shows blue dots encompassed by teal. I am not sure what I am looking at here.

      We have updated the figure for clarity.

      (2) Given the broad appeal of the current study, I would encourage the authors to include a brief visual depiction of "HITS." This could help the more casual reader to understand the general approach.

      We have included this in Figure 1A.

      Reviewer #2 (Public review):

      (1) While the introduction centers on the role of the hippocampus in episodic memory and posits hippocampal neuromodulation by TMS as causative, the true mechanism may be more complex. Clean hippocampal lesions in primates cause focal loss of spatial and place memory, and I am aware of no specific evidence that the hippocampus does more than this in humans. Moreover, there is evidence that lateral parietal TMS also reaches neighboring temporal lobe regions, which contribute to episodic memory. The hippocampus may, therefore, be a reliable deep seed for connectivity-based targeting of the episodic memory network, but might not be the true or only functional target.

      We regret having implied that we think the hippocampus is the true or only functional target. We agree with the reviewer that the hippocampus is “a reliable deep seed for connectivity-based targeting of the episodic memory network” and that the specific locus/loci of the HITS effects and mechanisms are not yet clear. We now emphasize that although the hippocampus is used to define the targeted network, effects of TMS are likely distributed throughout the network, citing relevant studies that have shown that brain activity changes due to HITS are certainly not restricted to the hippocampus (Introduction, pg. 2).

      (2) The meta-analysis combines studies with confirmation of targeting and target-network engagement from fMRI and studies without independent evidence of having stimulated the putative target (e.g., Koch et al). That seems like a more important methodological distinction than merely the use of any individual targeting method. In my experience, atlas-based estimates are at least as accurate as eyeballing cortical areas in individuals. Hence, entering individual functional targeting as a factor might reveal an effect on efficacy.

      Our current definition of the “Targeting” factor appears to satisfy this concern. That is, we distinguish studies that used “individual functional targeting” (i.e., resting-state fMRI or DTI connectivity in each individual to select the target) from those that did not (i.e., atlas or other group-average approach). Notably, the Targeting factor modulation effect failed to survive correction for multiple comparisons. We think this addresses the reviewer's criticism, unless the reviewer is suggesting that we categorize studies based on whether they included evaluation of target engagement (e.g., tested for change in fMRI activity or connectivity of the network due to HITS) versus those that measured only behavioral outcomes. We did not include this distinction as a factor, as our analysis focuses on behavioral effects of HITS, and it is not clear what the neural effects would have been in studies in which they were not measured. Notably, we are providing the full raw dataset of effect sizes in a public repository with our final version of record, such that any other categorization schemes could be assessed by others.

      (3) The funnel plot and Egger's regression for episodic memory outcomes suggested possible bias, and the average sample size of 23 is small, contributing to the likelihood of false positive results. It would be informative, therefore, to know how many or which studies had formal power estimates and what the predicted effect sizes were.

      Regarding the average sample size of 23, we note that we used Hedges’ g for the effect size measure because it corrects for bias associated with small samples (pg. 10). Further, small sample sizes contribute to noisy estimates of true effects, allowing outliers to contribute to false positives and low power to contribute to false negatives, but without any reason to systematically yield bias towards false positives. Regarding potential publication bias, although we cannot rule this out based only on the statistics, we think that bias against publication of negative results is unlikely. First, HITS experiments are time consuming and expensive, and most in the field seem to be motivated to publish, whatever the outcome. Second, the notion of memory enhancement via brain stimulation is controversial, and groups have certainly been motivated, if not overly eager, to publish “failure to replicate” studies for HITS (e.g., the failure-to-replicate publication by Hendrikse et al. 2020, which was then re-analyzed by many of the original authors to arrive at different conclusions in Cash et al. 2022). Given these considerations, we think that it is very unlikely that publication bias had any major impact on our conclusions, but of course it cannot be conclusively excluded. Finally, we note that our finding of HITS selectivity for recollection enhancement is likely not affected by publication bias, as this selectivity versus other memory and non-memory outcomes was found only within published studies (i.e., it is very unlikely that publication bias would have led researchers to withhold publication of studies that found effects of HITS on recognition but not on recollection).
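      The small-sample correction mentioned above can be sketched as follows. This is an illustrative implementation of the common approximation g = J·d with J = 1 − 3/(4·df − 1) for two independent groups. The `hedges_g` helper and the input values are hypothetical, and the meta-analysis may have used a different variant (for example, one suited to within-subject designs).

```python
from math import sqrt

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Hedges' g for two independent groups: Cohen's d times a small-sample correction."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    # Approximate correction factor J; it approaches 1 as the sample size grows.
    correction = 1 - 3 / (4 * df - 1)
    return correction * d

# Hypothetical study with 12 participants per arm (illustrative values only).
g_small = hedges_g(1.0, 0.0, 1.0, 1.0, 12, 12)
g_large = hedges_g(1.0, 0.0, 1.0, 1.0, 200, 200)
```

      With an uncorrected d of 1.0 in both cases, the correction shrinks the estimate more for the 12-per-arm study than for the 200-per-arm study, which is the sense in which g adjusts for the upward bias of d in small samples.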

      (4) In the Discussion, the authors might provide a comparison between the effect size for memory improvement found here with those reported for other brain-targeted interventions and behavioral strategies. It may also be worthwhile pointing out that HITS/memory is one of the very few, or perhaps the only, neuromodulatory effects on cognition that has been extensively reproduced and survived rigorous meta-analysis.

      We now emphasize that this is, to our knowledge, the only neuromodulatory effect on cognition that is selective, has been extensively reproduced, and survived rigorous meta-analysis (Discussion, pg. 6). However, we wish to avoid the clinical overinterpretation of our findings that might result if we were to compare directly to effect size estimates for other current therapies, which have been evaluated for specific clinical indications. For example, antibody and pharmacological interventions for Alzheimer’s dementia typically have been associated with similar effect sizes to our estimate for HITS. However, those estimates derive from systematic review of randomized controlled trials measuring clinically relevant outcomes at relatively long delays, whereas the HITS studies we review include a mix of controlled and uncontrolled trials, vary in whether clinical outcomes were assessed, and mostly assessed outcomes at shorter delays. Thus, it could be misleading to directly compare the effect sizes. We instead continue to highlight that the HITS effects are promising and warrant rigorous testing for any given clinical indication.

      (5) The section of the Discussion on specificity compares HITS to transcranial electrical stimulation without specifying an anatomical target or intended outcome. A better contrast might be the enormous variety of cognitive and emotional effects claimed for TMS of the dorsolateral prefrontal cortex.

      We now also note that TMS of lateral frontal cortex has not been associated with similarly high specificity (Discussion, pg. 6). Note however that we cannot exclude anti-depressant or other psychological effects of HITS, as such outcomes were not consistently assessed in HITS studies and so were not included in our analyses.

      (6) With reference to why other nodes in the episodic memory network have not been tested, current flow modeling shows TMS of the medial prefrontal cortex is unlikely to be achievable without stronger stimulation of the convexity under the coil, in addition to being uncomfortable. The lateral temporal lobe has been stimulated without undue discomfort.

      We now additionally indicate that medial prefrontal stimulation may be ineffective given conventional TMS (Discussion, pg. 7). However, we are aware of no studies that have stimulated the portion of middle temporal gyrus that shows strong connectivity with hippocampus. We have tried this location, which positions the coil on or slightly above the ear and bordering on the temple area that is very sensitive to most. We were not able to minimize pain/discomfort for most subjects in pilot experiments, and so had to abandon it. Perhaps others have succeeded? If the reviewer has any specific references that could be included we would be happy to add them and update this section accordingly.

      (7) Finally, a critical question hanging over the clinical applicability of HITS and other neuromodulation techniques is how well they will work on a damaged substrate. Functional and/or anatomical imaging might answer this question and help screen for likely responders. The authors' opinion on this would be informative.

      We appreciate this point but don’t think there are enough data to assess the level of substrate damage needed to frustrate any stimulation benefits. The only thing we can say is that HITS was equally effective for mild to moderate Alzheimer’s dementia as it was for other non-neurodegenerative groups (nonsignificant effect of the Population factor, Figure 3B), suggesting that whatever degree of damage present in that group is insufficient to prevent the stimulation effects. We now highlight this point and raise the issue that, presumably, some level of damage would render HITS ineffective (Discussion, pg. 8).

      Reviewer #3 (Public review):

      (1) My only significant concern is how studies are categorized in the 'Timing' factor (when stimulation is applied). Currently, protocols in which TMS is administered across days are categorized as 'pre-encoding' in the Timing factor. This has the potential to be misleading and may lead to inaccurate conclusions. When TMS is administered across multiple days, followed by memory encoding and retrieval (often on a subsequent day), it is not possible to attribute the influence of TMS to a specific memory phase (i.e., encoding or retrieval) per se. Thus, labeling multi-day TMS studies as 'pre-encoding' may be misleading to readers, as it may imply that the influence of TMS is due to modulation of encoding mechanisms per se, which cannot be concluded. For example, multi-day TMS protocols could be labeled as 'pre-retrieval' and be similarly accurate. This approach also pools results from TMS protocols with temporal specificity (i.e., those applied immediately during encoding and not on board during memory testing) and without temporal specificity (i.e., the case of multi-day TMS) regarding TMS timing. Given the variety of paradigms employed in the literature, and to maximize the utility/accuracy of this analysis, one suggestion is to modify the categories within the Timing factor, e.g., using labels like 'Temporally-Specific' and 'Temporally Non-specific'. The 'Temporally-Specific' category could be subdivided based on the specific memory process affected: 'encoding', 'retrieval', or 'consolidation' (if possible). I think this would improve the accuracy of the approach and help to reach more meaningful conclusions, given the variety of protocols employed in the literature.

      We agree in principle with this criticism and think that the most straightforward way to address it is to relabel the “Pre-Encoding” category as “Pre-Task”. The issue with labeling/considering single-session stimulation delivered immediately before encoding as “Pre-encoding” is that this makes the assumption that this stimulation doesn’t also affect retrieval (i.e., is temporally specific). We do not have certainty about the timecourse of how a single session of stimulation affects brain activity. We think the “Pre-Task” label and interpretation is the best way to address this, to avoid suggesting that we are confident about the timecourse/selectivity of stimulation effects. Notably, the “Sessions” factor directly compares among designs that delivered stimulation in a single session versus in multiple consecutive sessions, and was a nonsignificant modulator. Thus, our analyses already compare studies that are relatively temporally specific versus those that, likely, are less so. In addition to relabeling, we have also added clear caveats to address the interpretive constraint imposed by the unknown timecourse of stimulation effects (Discussion, pg. 6-7) and revised the Abstract to reflect this change.

      (2) As the scope of the meta-analysis is limited to TMS applied to parietal or superior occipital cortex, it is important to highlight this in the Introduction/Abstract. The 'HITS' terminology suggests a general approach that would not necessarily be restricted to parietal/nearby cortical sites.

      This was previously highlighted only in the Methods and Discussion (with a Discussion paragraph dedicated to the issue of target selection; see also Comment 6 from Reviewer 2). We now also note this in the Introduction (pg. 2) and Abstract.

      Minor:

      (1) To reduce the number of study factors tested, data reduction was performed via Lasso regression to remove factors that were not unique predictors of the influence of TMS on memory. This approach is reasonable; however, one limitation is that factors strongly correlated with others (and predict less unique variance) will be dropped. This may result in a misrepresentation, i.e., if readers interpret factors left out of this analysis as not being strongly related to the influence of TMS on memory. I do see and appreciate the paragraph in the Discussion which appropriately addresses this issue. However, it may be worth also considering an alternative analysis approach, if the authors have not already done so, which explicitly captures the correlation structure in the data (i.e., shown in Figure S2) using a tool like PCA or an appropriate factor analysis. Then, this shared covariance amongst factors can be tested as predictors of the influence of TMS - e.g., by testing whether component scores for dominant PCs are indeed predictive of the influence of TMS. This complementary approach would capture rather than obfuscate the extent to which different factors are correlated and assess their joint (rather than independent) influence on memory, potentially resulting in more descriptive conclusions. For example, TMS intensity and protocol may jointly influence memory.

      We argue that feature selection via Lasso regression is a better approach for our research question than PCA, factor analysis, or other latent variable methods. The main reason is that PCA would sacrifice the interpretability of our findings with respect to the design of future experiments using or testing HITS. That is, because PCA creates composite components that are linear combinations of multiple variables, we would lose the ability to provide clear, actionable guidance to researchers about which specific study design choices (e.g., stimulation intensity, protocol type, timing) influence memory outcomes. Given that a major goal of our meta-analysis is to inform future experimental design, we believe that it is essential to maintain interpretability of the individual factors that must be decided when designing a study. Regarding factor analysis, this approach would require making a priori theoretical decisions about how to group individual moderators, which could introduce subjective bias into the analysis and would introduce other complications such as a need for validation of the resulting factor scores. We believe that the exploratory nature of our investigation, examining which among many possible study design factors substantially determine TMS efficacy, is better suited to a data-driven selection approach like Lasso. While the reviewer correctly notes that Lasso may drop factors that are correlated with stronger predictors, this feature can be considered advantageous in terms of identifying factors for inclusion in future study designs. That is, this can help identify the most parsimonious set of independent predictors, such that researchers can focus on the study design elements that matter most when controlling for other factors. Notably, we provide the table of factor relationships (Figure S2) so that interested readers can inspect how dropped factors were related to those that were retained.
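
      The behavior the reviewer describes, where Lasso drops a factor that is strongly correlated with a stronger predictor, can be seen in a toy example. The sketch below is a plain coordinate-descent Lasso written for illustration only; the data, the penalty value, and the function names are hypothetical and unrelated to the actual moderators or software used in the meta-analysis.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic updates."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, alpha) / (z / n)
    return beta


# Two nearly collinear predictors; the outcome follows the first one exactly.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.0, 1.1, 1.9]
X = [[a, b] for a, b in zip(x1, x2)]
y = list(x1)

print(lasso_coordinate_descent(X, y, alpha=0.1))
```

      In this toy case the second predictor receives a coefficient of exactly zero even though it correlates strongly with the outcome, which is why a table of factor relationships is useful alongside the selected set.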

      It is also important to note that we have provided the full dataset with our resubmission, which has been deposited in Dryad with a link in the Data Availability section (pg. 15). Thus, others are free to explore alternative analytical approaches should they wish to examine the data from different perspectives or to answer different questions.

      (2) Given the specific focus on TMS applied to parietal cortex to modulate hippocampal and related network function, it would be fruitful if the authors could consider adding discussion/speculation regarding whether this approach may be effectively broadened using other stimulation methods (e.g., tACS, tDCS), how it may compare to other non-invasive brain stimulation methods with depth penetration to target hippocampal function directly (transcranial temporal interference, or transcranial focused ultrasound), and/or how or whether other stimulation sites may or may not be effective.

      We briefly discuss a meta-analysis of tACS studies that reported nonspecific effects, including for parietal targets overlapping those used for HITS (Discussion, pg. 6). We also briefly note that tES effects remain mechanistically uncertain. We are afraid that further speculation about other stimulation modalities and targets would be beyond the scope of this focused meta-analysis, especially given the few data points for newer approaches such as TI or tFUS.

      (3) Studies were only included in the meta-analysis if they contained objective episodic memory tests. How were studies handled that included both objective and subjective memory, or other non-episodic memory measures? For example, Yazar et al. 2014 showed no influence of TMS on objective recall, but an impairment in subjective confidence. I assume confidence was not included in the meta-analysis. Similarly, Webler et al. 2024 report results from both the mnemonic similarity task (presumably included) and a fear conditioning paradigm (presumably excluded). Please clarify in the methods how these distinctions were handled.

      Studies were included in our meta-analysis if they included at least one objectively scorable test of episodic memory. We only included objectively scorable test performance in our analysis, excluding scores from any other subjective measures if they were also reported. This is now clarified in Methods (pg. 9).

      (4) The analysis comparing memory to non-memory measures is important, showing the specificity of stimulation. Did the authors consider further categorizing the non-memory tasks into distinct domains (i.e., language, working memory, etc.)? If possible, this could provide a finer detail regarding the selectivity of influences on memory vs. other aspects of cognition. It is likely that other aspects of cognition dependent on hippocampal function may be modulated as well, i.e., tasks with high relational/associative processing demands.

      This is an interesting idea, but it is beyond our expertise to categorize these other tasks based on the nature of processing demands that they capture. Note that the task names are provided in the data table that we are making available online with our submission of record (via Dryad), such that other groups could address this question if interested.

      (5) In the analysis of the Intensity factor, how were studies using Active (rather than resting) MT categorized? Only resting MT is mentioned in Table S1. This is important as the original theta-burst TMS protocol from Huang et al. 2005 determines intensity based on Active Motor Threshold.

      MT was resting/passive in all reviewed studies except for one (Tambini et al. 2018), which used 80% of active MT. We categorized this as <100% MT for the Intensity factor, as it was <100% of MT as defined in that study. Although one could make the argument that 80% AMT might instead correspond to 100+% RMT, this change would have very little influence on our results or conclusions. We now clarify this in Table S1.

      (6) Is there a reason why the study by Koen et al. 2018 (Cognitive Neuroscience) was not included? TMS was performed during encoding to the left AG, and objective memory was assessed, so it would seemingly meet the inclusion criterion.

      The failure to include Koen et al. 2018 was our error. Koen et al. 2018 is the only study that used “online” stimulation, delivered during the trials when memoranda were displayed for encoding in the task. In contrast, all other reviewed studies delivered “offline” stimulation either before the memoranda were presented (“Pre-Task”) or after the encoding period but before retrieval (“Post-Encoding”). Categorizing this study under the “Timing” factor would therefore be problematic for its inclusion in the main analysis. We now include Koen et al. 2018 in the “Supplementary Results” section as well as in the corresponding main Results section on “Similar outcomes in studies that were excluded from meta-analysis”. We also note in the relevant discussion that “online” stimulation, as done in Koen et al. 2018, is typically considered disruptive (e.g., Beynel et al. 2019 Neuroscience & Biobehavioral Reviews; Yeh & Rose 2019 Frontiers in Psychology), which should be taken into account when considering the findings of Koen et al. 2018 relative to other reviewed studies that used “offline” designs.

      (7) It would be helpful to briefly differentiate the current meta-analysis from that performed by Yeh & Rose (How can transcranial magnetic stimulation be used to modulate episodic memory?: A systematic review and meta-analysis, 2019, Frontiers in Psychology) (other than being more current).

      Beyond being more current and therefore including many more studies in which stimulation targets were based on hippocampal connectivity (which tend to have been published more recently), the differences with Yeh & Rose 2019 are subtle. Our review focuses on assessment of network targeting and whether effects were specific to episodic memory versus other tasks, which differs somewhat from the focus of Yeh & Rose 2019. The main difference in conclusions likely derives from there being more network-focused memory TMS experiments now than were available for Yeh & Rose’s review. We also differentiate episodic memory into recollection versus other components to test specificity and analyze modulation by many study design factors relevant to HITS studies that were not emphasized in Yeh & Rose’s review. Note that we now cite Yeh & Rose for those interested in potential differences.

      (8) For transparency and to facilitate further understanding of the literature and potential data re-use, it would be great if the authors consider sharing a supplementary table or file that describes how individual studies/memory measures were categorized under the factors listed in Table S1.

      As promised in our original submission, we are providing the full data table, including how individual studies and memory measures were categorized, as an open dataset in Dryad. The Dryad dataset is cited in “Data availability” (pg. 15).

      Reviewer #3 (Recommendations for the authors):

      Please explicitly state in the Methods (Meta-analysis of effect modifiers section) that the criteria used for categorizing each measure into a factor (e.g., probing Recollection, Recognition, etc.) are fully described in Table S1; this will help readers to find these details (it took me a while!).

      This is now emphasized (pg. 10).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      (1) "The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place."

      We thank the Reviewer for this valuable feedback and we agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating a hypothesis of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We contextualized our statements about stable binders and limited our claims to stating that the protein-peptide complex is stable within 1 µs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 24 µs without peptides and 66 µs with peptides. Additionally, we included a plot showing the distribution of groove width across all replicas.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) The title was changed from “Unfolded polypeptides can stably bind to hIRE1α cLD dimer” to “Unfolded polypeptides bind to hIRE1α cLD dimer surface”

      Addition to the text. (Figure 15 A legend) “(A) Distributions of the groove width of peptide-bound cLD dimers throughout all simulations performed. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      (2) Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength of the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we used orthogonal methods, specifically MM/PB(GB)SA calculations, to calculate binding free energies from existing trajectories. We also added AlphaFold 3 predictions for all the peptides to confirm the binding region. Importantly, we now provide experimental results to assess the binding affinity of cLD dimer mutants E102R and Y161R.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. 16A). We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Figure 16 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (pLDDT). (B) Difference in enthalpy (enthalpy of binding, ∆H) as an estimate of the binding free energies of unfolded polypeptides to hIRE1α cLD dimer derived from MM/PBSA calculations of our peptide simulations.”

      Addition to the text. (Figure 4 G legend) “(G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Results section: Point mutations destabilize unfolded peptide binding to cLD) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub>= 6.35 µM and Y161R K<sub>1/2</sub>= 5.4 µM, Supplementary Table 3) compared to the wild type (K<sub>1/2</sub>= 2.14 µM, Supplementary Table 3), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Supplementary Table 3)

      Reviewer 2 (Public review):

      (1) Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We added more references to the methods for computational information in the main text.

      (2) More quantitative analysis in addition to visual structures.

      We added an uncertainty estimate for the HDX calculations using bootstrapping and included additional information on bond distances for E102 and Y161. We also incorporated time-series data showing the distance of the peptide from the groove across all replicas.

      Addition to the text. (Figure 1C legend) “(C) The deuterated fraction obtained from experimental results (dashed line, shaded area indicates the error we calculated from bootstrapping) published by Amin-Wetzel et al. and the fraction computed from MD simulations (solid lines, blue for TIP3P water and orange for TIP4P-D water) for the PDB and AF model at incubation time point 0.5 min. This time point corresponds to experimental incubation times, not MD simulation time. Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping. Below each absolute value plot, we report the discrepancy, which is defined as the difference between the simulated and experimental deuterated fractions, with the shaded area indicating the corresponding error.”
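
      For readers less familiar with bootstrapped error bars, the following minimal sketch estimates the uncertainty of a mean from six values, matching the three replicas with two monomers each described in the legend. The input numbers, the number of resamples, and the function name are hypothetical; the actual resampling scheme used for the HDX analysis may differ.

```python
import random
import statistics


def bootstrap_sem(values, n_resamples=2000, seed=0):
    """Standard deviation of the mean over resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


# Hypothetical deuterated fractions: 3 replicas x 2 monomers for one peptide.
fractions = [0.41, 0.45, 0.39, 0.47, 0.43, 0.40]
print(statistics.fmean(fractions), bootstrap_sem(fractions))
```

      With only six values the resampled means take a limited set of outcomes, so the resulting error bar is itself a rough estimate.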

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      Reviewer 3 (Public review):

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations, so that processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer’s thoughtful comment. As noted in our response to Reviewer 1, we addressed the concern about sampling by applying orthogonal methods and experimental techniques. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Some enhanced sampling or path sampling simulations may be carried out to identify the peptides’ binding and unbinding mechanisms to the protein. This can show whether the disordered peptides studied in this work do indeed bind to the protein.

      We thank the Reviewer for this constructive criticism. We acknowledge the limitations associated with investigating binding and unbinding mechanisms of disordered peptides within the time scales accessible to our equilibrium simulations. However, the primary objective of our study was to sample and characterize a plausible binding pose at the center of the cLD dimer. We wanted to understand whether unfolded model peptides require an open groove able to contain them in order to bind to IRE1’s core luminal domain, or whether binding also occurs in the absence of an open groove.

      Enhanced sampling is, of course, an important strategy to overcome the limits of equilibrium simulations. However, we note that implementing enhanced sampling approaches in this system poses significant challenges due to its large size and the complexity of peptide–protein interactions, which cannot be easily captured using a limited set of collective variables. We decided that a thorough application of enhanced sampling would therefore constitute a separate study. Instead, we decided to validate our simulations in two ways: 1) we ran a new set of free energy calculations, and 2) we tested key predictions in experiments, adding significant new data to strengthen the conclusions of our manuscript.

      To evaluate whether the binding free energies of MPZ-derived peptides to human IRE1α cLD dimers are consistent with experimentally reported binding constants, we employed the MM/PBSA (Molecular Mechanics/Poisson–Boltzmann Surface Area) method. Calculations were performed over the final 250 ns of each simulation replica using the Single Trajectory Protocol (STP), which avoids the need for additional simulations. This approach provides an estimate of the effective binding free energy (i.e., enthalpy of binding) by accounting for bonded and non-bonded interactions, as well as solvation contributions. The entropic contribution, being computationally more demanding and subject to additional approximations, was not included. Binding enthalpies were obtained for MPZ1-N (in different initial orientations), MPZ1-C, MPZ1-N-2X, and MPZ1-N-2X-RD. The results indicated small differences in effective binding energies between the shorter peptides (MPZ1-N and MPZ1-C), whereas MPZ1-N-2X exhibited the lowest binding energy and MPZ1-N-2X-RD the highest, consistent with experimental trends. These findings support the reliability of our model and sampling strategy as a framework for analyzing peptide binding conformations to cLD.
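
      The single-trajectory bookkeeping described above can be sketched as follows: complex, receptor, and ligand energies are evaluated on the same frames, and the effective binding energy is the frame-averaged difference, with the entropic term left out. The function name and the per-frame energies are hypothetical; this is a schematic of the averaging step and not a substitute for the gmx_MMPBSA calculation.

```python
from statistics import fmean


def effective_binding_energy(complex_e, receptor_e, ligand_e):
    """Mean of E_complex - E_receptor - E_ligand over matched frames.

    Each list holds one energy per frame (molecular mechanics plus
    solvation terms), all taken from the same complex trajectory.
    """
    if not complex_e:
        raise ValueError("at least one frame is required")
    if not (len(complex_e) == len(receptor_e) == len(ligand_e)):
        raise ValueError("all three energy series must cover the same frames")
    per_frame = [c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e)]
    return fmean(per_frame)


# Hypothetical per-frame energies in kcal/mol.
print(effective_binding_energy(
    complex_e=[-110.0, -120.0, -115.0],
    receptor_e=[-60.0, -65.0, -62.0],
    ligand_e=[-20.0, -25.0, -21.0],
))
```

      Because all three terms come from one trajectory, conformational changes upon binding are not captured, which is one of the approximations of this protocol.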

      We identified residues E102 and Y161 as key contributors to the binding of unfolded peptides in our simulations. Contact analysis revealed these residues as binding hotspots, centrally located within the observed interaction regions. To probe their relevance, we conducted simulations of cLD dimers with single arginine mutations at these residues, aimed at disrupting the hotspots through charge repulsion. These simulations revealed increased instability of the MPZ1N-2X peptide on the cLD dimer surface. We further validated these findings experimentally using fluorescence anisotropy assays. Fluorescently labeled MPZ1N-2X was titrated with purified cLD mutants (E102R and Y161R), and anisotropy measurements were fitted to derive K<sub>1/2</sub> values. Both mutations resulted in approximately a two-fold reduction in binding affinity relative to the wild-type cLD, confirming the importance of these residues in stabilizing peptide binding.
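
      The size of the affinity change can be read directly from the K<sub>1/2</sub> values quoted in the manuscript additions (2.14 µM for wild type, 6.35 µM for E102R, 5.4 µM for Y161R). The short sketch below only computes their ratios; the dictionary and function names are hypothetical, and no fitting model is assumed.

```python
# K1/2 values in micromolar, as quoted in the manuscript additions.
K_HALF_UM = {"WT": 2.14, "E102R": 6.35, "Y161R": 5.4}


def fold_change(mutant, reference="WT"):
    """Ratio of mutant to reference K1/2; above 1 means weaker binding."""
    return K_HALF_UM[mutant] / K_HALF_UM[reference]


for name in ("E102R", "Y161R"):
    print(f"{name}: {fold_change(name):.2f}-fold higher K1/2 than wild type")
```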

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Thus, we investigated how the point mutations of two key residues, E102R and Y161R, would affect peptide binding by simulating the cLD mutant in complex with MPZ1N-2X (Fig. 4C-E). We initialized the systems in the pose described for the other peptide-cLD systems described earlier (Fig. 3B, t = 0 µs). In simulations of the wild-type (WT) cLD dimer, the peptide generally remained near the center (Fig. 4C,F). By contrast, MPZ1N-2X displayed reduced binding to E102R, fully dissociating in one TIP4P-D replica (Fig. 4E,F). A similar trend was observed for Y161R, where one partial dissociation event occurred (Fig. 4D,F). Comparative analysis of MPZ1N-2X contact sites on the WT and mutant cLD dimers (Supplementary Fig. 17B-D) revealed that, in the presence of mutations, the peptide engages a broader surface region rather than remaining centrally localized, while forming fewer contacts with the specific residues (Supplementary Fig. 18A-B).”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub> = 6.35 µM and Y161R K<sub>1/2</sub> = 5.4 µM, Supplementary Table 1) compared to the wild type (K<sub>1/2</sub> = 2.14 µM, Supplementary Table 1), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Figure 4 legend) “(E) Side view snapshot after 1 µs of simulation of E102R hIRE1α cLD dimer (gray) in complex with MPZ1N-2X (orange). The amino acid R102 on both monomers is represented in magenta sticks. (F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Methods section: Binding free energy calculations (MM/PBSA)) “The binding free energy of noncovalently bound complexes of human IRE1 cLD and peptides was calculated with the MM/PBSA (Molecular Mechanics/Poisson-Boltzmann Surface Area) method via gmx_MMPBSA (version 1.6.4) [1, 2]. The Poisson-Boltzmann method was used to estimate the electrostatic contribution to solvation free energy, as recommended for data obtained with the CHARMM force field. The contribution of the entropic term was omitted, obtaining effective binding free energy values, or enthalpy of binding (∆H). We used the Single Trajectory Protocol (STP), using the cLD-peptide simulations as input. The calculations were performed on the last 250 ns of each replica. Single-term total non-polar solvation free energy (inp = 1) was used. The charmm_radii (PBRadii = 7) was used to build Amber topology files [3]. The default parameters were applied for other terms.”

      Addition to the text. (Methods section: Protein purification) “To express hIRE1α LD (24-443), human cDNA sequences were cloned into pET47b(+) to create a coding sequence with an N-terminal His6-tag. Mutations of hIRE1α LD were introduced by overlap extension PCR and restriction cloning into pET47b(+). For expression of the proteins, the plasmid of interest was transformed into Escherichia coli strain BL21DE3* RIPL (Agilent Technologies). Cells were grown in Luria Broth until OD600 = 0.6-0.8. Protein expression was induced with 0.6 mM IPTG, and cells were grown at 20°C overnight. For purification, harvested cells were resuspended in Lysis Buffer (50 mM HEPES pH 7.2, 400 mM NaCl, 20 mM imidazole, 5% glycerol, 5 mM β-mercaptoethanol) and lysed in a Constant Systems cell disruptor at 25 000 psi. The supernatant was collected after centrifugation for 45 minutes at 48000×g at 4°C. The supernatant was loaded onto a Ni-NTA column (Cytiva) and the protein was eluted with a linear gradient of imidazole from 20 to 500 mM. Fractions containing the protein were diluted 1:8 with anion exchange wash buffer (50 mM HEPES pH 7.2, 5 mM β-mercaptoethanol), loaded onto a HiTRAP-Q ion exchange column (Cytiva) and eluted with a linear gradient from 50 mM to 1 M NaCl. Afterwards, the His6-tag was removed by cleavage with PreScission protease (GE Healthcare, 1 µg of enzyme per 100 µg of protein). The cleavage was performed overnight at 4°C. The protein sample after cleavage was loaded onto a Ni-NTA column, and the flow-through containing protein without the tag was collected. The protein was further purified on a Superdex 200 10/300 gel filtration column equilibrated with Buffer A (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT). Protein concentrations were determined using the extinction coefficient at 280 nm predicted by the Expasy ProtParam tool (http://web.expasy.org/protparam/).”

      Addition to the text. (Methods section: Fluorescence anisotropy) “For fluorescence anisotropy measurements, the MPZ1-N-2X peptide attached to 5-carboxyfluorescein (5-FAM) at its N-terminus was obtained from GenScript at >95% purity. Binding affinities of hIRE1α LD mutants to FAM-labeled peptides were determined by measuring the change in fluorescence anisotropy on a Tecan CM Spark Micro Plate Reader with excitation at 485 nm and emission at 525 nm with increasing concentrations of hIRE1α LD variants. Measurements were performed in Buffer A supplemented with Tween 20 (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT, 0.025% Tween 20). Fluorescently labeled peptides were used at a concentration of 90 nM. The reaction volume of each data point was 25 µL and the measurements were performed in 384-well, black flat-bottomed plates (Corning) after incubation of peptide with hIRE1α LD variants for 30 min at 25 °C. Binding curves were fitted using Prism Software (GraphPad) using the following equation: $F_{\mathrm{bound}} = r_{\mathrm{free}} + \dfrac{r_{\max} - r_{\mathrm{free}}}{1 + 10^{(\log K_{1/2} - x)\, n_H}}$, where F<sub>bound</sub> is the fraction of peptide bound, and r<sub>max</sub> and r<sub>free</sub> are the anisotropy values at the maximum and minimum plateaus, respectively. n<sub>H</sub> is the Hill coefficient and x is the concentration of the protein in log scale. Curve-fitting was performed with minimal constraints to obtain K<sub>1/2</sub> values with high R<sup>2</sup> values. However, as this equation does not consider the equilibria between hIRE1α LD dimers/oligomers, these apparent K<sub>1/2</sub> values do not reflect the dissociation constant.”
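
      As an illustration only (this is not the authors' Prism workflow), the four-parameter Hill model used for the anisotropy fits can be written as a short function. The plateau values and Hill coefficient below are hypothetical placeholders; only the wild-type K<sub>1/2</sub> of 2.14 µM is taken from the response. The sketch assumes the standard form of the model, in which the response reaches the midpoint between the two plateaus when the protein concentration equals K<sub>1/2</sub>.

```python
import math


def hill_anisotropy(log_conc, r_free, r_max, log_k_half, n_h):
    """Anisotropy predicted by the four-parameter Hill model.

    log_conc and log_k_half are log10 values of the protein concentration
    and of K1/2, expressed in the same concentration unit.
    """
    return r_free + (r_max - r_free) / (1.0 + 10.0 ** ((log_k_half - log_conc) * n_h))


# Hypothetical plateau values and Hill coefficient (illustration only).
r_free, r_max, n_h = 0.05, 0.20, 1.0
# Wild-type apparent K1/2 quoted in the response, in micromolar.
log_k_half = math.log10(2.14)

for conc_um in (0.1, 1.0, 2.14, 10.0, 100.0):
    r = hill_anisotropy(math.log10(conc_um), r_free, r_max, log_k_half, n_h)
    print(f"{conc_um:7.2f} uM -> anisotropy {r:.4f}")
```

      In an actual fit, r<sub>free</sub>, r<sub>max</sub>, K<sub>1/2</sub>, and n<sub>H</sub> would be free parameters optimized against the titration data.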

      (2) Wherever possible, conclusions related to binding affinity should not be drawn from single unbinding events. For example, the title of Figure 4, "Single point mutation of cLD alters the binding affinity of unfolded peptide," should be softened. Similar changes should be made throughout the manuscript where such claims have been presented.

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have adjusted the text to remove or soften conclusions related to binding affinity that were based on single unbinding events in the MD simulations.

      Addition to the text. (Figure 4 title) “Single point mutations of cLD alter the binding of unfolded peptide MPZ1N-2X.”

      Addition to the text. (Results section title: Unfolded polypeptides can stably bind to hIRE1α cLD dimer) “Unfolded polypeptides bind to hIRE1α cLD dimer surface.”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Our goal was to elucidate a potential binding pose and identify the relevant features of unfolded proteins and the cLD that affect the binding.”

      Reviewer #2 (Recommendations for the authors):

      (1) A table of all simulated trajectories, including simulation conditions, number of replicas, box size, number of atoms, equilibration length, recording time step, number of frames for further analysis.

      We thank the Reviewer for this helpful suggestion. We have added a summary table of all simulations, including the requested details, to the Supplementary Information (Table 1).

      Addition to the text. (Supplementary figures and tables: Table 2)

      (2) The current NVT equilibration time was 0.125ns, and then no productive NPT simulations were mentioned as equilibration. Even though this is a simulation of mostly folded structures, it still takes some time for these amino acids to relax within the force field.

      We thank the Reviewer for this constructive comment and acknowledge the validity of the concern. However, our simulations were extensively sampled, and equilibration was achieved within the first 50 ns of the production runs. Therefore, the segments of the trajectories from which we draw conclusions correspond to equilibrated states (see RMSD analysis, Figure 1). Additionally, binding free energy calculations (MM/PBSA) were carried out on the last 250 ns of the simulation replicas.

      (3) At least three histograms were presented in Figure 2C, which I guess is from multiple simulations, and does not seem to be discussed.

      We thank the Reviewer for pointing out the lack of reference to Figure 2C. We added the correct reference to the text where the groove width of luminal domains of human and yeast is discussed.

      Author response image 1.

      RMSD analysis of human IRE1α cLD dimer simulated in complex with unfolded peptides.

      Addition to the text. (Results section: The putative groove of human IRE1α cLD is dynamic but unable to contain peptides) In simulations of the dimeric structures, the average groove width was 7.3 ± 0.1 Å for the human cLD and 8.9 ± 0.1 Å for the yeast cLD, averaged over three TIP3P and three TIP4P-D replicas per system (Fig. 2C).

      (4) The comment regarding the CHARMM force field on Page 6 is not justified. Actually the force field the authors used (CHARMM36m, Jing et al Nat Methods 2016) did include scaling of TIP3P LJ parameters to correctly capture the dimensions of the intrinsically disordered proteins (IDPs). However, the authors cited a couple of examples of literature of previous versions of CHARMM force fields and commented that it cannot capture IDP dimensions with TIP3P.

      We thank the Reviewer for pointing out this source of confusion. We had cited the main CHARMM papers [4, 5], which was misleading; following the Reviewer’s advice, we removed these citations.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “Current all-atom force fields used in MD simulations are mainly designed to reproduce the dynamics of folded and globular proteins [6].”

      (5) I am fine that the authors used TIP4PD with CHARMM36m, but caution should be taken for such a combination of protein and water force fields. Note that when optimizing force fields for IDPs, one often has to balance protein-water interactions by either enhancing protein-water interactions, enhancing water dispersions, or reducing protein-protein interactions. So, all such optimization is dependent on both protein and water force fields. TIP4PD was designed to pair with Amber99sb-ildn or, most recently, Amber99sb-disp instead of CHARMM36m. This could result in rescaling of LJ parameters.

      We thank the Reviewer for raising this issue. We argue that the TIP4P-D water model has been used in combination with the CHARMM36m force field [7] and has been shown to yield satisfactory results for disordered regions.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “The TIP4P-D water model was developed to address limitations of existing force fields in reproducing the structural ensembles of intrinsically disordered proteins and regions. It incorporates enhanced dispersion and moderately stronger electrostatic interactions to improve the balance between water dispersion and electrostatics [8]. Zapletal et al. [7] showed that for proteins containing both folded and disordered regions, the CHARMM36m force field [9] in combination with the TIP4P-D water model provides a robust framework, preventing collapse of disordered regions while preserving folded regions. Acknowledging that the behavior of disordered regions can be case-specific, we conducted molecular dynamics simulations of the two cLD dimer models using the CHARMM36m force field with both TIP3P and TIP4P-D water models.”

      (6) I suggest referring to the methodology part for simulation details as much as possible when presenting the story.

      We thank the Reviewer for this suggestion. In the revised manuscript, we now refer the reader to the Methodology section for detailed descriptions of the HDX-MS data analysis and the MM/PBSA free energy calculations.

      Addition to the text. (Results section: Hydrogen-deuterium exchange experimental data validate the cLD dimer structure) “From our simulations, we calculated the theoretical deuterated fraction using the method by Bradshaw et al.[10] and compared it to the experimental data (Fig. 1C-D and Supplementary Fig. 10) (see Methods).”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      (7) Error bars and methodology of error analysis should be provided for all cases of all-atom simulations if possible, since convergence is always an issue when considering these conformational changes within microseconds of all-atom simulations.

      We thank the Reviewer for the important observation. We agree and added error methodology for the estimation of theoretical deuterated fractions (Fig. 1C).

      Addition to the text. (Figure 1C legend) “Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping.”

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To reproduce the time points after incubation in deuterium (D<sub>2</sub>O), we computed deuterated fractions separately for each of the two monomers constituting a dimer for the time points 0.5 min (30 s) and 5 min (300 s). Then, we computed the mean and standard deviation over the data coming from replicas of the same cLD dimer model (AF or PDB model) and the same water model (TIP3P or TIP4P-D). To estimate the uncertainty of the mean values obtained from our datasets and the dataset from Amin-Wetzel et al. ([11] Figure 3—source data 1), we applied a non-parametric bootstrap resampling procedure. For each sequence range from HDX-MS analysis, we treated the measurements from the N=6 independent datasets as independent samples, accounting for 3 replicas each with two monomers (6 monomers total). We then generated 10,000 bootstrap replicates by sampling the datasets with replacement, maintaining the same number of samples N in each resample. For each replicate, we calculated the mean at each sequence position. The resulting distribution of bootstrap means was used to compute the standard deviation as an estimate of the standard error. We computed the difference between simulation and experimental data (deuterated fraction discrepancy), and for each residue, we selected as the ‘best structure’ the model with the discrepancy closest to zero among PDB-TIP3P, PDB-TIP4P-D, AF-TIP3P, and AF-TIP4P-D systems.”
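
      The resampling procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' script: the six input values are hypothetical deuterated fractions for one peptide segment (three replicas, two monomers each), and the function name `bootstrap_sem` is introduced here for the example.

```python
import random
import statistics


def bootstrap_sem(samples, n_boot=10_000, seed=0):
    """Return (mean, bootstrap standard error) for a small set of samples.

    Each bootstrap replicate draws len(samples) values with replacement;
    the standard deviation of the replicate means estimates the standard
    error of the mean.
    """
    rng = random.Random(seed)
    n = len(samples)
    replicate_means = []
    for _ in range(n_boot):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        replicate_means.append(statistics.fmean(resample))
    return statistics.fmean(samples), statistics.stdev(replicate_means)


# Hypothetical deuterated fractions: 3 replicas x 2 monomers (N = 6).
fractions = [0.42, 0.47, 0.39, 0.45, 0.44, 0.41]
mean, sem = bootstrap_sem(fractions)
print(f"mean = {mean:.3f}, bootstrap SE = {sem:.4f}")
```

      In the procedure described in the Methods, this calculation would be repeated independently for every HDX-MS sequence range.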

      (8) Technically I would call DR1 and DR2 linker regions within a folded structure. Their motions are quite restrained by the folded part. I am therefore not sure how much TIP4PD really helps in contrast to a scaled TIP3P. A plot of structures colored with pLDDT score or b-factor within the PDB should be provided. Quantitative metrics of these regions (e.g. chi-squared) might help justify the choice of the AF model against the PDB model. Currently, the two models look very similar in Figures 1c and 1d. Similarly, quantitative metrics as a function of different simulation time windows will help justify the convergence of the simulation and indicate the flexibility of these regions.

      We thank the Reviewer for this thoughtful comment. In response, we analyzed the AlphaFold2 and AlphaFold3 predictions, which consistently assign very low pLDDT values (<50) to the DR2 region, while DR1 is predicted with higher but still low confidence (50 < pLDDT < 70). These scores indicate intrinsic uncertainty in the structural definition of both regions, supporting their flexibility despite being located within a folded context.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “All five AlphaFold 2 predictions closely resembled the top-ranked model used for our simulations (Supplementary Fig. 7C). In contrast, the five AlphaFold 3 predictions yielded greater variability in DR2 organization and longer helices in DR2, but still consistently maintain low pLDDT scores in this region, indicating disorder (Supplementary Fig. 7D).”

      Addition to the text. (Figure 7 C-D legend) “(C) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT). (D) Superposition of the 5 structures predicted by AlphaFold 3 for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (9) Fluorescence anisotropy seems to be an important set of experimental data to justify the binding of multiple unfolded peptides to IRE. I suggest the authors include a bar plot of binding affinity of different variants in Figure 3. The raw titration curves should also be included in SI.

      We thank the Reviewer for this valuable suggestion. The binding affinities reported in previous studies are summarized in Table 2; the reader is referred to those works for the corresponding raw titration curves. The binding affinities for the cLD mutants analyzed in the present study are provided in Table 3, and the associated titration curves are shown in Figure 4G.

      Addition to the text. (Figure 4G legend) “Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Supplementary figures and tables: Table 3) See Tab. 1

      (10) The authors should discuss the dependence of initial orientations of unfolded peptides on the final results. The authors claimed that after 1 microsecond simulations, the orientation of these peptides to IRE changed. Quantitative metrics showing both the binding (e.g., number of contacts) and binding orientation (contact region or angles) should be provided to tell whether the simulation is converged. The comparison to the experimental data lacks quantitative metrics. The authors mentioned the dissociation of MPZ1N-2X-RD in half of the simulations; they might want to provide such a metric for all peptides. Technically, 1 microsecond brute-force simulation is quite short for observing such a binding event, and enhanced sampling methods (e.g. metadynamics) might be necessary for investigating binding. However, at least the presentation and interpretation of the current results should be improved for comparing simulations and experiments.

      We thank the Reviewer for the insight. We expanded the discussion of the peptide orientation and added an analysis of the peptide angle with respect to the cLD central groove and contacts. Additionally, we inserted AlphaFold 3 predictions of all the simulated complexes.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Addition to the text. (Supplementary Figures and Tables Fig. 14) “(A) Peptide orientation with respect to the central groove principal axis. The angle was computed as the dihedral angle described by the Cα atoms of Y161 residues (groove principal axis) and the Cα atoms of residues L1 and A12 of the MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 10 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between hIRE1α cLD dimer and MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 50 frames, while the shaded lines indicate the value per frame. The analyses were performed on three sets of simulations: "90 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis; "270 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis but flipped 180 degrees with respect to the "0 degrees" orientation; "0 degrees" orientation, the peptide is placed parallel to the groove principal axis.”
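
      A four-point dihedral of the kind described in this legend can be computed as sketched below. This is an assumption-laden illustration rather than the authors' analysis code: the ordering of the four points (the two Y161 Cα atoms first, then the peptide L1 and A12 Cα atoms), the sign convention, and the example coordinates are all hypothetical. The result is mapped to the range [0°, 360°) so that it can be compared with the 0°, 90°, and 270° orientation labels.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees, in [0, 360)) defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    norm_b1 = math.sqrt(_dot(b1, b1))
    if norm_b1 == 0.0:
        raise ValueError("the two central points must not coincide")
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    m1 = _cross(n1, tuple(c / norm_b1 for c in b1))
    angle = math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))
    return angle % 360.0


# Hypothetical coordinates (nm) standing in for the four C-alpha atoms.
y161_a, y161_b = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
pep_l1, pep_a12 = (0.0, 0.0, 1.0), (0.5, 0.8, 1.0)
print(f"peptide orientation: {dihedral_deg(y161_a, y161_b, pep_l1, pep_a12):.1f} degrees")
```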

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. ??A).”

      Addition to the text. (Supplementary Figures and Tables Fig. 16A) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (pLDDT).”

      (11) I also have a couple of questions regarding the point mutant Y161R. a) The motivation of mutating Y161 to R is more speculative (Figures 4a,b) than quantitative. The authors might want to show an intermolecular contact map between IRE and unfolded peptides or IRE contact probability along residue indexes to show the interaction hotspots. Figure S11 only showed the structure instead of any metrics for such a purpose. b) It might be better to also show a histogram of the distances of Figure 4e and 4f. Figure 4f actually suggested 1 microsecond simulation is quite short to observe the dissociation event. c) Testing the mutation within the experiment, if possible, would clearly strengthen this part of the manuscript.

      We thank the Reviewer for these constructive suggestions. We have added an analysis of intermolecular contacts for the Y161R and E102R mutants (Fig. 18A–B), which highlights the interaction hotspots between IRE1 residues and the unfolded peptides. To further characterize peptide–groove interactions, we now provide minimum peptide–groove distance time series for all peptides (Fig. 15B). Moreover, to experimentally support our simulations, we performed fluorescence anisotropy measurements on the MPZ1N-2X peptide with cLD WT and mutant constructs. These experiments confirm our computational observations (Fig. 4F–G and Fig. 18C).

      Addition to the text. (Figure 18 legend) “(A) Number of contacts between residues 102 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between residues 161 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (C) Protein purification of WT hIRE1α LD and mutants E102R and Y161R.”

      Addition to the text. (Figure 4F-G legend) “(F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”
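
      The two quantities plotted in these panels, a minimum inter-group distance per frame and its rolling average, can be illustrated with plain Python. This is a sketch under stated assumptions: the coordinates are hypothetical, the choice of which atoms define the groove and the peptide is not specified here, and the function names are introduced only for this example.

```python
import math


def min_distance(group_a, group_b):
    """Smallest pairwise distance between two sets of 3D coordinates."""
    return min(math.dist(a, b) for a in group_a for b in group_b)


def rolling_mean(values, window=25):
    """Rolling average over full windows only (no padding at the edges)."""
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            averaged.append(running / window)
    return averaged


# Hypothetical trajectory: a fixed "groove" atom and a "peptide" atom
# that drifts away by 0.01 nm per frame.
groove = [(0.0, 0.0, 0.0)]
series = [min_distance(groove, [(0.3 + 0.01 * frame, 0.0, 0.0)]) for frame in range(100)]
smoothed = rolling_mean(series, window=25)
print(f"{len(series)} raw frames -> {len(smoothed)} smoothed points")
```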

      (12) Similar comments of quantitative analysis (e.g. contact map as a function of simulation time) apply to the last part of results when discussing the intermolecular interactions. Observations such as "the interface predicted by AlphaFold showed stability across MD simulation replicas lasting 200 ns" were provided, but there is no quantitative analysis. How consistent was this observation across multiple replicas of simulations, and how many replicas were used?

      We thank the Reviewer for this valuable suggestion. To provide a quantitative assessment, we performed new triplicate simulations of the BiP–cLD monomer complex and plotted the fraction of native contacts over time. These results, which demonstrate the consistency of the interface across replicas, are now included in the Supplementary Material.

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Figure 20 legend) “Fraction of native contacts between BiP and cLD monomer in simulations of the structures predicted by AlphaFold 3 without ligands or in complex with ADP or ATP. The dark lines indicate the rolling average of the fraction of native contacts over 100 frames, while the shaded lines indicate the value per frame. The fraction of native contacts (Q) was calculated according to the definition of Best et al. [12]: $Q(X) = \frac{1}{N} \sum_{(i,j)} \frac{1}{1 + \exp\left[\beta \left(r_{(i,j)}(X) - \lambda\, r^{0}_{(i,j)}\right)\right]}$, for N pairs of native contacts (i, j), where r<sup>0</sup><sub>(i,j)</sub> is the distance of the pair in the initial configuration (here the AlphaFold 3 prediction), r<sub>(i,j)</sub>(X) is the distance at frame X, β is a smoothing parameter (β = 50 nm<sup>−1</sup>), λ is the tolerance of the reference distance (λ = 1.8) and the cutoff used to define a contact between heavy atoms was 0.45 nm.”
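
      For readers unfamiliar with this metric, the sketch below evaluates a soft-cutoff fraction of native contacts of the form Q = (1/N) Σ 1/(1 + exp[β(r − λ r0)]), which is our reading of the cited definition; it uses the β and λ values quoted in the legend. The distances are hypothetical, and selecting the native pairs (heavy atoms within 0.45 nm in the reference structure) is assumed to have been done beforehand.

```python
import math


def _soft_switch(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x > 0.0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def fraction_native_contacts(r, r0, beta=50.0, lam=1.8):
    """Soft-cutoff fraction of native contacts for one frame.

    r  : current distances (nm) of the native contact pairs
    r0 : reference distances (nm) of the same pairs
    beta is in 1/nm; lam scales the reference distance.
    """
    if len(r) != len(r0) or not r0:
        raise ValueError("r and r0 must be non-empty and of equal length")
    return sum(_soft_switch(beta * (rij - lam * r0ij)) for rij, r0ij in zip(r, r0)) / len(r0)


# Hypothetical reference distances for four native pairs (all below 0.45 nm).
reference = [0.35, 0.40, 0.42, 0.38]
intact = [0.36, 0.41, 0.40, 0.39]      # interface essentially unchanged
half_lost = [0.36, 0.41, 2.50, 3.00]   # two of four pairs separated
print(f"Q (intact)    = {fraction_native_contacts(intact, reference):.3f}")
print(f"Q (half lost) = {fraction_native_contacts(half_lost, reference):.3f}")
```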

      (13) The figure legends are noted using lowercase letters but are described using uppercase.

      We thank the Reviewer for pointing that out, and we changed everything to capital letters.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1: I am confused about the HDX-MS results shown in Figure 1. Here, I must also mention that I am not familiar with comparing HDX-MS experiments with MD simulations. The authors mention that they show the deuterated fraction computed from MD simulations for the PDB and AF model at time points 0.5 min and 5 min. However, this time certainly does not correspond to the MD simulation time, thus, it is unclear to me where the difference between the results comes from. Are the two time points some input parameters to the script used to calculate the deuterated fraction? Thus, I would ask the authors to better explain what is the difference in the results between the two time points. Especially, since the general reader might not be familiar with comparing HDX-MS experimental results to MD simulations. Furthermore, I would ask the authors to clarify in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      We thank the Reviewer for pointing us to this possible source of confusion. The time points are effectively input parameters to the calculation of theoretical deuterated fractions from MD simulations. We expanded the explanation of the method in the Methods section and clarified in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To determine the deuterated fraction of a peptide segment from simulations, the protection factor for each residue i, P<sub>i</sub>, must be computed from the simulation snapshots, following the approach of Best and Vendruscolo [13]: $\ln P_i = \langle \beta_C N_{C,i} + \beta_H N_{H,i} \rangle$. Here, N<sub>C,i</sub> and N<sub>H,i</sub> are the numbers of heavy-atom contacts and H-bonds of the backbone amide of residue i, and the scaling factors β<sub>C</sub> and β<sub>H</sub> are set to 0.35 and 2.0, respectively. The simulated deuterated fraction of a peptide segment j, $D_j(t)$, defined by residues m<sub>j</sub> + 1 to n<sub>j</sub>, was then calculated at any exchange time point t as:

      $D_j(t) = \frac{1}{n_j - m_j} \sum_{i = m_j + 1}^{n_j} \left[ 1 - \exp\left( -\frac{k_{\mathrm{int},i}}{P_i}\, t \right) \right]$

      where m<sub>j</sub> and n<sub>j</sub> are the first and last residue numbers of the j-th protein fragment, respectively. The intrinsic exchange rate constants for each residue type ($k_{\mathrm{int},i}$) were obtained from Bai et al. with updated acidic residues and glycine [14, 15].”
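
      To make explicit how the experimental incubation time enters the calculation, the sketch below evaluates a segment-level deuterated fraction at 30 s and 300 s. It is a simplified illustration, not the authors' pipeline: it assumes the commonly used forms ln P = β<sub>C</sub>·N<sub>C</sub> + β<sub>H</sub>·N<sub>H</sub> and D(t) = mean over residues of 1 − exp(−k<sub>int</sub>·t/P), takes trajectory-averaged contact and H-bond counts as given, and uses hypothetical per-residue counts and intrinsic rates.

```python
import math

BETA_C = 0.35  # weight per heavy-atom contact (value quoted in the Methods text)
BETA_H = 2.0   # weight per backbone hydrogen bond (value quoted in the Methods text)


def protection_factor(n_contacts, n_hbonds):
    """Protection factor from averaged contact and H-bond counts."""
    return math.exp(BETA_C * n_contacts + BETA_H * n_hbonds)


def deuterated_fraction(k_int, pf, t):
    """Mean deuterated fraction of a segment at exchange time t.

    k_int : intrinsic exchange rates of the exchangeable residues (1/s)
    pf    : protection factors of the same residues
    t     : incubation time in deuterium (s), an input parameter that is
            unrelated to the length of the MD trajectory
    """
    if len(k_int) != len(pf) or not k_int:
        raise ValueError("k_int and pf must be non-empty and of equal length")
    return sum(1.0 - math.exp(-k * t / p) for k, p in zip(k_int, pf)) / len(k_int)


# Hypothetical segment: (average contacts, average H-bonds, k_int in 1/s).
segment = [(8.0, 0.0, 5.0), (15.0, 1.0, 2.0), (22.0, 1.0, 8.0)]
pf = [protection_factor(c, h) for c, h, _ in segment]
k_int = [k for _, _, k in segment]

for t in (30.0, 300.0):  # 0.5 min and 5 min of incubation
    print(f"t = {t:5.0f} s -> deuterated fraction {deuterated_fraction(k_int, pf, t):.3f}")
```

      The same set of protection factors therefore yields a different predicted fraction at each incubation time, which is why the two time points give different curves from the same simulations.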

      Addition to the text. (Figure 1 legend: ) “This time point corresponds to experimental incubation times, not MD simulation time.”

      Addition to the text. (Figure 10 legend: ) “Time points correspond to experimental incubation times, not MD simulation time.”

      (2) For AlphaFold 2 Multimer prediction, the authors only considered the top predicted structure. However, with AF2-M, one generally obtains 5 structures, and it is also possible to obtain more structures by using an additional random seed. Thus, it would be interesting if the authors would consider the difference between the 5 structures they obtained from the AF2-M prediction. Are they all very similar? (Especially considering the DR1 and DR2 segments, that is the main difference between the PDB and AF2 structures). Analyzing the different predicted AF2 structures would give more insight into the accuracy of the AF2-M predicted model.

      We thank the Reviewer for this insightful suggestion. All AF2-M predicted structures were found to be highly similar, and we now include them in Figure 7E for comparison.

      Addition to the text. (Figure 7E legend) “(E) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (3) On Page 6, the authors talk about a "an early PDB model". First, I find the nomenclature "early" confusing here; perhaps it would be better to talk about "an initial PDB model", but I leave it up to the authors to think about if they want to change that. More importantly, reading the Comp. detail on Page 23, it is not so clear what the difference is between the "early" and "final" PDB models, and how the difference in their setups leads to different results. The information is somewhat there on Page 6 and Page 23, but it can be made much clearer. Thus, I would ask the authors to better explain the difference between the early and final PDB models.

      We thank the Reviewer for this helpful comment. In the revised manuscript, we have clarified the terminology and provided a more explicit explanation of the differences between the two IRE1 models, both in the Results section and in the Methods.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “An initial PDB model, with modified side chain orientations in residues L116 and Y166 due to the modelling of the neighbouring missing DR1, caused the dimer to dissociate in one-third of the replicas. [...] The final PDB model, with correctly oriented L116 and Y166 (Supplementary Fig. 9B), was stable in simulations in both TIP3P and TIP4P-D water (Supplementary Fig. 7B).”

      Addition to the text. (Methods section: IRE1α core Luminal Domain (cLD) structural models - Human PDB dimer) “An initial PDB model was briefly equilibrated in NPT, and a conformation with a groove width of approximately 0.6 nm was selected. This snapshot was used as the initial structure for the initial “PDB model” simulations, in which the dimer dissociates.”

      (4) Page 12: "In early simulations", again, I find the nomenclature "early" confusing here. Perhaps it would be better to talk about "In initial simulations" or "In preliminary simulations", but I leave it up to authors to think about this.

      We thank the Reviewer for pointing out this possible source of confusion. We improved the text by referring to these simulations based on the different orientations of the peptide on the cLD dimer in the modeled complex.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1_α_ cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Here, we provide a detailed description of the additional changes made to the manuscript.

      Additional edits to the manuscript

      Following discussions with Prof. Dr. David Ron, we refined our BiP model by removing the signal peptide (residues 1–18). Using AlphaFold 3, we predicted BiP–cLD heterodimeric complexes in the presence of ADP, ATP, or without nucleotide. Each of the three complexes was simulated in TIP3P water, in three independent replicas of 1 µs each.

      Addition to the text. (Results section: hIRE1α cLD intermolecular interactions guide the activation process) “We used AlphaFold 3 to model the interaction between a cLD monomer and BiP (residues E19–L654) in the presence of ATP and ADP (Fig. 5B, Supplementary Fig. 19A). Prediction quality was limited in the apo and ADP-bound states (pTM = 0.48, ipTM = 0.59; pTM = 0.49, ipTM = 0.61, respectively), whereas ATP binding improved accuracy (pTM = 0.66, ipTM = 0.72). The predicted interfaces involved DR2, particularly residues 314PLLEG-318, forming a short parallel β-sheet with the substrate-binding domain (SBD) of BiP through two hydrogen bonds. All AlphaFold 3 models were stable across three 1-µs simulations (Supplementary Fig. 19B), with cLD–BiP interfaces retaining 60–80% of initial contacts (Supplementary Fig. 20). In the apo and ADP-bound states, the nucleotide-binding domain (NBD) showed high Predicted Aligned Error (PAE) relative to the cLD, indicating uncertain positioning of the two domains relative to each other. Notably, in the ADP-bound state, which is thought to interact with hIRE1α cLD, the NBD remained mobile but proximal to the αB-helices, thereby restricting access to this region. Together, the AlphaFold 3 predictions suggest that BiP engages hIRE1α cLD by sterically hindering the oligomerization interface defined by DR2 and the αB-helices [16].”

      Addition to the text. (Figure 5 legend) “(B) BiP-cLD monomer complex as predicted by AlphaFold (BiP in shades of purple, cLD in orange) before the simulation (t = 0 µs) and at the end of the simulation (t = 1 µs). The SBD (residues E19-D408) is colored in light purple, the NBD (residues C420-E650) in dark purple, and the interdomain linker (residues D409-V419) and KDEL motif (residues K651-L654) in light purple.”

      Addition to the text. (Supplementary Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Methods section: cLD monomer in complex with BiP) “The BiP-cLD heterodimer systems were predicted with AlphaFold 3 using the AlphaFold server [17] at https://alphafoldserver.com/. The hIRE1α cLD sequence used is the same used for predicting the dimer: the PDB 2HZ6 sequence, UniProt identifier O75460 with mutations C127S and C311S, and residues P29-P368. The BiP sequence used is taken from UniProt identifier P11021, residues E19–L654. We predicted three complexes: one without any nucleotide, one containing ADP, and another containing ATP. Simulations of the BiP-cLD complex were run in TIP3P water.”

      We have updated the Zenodo repository with additional data and calculations, and the corresponding link is provided in the manuscript.

      References

      (1) Mario S. Valdés-Tresanco, Mario E. Valdés-Tresanco, Pedro A. Valiente, and Ernesto Moreno. gmx_mmpbsa: A New Tool to Perform End-State Free Energy Calculations with GROMACS. Journal of Chemical Theory and Computation, 17(10):6281–6291, October 2021.

      (2) Bill R. Miller III, T. Dwight McGee Jr., Jason M. Swails, Nadine Homeyer, Holger Gohlke, and Adrian E. Roitberg. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. Journal of Chemical Theory and Computation, 8(9):3314–3321, September 2012.

      (3) Fanhao Wang, Yuzhe Wang, Laiyi Feng, Changsheng Zhang, and Luhua Lai. Target-Specific De Novo Peptide Binder Design with DiffPepBuilder. Journal of Chemical Information and Modeling, 64(24):9135–9149, December 2024.

      (4) Alexander D. MacKerell Jr., Bernard Brooks, Charles L. Brooks III, Lennart Nilsson, Benoit Roux, Youngdo Won, and Martin Karplus. CHARMM: The Energy Function and Its Parameterization. In Encyclopedia of Computational Chemistry. 2002.

      (5) Bernard R. Brooks, Robert E. Bruccoleri, Barry D. Olafson, David J. States, S. Swaminathan, and Martin Karplus. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.

      (6) Junxi Mu, Hao Liu, Jian Zhang, Ray Luo, and Hai-Feng Chen. Recent Force Field Strategies for Intrinsically Disordered Proteins. Journal of Chemical Information and Modeling, 61(3):1037–1047, March 2021.

      (7) Vojtech Zapletal, Arnošt Mládek, Kateřina Melková, Petr Louša, Erik Nomilner, Zuzana Jasenáková, Vojtěch Kubáň, Markéta Makovická, Alice Laníková, Lukáš Žídek, and Jozef Hritz. Choice of Force Field for Proteins Containing Structured and Intrinsically Disordered Regions. Biophysical Journal, 118(7):1621–1633, April 2020.

      (8) Stefano Piana, Alexander G. Donchev, Paul Robustelli, and David E. Shaw. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. Journal of Physical Chemistry B, 119(16):5113–5123, April 2015.

      (9) Jing Huang, Sarah Rauscher, Grzegorz Nawrocki, Ting Ran, Michael Feig, Bert L. de Groot, Helmut Grubmüller, and Alexander D. MacKerell. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nature Methods, 14(1):71–73, January 2017.

      (10) Richard T. Bradshaw, Fabrizio Marinelli, José D. Faraldo-Gómez, and Lucy R. Forrest. Interpretation of HDX Data by Maximum-Entropy Reweighting of Simulated Structural Ensembles. Biophysical Journal, 118(7):1649–1664, April 2020.

      (11) Niko Amin-Wetzel, Lisa Neidhardt, Yahui Yan, Matthias P. Mayer, and David Ron. Unstructured regions in IRE1 specify BiP-mediated destabilisation of the luminal domain dimer and repression of the UPR. eLife, 8, December 2019.

      (12) Robert B. Best, Gerhard Hummer, and William A. Eaton. Native contacts determine protein folding mechanisms in atomistic simulations. Proceedings of the National Academy of Sciences, 110(44):17874–17879, October 2013.

      (13) Robert B. Best and Michele Vendruscolo. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins: Characterization of the Native State Fluctuations of CI2. Structure, 14(1):97–106, January 2006.

      (14) Yawen Bai, John S. Milne, Leland Mayne, and S. Walter Englander. Primary structure effects on peptide group hydrogen exchange. Proteins: Structure, Function, and Bioinformatics, 17(1):75–86, 1993.

      (15) David Nguyen, Leland Mayne, Michael C. Phillips, and S. Walter Englander. Reference Parameters for Protein Hydrogen Exchange Rates. Journal of the American Society for Mass Spectrometry, 29(9):1936–1939, September 2018.

      (16) G Elif Karagöz, Diego Acosta-Alvear, Hieu T Nguyen, Crystal P Lee, Feixia Chu, and Peter Walter. An unfolded protein-induced conformational switch activates mammalian IRE1. eLife, 6:e30700, 2017.

      (17) Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, May 2024.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      We agree with the reviewer that HCHO and THF can react spontaneously in a non-enzymatic manner, and our experiments were not intended to demonstrate enzymatic channeling. The linear regression analysis in Figure 1C was designed solely to confirm that HCHO reacts with THF under our assay conditions. Accordingly, THF was titrated over a broad concentration range starting from zero, and the observed THF concentration–dependent decrease in HCHO reflects this chemical reactivity.

      We do not interpret these data as evidence that the enzyme catalyzes or is required for the HCHO–THF coupling reaction. Instead, the structural observation of an enclosed channel is presented as a separate finding. We have clarified this point in the revised text to avoid overinterpretation of the biochemical data (page 2, line 16).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      We agree that, in principle, a decrease in detectable HCHO could also arise from an indirect effect of THF binding on enzyme activity. However, in our study the experiment was not designed to assess catalytic coupling or allosteric regulation. The assay in question monitors HCHO levels under defined conditions and does not distinguish between changes in HCHO production and downstream consumption.

      Additionally, we do not interpret the observed decrease in HCHO as evidence that THF binding enhances catalytic efficiency, or that removal of HCHO shifts the reaction equilibrium. Instead, the data are presented to establish that HCHO can react with THF under the assay conditions. Any potential allosteric effects of THF on the demethylation reaction, or kinetic coupling between HCHO removal and catalysis, are beyond the scope of the current study, and are not claimed.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn<sup>2+</sup>), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      Thank you for the suggestion. We have added reference [24] to the methodological description on page 8. The Methods section has been revised accordingly on page 8 under “TDM Activity Assay,” without altering the Zn<sup>2+</sup> concentration.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      We appreciate the reviewer’s careful analysis of the apparent discrepancy between the biochemical characterization and the cryo-EM structure. This issue is addressed in Figure S1, which may have been overlooked.

      As shown in Figure S1, the stability of TDM is highly dependent on protein and salt conditions. At 150 mM NaCl, SEC reveals a dominant peak eluting between 10.5 and 12 mL, corresponding to an estimated molecular weight of ~170–305 kDa (blue dot, Author response image 1). This fraction was explicitly selected for cryo-EM analysis and yields the larger complex observed in the reconstruction. At lower (50 mM) or higher (>150 mM) NaCl concentrations, the protein either aggregates or elutes near the void volume (~8 mL).

      SDS–PAGE analysis detects full-length TDM together with smaller fragments (~40–50 kDa and ~22–25 kDa). The apparent predominance of full-length protein on SDS–PAGE likely reflects its greater staining intensity per molecule and/or a higher population, rather than the absence of truncated species.

      Author response image 1.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution, for example using interferometric scattering microscopy (iSCAT), could help clarify the oligomeric state of the complex in solution.

      Cryo-EM data were collected exclusively from the size-exclusion chromatography fraction eluting between 10.5 and 12 mL. This fraction was selected to isolate the dominant assembly in solution. Extensive 2D and 3D particle classification did not reveal distinct classes corresponding to smaller species or higher-order oligomeric assemblies. Instead, the vast majority of particles converged to a single, well-defined structure consistent with the 2 full-length + 2 half-domain stoichiometry.

      A minor subpopulation (~2%) exhibited increased flexibility in the N-terminal region of the two full-length subunits, but these particles did not form a separate oligomeric class, indicating conformational heterogeneity rather than alternative assembly states (Author response image 2). Together, these data support the 2+2½ architecture as the predominant and stable complex under the conditions used for cryo-EM. Additional techniques, such as iSCAT, would provide complementary information, but are not required to support the conclusions drawn from the SEC and cryo-EM analyses presented here.

      Author response image 2.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      We thank the reviewer for this thoughtful suggestion. In standard cryo-EM data processing, symmetry is typically not imposed initially to minimize potential model bias; accordingly, we first performed C1 refinement before applying C2 symmetry. The resulting C1 reconstructions revealed no detectable asymmetry or domain-specific differences relative to the C2 map. In addition, relaxing the symmetry consistently reduced overall resolution, indicating lower alignment accuracy and further supporting the presence of a predominantly symmetric assembly.

      In this context, the proposed catalytic role of Zn<sup>2+</sup> raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn<sup>2+</sup>-saturating conditions, as no Zn<sup>2+</sup> appears to be added to the buffers? It would be helpful to clarify whether Zn<sup>2+</sup> occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The observed ~2:1 enzyme-to-Zn<sup>2+</sup> stoichiometry likely reflects the composition of the 2 full-length + 2 half-domain (2+2½) complex. In this assembly, only the core domains that are fully present in the complex contribute to metal binding. The truncated or half-domains lack the Zn<sup>2+</sup> binding domain. As a result, only two metal-binding sites are occupied per assembled complex, consistent with the measured stoichiometry.

      We note that Zn<sup>2+</sup> was not deliberately added to the buffers, so occupancy may not reflect full saturation. Based on our cryo-EM and biochemical data, both metal-binding sites in the full-length subunits appear to be occupied to an equivalent extent, and no clear evidence of asymmetric catalysis is observed under these current experimental conditions. Full Zn<sup>2+</sup> saturation could potentially increase occupancy, but was not explored in these experiments.

      The divalent ion Zn<sup>2+</sup> is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Thank you for your suggestion. We revised our text on page 2 as below.

      Based on our structural and biochemical data, we propose a structurally informed working model for TMAO turnover by TDM (Scheme 1). In this model, Zn<sup>2+</sup> plays a non-redox role by polarizing the O–H bond of the bound hydroxyl, thereby lowering its pK<sub>a</sub>. The D220 carboxylate functions as a general base, abstracting the proton to generate a hydroxide nucleophile. This hydroxide then attacks the electrophilic N-methyl carbon of TMAO, forming a tetrahedral carbinolamine (hemiaminal) intermediate. Subsequent heterolytic cleavage of the C–N bond leads to the release of HCHO. D220 then switches roles to act as a general acid, donating a proton to the departing nitrogen, which facilitates product release and regenerates the active site. This sequence allows a new water molecule to rebind Zn<sup>2+</sup>, enabling subsequent catalytic turnovers. This proposed pathway is consistent with prior mechanistic studies, in which water addition to the azomethine carbon of a cationic Schiff base generates a carbinolamine intermediate, followed by a rate-limiting breakdown to yield an amino alcohol and a carbonyl compound, in the published case, an aldehyde (Pihlaja et al., J. Chem. Soc. Perkin Trans. 2, 1983, 8, 1223–1226).

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn<sup>2+</sup> cofactor, there was a dependence upon non-heme Fe<sup>2+</sup>, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn. Unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn<sup>2+</sup> is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn<sup>2+</sup> cannot fulfil that role. Fe<sup>2+</sup> could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      We thank the reviewer for the detailed and thoughtful mechanistic critique. We fully agree that Zn<sup>2+</sup> is not redox-active, and cannot directly mediate oxidative demethylation or amine oxide deoxygenation. We acknowledge that the oxidative step required for the conversion of TMAO to HCHO is not explicitly resolved in the present study. Accordingly, we have revised the manuscript to remove any implication of Zn<sup>2+</sup>-mediated redox chemistry, and have eliminated the previously imprecise analogy to zinc metalloproteases.

      We recognize and now discuss prior biochemical work on TMAO demethylase from Methylocella silvestris (MsTDM), which proposed an iron-dependent oxidative mechanism (Zhu et al., FEBS 2016, 3979–3993). That study reported approximately one Zn<sup>2+</sup> and one non-heme Fe<sup>2+</sup> per active enzyme, implicated iron in catalysis through homology modeling and mutagenesis, and used crossover experiments suggesting a trimethylamine-like intermediate and oxygen transfer from TMAO, consistent with an Fe-dependent redox process. However, that system lacked experimental structural information, and did not define discrete metal-binding sites.

      In contrast,

      (1) Our high-resolution cryo-EM structures and metal analyses of TDM consistently reveal only a single, well-defined Zn<sup>2+</sup>-binding site, with no structural evidence for an additional iron-binding site as in the previous report (Zhu et al., FEBS 2016, 3979–3993).

      (2) To investigate the potential involvement of iron, we expressed TDM in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub> and determined its cryo-EM structure. This structure is identical to the original one, and no EM density corresponding to a second iron ion was observed. Moreover, the previously proposed Fe<sup>2+</sup>-binding residues are spatially distant (Figure S6).

      (3) ICP-MS analysis detects zinc but no iron (Figure S5).

      (4) The kinetics of our iron-free TDM are comparable to those of MsTDM (Figure 1A). We propose that the differences in K<sub>m</sub> and V<sub>max</sub> are due to differences in the overall sequences of the enzymes. Please also see the comment at the end on a newly published paper on MsTDM.

      While we cannot comment on the MsTDM results, our ‘experimental’ results do not support the presence of an iron-binding site. Our data indicate that this chemistry is unlikely to be mediated by a canonical non-heme iron center as proposed for MsTDM. We therefore revised our model as a structural framework that rationalizes substrate binding, metal coordination, and product stabilization, while clearly delineating the limits of mechanistic inference supported by the current data.

      Scheme 1 and the proposed mechanism section were revised on page 4. Figure S6 was added.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      Thank you for the correction. We converted the V<sub>max</sub> unit to nmol/min/mg and revised the text on page 2. We also revised the text on page 2 to compare this value with that previously reported for the TDM enzyme. See also the note on a newly published manuscript and its comparison.
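      The unit issue raised in point (2) comes down to the difference between a concentration rate (µM/min) and an amount rate (nmol/min). A minimal sketch of the conversion follows, assuming the measured rate refers to a known reaction volume and enzyme mass. The function name and the example numbers are hypothetical illustrations and are not values from the manuscript.

```python
def specific_activity_nmol_per_min_per_mg(rate_uM_per_min, reaction_volume_mL, enzyme_mg):
    """Convert a concentration rate into a specific activity.

    1 uM equals 1 nmol/mL, so (uM/min) * mL gives nmol/min;
    dividing by the enzyme mass in mg gives nmol/min/mg.
    """
    if reaction_volume_mL <= 0 or enzyme_mg <= 0:
        raise ValueError("reaction volume and enzyme mass must be positive")
    return rate_uM_per_min * reaction_volume_mL / enzyme_mg


# Hypothetical example: 10 uM/min measured in 0.2 mL with 0.001 mg (1 ug) enzyme.
activity = specific_activity_nmol_per_min_per_mg(10.0, 0.2, 0.001)
print(f"{activity:.0f} nmol/min/mg = {activity / 1000:.1f} umol/min/mg")
```

      The point of the sketch is that a value quoted in µM/min/mg cannot be compared across preparations unless the reaction volume is also known, which is why an amount-based unit is preferred.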

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      We appreciate this important point. We have revised Figure 1C to present HCHO levels in absolute concentration units. While the current data demonstrate reduced detectable HCHO in the presence of THF, we agree that distinguishing between HCHO consumption and potential THF-mediated enzyme inhibition would require dedicated time-course and protein-dependence experiments. We have therefore revised the description to avoid overinterpretation and limit our conclusions to the observed changes in HCHO concentration (page 2, lines 18-19).

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

      Thank you for this comment. We now indicate that the enzyme is derived from Paracoccus sp. DMF and provide the accession number for the protein sequence (WP_263566861) in the Experimental Section (page 8, line 4).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The ITC experiment requires a ligand-into-buffer titration as an additional control. Also, maybe I misunderstood the molar ratio or the concentrations you used, but if you indeed added a total of 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM, it is not clear to me how this leads to a final molar ratio of 3.

      We thank the reviewer for this suggestion. A ligand-into-buffer control ITC experiment was performed and is now included in Figure S8C, which shows no appreciable signal.

      Regarding the molar ratio, this was our mistake. The experiment used 2.45 μL injections of 80 μM THF into 250 μL of 5 μM TDM. This corresponds to a final ligand concentration of ~12.8 μM, giving a ligand-to-protein molar ratio of ~2.6. We revised the text on page 9 (ITC section).
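The dilution arithmetic behind this exchange can be checked with a short sketch. It assumes simple mixing with no overflow from the cell, and the function name is illustrative rather than taken from the manuscript. Because the response does not state the number of injections, the corrected case only recomputes the reported ratio from the reported final ligand concentration.

```python
def ligand_conc_after_mixing(cell_ul, titrant_um, injected_ul):
    """Ligand concentration (uM) after adding injected_ul of titrant to cell_ul.

    Assumption: simple dilution, i.e. the injected volume adds to the cell
    volume and nothing overflows.
    """
    return titrant_um * injected_ul / (cell_ul + injected_ul)


PROTEIN_UM = 5.0  # nominal TDM concentration in the cell

# Values as originally written: 4.75 uL of 20 uM THF into 250 uL.
ligand_original = ligand_conc_after_mixing(250.0, 20.0, 4.75)
ratio_original = ligand_original / PROTEIN_UM

# Corrected values from the response: final ligand concentration of ~12.8 uM.
ratio_corrected = 12.8 / PROTEIN_UM

print(f"original text:  {ligand_original:.3f} uM ligand, ratio {ratio_original:.3f}")
print(f"corrected text: ratio {ratio_corrected:.2f}")
```

Under these assumptions the originally stated volumes give a ligand-to-protein ratio well below 1, which is the inconsistency the reviewer noticed, whereas 12.8 μM over 5 μM gives the corrected value of roughly 2.6.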

      (2) Characterization/quality check of all mutant enzymes should be performed by NanoDSF, CD spectroscopy or similar techniques to confirm that proteins are properly folded and fit for kinetic testing.

      We appreciate the reviewer’s suggestion. All mutant proteins, including D220A, D367A, and F327A, were purified with yields similar to the wild-type enzyme. Additionally, cryo-EM maps of the mutants show well-defined density and overall structural integrity consistent with the wild-type. These findings indicate that the introduced mutations do not significantly affect protein folding, supporting their use for kinetic analysis. While NanoDSF might reveal differences in thermal stability due to mutations, it does not provide structural information. Our conclusions are not based on minor differences in thermostability. Our cryo-EM structures of the mutants offer much more reliable structural data than CD spectroscopy.

      (3) Best practice would suggest overlapping pH ranges between different buffer systems in the pH-dependence experiments to rule out buffer-specific effects independent of pH.

      We thank the reviewer for this helpful suggestion. We agree that overlapping pH ranges between different buffer systems can be valuable for excluding buffer-specific effects. In this study, the pH-dependence experiments were intended to provide a qualitative assessment of pH sensitivity rather than a detailed analysis of buffer-independent pKa values. While we cannot fully exclude minor buffer-specific contributions, the overall trends observed were reproducible and sufficient to support the conclusions drawn. We have added a clarifying statement to the revised manuscript to reflect this consideration, page 2, line 12.

      (4) "Structural comparison revealed high similarity to a THF-binding protein, with superposition onto a T protein.": It would be nice to show this as an additional figure, as resolution and occupancy for THF are low.

      We thank the reviewer for this suggestion. To address this point, we have revised Figure S6 by adding a panel (C; now Figure S7C) showing the structural superposition of TDM with the THF-binding T protein. This comparison is included to better illustrate the structural similarity, despite the limited resolution and partial occupancy of THF density in our map.

      (5) Editing could have been done more thoroughly. Some spelling mistakes, e.g. "RESEULTS", "redius", "complec"; kinetic rate constants should be written in italic (not uniform between text and figures); Prism version is missing; Vmax of 16.52 µM/min/mg - doublecheck units; Figure S1B: The "arrow on the right" might have gone missing.

      We corrected the spelling on page 2 (~line 10), page 5 (~line 34), and page 6 (~line 40). All corrections are highlighted in blue. The Prism version was added. The arrow was added to Figure S1B. The Vmax unit was corrected to nmol/min/mg.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must re-examine the metal content of their purified enzyme, looking in particular for Fe or another redox-active metal ion, which could be involved in a reasonable catalytic mechanism.

      We thank the reviewer for this suggestion and have carefully re-examined the metal content of TDM. Elemental analyses by EDX and ICP-MS consistently detected Zn<sup>2+</sup> in purified TDM (Zn:protein ≈ 1:2), whereas Fe was below the detection limit across multiple independent preparations (Fig. S5A,B). To assess whether iron could be incorporated or play a functional role, we expressed TDM in E. coli grown in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and performed activity assays in the presence of exogenous Fe<sup>2+</sup>. Neither condition resulted in enhanced enzymatic activity.

      Consistent with these biochemical data, all cryo-EM structures reveal a single, well-defined metal-binding site coordinated by three conserved cysteine residues and occupied by Zn<sup>2+</sup>, with no evidence for an additional iron species or other redox-active metal site.

      (2) The specific activity of the enzyme should be quoted in the same units as other literature papers, so that the enzyme activity can be compared. It could be, for example, that the content of Fe (or other redox-active metal) is low, and that could then give rise to a low specific activity.

      Thank you for the suggestion. We now quote the enzyme activity in the same units as the previous report and have revised the text on page 2.

      Since the submission of our paper, a new report on MsTDM has been published (Cappa et al., Protein Science 33(11), e70364). It further supports our findings. First, the kinetic parameters reported using ITC (Vmax = 0.309 μmol/s, approximately 240 nmol/min/mg; Km = 0.866 mM) are comparable to our observed values (156 nmol/min/mg and 1.33 mM, respectively) in the absence of exogenous iron. Second, the optimal pH for enzymatic activity is similar to that observed for our paraTDM. Third, the reported two-state unfolding behavior is consistent with our cryo-EM structural observations, in which the more dynamic subunits appear to destabilize prior to unfolding of the core domains. Based on these findings, we now propose that Zn<sup>2+</sup> appears to function primarily as an organizational cofactor at the core catalytic domain (revised Scheme 1).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments both from the rodent and the human literature such as splitter cells, lap cells, the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over/under representation of context information.

      My general assessment of the work is unchanged, and I still have some questions requesting methodological clarification.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g. learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be heavily improved. Judgment of generality and plausibility of the results is severely hampered but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is impossible to judge whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work on the field.

      Thank you for pointing this out.

      In the revised text, we clarified the definition of “time step” and how hippocampal neurons behaved in each time step (see individual comments below). Also, we clarified the implementation of disorder conditions in our model by indicating the exact neuron numbers of the stimulus domain in H module as below. (Other parameters were common in all conditions.)

      “𝑋 consists of two domains: stimulus domain 𝑋 and context domain 𝑋. The neuron ratio in the stimulus domain over the whole neurons dim 𝑋/𝑁 is 16.7% (200 neurons) for the control condition, 2.5% (30 neurons) for the SZ condition, and 50% (600 neurons) for the ASD condition.”

      Comments:

      The authors have made strong efforts to improve on their description of the methods, however, it is still very hard to understand. As a result of some of their clarifications, new issues appeared that I was not able to extract in the previous version.

      (1) Particularly I had problems figuring out how the individual dynamical systems are interrelated (sequences, attractor, action, learning). As I understand it now (and I still might be wrong) there is one discrete time dynamics, where in each time step one action takes place as well as the attractor and sequence dynamics are moved one step forward. Also, synaptic updates happen in every one of those time steps. The authors may verify or correct my interpretations and further improve on their description in the manuscript. It is also confusing that time in the figure panels is given in units of trials, where each trial may consist of (maybe different amounts of) multiple time steps. Are the thin horizontal red and blue lines time steps?

      Thank you for raising the confusing point.

      The reviewer’s understanding is correct. In our model, at each time step the agents transition to the next environmental state (which also corresponds to the contextual state). During this step, each processing stage proceeds in order: Context selector performs attractor selection, Sequence composer performs sequence selection, followed by action selection and synaptic updates. As learning progresses, hippocampal sequences begin to predict longer futures, reducing the need for step-by-step planning. However, at least at the beginning of each task, all processes are conducted at each time step (see Fig. 1G).

      In all tasks, trials are reset when the agents visit the reward sites (i.e., S4 or S5). In Fig. 2C, for example, one trial consists of three time steps (i.e., three state transitions), and the red and blue shaded regions indicate individual trials. During each time step, two types of hippocampal neurons are activated: a state-coding neuron and a transition-coding neuron. (In contrast, in X, one contextual state is active during one time step.) Therefore, in Fig. 2E, two neuronal activities correspond to a single time step.
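The per-time-step ordering described above can be sketched as follows. This is only an illustration of the stage order stated in the response: the class, its method names, and the example state sequence are hypothetical placeholders, and none of the model's actual computations are reproduced.

```python
class ToyAgent:
    """Records the order of processing stages; every method is a placeholder."""

    def __init__(self):
        self.log = []

    def select_context(self, stimulus):
        # Stand-in for attractor selection by the Context selector.
        self.log.append("context_selection")
        return ("context", stimulus)

    def select_sequence(self, context):
        # Stand-in for sequence selection by the Sequence composer.
        self.log.append("sequence_selection")
        return [context]

    def select_action(self, sequence):
        self.log.append("action_selection")
        return "placeholder_action"

    def update_synapses(self):
        self.log.append("synaptic_update")


def run_time_step(agent, stimulus):
    """One behavioral time step: all four stages run once, in this order."""
    context = agent.select_context(stimulus)
    sequence = agent.select_sequence(context)
    action = agent.select_action(sequence)
    agent.update_synapses()
    return action


# Hypothetical trial of three time steps that ends at a reward site (S4).
trial_states = ["S1", "S2", "S4"]
agent = ToyAgent()
for state in trial_states:
    run_time_step(agent, state)

print(agent.log[:4])
```

The point of the sketch is only that one trial spans several time steps and that every stage is visited once per step.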

      For clarification, we have revised Fig. 2 and related descriptions in the manuscript as follows.

      “Here, we simplified this task by using an environment with five discrete states (S1-S5), i.e., five discrete external stimuli (Figure 2A), where agents transition to the next state at each time step.”

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent, with each trial resetting upon visiting the reward sites (S4 or S5).”

      “At each time step, one state-coding neuron and one transition-coding neuron are active in this order.”

      “At each time step, the agents transition between environmental states.”

      “The model’s computational dynamics are fundamentally synchronized with the environmental (behavioral) time step, and at each time step, the agents transition to the next environmental state. Upon a state transition, the agents first perform contextual state estimation by Context selector and activate a corresponding hippocampal neuron.”

      (2) As a consequence of my new understanding of the model dynamics, I have developed doubts about the interpretation of the attractor network as context encoding. Since the X population mainly serves to disambiguate sequence continuation, right before the action has to be taken (active for only two time steps in Figure 1C?) they could also be considered to encode task space (El-Gaby et al. 2024; doi: 10.1038/s41586-024-08145-x).

      We thank the reviewer for this insightful comment.

      First of all, we would like to clarify that Figure 1C shows the following process: the activity of H at time step t−1 and the external stimulus at time step t jointly provide input to X module, and the activity of X settles into a contextual state at the time step t. As explained in our response to comment (1), the activity of X remains constant during each time step.

      The primary function of X module in our model is to disambiguate the environmental states defined by the external stimuli based on the history information. It is true that, in practice, whether an ongoing sequence is maintained or remapped depends on whether the observed stimulus is consistent with the predicted stimulus. However, this is a consequence of the predictive sequence obtained from scratch rather than the primary computational role of X module. In contrast, X module becomes particularly important when past experience does not uniquely determine the next state. In this situation, the agent must infer the contextual state by associating the current situation with previously experienced contexts, rather than relying solely on temporal continuity.

      We also add that, in most successful cases, the contextual states learned by the agent often correspond to the hidden states of each task as a result of disambiguation. In this sense, the resulting representation may resemble a “task space” encoding, as suggested by the reviewer. However, an important aspect of our model is that the agent does not assume the existence or number of hidden states a priori. Instead, we considered the situation where the agent initially underestimates the number of contextual states, and through remapping it incrementally increases the number of contextual representations. When the number of contextual states matches the number of hidden task states, the task is typically solved.

      (3) Also technically, I wonder why the authors introduce the criterion of 50(!) time steps to allow the attractor to converge, if the state of the attractor network is only relevant in one time step to choose the appropriate continuation of the sequence of actions. Is attractor dynamics important at all? What would happen if just the input and output weights to the X population are kept and the recurrent weights are set 0?

      We thank the reviewer for raising this confusing point.

      First, we would like to clarify that the “50 iterations” mentioned in the manuscript does not refer to 50 environmental time steps. We implemented multiple iterations of attractor updates (typically until convergence) by Context selector within each behavioral time step.

      We clarify this point in the Method section as below.

      “After history-based or landmark-based initialization, X iteratively updates its contextual state at the beginning of each time step according to the associative memory dynamics:”

      The recurrent connectivity within the X population is essential for attractor updates. If the recurrent weights were removed (i.e., set to zero), the network would lose the ability to retrieve distinct contextual states for the same stimulus. In that case, the model would be unable to solve the context-dependent task as we showed in this manuscript.
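To illustrate what "multiple attractor iterations within one behavioral time step" can look like, here is a minimal sketch using a classic Hopfield-style associative memory as a stand-in. The manuscript's actual update rule is not reproduced in this response, so the Hebbian weights, the asynchronous sign updates, and the 8-unit patterns are all assumptions; only the cap of 50 inner iterations comes from the text.

```python
def hebbian_weights(patterns):
    """Symmetric Hebbian weight matrix with zero diagonal for +/-1 patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def settle(w, state, max_iters=50):
    """Asynchronous updates until a full sweep changes nothing (or max_iters sweeps)."""
    state = list(state)
    for sweep in range(1, max_iters + 1):
        changed = False
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            return state, sweep
    return state, max_iters


# Two orthogonal toy "contextual states".
context_a = [1, 1, 1, 1, -1, -1, -1, -1]
context_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = hebbian_weights([context_a, context_b])

# A noisy cue (first unit flipped) is cleaned up within a single time step.
cue = [-1, 1, 1, 1, -1, -1, -1, -1]
settled, sweeps = settle(weights, cue)
print(settled == context_a, sweeps)
```

In this toy setting the inner loop stops as soon as the state is stable, so the 50-iteration limit acts as an upper bound rather than as a number of environmental steps.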

      (4) Figure 3E: How many time steps are the H cells active (red bars?) Figure 4J: What are the units of the time axis?

      Thank you for pointing this out.

      In Figure 3E, each time step is indicated by the X-axis ticks (i.e., each environmental state). As explained in our response to comment (1), the activity of two hippocampal neurons (red bars) corresponds to each time step.

      Similarly, in Figure 4J, each time step is indicated in the X-axis ticks. To better represent the results, we added descriptions of the environmental states in our model to the X-axis tick labels in Figure 4J.

      We added the following text to the figure captions.

      “The x-axis represents each time step (corresponding to environmental states), and the y-axis shows the sorted activity of H module.”

      “The x-axis represents each time step (corresponding to environmental states), and the y-axis shows the decoding accuracy of each context based on hippocampal activity.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. Particularly, it is not evident how the kinetic rates are related to any biological data, nor if the parameter distributions used in this study have any biological relevance.

      (2) Related to this problem, it is not clear whether the considered 100,000 random parameter samples sufficiently explore parameter space due to the combinatorial explosion that arises from having 94 free parameters, nor 100,000 random initial conditions for a system with 50 species (variables).

      (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility.

      (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.

      (5) On the same line, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results.

      (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.

      We thank the reviewers for their thoughtful and constructive comments. In response to their comments, we have undertaken a substantial revision to address all the comments, improve clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation, implications, and connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is confusing exactly what are the different sets evaluated in each case, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.
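The distinction between re-running one deterministic simulation and comparing independently sampled ensembles can be sketched as below. Everything here is a toy stand-in: the two-parameter "model instance", the classification rule, the seeds, and the ensemble size are hypothetical, and log-uniform sampling over 10^-5 to 10^4 is an assumption (the response gives the range but not the sampling distribution).

```python
import random
import statistics


def sample_log_uniform(rng, low_exp=-5.0, high_exp=4.0):
    """Draw a rate constant log-uniformly between 10**low_exp and 10**high_exp."""
    return 10.0 ** rng.uniform(low_exp, high_exp)


def classify_instance(rng):
    """Toy stand-in for simulating one model instance and labelling its dynamics."""
    k_on = sample_log_uniform(rng)
    k_off = sample_log_uniform(rng)
    return "rebound" if k_on > k_off else "decreasing"


def rebound_fraction(seed, n_instances):
    """Fraction of 'rebound' outcomes in one independently seeded ensemble."""
    rng = random.Random(seed)
    hits = sum(classify_instance(rng) == "rebound" for _ in range(n_instances))
    return hits / n_instances


# Three independent ensembles ("triplicate"), each with its own random draw.
fractions = [rebound_fraction(seed, 20_000) for seed in (1, 2, 3)]
spread = statistics.stdev(fractions)
print(fractions, spread)
```

Each individual instance is deterministic once its parameters are drawn; the replicate-to-replicate spread measures only how much the outcome distribution depends on the particular random sample.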

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.
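The scale of the reviewer's thought experiment is easy to reproduce. The sketch below assumes exactly the simplification used in the comments, namely that each quantity takes only a "high" or a "low" value; the function name is illustrative.

```python
def sampled_fraction(n_samples, n_binary_factors):
    """Fraction of the 2**n_binary_factors high/low combinations covered by n_samples."""
    return n_samples / 2 ** n_binary_factors


# 94 kinetic parameters, 100,000 model instances.
fraction_parameters = sampled_fraction(100_000, 94)

# 50 species, 100,000 sets of initial conditions.
fraction_initial_conditions = sampled_fraction(100_000, 50)

print(f"parameters:         {fraction_parameters:.1e}")
print(f"initial conditions: {fraction_initial_conditions:.1e}")
```

Both fractions are vanishingly small, which is why the argument that follows rests on convergence of the output distribution rather than on coverage of the input space.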

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
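A convergence check of this kind can be sketched in a few lines. The example below does not run the ECC model: it replaces each simulation with a draw from a fixed, hypothetical distribution over qualitative outcomes (the labels and probabilities are made up) and compares the outcome frequencies from N instances with those from 2N instances.

```python
import random
from collections import Counter

LABELS = ["increasing", "decreasing", "rebound", "no_response"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]  # hypothetical outcome probabilities


def frequencies(labels):
    """Relative frequency of each outcome label, in the fixed LABELS order."""
    counts = Counter(labels)
    return [counts[name] / len(labels) for name in LABELS]


def mse_n_vs_2n(n, seed=0):
    """Mean squared error between outcome frequencies from N and from 2N instances."""
    rng = random.Random(seed)
    draws = rng.choices(LABELS, weights=WEIGHTS, k=2 * n)
    freq_n = frequencies(draws[:n])
    freq_2n = frequencies(draws)
    return sum((a - b) ** 2 for a, b in zip(freq_n, freq_2n)) / len(LABELS)


for n in (100, 1_000, 10_000, 100_000):
    print(n, mse_n_vs_2n(n))
```

If doubling the ensemble barely changes the outcome frequencies, further sampling is unlikely to reveal a materially different distribution, even though most of the parameter space remains unvisited.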

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10<sup>-5</sup> to 10<sup>4</sup>) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
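
      The distinction between the initial split and the conserved total can be illustrated with a minimal sketch. It assumes a hypothetical two-state cycle X ⇌ pX with mass-action rate constants `k_phos` and `k_dephos`; these names and values are illustrative and are not taken from the model in the manuscript. Under that assumption, redistributing a fixed total leaves the steady state unchanged, while changing the total shifts it.

```python
def steady_state_pX(x_tot, k_phos, k_dephos):
    """Analytical steady state of X <-> pX when X + pX = x_tot is conserved."""
    return x_tot * k_phos / (k_phos + k_dephos)


def simulate_pX(x0, px0, k_phos, k_dephos, dt=0.01, steps=20000):
    """Integrate the toy cycle with explicit Euler and return the final pX."""
    x, px = x0, px0
    for _ in range(steps):
        flux = k_phos * x - k_dephos * px  # net phosphorylation flux
        x -= dt * flux
        px += dt * flux
    return px


# Same total (10), different initial split: identical steady state.
split_a = simulate_pX(10.0, 0.0, k_phos=1.0, k_dephos=3.0)
split_b = simulate_pX(2.0, 8.0, k_phos=1.0, k_dephos=3.0)

# Different total (20): the steady state itself moves.
doubled = simulate_pX(20.0, 0.0, k_phos=1.0, k_dephos=3.0)
```

      In this toy case the steady-state pX scales linearly with the total; in a larger network with feedback the dependence would generally be nonlinear, but the qualitative point is the same.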

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artifact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This large discrepancy created the conditions for excessively stiff solutions, i.e. ones that were unsolvable with the implemented ODE solver settings. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to overcome the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving ODEs is reliant on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
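
      To illustrate why an implicit method matters for stiff systems, the sketch below compares explicit Euler with backward Euler (the first-order member of the BDF family) on a single fast decay reaction. This is a toy stand-in written in plain Python, not the CVODE configuration used in the study; the rate constant of 10<sup>4</sup> is chosen only because it sits at the top of the sampled range, and the step size is an arbitrary assumption.

```python
def explicit_euler(y0, k, dt, steps):
    """Explicit Euler for dy/dt = -k * y; unstable when k * dt > 2."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def backward_euler(y0, k, dt, steps):
    """Backward Euler (BDF1) for dy/dt = -k * y.

    Each step solves y_new = y + dt * (-k * y_new) for y_new,
    which is available in closed form for this linear problem.
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y


# Fast reaction (k = 1e4) with a step far larger than 1/k.
y_explicit = explicit_euler(1.0, k=1e4, dt=0.01, steps=50)
y_implicit = backward_euler(1.0, k=1e4, dt=0.01, steps=50)
```

      With these settings the explicit update oscillates and grows without bound, whereas the implicit update decays towards zero as the true solution does. Production solvers add variable order, adaptive step sizes, and Newton iterations on top of this basic idea.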

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
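
      The rejection-sampling loop can be sketched as follows. This is a minimal illustration under stated assumptions: parameters are drawn log-uniformly between 10<sup>-5</sup> and 10<sup>4</sup> (the distribution is an assumption made here for the example), and `accept` is a hypothetical placeholder for the steady-state and drug-response filters, which in practice require a full simulation of each instance.

```python
import random


def sample_log_uniform(rng, n_params, low_exp=-5.0, high_exp=4.0):
    """Draw n_params values log-uniformly from 10**low_exp to 10**high_exp."""
    return [10.0 ** rng.uniform(low_exp, high_exp) for _ in range(n_params)]


def rejection_sample(accept, n_accepted, n_params, seed=0):
    """Keep drawing parameter sets until n_accepted of them pass `accept`.

    Returns the accepted sets and the total number of draws, so the
    acceptance rate can be reported alongside the final ensemble.
    """
    rng = random.Random(seed)
    accepted = []
    n_drawn = 0
    while len(accepted) < n_accepted:
        params = sample_log_uniform(rng, n_params)
        n_drawn += 1
        if accept(params):
            accepted.append(params)
    return accepted, n_drawn


# Hypothetical filter standing in for the real acceptance criteria.
def toy_filter(params):
    return params[0] > 1.0


ensemble, total_draws = rejection_sample(toy_filter, n_accepted=50, n_params=94)
acceptance_rate = len(ensemble) / total_draws
```

      Returning the draw count alongside the ensemble makes it straightforward to report what fraction of sampled instances was excluded.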

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
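
      One way to express this criterion in code is sketched below. It assumes the trajectory is stored as a list of time points, each holding one concentration per species, and it interprets "changed by < 1%" as the spread (maximum minus minimum) over the final window relative to the value at the start of that window. Both the data layout and this exact reading of the criterion are assumptions made for the example.

```python
def reached_equilibrium(trajectory, window=100, rel_tol=0.01):
    """Return True if every species varies by < rel_tol over the last `window` steps.

    trajectory: list of time points; each time point is a list of
    species concentrations in a fixed order.
    """
    if len(trajectory) <= window:
        return False  # not enough points to evaluate the final window
    tail = trajectory[-(window + 1):]
    n_species = len(tail[0])
    for i in range(n_species):
        values = [point[i] for point in tail]
        spread = max(values) - min(values)
        scale = max(abs(values[0]), 1e-12)  # guard against division by zero
        if spread / scale >= rel_tol:
            return False
    return True


flat = [[1.0, 2.0] for _ in range(200)]
drifting = [[1.0 + 0.01 * t] for t in range(200)]
too_short = [[1.0] for _ in range(50)]
```

      Using the spread over the window, rather than only comparing its two end points, also rejects trajectories that are still oscillating.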

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
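
      For concreteness, the rate expression quoted by the reviewer can be written as a small function. The variable names follow the quoted expression; the numerical values in the usage lines are arbitrary and chosen only to make the effect of the feedback visible. Note that, in the expression as quoted, the pS6K-dependent denominator scales the pIR-driven term, while the pFGFR-driven term is added without attenuation.

```python
def pi3k_phosphorylation_rate(pIR, pFGFR, PI3K, pS6K, kc2f1, kc2f2, Ki2):
    """PI3K phosphorylation rate as quoted in the reviewer's comment."""
    # pIR-driven activation, attenuated by pS6K-mediated negative feedback
    ir_term = (kc2f1 * pIR * PI3K) / (1.0 + pS6K / Ki2)
    # pFGFR-driven activation (no pS6K denominator in the quoted form)
    fgfr_term = kc2f2 * pFGFR * PI3K
    return ir_term + fgfr_term


# Arbitrary illustrative values.
no_feedback = pi3k_phosphorylation_rate(
    pIR=2.0, pFGFR=3.0, PI3K=5.0, pS6K=0.0, kc2f1=0.4, kc2f2=0.1, Ki2=1.0
)
with_feedback = pi3k_phosphorylation_rate(
    pIR=2.0, pFGFR=3.0, PI3K=5.0, pS6K=1.0, kc2f1=0.4, kc2f2=0.1, Ki2=1.0
)
```

      When pS6K equals Ki2, the pIR-driven term is halved, which is the usual interpretation of Ki2 as a half-maximal inhibition constant.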

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The actual figure reference was to Supplementary Figure 2A, which shows the heatmap of all the frequencies for each protein dynamics for all the active protein forms. CDK4/6cycD shows a sustained decreasing dynamic for 59.93% of model instances, which is where this number was derived. We have also now explicitly referenced this number in the supplementary Figure 2A legend.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Figure 5A shows that multiple pathways were interrogated, but in the analysis, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. Moreover, some additional text in the main text has been slightly modified to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
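
      The OAT design can be sketched as a simple loop. Here `simulate` is a hypothetical function that maps a parameter dictionary to a scalar readout, and the inhibition factor of 0.1 is an assumption made for the example rather than the value used in the manuscript.

```python
def one_at_a_time(params, simulate, inhibition=0.1):
    """Scale each parameter in turn and record the change in the readout."""
    baseline = simulate(params)
    effects = {}
    for name, value in params.items():
        perturbed = dict(params)  # copy so only one parameter changes
        perturbed[name] = value * inhibition
        effects[name] = simulate(perturbed) - baseline
    return effects


# Hypothetical readout standing in for a full ODE simulation.
def toy_readout(p):
    return p["k_a"] + 2.0 * p["k_b"]


effects = one_at_a_time({"k_a": 1.0, "k_b": 1.0}, toy_readout)
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
```

      Because exactly one parameter differs from baseline in each run, any change in the readout can be attributed to that parameter alone, which is what makes the resulting ranking directly interpretable.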

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning/units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) is a direct parallel to the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.

      The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited; a point we now mention in the revised Discussion (page 17: lines 716-727).

      It is worth noting again, that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12) this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is however important to reiterate that our goal in Figure 7 is qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map where every implicated node is equal; instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely occurs due to resistance hubs, not individual proteins; and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681 – 694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now use the following three terms, each in its specific context:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use ‘initial conditions’ to refer to initial values of the state variables in a model simulation. This term is related to protein abundance as the setting of initial conditions for conserved species sets the protein abundance. This is explicitly stated in the text (page 3: lines 87 - 91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) Our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.). Mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate-constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate-constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
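
      The relationship between the two rate laws can be made explicit with the standard limiting cases of the Michaelis-Menten expression. The symbols below (substrate concentration $S$, effective rate constant $k_{\mathrm{eff}}$) are introduced here for illustration and do not correspond to named quantities in the model.

```latex
v = \frac{V_{\max}\, S}{K_m + S}
\quad\Longrightarrow\quad
\begin{cases}
v \approx \dfrac{V_{\max}}{K_m}\, S = k_{\mathrm{eff}}\, S, & S \ll K_m \quad \text{(first-order, mass-action-like)} \\[2ex]
v \approx V_{\max}, & S \gg K_m \quad \text{(zero-order, saturated)}
\end{cases}
```

      In the unsaturated limit a single rate constant $k_{\mathrm{eff}} = V_{\max}/K_m$ reproduces the enzymatic rate, so sweeping the mass-action constant covers that regime. The saturated limit, in which the rate becomes independent of $S$, is the regime that a mass-action term cannot reproduce.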

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or SSA, is a possible extension in future work, but it is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.
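
      The ensemble idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and independent log-uniform sampling of each parameter is an assumption. Only the parameter count (94) and the range (10<sup>-5</sup> to 10<sup>4</sup>) are taken from the response. Each sampled parameter set would then be passed to a deterministic ODE solver, which is not shown here.

```python
import math
import random


def sample_log_uniform(rng, low=1e-5, high=1e4):
    """Draw one value uniformly in log10 space between low and high."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent


def build_ensemble(n_instances, n_params, seed=0):
    """Return n_instances parameter sets, each a list of n_params rate constants."""
    rng = random.Random(seed)  # fixed seed so the ensemble is reproducible
    return [
        [sample_log_uniform(rng) for _ in range(n_params)]
        for _ in range(n_instances)
    ]


# Each row is one deterministic model instance (one hypothetical "cell").
ensemble = build_ensemble(n_instances=1000, n_params=94)
print(len(ensemble), len(ensemble[0]))
```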

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
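
      A filter of this kind can be sketched as below. This is an illustrative sketch only: the function names, the optional `tolerance` argument, and the toy trajectories are hypothetical, and the authors' actual filter may differ. A trajectory is represented as a list of time points, each holding one value per species.

```python
def has_negative_undershoot(trajectory, tolerance=0.0):
    """Return True if any state value in the trajectory falls below -tolerance."""
    return any(value < -tolerance for state in trajectory for value in state)


def filter_trajectories(trajectories, tolerance=0.0):
    """Keep only trajectories whose values never drop below -tolerance."""
    return [t for t in trajectories if not has_negative_undershoot(t, tolerance)]


# Hypothetical toy trajectories (rows are time points, columns are species).
ok = [[1.0, 0.5], [0.8, 0.6], [0.7, 0.65]]
bad = [[1.0, 1e-15], [0.9, -3e-17], [0.8, 2e-12]]  # tiny numerical undershoot

kept = filter_trajectories([ok, bad])
print(len(kept))  # only the non-negative trajectory is kept
```

      With the default `tolerance=0.0` the filter is strict, so any undershoot below zero discards the whole trajectory, matching the procedure described above.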

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      The reason for this finite limit is mechanistic, as the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states: either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.
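
      To make the notion of a small set of qualitative categories concrete, the sketch below assigns a time course to one of the labels named above. The classification rule, the function name, and the 5% threshold are hypothetical and are not the criteria used in the manuscript. The point is only that a saturated, unresponsive node produces a flat trace and therefore falls into "no-response" however extreme its parameters are.

```python
def classify_response(values, rel_threshold=0.05):
    """Assign a coarse qualitative label to a time course (hypothetical rule)."""
    start, end = values[0], values[-1]
    scale = max(abs(start), 1e-12)  # avoid division by zero
    lowest = min(values)
    dropped = (start - lowest) / scale > rel_threshold
    recovered = (end - lowest) / scale > rel_threshold
    if dropped and recovered:
        return "rebound"
    change = (end - start) / scale
    if change > rel_threshold:
        return "increasing"
    if change < -rel_threshold:
        return "decreasing"
    return "no-response"


# Hypothetical traces, one per category.
for trace in ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 0.2, 0.9], [1.0, 1.0, 1.0]):
    print(trace, classify_response(trace))
```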

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Xiong and colleagues presents a compelling validation of UniDesign, a fully computational protein design framework, by using it to engineer a novel, PAM-relaxed variant of Staphylococcus aureus Cas9 (SaCas9) named KRH. The core achievement is the successful de novo generation of a high-performance nuclease (E782K/N968R/R1015H) solely through in silico modeling, without any subsequent experimental optimization or directed evolution. The authors demonstrate that KRH expands the SaCas9 PAM specificity from NNGRRT to NNNRRT, achieving genome editing and base editing efficiencies across multiple human cell types that are comparable to, and sometimes exceed, the well-known evolution-derived KKH variant. The work positions UniDesign not merely as an analytical tool, but as a powerful engine for the generative design of complex molecular functions, offering a scalable and mechanistically insightful alternative to traditional experimental screening.

      Strengths:

      This is an outstanding manuscript that serves as a powerful proof-of-concept for the next generation of computational protein design. The primary selling point, the raw predictive and generative power of UniDesign, is convincingly demonstrated throughout.

      The manuscript shows that the tool can:

      (1) successfully navigate a complex sequence landscape to identify a minimal set of three mutations (KRH) that remodel a critical protein-DNA interface;

      (2) accurately model and balance the delicate interplay between specific base contacts and non-specific backbone interactions to achieve relaxed PAM specificity;

      (3) deliver a final product whose performance is indistinguishable from, and in some cases superior to, a variant that required extensive wet-lab evolution.

      The experimental validation is rigorous, thorough, and directly supports the computational predictions. This work will stand as a landmark study for the field, illustrating that computational design has matured to the point where it can reliably generate sophisticated tools for genome engineering.

      (1) Demonstration of Generative Power:

      The most significant finding is that UniDesign, without any experimental feedback, generated a variant (KRH) that matches the performance of the evolution-derived KKH. This is a remarkable achievement. The iterative design strategy, first reducing PAM bias (R1015H) and then restoring binding through non-specific interactions (e.g., N968R, E782K), is a textbook example of rational design, but it is executed entirely by the algorithm. This validates UniDesign's energy function and search algorithm as capable of capturing the subtle biophysical principles governing PAM recognition.

      (2) Mechanistic Insight as a Built-in Feature:

      A key advantage of UniDesign highlighted by this work is its inherent ability to provide mechanistic explanations. The computational models not only predicted which mutations would work (e.g., N968R over N968K in the KRH variant) but also why they work. The structural and energetic analyses showing the bidentate salt bridge formed by Arg968 versus the single bond formed by Lys968 (Figure 4A) is a perfect example of how the tool's output can rationalize functional differences, a level of insight that is rarely attainable from directed evolution campaigns alone.

      (3) Scalability and Accessibility for Engineering:

      The authors explicitly contrast UniDesign's efficiency (minutes to hours per design run) with the computational expense of methods like COMET and the experimental overhead of directed evolution. The improvements to UniDesign v1.2, specifically the mutation-count and sequence-uniqueness penalties, directly address a key challenge in computational design (generating diverse, low-energy point-mutant libraries). This positions the tool as a highly accessible and scalable platform for engineering other CRISPR systems, a point that will be of immense interest to the community.

      We sincerely thank the reviewer for the comprehensive summary and the highly positive and encouraging comments on our manuscript.

      Weaknesses:

      (1) Title and Abstract Emphasis:

      The title and abstract are effective but could be slightly sharpened to emphasize the primary message. Consider a title like "Fully computational design of a PAM-relaxed SaCas9 variant with UniDesign demonstrates power to match directed evolution." The abstract could more explicitly state upfront that the design was achieved without any experimental iteration.

      Thank you for this valuable suggestion. We have revised the title and abstract accordingly to better reflect your feedback.

      (2) Figure 1, Panel M:

      The data points in panel M are currently presented at a font size that makes them difficult to read, particularly the labels for the many triple-mutant variants. This density obscures the clear identification of the top-performing designs, such as the KRH variant selected for experimental validation. I recommend that the authors increase the font size of all text elements within this panel, including axis labels, tick marks, and data point labels, to improve legibility. If necessary, the panel dimensions can be adjusted or the layout reorganized to accommodate the larger text without compromising clarity. Ensuring this figure is readable is important, as it visually communicates the energetic convergence that led to the selection of KRH.

      Thank you for this helpful suggestion. We have increased the font size in Figure 1M, as well as in Figure 1C and Figure 1E, to improve readability in the revised manuscript.

      (3) Generality of the Design Strategy for Other PAM Positions:

      The design strategy focused on relaxing specificity at the highly constrained third position of the PAM (the guanine in NNGRRT). How transferable is this specific strategy (i.e., disrupting a key specific contact and compensating with non-specific backbone binders) to relaxing other positions in the PAM or to other Cas enzymes with different PAM-interaction architectures? A short discussion on this point would help readers understand the broader applicability of the "fine-tuning the balance" principle.

      Thank you for this insightful question and suggestion. The current study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which eight Cas9 proteins and two Cas12 proteins (each has a different PAM) were investigated. Our computational results demonstrated that UniDesign can effectively capture the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs). For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform similar PAM relaxation designs for other Cas9 or Cas12 proteins, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We have included additional discussion to clarify this point and highlight the broader applicability of our design strategy.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the fully in silico design of a new variant of Staphylococcus aureus Cas9 (SaCas9) using an improved UniDesign workflow.

      The design strategy consists of three sequential steps:

      (1) reducing positional bias at PAM position 3;

      (2) restoring DNA binding through nonspecific interactions;

      (3) combining individually favorable substitutions.

      The overall pipeline is conceptually elegant and logically structured, and the genome-editing activity of the designed variants is comprehensively characterized. The resulting KRH variant exhibits relaxed PAM specificity, expanding the targeting range of SaCas9 across diverse cell types. Notably, the KRH variant demonstrates performance comparable to that of the evolution-derived KKH variant, underscoring the effectiveness of the proposed computational design framework.

      Strengths:

      The design pipeline is entirely computational and does not rely on experimental data for pretraining or iterative optimization.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, which may reflect insufficient exploration of the available sequence space.

      Thank you for this insightful critique. In the present study, our strategy was not to allow UniDesign to freely explore all 27 mutable positions simultaneously, but rather to constrain the search to point mutations (e.g., double or triple mutants) within the full sequence space (approximately 20<sup>27</sup>). Even with this constraint, UniDesign effectively samples a substantially larger design space than traditional protein engineering approaches.

      Through iterative design, we observed that only certain residue types became enriched at a subset of positions when identifying effective double mutants. These enriched residues were then systematically combined to generate performance-enhancing triple mutants in an automated manner. Although we ultimately selected the KRH mutant for experimental validation due to its high similarity to the known KKH variant, UniDesign also proposed additional multi-mutants that are distinct from KKH (see Figure 1M).
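
      The sizes involved can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each mutated position may take any of the 19 non-wild-type amino acids, which may not match UniDesign's actual per-position restrictions. The 27 mutable positions are taken from the response above.

```python
from math import comb

N_POSITIONS = 27    # mutable positions, as stated in the response
N_AMINO_ACIDS = 20  # standard amino acid alphabet


def count_k_point_mutants(k, n_positions=N_POSITIONS, n_amino_acids=N_AMINO_ACIDS):
    """Number of sequences differing from wild type at exactly k positions."""
    return comb(n_positions, k) * (n_amino_acids - 1) ** k


full_space = N_AMINO_ACIDS ** N_POSITIONS
for k in (1, 2, 3):
    print(f"{k}-point mutants: {count_k_point_mutants(k)}")
print(f"full space: {full_space:.3e}")
```

      Under these assumptions there are 513 single, 126,711 double, and 20,062,575 triple mutants: a tiny fraction of the full space, yet far more than could typically be screened one by one.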

      Reviewer #3 (Public review):

      Summary:

      This study reports KRH, a SaCas9 variant computationally engineered via UniDesign to recognize an expanded NNNRRT PAM with substantially enhanced editing efficiency at non-canonical sites. KRH achieves genome- and base-editing efficiencies comparable to or exceeding the evolution-derived KKH variant across multiple human cell types, demonstrating that computational design can effectively remodel PAM specificity while preserving nuclease activity.

      Strengths:

      The research follows a clear line of reasoning, and the results appear sound. The computational design strategy presented offers a valuable alternative to directed evolution, with potential applicability beyond Cas9 engineering.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The benchmarking of the UniDesign method is insufficient. It remains unclear how its performance compares to other protein design algorithms, whether the energy function parameters were systematically optimized, and whether the design strategy can be generalized to other Cas9 orthologs or genome engineering tasks.

      Thank you for this valuable critique. The present study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which many of these concerns were systematically addressed. In that study, UniDesign was benchmarked against Rosetta, a well-established protein design platform, across eight Cas9 proteins and two Cas12 proteins, each recognizing distinct PAM sequences.

      Our results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs) across these CRISPR–Cas systems. For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform analogous PAM relaxation designs for other Cas9 or Cas12 proteins in this work, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We have incorporated additional discussion in the revised manuscript to address these points and clarify the broader applicability of our approach.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) SaCas9 is highlighted for its AAV compatibility, but the manuscript does not further discuss how the KRH variant may benefit AAV-based genome editing applications. A brief discussion on how expanded PAM compatibility could facilitate target selection in AAV-constrained therapeutic settings would strengthen the translational relevance of the work, potentially reducing the need for split-Cas9 or dual-vector strategies.

      Thank you for your helpful suggestion. We have added a brief discussion in the revised manuscript highlighting how the KRH variant’s expanded PAM compatibility may enhance AAV-based genome editing applications. Specifically, this property can broaden the range of targetable genomic sites and may reduce the need for split-Cas9 or dual-vector delivery strategies in size-constrained AAV therapeutic contexts.

      (2) The study shows that a fully computational workflow can recapitulate the performance of an evolution-derived variant. A short discussion comparing the scalability and practical advantages of computational design versus directed evolution for future PAM engineering would help emphasize the broader methodological significance of UniDesign.

      Thank you for your valuable suggestion. We have added a brief discussion in the revised manuscript comparing the scalability and practical advantages of computational design with directed evolution for PAM engineering. Specifically, we highlight that UniDesign enables rapid and scalable exploration of sequence space without requiring iterative experimental screening, thereby offering a complementary—and in some cases more efficient—approach to directed evolution for future protein engineering applications.

      (3) There is noticeable variation in editing efficiency across cell types, particularly the lower activity in A549 cells. Could the authors explain why the differences in editing efficiency are so large?

      Thank you for this insightful comment. We agree that the variation in editing efficiency across cell types—particularly the lower activity observed in A549 cells—warrants clarification, and we have added a corresponding discussion in the revised manuscript. We attribute this observation to two main factors. First, transfection efficiency varies substantially across cell lines; in our experiments, A549 cells exhibited lower transfection efficiency compared to HEK293T, HeLa, and U2OS cells, which likely contributes to the reduced editing efficiency. Second, the intrinsic performance of genome editing systems can differ across cellular contexts due to variations in DNA repair pathways, including chromatin accessibility and the expression levels of key repair-related genes. Importantly, despite this cell-type-dependent variability in absolute editing efficiency, the KRH variant consistently outperformed wild-type SaCas9 across all tested cell lines, underscoring the robustness and general applicability of our design.

      (4) Given that the computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, it would be valuable to discuss whether R968 (or saturation mutations at this site) has previously been explored experimentally, and to elaborate on strategies for further expanding the diversity of mutations identified through the computational design framework.

      Thank you for your suggestion. We have added a brief discussion in the manuscript noting that, to the best of our knowledge, R968 has not been experimentally characterized prior to this study. It was identified solely through our computational design workflow, highlighting the strength of our approach.

      Reviewer #3 (Recommendations for the authors):

      (1) During the protein amino acid conformational sampling process in UniDesign, were nucleic acid conformational changes taken into consideration?

      Thank you for this question. Nucleic acid conformational changes were not explicitly considered during the protein sequence design stage in UniDesign after the four specific PAM variants (e.g., TTAGGT, TTCGGT, TTGGGT, and TTTGGT) were defined. We consider this assumption reasonable, as the base conformations in these PAM sequences are expected to remain largely stable, with minimal structural variation due to preserved base-stacking interactions.

      (2) The authors used a mutation-count penalty to control the number of mutations generated during the design process, which appears to occasionally yield results that exceed the intended limit. Is this an efficient approach? Could the count be controlled more directly by imposing constraints within the design procedure itself?

      Thank you for these insightful questions. You are correct that the design process may occasionally yield variants exceeding the intended mutation limit. This occurs because the mutation-count penalty is implemented as a soft constraint, where violations incur a penalty rather than being strictly excluded. Based on our benchmarking, this strategy—combined with the duplicate-design penalty—has been effective in generating multimutant variants with mutation counts close to the desired range. However, we acknowledge that this approach may not achieve optimal efficiency. We are currently developing improved strategies in UniDesign to more directly control mutation counts by incorporating explicit constraints during the sequence simulation process, which we expect will further enhance design precision and efficiency.
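
      The difference between a soft and a hard mutation-count constraint can be sketched as follows. This is a toy illustration, not UniDesign's scoring function: the function names, the linear penalty form, the weight, and the example sequences and energies are all hypothetical. Lower scores are treated as better.

```python
def count_mutations(sequence, wild_type):
    """Number of positions at which sequence differs from wild_type."""
    return sum(1 for a, b in zip(sequence, wild_type) if a != b)


def penalized_score(energy, sequence, wild_type, max_mutations=3, weight=1.0):
    """Soft constraint: add a penalty for each mutation beyond the limit."""
    excess = max(0, count_mutations(sequence, wild_type) - max_mutations)
    return energy + weight * excess


def satisfies_hard_limit(sequence, wild_type, max_mutations=3):
    """Hard constraint: reject any sequence above the limit outright."""
    return count_mutations(sequence, wild_type) <= max_mutations


# Hypothetical toy sequences and energies.
wild_type = "AAAAA"
triple, quadruple = "KRHAA", "KRHKA"
score_triple = penalized_score(-8.0, triple, wild_type)
score_quadruple = penalized_score(-10.0, quadruple, wild_type)

# The 4-mutation design still wins under the soft penalty but fails the hard limit.
print(score_triple, score_quadruple, satisfies_hard_limit(quadruple, wild_type))
```

      In this toy case the quadruple mutant's lower energy outweighs its penalty, which illustrates how a soft constraint can occasionally return designs above the intended mutation count, whereas a hard limit would exclude them.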

      (3) Is the new version of UniDesign developed specifically for the Cas9 design task in this study? What are its advantages and disadvantages compared to other state-of-the-art protein design algorithms?

      Thank you for this important question. The new version of UniDesign (v1.2) was not developed specifically for Cas9 engineering. Rather, it is intended as a general framework for protein engineering tasks that focus on introducing point mutations to improve protein properties, as opposed to de novo design. Compared to current state-of-the-art protein design methods—many of which are deep learning–based—UniDesign offers distinct advantages and limitations. Deep learning approaches are often highly efficient and powerful but may lack interpretability in their predictions. In contrast, UniDesign is a well-benchmarked, lightweight, physics-based method that provides greater interpretability, allowing users to better understand the underlying basis of the design decisions. On the other hand, a limitation of UniDesign is that it is less straightforward to incorporate experimental feedback for iterative refinement, such as fine-tuning the scoring function for specific design tasks.

      (4) The study employed a three-round design process to obtain the mutants. Is there a conformational correlation between the mutation sites identified in these three rounds? Could this have been accomplished in a single computational run instead of three separate calculations?

      Thank you for these insightful questions. We adopted a multi-round design strategy for SaCas9 PAM relaxation because this task inherently involves multi-objective optimization: enhancing PAM compatibility—particularly relaxing base recognition at the third PAM position—while preserving editing activity comparable to wild-type SaCas9. In our view, identifying the key mutations (e.g., E782K, N968R, and R1015H) in a single UniDesign run would be highly challenging due to competing energetic requirements. In the first round, R1015H emerged from single-site mutational scanning as the most favorable PAM-relaxing mutation based on its minimal MAD score. However, this mutation also significantly increased the binding energy relative to wild-type SaCas9 with its native PAM, suggesting a likely reduction in editing activity due to weakened binding. To address this, the second round focused on compensatory mutations. Variants such as E782K and N968R (along with several additional candidates) were identified in the context of R1015H to reduce binding energy and partially restore affinity. In the third round, we further combined compatible mutations from the second round, resulting in variants that more effectively lowered binding energy and restored it to levels comparable to wild-type SaCas9 with its native PAM. Notably, the design objectives in rounds one and two drive binding energy in opposite directions, making it unlikely that all key mutations could be identified simultaneously in a single run. During the design process, we also observed conformational correlations among mutation sites. For example, R1015H can form hydrogen-bonding interactions with residue E993, and we observed multiple alternative mutations at position 993 (e.g., E993S, E993P, E993A, E993G, E993K, and E993R), suggesting local structural coupling between these positions.

      (5) In Figure 4D, for the FANCF-1 site, there appears to be a noticeable difference in editing efficiency between KKH-ABE and KRH-ABE. Is this difference statistically significant? If so, please provide an explanation for this observation.

      Thank you for this question. For the FANCF-1 site shown in Figure 4D, we performed statistical analyses and found that the differences in editing efficiency between KKH-ABE and KRH-ABE are not statistically significant: P(A4) = 0.1239, P(A10) = 0.0671, P(A12) = 0.0942, and P(A13) = 0.1349 (two-tailed unpaired Student’s t-test). These results indicate that KRH-ABE and KKH-ABE exhibit comparable editing efficiencies at this site, supporting our overall conclusion that the computationally designed KRH variant achieves performance on par with the KKH variant.

      (6) Does the evolutionary term within the UniDesign scoring function bias the designed sequences towards pre-existing protein features?

      Thank you for this question. In this study, as well as in our previous work on Cas9 PAM recognition modeling (PMID: 37078688), the evolutionary term in the UniDesign scoring function was completely disabled. Therefore, it does not introduce any bias toward pre-existing protein features in the designed sequences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their thoughtful and constructive feedback. We appreciate that all reviewers recognized the value of our study in linking adult neurogenesis and synaptic plasticity to representational drift in the olfactory system. They described the model as elegant and well-motivated, and agreed that it provides new theoretical insight into how stability and adaptability can coexist in sensory representations. The reviewers also identified areas where our manuscript could be strengthened, and as outlined in our revision plan we have:

      (1) Refined our description of mitral/tufted cell stability and expanded on within-session and across-day variability.

      (2) Substantially expanded the Discussion to compare our modeling assumptions with experimental findings and recent anatomical evidence. Additionally, we have included the limitations of the study and areas for future investigation.

      (3) Included a clearer description of the STDP implementation, plastic synapses, and their functional effects.

      (4) Added a short section outlining model-based predictions that can guide future experiments. We also made minor textual edits to improve precision and flow, including citing prior conceptual work and clarifying model procedures.

      These changes have strengthened both the conceptual framing and technical clarity of the paper. We are grateful for the reviewers’ careful reading and valuable suggestions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations:

      (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.

      (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.

      (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem, and the model is elegant in its simplicity. I have however, several major concerns with the hypotheses underlying the model and with its biological plausibility.

      Concerns:

      (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.

      The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), while the authors have M/T responses in the same session that are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus to other networks' inhibitory mechanisms, which do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult-born neurogenesis, which does not need to be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).

      We agree with the reviewer and believe this is a critical discussion point. Indeed, in Shani-Narkiss et al. (2023), in Kay and Laurent (1999), and in our own lab, trial-to-trial variability is observed within a single recording session; as the reviewer correctly points out, this cannot be due to neurogenesis. These fluctuations may be trial-to-trial noise, or may reflect dynamics associated with other behaviors such as running (Chockanathan et al., 2021) and decision making (Kay and Laurent, 1999). There is a growing body of literature showing that neural variability in early sensory coding depends on behavioral fluctuations and internal states (Niell and Stryker, for example). The within-session variability in the Shani-Narkiss et al. work may reflect some of these behaviorally relevant features of early olfactory coding, something that our model cannot account for. This is an excellent discussion point, and we have included text (lines 153-157 and lines 321-330) in the manuscript to note this aspect of the data and how one can think of it in the context of our results.

      Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.

      Thank you for raising this important point regarding the findings of Kato et al. (2012). We agree that their results suggest increased sparsening and stability in M/T cell odor responses with repeated exposure. However, as noted in Yamada et al. (2017), the experimental literature on this question remains mixed. Yamada and colleagues reported a “drastic reorganization of ensemble odor representation” across days and emphasized that “sensory experience does not necessarily cause a major sparsening of the odor response,” explicitly contrasting their findings with those of Kato et al. (2012).

      Our model captures the dynamics observed in Yamada et al. (2017), providing a mechanistic explanation for how significant reorganization can emerge in M/T ensembles despite stable low-dimensional population structure. The experimental designs of Yamada et al. (2017) and Kato et al. (2012) differ in nuanced ways (method of head fixation, behavioral paradigm, training, etc.), all of which are known to affect olfactory responses and therefore the degree of sparsity and overlap in population codes. Our model does not include any of these behavioral features, which may differentially engage the olfactory circuit and thus affect population responses. Notably, in previous work we showed that even a simple change to top-down feedback, a single phenomenological manipulation of functional connectivity in the olfactory circuit that could be engaged broadly by behavior, can have disparate effects on the degree of sparsity in neural representations over time. Our current model contains no behavior that would allow us to study these critical features of the neural activity code in the M/T cells. Instead, we focus on one specific aspect, adult neurogenesis, which we can manipulate explicitly and in a biologically meaningful way. The reviewer’s point, however, is well taken and important, and we have added text to the Discussion (lines 336-344) to highlight the differing experimental outcomes and to clarify how our model aligns with the Yamada et al. results.

      (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated in the network as abGC. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.

      Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.

      Thank you for raising this important point regarding the lifespan of granule cells (GCs). We agree that developmentally born GCs are not fully replaced. Indeed, multiple studies indicate that some developmentally born GCs can survive for very long periods, up to 18-24 months, essentially the lifetime of the animal (Kaplan, 1985; Petreanu & Alvarez-Buylla, 2002). However, the fraction of the total GC population that such long-lived cells constitute remains an open question, in part because the lifetime survival of newborn neurons is difficult to measure. There is nonetheless consensus that a substantial portion of the granule-cell population undergoes continuous turnover through adult neurogenesis (reviewed in Lepousez et al., 2013).

      We should clarify that we do not assume that 100% of the granule cell population turns over in an 11-day period. We use “day” to represent a static epoch over which we can implement plasticity rules across two timescales. Critically, we also randomize the turnover, treating every cell in the GC population as equally likely to be replaced. Prior experimental evidence suggests that some GCs are more likely to persist (possibly as a result of experience; Magavi et al., 2005), which may in some regards make our result on stabilization following repeated sensory exposure more dramatic (as the GCs that show the largest change following STDP may also be the most stable, and therefore the least likely to turn over). We do not include this in our model because we could not identify a framework for “selecting” which GCs would persist that would not be tautological. The point the reviewer raises is critical, and a discussion of these points is warranted, which we now include in the manuscript (lines 352-361).

      Additionally, there is some evidence that behavioral factors, such as novelty, can increase the rate of adult neurogenesis (Kamimura et al., 2022; van Praag et al., 1999; Gheusi and Lledo, 2014), suggesting a complex reciprocal relationship between the mechanisms that generate the cells shaping how olfactory stimuli are encoded and the encoding process itself. Our model does not include any of these dynamic features, which represent an additional layer of complexity and may provide an intermediate timescale, one of behavioral selection and action, that is slower than the milliseconds over which spike-timing-dependent plasticity operates but faster than the timescale of neurogenesis. We also include this point in the Discussion (lines 352-361).

      Our 11-day simulation, however, is designed to uncover how plasticity across multiple timescales (STDP and adult neurogenesis) shapes odor representations at the network level as multiple rounds of GC turnover occur. Changing the timescale or magnitude of replacement in the simulations (in terms of either days or percentage of cells replaced) would affect the degree of drift, but not the phenomenon itself. Additionally, the representational structure in our model at intermediate time points (e.g., days 8–10) would correspond well to scenarios in which some fraction of developmentally born GCs persists in the circuit. Thus, our simulations span a range of possible empirical regimes, from high turnover to partial preservation. We have added discussion to the revised manuscript (lines 352-361) clarifying this point and acknowledging the biological heterogeneity in GC lifespans.
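
      As a rough illustration of why random turnover need not replace every cell, the sketch below simulates the replacement schedule under one explicit assumption: on each simulated day an independent, uniformly random 10% of the population is drawn, and cells replaced earlier remain eligible to be drawn again. Under that assumption the expected fraction of original cells surviving d days is (1 - 0.10)^d, which is about 31% at d = 11. The population size, seed, and function name are hypothetical, and the actual sampling scheme in the model may differ (for instance, drawing only from not-yet-replaced cells would leave no original cells after 10 days).

```python
import random


def surviving_fraction(n_cells=10_000, replace_frac=0.10, n_days=11, seed=0):
    """Fraction of the original cells never replaced after n_days of turnover."""
    rng = random.Random(seed)
    is_original = [True] * n_cells
    n_replace = round(n_cells * replace_frac)
    for _ in range(n_days):
        # Every cell, original or already replaced, is equally likely to be drawn.
        for idx in rng.sample(range(n_cells), n_replace):
            is_original[idx] = False
    return sum(is_original) / n_cells


# Expected surviving fraction under the independent-draw assumption.
analytic = (1 - 0.10) ** 11
simulated = surviving_fraction()
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

      Varying `replace_frac` or `n_days` in this sketch changes how many original cells remain, which mirrors the point that the timescale and magnitude of replacement set the degree of turnover rather than its presence.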

      (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).

      How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?

      Thank you for pointing us to these important studies. We fully agree with the reviewer that the structure of the olfactory system might not be purely random, but we do not believe these papers contradict the level of abstraction used in our model.

      Zeppilli et al. (2021) map molecularly defined projection neuron subtypes and their preferential targeting of different cortical and subcortical regions, but they do not report any fine-scale topographic organization of bulb → piriform connectivity that would contradict a view of randomly distributed input to piriform cortex. Studies from our lab using retrograde tracers in the bulb show some spatial clustering of piriform cortical neurons whose axons project to the bulb (Padmanabhan et al., 2016, 2019), but these studies do not identify any “functional organization” or structure. Chae et al. (2022) focus on distinct long-range functional loops (mitral ↔ piriform vs. tufted ↔ AON) and the differential role of cortical feedback, but again at the level of cortical regions rather than individual cells and connectivity. Notably, our model does not consider AON.

      Finally, Fink et al. (2025) report a “like-to-like” excitatory connectivity motif within the piriform cortex and an experience-dependent reorganization of inhibitory synapses. As the authors note, “... this like-to-like motif is unlikely to reflect common input from the olfactory bulb”, so it does not conflict with our assumption of broadly random bulb → piriform input. This “like-to-like” motif is reflected in our model by wiring a certain subpopulation of piriform cells. On the other hand, we agree that the experience-dependent changes in inhibitory connectivity within PCx are highly relevant for learning-related plasticity but fall outside the scope of our study. We intentionally omitted piriform plasticity to isolate the contributions of adult neurogenesis in the bulb and plasticity acting on adult-born granule cells, but incorporating such cortical plasticity is an important direction for future work. We added a discussion (lines 395-405) of this important point raised by the reviewer to the revised manuscript.

      (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to the adult-born neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of re-organization. This is the same for the piriform cortex, but relatively, the piriform ensembles displayed in a low-dimensional embedding appear to drift more compared to the M/T ensembles.

      This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?

      Thank you for these thoughtful questions. We clarify the logic and purpose of the low-dimensional analyses and address each point below.

      (1) Which representation is relevant for brain function, the high-dimensional or low-dimensional one?

      We believe both representations are meaningful, with each capturing different aspects of the neural code. The high-dimensional activity reflects the full variability of individual cell responses, while the low-dimensional projection captures the dominant population-level components that downstream areas are most likely to use for readout. We found that the low-dimensional representations are more stable in the bulb than in PCx, suggesting that information is used differentially between the two areas. The bulb provides a stable, sensory-anchored population code that reliably represents odor identity over time, consistent with both electrophysiological and behavioral studies (Nagayama et al., 2004; Chen et al., 2009; Davison and Katz, 2007; Cavaretta et al., 2018). This is consistent with its role as the first stage of information processing in the olfactory system, which provides the faithful representations that downstream circuits receive. The piriform cortex, by contrast, transforms this stable input into a more flexible representation. Drift in its low-dimensional space may reflect ongoing plasticity (Schoonover et al., Nature, 2021), integration of contextual signals, or higher-dimensional computations characteristic of PCx (Fink et al., bioRxiv, 2025), suggesting that it acts more as an associative cortex than as a pure sensory cortex.

      (2) What fraction of variance is included in the low-dimensional space, and how was the cutoff chosen?

      In our simulations, these PCs captured the majority of variance relevant for odor identity (~60–70% for M/T cells and ~55–65% for piriform cortex). We now report these fractions explicitly in Methods (lines 937-939).
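
      For reference, the variance fraction attributed to a set of leading principal components is conventionally defined as below. Here $\lambda_i$ are the eigenvalues of the covariance matrix of the population responses sorted in decreasing order, $N$ is the number of neurons, and $k$ is the number of retained components. These symbols are introduced only for illustration; the value of $k$ used in the manuscript is not stated in this response.

```latex
\[
  \mathrm{EV}(k) \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{N} \lambda_j}
\]
```

      Read this way, a value of roughly 0.6 to 0.7 means that the retained components account for about two thirds of the total response variance, with the remainder spread over the discarded dimensions.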

      (3) Why does STDP cause more drift in piriform-cortex ensembles than in M/T ensembles? Does this reflect higher dimensionality in piriform cortex?

      In our model, STDP does not cause more drift in PCx. It actually reduces drift and stabilizes PCx representations relative to the condition without STDP (as shown in Fig. 4C2). STDP has a much smaller effect in the bulb because: (1) M/T cells continue to receive stable odor input from the glomeruli and (2) the low-dimensional M/T representation is already stable even without plasticity. We have edited the manuscript to reiterate this point in both the results and discussion.

      The reviewer is correct that the piriform cortex naturally exhibits more drift than the bulb, and their comment that this is due to its substantially higher representational dimensionality is spot on. The PCx contains many more neurons, receives highly divergent OB → PCx inputs, and has dense recurrent connectivity, all of which create many more degrees of freedom through which representations can drift. Additionally, because individual PCx neurons sample from a substantially more diverse combinatorial space of inputs (including feedback to piriform from an array of regions; Illig, 2005; Majak et al., 2004; Chapuis et al., 2013), the dimensionality of the population code is likely higher. While STDP stabilizes the dimensions of the PCx representation that are reinforced during plasticity, some residual drift remains because of the large number of orthogonal dimensions available. Additionally, as the reviewer notes, some forms of plasticity that are not included in the model, such as inhibitory plasticity in PCx, may also affect both the representations and their underlying dimensionality. We include these points in the Discussion (lines 381-394).

      (5) Could the authors comment whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., Lledo, Murthy, Komiyama groups) reported that abGC integrates in the network in an activity-dependent manner, and not randomly, and as such stabilizes the active neuronal responses, which is consistent with the authors' report.

      Related, I couldn't find through the manuscript which synapses involving abGCs they focus on, or what is the relative contribution of the various plastic synapses shown in the cartoon from Figure 4 A1 (circles and triangles).

      We thank the reviewer for raising this question. As the reviewer pointed out, several studies have shown that abGCs integrate into the bulb circuit in an activity-dependent manner. They preferentially form synapses onto mitral/tufted cells that respond to behaviorally important odors. This “selection of surviving cells” is not included in our model; instead, we use STDP at the synaptic level. The two are of course not analogous, but STDP provides a computational framework in which the selection of surviving abGCs could be incorporated in future studies. It is perhaps notable that in our large-scale simulations, synaptic changes at the population level may reflect some of this activity-dependent selection.

      To that end, our model provides a new insight and suggests a broader function for adult neurogenesis. For example, when certain odors are reinforced in an activity-dependent manner, abGCs born during that period may stabilize the circuits that respond to those odors. The resulting reduction of drift would help keep the representation of those odors stable over time, even while other parts of the circuit continue to change. We now highlight this idea in the Discussion (lines 366-373).

      For the second part of the question: in our model, STDP acts on two sets of connections. It applies to the synapses onto abGCs from M/T cells, GC/SAC cells, and PCx neurons. It also applies to the synapses that abGCs project to, including those onto M/T cells and GC/SAC cells. We have clarified this in the revised Methods (lines 1001-1004).
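
      To make the idea of plastic abGC synapses concrete, the sketch below applies a generic pair-based STDP rule with exponential windows to a single synapse. This is a textbook form chosen for illustration, not the rule or parameter set used in the manuscript: the amplitudes `a_plus` and `a_minus`, the time constants `tau_plus` and `tau_minus`, the weight bounds, and the function name are all assumptions, and how the sign of the change maps onto inhibitory abGC outputs in the model is not specified here.

```python
import math


def stdp_update(weight, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Return the weight after one pre/post spike pairing (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:
        # Presynaptic spike precedes postsynaptic spike: potentiate.
        weight += a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:
        # Postsynaptic spike precedes presynaptic spike: depress.
        weight -= a_minus * math.exp(dt / tau_minus)
    # Keep the weight inside its allowed range.
    return min(w_max, max(w_min, weight))


# Hypothetical pairings at a single M/T -> abGC synapse.
w = 0.5
w_ltp = stdp_update(w, t_pre=10.0, t_post=15.0)
w_ltd = stdp_update(w, t_pre=15.0, t_post=10.0)
print(w_ltp, w_ltd)
```

      In a network simulation the same update would be applied to every spike pairing at both the input synapses onto abGCs and the output synapses from abGCs, the two sets of connections described above.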

      (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists.

      How does suppression of adult-born neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?

      We appreciate the reviewer’s suggestion and formalize the following two predictions from our model:

      Prediction 1: Suppressing adult neurogenesis will reduce spontaneous representational drift in the PCx. Increasing spike-timing-dependent plasticity during periods of experience with a specific odor will selectively stabilize representations of that odor.

      Prediction 2: Adult neurogenesis will not affect AON representations of odor identity or concentration in the same way that PCx representations are altered and drift.

      We include these two ideas in the discussion as experimentally testable predictions.

      Reviewer #2 (Public review):

      Summary:

      The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks on the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus.

      Strengths:

      A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.

      Weaknesses:

      The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.

      We appreciate the reviewer’s suggestion and have added discussion on this point in the revised manuscript (lines 431-435).

      Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.

      This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.

      We agree with the reviewer. The fan-out from the bulb to the piriform cortex is essential for the combinatorial coding that allows PCx neurons to represent many odor features and mixtures. This architecture gives the piriform cortex great coding capacity, but it also makes the system sensitive to small changes in its inputs. As a result, drift that originates in the bulb can spread more easily in PCx. A stabilizing mechanism is therefore needed downstream. In our model, STDP provides this stabilization by reinforcing the dimensions that carry meaningful odor structure. This allows the piriform cortex to keep a stable population code even when its inputs change over time. Neurogenesis supplies the flexibility, the fan-out supplies the expressive power, and STDP supplies the stability. All three elements work together to support a system that must recognize odors reliably while still adapting to new sensory experiences. We have added discussion on this point in the revised manuscript (lines 395-405).

      Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights).

      Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      We appreciate the reviewer’s comment and thank them for their thoughtful feedback.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time.

      This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      We thank the reviewer for highlighting this important issue. We agree that the interpretation of our model requires care to avoid implying that the olfactory bulb exhibits spontaneous drift. As the reviewer points out, the empirical literature shows that M/T-cell tuning is highly stable for infrequently experienced odors, but can change with daily, persistent odor exposure (e.g., Kato et al., 2012; Yamada et al., 2017).

      We thank the reviewer for highlighting the Bhalla and Bower paper, as it is foundational and raises a number of interesting and important points. As the authors noted, there was significant variability in trial-to-trial responses over sessions and days in single neurons. This is likely due to ongoing dynamics (Laurent, 1999), the impact of behaviorally relevant top-down feedback (Chen and Padmanabhan, 2022), decision making (Kay and Laurent, 1999), and an array of factors that our model does not include. In that manuscript, the authors note “the variability of the same neuron recorded over different days…was not statistically different from the within day comparisons.” While these results appear prima facie to differ from ours, there are several reasons why this may not be the case.

      First, different metrics are used for measuring neuronal stability, which may contribute to some of the differences. Second, and perhaps more importantly and interestingly, the authors of that study noted significant trial-to-trial variability within a day, which is not present in our study because our model has none of the richness of behavior that Bhalla and Bower found in the freely behaving rat. This within-day variability (which is much higher than what we report) would reduce the impact of drift across days - a result that would complicate how plasticity across multiple timescales occurs. We thank the reviewer for the insights on this critical study and include these points in our discussion (lines 321-330).

      Neural responses to odor representations are incredibly variable across different time scales (Padmanabhan and Urban, 2010; Angelo et al., 2011; Kapoor and Urban, 2006; Friedrich and Laurent, 2001; Smear et al., 2011; Wesson et al., 2008). In our model, none of this selection of survival related to behavior is included, nor are there specific rules about which synapses may be preferentially strengthened (due to neuromodulation corresponding to behavioral choice and reinforcement learning). Instead, we aimed to recapitulate the experimental design of a few studies (Kato et al., 2012; Yamada et al., 2017) to understand how neurogenesis and drift are related. Over the simulated 10 days, the odor is presented every day, and the network is otherwise frozen between sessions, meaning the model lacks mechanisms that would normally support recovery during intervals without odor exposure. Under these conditions, adult neurogenesis effectively interacts with repeated experience, producing gradual changes in individual M/T-cell tuning. Thus, our results should be interpreted as modeling experience-dependent changes over the timescale of neurogenesis, not as evidence for spontaneous drift in the bulb. We now state this explicitly in the Discussion to prevent confusion and expand the discussion to incorporate some of these critical ideas (lines 321-330).

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)" they also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      We thank the reviewer. As the issue raised here is related to the previous comment, we have clarified this in the revised text to avoid any misleading comparison. We now specify which aspects of our computational model map onto experimental studies, which aspects we cannot recapitulate, and, as a result, where our comparisons are limited.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      We agree with the reviewer that some of the cells in Shani-Narkiss Figure 8B showed relatively stable responses (while others did not). However, there is a clear monotonic increase in the “Average differences” over time, from “Same day” to “1 month” to “6 month”, as quantified in their Figure 8B. Although the authors concluded that they "find a relatively stable response of single neurons”, we would argue that their data also provide evidence for what we would term “relatively unstable responses”, as found in our model. Per the reviewer’s suggestion, we now clarify this in the text (lines 194-197).

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L3146). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.

      We appreciate the reviewer’s suggestion and have edited the text to make it more accurate (lines 319-320).
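The change index quoted by the reviewer in comment (3) can be written out in a few lines. The following is a minimal sketch of that formula only, not code from Kato et al. or from the authors' model; the function name, the example response values, and the handling of a zero denominator are illustrative assumptions.

```python
def change_index(response_trial_x, initial_response_day1):
    """Change index (CI) as quoted above from Kato et al. (2012):
    (response on trial X - initial response on day 1) /
    (response on trial X + initial response on day 1).
    """
    denominator = response_trial_x + initial_response_day1
    if denominator == 0:
        # Assumption: no response on either occasion is treated as "no change".
        return 0.0
    return (response_trial_x - initial_response_day1) / denominator


# Hypothetical response magnitudes (arbitrary units).
print(change_index(0.0, 5.0))   # complete loss of response: -1.0
print(change_index(5.0, 0.0))   # emergence of a new response: 1.0
print(change_index(5.0, 5.0))   # no change: 0.0
print(change_index(10.0, 5.0))  # a doubling of gain gives a positive CI
```

Because a pure change in gain moves the index away from zero, applying such a function to the model's M/T odor-cell pairs would be one way to make the comparison the reviewer asks for.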

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Line 28 "a graduate alteration in sensory perception". We do not know if drift results in changes in perception. If anything, behavioral evidence suggests that perception remains stable in spite of drift. For example, in Driscoll et al. (2017) mice are able to successfully navigate a virtual T maze despite drift, and in Schoonover et al. (2021), mice maintain aversive responses following fear conditioning, despite drift in the piriform. Finally, spatial navigation appears unimpaired despite pronounced drift in the hippocampus (e.g., Climer et al., 2025). It would be more appropriate to say "stimulus-evoked activity patterns" than "sensory perception" or other words that refer to neuronal activity rather than cognition or behavior.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 27).

      (2) In the introduction, the authors state: "This representational drift has led to the hypothesis that PCx, rather than being a primary sensory area, may be more like an association cortical region." (L76-78). However, the hypothesis that PCx operates as an association cortex comes originally from Haberly's work and thinking (e.g., Haberly and Bower, 1984, elaborated in extensive detail in Haberly, 2001). I think it would be appropriate to acknowledge that here.

      We added the references to acknowledge this, per the reviewer’s suggestion (line 77).

      (3) In the methods, the authors elegantly describe how they induce neurogenesis in their model using weight reshuffling (L805-814). I think it could really help the reader understand the model if this idea were also included in the results section. As the results section currently reads, it seems as if their model implemented neurogenesis in a different fashion: "To do this, following elimination of 10% of the GCs in the network, we added new cells and randomly assigned synaptic weights between these abGCs and M/Ts". I appreciate that in their model, shuffling all the weights of a given GC randomly is akin to "elimination", but I feel like at first blush the results section risks giving an impression a bit different than that actually used in the model.

      We edited the text to make it more accurate per the reviewer’s suggestion (lines 110-112).
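To make the weight-reshuffling idea in comment (3) concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the weight layout (`weights[g][m]` for granule cell `g` and M/T cell `m`), the function name, and the choice to permute a selected cell's existing weights are all hypothetical. Only the 10% fraction comes from the text quoted above.

```python
import random


def reshuffle_granule_cells(weights, fraction=0.10, rng=None):
    """Mimic turnover of granule cells (GCs) by reshuffling their weights.

    weights[g][m] is the (hypothetical) synaptic weight between GC g and
    M/T cell m. A random `fraction` of GCs is selected, and each selected
    cell's weights are randomly permuted in place, which stands in for
    eliminating that cell and adding a new one with random connectivity.
    Returns the sorted indices of the replaced GCs.
    """
    rng = rng or random.Random()
    n_gc = len(weights)
    n_replace = round(fraction * n_gc)
    replaced = rng.sample(range(n_gc), n_replace)
    for g in replaced:
        rng.shuffle(weights[g])  # permute this cell's weights across M/T partners
    return sorted(replaced)


# Toy network: 100 GCs, 5 M/T cells, arbitrary distinct weights.
toy_weights = [[float(10 * g + m) for m in range(5)] for g in range(100)]
replaced_cells = reshuffle_granule_cells(toy_weights, rng=random.Random(0))
print(len(replaced_cells))  # number of GCs "replaced" in this step
```

In this sketch the total number of cells and the distribution of weights stay fixed, so only the pattern of connectivity changes from one simulated day to the next.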

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work develops a simple, rapid, low-cost methodology for assembling combinatorially complete microbial consortia using basic laboratory equipment. The motivation behind this work is to make the study of microbial community interactions more accessible to laboratories that lack specialized equipment such as robotic liquid handlers or microfluidic devices. The method was tested on a library of Pseudomonas aeruginosa strains to demonstrate its practicality and effectiveness. It provided a means to explore the complex functional interactions within microbial communities and identify optimal consortia for specific functions, such as biomass production.

      The primary strength of this manuscript lies in its accessibility and practicality. The method proposed by the authors allows any laboratory with standard equipment, such as multichannel pipettes and 96-well plates, to readily construct all possible combinations of microbial consortia from a given set of species. This greatly enhances access to full factorial designs, which were previously limited to labs with advanced technology.

      Another strength of the manuscript is the measurement and analysis of the biomass of all possible combinations of 8 strains of P. aeruginosa. This analysis provides a concrete example of how the authors' new methodology can be used to identify the best-performing communities and map pairwise and higher-order functional interactions.

      Notably, the authors do exceptionally well in providing a thorough description of the methodology, including detailed protocols and an R script for customizing the method to different experimental needs. This enhances the reproducibility and adaptability of the methodology, making it a valuable resource for researchers wishing to adopt this methodology.

      We thank the reviewer for their thoughtful comments and positive assessment of our work. Below we detail the changes we have introduced in the manuscript to clarify issues raised by the reviewer.

      While the methodology is robust and well-presented, there are some limitations that should be acknowledged more thoroughly. First, the method's scalability is an important factor. The authors indicate that it should be effective for up to 10-12 species, but there is no discussion of what sets this scale: time, amount of labor, consumables, the likelihood of error, sample volume, etc.

      The 10-12 species estimation is based on our own experience implementing the protocol, and set primarily by time, labor, and consumables (as rightly pointed out by the reviewer) rather than conceptual limitations of the approach. We have added clarifications in the Discussion (lines 401-405) regarding these scalability-limiting factors.
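The scaling behind this limit is simple arithmetic: a full factorial design over n species has 2^n presence/absence combinations. The sketch below counts combinations and a lower bound on 96-well plates. It assumes one well per combination and ignores blanks, controls, and the intermediate plates used during assembly, so actual consumable use would be higher; the function names are illustrative.

```python
import math

WELLS_PER_PLATE = 96  # standard 96-well plate


def full_factorial_size(n_species):
    """Number of presence/absence combinations, including the empty one."""
    return 2 ** n_species


def min_plates(n_species, replicates=1):
    """Lower bound on plates: one well per combination per replicate."""
    wells = full_factorial_size(n_species) * replicates
    return math.ceil(wells / WELLS_PER_PLATE)


for n in (8, 10, 12, 20):
    print(n, full_factorial_size(n), min_plates(n))
# 8 256 3
# 10 1024 11
# 12 4096 43
# 20 1048576 10923
```

Each added species doubles the number of wells, which is why time, labor, and consumables rather than the logic of the protocol set the practical ceiling.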

      Second, this methodology is tailored to construct communities where the abundance of each strain is identical in each combination. Therefore, combinations with a different number of strains also differ in the total initial amount of microbial cells. Moreover, variations in the initial proportions of the same set of strains cannot be readily explored.

      Note that the “density homogenization” step is optional and could be skipped entirely, which would result in the same species being present at variable densities across consortia: specifically, skipping this step would make the density of a species in a consortium inversely proportional to the number of species in that consortium. Further variations in initial abundance could be explored by treating the same strain at two (or more) starting abundances as distinct inputs of the protocol, though this would naturally increase the number of combinations to test.

      We have included a paragraph in the Discussion (lines 416-423) describing how we can, in principle, extend our protocol to explore abundance effects.
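The inverse relationship described above can be illustrated with a short enumeration. This is a sketch, not the authors' R script: it assumes each consortium is made by pooling equal volumes of equally dense monocultures, and the function names and species labels are hypothetical.

```python
from itertools import product


def all_consortia(species):
    """Yield every presence/absence combination as a tuple of species names."""
    for bits in product([0, 1], repeat=len(species)):
        yield tuple(s for s, present in zip(species, bits) if present)


def relative_density_without_homogenization(consortium):
    """Per-species density relative to its monoculture, assuming equal
    volumes are pooled and no density homogenization is applied."""
    k = len(consortium)
    return 1.0 / k if k else 0.0


species = ["s1", "s2", "s3"]
for consortium in all_consortia(species):
    print(consortium, relative_density_without_homogenization(consortium))
```

Treating one strain at two starting abundances as two distinct inputs, as suggested in the response, would amount to adding an entry to `species`, which doubles the number of combinations.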

      Third, the manuscript only discusses how to construct the combinations, and not how to assay them afterward (e.g. for community function, interspecific interactions, etc.). While details on how to achieve these goals are clearly outside the scope of this work, the use of biomass as an example function may obfuscate this caveat, which should be stated more explicitly.

      We agree that the manuscript focuses exclusively on the construction of microbial communities and does not address how these communities should be assayed afterward. This is an intentional scope decision. The proposed protocol is fully compatible with a wide range of functional, interaction-based, or omics-based assays. Absorbance is mentioned as an illustrative example of a possible readout, rather than as a recommended or exclusive parameter. We have revised the text to explicitly state that the assessment of community function or interspecific interactions lies outside the scope of this work and must be tailored to the specific biological question being addressed.

      Reviewer #1 (Recommendations for the authors):

      A few specific technical notes and notes about clarity:

      (1) It may be worth being more explicit about how to produce replicates. For example, producing technical replicates by inoculating multiple times from the same set of combinations, while biological replicates require making the combinations multiple times.

      We have updated the main text to clarify this point (lines 780-781).

      (2) Figure 2C: May be worth adding some context to these performance numbers. What are typical accuracies? What would they be in a liquid handler?

      Assessing typical accuracies is nuanced since the error depends not only on the assembly steps, but also on potential intrinsic variation of the specific community function being tested and the method used to quantify it. One of the main reasons for including the experiment using colorant combinations was precisely to minimize these other sources of variation. In this experiment, we find that the error we quantify is consistent with cumulative pipetting variation (as a reference, a typical lab micropipette has an error of 0.5-1%). This is now explicitly mentioned in the manuscript.

      (3) Figure 5A: I realize it is unlikely that strains go extinct in these experiments. But it is still worth clarifying that the number of strains is the number inoculated, rather than the one present at the time of measurement.

      We updated the caption of Figure 5A as recommended by the reviewer.

      (4) Figure 5B: I realize this is just for illustration purposes, but you should provide more information about the magnitude of the difference in performance of these combinations and the confidence in their ranking (or variability in performance across replicates).

      Following this suggestion, we have added a paragraph where we report the variation across replicates for the highest-performing consortia (lines 318-323). Indeed, while variation across replicates is small, it is enough to produce an overlap between the confidence intervals of the function of some of the highest-performing consortia. This is now explicitly acknowledged in the manuscript.

      (5) Figure 5C: I believe the bold black lines indicate the combinations shown in panel D, but that is not explicitly stated.

      We have updated the caption of Figure 5C.

      Reviewer #2 (Public review):

      A simple and effective method for combinatorial assembly of microbes in synthetic communities of <12 species.

      Overall, this manuscript is a useful contribution. The efficiency of the method and the clarity of the presentation are strengths. It is well-written and easy to follow. The figures are great, the pedagogical narrative is crisp. I can imagine the method being used in lots of other contexts too.

      The authors could better clarify what HOIs mean. They could address challenges with assaying community function. However, neither of these “weaknesses” affects the primary goal of the paper which is methodological.

      We thank the reviewer for the positive assessment. With respect to HOIs, we recognize that defining and quantifying them is a non-trivial subject within the broader field of microbial ecology (see e.g. ref. 24 within the manuscript). Since our aim with this manuscript is methodological, as the reviewer notes, here we have done our best to avoid introducing new or ambiguous definitions. For this reason, we simply adopt a definition given in previous works (including refs. 10, 19, 24, 29, 37, and 38 in the manuscript), where the context-dependence of pairwise interaction terms is taken as a signature of HOIs. With respect to the challenges in assaying community function, please see our responses below.

      Reviewer #2 (Recommendations for the authors):

      Overall, this manuscript is a useful contribution, I appreciate the authors taking the time to write it up! I have a few relatively minor comments.

      (1) It would be nice in the introduction to address why we might want the full factorial construction of communities in the first place. This is an especially relevant question in light of the authors' 2023 Nat E&E paper where they showed that the function of communities can often be learned even when only a fraction of all possible communities is measured. This is addressed in part in the paragraph on line 34, but I think it might be worth expanding a bit given the focus of the paper.

      We sincerely appreciate the reviewer’s feedback. In fact, one of the reasons that make full factorial construction desirable is precisely to test theoretical and computational models of community function, including (but not only) the statistical models developed in our 2023 Nature E&E paper. In that work, we showed that low-order models can explain a substantial fraction of the variation in community function in previously-published datasets, but we also predict that the same models could fail under complex structures of microbial interactions (e.g., strong high-order interactions). The protocol we present here enables the empirical quantification of such interactions, making this prediction (and others) directly testable. We have included that clarification in the revised manuscript (lines 56-58).

      (2) Around line 74, I think it is worth mentioning that even this elegant design will face insurmountable practical challenges (time, liquid handling operations, and the number of plates will explode) for full factorial design with 20, 30, 40 species or more. This is relevant for some very complex synthetic consortia that some microbiome groups are constructing (e.g. hCom2 from the Huang/Fischbach groups) https://www.sciencedirect.com/science/article/pii/S0092867422009904.

      We agree with the reviewer that full factorial designs become impractical for very large species pools. These limits are now more clearly mentioned in the revised manuscript. We refer the reviewer to our response to comment #1 by Reviewer 1 for further details.

      (3) The binary construction is a really nice clean way to explain the protocol. Appreciate the pedagogy!

      We thank the reviewer for the appreciation.

      (4) In the experiment with Pseudomonas strains, the consortia are grown in LB. This medium will support growth to relatively high OD (>1). At these densities, OD is almost certainly not linear in cell density, and this nonlinearity likely depends on strain identity. In this case, the assumption of additivity may not hold. As a result, some of the observed "interactions" may simply be non-linearity in the assay and not the abundance of bacteria in the communities. Of course, this does not affect the assembly protocol in any way, but it does complicate the interpretation of interactions via this assay. I think this is worth pointing out since other researchers may have to think carefully about the assay they use when constructing these synthetic consortia. I think in this methods paper it is important to emphasize this so other researchers do not mistakenly identify interactions due to issues with the assay.

      We thank the reviewer for pointing out this important aspect. In our experiment, we use Abs<sub>600</sub> simply as an example of a measurable community-level function. The reviewer is absolutely correct in that mapping absorbance to biomass is nuanced at large OD values, where this relationship becomes non-linear. While this is not an issue from the perspective of the protocol itself, it is indeed an important consideration for users who may want to obtain reliable quantifications of biomass. We have updated the manuscript to explicitly mention this potential issue (lines 307-313). We have also emphasized the fact that our focus on Abs<sub>600</sub> is strictly for illustrative purposes, and we have removed all instances where a direct mapping from Abs<sub>600</sub> to biomass was implied in the text.

      (5) Subtle point regarding HOIs. HOI (or pairwise) statistical interactions need not quantitatively be the same as interactions in a Lotka-Volterra sense. I realize the authors do not explicitly use the term "interaction" in a gLV model formalism, but this is how the majority of readers will interpret this term. I believe it is a research question as to how pairwise gLV interactions manifest themselves in terms of functional interactions. For example, a purely pairwise LV model could easily have HOI "functional interactions" if the function is total abundance, since abundances depend nonlinearly on LV interactions. I think this part of the manuscript could be confusing to readers for this reason. I think the term "functional interaction" really helps with this issue, but I am just asking the authors to make sure this is clear.

      I say this because ref: 37 is focused on HOIs in an LV sense. Here, as the authors are aware, they are computing statistical "interactions" in the sense of epistasis. Given that they are computing this epistasis averaged across all community compositions a more appropriate citation might be [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004771] where the same quantity is computed in a protein context.

      We thank the reviewer for pointing out this important issue. Indeed, we use the term “interaction” in a statistical sense (as the deviation of the observed community function from a null, additive expectation) rather than in a Lotka-Volterra sense. We agree that the reference suggested by the reviewer is more appropriate in this context. We have updated the reference list accordingly.
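The statistical sense of "interaction" used here can be illustrated with the standard epistasis-style formula: for species i and j in a background B, the interaction is F(B ∪ {i, j}) − F(B ∪ {i}) − F(B ∪ {j}) + F(B), i.e. the deviation from an additive expectation. The sketch below assumes this form; the manuscript's exact definition may differ. The toy function values, species labels, and function names are made up for illustration and are not data from the study.

```python
from itertools import combinations


def pairwise_interaction(F, i, j, background=()):
    """Deviation of the observed function from the additive expectation
    for species i and j in a given background consortium."""
    b = frozenset(background)
    return F[b | {i, j}] - F[b | {i}] - F[b | {j}] + F[b]


def interactions_across_backgrounds(F, species, i, j):
    """Compute the i-j interaction in every background built from the
    remaining species; returns {background: interaction}."""
    others = [s for s in species if s not in (i, j)]
    result = {}
    for k in range(len(others) + 1):
        for bg in combinations(others, k):
            result[frozenset(bg)] = pairwise_interaction(F, i, j, bg)
    return result


def toy_function(consortium):
    """Hypothetical community function: additive, plus a bonus when
    A and B co-occur in the absence of C."""
    base = sum({"A": 1, "B": 2, "C": 3}[s] for s in consortium)
    bonus = 5 if {"A", "B"} <= set(consortium) and "C" not in consortium else 0
    return base + bonus


species = ("A", "B", "C")
F = {frozenset(c): toy_function(c)
     for k in range(len(species) + 1)
     for c in combinations(species, k)}

result = interactions_across_backgrounds(F, species, "A", "B")
print(result[frozenset()])       # 5: A and B interact when C is absent
print(result[frozenset({"C"})])  # 0: purely additive when C is present
```

The same pair has a different interaction in different backgrounds, which is the context dependence that the responses above take as a signature of higher-order interactions. A full factorial dataset is what makes every entry of `F` available for this calculation.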

      (6) Figure 5G - a little hard to see. Any way to show this data more clearly? It looks like all interactions have a mean of 0 because of the way the data are presented.

      The reviewer is indeed correct in that, as defined, the interactions that we quantify are background-dependent, and their average across backgrounds lies near zero for all species. More than an issue with the representation, we think that this is an important empirical observation: it indicates that the same species pair may interact positively or negatively depending on its ecological context. We believe that the current representation is most appropriate for making this clear, but we would be open to discussing alternatives if the reviewer had a specific suggestion in mind.

      Reviewer #3 (Public review):

      The authors developed a useful methodology for generating all combinations of multiple reagents using standard lab equipment. This methodology has clear uses for studying microbial ecology as they demonstrated. The methodology will likely be useful for other types of experiments that require exhaustive testing of all possible combinations of a given set of reagents (e.g., drug-drug antagonism and synergy).

      The authors provided a useful R script that generates a detailed experimental protocol for building the desired combination from any number of reagents. The produced document is useful and has clear instructions. The output of the computer script will be strengthened if graphical output is also provided (similar to the one provided in Figure 1C).

      The authors show that the error rate of the method doesn't go up with the number of combinations using dyes (Figure 2).

      The authors demonstrate the value of their methodology for studying interactions within microbial consortia by assembling all possible combinations of eight strains of Pseudomonas aeruginosa. The value of their methodology for this application is well-founded. However, it is also unclear why specific experimental choices were made for this application. It is unclear why authors continue to show the absorbance measurements of strain assemblies over the entire wavelength spectrum and not just for ABS 600 nm (Figures 3 and 4). It is also unclear why the authors provided information on the "sum of the three spectra" as this reference line is meaningless and not a reasonable null model for estimating how well specific strain combinations will grow together.

      Figure 5 illustrates the various analysis types that can be performed on the data collected from growing combinations of eight Pseudomonas aeruginosa strains. It is a very informative figure since it provides a "roadmap" on the various ways in which the dataset produced can be explored. The information in Figures 5 and S6 will likely be very useful for a wide audience.

      Reviewer #3 (Recommendations for the authors):

      (1) Congratulations. I think the manuscript lays out a simple and very elegant methodology that will be useful for many. While I think the method is overall well explained and rationalized, the paper can greatly benefit from further expansion of Figure 5 at the expense of Figures 3 and 4.

      We thank the reviewer for their thoughtful assessment of our work. We have considered the recommendations and discuss the following points in response.

      (2) Unless I am missing something, there is no reason to present data collected across the entire wavelength spectrum for microbial assemblies (Figures 3 and 4). Moreover, using the same color palette for bacterial strains (Figure 3A) and colorants (Figure 2) is highly confusing. I suggest considering using only the 600 nm wavelength for any data collected from microbial assemblies and using a very different color palette for bacteria and colorants to avoid misinterpretation of the data.

      We thank the reviewer for this suggestion. Our goal with Figures 3-4 was to illustrate the convenience of the protocol and the ease with which many measurements can be performed in parallel once the combinatorial assembly has been completed. While we focus on Abs<sub>600</sub> for all subsequent analyses, we chose to display the full spectra in Figs. 3-4 in hopes that future studies can make use of our rich dataset to interrogate questions on microbial interactions, with the option to focus on other wavelengths (which can effectively be treated as different community-level functions in their own right; for instance, we have previously used Abs<sub>405</sub> as a proxy for siderophore concentration). We think there is value in Figs. 3-4 in their current form to make this clear to readers.

      (3) Unlike dye absorbance, bacterial carrying capacity has an upper limit, so summing individual population absorbance as a reference line seems unjustified. If the summation of absorbance is meant to provide a "null model" for expected growth, a more suitable model should be considered (e.g., max spectra or a weighted sum of the spectra from individual members).

      We agree with the reviewer that our null model is not biologically constrained, and we did not intend to imply that the additive expectation was derived from biological principles. Instead, this additive expectation should be interpreted as a simple statistical baseline with minimal assumptions. The use of an additive baseline for quantifying microbial interactions has been addressed in the literature (see, e.g., references 10, 19, 24, 29, 37, and 38), and so here we chose to conform to this convention to avoid introducing new, non-standard quantifications of pairwise and higher-order interactions. We have revised the text to make this more explicit.
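
      As a minimal illustration of the additive baseline described above, the following Python sketch computes the deviation of a consortium's measured function from the sum of its members' monoculture values. The strain names and absorbance values are hypothetical placeholders rather than data from the study, and the function names are ours.

```python
def additive_expectation(monoculture, members):
    """Sum of the monoculture values of the listed members (the additive baseline)."""
    return sum(monoculture[m] for m in members)


def interaction(observed, monoculture, members):
    """Deviation of the observed community function from the additive baseline."""
    return observed - additive_expectation(monoculture, members)


# Hypothetical Abs600 values, for illustration only (not data from the study).
monoculture = {"strain_1": 0.40, "strain_2": 0.30, "strain_3": 0.20}
observed_trio = 0.60

deviation = interaction(observed_trio, monoculture, ["strain_1", "strain_2", "strain_3"])
print(f"additive expectation: {additive_expectation(monoculture, list(monoculture)):.2f}")
print(f"deviation from additivity: {deviation:.2f}")
```

      A negative deviation only indicates that the consortium's measured function falls below the sum of its members' values; it carries no mechanistic claim, which is the sense in which the baseline is statistical rather than biological.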

      (4) The R script is a valuable tool. I think that a valuable improvement will be to also generate visual representations as part of the script’s output such as the colored plates in Figure 1C that are specific to the generated protocol.

      We have updated the script so that it now also outputs a table specifying the location of each consortium within the plates. We chose to make this a text, rather than a graphics output, to ensure cross-device compatibility.

      (5) The discussion rightly acknowledges the potential to extend the protocol to larger libraries using liquid handlers. To facilitate this implementation, it might be beneficial to modify the script output so that the ‘volume’, ‘plate’, and ‘column’ values are tab- or comma-delimited.

      We thank the reviewer for the suggestion. We have modified the output so that it is now tab-delimited.
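
      To illustrate what a tab-delimited layout table of this kind could look like, here is a short Python sketch. It is not the R script itself: the well-assignment order (filling 96-well plates column by column) and the column names `plate`, `column`, `row`, and `consortium` are assumptions made for the example, as are the strain labels.

```python
import csv
import io
from itertools import product


def plate_layout(species, rows="ABCDEFGH", n_cols=12):
    """Enumerate all presence/absence combinations and assign each to a well.

    Assumed layout (illustrative only): wells are filled column by column,
    and a new plate is started whenever the current one is full.
    """
    wells_per_plate = len(rows) * n_cols
    records = []
    for idx, presence in enumerate(product([0, 1], repeat=len(species))):
        plate, within = divmod(idx, wells_per_plate)
        col, row = divmod(within, len(rows))
        members = [s for s, present in zip(species, presence) if present]
        records.append({
            "plate": plate + 1,
            "column": col + 1,
            "row": rows[row],
            "consortium": "+".join(members) or "none",
        })
    return records


def to_tsv(records):
    """Serialize the layout as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["plate", "column", "row", "consortium"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical labels for an eight-member library.
records = plate_layout([f"S{i}" for i in range(1, 9)])
tsv = to_tsv(records)
print(tsv.splitlines()[0])
print(tsv.splitlines()[-1])
```

      A delimited table of this form can be parsed directly by most liquid-handler software or spreadsheet tools, which is the practical benefit of the tab-delimited output.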

      (6) Figures 3 and 4 do not provide a lot of insight. I would suggest combining them into a single figure and using only absorbance values at 600 nm. It would also be interesting to add a histogram of these absorbance values and possibly show histograms for subgroups (e.g. all assemblies with more than 3 strains vs all assemblies with 3 or fewer strains).

      With respect to Figs. 3 and 4, we refer the reviewer to our response to comment #2. With respect to the histogram/subgroups plot, we understand that this would be a slightly modified version of the current Fig. 5A, where we show means and standard deviations across all subgroups of 1 to 8 species, and so we find it unclear what this figure would add.

      (7) With the recommendations of removing or reworking Figures 3 and 4, and the fact that Figure 5 is data-rich (and extremely useful), it would be beneficial to split Figure 5 and include the data shown in Figure S6 in the main figure. The analysis in Figure S6 is valuable and it might be beneficial to elevate this analysis to a primary figure and provide a detailed explanation of its rationale and methods in the main text.

      We appreciate this suggestion. In our view, both the text and the figures benefit from a heavy focus on the assembly protocol, as this is the main contribution of this work. While we do think it is valuable to highlight the type and amount of data that can be collected with a full factorial assembly, as well as the types of analyses that can be performed with these data, we are concerned that allocating more space to these analyses may distract readers from the methodology itself. We have therefore chosen to keep the original structure for Figs. 5 and S6.

    1. Author response:

      Reviewer 1:

      We thank the reviewer for bringing a critical theoretical distinction to our attention. We agree that the Temporal Generalization (TG) results specifically rule out the reinstatement of post-onset neural codes, the idea that the brain pre-activates the same neural representation evoked by the stimulus. In fact, we mention in the discussion: "This temporal variability underscores the need for a more nuanced view of what constitutes predictive pre-activation, as no stable representational state appears to persist after word presentation that could serve as its target.".

      To our understanding, prediction is rarely explicitly defined in the literature, and the distinction between predictive pre-activation and other forms of prediction is seldom made. Moreover, the idea of compressed or abstract forms of pre-activated representations has not, to our knowledge, been explicitly articulated in the literature. Our TG findings therefore put meaningful constraints on theories of prediction. In the revision, we will expand on this and include a broader description of potential forms of pre-activation. We will emphasize that the TG results specifically rule out that the brain pre-activates the same neural code used for sensory-evoked processing.
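
      For readers unfamiliar with temporal generalization, the logic can be sketched on synthetic data: a decoder is fit at one time point and tested at every other time point, so that high off-diagonal accuracy indicates a shared neural code. The example below is an illustration only. It uses a simple class-mean readout and made-up two-sensor patterns (all names and values are assumptions), not the decoder or data used in the study.

```python
import random


def make_trials(n_trials, patterns, noise=0.5, seed=0):
    """Synthetic trials: patterns[t] is the class +1 pattern at time t (class -1 is its negation)."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n_trials):
        y = rng.choice((-1, 1))
        labels.append(y)
        data.append([[y * v + rng.gauss(0.0, noise) for v in p] for p in patterns])
    return data, labels  # data[trial][time][sensor]


def fit_readout(data, labels, t):
    """Sum of label-signed patterns at time t (proportional to the class-mean difference for balanced classes)."""
    w = [0.0] * len(data[0][t])
    for trial, y in zip(data, labels):
        for i, v in enumerate(trial[t]):
            w[i] += y * v
    return w


def accuracy(w, data, labels, t):
    """Fraction of trials whose projection onto w at time t has the sign of the label."""
    hits = 0
    for trial, y in zip(data, labels):
        score = sum(wi * v for wi, v in zip(w, trial[t]))
        hits += (score > 0) == (y == 1)
    return hits / len(labels)


def temporal_generalization(train, test, n_times):
    """Matrix of accuracies: rows index training time, columns index testing time."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    return [[accuracy(fit_readout(x_tr, y_tr, t_tr), x_te, y_te, t_te)
             for t_te in range(n_times)]
            for t_tr in range(n_times)]


# Times 0 and 1 share a code; time 2 uses an orthogonal code (illustrative values).
patterns = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train = make_trials(400, patterns, seed=0)
test = make_trials(400, patterns, seed=1)
tg = temporal_generalization(train, test, n_times=len(patterns))
for row in tg:
    print(" ".join(f"{a:.2f}" for a in row))
```

      In this toy case, decoders generalize between the two time points that share a pattern but fall to near chance when tested on the orthogonal code, which is the kind of signature that distinguishes a reinstated representation from a changing one.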

      Moreover, although the TG analysis does not rule out alternative notions of predictive pre-activation, we believe our second analysis (the inclusion of future word embeddings) provides independent evidence that argues against more abstract forms of prediction. Unlike the TG analysis, this encoding approach is not constrained to a specific neural code; if the brain represented upcoming words in any linearizable format (abstract, probabilistic, or latent), incorporating those embeddings should have improved the brain score at the current word's onset. We found no such improvement until the word was actually heard. In the revised manuscript, we will reformulate the narrative to clarify that while TG alone rejects a specific form of pre-activation, the combined evidence from both analyses suggests a broader lack of predictive pre-activation.

      Reviewer 2:

      We thank the reviewer for their constructive feedback and for bringing to our attention the missing information in our Methods section. We realized that the final two sections were inadvertently omitted during formatting changes before submission. These will be restored in the revised version.

      We appreciate the reviewer's careful reading of this analysis and agree that the concern about whether the decorrelation in Figure 4 forces the model to unlearn the associations between pre- and post-onset activity is a valid one. To clarify, this is not what we intended to claim. Rather, our argument follows a different logic: if we assume that pre-onset encoding is purely a signature of predictive pre-activation, then decorrelating the pre- and post-onset brain responses should effectively remove that signature. The fact that pre-onset encoding remains largely intact after this procedure suggests that our initial premise was false; the observed pre-onset encoding is likely not a signature of pre-activation. We would also like to note that this analysis uses both residualized neural data and decorrelated embeddings. Therefore, the majority of stimulus dependencies are removed. Nevertheless, as the reviewer notes, some dependencies, such as bigrams and other word co-occurrences, inevitably remain. These dependencies might explain the remaining pre-onset encoding we observed, which aligns with the main message of the paper. In the revision, we will provide a detailed description of the decorrelation process and make this interpretive logic more explicit in the main text.

      Reviewer 3:

      We are grateful for the reviewer’s detailed comments and for raising several points that will significantly improve the clarity and comparability of our study. Specifically, the reviewer’s feedback helped us realize that our evidence for postdiction required further clarification. While the encoding of the immediately preceding word ($d-1$) may involve recognition lags, we observe that word $d-2$ further improves the brain score even after the current word's onset, beyond what is explained by word $d-1$ alone. This may extend beyond simple recognition delays. To address this, we will visualize this effect further in the upcoming version and expand the manuscript to include alternative explanations for this observation, such as extended lexical processing or integration delays.

      To ensure our results are not biased toward high-frequency or function words, we will re-run our analyses including multi-token words. Given that these words constitute a small part of the datasets, we expect our core findings to remain stable.

      In line with our response to reviewer 2, we will more clearly emphasize that despite our extensive controls, we cannot be sure that we accounted for all regularities inherent to natural speech.

      Additionally, we will increase the context windows of the LLM to match the larger windows used in previous literature and add significance tests, error bars, and noise floor indications to our figures to ensure the reliability and variability of our findings are clearly communicated.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Completeness and clarity of Methods (Weakness #1).

      We will substantially expand the Methods section to include:

      (a) Detailed information on C. difficile strain ribotype 1382 (correcting the typographical error "1482"), including its virulence characteristics, toxin production dynamics, and rationale for its selection.

      (b) Step-by-step protocols for on-chip bacterial quantification by flow cytometry, including sample collection volume, processing, and the specific normalization procedure (with clarification that normalized values are intended for within-experiment comparisons only).

      (c) Full description of mouse experiments: antibiotic pre-treatment regimen, inoculation details (spores vs. vegetative cells, justification of the 1×10^9 CFU dose), animal numbers, housing conditions, and cage-effect considerations. The IACUC approval statement will be moved from Acknowledgments to Methods.

      (2) Mucin layer characterization under anoxia (Weakness #2a).

      We will clarify in the Methods that mucin staining was performed after the initial oxic culture phase to confirm differentiation prior to anaerobic challenge. We will cite relevant literature discussing the stability of pre-formed mucin layers under short-term anoxic conditions and incorporate this discussion to contextualize our experimental design in the revised Methods.

      (3) Discrepancy in C. difficile counts and mechanism of LXA4 action (Weakness #2b, #3).

      We will provide a detailed explanation of our flow cytometry normalization algorithm, emphasizing that values are only comparable within a given experimental batch. We plan to perform additional in vitro experiments to directly assess the effect of LXA4 on bacterial growth and toxin secretion. These data will help distinguish between direct antibacterial effects and host-mediated protection, and the revised Discussion will incorporate this analysis.

      (4) Missing controls and experimental timelines (Weakness #2c–d).

      We will clarify that Figure 4 presents gut-on-chip experiments, not animal studies. The corresponding methods will be fully described. Additionally, we will include cross-experiment alignment analyses (using the CDI group as a common reference) to integrate negative control data from separate experimental batches. We also plan to generate additional data examining the effect of LXA4 alone (without infection) on epithelial barrier integrity and inflammatory status, which will be included as supplementary controls.

      (5) C. difficile strain characterization (Weakness #1g).

      A comprehensive section on ribotype 1382 will be added to the Methods, detailing its in vitro growth kinetics, toxin production profiles, and disease dynamics in the murine model, with appropriate literature citations.

      (6) Dysbiosis definition and phrasing adjustments (Other comments #b–d).

      We will revise the text to provide a clear definition of dysbiosis in the context of CDI. We will also temper the phrasing in line 82 to more accurately describe the advantages of our GOC system relative to other in vitro models, and correct the description of C. difficile as an obligate anaerobe.

      Reviewer #2 (Public review):

      (1) Synergy between LXA4 and vancomycin in vivo.

      We agree that the synergistic effect observed in the GOC model requires validation in an animal model. We are currently conducting mouse experiments to test the combination of prophylactic LXA4 with vancomycin treatment. The results will be included as a new Figure 5 in the revised manuscript.

      We are confident that these planned revisions will fully address the reviewers' concerns and significantly enhance the rigor and impact of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a reanalysis of a large existing dataset to examine whether serial dependence effects (systematic influences of recent stimulus history on current perceptual judgments) are associated with generalization in perceptual learning. The central hypothesis is that extended, longer-range history effects (beyond the most recent trials) are beneficial for transfer across locations. The authors reanalyze data from a texture discrimination task in which observers discriminated peripheral target orientation against a line background, with performance quantified by stimulus-onset asynchrony thresholds. Three training conditions were compared: a fixed single-location condition, a two-location alternating condition, and a dummy-trial condition with frequent target-absent trials. Transfer was assessed after training at new locations. Serial dependence was quantified using history-sequence analyses and linear mixed-effects models estimating bias weights across stimulus lags, with summary measures distinguishing recent (1-3 trials back) and more distant (4-6 trials back) dependencies.

      The authors report extended serial dependence effects, persisting up to 6-10 trials back, with substantial cumulative bias that remains stable across multiple days of training and is not correlated with overall performance thresholds. Recent history effects are stronger for faster responses, suggesting a contribution from decision- or response-related processes, whereas more distant effects decline within sessions, potentially reflecting adaptation dynamics. Critically, longer-range serial dependence is significantly stronger in training conditions that promote generalization than in the single-location condition. Individual differences in the strength and decay profile of distant history effects predict the magnitude of transfer across locations, whereas recent history effects do not. History effects are also correlated across trained locations, suggesting stable individual differences.

      The authors interpret longer-range serial dependence as reflecting integrative processes that extract task-relevant structure over time, thereby supporting generalization, while shorter-range effects are attributed to more transient mechanisms such as priming or decision-level bias. The discussion connects these findings to Bayesian accounts of perceptual stability and to concepts of overfitting in machine learning.

      The study offers a novel and thoughtful link between short-term serial dependence and long-term generalization in perceptual learning, helping bridge two literatures that are often treated separately. The large dataset enables robust estimation of individual differences, and the use of mixed-effects modeling appropriately accounts for variability across observers. The empirical distinction between recent and more distant history effects is well-supported and adds important nuance to interpretations of serial dependence. Converging evidence from both group-level comparisons and individual-level correlations strengthens the central conclusions.

      Several limitations should be addressed. First, the study relies entirely on previously collected data, without experimental manipulations designed to selectively isolate serial dependence mechanisms. Filtering choices, while theoretically motivated, may amplify history effects in ways that are difficult to quantify. Second, sequential dependencies can arise from multiple sources, including gradual updating of internal weight structures, adaptation processes, and history-dependent biases in decision-making. The current analyses do not clearly separate these contributions, limiting mechanistic attribution of long-range effects. Third, the conclusions are based on a single perceptual task, leaving open questions about generality across paradigms. Finally, while the discussion references computational ideas, no explicit modeling is provided to test whether plausible learning rules can jointly account for the observed history profiles and transfer effects.

      We now address these issues in the manuscript (see below for detailed responses) and provide a toy model (supplementary material) where the observed effects are explained by simple learning mechanisms.

      The findings align with theoretical frameworks that conceptualize perceptual learning as gradual reweighting of stable sensory representations at the decision stage (e.g., Petrov et al., 2005). Trial-by-trial updates in these models naturally give rise to sequential dependencies and sensitivity to training statistics. The observation that longer-range history effects predict generalization is consistent with broader temporal integration supporting more flexible learning, while narrower integration may lead to specificity. The results also indicate that multiple mechanisms, including decision-level biases and adaptation, may coexist with reweighting processes, highlighting the value of hybrid accounts.

      In summary, this is a careful and data-rich reanalysis that highlights a potentially important role for serial dependence in enabling generalization during perceptual learning. While the underlying mechanisms remain underspecified, the evidence supporting the reported associations is strong, and the work provides a valuable empirical foundation for further experimental and modeling efforts.

      Reviewer #2 (Public review):

      This manuscript investigates how people's perceptual reports are influenced by events and trials in the past, and how this long-range dependence relates to broader learning across locations in a visual learning task. The authors present clear and internally consistent analyses showing that extended temporal integration is associated with greater generalization of learning. The study is thought-provoking and may contribute meaningfully to understanding how short-term influences and long-term improvement interact, although several interpretational points would benefit from clarification.

      Strengths:

      (1) The manuscript identifies unusually long-range perceptual biases extending up to ten trials back, which is a striking and potentially important finding.

      (2) The association between strong long-range dependence and greater learning generalization is clearly documented and supported by consistent analyses.

      (3) The dataset is large and rich, and the authors apply repeated and well-controlled analyses that give confidence in the stability of the effects.

      (4) The writing is generally clear, and the manuscript raises interesting conceptual links between temporal integration and generalization of learning.

      Weaknesses / Points Requiring Clarification:

      (1) The manuscript repeatedly equates generalization with increased efficiency, but this relationship is not universally true. In some populations or tasks, excessive generalization can reduce task-specific efficiency. The authors should discuss this context-dependence to clarify when generalization is beneficial versus detrimental.

      We agree with the reviewer that generalization does not strictly imply increased efficiency; in some contexts, over-generalization can indeed be detrimental. We now explicitly note in the Introduction that serial dependence can impair performance when stimuli vary randomly across trials. We have reviewed the manuscript to ensure we do not explicitly equate generalization with efficiency. Our argument is specifically that long-range SDEs support the transfer of learning (generalization).

      (2) Serial dependence is also present, though smaller, in the central fixation task. It remains unclear whether this bias could contribute to the serial dependence observed in the main task. The authors should clarify whether the two biases are independent or whether the central-task bias might partially influence orientation judgments in the main task.

      These two tasks are independent: one requires T/L discrimination, the other V/H discrimination. See our detailed response below.

      (3) Several figure captions and labels contain minor inconsistencies in formatting and terminology. Careful proofreading would improve clarity.

      We thank the reviewer for pointing this out and have proofread the captions to improve formatting and terminology consistency throughout.

      Reviewer #3 (Public review):

      This reanalysis of a classic study of visual perceptual learning in a texture discrimination task convincingly demonstrates the presence of sequential dependence effects, commonly seen in response time analyses in 2-alternative tasks, on response accuracy in the texture task in the visual periphery and in a simultaneous central letter report at fixation. Overall, this paper provides a new and interesting analysis of the effects of sequential dependencies from trial to trial on performance, learning, and generalizability in perceptual learning.

      Strengths:

      This new analysis of sequential dependency effects (SDEs) extends commonly observed sequential effects in two-choice reaction times to accuracy and relates them to response accuracy during visual learning in a frequently used perceptual learning task. The paper makes a convincing case that different conditions known to impact generalization of learning to a second visual location also express quantitatively distinct n-back SDEs.

      Weaknesses:

      Most of the new analyses emphasize the effects of SDEs, including trials designed to enhance the size of the effects, specifically when the current trial is low visibility, and the prior trial is of high visibility. Unless there is an argument that learning and subsequent generalization primarily occur in low-visibility trials, the presentation should also include displays and an emphasized discussion of analysis for all trials, unfiltered.

      We analyze effects on close-to-threshold (small-to-medium SOA) current targets preceded by above-threshold (high SOA) reference targets. This choice is motivated by both technical issues and theoretical assumptions. In psychophysics, when using percent correct as a measure of performance, bias cannot be reliably estimated at or near ceiling performance, as correct responses leave little room for bias to manifest. Regarding the ‘easy’ targets used as a reference, presenting them at low SOA would introduce uncertainty as to the reference orientation against which bias is measured, making their perceptual effect ambiguous. Theoretically, we note that in perceptual learning with threshold targets, the introduction of clear targets in the absence of feedback enables learning (see Discussion, where we added: 'Most interestingly, in our experiments without feedback on the texture task, the experimental conditions yielding the strongest bias were also found to enhance learning in the absence of feedback (Liu et al., 2012)').

      We have also addressed this concern by conducting additional robustness analyses with unfiltered prior-trial history. We analyzed data without the prior-visibility filter; results are presented in a new Supplementary Figure S3 and confirm our main findings (see addition to Methods: "Finally, to verify that our findings are not artifacts of these filtering choices, we also conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm the robustness of our main findings.").

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) How manipulations of stimulus statistics, uncertainty, or feedback could selectively engage different forms of serial dependence

      We expect serial dependence to be modulated by all these parameters. In classical SDT, stimulus statistics are known to affect response bias, as are temporal correlations in stimulation sequences. We note in the manuscript that we employed random sequences (50% chance for V and 50% for H targets), eliminating expectation-based biases toward either orientation. Stimulus uncertainty is known to increase serial dependence, as we also found here. Feedback is also expected to have an effect; the literature is somewhat ambiguous about this, and the effect may depend on experimental design. We note that the main task studied here (TDT) had no feedback while the central T/L task did have feedback, with both showing serial dependencies. In the manuscript we point to reviews of SDE where much of this is discussed.

      (2) How explicit computational models could help distinguish decision bias from structural learning

      We use the drift diffusion model (DDM) to distinguish decision bias (the starting point in the DDM) from structural learning (changes in drift rate). The DDM predicts that decision bias is short-lived and mainly affects fast reaction times (RTs), whereas biases due to drift rate asymmetry persist to long RTs. We present these results in Figure 3.
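
      This DDM prediction can be checked with a small simulation. The Python sketch below is illustrative only: it uses a plain Euler discretization with arbitrary parameter values (bound, noise, bias sizes, trial count are all assumptions) and is not the analysis reported in Figure 3. It compares choice bias in the faster and slower halves of the RT distribution when the bias comes from the starting point versus from the drift rate.

```python
import random


def simulate_ddm(n_trials, drift, start, bound=1.0, noise=1.0, dt=0.005, seed=0):
    """Simulate a two-bound diffusion; return (decision_time, chose_upper) per trial."""
    rng = random.Random(seed)
    step_sd = noise * dt ** 0.5
    trials = []
    for _ in range(n_trials):
        x, t = start, 0.0
        while abs(x) < bound:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        trials.append((t, x > 0))  # x > 0 here means the upper bound was reached
    return trials


def bias_by_speed(trials):
    """Fraction of upper-bound choices in the faster and slower halves of trials."""
    ordered = sorted(trials, key=lambda tr: tr[0])
    half = len(ordered) // 2
    fast, slow = ordered[:half], ordered[half:]

    def frac_upper(group):
        return sum(choice for _, choice in group) / len(group)

    return frac_upper(fast), frac_upper(slow)


# Two ways of favoring the upper bound (values chosen for illustration).
start_fast, start_slow = bias_by_speed(simulate_ddm(3000, drift=0.0, start=0.3))
drift_fast, drift_slow = bias_by_speed(simulate_ddm(3000, drift=0.3, start=0.0))

print(f"starting-point bias: fast={start_fast:.2f} slow={start_slow:.2f}")
print(f"drift-rate bias:     fast={drift_fast:.2f} slow={drift_slow:.2f}")
```

      Under these assumptions, a starting-point offset should produce a bias concentrated in the fast half, while a drift asymmetry should produce a bias of similar size in both halves, which is the dissociation the RT analysis relies on.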

      (3) Whether similar relationships are observed in other perceptual domains

      We are not aware of any other study linking serial dependence and perceptual learning or reporting such a link. We expect the link between long-range serial dependence and learning generalization to extend beyond the TDT (see new paragraph in Discussion). We hope this framework will motivate similar analysis in other labs where comparable datasets exist.

      (4) How sensitive are the results to the filtering choices used in the analysis?

      We analyze effects on close-to-threshold (small-to-medium SOA) current targets preceded by above-threshold (high SOA) reference targets. This choice is motivated by both technical issues and theoretical assumptions. In psychophysics, when using percent correct as a measure of performance, bias cannot be reliably estimated at or near ceiling performance, as correct responses leave little room for bias to manifest. Regarding the ‘easy’ targets used as a reference, presenting them at low SOA would introduce uncertainty as to the reference orientation against which bias is measured, making their perceptual effect ambiguous. Theoretically, we note that in perceptual learning with threshold targets, the introduction of clear targets in the absence of feedback enables learning (see Discussion, where we added: 'Most interestingly, in our experiments without feedback on the texture task, the experimental conditions yielding the strongest bias were also found to enhance learning in the absence of feedback (Liu et al., 2012)').

      We have also addressed this concern by conducting additional robustness analyses with unfiltered prior-trial history. We analyzed data without the prior-visibility filter; results are presented in a new Supplementary Figure S3 and confirm our main findings (see addition to Methods: "Finally, to verify that our findings are not artifacts of these filtering choices, we also conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm the robustness of our main findings.").

      Reviewer #2 (Recommendations for the authors):

      (1) Clarify mechanisms underlying long-range serial dependence. Please better distinguish possible sources of serial dependence (e.g., decision bias, adaptation, reweighting) and clarify which interpretations are supported or remain ambiguous given the current analyses

      Our manuscript discusses the mechanisms underlying the dissociation between recent and distant SDEs in the Discussion section. Specifically, we report that:

      Recent SDEs are RT-dependent (stronger with faster responses), consistent with decision-level criterion shifts (Dekel & Sagi, 2020).

      Distant SDEs are RT-independent, consistent with neural reweighting/template updating.

      We also discuss the role of sensory adaptation in truncating long-range integration, supported by within-session decline of SDEs, reduced distant SDEs in the 1loc condition, and the original findings by Harris et al. (2012).

      We have added an explicit acknowledgment that our correlational approach cannot definitively establish causality (see addition to Discussion: "While these converging findings support distinct mechanisms for recent and distant SDEs, our correlational approach cannot definitively establish causality, and targeted experimental manipulations would further strengthen these interpretations.").

      (2) Test robustness to analytic choices

      We have conducted robustness analyses by removing the prior-trial visibility filter. The results are presented in a new Supplementary Figure S3 and confirm that our key findings remain qualitatively unchanged (see addition to Methods referencing Supplementary Figure S3).

      (3) Strengthen the computational link

      We have expanded the Discussion to reference relevant computational models and specify predictions for future modeling work. We now cite Petrov et al. (2005). We provide a toy model implementing trial-by-trial template updating that shows an SDE correlated with learning transfer. Importantly, in this model, the long-range SDE is a consequence of learning dynamics (see new paragraph in Discussion, and model simulation in the supplementary material).
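
      To illustrate how a trial-by-trial update can by itself produce long-range history effects, here is a minimal sketch. It is not the toy model from the supplementary material: the scalar template, the delta-rule update, and the names `run_template`, `history_weights`, and `alpha` are all assumptions introduced for illustration. With a learning rate `alpha`, the template before each trial is an exponentially weighted sum of past stimuli, so a small `alpha` (slow learning) spreads the influence over many trials back.

```python
def run_template(stimuli, alpha):
    """Hypothetical delta-rule template; returns its value before each trial."""
    template = 0.0
    before = []
    for s in stimuli:
        before.append(template)            # bias carried into the current trial
        template += alpha * (s - template)  # move the template toward the stimulus
    return before


def history_weights(alpha, max_lag):
    """Weight of the n-back stimulus implied by the update above (n = 1..max_lag)."""
    return [alpha * (1 - alpha) ** (lag - 1) for lag in range(1, max_lag + 1)]


# Stimuli coded as +1 / -1 (an illustrative coding, not the authors').
stimuli = [1, -1, 1, 1]
bias = run_template(stimuli, alpha=0.5)
weights = history_weights(alpha=0.5, max_lag=3)
```

      In this sketch, the bias on the fourth trial equals the weighted sum of the three preceding stimuli, and lowering `alpha` increases the weight of distant trials relative to recent ones.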

      (4) Discuss generality and experimental tests. Briefly address whether similar effects are expected across other tasks or sensory domains, and outline experimental manipulations that could causally test the role of serial dependence in generalization.

      We have added discussion of generality across perceptual domains and outlined the prediction that future work could test the SDE-generalization link in other tasks where both phenomena have been documented (see new paragraph in Discussion).

      Reviewer #2 (Public Review - Point 2): Central task SDE independence

      The SDEs observed in the central letter task and peripheral TDT are likely independent, as they involve different stimulus features (letter identity vs. orientation), different response mappings, and show distinct performance patterns across conditions. The absence of condition differences in central-task SDEs (described at the end of the paragraph under "SDE differences between conditions" in the Results section), despite robust differences in TDT SDEs, further suggests that the peripheral orientation biases are not contaminated by central-task response tendencies. Note that the central task was fixed across conditions and stayed at fixation both when the target location was changed and when dummy trials were presented.

      Reviewer #3 (Recommendations for the authors):

      (1) Reference to Falmagne, Cohen, & Dwivedi (1975)

      We have added this reference to the Introduction, acknowledging the historical foundation of sequential effects in perceptual decisions.

      (2) The SDE data of Figure 1 are (per the figure legend) from the 1 loc data of Harris et al., "pooled over all testing days", and filtered for trials with low-visibility current targets (SOA < SOA-threshold+20ms). Specify whether this threshold criterion is on a per-subject basis. State in the legend that "all testing days" includes Days 1-8 (4 days with the first location and another 4 days testing generalization to a second location).

      We have revised the Figure 1 legend to clarify:

      "Days 1–8; 4 days at the first location and 4 days at the second location to assess generalization"

      "calculated on a per-subject basis"

      (3) The leadup emphasizes that the analysis in the figure emphasizes trials where the effect is expected to be as large as possible (cited as 40 +/- 3%), while visible current targets (at n) biases were 5+/-1%.

      See below, after (4).

      (4) Unless a theoretical position associates learning just with low visibility (if so, explain), consider including two other panels showing the sequential dependencies for all trials, and the linear model weights over the last 10 trials for all trials.

      We acknowledge that the main analyses emphasize conditions that maximize SDE expression. To verify robustness, we conducted control analyses including all prior-trial history regardless of visibility; these results are presented in Supplementary Figure S3 and confirm our main findings.

      There are both theoretical and technical justifications for the filtering applied:

      It is well known that learning, in particular without feedback (as in our TDT), is facilitated by a mixture of threshold level stimuli and suprathreshold easy trials (e.g., Liu et al., 2012).

      Technically, it is impossible to measure bias with highly discriminable stimuli where performance is perfect or close to it, thus such trials are expected to dilute the measured effect. On the other hand, when considering serial effects from low sensitivity trials, we face an uncertainty involved in defining the actual orientation relative to which the bias needs to be computed.

      (5) Figure S1 seems to indicate that average thresholds over all days (location 1 and location 2) are unrelated to the sequential dependence across subjects and that the amount of learning in location 1 is unrelated to the sequential dependencies across subjects in all the varied conditions. Since Figure S1 includes all 50 subjects, it includes some conditions with dummy trials interspersed. Clarify in the description whether the dummy trials are ignored for the purposes of the SDE analyses.

      We have clarified in the Methods how trials are handled in the analysis: "To preserve the precise temporal structure of the data, all trials were included in the sequential n-back count across all experimental conditions, thus dummy trials were counted as time bins but their contribution was ignored. In the Linear Mixed Effects (LME) analysis, we modeled these trial types using distinct regressors: each n-back lag included separate predictors for visible and invisible targets, further differentiated by trial type (dummy vs. target) and relative location (ipsilateral vs. contralateral) where applicable. The SDE values reported here reflect only the influence of relevant target-present history trials; the effects of other history types (e.g., dummy trials), while estimated to ensure the temporal integrity of the model, are not presented."
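
      The n-back bookkeeping described above can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the trial dictionary fields, the column names, the function `build_nback_regressors`, and the use of a presence indicator for dummy trials are assumptions, and the location predictors (ipsilateral vs. contralateral) are omitted for brevity. The point it shows is that lags are counted chronologically, so a dummy trial occupies a time bin, while each history type gets its own column.

```python
def build_nback_regressors(trials, max_lag):
    """Return one dict of lagged predictors per trial (hypothetical layout).

    Each trial is a dict with keys:
      kind        -- "target" or "dummy"
      visible     -- bool (used for target trials only)
      orientation -- +1 or -1 (used for target trials only)
    """
    rows = []
    for t in range(len(trials)):
        row = {}
        for lag in range(1, max_lag + 1):
            # Every lag has three separate predictors, all zero by default.
            for name in ("target_visible", "target_invisible", "dummy"):
                row[f"lag{lag}_{name}"] = 0
            h = t - lag  # chronological index: dummy trials count as time bins
            if h < 0:
                continue  # no history that far back
            prev = trials[h]
            if prev["kind"] == "dummy":
                row[f"lag{lag}_dummy"] = 1  # presence indicator (assumption)
            elif prev["visible"]:
                row[f"lag{lag}_target_visible"] = prev["orientation"]
            else:
                row[f"lag{lag}_target_invisible"] = prev["orientation"]
        rows.append(row)
    return rows


trials = [
    {"kind": "target", "visible": True, "orientation": 1},
    {"kind": "dummy", "visible": False, "orientation": 0},
    {"kind": "target", "visible": False, "orientation": -1},
    {"kind": "target", "visible": True, "orientation": 1},
]
rows = build_nback_regressors(trials, max_lag=2)
```

      The resulting rows could be passed to a mixed-effects fit in which only the coefficients of the target columns are reported, mirroring the statement that dummy-trial effects are estimated but not presented.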

      (6) The conclusion from this analysis seems to be that the overall average threshold and the amount of initial learning are both uncorrelated with the strength of sequential dependencies across subjects. This conclusion should be added to the description in the main paper.

      This finding is now discussed in the Discussion section, referring to the main Results section: "No significant correlation was found between biases and SOA thresholds across observers (r = -0.13, p = 0.37, average across days 1-8), nor between biases and improvements in performance at the first location (r = -0.09, p = 0.54, average across days 1-4), suggesting that the magnitude of serial dependence does not predict the overall amount of perceptual learning (Supplementary Figure S1)."

      (7) Decay of SDE section clarifications

      We have made the following clarifications:

      RT definition: Added to Methods: "The reaction time (RT) used in the analysis was defined as RT(TDT) – RT (fixation task), where RT for each task was measured from stimulus onset."

      N-back counting: Clarified in Methods (see response to point 5 above): all trials were included in the chronological sequence; the LME analysis assigned separate predictors at each lag for visible/invisible targets and for trial categories (dummy vs. target) and locations (ipsilateral vs. contralateral). The results reported do not include effects of dummy trials, except where response-dependent SDE was reported (Fig 2a, SDE for response key).

      2loc n-back effect: The longer-range effects in the 2loc condition likely reflect reduced adaptation allowing longer temporal integration, combined with the location-selective nature of SDEs.

      RT and mechanism interpretation: The manuscript discusses that the critical observation is the qualitative difference in RT sensitivity between recent and distant SDEs, consistent with the drift-diffusion framework where criterion shifts are RT-dependent while drift bias is RT-independent (Dekel & Sagi, 2020). We have added an acknowledgment of the correlational limitations of this interpretation.

      Moving figures to supplement: We prefer to keep Figures 4 and 5 in the main text as they document important dynamics supporting our mechanistic interpretation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Pinto and colleagues set out to investigate whether the cow udder is a potential mixing site for the influenza virus. The authors have demonstrated that bovine mammary epithelial cells can be infected with both avian and human influenza A viruses, supporting the idea that the cow udder may be a potential site for reassortment. Furthermore, they demonstrate that the bovine-adapted IAV replicates to similar titers in avian epithelial cells when compared to an AIV precursor virus. Thus, suggesting there is no fitness trade-off, and confirms the potential for spill-back of the cattle B3.13 into poultry, which has already been observed. Overall, I believe the authors achieved their aims. However, there are instances in which the results do not entirely support the conclusions (noted in weaknesses). Given the ongoing questions surrounding highly pathogenic avian influenza A virus in dairy cows, this work provides valuable evidence for the potential of the cow udder as a site of reassortment. These findings highlight the need for surveillance of influenza A virus incursions into livestock species, particularly cows. Some specific strengths and questions regarding weaknesses have been outlined below.

      Strengths:

      (1) The authors use a diverse range of cell types and influenza A virus strains, as well as a wide range of techniques to address the questions at hand.

      (2) The use of cells from multiple bovine breeds for the MAC-T, bMEC and explants suggests the phenomenon is not unique to a single breed.

      (3) The results suggesting there is no fitness trade-off for Cattle Texas in an avian host are interesting, and confirm the potential for spill-back of the cattle B3.13 into poultry, which has been observed.

      Weaknesses:

      I have listed my complete questions/concerns below. However, there are two main weaknesses of the article in its current state. Firstly, there is no apples-to-apples comparison in terms of determining a preference for IAV to infect the cow udder over other organs (Q4). The mammary gland and respiratory tract are represented by epithelial cells, but for other organs, fibroblasts were chosen. I think the fairer comparison would be to compare epithelial cells from different organs to demonstrate a preference for the mammary gland. Secondly, the main premise of the article relies on bMEC and MAC-T (primary and immortalised mammary epithelial cells), facilitating higher viral growth than the cells from other organs. Yet throughout the article, a 10x higher dose of IAV is used in the bMEC cells compared to everything else (Q6). This raises the question of how much of the results are due to a preference for the mammary epithelial cells, and how much is simply due to the increased dose.

      When we set out to test if cow mammary gland cells were particularly susceptible to IAV infection compared to other bovine cell types, we used what was available in the Roslin Institute in the first instance – a mix of primary and continuous cells from various anatomical sites: three epithelial cell types (two mammary, one respiratory tract), two immune cell types and four sets of fibroblasts from various organs. Given the representation of different anatomical sites, cell types and differentiation statuses, we considered this a suitably diverse panel with which to characterise infection dynamics of a broad range of IAVs, before more focussed investigations using the mammary bMEC and explant tissues. Both mammary epithelial cell types grew our library of influenza challenge strains significantly better than the BAT-II respiratory epithelial cells, as well as the two immune cell types and all four fibroblast populations. Of the fibroblast cells, those derived from the brain grew IAV significantly better than the skin and turbinate fibroblasts, while blood-derived macrophages grew virus significantly better than the lymphocytes and non-brain fibroblasts. Therefore, there are “apple to apple” comparisons as well as apple to pear comparisons that give significant differences. We therefore think that our conclusions (in the abstract) that mammary cells are particularly replication competent for IAV, (at the end of the introduction) that “a wide range of cow-derived cells are susceptible” and (in the results section) that “mammary cells showed the highest susceptibility” are entirely justifiable. We do not claim that mammary cells are the only permissive bovine cells, but our evidence suggests they are highly susceptible.

      We used a higher MOI for bMECs because test experiments with WT PR8 and the Cattle Texas 6:2 reassortant showed that MOI 0.01 infections gave more variable results than ones run at MOI 0.1, perhaps because of the intrinsic variability of mixed primary cell populations. We therefore chose to go with the higher MOI. However, the end-point titres between the two conditions were not significantly different, so we do not think this choice is a confounding issue. We will add the comparison of the two MOIs as a supplementary figure in the formal revision.

      Reviewer #2 (Public review):

      The authors use a library of influenza A viruses from different strains, classified in lab-adapted, human, avian, and swine according to the animal from which they were isolated. They propose that the cow mammary gland serves as a mixing vessel for influenza A viruses. As a first approach, the authors assess susceptibility to infection across different cell types, including continuous and primary cell lines, bovine mammary cells, and mammary explants. All these cells support polymerase activity. Then, they analyzed changes in the bovine virus's viral fitness relative to an avian precursor. The authors use single-gene replacement to study whether and which RNP segments improve viral transcription. As part of this section, they also test IFN-specific antagonism by NS1 to assess the input of segment 8. Quantitative glycomic analysis was performed on the continuous bovine mammary cell line to demonstrate the presence of both a2,3 and a2,6, which is consistent with their observation that these cells can be co-infected with human and avian IAVs simultaneously. The main question, however, is: what is the glycome in the explants, or directly from tissues?

      We report quantitative glycomics for the primary bovine mammary epithelial cells as well as the continuous line the referee highlights. However, we agree with R2 that a detailed glycomic analysis of primary bovine mammary tissue would allow a better understanding of the actual glycosylation status in vivo. This has now been undertaken by the authors and is available as a bioRxiv preprint:

      Bovine H5N1 influenza viruses have adapted to more efficiently use receptors abundant in cattle

      Jack A. Hassard, Jiayun Yang, Bernadeta Dadonaite, Jonathan E. Pekar, Jin Yu, Samuel A. S. Richardson, Rute M. Pinto, Kristel Ramirez Valdez, Philippe Lemey, Jessica L. Quantrill, Jinghan Xue, Tereza Masonou, Katie-Marie Case, Jila Ajeian, Maximillian N. J. Woodall, Rebecca A. Ross, Nicolas Hudson, Kan Zhong, Hongzhi Cao, Samuel Jones, Hannah J. Klim, Brian R. Wasik, Desi N. Dermawan, Jean-Remy Sadeyen, Dirk Werling, Dylan Yaffy, Joe James, Alessandro Nunez, Paul Digard, Ian H. Brown, Daniel H. Goldhill, Pablo R. Murcia, Claire M. Smith, Yan Liu, Jesse D. Bloom, Munir Iqbal, Wendy S. Barclay, Stuart M. Haslam, Thomas P. Peacock. bioRxiv 2026.04.02.715584; doi: https://doi.org/10.64898/2026.04.02.715584

      Overall, the manuscript is clearly written and provides new insights into the behaviour of the cattle isolate, now compared with a representative group of model or precursor HAs of different origins.

      It would be great if a consistent nomenclature for the IAV strains could be used in the study. There is a mix of origin (Texas), animal from which the virus was isolated (mallard), or abbreviations that do not follow guidelines (IAV07). Are the USSR and Udorn not lab-adapted?

      We chose the abbreviated names for a variety of reasons. Partly from common usage (e.g. PR8, Udorn), partly for consistency with other already published papers from the FluTrailMap consortia (e.g. Cattle Texas; Dholakia et al 2026), partly to make diversity obvious in certain figures (e.g. H3N1, H5N2 etc) and partly to avoid confusion between viruses that originate from the same geographic area (e.g. AIV07, AIV09, H5N8-20 etc which are all Ck/England/isolate numbers). Overall, we found it more confusing to use the expanded nomenclature. Re AIV07 which the referee criticises for not following naming guidelines – if this is a reference to the EURL nomenclature, AIV07 is the abbreviation for the specific virus A/Chicken/England/053052/2021, our representative virus for EURL genotype EA-2020-C, as we say in the text. We should however have included this nomenclature in Table 1, which otherwise provides a cross-reference for all the names. This will be added in the formal revision to help with clarity.

      As to whether USSR and Udorn are lab-adapted – that depends on definitions. There is a continuum of adaptive changes and/or sequence drift starting from the very first growth of an isolate in the laboratory. The viruses we define here as lab-adapted are ones that have been deliberately adapted to other hosts or which have very long passage histories in multiple host species resulting in known functionally significant changes. For example, PR8, with 100s of passages in mice, ferrets and embryonated hens' eggs (doi: 10.3390/v12060590), is unarguably lab-adapted. We admit that A/USSR/77 and A/Udorn/307/1972 are probably further along this adaptive pathway than more recent isolates such as A/Norway/3433/2018, but are unaware of any specific reason that would put them into our lab-adapted category.

      The experimental setup includes bovine mammary primary and continuous cells, as well as mammary explants. Some of the most significant differences, for example, in viral fitness studies and co-infection experiments, are observed in these explants. Perhaps there could be some additional focus on this observation. The implications in comparison to the results obtained in cultured cells could be described. How will the human and other HA subtype viruses fare in the explants?

      We agree that this is an important and interesting question, and have tested the strains we used for co-infections, human seasonal H1N1 “Norway” and low pathogenic avian influenza “H3N1”, in the mammary explants. Both replicate, the avian virus to 20-fold higher titres. We will add this new information to the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This excellent manuscript by Pinto, Sharp, and colleagues examines bovine tissue tropism for influenza viruses. They find that bovine flu, as well as other strains, has strong replication in mammary tissue. They also map the genetic changes to influenza that improve replication in bovine cells. Overall, the study is well designed and executed, and the results are very timely.

      Strengths:

      (1) The experiments are well-controlled.

      (2) The figures are well-constructed and easy to follow.

      (3) The Methods and legends are detailed, with sufficient information.

      Weaknesses:

      (1) A comparison to human cells would strengthen the overall impact of the results. Are human mammary cells also uniquely susceptible to influenza? Are bovine mammary cells special in some way?

      This is an interesting question. We have not tested mammary gland cells from humans (or any other species of mammal), but we have reported elsewhere (Dholakia et al., Nat Commun. 2026 Jan 16;17(1):1603. doi: 10.1038/s41467-026-68306-6) that Cattle Texas grows well in a variety of human respiratory cells. Here we are considering the bovine mammary organ as a potential reassortment site for IAVs; human mammary organs are unlikely to create this opportunity.

      (2) For the virus infection studies with segment 8 swaps, it should at least be noted that some of the phenotypes could be driven by NEP.

      We agree, and will change the text to acknowledge this in a revised version.

      (3) The data demonstrating that bMEC can support co-infection are compelling and important, but would be strengthened with a comparison from a different cell type or species. Do mammary cells uniquely support higher co-infection?

      We have data showing that co-infection also occurs in the continuous MAC-T udder cell line and will include these data in a revision. We have not tested bovine cells from other organs for co-infection potential as they do not seem to be significant sites of infection in vivo.

    1. Author response:

      We sincerely thank the Reviewing Editor, Senior Editor, and both reviewers for their careful and constructive assessment of our manuscript. We are encouraged that the reviewers recognize the value of our dataset and its potential contribution. We greatly appreciate the thoughtful comments and have carefully considered the reviews. We plan to revise the manuscript accordingly. 

      First, we will revise and refine the cross-species comparative analysis, with particular attention to clarifying the basis of the comparisons between ascidian and mouse endodermal lineages. In particular, we will adopt a more cautious and precise comparative framework, clarify the scope and limitations of the mouse comparison, and broaden the context by incorporating additional vertebrate and invertebrate deuterostome systems where relevant.

      Second, we will strengthen the gene-level interpretation of the identified endodermal populations and clarify the molecular basis for the similarities and differences. In particular, we will more clearly identify the key marker genes defining each population and better explain their relationship to previously described developmental sources.

      Third, we will improve the clarity of the Results presentation, including the description of the two major endodermal progenitor populations and their subcategories, as well as the organization of the text, figures, and figure legends. 

      Fourth, we will substantially rewrite the Discussion, especially the sections dealing with evolutionary implications, to ensure that our interpretations are presented in a more cautious manner.

      These revisions are intended to address the reviewers’ concerns regarding both the evolutionary framing and the presentation of the data. We believe that these revisions, which will include both rewriting and additional analyses, will improve the clarity and rigor of the manuscript. We look forward to submitting a revised version.

      We thank the editors and reviewers again for their time and expertise.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Andriani et al. show intracellular zinc is exported from sperm during capacitation and suppresses the alkalinization-induced hyperpolarization in sperm. Intracellular zinc inhibits Slo3 current, which is enhanced by the co-expression of gamma subunit Lrrc52. Computational studies reveal that the Zn binding site on mSlo3 is located near E169 and E205, which are involved in the sustained zinc inhibition of mSlo3 current. The authors propose that intracellular zinc plays a key role in sperm capacitation by inhibiting the Slo3 channel.

      Strengths:

      Overall, the work appears well-designed (e.g., oocyte patch-clamp experiments), and clearly presented. Three-dimensional structural modeling and flooding simulations are executed.

      Weaknesses:

      The simple mutagenesis analysis of E169 and E205 showed partial abolishment, but the molecular mechanism by which zinc inhibits Slo3 current is not yet fully shown. The authors should consider performing more extensive experiments, such as creating double mutants or combination mutants involving other residues. Additionally, could other mechanisms explain the role of zinc in regulating the Slo3 current?

      We thank the reviewer for the thoughtful comments regarding the mutagenesis analysis and the possible mechanisms underlying zinc regulation of Slo3. Regarding the suggestion to perform double or combination mutants, we agree that such experiments would provide valuable mechanistic insight. However, due to limited resources, we were not able to perform these additional experiments within the scope of this study. Our current results show that mutations at E169 and E205 partially abolish zinc inhibition, which suggests that the inhibitory mechanism is not mediated through a single residue and is likely more complex.

      Alternative mechanisms that may contribute to zinc modulation of Slo3 include indirect effects through modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites within the Slo3 channel other than those identified in this study. At present, these mechanisms remain speculative and further studies will be required to clarify their contributions. This study provides the foundational basis for understanding how zinc inhibits the Slo3 channel and serves as an important starting point for defining the molecular mechanism in more detail.

      We already acknowledged in the Discussion section that the precise molecular basis of zinc inhibition remains unknown and that future work involving more extensive mutational and structural analyses will be essential to fully resolve this issue.

      We also added the following to the Discussion section:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      While elucidating the mechanism of Slo3 is interesting, there is substantial literature indicating how zinc regulates channel functions at a molecular level. Given this, the manuscript should provide a deeper understanding by clearly elucidating the molecular mechanism of the regulation of Slo3 current by zinc.

      Thank you for highlighting a very important point that requires deeper discussion and explanation regarding how zinc regulates Slo3 current at the molecular level. As reported, Slo3 is gated by membrane depolarization and, at the same time, this channel is also gated by intracellular pH, particularly alkalinization (Leonetti et al., 2012; Schreiber et al., 1998; X. Zhang et al., 2006). This makes the gating mechanism of this channel complex. The molecular mechanism underlying pH regulation of the Slo3 channel remains unknown (M. D. Lyon et al., 2023). We tested different pH conditions and membrane voltages to elucidate the effect of zinc on the Slo3 channel. Our data suggest that zinc inhibition in mSlo3 channels is dependent on pH (Fig. 2A-E) and voltage (Fig. 2G-H; Fig. 2—figure supplement 1A, B) and exhibits a long-lasting inhibitory effect (Fig. 2I, K).

      However, we are aware that these data alone cannot explain the molecular mechanism of zinc’s effect on Slo3 current, and our mutagenesis experiments did not provide a straightforward answer either. The single amino acid mutations examined in this study, which target clustered negatively charged residues, did not significantly alter zinc-mediated current reduction compared to the wild type. As the reviewer pointed out, mutating a single amino acid may not be sufficient to fully identify other contributing residues within the predicted mSlo3 zinc-binding site. Therefore, more extensive mutagenesis studies will be required to fully elucidate the molecular mechanism of zinc inhibition in mSlo3, which could not be fully resolved in this study.

      On the other hand, when we analyzed the percentage of current recovery of all the mutants, E169A and E205A showed significant current recovery upon wash-out with pH 8.0 alone. Consistent with the MD simulations, our electrophysiological recordings demonstrated that the long-lasting inhibitory effect of zinc was partly abolished by these mutations. Thus, our findings highlight the contribution of E169, located at the lower end of the S3 domain, and E205, located in the lower region of the S4 domain, to zinc-mediated inhibition of mSlo3 current.

      Additionally, since the molecular mechanism of pH regulation of the Slo3 channel remains unknown, the molecular basis of its dual gating has yet to be elucidated, making it difficult to draw a single definitive conclusion from our current research data on how zinc inhibits mSlo3 current. Nevertheless, this study provides the foundation for understanding possible mechanisms of zinc inhibition. Our VCF data suggest that zinc influences the movement of the VSD of mSlo3, and together with our mutagenesis and MD simulation results, these findings represent an important first step toward elucidating the molecular mechanism of zinc inhibition of the mSlo3 current.

      Intracellular zinc exerts an inhibitory effect on mSlo3, similar to what has been reported for Slo2.2 channels (J. Zhang et al., 2023), high- and low-voltage activated calcium channel families (Sun et al., 2007) and KCNQ4 channels (Gao et al., 2017). These studies identified different regions, amino acids, and possible mechanisms of zinc inhibition among these ion channels. For instance, in Slo2.2 channels, which belong to the same Slo family as Slo3, the zinc-binding site was identified in the RCK2 domain, where cysteine and histidine residues form a canonical zinc binding motif (J. Zhang et al., 2023). In KCNQ4 channels, zinc inhibits the channel activity in a non-canonical manner that depends on its physiological activator, the membrane lipid PI(4,5)P<sub>2</sub> (Gao et al., 2017). Although zinc exerts inhibitory effects on these various voltage-gated potassium and calcium channels, the mechanisms differ. Our data suggest another distinct mechanism of zinc inhibition in the mSlo3 channel, with the identified sites located in the VSD, where zinc influences the voltage-sensor motion and consequently affects the complex gating of Slo3.

      We revised the discussion section as follows, which is also related to the previous comment:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      The manuscript includes no experimental data on the mechanism of intracellular zinc export during sperm capacitation, despite being crucial for the regulation of sperm function.

      We thank the reviewers for the valuable comment in this regard. We agree that the mechanism of intracellular zinc export during capacitation is crucial for the regulation of sperm function, and it would be an important finding if we could provide experimental data on this. However, there are significant technical difficulties in performing such experiments. Two protein families facilitate the transport of zinc across cellular and intracellular membranes in opposite directions: ZnT and ZIP. ZIP12 has been reported to be highly expressed in mouse testis (Zhu et al., 2022), as has ZnT-1 (Elgazar et al., 2005). To date, there are no known inhibitors for zinc transporters, and there are also no suitable antibodies available for these transporters, which makes it difficult to design experiments to examine intracellular zinc transport during sperm capacitation. Apart from the two reported zinc transporters, the functional significance of other ZnTs and ZIPs, particularly those related to capacitation, remains largely unclear, leaving the mechanisms of zinc transport in sperm during capacitation poorly understood. Moreover, homozygous Znt-1 knockout mice exhibit a lethal phenotype (Andrews et al., 2004).

      Reviewer #2 (Public review):

      Summary:

      In this paper, Andriani and colleagues are examining the potential role of Zn flux in sperm and its effect on Slo3 channels. This is an interesting question that is likely critical to how sperm function properly and Slo3 channels are a possible candidate for a downstream molecule that is impacted by Zn. In this paper, the authors use Zn imaging, sperm motility assays, and electrophysiology to show that Zn flux impacts sperm function. They then go on to look at the impact Zn has on Slo3 current and propose a binding site based on MD simulations. While the ideas are interesting, the experiments are not well described in many places making understanding the results very difficult. In addition, critical controls are missing throughout the paper.

      Strengths:

      The question of how Zn flux impacts membrane potential and sperm motility is an important one. Moreover, Slo3 presents an interesting candidate for the target of Zn regulation. The combination of methods used here also has the potential to uncover mechanisms of Zn regulation of Slo3.

      Weaknesses:

      Much of the paper lacks experimental description which makes interpretation quite difficult, or a detailed discussion is missing. Examples include:

      (1) Figure 1, particularly the Zn imaging, is not sufficiently described. How is the fluorescence intensity measured? A representative ROI? The whole tail and head? Are the sperm immobile? If not, there is evidence that motion artifacts can significantly distort these sorts of measures from Calcium measurements in Cilia. Were there controls done? Is the small amount of Zn seen in the tail above the background?

      We sincerely thank the reviewer for pointing out important details that we should provide to make this study well understood. We respond to the points raised by the reviewer as follows:

      Fluorescence intensity was measured from the signal over the whole head and the proximal part of the tail of the sperm. We have included this in the Materials and Methods.

      Materials and Methods

      “Fluorescence intensity is measured by the signal taken from the whole head and the proximal part of tail in sperm.”

      Yes, the sperm are immobile during zinc imaging.

      We added control data for zinc imaging without capacitation medium and incorporated them into the graph in Figure 1B. For the control in non-capacitation medium, we used HS medium, as now explained in the Methods, the Results, Figure 1B, and the figure legend.

      Yes, the small amount of Zn seen in the tail is above the background. As shown in Fig. 1A, we confirmed that the signal intensity at the proximal region of the tail was higher than the background. The data for this region were therefore calculated after background subtraction.

      (2) The second half of Figure 1 is also not well described. What is the extracellular solution in the recordings? When you apply the Zn ionophore, do you expect influx or efflux? I assume efflux is based on the conclusions but this should be discussed explicitly.

      The extracellular solution in the recordings for Figure 1 is HS solution (HEPES-buffered saline solution), a standard non-capacitation medium. We will include this information in the Materials and Methods.

      Materials and methods

      “HS-based solution was used as the extracellular solution.”

      We assume that intracellular zinc levels increase upon application of the zinc ionophore. Previous work has reported that sperm contain approximately 35.7 ng of zinc per 10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). When zinc pyrithione is applied, it facilitates the influx of Zn<sup>2+</sup> from the surrounding medium into the cell, thereby increasing the intracellular zinc concentration. Zinc pyrithione functions both as a zinc source and as a transport facilitator, allowing Zn<sup>2+</sup> to cross the otherwise impermeable lipid membrane without compromising membrane integrity.

      (3) Figure 2H labels the Y axis, "normalized current". Normalized to what? Why do neither of the curves end at 1? A better description of what this figure represents is needed.

      Normalization for Figure 2H was performed by dividing the absolute mSlo3 current at pH 8.0 at each voltage by the absolute current at the highest pre-determined voltage that still produced a stable mSlo3 current (i.e., good patch, good clamp). In this analysis, +140 mV was chosen as the normalization voltage, since in a few cells the patch was lost at +160 mV and +180 mV. Likewise, the absolute mSlo3 current in the presence of 100 µM zinc was normalized to the absolute control current at +140 mV. This information has been included in the figure legends and the Materials and Methods section of the revised manuscript.

      Materials and Methods section:

      The figure legend for Figure 2H has been updated.
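
The normalization described for Figure 2H can be illustrated with a minimal sketch. The function name and the current values below are hypothetical and chosen only for illustration; they are not data from the study. The only element taken from the response is the rule itself: every current, control or zinc, is divided by the control current at +140 mV.

```python
# Hypothetical helper illustrating the normalization rule; not the authors' analysis code.
def normalize_to_control(currents, control_currents, ref_voltage=140):
    """Divide each current by the control current at ref_voltage (mV)."""
    ref = control_currents[ref_voltage]
    if ref == 0:
        raise ValueError("reference current must be non-zero")
    return {voltage: current / ref for voltage, current in currents.items()}

# Made-up absolute currents (nA), keyed by test voltage (mV).
control = {100: 1.0, 140: 2.0, 180: 3.0}
zinc = {100: 0.4, 140: 0.8, 180: 1.2}

control_norm = normalize_to_control(control, control)
zinc_norm = normalize_to_control(zinc, control)
```

Under this scheme the control curve equals 1 only at +140 mV and can exceed 1 at higher voltages, while the zinc curve stays below 1 at +140 mV whenever zinc reduces the current. This would explain why neither curve needs to end at 1.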

      (4) The alpha fold simulations are not well described. How many Zn binding sites were found? Are all of the histidine mutations in Figure 4 Supplement 1 the ones that were found?

      We thank the reviewer for the question. In our AlphaFold3 input, we only input the transmembrane region of the protein. From there, we found four sites located as follows:

      Because we are interested only in the intracellular (IC) side of the membrane, we focused on the IC site with the highest pLDDT (confidence) value. Only two of the sites are on the IC side; the other sites are located near the pore domain. The selected site is near E310 and K319.

      Author response image 1.

      AlphaFold3 prediction of the Zn binding site on the IC side of Slo3

      The histidines in Fig. 4—figure supplement 1 are all histidines that are not in the transmembrane region. These residues were not included in the initial inputs for AlphaFold3. However, we conducted MD simulations including these residues and we were able to show that a few of these residues are in contact with Zn. We have now plotted the minimum distance between each of these residues and Zn in the flooding simulations.

      Author response image 2.

      MD simulations of histidine residues located on the IC side of Slo3

      Minimum distances between histidines in Fig. 4—figure supplement 1 and Zn<sup>2+</sup> from the flooding simulations. Different colors indicate different repeats.

      (5) There is no discussion of physiological intracellular Zn concentration. How much Zn is inside the sperm? How much is likely free vs buffered? Is 100uM a reasonable physiological concentration?

      We estimated the intracellular zinc concentration in sperm based on human sperm data, which report a zinc concentration of approximately 35.7 ng/10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). Considering the volume of a typical human sperm is about 15 µm<sup>3</sup> (Laufer et al., 1977), this translates to an estimated intracellular zinc concentration of approximately 400 mM, although the concentration of free zinc must be much lower than this level. Although exact intracellular zinc concentrations in mouse sperm are not well-documented, this estimate supports the observation of elevated zinc in non-capacitated sperm.
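
The unit conversion behind such an estimate can be written out as a short sketch. It uses only the two values quoted above (35.7 ng zinc per 10<sup>6</sup> cells and a cell volume of 15 µm<sup>3</sup>) plus the molar mass of zinc (65.38 g/mol), and it assumes the zinc is spread evenly through the whole cell volume. The function name is hypothetical. With these inputs alone the result is in the tens of millimolar, so the ~400 mM figure quoted above may rest on additional assumptions that are not stated in this response.

```python
ZN_MOLAR_MASS_G_PER_MOL = 65.38  # molar mass of zinc

def total_zinc_molar(ng_per_million_cells, cell_volume_um3):
    """Total zinc concentration (mol/L), assuming zinc fills the whole cell volume evenly."""
    grams_per_cell = ng_per_million_cells * 1e-9 / 1e6
    moles_per_cell = grams_per_cell / ZN_MOLAR_MASS_G_PER_MOL
    litres_per_cell = cell_volume_um3 * 1e-15  # 1 cubic micrometre = 1e-15 L
    return moles_per_cell / litres_per_cell

# Inputs quoted in the response: 35.7 ng per 1e6 cells, 15 cubic micrometres per cell.
conc_mM = total_zinc_molar(35.7, 15.0) * 1e3
print(f"estimated total zinc: {conc_mM:.1f} mM")
```

This is total zinc; as noted above, the free zinc concentration would be much lower.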

      There are a number of areas where the interpretation is not well supported by the data including:

      (6) You say in the Figure 4 supplement, that "we did not observe any significant decrease in the percentage of current inhibition." But that is a pretty misleading statement. There are large changes (increases) in the amount of zinc inhibition. These might be allosteric changes but I don't think you can safely eliminate these as relevant Zn binding sites. Also, some of these mutations appear to allow at least some unbinding of Zn.

      In our MD simulations, H720 is not at the zinc binding site and therefore, mutation to arginine would indeed eliminate its binding. We show this in the minimum-distance analysis between Zn and H720, in which they remain more than 4 Å from each other (n=3), as shown in Author response image 2.

      The Slo3/Slo1 RCK2 chimera also showed large increases in the amount of zinc inhibition, so this region might serve as a potential binding site. We agree that the statement “we did not observe any significant decrease in the percentage of current inhibition” is misleading, and we have therefore revised our interpretation.
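
A minimum-distance analysis of this kind can be sketched as follows. This is an illustrative reconstruction and not the authors' analysis script. The coordinates are toy values, the function names are hypothetical, and the 4 Å contact cutoff is taken from the threshold mentioned above.

```python
import math

def min_distance(residue_atoms, ion_positions):
    """Smallest atom-to-ion distance (in angstroms) within one frame."""
    return min(math.dist(atom, ion) for atom in residue_atoms for ion in ion_positions)

def min_distance_trace(frames):
    """Minimum residue-to-ion distance for each frame of a trajectory."""
    return [min_distance(residue, ions) for residue, ions in frames]

# Two toy frames: (residue atom coordinates, Zn2+ coordinates), all in angstroms.
frames = [
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)]),
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(1.0, 3.0, 0.0), (9.0, 9.0, 9.0)]),
]

CONTACT_CUTOFF = 4.0  # angstroms; assumed contact criterion
trace = min_distance_trace(frames)
in_contact = [distance <= CONTACT_CUTOFF for distance in trace]
```

A residue whose trace stays above the cutoff in every repeat would be read as not contacting zinc, which is the argument made here for H720.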

      We revised the Results section as follows:

      “However, the percentage of current inhibition varied across the mutated constructs, showing either increases or no appreciable change (Fig. 4—figure supplement 1B, C).”

      (7) Following up on the above point, it seems unfair to conclude that the D162S, E169A, and E205 mutants are part of the inhibitory binding site for Zn when the mutation has no effect on inhibition and only an effect on the washout. The mutations on the intracellular side also had an impact on the washout so it seems equally likely that they are the critical residues based on your data.

      We thank the reviewer for this important point. We agree that the absence of a strong reduction in the initial zinc inhibition makes it challenging to assign any single residue as a definitive zinc binding site. However, our interpretation is based not only on the electrophysiological data but also on the MD simulations, which consistently identified E169 and E205 as residues that frequently interact with zinc and stabilize zinc occupancy within the VSD region. Although the mutations did not markedly reduce the peak level of zinc inhibition, both E169A and E205A significantly altered the long-lasting inhibitory component during washout, which is consistent with the MD-predicted interactions. In contrast, the intracellular mutations affected washout but were not supported by MD simulations as potential zinc interaction sites. Taken together, these combined datasets support the idea that E169 and E205 contribute to zinc modulation of Slo3 in the VSD, even though additional residues or mechanisms are likely involved.

      (8) Nowhere in the paper do you make the specific link between Zn flux and membrane hyperpolarization via Slo3. You show that Zn flux changes the ability of the sperm to hyperpolarize and you show that Slo3 is inhibited by Zn but the connection between the two is not demonstrated. There appears to be a specific Slo3 blocker. If you use this in sperm, do you no longer see the Zn effect?

      Thank you for pointing out the need to clarify this point. It is already known that sperm capacitation is associated with an increase in intracellular pH (Vredenburgh‐Wilberg & Parrish, 1995; Y. Zeng et al., 1996), hyperpolarization of the membrane (Arnoult et al., 1999; Y. Zeng et al., 1995), and elevation of the intracellular Ca<sup>2+</sup> concentration (Breitbart, 2002; Publicover et al., 2007) through diverse ion channel activities. To explore whether these pathways are influenced by intracellular zinc, we used patch-clamp techniques to measure the membrane potential (Vm), as shown in Fig. 1D-K. It has been reported that, under whole-cell current clamp of mouse epididymal spermatozoa, the resting membrane potential hyperpolarizes after intracellular alkalinization (Navarro et al., 2007). We mention this in lines 100-108 of the manuscript.

      Next, our findings from the experiments using mouse spermatozoa suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH- and voltage-dependent potassium channel Slo3 as responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel and found that zinc inhibits the mSlo3 current. We explain the rationale for this experiment in lines 143-150.

      We added the following sentence to clarify the text:

      “During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010).”

      The text now reads:

      “Our findings suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH- and voltage-dependent potassium channel Slo3 as responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel.”

      Regarding the specific inhibitor, as the reviewer points out, a new Slo3 inhibitor, VU0546110, exhibits more than 40-fold selectivity for human Slo3 over Slo1 (M. Lyon et al., 2023). However, the effect of VU0546110 on mSlo3 has not yet been tested. Mouse and human Slo3 respond similarly to certain inhibitors but differ in their responses to several others (M. D. Lyon et al., 2023), making it uncertain whether VU0546110 will work on mSlo3.

      (9) In the second half of Figure 1, the authors suggest that there is "no hyperpolarization" in 100uM Zn. That is not really true. It is reduced but not absent.

      We modified the wording of “no hyperpolarization in 100 µM Zn” to “alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group.”

      “In contrast, alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group”

      (10) The claim that Lrrc52 with Slo3 shows a higher current inhibition at pH 7.5 than pH 8 is not well supported because there are only 3 replicates in the 7.5 case. In addition, the claim made in the text that 100uM ZnCl2 "already inhibited mSlo3+Lrrc52 at pH7.5", contrasted with mSlo3 alone, is not tested statistically.

      Thank you for the valuable comment. Although Fig. 3F shows a statistical difference, we agree that having only three replicates at pH 7.5 may somewhat weaken the conclusion. Following this suggestion, we have revised the sentence as follows:

      “Alkalinization appeared to increase the percentage of current inhibition by 100 µM ZnCl<sub>2</sub>.”

      We provided statistical analysis to compare pH 7.5 between mSlo3 alone and mSlo3+Lrrc52 in the Figure 3—figure supplement 1D:

      The statistical analysis showed that 100 µM zinc significantly inhibited the mSlo3 + Lrrc52 current at pH 7.5 compared to the mSlo3 current alone. We have incorporated the necessary changes into the revised manuscript and updated the figure legends accordingly.

      In a number of places, better controls are needed.

      (11) How specific is this effect for Zn? Mg2+, for instance, is also a divalent cation that is in the hundreds of uM range inside the cell. Does it exert the same effect? Each ion certainly has unique preferred coordination geometries, does your predicted binding with MD show what you might expect for tetrahedral coordination with Zn? Did you test other divalent cations functionally or in silico?

      To address this question, we built another AlphaFold3 model with Mg<sup>2+</sup> in place of Zn<sup>2+</sup>. We did not run all-atom MD simulations because of their computational cost. In this model, the Mg<sup>2+</sup> ions all cluster at the pore domain, and none resides near the Zn<sup>2+</sup> site identified in the MD simulations and the AF3 model.

      Author response image 3.

      AlphaFold3 model of Slo3 channel with Mg<sup>2+</sup>

      The Slo3 AlphaFold model from residue M1 to L330. The colour gradient reflects the pLDDT score range from 1.73 to 95.69. Purple sticks highlight E169, N171 and E205. In this study, we did not examine other divalent cations in our electrophysiological recordings. Exploring their effects will be an important direction for future research.

      (12) For the VCF experiments, a significantly higher concentration of Zn was used (10mM). What is the reason for this? There is no discussion of how much a "puff" is. Assuming you are using the RNA injector it is probably on the order of 50nL or less. Assuming the volume of an oocyte is 1uL that would argue that the final concentration is 500uM or higher. But this is also complicated by potential local effects of high Zn at the injection site, artifacts of injecting that much metal, and the fact that a great deal of the Zn will likely be bound to other things inside the cell. Better controls are needed for this experiment.

      As pointed out by the reviewer, the volume of an oocyte is estimated to be approximately 1 µL. We performed manual injections using a glass needle of the type typically used for RNA injection. However, because the injections were done manually during real-time VCF recording (as illustrated in the experimental scheme), the exact volume of solution injected into each oocyte could not be precisely controlled. We estimated each drop to be approximately 50 nL, resulting in a final concentration of around 500 µM, as described by the reviewer.

      The rationale for using a relatively high concentration was to ensure that the zinc concentration inside the oocyte reached an effective level, since manual injection may sometimes deliver less than 50 nL of solution. In some cases, injections failed entirely due to the technical difficulty of the method. Because VCF recordings are already technically difficult, we aimed to ensure that zinc injection was successful in oocytes that exhibited a robust fluorescence signal by injecting an excess amount of zinc that would not disrupt normal oocyte conditions. For example, 10 mM zinc was prepared in an acidic solution (pH 2.5). We verified that this acidic condition did not affect the mSlo3 current by performing control injections with the acidic solution alone, since the mSlo3 current is not activated under acidic pH conditions.
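
The dilution arithmetic behind the ~500 µM estimate can be checked with a short sketch. The function name is hypothetical, and the calculation assumes instantaneous, uniform mixing with no zinc buffering inside the oocyte, which is a simplification. The inputs (10 mM stock, about 50 nL injected, about 1 µL oocyte volume) are the values given above.

```python
def final_concentration_uM(stock_mM, injected_nL, oocyte_nL=1000.0,
                           include_injected_volume=True):
    """Nominal zinc concentration (µM) after injection, assuming uniform mixing."""
    total_nL = oocyte_nL + (injected_nL if include_injected_volume else 0.0)
    return stock_mM * 1000.0 * injected_nL / total_nL

# 10 mM stock, 50 nL injected into an oocyte of about 1 µL (1000 nL).
simple = final_concentration_uM(10.0, 50.0, include_injected_volume=False)
diluted = final_concentration_uM(10.0, 50.0)
print(f"ignoring injected volume: {simple:.0f} µM; including it: {diluted:.0f} µM")
```

Either way the nominal value is close to 500 µM. A smaller delivered drop lowers it proportionally.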

      Author response image 4.

      VCF control experiments: vehicle injection.

      Reviewer #3 (Public review):

      Summary:

      The study titled "Zinc is a Key Regulator of the Sperm-Specific K+ Channel (Slo3) Function" aims to investigate the role of intracellular zinc in sperm capacitation and its regulation of the sperm-specific Slo3 potassium channel. Capacitation is a crucial physiological process that enables sperm to fertilize an egg, and membrane hyperpolarization through Slo3 activation is a well-established event in this process. The authors propose that intracellular zinc dynamically decreases during capacitation and inhibits Slo3-mediated K⁺ currents, thereby playing a regulatory role in sperm function.

      Strengths:

      (1) Novel Contribution to Sperm Physiology.

      The study provides new insights into how zinc dynamics contribute to sperm capacitation, specifically through its direct inhibition of Slo3 activity. Previous research has focused primarily on extracellular zinc's effect on sperm function; this work expands the discussion to intracellular zinc regulation, an area with limited prior investigation.

      (2) Strong Electrophysiological Evidence.

      The study employs inside-out patch-clamp recordings in Xenopus oocytes to demonstrate zinc's direct inhibition of Slo3 currents. The observed slow dissociation of zinc from Slo3 suggests a long-lasting regulatory effect, adding to the understanding of ion channel modulation in sperm cells.

      (3) Molecular Mechanistic Insights

      Using Molecular Dynamics (MD) simulations and mutagenesis, the authors identify potential zinc-binding sites within Slo3's voltage-sensing domain (VSD), particularly E169 and E205. These computational predictions are supported by electrophysiological recordings, strengthening the argument that zinc directly binds and inhibits Slo3.

      (4) Physiological Relevance and Functional Implications

      The study suggests that zinc inhibition of Slo3 could contribute to sperm motility regulation during capacitation.

      The authors provide sperm motility assays as supporting evidence, showing that zinc chelation affects motility only after capacitation has begun, suggesting a dynamic role of intracellular zinc in the capacitation process.

      Weaknesses:

      While the study presents compelling electrophysiological data and molecular insights, there are several critical gaps that must be addressed before fully supporting the physiological relevance of the findings.

      (1) The authors should measure the effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      We thank the reviewer for the valuable comments to strengthen the physiological relevance of our findings. We have added data on Slo3 currents measured by perforated patch-clamp recording in sperm cells treated with zinc pyrithione (ZnPy), before and after the addition of 10 mM NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 currents were recorded before and after the application of 10 mM NH<sub>4</sub>Cl. These data have been integrated into Figure 1L-N and Figure 1—figure supplement 1A, B.

      It is worth noting that the Slo3 current in this recording might contain other endogenous currents, as no specific blocker was used. Nonetheless, the data showed that the Slo3 current in sperm tends to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy. The fold change calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl was lower in the ZnPy-treated group than in the control group.

      We also provided data normalized to cell capacitance as suggested; however, the cell capacitance obtained from the sperm recordings reflects the capacitance of the head and midpiece of the spermatozoon. Slo3 channels, on the other hand, are not expressed throughout the entire spermatozoon, so the cell capacitance acquired from these recordings does not accurately reflect the area where the Slo3 channels are localized. Although we included normalization of Slo3 currents to cell capacitance before and after ZnPy application, this normalization should be interpreted with caution for the reasons mentioned above. The corresponding figure has been included in the supplementary data Figure 1—figure supplement 1A, B.

      We added sentences to the Results section as follows:

      “We also measured Slo3 current using perforated patch-clamp recordings in spermatozoa treated with ZnPy, before and after the addition of NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 currents were recorded before and after the application of 10 mM NH<sub>4</sub>Cl (Fig. 1L-N; Fig. 1—figure supplement 2A, B). Slo3 current in sperm tended to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy (Fig. 1L, M). There was a decrease in the fold change calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl in the ZnPy-treated group compared to the control group (Fig. 1N). Taken together, these results confirmed that intracellular zinc indeed inhibits alkalinization-induced hyperpolarization in mouse sperm.”

      (2) Lack of Controls in Non-Capacitated Sperm

      The claim that zinc is exported from sperm during capacitation needs stronger experimental validation.

      The authors did not include a control group of non-capacitated sperm in key fluorescence imaging experiments, making it difficult to confirm that the observed zinc decrease is capacitation-specific rather than a general zinc redistribution process.

      To strengthen this conclusion, experiments should be performed in non-capacitating conditions to determine whether intracellular zinc levels remain unchanged.

      We added the control group of non-capacitated sperm in key fluorescence imaging experiments, as integrated in Figure 1B.

      The following text was revised and added in the Results and Figure Legend sections:

      “We observed that there was a gradual and significant decrease in fluorescence intensity in both regions (Fig. 1B), particularly prominent in the flagellum (Fig. 1C). This decline suggests the active release of intracellular zinc from sperm flagellum occurs during capacitation. In contrast, the fluorescence intensity of the control group of non-capacitated sperm remained unchanged (Fig. 1B).”

      Figure Legend 1B was modified accordingly.

      (3) Unclear Role of Zinc in Physiological Capacitation

      The study clearly demonstrates zinc inhibition of Slo3 but does not sufficiently establish how this affects capacitation at a functional level.

      Additional motility and capacitation markers should be analyzed to confirm that zinc influences sperm behavior beyond Slo3 inhibition.

      We thank the reviewer for this valuable comment. We fully agree that zinc can influence sperm physiology through multiple mechanisms and that its overall effects on capacitation are complex. However, the main goal of our study is to investigate the mechanism and to determine whether intracellular Zn<sup>2+</sup> directly inhibits Slo3. Our results from both the heterologous expression system and the sperm membrane potential recordings consistently support this conclusion.

      For these reasons, we believe that adding such assays would not clarify the role of Slo3 in capacitation but rather risk confounding interpretation. Instead, we have expanded the Discussion to explicitly acknowledge these limitations and to emphasize that future studies combining genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to fully define its physiological impact.

      We added sentences to the discussion section in the revised manuscript as follows:

      “Although these results support a mechanistic link between zinc and Slo3 activity, future studies that combine genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to define its physiological impact in more detail. Within this context, this study highlights the potential importance of intracellular zinc in the regulation of sperm capacitation.”

      (4) Insufficient Data on Zinc-Slo3 Specificity

      The authors should consider using quinidine, a known washable Slo3 inhibitor, to confirm that zinc acts specifically on Slo3 channels rather than other endogenous ion channels.

      The study would benefit from including washout controls in the inside-out patch-clamp recordings, as seen in Figure 3-Supplement 1, to confirm that zinc inhibition is reversible or long-lasting.

      We thank the reviewer for raising the point regarding the need to confirm that the current observed in our recordings indeed represents Slo3 current by using a specific blocker such as quinidine, as there is a possibility that endogenous currents might also be present and that zinc could act on those endogenous currents. Performing experiments with quinidine would indeed be crucial to demonstrate the specificity of Slo3 current in our patch-clamp recordings.

      However, in our current experimental protocol, we apply ramp pulses multiple times and require a long series of recordings within a single session in one patch as described in the materials and methods as well as Figure 2I, Figure 4—figure supplement 1C, Figure 5B (pH 8.0 → 100 µM zinc → pH 8.0, to observe the washout effect). Incorporating quinidine into this sequence would make the protocol even longer (pH 8.0 → quinidine → washout → pH 8.0 → 100 µM zinc), which increases the likelihood of patch loss before completing the full set.

      Furthermore, we have ensured that the recorded current corresponds to Slo3 by using appropriate experimental conditions, specifically the suitable voltage range for activation, a high intracellular pH (pH 8.0), and high-potassium solutions in our recordings.

      (5) Missing Discussion of Zinc's Role in CatSper Regulation

      The study focuses solely on Slo3 but does not mention CatSper, the principal Ca<sup>2+</sup> channel essential for sperm capacitation.

      Zinc has been reported to inhibit CatSper activity, which could significantly impact sperm function.

      The discussion should address whether zinc's effect on Slo3 represents a broader regulatory mechanism influencing multiple ion channels during capacitation.

      Thank you for the comment. To the best of our knowledge, there have been no reports showing that CatSper activity is directly regulated by zinc ions.

      Furthermore, in our patch-clamp recordings with NH<sub>4</sub>Cl and ZnPy, we observed that the normal CatSper current increased even in the presence of ZnPy, which makes it challenging to conclude whether zinc directly affects CatSper channel activity.

      We added sentences to the discussion section in the revised manuscript as follows:

      “In addition, to date there are only a few reports on the effect of zinc on other sperm ion channels, and none in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      Final Assessment

      This work presents important findings on zinc regulation of Slo3 channels, supported by strong electrophysiological and molecular analyses. However, the physiological relevance of these findings remains unclear due to missing controls, and needs additional functional assays. Addressing these issues would significantly enhance the manuscript's scientific rigor and impact.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Most of the specific comments and suggestions are in the public review. Minor additional comments primarily focused on presentation and textual errors are here.

      (1) There is something strange happening in Figure 6D in the -100ish range. I think it's likely related to the reversal potential of K+.

      Thank you for pointing this out. Yes, in Figure 6D the plot behaves unusually in the range around -100 mV. As the reviewer points out, we also think that this is related to the reversal potential of potassium ions.

      (2) There are a number of errors in the text that make following it difficult. For instance, multiple times the authors say "In consistent" (line 120 as an example) when I think they mean consistent with.

      We replaced “in consistent” with “consistent with” throughout the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      The authors provide well-described experiments, particularly those examining the effects of intracellular zinc on Slo3 channels using inside-out patch-clamp recordings. However, some experimental designs intended to assess the physiological relevance of these findings during capacitation require additional controls and data before the authors' claims can be fully supported.

      Comments

      Major Concerns & Suggested Improvements

      Line 65: "In the present study, we find that intracellular zinc is exported during capacitation, indicating that zinc dynamics in spermatozoa play an important role in fertilization."

      This claim requires additional experimental data to be fully supported.

      Thank you for pointing it out. We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Line 79: "Intracellular zinc is exported from sperm during capacitation."

      The authors should include controls in non-capacitated conditions to determine whether zinc export is specific to capacitation or a general process in sperm cells.

      Again, we have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Figures - General Comment:

      In all figures, please replace SEM (Standard Error of the Mean) with Standard Deviation (SD) for consistency and a more accurate representation of variability.

      SEM (standard error of the mean) has been replaced with SD (standard deviation) in all figures (main figures and supplements), and the numerical values in the text have been updated accordingly.
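
      The relationship between the two measures is SEM = SD / √n, so SD can be recovered from a reported SEM when n is known. A minimal sketch with made-up numbers (the function names and values are illustrative, not from the manuscript):

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of measurements."""
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))      # SEM shrinks as n grows; SD does not
    return sd, sem


def sd_from_sem(sem, n):
    """Recover the SD from a reported SEM and the sample size n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sem * math.sqrt(n)


# Hypothetical measurements, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, sem = sd_and_sem(data)
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Because SEM depends on n, error bars shown as SD describe the spread of individual cells, whereas SEM describes the precision of the mean.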

      Figure 1

      Panel B:

      Include a non-capacitating media control to confirm that the observed decrease in zinc-sensitive dye fluorescence is not due to artifact/photobleaching.

      We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Perform an experiment with capacitating media supplemented with a higher concentration of zinc. If intracellular zinc export is a real effect, added extracellular zinc should prevent or reduce this phenomenon.

      We appreciate the reviewer’s suggestion; however, we believe that supplementing the medium with high concentrations of zinc is unsuitable for validating the export phenomenon due to confounding physiological factors. Our preliminary tests demonstrated that increasing extracellular zinc triggers a drastic increase in intracellular zinc as well (Author response image 5). Furthermore, the high concentration of BSA in the capacitation medium acts as a potent zinc buffer, precluding precise control over free Zn<sup>2+</sup> levels. Therefore, the inherent difficulty in maintaining defined extracellular and intracellular Zn<sup>2+</sup> gradients makes the interpretation of such data highly problematic. Future studies will focus on identifying the specific zinc transporters involved and characterizing their molecular mechanisms.

      Author response image 5.

      Zinc addition

      Clarify whether the "n" value represents different cells or multiple recordings from the same cell.

      The n value represents the number of different cells.

      Supplemental Figure 1:

      Incorporate Δ (delta) comparison between 10 min and 2 hours under control conditions and in the presence of TPEN.

      Here we provide data:

      Author response image 6.

      Δ comparison between control and TPEN

      Provide statistical analysis for these comparisons to make the effects of capacitation clearer.

      We performed the calculation and statistical analysis; however, there was no statistically significant difference, as shown in Author response image 6, owing to the high variability of the individual data.

      Figure 2

      Panel C:

      Incorporate inhibition at pH 7.4 and 6.0 for direct comparison.

      Recording the inhibitory effect of zinc at pH 6.0 is not possible because there would be no current to begin with, as mSlo3 is gated by both voltage and alkaline pH.

      Panel D:

      Include a washout control, similar to what is shown in Panel A.

      We added a washout control trace to Figure 2D.

      Panel E:

      Provide a longer reference trace in the absence of zinc to clearly visualize the control condition. The current reference segment is too short to properly assess baseline activity.

      Although we do not have a longer reference trace in the absence of zinc for Figure 2E, we instead show the trace recorded under the application of 0.1 µM zinc in Figure 2—figure supplement 1A to illustrate the current behavior.

      Panels G-H:

      Include inside-out patch-clamp traces and quantification of zinc washout effects.

      Inside-out patch traces are shown in Figure 2G, where we applied a step-pulse protocol. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Panels I-K:

      Provide additional traces. In Panel I, the inhibition by zinc is clear, but in Panel J, the reduction appears less distinct and could be due to rundown or an artifact. Additional controls should clarify this.

      Figure 2K presents the most representative trace among five recorded cells. The apparent reduction is less distinct, likely due to an artifact caused by a bubble in the rapid perfusion system during solution exchange. However, at the end of zinc application (t = 50 s), the current amplitude was clearly reduced compared with that at t = 0–10 s.

      Figure 3

      Panel D:

      Include additional data showing the transition to pH 6 and washout with pH 7.5, similar to the experimental design in Panels A and B.

      We added a raw trace showing the application of pH 6.0 to Figure 3D, and we added the transition to pH 6 and the washout with pH 7.5 to Figure 3E.

      Figure 3-Supplement 1:

      Include zinc washout experiments. This approach is one of the best ways to evaluate the reversibility of zinc inhibition on the channel.

      As mentioned above, these recordings used step pulses up to +180 mV. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Figure 6

      Zinc Inhibition Specificity:

      The authors should use quinidine, a known washable Slo3 inhibitor, to assess Slo3 activity before and after zinc injection.

      This experiment would confirm that zinc specifically inhibits Slo3, rather than affecting other endogenous channels.

      We sincerely thank the reviewer for this valuable suggestion. However, given the technical difficulty of these experiments, which involve lengthy VCF recordings and manual zinc injections that significantly compromise oocyte health, it is not feasible to apply quinidine at this stage.

      Moreover, we observed voltage-dependent fluorescence changes around the VSD, and this change was influenced by the application of zinc, confirming that zinc specifically inhibits Slo3 rather than affecting other endogenous channels.

      Discussion - Key Revisions Needed

      Line 308: "Our results demonstrated that intracellular zinc is exported from spermatozoa during capacitation."

      This claim needs to be supported by experiments using non-capacitated conditions.

      Additionally, measuring maximum and minimum zinc concentrations under different conditions would improve the interpretation of fluorescence intensity changes.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 309: "We further discovered that intracellular zinc regulates alkalinization-induced hyperpolarization in mice spermatozoa, mediated by Slo3 channel."

      Additional controls are needed to substantiate this claim.

      At this stage of the study, we do not have access to Slo3 knockout (KO) mice; therefore, performing additional experiments is not feasible.

      Line 316: "Using FluoZin3-AM for zinc imaging, we confirmed the presence of intracellular zinc in sperm (Fig. 1A), which is consistent with previous findings (Henkel et al., 1999). Our observations revealed that treatment with capacitation medium induced a decrease in zinc fluorescence intensity (Fig. 1B, C), suggesting that zinc levels are dynamic during capacitation."

      This statement must be supported by negative controls, including non-capacitated sperm conditions.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 327: "We also observed that zinc chelator significantly affected the sperm motility only after, but not before, capacitation (Fig. 1-figure supplement 1)."

      Data presentation should be revised to highlight the effects of capacitation itself.

      The discussion should specify which motility parameters were affected and why others were not.

      In the text we mentioned that:

      “We incubated the isolated spermatozoa with the cell-permeable Zn<sup>2+</sup> chelator N,N,N',N'-Tetrakis(2-pyridylmethyl)ethylenediamine (TPEN) and measured the motility parameters before and after capacitation. We found that VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) were influenced by the TPEN treatment only after capacitation, as shown in Fig. 1—figure supplement 1. These results demonstrate that the dynamics of zinc levels during capacitation potentially contribute to sperm motility, highlighting the importance of zinc action in sperm physiology.”

      Indeed, we observed that the zinc chelator significantly affected sperm motility, specifically VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity), only after, but not before, capacitation (Fig. 1—figure supplement 1). Of note, it has recently been reported that all of these motility parameters (VAP, VCL, and VSL) are reduced by Slo3-specific inhibitors in human sperm (M. Lyon et al., 2023). These findings are consistent with the idea that endogenous zinc dynamics control sperm motility through Slo3 during the capacitation process.

      The figure legend has been revised accordingly.

      Line 369: "Structural determinants of zinc inhibition in the mSlo3 channel."

      The authors should include an analysis of the evolutionary conservation of the mutated sites across Slo1, Slo2, and Slo3.

      If Slo3 has a unique regulatory mechanism, these sites should show high sequence variability compared to other Slo channels.

      If these sites are highly conserved, the authors should explain how Slo3 differs functionally from Slo1 and Slo2 despite this conservation.

      We thank the reviewer for the valuable suggestions regarding additional discussion of the structural determinants of zinc inhibition in the mSlo3 channel. We performed a sequence alignment of mSlo3, mSlo1, and mSlo2.2 using ClustalO. It is worth noting that only human and frog Slo2.1 sequences are available in the database, so we included only the Slo2.2 subtype, as our focus was on Slo3 in mouse sperm.

      Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, whereas E205 (mSlo3 numbering) is not. To date, there have been no reports examining the residues corresponding to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their zinc sensitivity. This might be because the zinc-binding sites in both channels are already well defined: they are located in the RCK1 domain for Slo1 (Hou et al., 2010) and in the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified binding site in Slo2.2 is conserved in Slo2.1 but not present in Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. However, this does not rule out the possibility that regions surrounding E191 or E176 could provide additional insights into zinc regulation in these channels, which could be of interest for future studies.

      Interestingly, in contrast to E169, E205 is not conserved across the Slo family, making this residue unique to the mouse Slo3 channel and potentially a determinant of zinc sensitivity in mSlo3. Given that E205 is located in the S4 domain, and given our VCF results showing that zinc inhibition influences the motion of the voltage-sensing domain of mSlo3, E205 represents an important residue to be explored in future studies. Furthermore, because this residue is unique to Slo3, it highlights the distinct functional properties of Slo3, such as its gating mechanism, which is regulated by both membrane voltage and alkalinization, has a different voltage range of activation compared to mSlo1 (Li et al., 2024), and involves distinct ligands and gating mechanisms compared to Slo2 (J. Zhang et al., 2023).

      We added the sequence alignment results to Figure 5—figure supplement 1F.

      We revised the results section as follows:

      “Additionally, we performed a sequence alignment of mSlo3, mSlo1, and mSlo2.2 using ClustalO. It is worth noting that only human and frog Slo2.1 sequences are available in the database, so we included only the Slo2.2 subtype, as our focus was on Slo3 in mouse sperm. Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, whereas E205 (mSlo3 numbering) is not (Figure 5—figure supplement 1F).”

      We revised the discussion section as follows:

      “Based on sequence alignment, E169 (mSlo3 numbering) is conserved among Slo family channels in mice, whereas E205 (mSlo3 numbering) is not (Fig. 5—figure supplement 1F). To date, no studies have examined the corresponding residues to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their potential zinc sensitivity, likely because the established zinc binding sites in these channels are located in the RCK1 domain for Slo1 (Hou et al., 2010) and the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified zinc binding site in Slo2.2 is conserved in Slo2.1 but is absent in both Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. Although regions surrounding E191 or E176 may still provide additional insights into zinc regulation and could be of interest for future investigation, E205 stands out because, unlike E169, it is not conserved across the Slo family, making it unique to mSlo3 and potentially a specific determinant of zinc sensitivity in this channel.”

      The figure legend has been revised accordingly.

      Line 392: "Physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm."

      The authors should mention the effects of zinc on CatSper channels, as CatSper is also crucial for capacitation.

      Slo3 inhibition may represent only one component of zinc's broader regulatory role during capacitation.

      We thank the reviewer for raising this important point regarding the physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm. We agree that we should also have discussed the effect of zinc on CatSper channels, as this channel is crucial for capacitation. To date, there are only a few reports on the effect of zinc on CatSper channels, and none have been reported in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, which facilitates the escape of sperm into the female genital tract (Jeschke et al., 2021). Taking this into consideration, as the reviewer pointed out, zinc inhibition of Slo3 may represent only one component of zinc’s broader regulatory role during capacitation.

      We added a sentence to the discussion section in the revised manuscript as follows:

      “In addition, to date, there are only a few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence that zinc acts directly on CatSper (Jeschke et al., 2021).”

      The study presents valuable insights into the role of intracellular zinc in sperm capacitation and Slo3 channel function. However, the physiological impact of these findings remains unclear due to insufficient controls and missing key experimental data. The suggested revisions would strengthen the validity of the claims made by the authors and improve the overall scientific rigor of the manuscript.

      Key Areas for Improvement:

      Control experiments in non-capacitated conditions.

      Increased statistical rigor in figure analyses.

      More detailed experiments to confirm specificity of zinc action on Slo3.

      Expanded discussion of zinc's role beyond Slo3, including CatSper regulation.

      The authors should measure these effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.
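
      The normalization suggested here can be sketched as follows. This is an illustration of the arithmetic only; the function names and all numerical values are hypothetical and do not come from the manuscript.

```python
def current_density(current_pa, capacitance_pf):
    """Normalize a whole-cell current (pA) to cell capacitance (pF), giving pA/pF."""
    if capacitance_pf <= 0:
        raise ValueError("capacitance must be positive")
    return current_pa / capacitance_pf


def fractional_inhibition(control_density, treated_density):
    """Fraction of the control current density that is lost under treatment."""
    if control_density == 0:
        raise ValueError("control current density must be non-zero")
    return 1.0 - treated_density / control_density


# Hypothetical example: the same cell recorded without and with intracellular zinc.
control = current_density(current_pa=500.0, capacitance_pf=2.5)   # 200 pA/pF
with_zinc = current_density(current_pa=125.0, capacitance_pf=2.5)  # 50 pA/pF
print(f"Inhibition: {fractional_inhibition(control, with_zinc):.0%}")
```

      Dividing by capacitance corrects for differences in membrane area between cells, so current densities from different cells and zinc concentrations can be compared directly.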

      By addressing these concerns, the manuscript will provide a more robust foundation for understanding zinc's regulatory role in sperm physiology and capacitation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings of an optimized E. coli cell-free protein synthesis (eCFPS) system that has been simplified by reducing the number of core components from 35 to 7; furthermore, the findings communicate a simplified 'fast lysate' preparation that eliminates the need for traditional runoff and dialysis steps. This study is an advance towards simplifying protein expression workflows, and the evidence provided is solid, starting with nanoluc, a protein that expresses readily in many systems, to applications to more challenging proteins like the functional self-assembling vimentin and the active restriction endonuclease BsaI. Data on the underlying mechanisms and efficiency of the presented system in terms of protein yield relative to other known cell-free systems would greatly enhance the findings' significance and the strength of the evidence. The paper remains of interest to scientists in microbiology, biotechnology and protein synthesis.

      We thank the editors for the positive assessment of our optimized E. coli cell-free protein synthesis (eCFPS) system and the "fast lysate" preparation.

      As suggested, we have significantly strengthened the evidence by adding:

      (1) Mechanism data: We have integrated a detailed analysis of the endogenous metabolic pathways (amino acids and nucleotides) into the Discussion section, supported by literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      (2) Efficiency comparisons: We have added quantitative comparisons of absolute protein yields between our simplified 7-component system and the conventional 35-component system (now in Figure S3 E-F), demonstrating that our system matches or exceeds traditional titers.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors only provided the data for optimization, leaving the underlying mechanism that explains the phenomena unexplained.

      We appreciate this feedback. To address the mechanism of how protein synthesis persists without exogenous additives, we have expanded the Discussion to explain how the "fast lysate" retains active endogenous enzymes. By omitting runoff and dialysis, our system preserves the metabolic capacity to synthesize amino acids (e.g., Cys and Trp from Ser) and nucleotides from residual precursors, as supported by the literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      Reviewer #2 (Public review):

      The production of the lysate requires special instrumentation, limiting accessibility. While the strengths of the study are well-emphasized, the limitations are not mentioned.

      We thank the reviewer for this point. While a high-pressure homogenizer is common in many molecular biology labs, we acknowledge it may be a barrier for some. We have now included a dedicated Limitations paragraph in the Discussion addressing accessibility and the inherent challenges of prokaryotic systems in producing complex human proteins requiring post-translational modifications.

      Reviewer #3 (Public review):

      (1) Clarification on "highly efficient" and the lack of comparison with typical high-yield systems.

      We have clarified "highly efficient" as a holistic balance of high yield, robustness, and simplified preparation. Crucially, we added absolute yield data (sfGFP standard curve) to Figure S3E-F demonstrating that our 7-component system performs comparably to or better than traditional high-yield protocols.

      (2) How did the authors ensure chemical composition only affected translation and not transcription?

      This is a key distinction. We performed new experiments using pre-transcribed mRNA templates (Figure S3G) to isolate translational effects. While translation efficiency decreased slightly in the simplified buffer, the overall protein yield increased significantly due to a dramatic boost in transcription efficiency, confirming the system's net performance gain.
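
      For readers unfamiliar with the standard-curve approach mentioned in response (1) above, converting a fluorescence reading to an absolute concentration can be sketched as a linear fit followed by inversion. This is a generic illustration under the assumption of a linear response; the calibration points below are invented and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate concentration from a reading."""
    return (signal - intercept) / slope


# Invented calibration data: purified standard (uM) vs fluorescence (a.u.).
standard_um = [0.0, 1.0, 2.0, 4.0]
fluorescence = [10.0, 110.0, 210.0, 410.0]
slope, intercept = fit_line(standard_um, fluorescence)
print(concentration_from_signal(260.0, slope, intercept))
```

      Readings should only be converted within the range covered by the standards, since the linear assumption may not hold outside it.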

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are specific concerns that need to be addressed:

      (1) On page 4, lines 103-109, the authors speculate that protein synthesis persists even in the absence of amino acids like arginine, cysteine, and tryptophan. They suggest that this is likely due to residual amounts of these amino acids present in the cell lysate. Yokoyama et al. demonstrated that these amino acids are generated from other amino acids by endogenous amino acid metabolic enzymes in the cell lysate (J. Biomol. NMR 48, 193, (2010), doi: 10.1007/s10858-010-9455-3.). Cysteine and tryptophan can be derived from serine. In this context, asparagine and glutamine can be disregarded because they are synthesized from aspartate and glutamate, respectively. A more in-depth analysis is required to interpret the results accurately.

      We thank the reviewer for this insightful comment and for pointing us toward the relevant literature. We agree that the persistence of protein synthesis in the absence of exogenous amino acids like Arg, Cys, and Trp is driven by the robust metabolic capacity of our "fast lysate."

      Unlike conventional protocols, our "fast lysate" procedure deliberately omits runoff and dialysis steps, ensuring the maximal retention of active endogenous metabolic enzymes and residual small-molecule pools. As demonstrated by Yokoyama et al. (2010), E. coli cell extracts retain functional enzymes capable of synthesizing acid-sensitive amino acids from precursors or more stable amino acids. We have integrated a detailed mechanistic analysis of these endogenous metabolic pathways into the Discussion section and have cited Yokoyama et al. (2010) to support this interpretation.

      (2) On page 4, lines 111-115, the authors demonstrated that protein synthesis could occur even in the absence of CTP or UTP, provided ATP and GTP are present. This phenomenon can also be attributed to the analogous complementary actions of metabolic pathways.

      We agree with the reviewer's assessment. The ability of the optimized eCFPS to function without exogenous CTP/UTP relies on the same principle of endogenous metabolic conversion mentioned above. The omission of dialysis ensures that the lysate retains not only residual nucleotide pools but also the full suite of nucleotide metabolic enzymes. Powered by our optimized energy regeneration system, these enzymes maintain sufficient levels of CTP and UTP to support transcription and translation. This explanation has been added to the Discussion section to clarify the robustness of our system.

      (3) On Figure 3A, protein synthesis kinetics are presented in a stair plot instead of the commonly used scatterplot. Is there a specific reason for choosing the stair plot?

      We chose the stair plot representation to more clearly visualize the cumulative process of protein synthesis and its stabilization over discrete time intervals. Given that sampling occurred every 10 minutes, a stair plot effectively highlights the "plateau" phases and the incremental nature of accumulation, which can sometimes be obscured by dense scatter plots.

      (4) On Figure 3C. It is unclear which system is referred to as the "initial" system in Figure 3C. Which data point on Figures 3A and 3B corresponds to this "initial" system?

      We apologize for the lack of clarity. In Figure 3C, "initial" refers to the traditional 35-component system prior to our streamlining process. Figures 3A and 3B characterize the performance of the final optimized system alone. To resolve this ambiguity, we have updated the legend for Figure 3 to explicitly define the "initial" system as the pre-optimization control.

      (5) In Figure 5D, previously reported eCFPS and the system using "fast lysate" were compared. The only difference between the two systems seems to be the type of lysate used, according to the Supplementary table. Optimal concentrations for the components are the same for both lysates, or is there still room for optimization for "fast lysate"?

      The "fast lysate" primarily differs from conventional lysates in its preparation speed and the retention of endogenous cofactors/enzymes. While the optimal salt and energy concentrations remained consistent across both lysates in our tests, the "fast lysate" provides a higher baseline signal due to the endogenous T7 RNA polymerase and metabolic factors. We believe this demonstrates the robustness of the optimized reaction buffer across varying lysate preparation qualities.

      (6) The study suggests that the removal of DTT didn't negatively affect protein expression. However, based on my experience, certain proteins, especially those with cysteine residues on their surface, tend to aggregate without DTT. Did the authors attempt to express such proteins, or did they draw this conclusion based on the limited number of proteins tested?

      This is a valid concern. We based our conclusion on the functional expression of BsaI and vimentin, two proteins that are inherently prone to aggregation and misfolding. Their successful synthesis suggests that the intrinsic reducing capacity of the lysate (e.g., glutathione and thioredoxin systems) is sufficient for many targets (Prinz et al. 1997). However, we acknowledge that specialized cysteine-rich proteins may still require exogenous DTT. We have addressed this in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 77-78 "we iteratively evaluated the contribution of individual constituents through luciferase reporter assays" - where is all the data? Please use an appropriate figure citation. Figure 1 cherry picks some components, but I think all data should be included.

      We have structured the data presentation to show dispensable components in Figure 1 (where removal does not inhibit reaction) and essential components in Figure 2 (where 0-concentration results in zero activity). This ensures a logical flow of the "streamlining" narrative. All raw data for these screenings have been included in the Source Data files.

      (2) Line 127 typo "concentrations".

      We thank the reviewer for pointing out this error. The typo "concentrations" has been corrected.

      (3) Figure 2: "protein expression levels" measured how?/what is the unit of the vertical bar on the right? I'm assuming that this experiment was conducted for discrete concentrations and thus generated discrete data points. However, the graph makes it seem as if this is continuous data. Kindly change the type of graphing to indicate that this is discrete data, showing each data point.

      We appreciate the reviewer's suggestion. Protein expression levels were measured using the Nanoluciferase (NLuc) reporter gene assay. We used heatmaps/contour plots because our data are bivariate, representing the simultaneous optimization of two concentrations (e.g., Mg<sup>2+</sup> and K<sup>+</sup> in Figure 2A). For such matrix-based screenings, heatmaps are more effective than scatter plots at conveying synergistic trends and identifying optimal reaction landscapes. Notably, this visualization approach for discrete biochemical optimization data was successfully employed by the Ban lab in their recent study on translation system optimization (Bothe and Ban 2024). The vertical color bar on the right represents the relative expression ratio, normalized to the maximum yield. Although we have provided a scatter plot of the discrete data for reference (see Author response image 1), we believe it appears visually cluttered because of the high density of data points, which makes overarching trends difficult to discern. To maintain transparency, the discrete concentration points tested are clearly reflected by the axis ticks, and all raw discrete data are available in the Source Data files.
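
      The normalization behind the color bar (each condition divided by the maximum yield in the screen) can be sketched as below. The grid values are invented for illustration and the function name is hypothetical; this is not the authors' analysis code.

```python
def normalize_to_max(grid):
    """Scale a 2D grid of reporter signals so the best condition equals 1.0."""
    peak = max(max(row) for row in grid)
    if peak <= 0:
        raise ValueError("maximum signal must be positive")
    return [[value / peak for value in row] for row in grid]


# Invented luminescence readings: rows and columns stand for the discrete
# concentrations of two components tested in a matrix screen.
raw = [
    [1.0, 2.0],
    [4.0, 3.0],
]
relative = normalize_to_max(raw)
print(relative)
```

      Each cell of the normalized grid corresponds to one discrete tested condition, which is why the axis ticks, rather than a continuous scale, indicate where measurements were actually taken.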

      Author response image 1.

      (4) Also, for all figures: the way the units are presented (DTT/mM) is confusing to me; it could just be something like [DTT] (mM).

      We have revised all figures and tables to follow the standard format (e.g., [Component] (unit)) as suggested.

      (5) Do the sucrose gradient sedimentation data have replicates? If so, please indicate statistics.

      The sucrose gradient data provided (Figure 5C) are intended as qualitative evidence that the "fast lysate" method preserves intact 70S ribosomes across different preparation batches. This experiment has been performed independently multiple times with consistent results, demonstrating the high reproducibility of our preparation method. While we did not perform a quantitative comparative analysis of ribosome concentration, the consistency of the peaks confirms the integrity of the translational machinery.

      (6) Line 457: fix the red line.

      We thank the reviewer for pointing this out. The formatting issue has been resolved in the revised manuscript.

      (7) Please mention the limitations of this study in the discussion.

      We thank the reviewer for this suggestion. We have added a paragraph to the Discussion addressing the limitations of prokaryotic systems regarding complex eukaryotic post-translational modifications and chaperone requirements.

      (8) Please include all uncropped gels in the source data, alongside the raw data, as you have already done.

      As requested, we have provided all original, uncropped gel images in the Source Data files, alongside the raw data, to ensure full transparency and compliance with the journal's data sharing policies.

      Reviewer #3 (Recommendations for the authors):

      (1) The study lacks a comparison of protein levels with a typical cell-free protein synthesis system.

      We have performed new quantitative experiments (now included in Figure S3 E-F) to measure absolute protein yields. Our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while using significantly fewer components. We have also clarified in the text that "highly efficient" refers to the combination of high yield, low cost, and short preparation time.

      (2) What do the authors mean by "highly efficient", often used in the manuscript?

      We thank the reviewer for the opportunity to clarify our terminology. We have performed new quantitative experiments (now included in Figure S3) to measure absolute protein yields, demonstrating that our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while using significantly fewer components.

      In the context of this manuscript, we use the term "highly efficient" as a holistic descriptor that encapsulates three key dimensions of the system:

      (1) Performance Superiority: Achieving higher expression levels and faster kinetics compared to conventional 35-component systems.

      (2) Functional Robustness: The ability to efficiently synthesize challenging targets, such as cytotoxic proteins (BsaI) and aggregation-prone proteins (vimentin), which often fail in simplified systems.

      (3) Practical Utility: A drastic reduction in preparation time and cost through the "fast lysate" protocol and the removal of 28 auxiliary components, thereby lowering the barrier to adoption.

      This definition aligns with the study's core objective: developing a system where efficiency is measured not only by final yield but by the synergy between high performance and extreme ease of use.

      (3) In this article, the term 'optimisation' is used as a synonym for 'simplification'. In biochemistry, optimisation commonly refers to an increase in yield, or the same yield achieved more easily or at a lower cost. In this case, however, we have no idea how this new system compares to a conventional expression system in terms of yield.

      We thank the reviewer for this conceptual clarification. We agree that in biochemistry, "optimization" typically implies an improvement in yield or cost-effectiveness. In our study, we use the term to describe the process of achieving a superior balance between system simplicity and protein production. To address the reviewer's concern regarding the lack of a direct yield comparison, we have added new data in Figure S3. This figure provides a side-by-side comparison of protein yields between our simplified 7-component system and the conventional 35-component system. The results demonstrate that our system not only matches the performance of the traditional setup but frequently exceeds it in terms of final protein titer, while significantly reducing the reagent cost and preparation complexity. Thus, the simplification achieved in this work represents a true biochemical optimization of the cell-free synthesis process.

      (4) The levels of transcripts of the proteins studied were not determined in any of the experiments performed. Therefore, it is unknown whether the effects of different experimental conditions on NLuc, GFP or other protein expression are due to an effect on transcription, translation, or both.

      This is an excellent point. We performed a new set of experiments using mRNA templates instead of DNA to isolate the effects on translation (Figure S3G). Our results indicate that while the system's overall boost in NLuc expression is partially attributable to enhanced transcription efficiency, the translation machinery remains highly robust. We have updated the Results and Discussion to reflect this distinction.

      References

      Bothe, Adrian, and Nenad Ban. 2024. “A Highly Optimized Human in Vitro Translation System.” Cell Reports Methods 4 (4): 100755.

      Kigawa, T., T. Yabuki, Y. Yoshida, M. Tsutsui, Y. Ito, T. Shibata, and S. Yokoyama. 1999. “Cell-Free Production and Stable-Isotope Labeling of Milligram Quantities of Proteins.” FEBS Letters 442 (1): 15–19.

      Prinz, W. A., F. Aslund, A. Holmgren, and J. Beckwith. 1997. “The Role of the Thioredoxin and Glutaredoxin Pathways in Reducing Protein Disulfide Bonds in the Escherichia Coli Cytoplasm.” The Journal of Biological Chemistry 272 (25): 15661–67.

      Yokoyama, Jun, Takayoshi Matsuda, Seizo Koshiba, and Takanori Kigawa. 2010. “An Economical Method for Producing Stable-Isotope Labeled Proteins by the E. Coli Cell-Free System.” Journal of Biomolecular NMR 48 (4): 193–201.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Bobola et al reports single-nucleus expression analysis with some supporting spatial expression data of human embryonic and fetal cardiac outflow tracts compared to adult aortic valves. The transcription factor GATA6 is identified as a top regulator of one of the mesenchymal subpopulations, and potential interacting factors and downstream target genes are identified bioinformatically. Additional bioinformatic tools are used to describe cell lineage relationships and trajectories for developmental and adult cardiac cell types.

      Strengths:

      The studies of human tissue and extensive gene expression data will be valuable to the field.

      Weaknesses:

      (1) The expression data are largely confirmatory of previous studies in humans and mice. Thus, it is not clear what novel biological insights are being reported. While there is some novelty and impact in using human tissue, there are extensive existing publications and data sets in this area.

      (2) Major conclusions regarding spatial localization, differential gene expression, or cell lineage relationships based on bioinformatic data are not validated in the context of intact tissues.

      (3) The conclusions regarding lineage relationships are based on common gene expression in the current study and may not reflect cellular origins or lineage relationships that have previously been reported in genetic mouse models.

      (4) An additional limitation is the exclusive examination of adult aortic valve leaflets that represent only a subset of outflow tract derivatives in the mature heart. The conclusion, as stated in the title regarding adult derivatives of the outflow tract, is not accurate based on the limited adult tissue evaluated, exclusive bioinformatic approach, and lack of experimental lineage analysis of cell origins.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Leshem et al. presents a transcriptomic analysis of the developing human outflow tract (OFT) at embryonic and fetal stages using snRNAseq and spatial transcriptomics. Additionally, the authors analyze transcriptomic data from the adult aortic valve to compare embryonic and adult cell populations, aiming to identify persistent embryonic transcriptional signatures in adult cells. A total of 15 clusters were identified from the embryonic and fetal OFT samples, including three mesenchymal and four endothelial clusters. Using SCENIC analysis on the embryonic snRNAseq data, the authors identified GATA6 as a key regulator of valve precursor cells. Spatial transcriptomic analysis of four fetal OFT sections further revealed the spatial distribution of mesenchymal nuclei, smooth muscle cells, and valvular interstitial cells. Trajectory analysis identified two distinct developmental origins of fetal mesenchymal cells: the neural crest and the second heart field. Finally, the authors used snRNAseq data from the adult aortic valve to propose that embryonic transcriptional signatures persist in a subset of adult cells.

      Strengths:

      (1) The study offers a rich and detailed dataset, combining snRNA-seq and spatial transcriptomics in human embryonic and fetal OFT, which are challenging to obtain.

      (2) The use of SCENIC and trajectory analysis adds mechanistic insight into cell lineage and regulatory programs during valve development.

      (3) This study confirms GATA6 as a key regulator of valve precursor cells.

      (4) Comparison between embryonic/fetal and adult datasets represents a novel attempt to trace persistence of developmental transcriptional programs.

      Weaknesses:

      (1) A major limitation is the lack of experimental validation to support key conclusions, particularly the claim of persistent embryonic transcriptional signatures in adult cells.

      (2) The manuscript would benefit from a clearer discussion of how these results advance beyond previous studies in human heart and valve development.

      (3) The comparison between embryonic and adult data is interesting, but would be more convincing with additional evidence supporting the proposed persistence of embryonic transcriptional signatures in adult cells.

      Reviewer #3 (Public review):

      Leshem et al have generated a transcriptional cell atlas of the human outflow tract at two developmental timepoints and its adult valvular derivatives. This carefully performed study provides a useful resource for the study of known genes implicated in outflow tract defects and potentially also for discovering new disease genes. The authors reveal neural crest and mesodermal contributions to different outflow tract components and show that GATA6, known to play a role in arterial valve development, controls a set of genes expressed in endocardium-derived cells during valve development. Interestingly, the results suggest lineage persistence of expression of certain genes through to the adult timepoint, a main new finding of this study.

      The following points should be addressed to reinforce the conclusions and emphasize the novel features of this study.

      (1) It would be helpful to clarify how these new findings confirm or diverge from what is known from analysis of neural crest and mesodermal lineage contributions to different cell populations in the mouse heart. Did the authors identify any human-specific populations of cells, such as the LGR5 population reported by Sahara et al?

      (2) The authors should clarify in the introduction and results that they consider the endocardium to be on the SHF trajectory as indicated in Figure S4C. Please add a reference for this point.

      (3) The GATA6 results are interesting and support this experimental approach. The paper would be reinforced if the authors could provide any functional validation (in addition to their GATA6 genomic occupancy data) that the designated target genes are regulated by GATA6. This might involve looking at mutant mouse embryos or cultured cells. Do the authors consider that GATA6 may regulate the endocardial to mesenchymal transition during the early stages of valve development? Or the valve interstitial cell versus fibroblast fate choice?

      (4) Do the new findings reveal whether human valves have a direct SHF to VIC trajectory (ie, without transiting through endocardium) as has been recently shown in the murine non-coronary valve leaflet? Relevant to this point, Figure 5E appears to show contributions to a single adult aortic valve leaflet - this should be explained, or corrected.

      We sincerely thank the Editor and the Reviewers for their constructive and insightful comments. We have carefully addressed the majority of the points raised and believe the revisions have substantially strengthened the manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers felt that integrating these datasets with prior snRNAseq datasets on human OFT (de Bono et al, 2025) would enhance analyses and provide broader context.

      Several human fetal heart single-cell datasets have been published, including De Bono et al, 2025. We carefully considered whether integrative analyses with these datasets would further strengthen our study. However, there are substantial differences in anatomical scope: most published datasets encompass broad cardiac regions, whereas our study specifically targets the OFT, enabling higher-resolution characterization of OFT-specific cell states. Integration across datasets with markedly different regional compositions would likely be driven by large-scale anatomical differences rather than yield additional OFT-specific insight. In addition, cross-study integration requires batch correction. When datasets differ in anatomical scope, developmental timing, and experimental protocols, stronger correction may be needed, increasing the risk of overcorrection and potential loss of biologically meaningful OFT-specific signals.

      Importantly, our dataset has been deposited in the Human Cell Atlas and is fully available for future comparative analyses. We therefore believe that broader cross-dataset integration is best undertaken within such harmonized frameworks as more closely matched datasets become available.

      Overall, cluster annotations should be more rigorous, which may be facilitated by comparisons with earlier studies.

      We have clarified all the points raised by the reviewer regarding cluster annotation. Specifically: (1) the “cardiac” cluster has been renamed “cardiac muscle” to more accurately reflect its transcriptional identity; and (2) we now explicitly state that mesenchymal populations not resolved in the initial global analysis (across all samples) were subsequently defined through dedicated subclustering analyses performed separately for the adult and developmental datasets. These clarifications have been incorporated into the revised manuscript.

      Citation of other spatial transcriptomics studies on human OFT would be useful.

      We apologise for missing these contributions. They have now been added to the text.

      Can the authors identify a human-specific population of cells, such as the LGR5 population reported by Sahara et al?

      While our dataset does not reveal a novel single-gene marker comparable to LGR5, which marks the human-specific population described by Sahara et al., it does identify a distinct GATA6-enriched embryonic mesenchymal population that functions as a human valve progenitor lineage. Using regulatory network analysis, RNA velocity, lineage tracing and spatial transcriptomics, we show that this GATA6-driven program is specifically associated with semilunar valve morphogenesis and that its transcriptional signature persists in fetal and adult VIC populations. Thus, the novelty of our study lies in defining this human GATA6-regulated valve progenitor population and its lineage trajectory, rather than in the identification of previously unreported single marker genes.

      “….Although we have not defined a novel single-gene marker (analogous to LGR5 [Sahara et al]), our identification of a GATA6 network highlights…..”

      Further investigation of the specific role of GATA6 would strengthen findings.

      FISH studies would indicate whether GATA6 is involved in EMT or fibroblast versus valve interstitial cell fate choice.

      We have added a panel to Fig. S2 (D), showing that GATA6 expression is not restricted to specific outflow tract populations. In CS16-17 embryos, GATA6-expressing nuclei are detected across all embryonic clusters. Given this broad expression pattern, FISH analysis would not distinguish whether GATA6 functions in EMT or in fibroblast versus valve interstitial cell fate specification. While we cannot exclude the possibility that GATA6 contributes to EMT, we observe that its expression levels are highest in cluster 4 (post-EMT) cells. This suggests that GATA6 activation is more likely a consequence of the transition rather than its initiating cause (shown in Fig. S2D).

      Functional validation of some proposed GATA6 targets would strengthen findings.

      To our knowledge, there are currently no publicly available datasets defining the GATA6 regulatory network in human OFT cells or valvular fibroblast progenitors. Existing datasets focus primarily on cardiomyocytes, which arise from a distinct developmental lineage. Given the well-established cell-type and context dependence of transcription factor activity, these datasets are unlikely to provide meaningful insight into regulatory relationships within the valvular lineage examined here.

      As noted in the original submission, we previously leveraged published mouse GATA6 ChIP-seq data from E11.5 OFT (DOI: https://doi.org/10.7554/eLife.31362) as independent support for the GATA6 regulon identified in our human dataset. In this revised version, we have extended this analysis by formally quantifying the overlap between the cluster 4 GATA6 regulon and genes bound by GATA6 in the mouse OFT dataset. Using a hypergeometric enrichment test, we found that the observed overlap is approximately two-fold greater than expected by chance and highly significant (p = 1.2 × 10<sup>-33</sup>). This statistical analysis strengthens our original interpretation and provides quantitative support that the identified regulon is strongly enriched for bona fide GATA6-bound targets in a closely related developmental context.
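
      For readers unfamiliar with the test, a hypergeometric enrichment calculation of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name and all gene counts are hypothetical placeholders (the response does not report the set sizes), chosen only so that the toy overlap is twice the chance expectation.

```python
from math import comb


def hypergeom_enrichment(universe, bound, regulon, overlap):
    """Upper-tail hypergeometric test for the overlap of two gene sets.

    universe: number of genes that could have been drawn (background)
    bound:    number of background genes bound in the ChIP-seq data
    regulon:  number of genes in the regulon
    overlap:  number of regulon genes that are also bound
    Returns (expected overlap, fold enrichment, P(X >= overlap)).
    """
    expected = regulon * bound / universe
    fold = overlap / expected
    # Count the ways to draw `regulon` genes containing at least `overlap`
    # bound genes, then divide by the number of all possible draws.
    upper = min(bound, regulon)
    tail = sum(
        comb(bound, k) * comb(universe - bound, regulon - k)
        for k in range(overlap, upper + 1)
    )
    p_value = tail / comb(universe, regulon)
    return expected, fold, p_value


# Hypothetical counts for illustration only; they are not from the study.
expected, fold, p_value = hypergeom_enrichment(
    universe=20000, bound=4000, regulon=300, overlap=120
)
```

      The result depends on the choice of background: restricting the universe to genes detected in the relevant cells, rather than all annotated genes, generally gives a more conservative p-value.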

      In addition, we examined the spatial expression pattern of the GATA6 regulon gene set and found that it specifically localizes to the semilunar valves (OFT derivatives), consistent with GATA6 activity in this developmental context. This new analysis has been incorporated into Figure 2F of the revised manuscript.

      Collectively, the cross-species binding enrichment and valve-specific expression pattern provide orthogonal support for the biological relevance of the identified GATA6 regulon and strengthen the mechanistic interpretation of GATA6 function in OFT and valve development.

      As GATA6 has been previously identified in mouse studies, can the authors identify novel transcription factors potentially involved in OFT development?

      To identify additional transcription factors potentially involved in OFT development and to define regulators that may confer specificity to GATA6 activity, we compared the GATA6 regulon with the regulons of other cluster 4 transcription factors identified by SCENIC (SOX4, GLI3, RARG, ETV1, GLIS3, BACH2, ZNF423, FOXO3, ZBTB20).

      While all cluster 4 regulators share some downstream targets, the GLI3 regulon showed approximately twice the degree of overlap with the GATA6 regulon compared to the other factors. This suggests a potential functional interaction between GATA6 and GLI3 in OFT-associated mesenchyme. Consistent with this, cooperation between GATA6 and GLI3 has been reported in mouse limb development. These findings have now been incorporated into the Results section, and co-expression of GATA6 and GLI3 in CS16-17 populations is shown in Figure S2DE.

      Although GATA6 has previously been implicated in OFT development, SCENIC analysis provides mechanistic insight by defining the downstream gene programs active in specific human embryonic lineages. Thus, the novelty of our findings lies not in re-identifying GATA6, but in characterizing its regulon in human OFT- and valve-associated mesenchyme and identifying potential cooperating regulators such as GLI3.
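
      The regulon comparison described in this response amounts to ranking candidate co-regulators by how much of the reference regulon they share. A minimal sketch, assuming regulons are available as plain sets of gene identifiers, is shown below; the function names, transcription factor labels and gene identifiers are hypothetical placeholders rather than SCENIC output.

```python
def shared_fraction(reference, other):
    """Fraction of the reference regulon that also appears in another regulon."""
    reference, other = set(reference), set(other)
    return len(reference & other) / len(reference)


def rank_partners(reference, regulons):
    """Sort candidate regulators by their overlap with the reference regulon."""
    scores = {
        tf: shared_fraction(reference, genes) for tf, genes in regulons.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data with placeholder names; none of these sets come from the study.
reference_regulon = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
candidate_regulons = {
    "TF_A": {"g1", "g2", "g3", "g4", "g20"},
    "TF_B": {"g5", "g6", "g21", "g22"},
    "TF_C": {"g7", "g8", "g23"},
}

ranked = rank_partners(reference_regulon, candidate_regulons)
```

      In this toy example TF_A shares twice as much of the reference regulon as the other candidates. Because all regulons are drawn from the same cluster-enriched gene pool, some overlap is expected for every factor, so it is the relative difference between candidates that is informative.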

      Embryonic signatures in adult valve cells are an interesting finding, that should be further explored by pseudotime trajectories, which may also indicate whether SHF cells have a direct trajectory to VIC (without transiting endocardium), as recently shown in mice.

      We included all embryonic populations, including cardiac progenitor cells (SHF), in the pseudotime trajectory analysis. However, we did not observe evidence of a direct trajectory from SHF cells toward VIC. In contrast, the same analysis consistently identified a trajectory linking endocardial cells to VIC, supporting an endocardial origin in our dataset.

      Reviewer #1 (Recommendations for the authors):

      (1) Major conclusions regarding cell lineages and derivatives are based on common gene expression patterns and bioinformatic tools. Thus, these conclusions are not based on empirical data, and assumptions regarding lineages based on gene expression may not be accurate. The language related to lineage analysis, derivative, and longitudinal gene expression is not supported by data. For example, studies in mice have shown that aortic valve interstitial cells from endocardial cushions and neural crest-derived lineages have overlapping patterns of ECM gene expression and cannot be easily distinguished in adults. Thus, it is not possible to determine derivation and cell origins based on gene expression alone.

      While we fully acknowledge that gene expression-based analyses provide correlative rather than direct lineage-tracing evidence, the Reviewer’s statement that “it is not possible to determine derivation and cell origins based on gene expression alone,” and the example cited in support, appear to equate global transcriptional similarity with the distinct embryonic transcriptional signatures that underpin our analysis.

      As the Reviewer notes, a given differentiated cell type can derive from different embryonic progenitors. Due to functional convergence, differentiated cells often exhibit highly similar expression profiles that reflect their shared function rather than developmental origin. Consequently, discriminating embryonic origins based on global expression profiles, or even for highly distinctive genes of differentiated cells, is very challenging. The example cited by the Reviewer - overlapping ECM gene expression in aortic valve interstitial cells derived from endocardial cushions and neural crest - illustrates precisely this point.

      However, our analysis does not rely on global transcriptional similarity or on markers of mature differentiated cells. Instead, we specifically identified gene sets that are highly distinctive of embryonic clusters prior to the onset of differentiation. These signatures are enriched for transcription factors and signaling molecules that define developmental identity, rather than functional effector genes associated with mature cell states. We have shown that these embryonic signatures persist in fetal cells (which already express differentiated markers but are developmentally closer to the embryonic stage relative to adult cells) and remain detectable, albeit attenuated, in adult cells. It is these distinctive embryonic transcriptional signatures, rather than global or shared functional gene expression, that we have used to infer potential lineage relationships.

      We fully acknowledge that this constitutes correlative evidence rather than direct lineage tracing, which is not feasible in human studies. However, the persistence of embryonic regulatory signatures into fetal and adult stages provides a biologically plausible link to developmental origin. This persistence most plausibly reflects partial retention of ancestral embryonic transcriptional programs in descendant cells, rather than de novo activation later in life of embryonic genes that were never previously expressed in that cell’s lineage.

      (2) Most of the findings related to cell composition, gene expression, and cell lineages seem to be largely confirmatory of previous reports. Novel findings should be emphasized and validated in the tissues.

      We agree that several aspects of our dataset reproduce and extend findings from previous human and animal studies, which we regard as an important validation of the atlas. However, our study also provides multiple novel insights that are directly supported by our spatial data. Specifically, we (i) identify a GATA6-enriched embryonic mesenchymal valve progenitor population, (ii) delineate its GATA6 transcriptional regulon and direct targets implicated in OFT and valve disease, and (iii) trace its embryonic transcriptional signature into fetal and adult valve interstitial cell populations. These findings are strengthened by our spatial transcriptomic data, which maps the GATA6 regulon and key targets to the semilunar valves and adjacent arterial root, providing in situ validation of both cell identity and gene expression patterns (see Fig. 3 and the newly added Fig. 2F). We have revised the Discussion to more explicitly highlight these novel aspects and their spatial validation in the final

      “In summary, our work goes beyond confirming previously reported cell types by (i) defining a GATA6-regulated human valve progenitor lineage and its descendants, (ii) establishing distinct embryonic origins for smooth muscle and valvular fibroblasts, and (iii) demonstrating persistence of embryonic signatures in adult valve cell populations. These findings are directly supported in tissue by our spatial transcriptomics data, which map these lineages and regulatory programs to defined anatomical domains within the human OFT and semilunar valves.”

      (3) The developing outflow tract of the heart contributes to more than just the aortic valve leaflets in adults. Additional conotruncal structures need to be evaluated in order to define adult derivatives of the developing outflow tract as described in the title.

      The title has been changed to reflect that only adult aortic valves were examined.

      (4) Major conclusions regarding the GATA6 regulatory network and downstream target genes are not validated in the context of the developing outflow tract or adult valves. Is GATA6 expression restricted to specific outflow tract populations? Is GATA6 binding or responsive gene expression detected for the indicated target genes?

      We performed additional analyses that further reinforce the relationship between GATA6 and its target genes and support the biological relevance of GATA6 downstream targets in arterial valve development. Below, we address the specific questions raised by the reviewer.

      (1) Is GATA6 expression restricted to specific outflow tract populations?

      GATA6 expression is not restricted to specific outflow tract populations. In CS16-17 embryos, GATA6-expressing cells are detected across all embryonic clusters; however, expression levels are highest in cluster 4 (valve precursor cells).

      Despite this broad expression pattern, SCENIC identifies GATA6 activity (i.e., a GATA6 regulon) specifically in cluster 4. This apparent restriction of GATA6 regulatory activity to cluster 4 may be explained, at least in part, by its elevated expression levels within this cluster. Alternatively, given that transcription factors often act in a combinatorial manner, GATA6 may co-regulate its target genes in cluster 4 together with additional cluster-specific regulators. To explore this possibility, we compared the GATA6 regulon with the regulons of other cluster 4 transcription factors identified by SCENIC (namely SOX4, GLI3, RARG, ETV1, GLIS3, BACH2, ZNF423, FOXO3, ZBTB20) in order to identify potential co-regulatory modules. As expected, since these regulons are sampled from the subset of genes enriched in cluster 4, all regulators share a substantial proportion of downstream targets with GATA6. However, GLI3 stands out, showing approximately twice the degree of overlap compared to the other factors. This suggests a functional interaction between GATA6 and GLI3, consistent with previously reported cooperation in mouse limb development. These results have been incorporated into the Results section, and the expression of GATA6 and GLI3 in CS16-17 cell populations is shown in Fig. S2DE.

      (2) Is GATA6 binding or responsive gene expression detected for the indicated target genes?

      We were unable to find public data describing the GATA6 regulatory network or its downstream targets in the specific human cell types examined here (OFT cells; valvular fibroblast progenitors). Available datasets focus primarily on cardiomyocytes, which arise from a distinct lineage, and because transcription factor function is highly cell-type and context dependent, these datasets are unlikely to be helpful in inferring regulatory relationships in the valvular lineage.

      The strongest validation for the GATA6 regulon identified in this study comes from the mouse GATA6 occupancy data (this was included in the original manuscript). Although derived from a different species, GATA6 binding has been profiled in a highly related developmental context, the OFT. To assess the relevance of these data to our human findings, we performed a hypergeometric test comparing the GATA6 regulon identified in cluster 4 (this study) with genes bound by GATA6 in E11.5 mouse OFT ChIP-seq data (DOI: https://doi.org/10.7554/eLife.31362). The observed overlap is substantially greater than expected by chance: it is approximately twice the expected value, and the enrichment is highly significant (p = 1.2 × 10<sup>-33</sup>). Biologically, this strongly supports the interpretation that many genes within the GATA6 regulon are likely to be direct GATA6 targets, or at minimum are strongly associated with GATA6 binding, rather than representing a random gene set. This analysis has been added to the revised manuscript.

      In this revised version of the manuscript, we also overlaid the expression of GATA6 regulon genes onto our fetal spatial transcriptomics data. The GATA6 regulon was identified in embryonic cluster 4, whose expected trajectory is fetal valvular fibroblasts (cluster 12). Remarkably, GATA6 regulon genes are expressed in both the aortic and pulmonary valves, and their expression pattern aligns closely with HAPLN1-positive valvular fibroblasts (cluster 12), further supporting the biological relevance of this gene set. These new data have been added to Fig 2(F).

      Together, the strong enrichment of GATA6 regulon genes among GATA6-bound targets in the OFT, and the specific expression of this gene set within the arterial valves (cluster 4 descendant cells), support the biological relevance of GATA6 downstream targets in arterial valve development and disease. In addition, we identify GLI3 as a potential GATA6 co-binding partner.

      (5) What are "cardiac" cell types in the embryonic single cell clustering? Are these cardiomyocytes? Cardiac is an ambiguous term if the cells being analyzed are all in the heart.

      Thank you for highlighting this ambiguity. The “cardiac” population refers specifically to cardiac muscle cells. We have updated the labels in Fig. 1E, 1F, and Fig. S3A to make this explicit.

      (6) The methods and analytical tools seem fairly standard for single nuclear gene expression and spatial genomics studies. What are the new tools and resources being reported? The "novel lineage tracing algorithm" mentioned in the methods is not well described. A Cellxgene VIP app is mentioned, but is not described in detail. Also, it seems to be housed on a local server, which is not optimal.

      The description of the lineage tracing algorithm has been expanded in the Methods section of the paper.

      The data have been submitted to the Human Cell Atlas, a coordinated global effort to systematically map human cell types using standardized, interoperable formats. Public access via cellxgene enables interactive visualization, gene-level queries, and cross-dataset comparisons without requiring advanced computational expertise. This broad accessibility enhances reproducibility, facilitates integration with complementary single-cell and spatial datasets, and maximizes the visibility, transparency, and long-term impact of our work.

      (7) Only adult aortic valves from females were included in the study.

      The rationale for using female tissues has been explained in the Results section:

      We collected female samples to mitigate individual variability and maximise the possibility to analyse healthy aortic valves, justified by the lower incidence and severity of aortic disease in females versus males.

      (8) In many of the figures, the font size of the text is too small to read.

      We have increased the font size in all figures where this was compatible with the layout. For the larger plots, additional enlargement would necessitate scaling the panels beyond the allowable page dimensions, and therefore could not be implemented.

      (9) "CAT" is not a commonly used abbreviation for congenital heart anomalies related to persistent truncus arteriosus.

      CAT is now the preferred term for PTA as latinised terms are no longer used.

      Reviewer #2 (Recommendations for the authors):

      Overall, this study is thoughtfully conducted and offers valuable observations that contribute to our understanding of valve morphogenesis. However, my main concern is the lack of experimental validation to support the findings, particularly the conclusion regarding the persistence of transcriptional signatures in adult cells, which is not sufficiently substantiated or clearly argued. It is unclear how this study advances beyond previous research in humans.

      Major points:

      (1) Several recent studies have applied spatial transcriptomics to human embryonic and fetal hearts, including OFT (Asp et al., 2019; Queen et al., 2023; Farah et al., 2024; De Bono et al., 2025). It is disappointing that the authors did not acknowledge these important contributions.

      We apologise for missing these contributions. They have now been added to the text.

      (2) The present study used snRNAseq to explore the transcriptional signature of the fetal OFT. A similar approach was used by De Bono et al. (2025) to analyze fetal hearts. Integrating these complementary snRNAseq datasets could enhance the current analysis and provide broader context for the findings.

      The reviewer suggested that integrating our datasets with prior snRNA-seq datasets on human OFT (De Bono et al., 2025) could enhance the analyses and provide broader context. While several fetal heart datasets have been published (e.g., Sahara et al.), our study focuses specifically on the OFT. These other studies do not perform cross-dataset comparisons. We therefore do not see a strong rationale for integrating ours, especially given that those datasets cover much larger regions of the heart.

      (3) Figure 1 presents 18 distinct clusters identified through unsupervised clustering. The authors classify three of these clusters broadly as mesenchymal cells. However, the term "mesenchymal cells" lacks precision. The authors should clarify why these clusters were not more specifically defined as fibroblasts or myofibroblasts based on marker expression.

      Clustering of the full dataset does not provide sufficient resolution to distinguish all mesenchymal cell types. The clusters broadly annotated as mesenchymal comprise heterogeneous populations, including both undifferentiated embryonic mesenchymal cells and more differentiated fetal mesenchymal cells. These mesenchymal clusters were therefore further subclustered, and the resulting cell identities are described in detail in the Results sections corresponding to Fig. 2 and Fig. 3.

      (4) The authors used SCENIC on their snRNAseq datasets to infer key cell fate regulators and identified GATA6 as a top regulator of embryonic mesenchymal cluster 4. However, the rationale for focusing on GATA6, which is already known to be associated with CHD in humans, is not fully convincing. Why not investigate a transcription factor whose role in valve development remains unexplored?

      There are two key outcomes from a SCENIC analysis: (1) the identification of major transcriptional regulators driving the differentiation of a given cluster, and (2) the identification of their regulons (the downstream gene programs they control). While GATA6 is indeed already known to be associated with CHD in humans, including valve malformations and major OFT defects, its downstream targets in the relevant human developmental lineages have not been defined. Understanding these targets is essential for clarifying the molecular basis of GATA6-mediated CHD. Thus, the significance of our result does not lie in the rediscovery of GATA6 as a CHD-related factor, but in identifying the genes it regulates in embryonic OFT- and valve-associated mesenchyme. These GATA6-controlled genes in the OFT and valves represent biologically plausible candidate genes for human OFT defects, as disruption of GATA6 targets could similarly contribute to CHD.

      In this revised version we have performed a hypergeometric test showing that GATA6 regulon genes are significantly enriched among genes bound by GATA6 in the OFT. Biologically, this strongly supports the interpretation that many genes within the GATA6 regulon are likely to be direct GATA6 targets, or at minimum are strongly associated with GATA6 binding in the OFT, rather than representing a random gene set.
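
      The enrichment logic described above can be sketched as an upper-tail hypergeometric calculation. This is only an illustration, not the analysis code used for the manuscript: the function name and all counts below are hypothetical, and the gene universe is assumed to be the set of genes that could in principle appear in both lists.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, bound: int, regulon: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap) for a hypergeometric draw.

    universe: number of genes in the background set
    bound:    number of background genes bound by the factor (e.g. ChIP targets)
    regulon:  number of genes in the regulon (the "draw")
    overlap:  number of regulon genes that are also bound
    """
    total = comb(universe, regulon)
    upper = min(bound, regulon)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(bound, k) * comb(universe - bound, regulon - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to illustrate the calculation.
p_value = hypergeom_enrichment_p(universe=20000, bound=3000, regulon=200, overlap=60)
print(f"enrichment p-value: {p_value:.3g}")
```

      With these illustrative numbers the overlap expected by chance is 30 genes (3000 × 200 / 20000), so an observed overlap of 60 falls well into the upper tail.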

      We have also mapped the expression of the GATA6 regulon to the semilunar valves. Collectively, these analyses demonstrate that the GATA6 regulon captures a biologically coherent and developmentally relevant program, offering new mechanistic insight into how GATA6 influences OFT and valve formation and how its disruption may contribute to CHD.

      (5) Several studies have already suggested a role for GATA6 in EMT. Do the authors propose that GATA6 regulates this process during embryonic valve development? Once again, validation using FISH would be important to support these findings.

      We do not propose that GATA6 directly regulates EMT during embryonic valve development. Rather, we make two independent observations: (1) cluster 4 derives from cluster 7 (likely through EMT); (2) GATA6 regulates cluster 4-specific genes.

      The first observation is supported by RNA velocity, which links cluster 7 to cluster 4. Supporting this interpretation, endothelial cluster 7 is enriched for genes associated with arterial valve development, and mesenchymal cluster 4 cells are identified as progenitors of fetal valve fibroblasts. Because cluster 7 is endothelial and cluster 4 is mesenchymal, this trajectory suggests an endothelial-to-mesenchymal transition.

      Second, SCENIC analysis identifies GATA6 as a regulator of cluster 4 genes. Additionally, the GATA6 regulon shows distinct localization to the formed valves in fetal cells (new data added to Fig 2F). Together these findings support the notion that GATA6 regulates a gene program specific to the cell populations that will give rise to the valves and that these genes remain selectively expressed in valve cells once the arterial valves have formed.

      While we cannot exclude the possibility that GATA6 contributes to EMT, we observe that GATA6 expression levels are highest in cluster 4 (post-EMT) cells, suggesting that its activation may be a consequence of the transition rather than its initiating cause (now shown in Fig S2D).

      For validation using FISH, please see the response to point 6 below.

      (6) I found it curious that the ST section was used to validate MECOM expression (Figure 2I), while ST had not yet been introduced at this point in the manuscript. Validation using FISH would have been a more appropriate approach.

      Thank you for drawing attention to this discrepancy. Spatial transcriptomics is now introduced before the MECOM analysis, in the Results section pertaining to Figure 2F:

      “…spatial transcriptomic analysis of a later stage (12pcw) OFT shows that GATA6 regulon is mainly restricted to the aortic and pulmonary valves (Fig 2F)”.

      With regard to this and the above comment concerning FISH, while RNA FISH/RNAscope would provide an additional orthogonal approach, the Visium-based spatial transcriptomics platform directly measures MECOM transcripts in tissue sections and, in our view, represents an appropriate and sufficiently sensitive method for validating its spatial distribution in the human OFT. We have therefore relied on the spatial transcriptomics dataset to confirm and validate gene expression patterns, rather than performing additional FISH experiments. We now explicitly state that this approach serves as an independent in situ validation of gene expression, including MECOM.

      (7) "Spatial resolution of mesenchymal nuclei in the OFT" section: It is unclear which cluster the authors are referring to in this section.

      As mentioned in the text, we “mapped the five fetal mesenchymal clusters to distinct structures in the OFT” and used distinctive markers to confirm spatial assignments.

      (8) The authors should justify their choice to use Cell2location instead of a deconvolution method.

      We selected cell2location because it provides a probabilistic, hierarchical Bayesian framework that explicitly models technical variability across both single-cell reference data and spatial transcriptomics platforms. Rather than relying on predefined marker genes or simple linear regression, cell2location leverages the full transcriptomic profile of reference single-cell data and incorporates a factor analysis-based framework to model shared transcriptional signatures and latent structure across cell types. This approach improves discrimination between closely related cell states and reduces sensitivity to gene selection bias. Additionally, the probabilistic formulation yields uncertainty estimates for inferred cell abundances, enhancing interpretability and statistical rigor. Together, these features make cell2location particularly well suited for resolving complex cellular composition in our fetal human tissue spatial transcriptomics data.

      (9) Figure 3: Cluster 9 is identified as endothelial, yet it includes markers such as MYH11 among its top genes, a gene more commonly associated with cells at the base of the aorta. This raises questions about the accuracy of the cluster annotation.

      We could not find the definition of cluster 9 as endothelial to which the reviewer refers. In Fig 3, in both the Results text and the figure legend, cluster 9 is identified as smooth muscle, which is consistent with MYH11 expression. The endothelial cluster is shown in Fig S3C.

      (10) The approach used to trace embryonic signatures in adult cells, based on overlap with the top 100 genes in embryonic clusters, relies largely on gene expression similarity, without incorporating lineage inference tools such as RNA velocity or pseudotime analysis. This limits the ability to distinguish true developmental relationships from shared functional programs. I believe that the use of aggregated adult samples may mask individual variability. Validation in separate samples (AV1 and AV3) lacks statistical rigor. The observed lower expression of embryonic genes in adult cells further complicates interpretation, raising the possibility that these signatures reflect residual expression rather than persistent lineage markers.

      We thank the reviewer for the opportunity to clarify our approach.

      We fully agree that tools such as RNA velocity and pseudotime are powerful for capturing short-term dynamic transcriptional changes and inferring lineage trajectories within continuous developmental processes. Indeed, we applied RNA velocity and identified a transition between clusters 7 and 4 in embryonic cells (Fig 2). However, as noted in the Results section, “trajectory inference methods failed to establish lineage relationships between embryonic and fetal populations”. These methods assume temporal continuity and comparable transcriptional kinetics between cells. When comparing samples separated by large developmental intervals (e.g., embryonic versus adult tissues), these assumptions do not hold: RNA velocity vectors become unreliable and may even yield biologically meaningless directions. Therefore, rather than forcing a continuous trajectory across temporally distant datasets, we employed an anchoring approach designed to identify conserved transcriptional programs and potential lineage correspondences between embryonic and adult cell types.

      To address the concern about individual variability, we performed analyses both on aggregated adult samples and on individual replicates (AV1 and AV3). The results were highly consistent across both levels of analysis, and statistical significance was supported by very low p-values, indicating that the observed patterns are robust and reproducible. We therefore believe our analysis in independent samples is statistically sound.

      Finally, we agree that adult cells display lower expression of embryonic genes, and we acknowledge that these signatures may represent residual rather than persistent expression. This observation aligns with our intended interpretation: our goal was not to demonstrate enduring embryonic marker expression, but to highlight that adult cells retain transcriptional traces that connect them to their developmental origins.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify if MEIS1, JAG1, ROR1, PRDM6 have been previously implicated in neural crest cell development. Are these then new potential regulators of neural crest cells? The same applies to SOX6 for the mesodermal population.

      MEIS1, JAG1, ROR1, and PRDM6 (cluster 20) and SOX6 (cluster 4) were selected primarily because they serve as distinctive markers of specific embryonic clusters. Because their expression remains restricted at later developmental stages, they allow reliable tracing of bona fide descendant cells originating from cluster 20 and cluster 4 into fetal and adult tissues. Importantly, MEIS1, JAG1, ROR1, and PRDM6 were not chosen as new potential regulators of neural crest (NC) cells. Since cluster 20 is, based on transcriptional profiles, the embryonic mesenchymal cluster most closely related to the NC lineage, these markers enable lineage tracing of NC-descendant cells. Nonetheless, these genes have all been linked to neural crest biology, either through known functional roles or through specific expression patterns associated with NC development.

      Similarly, SOX6 was selected for its restricted expression in cluster 4, a pattern that is preserved in its descendant populations, making it a suitable marker for tracking the mesoderm-derived lineage.

      (2) Please comment in the text whether any regional transcriptional differences (rather than cell type differences) were detected between the aortic and pulmonary regions.

      We have added the following text to the Results section related to Fig 3: “No molecular differences or distinguishing markers were identified between the aortic and pulmonary valves.”

      (3) There appear to be no myocardial cells in the adult valve tissue - the authors could discuss what the fate of myocardium is in the embryonic OFT. Are they only looking at a subset of derivatives of the embryonic OFT?

      Our adult dataset represents the aortic valve complex and adjacent arterial root tissue (a subset of outflow tract derivatives) rather than the entire outflow tract (this has now been specified in the title). Spatial transcriptomic analysis identified myocardial gene expression within the ventricular and outflow tract walls at CS16-19, but not within the valve leaflet cluster (Queen et al., 2023). This is consistent with previous observations that myocardium contributes to the arterial root and supports early cushion formation, but does not persist in mature valve tissue, which becomes predominantly fibrous and populated by valve interstitial cells. This explanation has been added to the analysis of cell populations in the valves.

      (4) Please equate Carnegie stages 13-23 to embryonic days or weeks of gestation in the first paragraph to help the general reader.

      We have added the suggested clarification and noted that this period spans four weeks of human development, rather than the three weeks previously indicated. The text has been updated accordingly.

      (5) I suggest rewriting the first sentence of the introduction using the plural, as there are many different types of CHD.

      The sentence has been changed accordingly.

      (6) It would be helpful to add the persistence of embryonic signatures into adult valve cell types in Figure 4E.

      We thank the reviewer for this helpful suggestion. To address this point, we have now added an analysis of the persistence of embryonic signatures in adult valve cell types to Figure 4E. Specifically, we selected 10 representative genes from the 100-gene embryonic signature lists of cluster 4 and cluster 20 and projected their expression onto the t-SNE shown in Figure 4E. The combined (module) expression of these 10 genes is now shown in Figure S6E, and the expression of the individual genes is presented in the newly added Figure S7.

      We would like to clarify that our statistical framework identifies potential descendant populations based on significant enrichment of an embryonic gene signature. Therefore, individual embryonic genes are not necessarily expected to be expressed exclusively or uniformly within a single adult population.
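
      To make the idea of a combined (module) expression value concrete, the following minimal sketch averages the expression of a set of signature genes in each cell. It is a simplified illustration under stated assumptions, not the procedure used to generate the figures: the function name, the gene labels, and the toy values are hypothetical, and published scoring tools often also subtract the expression of a control gene set, a step omitted here.

```python
from statistics import mean


def module_score(expression: dict[str, list[float]], signature: list[str]) -> list[float]:
    """Per-cell mean expression over the signature genes present in the matrix.

    expression maps each gene to its normalised expression across cells
    (all lists are assumed to have the same length and cell order).
    """
    genes = [g for g in signature if g in expression]
    if not genes:
        raise ValueError("none of the signature genes are present")
    n_cells = len(expression[genes[0]])
    # Average across genes, one value per cell.
    return [mean(expression[g][i] for g in genes) for i in range(n_cells)]


# Toy matrix: three hypothetical genes measured in four cells.
toy_expression = {
    "GENE_A": [0.0, 2.0, 4.0, 0.0],
    "GENE_B": [1.0, 2.0, 2.0, 0.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],  # not part of the signature
}
scores = module_score(toy_expression, ["GENE_A", "GENE_B", "GENE_MISSING"])
print(scores)
```

      Because the score is an average, a population can score highly even when individual signature genes are expressed unevenly across its cells, which is consistent with the point made above.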

      (7) Please explain how the 2-dimensional plot in 2J relates to the other plots.

      The plot originally shown in Fig 2J (now Fig 2K) was generated by applying RNA velocity exclusively to CS16-17 nuclei. Developmental nuclei (excluding adult samples) were subclustered as shown in Fig S2A, B, resulting in the 5 clusters of embryonic nuclei analysed in Fig 2K: cardiac muscle (2, 17), endothelial (7), and mesenchymal (4, 20).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This thoughtful and thorough mechanistic and functional study reports ARHGAP36 as a direct transcriptional target of FOXC1…… Although this study largely represents a robust and near-comprehensive set of focused investigations on a novel target of FOXC1 activity, several significant omissions undercut the generalizability of the findings reported.

      (1) It is notable that the volcano plot in Figure 1a does not show evidence of canonical Hedgehog gene regulation, even though the subsequent studies in this paper clearly demonstrate that ARHGAP36 regulates Hedgehog signal transduction. Is this because canonical Hedgehog target genes (GLI1, PTCH1, SUFU) simply weren't labeled? Or is there a technical limitation that needs to be clarified? A note about Hedgehog target genes is made in conjunction with Table S1, but the justification or basis of defining these genes as Hedgehog targets is unclear. More broadly, it would be useful to see ontology analyses from these gene expression data to understand FOXC1 target genes more broadly. Ontology analyses are included in a supplementary table, but network visualizations would be much preferred.

      Space constraints precluded labelling the volcano plot with all 285 significantly differentially expressed genes. Rather than labelling only Hedgehog pathway members, we labelled the most dysregulated genes (those with a 4-fold change: log<sub>2</sub> fold change below -2 or above +2) and provided the full list of DEGs in the supplemental Excel file. We have added the suggested network analysis and, for additional rigor, also included protein interaction partners of Gli1 and Arhgap36 (Fig. S12).
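
      For readers less familiar with the log scale, a 4-fold change corresponds to an absolute log<sub>2</sub> fold change of 2 (2<sup>2</sup> = 4 and 2<sup>-2</sup> = 1/4). The short sketch below applies such a labelling threshold; the function name, gene labels, and fold-change values are hypothetical, and a strict inequality is assumed.

```python
from math import log2


def strongly_changed(fold_changes: dict[str, float], min_abs_log2: float = 2.0) -> list[str]:
    """Return genes whose log2 fold change lies below -threshold or above +threshold."""
    return sorted(
        gene for gene, fc in fold_changes.items() if abs(log2(fc)) > min_abs_log2
    )


# Hypothetical linear fold changes (treated / control).
toy_fold_changes = {"GENE_UP": 8.0, "GENE_DOWN": 0.125, "GENE_MILD": 2.0, "GENE_EDGE": 4.0}
labelled = strongly_changed(toy_fold_changes)
print(labelled)
```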

      (2) Likewise, the ChIP-seq data in Figure 2 are under-analyzed, focusing only on the ARHGAP36 locus and not more broadly on the FOXC1 gene expression program. This is a missed opportunity that should be remedied with unbiased analyses intersecting differentially expressed FOXC1 peaks with differentially expressed genes from RNA-sequencing data displayed in Figure 1.

      We agree that genome-wide analysis of ChIP-seq data from Foxc1 over-expression is worthwhile, not least for diverse malignancies where FOXC1 is over-expressed. We chose to restrict the focus of this paper in order to define, as comprehensively as we could, the FOXC1 - ARHGAP36 relationship. Our ChIP and RNA-seq datasets are freely available to other researchers via GEO (GSE297865/GSE297719). A future manuscript will integrate ChIP-seq and RNA-seq with ATAC-seq: replicate ATAC-seq experiments permit rigorous characterization of genes transcriptionally regulated by Foxc1 as well as Foxc1’s pioneering abilities. However, these additional assays, and particularly validation of findings, take significant time and so lie beyond the scope of the current manuscript.

      (3) RNA-seq and ChIP-seq data strongly suggest that FOXC1 regulates ARHGAP36 expression, and the authors convincingly identify genomic segments at the ARHGAP36 locus where FOXC1 binds, but they do not test if FOXC1 specifically activates this locus through the creation of a luciferase or similar promoter reporter. Such a reagent and associated experiments would not only strengthen the primary argument of this investigation but could serve as a valuable resource for the community of scientists investigating FOXC1, ARHGAP36, the Hedgehog pathway, and related biological processes. CRISPRi targeting of the identified regions of the ARHGAP36 locus is a useful step in the right direction, but these experiments are not done in a way to demonstrate FOXC1 dependency.

      We agree and undertook the suggested luciferase reporter assays. The results demonstrate that transcriptional activity is dependent on Foxc1 and abrogated by mutation of the predicted Foxc1-binding motifs (Fig. S8).

      (4) It would be useful to see individual fluorescence channels in association with images in Figure 3b.

      The figure has been revised to provide individual fluorescence channel data, as suggested.

      (5) Perhaps the most significant limitation of this study is the omission of in vivo data, a shortcoming the authors partly mitigate through the incorporation of clinical outcome data from pediatric neuroblastoma patients in the context of ARHGAP36 expression. The authors also mention that high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors," but do not provide clinical outcome data for these cohorts. Such analyses would be useful to understand the generalizability of their findings across different cancer types. More broadly, how were high, medium, and low levels of ARHGAP36 expression identified? "Terciles" are mentioned, but such an approach is not experimentally rigorous, and RPA or related approaches (nested rank statistics, etc) are recommended to find optimal cutpoints for ARHGAP36 expression in the context of neuroblastoma, "specific CNS, breast, lung, and neuroendocrine" tumor outcomes.

      The issue of analyzing in vivo data for neuroblastoma is addressed in more detail below, as it is also raised by the other reviewers. The neuroblastoma data represent the initial findings after the Foxc1-Arhgap36 link was defined. There is vastly more that could and should be undertaken to determine the mechanism(s) underlying ARHGAP36’s beneficial association with survival in this tumor. This is the ongoing focus of the lab.

      The original text omitted details of the cancer expression datasets surveyed that revealed high levels of ARHGAP36 expression were also detected in "specific CNS, breast, lung, and neuroendocrine tumors". This oversight has been corrected – when submitting, we omitted to upload a supplemental file (Table S4) that provided these data, which were derived from the following four sites (TCGA, TARGET, PCAWG and CCLE). However, these excellent online resources infrequently provide clinical outcome data.

      The three independent neuroblastoma cohorts were analyzed identically. Each was stratified into an ordered dataset for ARHGAP36 expression, and then divided into three equal-sized groups [terciles]. Stratification into smaller subgroups [quartiles/quintiles] would have been equally feasible. The same methodology is used by the UCSC Xena browser for Kaplan-Meier survival analysis, and offers the advantage of avoiding a priori assumptions; it is thus agnostic regarding the data. We agree that there is scope for additional approaches, including recursive partitioning analyses, but suggest it may be better to reserve these for the future, not least in analyses that test the reported ARHGAP36-survival association in additional neuroblastoma datasets.
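
      The tercile stratification described above can be sketched as follows: samples are ordered by expression and split into three groups of (near-)equal size, without choosing a cutpoint in advance. This is an illustration only; the function name, sample identifiers, and expression values are hypothetical, and the handling of cohort sizes not divisible by three is an assumption.

```python
def tercile_groups(expression_by_sample: dict[str, float]) -> dict[str, list[str]]:
    """Order samples by expression and split them into low / medium / high terciles."""
    ranked = sorted(expression_by_sample, key=expression_by_sample.get)
    n = len(ranked)
    # Integer cutpoints keep group sizes within one sample of each other.
    cut1, cut2 = n // 3, (2 * n) // 3
    return {
        "low": ranked[:cut1],
        "medium": ranked[cut1:cut2],
        "high": ranked[cut2:],
    }


# Hypothetical expression values for six samples.
toy_cohort = {"s1": 5.0, "s2": 1.0, "s3": 9.0, "s4": 3.0, "s5": 7.0, "s6": 11.0}
groups = tercile_groups(toy_cohort)
print(groups)
```

      Each group would then be passed to a Kaplan-Meier estimator; a data-driven cutpoint search, such as recursive partitioning, would replace only the two fixed cutpoints in this sketch.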

      Reviewer #2 (Public review):

      FOXC1 is a transcription factor essential for the development of neural crest-derived tissues and has been identified as a key biomarker in various cancers. … Together, these findings uncover a novel FOXC1-ARHGAP36 regulatory axis that modulates Hh and PKA signaling, offering new insights into both normal development and cancer progression.

      The main strengths of the study are:

      (1) Identification of a novel signaling pathway involving FOXC1 and ARHGAP36, which may play a critical role in both normal development and cancer biology.

      (2) Mechanistic investigation using RNA-seq, ChIP-seq, and functional assays to elucidate how FOXC1 regulates ARHGAP36 and how this axis modulates Hh signaling.

      (3) Clinical relevance demonstrated through analysis of neuroblastoma patient datasets, linking ARHGAP36 expression to improved 5-year overall survival.

      The main weaknesses of the study are:

      (1) Lack of validation in neuroblastoma models - the study does not directly test its findings in neuroblastoma cell models, limiting translational relevance.

      We agree that it is important to define the mechanisms by which increased ARHGAP36 levels are protective. Despite many months of experiments manipulating ARHGAP36 expression, which induce quite rapid death of neuroblastoma cells in vitro, the precise mechanism(s) remain unresolved. Currently, we are endogenously labelling multiple neuroblastoma lines with Histone 2B-mCherry to facilitate live cell imaging and differentiate effects on proliferation and apoptosis. In the interim, we believe publication of the current dataset allows other researchers to independently test our findings for this pediatric malignancy. We are also establishing collaborations to access patient tissue samples, which will facilitate investigation of non-cell-autonomous mechanisms mediated via the tumor microenvironment.

      (2) Incomplete mechanistic insight into PKA regulation - the study does not fully elucidate how FOXC1-ARHGAP36 regulates PKAC activity at the molecular level.

      Other laboratories elegantly demonstrated that ARHGAP36’s effect on Hedgehog output is mediated by one motif blocking PKAC activity and by the targeting of PKAC for degradation [PMIDs 25024229, 27713425, 30598432]. With these effects well established, we limited experiments to confirming that Foxc1-induced Arhgap36 reduced PKAC and pT197 PKAC levels to those seen with ectopic Arhgap36 expression.

      (3) Insufficient discussion of clinical outcome data - while ARHGAP36 expression correlates with improved survival in neuroblastoma, the manuscript lacks a clear interpretation of this unexpected finding, especially given the known oncogenic roles of FOXC1, ARHGAP36, and Hh signaling.

      ARHGAP36 expression may influence neuroblastoma survival via multiple mechanisms. Considering just canonical Hedgehog, possibilities include: cell cycle modulation, symmetric vs asymmetric cell division, maintenance of cancer stem cells, EMT, metastasis… Others include Hedgehog’s anti-apoptotic roles and the diverse mechanisms by which PKA influences cell function and survival. Faced with such diversity, we focused the discussion on what the presented data demonstrate.

      Reviewer #3 (Public review):

      Summary:

      The focus of the research is to understand how transcription factors with high expression in neural crest cell-derived cancers (e.g., neuroblastoma) and roles in neural crest cell development function to promote malignancy. The focus is on the transcription factor FOXC1 and using murine cell culture, gain- and loss-of-function approaches, and ChIP profiling, among other techniques, to place PKC inhibitor ARHGAP36 mechanistically between FOXC1 and another pathway associated with malignancy, Sonic Hedgehog (SHH).

      Strengths:

      Major strengths are the mechanistic approaches to identify FOXC1 direct targets, definitively showing that FOXC1 transcriptional regulation of ARHGAP36 leads to dysregulation of SHH signaling downstream of ARHGAP36 inhibition of PKC. Starting from a screen of Foxc1 OE to get to ARHGAP36 and then using genetic and pharmacological manipulation to work through the mechanism is very well done. There is data that will be of use to others studying FOXC1 in mesenchymal cell types, in particular, the FOXC1 ChIP-seq.

      Weaknesses:

      Work is almost all performed in NIH3T3 or similar cells (mouse cells, not patient or mouse-derived cancer cells), so the link to neuroblastoma that forms the major motivation of the work is not clear. The authors look at ARHGAP36 levels in association with the neuroblastoma patient survival; however, the finding, though interesting and quite compelling, is misaligned with what the literature shows about FOXC1 and SHH, their high expression is associated with increased malignancy (also maybe worse outcomes?). Therefore, ARHGAP36 expression may be more complicated in a tumor cell or may be unrelated to FOXC1 or SHH, leaving one to wonder what the work in NIH3T3 cells, though well done, is telling us about the mechanisms of FOXC1 as an oncogene in neuroblastoma cells or in any type of cancer cell. Does it really function as an SHH activator to drive tumor growth? The 'oncogenic relevance' and 'contribution to malignancy' claimed in the last paragraph of the introduction are currently weakly supported by the data as presented. This could be improved by studying some of these mechanisms in patient-derived neuroblastoma cells with high FOXC1 expression. Does inhibiting FOXC1 change SHH and ARHGAP36 and have any effect on cell proliferation or migration? Alternatively, does OE of FOXC1 in NIH3T3 cells increase their migration or stimulate proliferation in some way, and is this dependent on ARHGAP36 or SHH? Application of their mechanistic approaches in cancer cells or looking for hallmarks of cancer phenotypes with FOXC1 OE (and dependent on SHH or ARHGAP36) could help to make a link with cellular phenotypes of malignant cells.

      The manuscript stems from the lab’s findings that Foxc1 influences cilia-mediated signaling (Hedgehog and PDGFRalpha), offering an explanation for FOXC1’s pleiotropic phenotypes. Due to FOXC1’s largely unexplained roles in malignancy, the effects on Hedgehog prompted investigation of differential gene expression in NIH3T3 cells when Foxc1 was over-expressed. This identified Arhgap36 as a prime candidate for the Hedgehog pathway alterations, and most of the paper reports the characterization of this relationship. The final, small component of the paper tests the relevance in neural crest-derived cells, where Foxc1 has key roles. Neuroblastoma’s frequent lethality has created a network of highly supportive researchers with shared datasets, and these survival data were assayed. This in turn revealed that high levels of ARHGAP36 expression were associated with a favorable survival outcome.

      Defining the underlying molecular mechanisms for this novel association is clearly important. As outlined above, one challenge reflects the diversity of potential mechanisms, coupled with the requirement to validate those identified from 2-D culture in patient-derived tumor explants as well as immuno-deficient model organisms. Such experiments take significant time, and our present focus is on manipulating ARHGAP36 expression directly, rather than by altering FOXC1 expression, which inevitably has even more diverse effects.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The study would be strengthened by validating key findings, such as the resistance to Hh inhibition, in neuroblastoma cell lines to enhance disease relevance.

      Planned future experiments include in vitro evaluation of PKA antagonists and agonists on neuroblastoma survival.

      The authors show that FOXC1/ARHGAP36 reduces PKAC protein levels; however, it is unclear whether this regulation occurs at the transcriptional level. Assessing PKAC mRNA expression would help explain the mechanism. Additionally, if PKAC is transcriptionally downregulated, overexpression of PKAC can be used to test whether it reverses the FOXC1/ARHGAP36-induced activation of Hh signaling.

      The RNA-sequencing data exclude this possibility at the transcriptional level, since PKA is not significantly differentially expressed (Table S1). Instead, Figures 1&3 support Foxc1 inducing Arhgap36 expression, with elevated Arhgap36 protein levels reducing those of PKAC and catalytically active pT197 PKAC, in both the cytoplasm and adjacent to the basal body.

      The Discussion should address the potential effects of ARHGAP36 overexpression on other signaling pathways-particularly Hh and PKA signaling and PKA in neuroblastoma. These effects may help interpret the observed association between ARHGAP36 expression and clinical outcomes in patients. Of note, it has been reported that Hh may correlate with better survival in neuroblastoma (Cancers, 2021 Apr 15;13(8):1908; J Pediatr Surg. 2010 Dec;45(12):2299).

      Both Hedgehog signaling and protein kinase A have broad effects on normal cell biology, which are likely more extensive in malignant cells. Consequently, although it is tempting to propose why ARHGAP36 overexpression is associated with enhanced survival, it may be better to wait until the causative mechanisms have been defined.

      If treatment information for the patient cohorts is available, it should be included as it may enhance the interpretability of the survival analyses.

      This is an excellent suggestion, although at present this information is not available to us. As the manuscript moves forward to publication, we will be liaising with the corresponding authors of the three datasets [GSE49711, E-MTAB-178191 and TARGET] to explore such additional clinical possibilities.

      The 'A' label in Figures S9 and S10 should be removed, as neither figure contains sub-panels.

      This has been corrected, as suggested.

      Reviewer #3 (Recommendations for the authors):

      Other comments:

      (1) Figure 5A, B: Unclear how meaningful the inhibitor experiments are in the absence of SHH (presumably none in the media or made by NIH3T3 cells?), other than as a control for the FOXC1 OE treated with Smo antagonists. A potentially better experiment could be to take malignant cells with high FOXC1 and high SHH signaling and put on Smo inhibitors.

      Figure 5A demonstrates Foxc1’s induction of GLI1 expression is not dependent on Hedgehog ligand. While certainly feasible to repeat in malignant cells strongly expressing FOXC1, doing this comprehensively would require testing lines from many or all of the ~15 malignancies where FOXC1 has a defined contribution.

      (2) Figure 6: the Gli2-mGFP seem to have higher levels of ciliary Sufu, they also have higher levels of Gli1 (see Figure 1C), does the Gli2-mGFP expression change SHH signaling? What controls have the authors done to test if this is a serious confound in their studies? They use it for most experiments, this is important to address.

      Although Gli2-mGFP expression affects Hedgehog signaling, in the absence of Gli2 (e.g. untransformed NIH3T3) Foxc1 induces Arhgap36 expression. The scope for interaction between Foxc1 and Gli2 represents an additional motivation for the ATAC-seq experiments described above to better determine if these two transcription factors have synergistic effects.

      (3) Figure 3B: (1) Please use color-blind friendly LUTs for the signals (same comment for other figures), (2) The Gli2-mGFP line with the current color scheme is confusing; it looks like only 647 and 555 secondaries were used, did they not image with the mGFP? Why not? (3) What is the evidence that these are basal bodies? (4) Why did the authors use cycloheximide in these IF experiments? Was this also done in other methods? The reasoning behind this is missing.

      For now, we have included separate channels for Figure 3. In future manuscripts we will adopt the suggestion of moving to either magenta and green, or cyan and magenta combinations for depicting immunofluorescence.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth.

      Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex, and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are inconsistencies in some patterns that underscore the complexity of metabolic systems.

      Thank you for reviewing the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. In the revised version of the manuscript, the authors present convincing evidence that MMV1028806 targets the mitochondrial electron transport chain (ETC) of the parasite (although they don't identify the actual target in the ETC). The revised manuscript also nicely addresses my other criticisms of the original version. Overall, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors. In addition to insights into candidate bradyzoite inhibitors, the study also provides new insights into the physiological role of the mitochondrial electron transport chain of bradyzoites, and raises a host of interesting questions around the functional roles of mitochondria in this stage of the parasite.

      Weaknesses:

      In the revised manuscript, the authors have included additional oxygen consumption rate data that indicate that MMV1028806 targets the mitochondrial electron transport chain (ETC). These data are convincing. On line 481, the authors state that "treatments with ATQ, BPQ, MMV1028806, and antimycin A resulted in substantially reduced oxygen consumption levels relative to the DMSO control and suggest indeed a blockage of the mETC consistent with the inhibition of the bc1-complex." The OCR assay the authors use is still only an indirect measure of bc1 activity. Given that most OCR-inhibiting compounds in T. gondii are bc1 inhibitors, it is possible (and perhaps likely) that MMV1028806 is targeting this complex. However, the data cannot rule out that it is targeting another component of the ETC (or potentially even a TCA cycle enzyme). Without a direct test that MMV1028806 inhibits bc1 complex activity, the authors should be more cautious in their interpretation (e.g. by acknowledging the limitations of their conclusion, or acknowledging other possible targets). Similarly, the conclusion on line 622 that "... we confirmed the bc1-complex as a target" is overstating the findings. The phrasing on lines 683-695 is more appropriate: "... suggesting that it also targets complex III or a functionally linked site within the mitochondrial electron transport chain."

      We are grateful for the thorough review of the updated manuscript and the identification of the minor issues. We addressed all of them as detailed below. We also tempered our conclusions regarding the identification of the bc1-complex as a target in line 616:

      “In addition to abundance data, monitoring the incorporation of <sup>13</sup>C and <sup>15</sup>N stable isotopes from glucose and glutamine, respectively, into TCA cycle and pyrimidine biosynthesis intermediates suggests the bc1-complex as a target”

      Reviewer #3 (Public review):

      Summary:

      The authors described an exciting 400-drug screening using a MMV pathogen box to select compounds that effectively affect the medically important Toxoplasma parasite bradyzoite stage. This work utilises a bradyzoites culture technique that was published recently by the same group. They focused on compounds that affected directly the mitochondria electron transport chain (mETC) bc1-complex and compared with other bc1 inhibitors described in the literature such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites which serves to provide support for the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, that is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in tachyzoite and bradyzoite stages and under atovaquone/HDQ treatment.

      Thank you for reviewing the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thanks for making appropriate updates. I believe it makes the report stronger. Just please double-check proof-reading in newly added text: for example "integration" is misspelled in Figure 4 legend (C, E).

      Typos have been corrected throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      I congratulate the authors on an excellent study. I have several minor comments for the authors to consider before publication.

      Line 99. Schistosoma –

      Corrected

      Line 123. What was the pH of the bicarb-free RPMI medium?

      Added “at pH 7.2”

      Line 218 (and again on line 687). "RHku80" - are these just standard RH strain parasites? Or do the authors mean to imply that the ku80 gene has been knocked out in this line? If the latter, RH∆ku80 may be a better way to describe this line.

      We harmonized all mentions of this strain to RH∆ku80.

      Line 225. "Parasites were incubated in medium with one of the following treatments ..." How long were the parasites incubated in the different treatments before the plate was read? Was there any preincubation? I think not, but it would help to state this so the reader can appreciate that the effects of the compounds on OCR is likely an immediate (rather than a secondary) effect.

      This is indeed a good suggestion. There was no pre-incubation and we changed the text to: “Parasites were incubated in medium with one of the following treatments immediately before measurement: … “

      Figure S2A. Check the spelling of Toxoplasmosis.

      Done, we corrected this sentence.

      Figure S2B. do you mean 'tachyzoidal' or 'tachyzocidal'? 'bradyzoidal' or 'bradyzocidal'?

      We clarified the formulation of the legends for Fig S2.

      Figure S2D. The "Tachyzoite lowest cytotoxicity" and "Bradyzoite lowest cytotoxicity" columns are, I think, depicting compound toxicity in host cells. Would it be clearer to rename these columns relative to the host cells being tested? e.g. "HFF/KD3 myotube lowest cytotoxicity"

      Good suggestion and we changed the designation accordingly.

      Line 369. "We found that tachyzocidal, bradyzocidal and dually active compounds possess a statistically significantly higher lipophilicity and this trend appeared more accentuated for bradyzocidal and dually active compounds." Significantly higher than what? Need to be clearer about the comparison being made: i.e. to non-active compounds.

      You are correct and we corrected this sentence accordingly.

      Line 500. "we attribute these changes to inhibition of host mitochondria (Fig. 5A)." The reason for referencing Figure 5A here isn't clear. Do the authors mean to point out that host mitochondrial membrane potential is affected by compound treatment? This could be stated more clearly.

      We deleted the reference to Fig 5A. We did not systematically measure the effect of the inhibitors on the membrane potential of the host mitochondria. We also changed the sentence to emphasize the speculative nature of this assertion: “we attribute these changes to potential inhibitory effects on host mitochondria”.

      Line 840. 'hurdling mechanisms'. The authors don't explain what they mean by this expression.

      We truncated the figure title to: “Untargeted metabolomic analysis of bradyzoites treated with bc1-complex inhibitors shows an energy imbalance.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. H3K4me3 data for humans and mice have already been published, and our previous data revealed the unique pattern of H3K4me3 in early human embryos and oocytes (Science. 2019 Jul 26;365(6451):353-360). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main work, but we do not rule out that new discoveries may emerge from comparing these two histone modifications. Our previous data tended to show that H3K4me2 not only acts as a precursor of H3K4me3, but also plays a role independently.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. We found that a previous study (Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33.) indicated that TCP is an irreversible inhibitor of both LSD1 and LSD2 (Human LSD2/KDM1b/AOF1 Regulates Gene Transcription by Modulating Intragenic H3K4me2 Methylation, Mol Cell. 2010 Jul 30; 39(2): 222–233.). According to our data, however, the content of LSD1 was very low in the early stages of mouse embryos, so TCP mainly inhibited the function of LSD2.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment of H3K4me2 in promoter and distal regions. Some batches of H3K4me2 antibody are known to cross-react with H3K4me3, but the H3K4me2 antibody we used in our CUT&RUN seems to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out, and we will add references to support the statement in the paper as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Would it be a reason that H3K4me2 has fewer H3K4me2 peaks in MII oocytes than GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016), which is interesting. We think we may have used different confocal imaging parameters. We used the same parameters to image every stage from GV through to the blastocyst. Had we imaged only the fertilized egg and the 2-cell stage, we think we might also have seen weak fluorescence at the 2-cell stage under different parameters. We will refer to this reference and discuss it in the resubmitted version.

      Moreover, you suggested that MII oocytes have fewer H3K4me2 peaks than GV oocytes because the MII oocyte has expelled the polar body. There is no problem with this logic. However, the first polar body expelled at the MII stage remains within the zona pellucida, and we also collected the polar body in the CUT&RUN experiment; therefore, compared to GV, the DNA content of MII samples is not halved. After further discussion, we believe that the reduction of H3K4me2 peaks in the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications allow the chromatin structure to change appropriately across the different stages of meiosis. At present, it has been confirmed that H3K4me3 gradually decreases from GV to MII stage during the maturation of human oocytes, whereas H3K27me3 did not change from GV to MII stage.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development.

      The authors should have cited the paper and described the role of KDM1a in early embryos.

      In the analysis of this experiment, we believe that in the early embryonic development of mice, the expression of KDM1A is lower than that of KDM1B, which is a relative statement. Similarly, the transcriptome data we cite also show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared to immature oocytes. In addition, the effects of loss of maternal KDM1a on embryonic development were not discussed. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references later.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found that there was a problem with the normalization method of our heat map, and we will re-make the heatmap and submit it in the modified version. With reference to Figure 4A, the content of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1B. In addition, TCP treatment of whole fertilized eggs in this study increased the H3K4me2 content, and the resulting delay in embryo development was more pronounced than that observed when KDM1B was knocked down in the mother and she was crossed with normal paternal lines.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thanks a lot for your question. In the experimental group, the fluorescence value of H3K4me2 in IF was increased 1,000-fold (Figure 4E), whereas in the CUT&RUN (CR) data a total of 445 H3K4me2-related genes were up- or down-regulated (Figure 6A). In our opinion, immunofluorescence, as a semi-quantitative analysis, cannot be directly compared with the quantitative CR analysis because of the different analysis models and threshold settings.

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In the manuscript by Li et al., the authors perform a comprehensive study on the template and cofactor determinants of the SARS-CoV-2 nsp13 protein. They find that, alongside the classical processive unwinding ability of helicases driven by ATP consumption, other chaperone-like and ATP-independent functions exist for this enzyme. By testing DNA and RNA oligos in several conformations, the authors show that these functions are highly dependent on template identity, but also on the ratio of ATP to divalent cations. Ultimately, it is suggested that these distinct mechanisms of action are employed by nsp13 to orchestrate viral replication.

      Overall, this study provides some novel insights into the functionality of a central and conserved enzyme of a relevant human pathogenic virus. While the approach is important and adds to the field, particularly by characterizing the chaperoning activities and adding G-quadruplexes as templates, previous studies have already identified several determinants of nsp13 template binding and processing in vitro (Sommers et al., 2023, JBC; Park et al., 2025, JBC). In addition, some issues regarding experimental design need to be addressed to increase the cogency and biological relevance of the study.

      We thank the reviewer for recognizing the novelty of our work, particularly the ATP-independent chaperone-like activities and G-quadruplex remodeling. We also appreciate the opportunity to clarify the conceptual distinction between our study and the prior work by Sommers et al. (2023) and Park et al. (2025). We fully agree that those studies systematically defined the canonical ATP-driven motor mechanism of Nsp13. Our results on 5′→3′ polarity, DNA preference, and tail/ATP/Mg<sup>2+</sup> dependence align with these benchmarks, confirming the reliability of our platform.

      However, the core novelty of our work lies in revealing that Nsp13 functions as a multifaceted nucleic acid remodeler, integrating motor and non-motor activities within a single protein, a functional regime absent from the JBC papers. Specifically, we uncover three novel layers: 1. Mg<sup>2+</sup>-activated, ATP-independent remodeling of short duplexes and G-quadruplexes. 2. Bidirectional remodeling on duplexes in the Mg<sup>2+</sup>-primed state. 3. Intrinsic chaperone functions including strand annealing and stem-loop restructuring.

      Thus, our work fundamentally expands the biochemical model of Nsp13 from a simple ATP-driven motor to a multifunctional, mode-switchable remodeler. We will highlight these distinctions in the revised Discussion. Below, we respond point-by-point to the specific experimental design issues.

      (1) Generally, low concentrations of monovalent cations (20 mM), as used throughout this study, may influence helicase activity and artificially enhance protein binding/oligomerization, which could favor the observed chaperoning activity (Venus et al., 2022, Methods). In contrast, some helicases, such as HCV NS3, are inhibited by higher K+ concentrations (Gwack et al., 2004, FEBS). Thus, the influence of higher concentrations of monovalent cations should be tested in relevant assays, as intracellular K+ levels are usually >100 mM. Additionally, this could significantly affect template stability. For instance, in some G4 assays, the addition of the trap already leads to observable duplex formation (Figure 5), which may be due to low K+ conditions.

      We thank the reviewer for this critical comment regarding the ionic environment. We agree that monovalent cation concentrations are pivotal for both helicase activity and the structural stability of templates like G4s.

      First, we wish to clarify that the final NaCl concentration in our reactions is not 20 mM; that value refers only to the unwinding buffer. Our protein dilution buffer contains 200 mM NaCl, and each 10 μL reaction includes 2 μL of protein, contributing ~40 mM NaCl. With 20 mM from the reaction buffer, the final concentration reaches ~60 mM. We will clarify this in the Methods.
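
      The arithmetic behind the ~60 mM estimate can be sketched as follows. This is a minimal illustration, assuming that the reaction buffer contributes 20 mM NaCl at its final dilution and that the protein stock is the only other source of NaCl; the function name and its parameters are hypothetical and do not come from the manuscript.

```python
def final_concentration_mm(stock_mm, stock_volume_ul, total_volume_ul, buffer_contribution_mm):
    """Final NaCl concentration (mM) after adding a protein stock to the reaction."""
    # Dilution of the protein stock: C_final = C_stock * V_stock / V_total
    from_protein = stock_mm * stock_volume_ul / total_volume_ul
    # Assumption: the reaction buffer's contribution is already a final concentration
    return from_protein + buffer_contribution_mm


# Values quoted in the response: 200 mM stock, 2 uL protein in a 10 uL reaction,
# plus 20 mM from the reaction buffer
total = final_concentration_mm(200.0, 2.0, 10.0, 20.0)
print(f"Final NaCl: ~{total:.0f} mM")
```

      If the 20 mM figure instead described the buffer before dilution, the buffer term would need its own dilution factor and the total would be somewhat lower.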

      Second, our choice of ionic strength is guided by established literature. A survey of 27 published nsp13 studies (Author response table 1) shows that the majority use 20–50 mM monovalent cations, with 20 mM being most common. Mickolajczyk et al. (2021) showed that nsp13 activity is highest at low salt and declines at higher concentrations. Thus, low salt conditions are routinely used to capture nsp13’s intrinsic catalytic activity. The intracellular environment is far more complex, with crowding and interacting proteins that likely modulate helicase behavior. The low-salt conditions are therefore a deliberate simplification to isolate and define enzyme function.

      Planned experiments: We fully agree that higher salt concentrations should be tested. In the revision, we will perform key assays such as ATP-independent duplex unwinding and G4 unfolding at ≥100 mM NaCl or KCl to verify that the observed activities persist under more physiological ionic conditions.

      (2) As in most publications that focus strictly on helicase (or other enzymatic) functions, the activity of the isolated protein is examined. However, particularly in the case of nsp13, core functions rely on other factors, such as nsp7/8 and other components of the replication-transcription complex (RTC). The overall structure and oligomerization state of nsp13 are altered within the complex (Chen et al., 2022, NSMB). The inclusion of such factors in key experiments would greatly improve the biological relevance of the findings.

      We agree that examining Nsp13 within the context of the RTC is essential for establishing the biological relevance of our findings. The structural reorganization of Nsp13 upon binding to Nsp12 and Nsp7/8 (Chen et al., 2022) suggests that its enzymatic "mode" may be regulated by its protein partners.

      Planned experiments: To address this, we will include the following biochemical characterizations:

      (1) Nsp13/12 and Nsp13/7/8 sub-complexes will be examined to dissect the individual contributions of the polymerase and the primase-like factors to Nsp13’s multifaceted activities.

      (2) The core RTC (Nsp13/12/7/8) will be used to evaluate how the full assembly modulates the functions of Nsp13, particularly on complex templates such as G4s and pseudoknots.

      (3) In Figure 4, the authors claim that Mg2+ concentration inhibits RNA unwinding. While this is likely considering previous findings, it must be validated that duplex stabilization is not the primary cause for the observed lower dissociation rates. As the template is only 12 bp long with extensive overhangs, higher ion concentrations may significantly stabilize base pairing by reducing fraying effects. Similarly, in Figure 6, template-dependent effects of Mg2+/ATP should be ruled out.

      We thank the reviewer for this insightful suggestion. We agree that it is critical to distinguish whether the observed inhibition of RNA unwinding at higher Mg<sup>2+</sup> concentrations is due to the physical stabilization of the RNA duplex.

      Planned experiments: To address this, we will perform the following characterizations:

      (1) We will measure the Tm of the RNA duplex used in Figure 4 across a range of Mg<sup>2+</sup> concentrations (0, 0.5, and 1.0 mM). This will allow us to quantify the extent to which divalent cations stabilize the duplex RNA. These data will provide a more rigorous interpretation of the Mg<sup>2+</sup>-dependent unwinding in Figure 4.

      (2) Similarly, we will perform thermal melting analyses for the various DNA and RNA templates used in Figure 6 under different Mg<sup>2+</sup>/ATP conditions to rule out the template-dependent effects of Mg<sup>2+</sup>/ATP.
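
      One simple way to read a Tm off a thermal melting curve is sketched below. This is only an illustration under stated assumptions: the curve has already been normalized to an unfolded fraction between 0 and 1, Tm is taken as the temperature at which that fraction first crosses 0.5, and the function name and example data are hypothetical rather than taken from the manuscript.

```python
def estimate_tm(temps_c, fraction_unfolded):
    """Temperature (deg C) where the unfolded fraction first crosses 0.5.

    Uses linear interpolation between the two bracketing data points and
    returns None if the curve never crosses 0.5.
    """
    points = list(zip(temps_c, fraction_unfolded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            # Linear interpolation between (t0, f0) and (t1, f1)
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None


# Hypothetical, already-normalized melting curve
temps = [20.0, 30.0, 40.0, 50.0, 60.0]
unfolded = [0.0, 0.1, 0.4, 0.8, 1.0]
print(estimate_tm(temps, unfolded))
```

      Repeating such an estimate at each Mg<sup>2+</sup> concentration would give the shift in duplex stability that the planned experiments aim to quantify; first-derivative or full thermodynamic fits are common alternatives.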

      (4) It is not entirely clear to me by which principle the templates were chosen. In my opinion, it would improve the overall comparability of the experimental results if, for instance, the blunt-ended duplex had the same sequence as the oligos with overhangs, since factors such as length, G/C content, Tm, etc., may play a significant role in binding and unwinding. Similarly, the oligos for binding and unwinding should be kept somewhat comparable, e.g., the G4 for the binding assay has 3 stacks, whereas RG1 has only 2. This discrepancy could make a significant difference. Thus, key experiments should be repeated using comparable sequence pairs.

      We fully agree with the reviewer that maintaining sequence consistency across different assays is essential for a rigorous comparison of nsp13 activities. We apologize for the ambiguity in the initial presentation of our sequences in Table S1.

      Planned revisions and experiments:

      (1) We wish to clarify that several key substrates were sequence-matched. For unwinding assays, the 12-bp 3′-overhang DNA and blunt-ended DNA share the identical duplex sequence, and the 16-bp 5′-overhang and 3′-overhang DNA substrates are also sequence-matched. For annealing assays, the duplex regions for all DNA substrates (3′, 5′, blunt, and fork) are identical, and the same internal consistency was maintained for all RNA annealing substrates. To make this clear, we will reorganize Table S1 to explicitly group these sequence-paired substrates.

      (2) The reviewer also notes discrepancies between binding and unwinding substrates (e.g., the difference in G4 stacks). To ensure direct comparability, we will perform additional experiments: complete binding assays for RG-1 (the 2-stack G4 used in unwinding) to match the functional data, and systematically measure binding affinities for all key unwinding substrates, including 3′-overhang, 5′-overhang, blunt-ended DNA, and the RNA fork.

      (5) Moreover, in the initial characterization of the binding abilities (Figure 1), the authors should include blunt-ended controls (duplex/hairpin) and, importantly, a pseudoknot (PK), as these structures are crucial for multiple steps in the viral life cycle (frameshifting, replication). Specifically, the PK in the 3'UTR (Sola et al., 2011, RNA Biology) may be an interesting target structure for unwinding assays, as it recruits the RTC, and, to my knowledge, no studies are available regarding nsp13 function at a PK. This would be particularly interesting in combination with nsp7/8 (Ohyama et al., 2024, JACS Au).

      We thank the reviewer for this insightful and inspiring suggestion. Incorporating pseudoknot (PK) structures into our analysis—particularly the well-characterized PK in the 3'UTR (Sola et al., 2011)—represents a significant opportunity to bridge our biochemical findings with the viral life cycle. To address this, we have designed a 3'UTR PK substrate based on recently reported scaffolds (Ohyama et al., 2024).

      Planned experiments:

      (1) We will expand our initial binding assays (Figure 1) to include blunt-ended duplexes, hairpins, and the 3'UTR PK. This will establish a baseline for how Nsp13 recognizes these structurally distinct and physiologically critical templates.

      (2) We will perform unwinding assays to determine whether Nsp13, in its isolated state, possesses the mechanical capability to resolve the complex tertiary interactions within a pseudoknot.

      (3) Following the reviewer's insight, we will examine whether the addition of nsp7/8 is required to facilitate the unfolding of the 3'UTR PK.

      Together, these experiments will allow us to assess whether Nsp13 is capable of managing one of the most challenging structural obstacles in the SARS-CoV-2 genome.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to broaden the understanding of SARS-CoV-2 Nsp13 activity to show that a single viral protein can accomplish multiple functions. Additionally, they try to show that helicase function is not limited to ATP-driven, unidirectional unwinding.

      Strengths: The consistent application of statistics to triplicate experiments is a strength of the manuscript. The ToPif1 control in Figure S12 is a good control.

      We thank the reviewer for the insightful assessment and for highlighting the rigor of our experimental design, particularly our reliance on triplicate data with robust statistical validation and the inclusion of the ToPif1 control.

      We are especially grateful for the detailed comments provided by the reviewer. We fully recognize that addressing these specific points is essential for strengthening the cogency of our conclusions and improving the overall rigor of the manuscript. These suggestions have provided us with a clear roadmap for further refining our experimental evidence and clarifying our mechanistic interpretations. Below, we respond point-by-point to the specific issues.

      Weaknesses:

      (1) All the experiments except the one in Figure S2 use N-terminally His-tagged Nsp13. Because the N-terminal tag is known to have large effects on Nsp13 activity, this calls into question virtually all of the results in this manuscript.

      We thank the reviewer for raising this important concern regarding the potential influence of the N-terminal His tag on nsp13 activity. We have carefully considered this issue and provide the following lines of evidence to address it.

      (1) We have generated a tag-free nsp13 variant and our preliminary characterization (Author response image 1) shows that it retains all key activities: ATP hydrolysis (comparable to His-tagged nsp13), both ATP-independent (Mg<sup>2+</sup>-activated) and ATP-dependent unwinding, as well as chaperone activity to remodel stem-loops. These results demonstrate that while the His tag may modulate enzymatic efficiency, it does not create or abolish any specific biochemical function.

      (2) We conducted a systematic survey of 27 published studies on SARS-CoV/SARS-CoV-2 nsp13 (Author response table 1). The results show that 17 out of 27 studies (63%) used affinity-tagged nsp13 without tag removal, including His, MBP, GST, and Strep tags.
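
      The survey fractions quoted in this response can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (17 of 27 studies with uncleaved affinity tags here, and 20 of 27 studies using DNA substrates in the response to Reviewer #2, point 3); the helper name is hypothetical.

```python
def percent(count, total):
    """Share of `count` in `total`, rounded to the nearest whole percent."""
    return round(100.0 * count / total)


TOTAL_STUDIES = 27  # number of surveyed nsp13 studies stated in the response

tagged = percent(17, TOTAL_STUDIES)  # studies using tagged nsp13 without tag removal
dna = percent(20, TOTAL_STUDIES)     # studies using DNA substrates in some experiments
print(tagged, dna)
```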

      (3) The only study that systematically compared different affinity tags (Adedeji et al., 2012) reported that GST-tagged nsp13 exhibited ~520-fold higher ATPase activity than His-tagged nsp13, demonstrating that the choice of affinity tag can affect enzymatic efficiency. However, both tagged versions retained all core enzymatic activities, including ATP hydrolysis and duplex unwinding. Importantly, no study has compared the full functional spectrum between His-tagged and tag-free nsp13. Our preliminary data suggest that the His tag may affect efficiency but does not alter the presence or absence of any specific activity.

      Planned experiments:

      We fully agree with the reviewer that a more systematic comparison would strengthen the conclusions. In the revision, we will include additional characterization of tag-free nsp13: (i) quantitative nucleic acid binding affinity, (ii) G4 unfolding efficiency, (iii) strand annealing activity. These experiments are currently underway.

      In summary, while we acknowledge that the His tag may influence enzymatic efficiency, our key conclusions are supported by experiments with tag-free nsp13. We will add a discussion of these points and include additional tag-free nsp13 data in the revised manuscript.

      (2) The ATP-independent, bidirectional duplex unwinding shown for short duplex substrates is reminiscent of the trapping of thermal fraying intermediates that have been reported for other helicases. Because they are only observed on short duplexes, do not require ATP, and are bidirectional, this does not suggest strand displacement as suggested in the manuscript. Instead, it suggests trapping of partially melted intermediates.

      We thank the reviewer for this insightful perspective. While the passive trapping of thermal fraying intermediates is a well-established model for non-catalytic protein-nucleic acid interactions, several lines of evidence suggest that nsp13 employs a more active, allosteric mechanism for ATP-independent remodeling.

      (1) If nsp13 were merely a passive trap, increasing duplex stability should decrease unwinding. However, as shown in Figure S3, raising Mg<sup>2+</sup> from 0 to 5 mM increases the DNA duplex Tm by ~10°C, yet nsp13’s remodeling activity is markedly enhanced under the same conditions (Figure 2). This positive correlation between cation-induced substrate stabilization and protein activation supports an active, protein-centered mechanism that overcomes the increased energetic barrier.
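
      The expectation for a purely passive trap can be illustrated with a simple two-state estimate. The sketch below is an assumption-laden illustration, not an analysis from the manuscript: it treats melting as a unimolecular two-state transition with an apparent midpoint Tm, and the enthalpy, assay temperature, and absolute Tm values are hypothetical (only the ~10°C upward shift comes from the text).

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def fraction_melted(temp_c, tm_c, delta_h_kj=300.0):
    """Two-state van 't Hoff estimate of the melted fraction at temp_c.

    delta_h_kj is a hypothetical melting enthalpy; Tm is treated as the
    apparent midpoint, so the equilibrium constant equals 1 at temp_c == tm_c.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    # K = [melted]/[duplex], from the integrated van 't Hoff relation
    k = math.exp(-(delta_h_kj / R_KJ) * (1.0 / t - 1.0 / tm))
    return k / (1.0 + k)


assay_temp = 37.0                                    # hypothetical assay temperature
low_mg = fraction_melted(assay_temp, tm_c=45.0)      # hypothetical Tm without Mg2+
high_mg = fraction_melted(assay_temp, tm_c=55.0)     # Tm raised by ~10 deg C
print(low_mg > high_mg)
```

      Under these assumptions the spontaneously melted fraction available to a passive trap falls as Tm rises, which is the opposite of the enhanced remodeling reported at higher Mg<sup>2+</sup>.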

      (2) The observed bidirectionality in ATP-independent remodeling does not simply imply a lack of polarity; rather, it can reflect nsp13’s intrinsic chaperone function. In the absence of ATP, nsp13 binds the ss/ds junction (Figure 2F) and, in a Mg<sup>2+</sup>-dependent manner, may use its binding energy to actively intercalate into the duplex. This mechanism is inherently symmetric for 3′ and 5′ overhangs, explaining bidirectional remodeling, while the absence of activity on blunt-ended substrates confirms the requirement for a pre-existing junction.

      (3) The lack of activity on 24-bp substrates does not negate this remodeling mode but defines its energetic boundary. The binding energy released upon nsp13-nucleic acid interaction is sufficient to overcome the lower unwinding barrier of 12-16 bp duplexes, but insufficient to counteract the high stability and rapid re-annealing of a 24-bp duplex without the continuous mechanical power of ATP hydrolysis.

      Planned Revision:

      We thank the reviewer for prompting us to refine our mechanistic model. In the revision, we will add a dedicated discussion explicitly comparing the model of allosterically activated, binding-driven strand intrusion with the passive trapping model, incorporating the Tm data to strengthen our conclusions.

      (3) Results that may be artifacts of unusual in vitro conditions are interpreted as if similar results will occur in the cell, where ATP is likely always present. Along those same lines, SARS-CoV-2 replicates in compartments of the endoplasmic reticulum, which would limit the ability of Nsp13 to access DNA substrates.

      We thank the reviewer for raising this important concern regarding the physiological relevance. We fully agree that in vitro conditions do not entirely recapitulate the complex intracellular environment, and we have been careful not to over-interpret our findings. Below we address the two specific issues raised:

      (1) Regarding the ATP-independent activity, we acknowledge that ATP is abundant in healthy, actively replicating cells. However, during rapid viral replication, local ATP concentrations can fluctuate: the RTC has a high energy demand because the template contains extensive secondary structures, and this may lead to transient ATP depletion. Under such energy-limited conditions, Yu et al. (2025) demonstrated that ADP-bound nsp13 exhibits chaperone activity that destabilizes nucleic acid structures without ATP hydrolysis, and Dumm et al. (2025) reported that SARS-CoV-2 nsp13 resolves RNA stem-loops in an ATP-independent manner.

      Even when ATP is abundant, the ATP-independent mode may enable rapid, local structural adjustments that bypass the kinetic delay of ATP binding and hydrolysis. As shown in Figure 1D, nsp13 exhibits high binding affinity for structured nucleic acids. In this scenario, nsp13 functions not as a processive motor but through a binding-driven mechanism, using the free energy of protein-nucleic acid interaction to transiently destabilize short duplexes or resolve local secondary structures such as G4s and stem-loops in an energy-efficient manner.

      (2) Regarding DNA substrates, we fully agree that RNA is the physiological substrate for nsp13. However, DNA is a validated and widely accepted surrogate for mechanistic studies because it is more stable and easier to manipulate than RNA. A systematic survey of 27 published nsp13 studies (Author response table 1) shows that 20 out of 27 (74%) used DNA substrates for at least some of their experiments. In our study, we used DNA primarily as a mechanistic probe and a stable control, and we validated all key conclusions on physiological RNA substrates, as shown in Figures 4, 5, 6, S7, S8, S10, S11 and S12.

      Planned revisions: To address the reviewer’s concerns more directly, we will revise the manuscript to include a discussion paragraph explicitly stating that the ATP-independent activity was observed under optimized in vitro conditions and may represent a latent remodeling capability that could be relevant under energy-limited conditions such as local ATP depletion during rapid replication. We will also clarify that DNA substrates were used as mechanistic probes and controls, and that all key findings were validated on physiological RNA substrates. We thank the reviewer for prompting us to strengthen the discussion of these important points.

      (4) There is no evidence to support the conclusion that "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms." 3'-5' duplex melting is limited to short duplexes and is ATP-independent, suggesting it may be due to trapping of thermal fraying intermediates by the ssDNA binding Nsp13. The ATP-dependent and ATP-independent melting on the substrates with the 3'-overhang are the same, suggesting that ATP-dependent melting does not occur on this substrate, which would indicate that bidirectional ATP-dependent translocation does not occur.

      We are grateful to the reviewer for this critical evaluation of our mechanistic claims. We agree that our initial statement regarding bidirectional ATP-dependent remodeling was imprecise and not fully supported by the data. As the reviewer correctly notes, the similar unwinding efficiency on 3′-overhang substrates regardless of ATP presence indicates that ATP hydrolysis does not drive 3′→5′ translocation, which is consistent with nsp13’s known 5′→3′ motor polarity. The observed 3′→5′ activity is therefore more accurately described as an ATP-independent remodeling event, not ATP-dependent unwinding.

      We will revise the Discussion and relevant Results sections to clarify the nature of this bidirectional activity. Specifically, the sentence:

      "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms..." will be corrected to: "Duplex DNA supports bidirectional remodeling via ATP-independent mechanisms."

      We will also explicitly state that while nsp13 requires ATP for long-range, processive 5'→3' helicase activity, its remodeling/chaperone function is inherently bidirectional and powered by the free energy of binding to the ss/ds junction, rather than by ATP-driven mechanical work.

      (5) The description of ATP-independent unwinding as having "limited processivity" is likely not accurate. These experiments were multi-turnover reactions with very high Nsp13 concentrations and no protein trap to ensure single-turnover conditions. Because the reactions were multi-turnover, no information about the processivity of Nsp13 can be obtained. On the contrary, it seems likely that the product formed over the 30-minute reaction with a vast excess of Nsp13 is due to binding and dissociation of multiple Nsp13 molecules instead of processive translocation by a single enzyme.

      We thank the reviewer for this important correction. We fully agree that our use of the term "processivity" was technically imprecise. Processivity strictly defines the distance a single enzyme translocates during one binding event, which our multi-turnover assays (with high nsp13 concentrations and no protein trap) were not designed to measure. Our results specifically demonstrate that the ATP-independent remodeling mode is highly sensitive to duplex length, with efficiency declining sharply as the duplex lengthens. To reflect the experimental data more faithfully, we have replaced "processivity" with more accurate descriptors throughout the manuscript.

      Planned revisions:

      (1) Original: "The ATP-independent unwinding mode, however, has limited processivity." Revised: "The ATP-independent unwinding mode, however, exhibits a steep decline in efficiency as the duplex length increases."

      (2) Original: "...an ATP-independent, cation-activated mode with limited processivity." Revised: "...an ATP-independent, cation-activated mode specialized for localized structural remodeling."

      (3) Original: "...primes Nsp13 for basal strand remodeling but supports only limited processivity." Revised: "...primes Nsp13 for basal strand remodeling but is insufficient for the sustained unwinding of extended duplexes."

      (4) Original: "...primes Nsp13 for low-processivity strand displacement." Revised: "...primes Nsp13 for short-range strand displacement rather than long-range processive unwinding."

      We believe these changes clarify that the ATP-independent mode acts as a molecular chaperone for local obstacles (like G4 or short stems) rather than a motor for long-range translocation. We thank the reviewer for helping us improve the precision of our description.

      (6) G4s are much more stable at cellular K+ concentrations than they are at 20 mM K+. As such, Nsp13's ability to unfold a G4 in the absence of ATP may be diminished or eliminated at a physiological K+ concentration.

      We thank the reviewer for this critical point regarding physiological ion concentrations. We agree that K<sup>+</sup> significantly stabilizes G4 structures, which may raise the energy barrier for ATP-independent remodeling.

      Planned experiments:

      To address this, we will perform salt titration assays (up to 150 mM KCl) to evaluate the robustness of nsp13’s G4 unfolding activity under more physiological ionic conditions. We will also measure the melting temperature of our G4 substrates across this K<sup>+</sup> range to correlate structural stability with enzymatic efficiency.

      Author response image 1.

      Preliminary characterization of tag-free Nsp13 enzymatic activities. (A) Comparison of ATPase activity between His-tagged and tag-free Nsp13 in the presence of ssRNA or RNA G4. (B) Raw fluorescence data from stopped-flow FRET analysis of ATP-dependent unwinding (16-bp fork DNA, 2 mM Mg<sup>2+</sup>, 2 mM ATP). F/F<sub>0</sub> represents FAM fluorescence normalized to initial DNA intensity. (C) ATP-independent DNA duplex remodeling (data reproduced from Figure S2). (D) Chaperone activity of tag-free Nsp13 on DNA and RNA stem-loops.

      Author response table 1.

      Summary of affinity tags, monovalent salt concentrations, and substrate types used in 27 published SARS-CoV/SARS-CoV-2 nsp13 studies

      References:

      (1) Ivanov KA, Thiel V, Dobbe JC, van der Meer Y, Snijder EJ, Ziebuhr J. Multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase. J Virol. 2004 Jun;78(11):5619-32.

      (2) Lee NR, Kwon HM, Park K, Oh S, Jeong YJ, Kim DE. Cooperative translocation enhances the unwinding of duplex DNA by SARS coronavirus helicase nsP13. Nucleic Acids Res. 2010 Nov;38(21):7626-36.

      (3) Adedeji AO, Marchand B, Te Velthuis AJ, Snijder EJ, Weiss S, Eoff RL, Singh K, Sarafianos SG. Mechanism of nucleic acid unwinding by SARS-CoV helicase. PLoS One. 2012;7(5):e36521. doi: 10.1371/journal.pone.0036521.

      (4) Adedeji AO, Lazarus H. Biochemical Characterization of Middle East Respiratory Syndrome Coronavirus Helicase. mSphere. 2016 Sep 7;1(5):e00235-16.

      (5) Jia Z, Yan L, Ren Z, Wu L, Wang J, Guo J, Zheng L, Ming Z, Zhang L, Lou Z, Rao Z. Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis. Nucleic Acids Res. 2019 Jul 9;47(12):6538-6550.

      (6) Jang KJ, Jeong S, Kang DY, Sp N, Yang YM, Kim DE. A high ATP concentration enhances the cooperative translocation of the SARS coronavirus helicase nsP13 in the unwinding of duplex RNA. Sci Rep. 2020 Mar 11;10(1):4481.

      (7) Shu T, Huang M, Wu D, Ren Y, Zhang X, Han Y, Mu J, Wang R, Qiu Y, Zhang DY, Zhou X. SARS-Coronavirus-2 Nsp13 Possesses NTPase and RNA Helicase Activities That Can Be Inhibited by Bismuth Salts. Virol Sin. 2020 Jun;35(3):321-329.

      (8) Mickolajczyk KJ, Shelton PMM, Grasso M, Cao X, Warrington SE, Aher A, Liu S, Kapoor TM. Force-dependent stimulation of RNA unwinding by SARS-CoV-2 nsp13 helicase. Biophys J. 2021 Mar 16;120(6):1020-1030.

      (9) Chen J, Wang Q, Malone B, Llewellyn E, Pechersky Y, Maruthi K, Eng ET, Perry JK, Campbell EA, Shaw DE, Darst SA. Ensemble cryo-EM reveals conformational states of the nsp13 helicase in the SARS-CoV-2 helicase replication-transcription complex. Nat Struct Mol Biol. 2022 Mar;29(3):250-260.

      (10) Yazdi AK, Pakarian P, Perveen S, Hajian T, Santhakumar V, Bolotokova A, Li F, Vedadi M. Kinetic Characterization of SARS-CoV-2 nsp13 ATPase Activity and Discovery of Small-Molecule Inhibitors. ACS Infect Dis. 2022 Aug 12;8(8):1533-1542.

      (11) Corona A, Wycisk K, Talarico C, Manelfi C, Milia J, Cannalire R, Esposito F, Gribbon P, Zaliani A, Iaconis D, Beccari AR, Summa V, Nowotny M, Tramontano E. Natural Compounds Inhibit SARS-CoV-2 nsp13 Unwinding and ATPase Enzyme Activities. ACS Pharmacol Transl Sci. 2022 Apr 1;5(4):226-239.

      (12) Lu L, Peng Y, Yao H, Wang Y, Li J, Yang Y, Lin Z. Punicalagin as an allosteric NSP13 helicase inhibitor potently suppresses SARS-CoV-2 replication in vitro. Antiviral Res. 2022 Oct;206:105389.

      (13) Yue K, Yao B, Shi Y, Yang Y, Qian Z, Ci Y, Shi L. The stalk domain of SARS-CoV-2 NSP13 is essential for its helicase activity. Biochem Biophys Res Commun. 2022 Apr 23;601:129-136.

      (14) Grimes SL, Choi YJ, Banerjee A, Small G, Anderson-Daniels J, Gribble J, Pruijssers AJ, Agostini ML, Abu-Shmais A, Lu X, Darst SA, Campbell E, Denison MR. A mutation in the coronavirus nsp13-helicase impairs enzymatic activity and confers partial remdesivir resistance. mBio. 2023 Aug 31;14(4):e0106023.

      (15) Yu J, Im H, Lee G. Unwinding mechanism of SARS-CoV helicase (nsp13) in the presence of Ca2+, elucidated by biochemical and single-molecular studies. Biochem Biophys Res Commun. 2023 Aug 6;668:35-41.

      (16) Sommers JA, Loftus LN, Jones MP 3rd, Lee RA, Haren CE, Dumm AJ, Brosh RM Jr. Biochemical analysis of SARS-CoV-2 Nsp13 helicase implicated in COVID-19 and factors that regulate its catalytic functions. J Biol Chem. 2023 Mar;299(3):102980.

      (17) Maio N, Raza MK, Li Y, Zhang DL, Bollinger JM Jr, Krebs C, Rouault TA. An iron-sulfur cluster in the zinc-binding domain of the SARS-CoV-2 helicase modulates its RNA-binding and -unwinding activities. Proc Natl Acad Sci U S A. 2023 Aug 15;120(33):e2303860120.

      (18) Marx SK, Mickolajczyk KJ, Craig JM, Thomas CA, Pfeffer AM, Abell SJ, Carrasco JD, Franzi MC, Huang JR, Kim HC, Brinkerhoff H, Kapoor TM, Gundlach JH, Laszlo AH. Observing inhibition of the SARS-CoV-2 helicase at single-nucleotide resolution. Nucleic Acids Res. 2023 Sep 22;51(17):9266-9278.

      (19) Inniss NL, Rzhetskaya M, Ling-Hu T, Lorenzo-Redondo R, Bachta KE, Satchell KJF, Hultquist JF. Activity and inhibition of the SARS-CoV-2 Omicron nsp13 R392C variant using RNA duplex unwinding assays. SLAS Discov. 2024 Apr;29(3):100145.

      (20) Sales AH, Fu I, Durandin A, Ciervo S, Lupoli TJ, Shafirovich V, Broyde S, Geacintov NE. Variable Inhibition of DNA Unwinding Rates Catalyzed by the SARS-CoV-2 Helicase Nsp13 by Structurally Distinct Single DNA Lesions. Int J Mol Sci. 2024 Jul 19;25(14):7930.

      (21) Soper N, Yardumian I, Chen E, Yang C, Ciervo S, Oom AL, Desvignes L, Mulligan MJ, Zhang Y, Lupoli TJ. A Repurposed Drug Interferes with Nucleic Acid to Inhibit the Dual Activities of Coronavirus Nsp13. ACS Chem Biol. 2024 Jul 19;19(7):1593-1603.

      (22) Hao W, Hu X, Chen Q, Qin B, Tian Z, Li Z, Hou P, Zhao R, Balci H, Cui S, Diao J. Duplex Unwinding Mechanism of Coronavirus MERS-CoV nsp13 Helicase. Chem Biomed Imaging. 2024 Dec 19;3(2):111-122.

      (23) Park J, Jeong YJ, Chauhan K, Koh HR, Kim DE. ATPase-dependent duplex nucleic acid unwinding by SARS-CoV-2 nsP13 relies on facile binding and translocation along single-stranded nucleic acid. J Biol Chem. 2025 Jul;301(7):110373.

      (24) Yu J, Im H, Cho H, Jeon Y, Lee JB, Lee G. A novel ADP-directed chaperone function facilitates the ATP-driven motor activity of SARS-CoV helicase. Nucleic Acids Res. 2025 Jan 24;53(3):gkaf034.

      (25) Dumm AJ, Zheng AY, Butler TJ, Kulikowicz T, George JC, Bombard PT, Sommers JA, Ding J, Brosh RM Jr. SARS-CoV-2 point mutations are over-represented in terminal loops of RNA stem-loop structures that can be resolved by Nsp13 helicase in a unique manner with respect to nucleotide dependence. Nucleic Acids Res. 2025 May 22;53(10):gkaf447.

      (26) Castro JM, Slack RL, Ong YT, Zhang H, Gifford LB, Courouble VV, Aiken RM, Shankar V, O'Leary TR, Griffin PR, Lan S, Du Y, Fu H, Sarafianos SG. Stalling the Enemy: Targeting Nsp13 for Next-Generation SARS-CoV-2 Antivirals. Int J Mol Sci. 2026 Mar 11;27(6):2587.

      (27) Mingroni MA, Enney BM, Malsick LE, Geiss BJ. Motif V is an allosteric couple between the SARS-CoV-2 nsp13 nucleotide triphosphatase and helicase active sites. J Biol Chem. 2026 Mar;302(3):111198.

    1. Author response:

      eLife Assessment

      This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.

      On behalf of the co-authors, we thank the three reviewers and the editors for their complimentary remarks as well as their major and minor comments and concerns. Addressing these concerns has led to revisions that improved the manuscript. In particular, further analyses have generated updated Figures 3 and 4 and Supplementary Tables S1 and S4-S6.

      Public Reviews:

      Reviewer #1 (Public review):

      Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantage of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.

      We thank the reviewer for the positive highlights. Regarding the low in vitro pairing efficiency, we have now edited the manuscript to clarify a misleading statement related to the 7% value. This value corresponds to the percentage of experiments in which couples were observed; it does not accurately represent the actual number of observed worm pairs and is probably misleading, so we decided to remove it. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

      We also agree with the reviewer that the extended culture periods required to obtain fully sexually dimorphic parasites remain a limitation. As elaborated in the Discussion (see below), key factors, probably derived from the host, are missing from the in vitro system, which would explain both the slow in vitro development and the low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing[35].”

      A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.

      While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.

      We agree that deciphering the role of human Red Blood Cell (hRBC) supplementation is critical. Regarding the influence of hRBCs on long-term culture stability and parasite development, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture [Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. production of infertile eggs by worm pairs cultured from cercariae. J Parasitol 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)]. The molecular pathways that underlie development, sexual differentiation and pairing, and that are modulated by hRBCs in culture, are currently being investigated by our team. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.

      The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Reviewer #2 (Public review):

      Summary:

      The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.

      Strengths:

      The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      The display of data from the authors is sometimes difficult to follow/understand where it comes from. For example:

      (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.

      We thank the reviewer for pointing this out; we have now edited Supplementary Tables S1 and S6 by turning them into a long format for the sake of clarity. Accordingly, Results, Methods sections, and indicated supplementary tables were edited as follows:

      Results, lines 142 ff.:

      “No morphological differences were observed between parasites cultured either in FBS or HS within the first week in culture; in both conditions most parasites were classified as early schistosomula [category 1: 76% ± 30 (average ± SD) in FBS and 73% ± 29 (average ± SD) in HS] with few lung (category 2) and early liver schistosomula (category 3) (Figure 1B, week 1; Supplementary Figure S1). The mean mortality (category 0) at week 1 was slightly higher, but not statistically significant (P= 0.42), in worms cultured in HS [9.75% ± 2.76 (average ± SD)] compared to the mortality registered in FBS-cultured parasites [5.52% ± 5.18 (average ± SD), Supplementary Table S6], consistent with previous findings[39].”

      Methods, lines 463-465:

      “To evaluate differences in mortality between HS- and FBS-cultured parasites, data from 5 experiments were combined and analysed using a Shapiro-Wilk normality test to test normality of the data and a non-parametric Wilcoxon rank sum exact test (Supplementary Tables S1 and S6).”

      Supplementary Tables:

      Supplementary Table S1. “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at indicated stage development (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”

      Supplementary Table S6. “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU positive cells comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”

      For clarity, in Author response image 1 we provide the R script used to perform the statistical tests on the data shown in Supplementary Table S6 (column “Raw count of parasite developmental category per image and experiment”).

      Author response image 1.
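
The script in Author response image 1 is not reproduced in this text. As a hedged illustration only, and not the authors' R code, the rank-sum step described in the Methods can be sketched in Python. The function names and the mortality percentages below are hypothetical, the Shapiro-Wilk normality check is omitted, and the two-sided p-value is obtained by enumerating every possible group assignment, a convention that may differ from R's `wilcox.test` when ties are present.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Rank sum of x and a two-sided exact p-value by full enumeration."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2  # null expectation of the rank sum
    deviation = abs(observed - expected)
    total = extreme = 0
    for combo in combinations(range(len(ranks)), n):
        total += 1
        w = sum(ranks[i] for i in combo)
        # Count assignments at least as far from the expectation as observed.
        if abs(w - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical week-1 mortality percentages, one value per experiment.
hs_mortality = [9.0, 12.5, 7.0, 11.0, 8.5]
fbs_mortality = [3.0, 10.0, 1.5, 6.0, 4.0]

w, p = rank_sum_exact(hs_mortality, fbs_mortality)
print(f"rank sum (HS) = {w}, two-sided exact p = {p:.4f}")
```

Full enumeration is practical only for small groups such as five experiments per condition (252 assignments); larger samples would normally rely on a statistics library instead.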

      (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this author's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly-labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the amount of EdU+ area to the amount of DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the amount of proliferative cells in the animals.

      As the reviewer correctly pointed out, the data were treated as ordinal because counting became extremely difficult and highly inaccurate in worms with more than 60 EdU+ cells. Therefore, we grouped all worms showing more than 60 EdU+ cells into a single category, “60 EdU+ cells”. We have now updated Figure 4 to show medians instead of mean values, Supplementary Table S5 to provide more comprehensive access to the raw counts, and Supplementary Table S6 to indicate that the numbers of EdU+ cells per worm were considered ordinal. Accordingly, we have revised the corresponding sections as follows:

      Results, lines 211 ff.:

      “HS-cultured schistosomula showed higher numbers of proliferating stem cells, with a median of >48 and >60 EdU+ cells per worm at days 8 and 15, respectively (Figure 4). On the other hand, most FBS-cultured parasites displayed no more than an average of 20 EdU+ cells per worm (Figure 4).”

      Methods, lines 520 ff.:

      “EdU+ cells per parasite were counted for an average of 100 parasites across three independent experiments (Supplementary Table S5). Worms were grouped based on the number of cells per individual, but all those showing ⪰ 60 EdU+ cells were counted in the same group named ‘60 EdU+ cells'. Therefore, the data were considered ordinal data. Statistical analysis was performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 considered significant (Supplementary Table S6).”

      Figure 4 legend, lines 830 ff.:

      “A. Violin plots showing the number of Edu+ cells per worm at indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added in the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the big black point indicates the median of EdU+ cells per worm. All worms showing ⪰ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal and statistical analysis performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 (*) considered significant (Supplementary Tables S5 and S6).”
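
To illustrate why grouping high counts into a single top category favours the median over the mean, the following is a minimal sketch in Python. The counts are hypothetical and `cap_counts` is an illustrative helper, not part of the authors' analysis.

```python
from statistics import mean, median

CAP = 60  # top category described above: worms with 60 or more EdU+ cells


def cap_counts(counts, cap=CAP):
    """Assign every count at or above the cap to the capped category."""
    return [min(c, cap) for c in counts]


# Hypothetical EdU+ cell counts per worm, as if every cell could be counted.
true_counts = [12, 20, 35, 48, 55, 90, 140]
capped = cap_counts(true_counts)

# The mean is pulled down by capping, while the median is unchanged as long
# as fewer than half of the worms fall into the capped category.
print("mean   (true, capped):", mean(true_counts), mean(capped))
print("median (true, capped):", median(true_counts), median(capped))
```

When more than half of the worms fall into the top category, the median itself can only be reported as being at or above the cap, which is consistent with the “>60” value quoted for day 15.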

      We thank the reviewer for the very interesting suggestion to quantify cell proliferation by calculating the ratio of EdU+ area to DAPI+ area in maximum intensity projection images. Measuring the fluorescence area for each worm in a maximum projection is an excellent idea; however, given the number of EdU+ cells present in some samples, we think this technique would not provide additional information or more detailed data than our analysis when the number of EdU+ cells exceeds 60 per worm. We will certainly consider this approach for future studies.

      There are some minor issues as well:

      (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.

      We thank the reviewer for pointing this out; we have now replaced “schistosomes” with “Schistosoma mansoni” (current line 131).

      (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.

      This is a very good point. We thank the reviewer for raising it and have replaced “EdU pulse-chase” with “EdU pulse” experiments in lines 37, 204, and 321.

      Reviewer #3 (Public review):

      Summary:

      This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.

      Strengths:

      This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      As the authors mentioned, "pairing between male and female parasites was rare. Pairing was observed in approximately ~7% of the experiments, usually after day ~ 80 in culture." Egg production was also not achieved with this protocol.

      Following the reviewer’s comment, and to clarify a misleading statement, we have now removed the value of 7%. This value corresponds to the percentage of experiments in which couples were observed; it does not accurately reflect the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNcil). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNcil) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNcil) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. An additional patient-derived iPSC line was included in the manuscript; our study therefore includes two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNcil</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate levels, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate measurements, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least 3 independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified biological replicates and differentiation in the figure legends.

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified the biological replicates and differentiations in the figure legends for Fig. 1E; Fig. 2B, G; Fig. 3F, G; Fig. 4B-C, E, H-J, M-N; Fig. 6D; and Fig. 7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested week 3, 4, 8, 15, and 28 old organoids in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNAseq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrate a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitors to mature, functional states for nGD phenotyping and evaluation of therapeutic modalities:

      1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated, mature DA neurons.

      2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation.

      3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups.

      By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, which satisfies the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because of a pseudogene (PGBA) located 16 kb downstream of GBA1, which shares 96-98% sequence similarity with GBA1 (Refs. #1, #2) and complicates precise editing. The pseudogene is shorter (~5.7 kb) than GBA1 (~7.6 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation, and L444P was corrected to the wild-type sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi: 10.3389/fgene.2021.684067. PMCID: PMC8255797.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c].

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes / experiments - Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. In vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging; CellVue Maroon fluorescence in tumors persisted for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells: flow cytometry quantification showed that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 1-7. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead.

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects.

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects.

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans for incorporating microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported once it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified the numbers of biological replicates and independent differentiations in the figure legends [see response to Weaknesses (1)-b, (1)-c].

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated the Statistical analysis section of the Methods as follows:

      For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cutoffs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.
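
      The decision rule in this paragraph can be sketched as a small helper. This is only an illustrative sketch, not the authors' analysis pipeline (the analyses were run in GraphPad Prism 10). The function names `choose_test` and `exclude_outliers` are hypothetical, the normality flag is assumed to come from a separate Shapiro-Wilk test, and the use of the sample standard deviation for the outlier cutoff is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def choose_test(group_sizes, all_normal):
    """Return the name of the test implied by the stated decision rule.

    group_sizes: list with the sample size of each group.
    all_normal:  True if every group passed a Shapiro-Wilk normality
                 test (assumed to be computed elsewhere).
    """
    if len(group_sizes) < 2:
        raise ValueError("need at least two groups")
    if len(group_sizes) == 2:
        # Two groups: t-test only if n >= 6 per group AND data are normal.
        if min(group_sizes) >= 6 and all_normal:
            return "unpaired two-tailed Student's t-test"
        return "Mann-Whitney U test"
    # Three or more groups.
    if all_normal:
        return "one-way ANOVA + Tukey's multiple comparison test"
    return "Dunn's multiple comparison test"


def exclude_outliers(values, k=2.0):
    """Keep values within mean +/- k standard deviations.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(values) < 3:
        return list(values)
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]
```

      For example, two groups of n = 4 would be routed to the Mann-Whitney U test regardless of the normality result, because the sample-size condition is not met.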

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.
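
      The multilevel averaging described above can be illustrated with a short sketch. It is not the authors' ImageJ workflow; the function name `group_mean_sem` and the nested-dictionary layout are hypothetical. It also assumes that organoid-level means are first averaged within each differentiation and that the SEM is computed across independent differentiations, which is one reasonable reading of the text.

```python
from math import sqrt
from statistics import mean, stdev


def group_mean_sem(data):
    """Aggregate FOV-level measurements into a group mean and SEM.

    data: {differentiation_id: {organoid_id: [value per FOV, ...]}}
    Returns (group_mean, sem, n_differentiations).
    """
    differentiation_means = []
    for organoids in data.values():
        # Step 1: average FOVs within each organoid.
        organoid_means = [mean(fovs) for fovs in organoids.values()]
        # Step 2 (assumption): average organoids within the differentiation.
        differentiation_means.append(mean(organoid_means))
    n = len(differentiation_means)
    # Step 3: mean and SEM across independent differentiations.
    sem = stdev(differentiation_means) / sqrt(n) if n > 1 else float("nan")
    return mean(differentiation_means), sem, n


# Hypothetical counts, for illustration only.
example = {
    "diff1": {"org1": [1, 3], "org2": [3, 5]},
    "diff2": {"org1": [4, 6], "org2": [5, 5]},
    "diff3": {"org1": [6, 8], "org2": [7, 7]},
}
group_mean, group_sem, n_diff = group_mean_sem(example)
```

      Treating the differentiation, rather than the individual FOV, as the unit of replication avoids inflating n with non-independent measurements from the same organoid.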

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      The RNA-seq data are from the same batch, and the mapping rate is >90%. The GEO accession will be active upon publication. These details were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include the replicates and statistical tests for each figure [see response to Weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      The statistical analysis method is provided in the revision [see response to Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated MLO identity using two markers: (1) FOXG1 and (2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but was highly expressed in an age-matched cerebral organoid (CO), suggesting that our culture method is midbrain-oriented. In nGD MLOs, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). EN1 reached 34-fold and 373-fold higher levels than in WT-75.1 iPSCs at weeks 3 and 8, respectively, in WT-75.1 MLOs. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSCs and are presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH<sup>+</sup> neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      Off-target effects were analyzed during gene editing, and the likelihood of editing off-target sites is low, as indicated by the low off-target scores ranked using the MIT Specificity Score analysis. The corresponding Methods text was updated as follows:

      “The chance to target other off-targets is low due to low off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      Lipid levels were normalized to the total protein of the organoid lysate. This was clarified in the Methods section of the revision as follows:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipids analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”
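
      As a worked illustration of the two normalizations described in this Methods text, a minimal sketch follows. The function names and the numerical values are hypothetical and are not taken from the study.

```python
def normalize_to_protein(lipid_pmol, protein_mg):
    """Express a lipid amount as pmol per mg of total lysate protein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return lipid_pmol / protein_mg


def normalize_to_volume(lipid_pmol, volume_ml):
    """Express a lipid amount in culture medium as pmol per mL."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return lipid_pmol / volume_ml


# Hypothetical example: 50 pmol GluCer measured in a lysate containing
# 0.25 mg protein (BCA assay), and 6 pmol GluSph in 3 mL of medium.
glucer_pmol_per_mg = normalize_to_protein(50.0, 0.25)
glusph_pmol_per_ml = normalize_to_volume(6.0, 3.0)
```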

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work.

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced dopaminergic neuron marker TH expression in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E, and are summarized in new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH<sup>+</sup> neurons, GFAP<sup>+</sup> glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage (https://www.proteinatlas.org/ENSG00000177628-GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

      Recommendations for the authors:

      Reviewing Editor Comments:

      Your paper has been reviewed by three expert reviewers in the GBA field. Although they appreciate the work and its novelty, they raise several concerns. We suggest that you address these concerns in the next version.

      Reviewer #1 (Recommendations for the authors):

      Statistical and presentation issues

      (1) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (1)- b].

      (2) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated the Methods to describe the statistical analysis in detail [see response to Reviewer 1 Weaknesses (5)-b].

      (3) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (5)- c].

      (4) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      Our RNA-seq data were generated from a single batch of MLOs, with mapping rates exceeding 90%. The GEO accession will be made publicly available upon publication.

      Reviewer #2 (Recommendations for the authors):

      Please consider the following suggestions for revisions:

      (1) Line 86: A bit more explanation/justification for the focus on midbrain-like organoids would be helpful, including introducing the nature of the midbrain pathology to better put some of the MLO findings in context. Is the nGD pathology for the midbrain significantly different / out of proportion to other affected brain regions?

      nGD patients often display impaired vertical gaze and movement disorders. These symptoms correlate with midbrain involvement due to the sensitivity of this region to neuroinflammatory and degenerative processes (Ref #7, #8). Both human and mouse studies indicate that the midbrain exhibits prominent substrate accumulation compared to other brain regions, suggesting a predisposition for greater pathological involvement of the midbrain in GD (Ref #8, #9, #10, #11). This rationale was added at Line 86 in the revision.

      References:

      (7) Goker-Alpan O, Ivanova MM. Neuronopathic Gaucher disease: Rare in the West, common in the East. J Inherit Metab Dis.(2024) 47(5):917-934. PMID: 38768609.

      (8) Burrow TA, Sun Y, Prada CE, Bailey L, Zhang W, Brewer A, Wu SW, Setchell KDR, Witte D, Cohen MB, Grabowski GA. CNS, lung, and lymph node involvement in Gaucher disease type 3 after 11 years of therapy: clinical, histopathologic, and biochemical findings. Mol Genet Metab. (2015) 114(2):233-241. PMID: 25219293.

      (9) Farfel-Becker T, Vitner EB, Kelly SL, Bame JR, Duan J, Shinder V, Merrill AH, Dobrenis K, Futerman AH. Neuronal accumulation of glucosylceramide in a mouse model of neuronopathic Gaucher disease leads to neurodegeneration. Hum Mol Genet. (2014) 23(4):843-854.

      (10) Jones EE, Zhang W, Zhao X, Quiason C, Dale S, Shahidi-Latham S, Grabowski GA, Setchell KDR, Drake RR, Sun Y. High-Resolution MALDI Imaging Mass Spectrometry. SLAS Discovery. (2017) 22(10):1218-1228.

      (11) Xu YH, Xu K, Sun Y, Liou B, Quinn B, Li RH, Xue L, Zhang W, Setchell KD, Witte D, Grabowski GA. Multiple pathogenic proteins implicated in neuronopathic Gaucher disease mice. Hum Mol Genet. (2014) 23(15):3943-57. PMID: 24599400.

      (2) Lines 359-360: Please specify the carbon-chain length of the sphingoid base of the GluCer species analyzed. Also, is there a citation for the statement that 18:0 and 16:0 are "brain-enriched species"?

      The fatty acyl chain lengths analyzed range from 14:0 to 24:0. The sphingoid base for all GluCer species analyzed is d18:1. For example, the species referred to as GluCer 18:0 corresponds to GluCer(d18:1/18:0). Although both 16:0 and 18:0 are enriched in the brain, 18:0 is the most abundant species in the brain (Ref #12, #13). We revised “brain-enriched species” to “brain-predominant species (18:0)”.

      References:

      (12) Nilsson, O., and Svennerholm, L. Accumulation of Glucosylceramide and Glucosylsphingosine (Psychosine) in Cerebrum and Cerebellum in Infantile and Juvenile Gaucher Disease. Journal of Neurochemistry (1982) 39, 709–718.

      (13) Sun, Y., Zhang, W., Xu, Y.H., Quinn, B., Dasgupta, N., Liou, B., Setchell, K.D., and Grabowski, G.A. Substrate compositional variation with tissue/region and Gba1 mutations in mouse models--implications for Gaucher disease. PLoS One (2013) 8, e57560. https://doi.org/10.1371/journal.pone.0057560.

      (3) Figure 2: It would be interesting to compare the MLO findings to prior gene expression data. Are there previously published transcriptome analyses from nGD brain tissue (or other tissues) that the transcriptome data obtained from MLOs may be compared with? What about transcriptome analyses of mouse GD models?

      We thank the reviewer for this valuable suggestion. To strengthen the biological context of our transcriptomic findings, we have added a new comparative table (new Supplementary Table 3) in the revised manuscript that summarizes key dysregulated pathways in our human nGD MLOs alongside previously published data from nGD mouse midbrain (Ref#14). The table highlights substantial overlap, including axon guidance, neuron differentiation, dopaminergic/glutamatergic/GABAergic synaptic signaling, lipid metabolism, apoptosis/cell death, and nervous system development, emphasizing the translational relevance of our model. We also note that our dataset uniquely reveals pronounced dysregulation of WNT signaling and anterior-posterior patterning (Fig. 2L and 2M), potentially reflecting human-specific early midbrain defects.

      We added the following sentence to Discussion: “Comparative analysis with prior transcriptomic data from nGD mouse midbrain showed consistent dysregulation in axon guidance, synaptic signaling, lipid metabolism, and nervous system development (new Supplementary Table 3), supporting the fidelity of our human MLO model.”

      Reference:

      (14) Dasgupta N, Xu YH, Li R, Peng Y, Pandey MK, Tinch SL, Liou B, Inskeep V, Zhang W, Setchell KD, Keddache M, Grabowski GA, Sun Y. Neuronopathic Gaucher disease: dysregulated mRNAs and miRNAs in brain pathogenesis and effects of pharmacologic chaperone treatment in a mouse model. Hum Mol Genet. (2015) 24(24):7031-48. PMID: 26420838.

      (4) Lines 402-405 & Figure 3D: Is it possible to include a merged image to better visualize the TH and FOXA2 co-staining / potential colocalization?

      The merged images of TH (red) and FOXA2 (green) are shown in Fig. 3E. Yellow arrows indicate TH and FOXA2 co-stained cells, which appear yellow in the merged images. The results demonstrate that the number of co-stained cells is reduced in GD2-1260 MLOs compared with WT-75.1 MLOs at both week 6 and week 8.

      (5) Lines 447-448 & Figure 4F, G, J: It would be helpful to provide a direct analysis/visualization of MLO size between the WT-75.1, GD2-1260, and iso-GD2-1260 genotypes (allowing direct comparison of WT and iso). Similarly, the same 3-way analysis would be valuable for assessing dopamine levels.

      We have included WT-75.1 in Fig. 4F, G, and J in the revision, so that all three genotypes (WT-75.1, GD2-1260, and iso-GD2-1260) can be compared directly, with WT-75.1 as the reference. In the new Fig. 4F, MLO growth is illustrated by representative wide-field microscopy images taken at day 2, Wk4, and Wk8 of differentiation. In the new Fig. 4G, MLO size was analyzed with NIS-Elements and is presented as the area (µm<sup>2</sup>) of the MLO in the image (mean ± SEM); N ≥ 10 MLOs were analyzed for each genotype. In the new Fig. 4J, dopamine levels were analyzed in culture medium from WT-75.1, GD2-1260, and iso-GD2-1260 MLOs at Wk12 after 72 hours of culture in 3 mL BGM medium. Data are presented as mean ± SEM (n = 5 per group). The statistical tests applied are described in the legend.

      (6) Figure 4: What is the explanation/interpretation of the residual autophagy pathway dysfunction in CRISPR-corrected MLOs? nGD requires near-complete loss of GCase activity, so it is a bit curious that autophagic dysfunction would be observed with only ~50% GCase reduction? There is some discussion, but it doesn't fully capture the unexpected nature and implications of this result.

      This phenomenon may be explained by a threshold effect in lysosomal function. Gaucher disease is an autosomal recessive disorder: carriers of a heterozygous GBA1 mutation, who retain approximately 50% of normal GCase activity, do not develop disease. This suggests that even partial restoration of GCase activity can reduce glucosylceramide accumulation below a pathological threshold, thereby restoring lysosomal integrity and autophagic flux. In addition, improved GCase activity may help normalize the lipid composition of lysosomal membranes, facilitating the fusion events required for effective autophagy.

      (7) Lines 512-516 & Figure 5J: The data shown are inconclusive. Can these Western blot data be quantified, noting the number of replicates for each measurement? Without quantification and statistics, it is difficult to assess the claim that levels of LAMP1, LC3-I, LC3-II, 4E-BP1, and p-4E-BP1 in GD2-1260 treated with SapC-DOPS-fGCase are more similar to GD2-1260 treated following SapC-DOPS than to WT-75.1.

      We performed quantitative analysis relative to WT-75.1 and included the data in the new Fig. 5J. The Results text was revised as follows:

      Analysis of protein levels showed that decreased LAMP1 expression in GD2-1260 MLOs was not altered following SapC-DOPS-fGCase treatment (Figure 5J). The elevated LC3-II levels, an indicator of impaired autophagic flux, were reduced upon treatment, suggesting enhanced autophagic activity (Figure 5J). Moreover, phosphorylated 4E-BP1 (Thr37/46), but not total 4E-BP1, was improved in SapC-DOPS-fGCase–treated MLOs, reflecting a decrease in mTOR hyperactivation (Figure 5J). We anticipate that a longer duration of SapC-DOPS-fGCase exposure in nGD MLOs may produce a more robust therapeutic effect in rescuing nGD-associated phenotypes, which will be evaluated in future studies.

      (8) Lines 518-520: The presented data support "effective restoration of GCase activity," but clarification is needed regarding "correction of GD-related disease phenotypes." Perhaps "selected molecular and biochemical phenotypes" would be more accurate. Data are not shown for several other phenotypes, including TH, FOXA2, and dopamine levels.

      This was revised to “selected molecular and biochemical phenotypes”.

      (9) Figure 5D-J: Please clarify whether all experiments were conducted 48 hours after treatment, as indicated for Figure 5C. If so, does this suggest that SapC-DOPS treatment exhibits only short-term effects? Were any data collected to evaluate the persistence of the treatment effect?

      The treatment duration is specified in the Fig. 5 legend. Fig. 5D–J represent experiments conducted after two weeks of treatment, whereas Fig. 5C reflects a 48-hour treatment. In both Gaucher disease lines, two-week treatment restored GCase activity to wild-type levels and reduced GluSph substrate accumulation. These findings were intended as proof-of-principle to demonstrate therapeutic feasibility; evaluation of treatment persistence beyond two weeks was beyond the scope of this study.

      Minor suggestions

      (1) Line 80: "A brain organoid derived from hiPSCs of a healthy individual with GBA1 knockout and α-synuclein overexpression exhibited some PD features23." I would suggest enumerating what "PD features" are to distinguish from "clinical features", which I don't think is the intended meaning.

      This was revised as “exhibited characteristic PD markers”.

      (2) Figure 2I: The reported number of downregulated DEGs is incorrect. It should be 765, not 1429.

      This was corrected in Figure 2I.

      (3) Line 359: change "enrich" to "enriched".

      This word was corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This foundational study builds on prior work from this group to reveal the complexities underlying ligand-dependent RXRγ-Nur77 heterodimer formation, offering a compelling re-evaluation of their earlier conclusions. The authors examine how a library of RXR ligands influences the biophysical, structural, and functional properties of Nur77. They find that although the Nur77-RXRγ heterodimer shares notable functional similarities with the Nurr1-RXRα complex, it also exhibits unique features, notably, both dimer dissociation and classical agonist-driven activities. This work advances our understanding of the nuanced behaviors of nuclear receptor heterodimers, which have important implications for health and disease.

      Strengths:

      (1) Builds on previous work by providing a comprehensive analysis that examines whether Nur77-RXRγ heterodimer formation parallels that of the Nurr1-RXRα complex.

      (2) Systematic evaluation of a library of RXR ligands provides a broad survey of functional outputs.

      (3) Careful reanalysis of previous work sheds new light on how NR4A heterodimers function.

      We thank the reviewer for recognizing our work as foundational. In the nuclear receptor field, current understanding of ligand-regulated nuclear receptor activity is based largely on ligand-dependent coregulator recruitment preferences; for example, agonists enhance coactivator recruitment to activate transcription. Building on our recent study of Nurr1-RXRα, the present work suggests that activation of the evolutionarily related NR4A-RXR heterodimer Nur77-RXRγ by RXR ligands is also consistent with a non-classical activation mechanism involving heterodimer dissociation.

      Weaknesses:

      (1) Some conclusions appear overstated or are not well substantiated by the work presented. It's unclear how the data support a non-classical mode of agonism, for example, based on the data shown.

      We thank the reviewer for this important point. We did not intend to claim that Nur77-RXRγ activation is explained exclusively by a non-classical mode of agonism. Rather, our interpretation was that the data are consistent with two possible, non-mutually exclusive mechanisms: (1) a classical pharmacological mechanism involving ligand-dependent coregulator recruitment; and (2) a non-classical mechanism involving ligand-binding domain (LBD) heterodimer dissociation, as we previously described for Nurr1-RXRα. This differs from our prior eLife study of Nurr1-RXRα, in which the data supported the LBD heterodimer dissociation model but not the classical pharmacological model.

      In our revised manuscript, we clarify two points that are important for interpreting the Nur77-RXRγ data. First, several experimental limitations of the Nur77-RXRγ studies reduced the extent to which the mechanism could be resolved as rigorously as in our earlier Nurr1-RXRα study. Second, and more importantly, the currently available ligand set lacks Nur77-RXRγ-selective agonists. This limits our ability to determine whether LBD heterodimer dissociation is the sole or principal mechanism of activation, or instead one of several contributing mechanisms.

      Taken together, these results support LBD heterodimer dissociation as a plausible and experimentally observable component of Nur77-RXRγ activation and, therefore, as a candidate shared activation mechanism for NR4A-RXR heterodimers. At the same time, because the quantitative evidence is less definitive than in the Nurr1-RXRα system, we agree that conclusions regarding Nur77-RXRγ should be stated more cautiously. This caution is reflected in both the title of our manuscript (“Towards a unified mechanism…”) and the language used throughout the text.

      (2) Some assays have relatively few replicates, with only two in some cases.

      We thank the reviewer for their attention to experimental rigor. For some assays, the findings were reproduced in two independent experiments, which we considered sufficient to confirm the presence and reproducibility of the effects observed in those particular assay formats. In the original manuscript, we used a general statement in the figure legends (“representative of two or more independent experiments”) across all assay data. In the revised manuscript, we now specify the number of independent experimental replicates for each assay in the corresponding figure legends to improve transparency.

      Reviewer #2 (Public review):

      Summary:

      This study explores the mechanisms by which binding of the nuclear receptor RXRg regulates its heterodimeric partner Nur77. Previously, this group made the interesting discovery that ligand-dependent activation of RXRg bound to a related partner, Nurr1, does not occur through a classical pharmacological mechanism but through agonist-dependent dissociation of the complex through disruption of their ligand binding domain (LBD) interactions. Here, they revisit this paradigm with Nur77. In contrast to Nurr1, the authors do not have the reagents to clearly support a role for LBD dissociation. Following the model of partial ligand-dependent dissociation of the LBD heterodimer, the experimental data (NMR, ITC, SEC) are interesting and quite complex.

      Strengths:

      The authors do a rigorous job of describing the data and providing possible interpretations and caveats. Revisiting the analysis of Nurr1, they identify the crucial role that selective Nurr1-RXRg agonists played in supporting the LBD dissociation model; without analogous compounds for the Nur77-RXRg complex, it is difficult to invoke this mechanism. Interestingly, treatment with the Nurr1-RXRg selective agonist HX600 suggests it can induce some LBD dissociation. Therefore, there may be some similarities between the regulation of Nurr1 and Nur77 by RXRg.

      We thank the reviewer for this thoughtful and balanced summary of our work. We appreciate the reviewer’s recognition of both our prior findings in the Nurr1-RXRα system and the interesting, but more complex, experimental behavior observed here for Nur77-RXRγ. We agree that the absence of Nur77-RXRγ-selective agonists currently limits how definitively the contribution of LBD dissociation can be resolved, and we have revised the manuscript to make this point more explicit and to further temper our conclusions accordingly.

      Weaknesses:

      Despite evidence supporting a partial role for RXRg LBD dissociation as a mechanism to activate Nur77, other data demonstrate that a fundamentally different regulatory mechanism likely exists in the Nur77-RXRg complex that involves the RXRg disordered NTD. The decision to describe further study of this as outside the scope of this work is unfortunate, as it closed off an avenue that could have provided fruitful data informing the apparently distinct regulatory mechanisms of the Nur77-RXRg complex. Given the uncertainty in the importance of the partial roles of the pharmacological mechanism, LBD dissociation, and the RXRg NTD, this study may have limited impact on the field.

      We thank the reviewer for this thoughtful point. We agree that the RXRγ NTD likely contributes to regulation of Nur77-RXRγ transcription, and that our truncation data suggest that regions outside the LBD can influence transcriptional output. At present, however, the effect of RXRγ NTD truncation is not sufficiently mechanistically resolved to distinguish among several plausible explanations.

      For example, the RXRγ NTD has been implicated in phase separation and biomolecular condensate formation in cells (PubMed ID 40392852, 40420113, 33971237, 31881311), and perturbing these properties (via RXRγ NTD truncation) could indirectly affect Nur77-RXRγ transcriptional activity. In addition, NTDs of nuclear receptors can participate in coactivator or corepressor interactions (PubMed ID 24284822), raising the possibility that removal of the RXRγ NTD alters transcription by changing recruitment of regulatory factors rather than by directly informing the LBD-centered mechanism examined here. We will clarify in the revised manuscript that these possibilities remain unresolved and represent important directions for future study.

      We also agree that defining how multiple RXRγ domains contribute to Nur77-RXRγ regulation would be valuable for the field. However, the focus of the present study is narrower: to test whether, as in our previous eLife study of Nurr1-RXRα, RXR ligands can influence heterodimer function through effects on LBD-LBD interactions. Because the available data do not yet allow a mechanistic dissection of the RXRγ NTD contribution, we believe that a definitive analysis of this question would require a separate set of experiments beyond the scope of the present work. We have revised the manuscript to better acknowledge this limitation and to frame the conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, this is a compelling body of work. Additional summary statements and clearer transitions would be helpful throughout.

      Here are some points that should be addressed or at least discussed by the authors:

      (1) It is unclear in the luciferase assays whether the truncated proteins are functional or not. Were there Western blots or other assays run to confirm protein concentrations?

      We thank the reviewer for this point. We did not perform Western blotting or other assays to confirm equivalent expression levels of the truncated RXRγ constructs, and we agree that this is a limitation of the luciferase assay data. As a result, the transcriptional effects observed with the truncation constructs should be interpreted cautiously.

      With that said, the increased transcriptional activity observed upon deletion of the RXRγ NTD/AF-1 region suggests that this region may exert a repressive effect on Nur77-RXRγ transcription. This effect could reflect multiple, non-mutually exclusive mechanisms, including altered phase separation or condensate-related properties of RXRγ, or altered recruitment of transcriptional coregulators through the NTD. Because our truncation strategy does not distinguish among these possibilities, we do not believe these data allow a definitive mechanistic interpretation of the NTD contribution.

      We have revised the manuscript to clarify this limitation. We also note that the primary focus of the present study is the role of ligands in modulating Nur77-RXRγ function through LBD-mediated interactions, in direct comparison with our previous Nurr1-RXRα study. A more complete mechanistic dissection of how RXRγ domain architecture influences Nur77-RXRγ transcription will require future work.

      (2) Why does the Nur77 construct lacking the NTD show increased luciferase activity?

      Please see our response above to Reviewer 2’s Public Review, which also addresses this point.

      (3) A case is made for the Nur77 LBD driving the activity, but it also could be inferred that the DBD is driving based on the data shown in Figure 1.

      We thank the reviewer for this point. We agree that the Nur77 DBD is required for binding to NBRE response elements, and we did not intend to suggest otherwise. The experimental approach in Figure 1 was not designed to dissect the relative contributions of Nur77 domains, since Nur77 was tested only in its full-length form. Instead, the purpose of this experiment was to examine how truncation of RXRγ domains affects Nur77-RXRγ transcriptional activity, in direct comparison with our prior eLife study of Nurr1-RXRα, where RXRα domain truncations helped define the importance of RXR-LBD-mediated regulation. We will revise the text to clarify that Figure 1 does not distinguish whether Nur77 DBD-dependent DNA binding is necessary, but instead addresses whether the pattern of RXRγ domain dependence is consistent with an LBD-centered mechanism of ligand-regulated heterodimer function.

      (4) It is stated that the HX600 coactivator recruitment requires further study. Why wasn't it studied here?

      We thank the reviewer for this point. The primary focus of this study was to determine how RXR ligands influence Nur77-RXRγ heterodimer activity, particularly in relation to ligand-dependent effects on heterodimer function. A more detailed analysis of HX600-dependent coactivator recruitment would require a broader mechanistic investigation of RXRα and RXRγ homodimer pharmacology and RXR-specific coregulator interactions, which extends beyond the central scope of the present manuscript. We agree that this is an important question and view it as a valuable direction for future work.

      (5) In Figure 3B, error bars aren't shown for the shifts in monomer populations. The biggest shift is from 0.2 to 0.6; is that statistically meaningful?

      We thank the reviewer for this point. The reviewer is correct that error bars were not shown for Figure 3B. These NMR measurements were performed once (n=1), and therefore the shifts in monomer populations shown in Figure 3B cannot be assessed statistically. Because these studies required substantial NMR instrument time and isotopically labeled protein at high concentration, we were not able to perform experimental replicates for this dataset. We have revised the figure legend to explicitly state that these data were collected from a single experiment and have tempered the corresponding language in the manuscript accordingly.

      (6) Some ligands are shown in the figures but don't appear to be discussed in the text (at least that I can find), such as SR11237.

      We thank the reviewer for pointing this out. We used a panel of 14 commercially available RXR ligands with different pharmacological properties to probe Nur77-RXRγ function, as in our previous Nurr1-RXRα study. In the text, we emphasized ligands that were most informative for the mechanistic conclusions, rather than discussing every compound individually. SR11237, for example, behaved similarly to the broader group of RXR agonists and was therefore shown as part of the full ligand panel but not specifically highlighted in the text. We will clarify this in the revised manuscript.

      (7) There is a sentence in the discussion that says "these observations implicate that although RXRg LBD provides the protein-protein interaction interface to bind Nur77...." the authors did not show enough data to support this claim. It should be bolstered.

      We thank the reviewer for this point. We agree that this statement was stronger than was warranted by the data presented. Our intent was not to claim that the present study definitively establishes the RXRγ LBD as the sole or fully defined protein-protein interaction interface for Nur77 binding. Rather, based on the domain truncation data together with our prior Nurr1-RXRα study, we intended this statement as a working interpretation consistent with an LBD-centered mechanism. In our revised manuscript, we have softened this language to avoid overstating the conclusion and clarified that the current data support, but do not definitively prove, a role for the RXRγ LBD in mediating functionally relevant interaction with Nur77.

      Reviewer #2 (Recommendations for the authors):

      Even though this study is not able to make definitive claims about the mechanism(s) of activation of Nur77 in the Nur77-RXRg complex, the work presented here is rigorous and solidly interpreted. Identifying differences between Nurr1 and Nur77 regulation is important, and the work here shows that selective agonists are essential for supporting the non-canonical mechanism they identified before. Although they address potential implications of NTD regulation in the discussion, it feels like a lot of insight into Nur77 regulation is being missed. However, it is clear that addressing this experimentally would require substantially more work. I don't have any specific recommendations. Given current limitations on funding, I think it's fine to focus on the work completed with the acceptance that it likely limits the impact of the work on the field.

      We thank the reviewer for this thoughtful and balanced assessment of our work. The goal of this manuscript was to test whether the LBD heterodimer dissociation mechanism that we previously reported for Nurr1-RXRα may represent a conserved feature of NR4A-RXR heterodimers by extending these studies to Nur77-RXRγ. We agree that understanding the role of the RXRγ NTD in Nur77-RXRγ regulation is important and potentially highly informative. At the same time, resolving that question experimentally would require a distinct and more extensive set of studies beyond the scope of the present work. We have therefore chosen to focus this manuscript on the completed LBD-centered studies, while acknowledging that this narrower scope may limit the broader impact of the work.

      Minor points:

      (1) Without page and line numbers, it is not easy to point out specific text. On the bottom of page 6 of the document, there are two references to Figure 3a, and the arrows that help illustrate RXRg LBD-dependent CSPs; the second figure callout should describe the blue arrow, I believe.

      Thank you, we made this change.

      (2) Bottom of page 8, "...revealed two compounds [that] standout..."

      Thank you, we made this change.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We truly appreciate all the effort that the reviewer put into reading and understanding our work. With a total of 37 excellent questions, this is one of the most thorough reviews that we have received in a long time.

      R1.0: Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      R1.1: Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      First, a few points of clarification.

      (i) We worked with two-photon calcium imaging data (mice), and with the envelope of multi-unit activity (monkeys). While both of these types of signals are strongly correlated with spikes, neither of them can be truly considered to be a point process.

      (ii) The reviewer points to Figure 2AB. The signals that we worked with can be negative. The black traces are the actual signals and show clear negative bouts, especially noticeable in the middle panel in Figure 2B. Of course, this does not mean that there are negative spike rates. This has to do with the way the data are normalized and not with the specific prediction method. However, the reviewer is correct in stating that the method that we used could also yield negative values even for non-negative spike rates.

      (iii) We did not bin the macaque data into 200-ms time bins, but rather 25-ms time bins (line 548, Figure 1B legend). Additionally, we have now performed additional analyses with different window sizes, showing that the conclusions still hold (see Supplemental Figure 4 and lines 139-143).

      To further address the reviewer’s question, we implemented a Poisson GLM enforcing non-negativity on macaque MUAe data (without spontaneous activity subtraction, ensuring strictly positive values; lines 135-139, Supplemental Figure 1M). The model did not improve predictions over ridge regression, confirming our methodological choice. This method is not directly applicable to mouse calcium data, since the activity after baseline subtraction can be negative.
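      The ridge-regression side of this comparison can be illustrated with a minimal sketch: predict one target unit's activity from a predictor population and score the fit by the fraction of variance explained on held-out samples. Everything in it is an assumption made for illustration rather than the authors' pipeline: the function names (`ridge_fit`, `predict`, `explained_variance`, `demo`), the synthetic data, the penalty value, the lack of an intercept (signals are assumed to be centered), and the definition of EV as 1 - SS_res/SS_tot. It uses only the standard library; in practice one would likely use NumPy or scikit-learn.

```python
import random


def ridge_fit(X, y, lam):
    """Return ridge weights w solving (X^T X + lam * I) w = X^T y.

    X is a list of samples (rows), each a list of predictor activities;
    y is the target unit's activity for the same samples.
    """
    n_features = len(X[0])
    # Normal-equation matrix with the ridge penalty added on the diagonal.
    A = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
          for k in range(n_features)] for j in range(n_features)]
    b = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n_features)]
    # Gaussian elimination with partial pivoting.
    for col in range(n_features):
        pivot = max(range(col, n_features), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_features):
            factor = A[r][col] / A[col][col]
            for k in range(col, n_features):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n_features
    for j in reversed(range(n_features)):
        s = sum(A[j][k] * w[k] for k in range(j + 1, n_features))
        w[j] = (b[j] - s) / A[j][j]
    return w


def predict(X, w):
    """Linear prediction without an intercept."""
    return [sum(xj * wj for xj, wj in zip(row, w)) for row in X]


def explained_variance(y_true, y_pred):
    """Fraction of variance explained, taken here as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def demo(seed=0):
    """Fit on synthetic 'predictor population' data, score on held-out samples."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(200):
        row = [rng.gauss(0, 1) for _ in range(3)]  # three hypothetical predictor units
        X.append(row)
        y.append(0.8 * row[0] - 0.5 * row[1] + rng.gauss(0, 0.3))  # target unit
    w = ridge_fit(X[:150], y[:150], lam=1.0)
    return explained_variance(y[150:], predict(X[150:], w))
```

      Note that nothing in this linear fit constrains predictions to be non-negative, which is the property the reviewer raises and the Poisson GLM addresses.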

      We did not use Granger or any other causality methods. The question of causality is certainly important, and there are multiple methods developed to assess causality in neural signals. We do not make any claims about causality in our study. A rigorous evaluation of causality is an interesting line of research for future work.

      R1.2: In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural ‘monkeys’), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data.

      We have now added data from 2 additional monkeys, including:

      (i) A second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights off condition (lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental figures 1-6, 8, 11, 12, and 13; Table 2).

      (ii) We collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Ponce lab (lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental figures 1-2, 4, 6, 9, 11, and 12; Table 2). The new data include responses to the same checkerboard and gray screen images as the original dataset, along with responses during lights-off conditions.

      R1.3: One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test.

      We have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test, applied to neurons with EV>0.4. The test confirmed significant bimodality in two of the three mice (MP031 and MP032: p<0.001; MP033: p=0.687). These results are now included in the Results section (lines 307-311) and shown in Supplemental Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings), the same test yielded non-significant results (e.g., p=0.994), confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.4: Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

      We also agree with this comment. Our goal is not to provide any direct quantitative comparison between the two species. We emphasize (lines 494-497) that the experiments in the two species differ along multiple dimensions, including: (i) differences in recording modalities (calcium vs. electrophysiology), (ii) associated differences in temporal resolution, neuronal types, and SNR, (iii) cortical targets (layers vs. areas), (iv) sample size, (v) stimuli, (vi) task conditions. In the revised manuscript, we also emphasized that the aim of this work is to investigate inter-areal interactions within each species rather than to draw quantitative comparisons between species (lines 497-499).

      Reviewer #1 (Recommendations for the authors):

      R1.5 In the analysis of directionality, you stated that subsampling was done randomly. Presumably, there could be multiple subsamples that fulfill the control of split-trial r. Are you only showing results from one subsample or multiple subsamples?

      We show the median from 10 subsample permutations. This is now clarified in line 621.
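      A minimal sketch of reporting a median over repeated random subsamples is given below. The function name `median_over_subsamples`, its parameters, and the seed are hypothetical, and the sketch omits the matching on split-trial r that the actual analysis applies when selecting subsamples.

```python
import random
import statistics


def median_over_subsamples(population, subsample_size, statistic,
                           n_subsamples=10, seed=0):
    """Median of `statistic` across random subsamples drawn without replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_subsamples):
        subsample = rng.sample(population, subsample_size)
        values.append(statistic(subsample))
    return statistics.median(values)
```

      Reporting the median rather than a single draw keeps the result from depending on one particular random subsample.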

      R1.6 About the measurement 1-vs-rest r2. Understanding the definition is important for interpreting the results, but the definition was not clearly written. In lines 195-196, could you be more clear about whether the correlation is between the predicted neuron and other neurons in the predicted population or between the predicted neuron and the mean activity of the predictor population? Also, in line 212, why do you call this self-consistency? Isn't this a correlation between a neuron and the others?

      The 1-vs-rest r<sup>2</sup> value, or self-consistency, is the correlation calculated for each neuron i and does not involve other neurons. Let $r_i^{(t)}$ indicate the response of neuron i during trial t (t = 1, ..., T, where T is the total number of trials). For a given trial t, we compute the average activity of the neuron excluding this trial:

      $$\bar{r}_i^{(\mathrm{rest})} = \frac{1}{T-1} \sum_{t' \neq t} r_i^{(t')}$$

      Throughout, the superscript (rest) means “all repetitions excluding repeat t”. The one-vs-rest correlation for the held-out repetition t is:

      $$\rho_i^{(t)} = \mathrm{corr}\left(r_i^{(t)},\ \bar{r}_i^{(\mathrm{rest})}\right)$$

      We then average these correlations across all held-out repetitions:

      $$\rho_i = \frac{1}{T} \sum_{t=1}^{T} \rho_i^{(t)}$$

      We now clarify this in the text (lines 304-306 and lines 642-647).
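      The one-vs-rest procedure described in this response can be sketched in a few lines. The sketch assumes that each repeat yields a vector of responses for a single neuron (indexed here by stimulus) and that the correlation is a Pearson correlation; the names `pearson` and `one_vs_rest_correlation` and the data layout are illustrative assumptions, not the authors' code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def one_vs_rest_correlation(responses):
    """Self-consistency of one neuron.

    responses[t][s] is the neuron's response on repeat t to stimulus s
    (assumed layout). Each repeat is held out in turn and correlated with
    the mean over the remaining repeats; the correlations are then averaged.
    """
    n_repeats = len(responses)
    n_stimuli = len(responses[0])
    correlations = []
    for t in range(n_repeats):
        rest_mean = [
            sum(responses[u][s] for u in range(n_repeats) if u != t) / (n_repeats - 1)
            for s in range(n_stimuli)
        ]
        correlations.append(pearson(responses[t], rest_mean))
    return sum(correlations) / n_repeats
```

      Because only the neuron's own repeats enter the calculation, the value measures trial-to-trial reliability rather than similarity to other neurons, which is why it is called self-consistency.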

      R1.7 In Figure 6 G and I. The "all" condition contains more neurons than either of the other two. In this case, is this comparison fair or meaningful?

      The reviewer is also correct here. The comparisons between the <10% and >80% groups contain the same number of predictor neurons, and those are fair comparisons. The “all” condition contains more predictor neurons, and, therefore, those comparisons are not fair. We clarified this point in lines 360-364.

      We included the “all” condition here because we think that it is an instructive sanity check in terms of reporting how EV changes with more neurons, and also in terms of understanding why the EV values in the other two conditions are lower. Expanding on this point with a little bit of philosophy, ultimately, when considering a neuron in area B (e.g., V4) and the contributions from neurons in another area A (e.g., V1), one would like to have access to all the inputs (e.g., all the neurons in V1 that are monosynaptically connected to the target neuron in area V4). We do not have access to this type of information, and we do not make any claims about monosynaptic connectivity, let alone exhaustive sampling of inputs to a given neuron. The “all” condition merely provides a quantitative illustration of the fact that EV increases with the number of predictor neurons. This observation may be considered to be somewhat trivial, but it should be pointed out that the conclusion relies on the input neurons sharing information with the target neurons (e.g., perhaps one may not be able to predict V4 activity very well from the responses of millions of neurons in the cerebellum).

      R1.8 I believe the results section can be improved by adding some interpretation after each finding.

      We thank the reviewer for the suggestion. We generally like to separate results from interpretation. However, to honor the suggestion, we added brief interpretations throughout the results section (lines 142-143, 171-173, 272-273, 279-281, 331-333, and 361-364) and expanded on the interpretations in the Discussion section.

      R1.9 Line 52 - 74: It would be better to be more specific about what kind of neuronal interactions, e.g., noise correlation, synchrony, etc.

      We added a clarification on the types of interactions we study in lines 68-73.

      R1.10 Line 81. Something seems to be missing after "5500". 5500 trials? Neurons?

      We thank the reviewer for pointing this out. The number refers to neurons (fixed in line 87).

      R1.11 Line 94. The readers would appreciate more explanation of the method.

      We have expanded on the explanation, as suggested (lines 106-107).

      R1.12 Line 104. The fraction of visually responsive neurons seems to be small. Is this typical for mouse V1? Would this fraction be higher if you also used the peak, as you did for macaque data in your SNR calculation (line 412)? And what is this number for the recorded L4?

      The reviewer correctly points out the small number of visually responsive neurons.

      We note that we now refer to the subset of neurons used for prediction analyses as visually reliable (VR) neurons (lines 115-116, 125-126, 178-179, 183-184, 211-212, 214-216, 217-226, 283-286), defined conservatively as neurons with SNR > 2 computed from the mean across all stimuli (not the peak to any one stimulus) and split-half reliability >0.8 (Methods, lines 569–590). This choice emphasizes neurons that are consistently informative over the full stimulus set.

      Regarding the question of how typical the number of responsive neurons in mice is, the fraction of “responsive” neurons in mouse V1 varies widely depending on the definition and stimulus set, but the fractions are substantially lower than those reported in monkeys (with different methods). For those of us more used to the macaque neurophysiology literature, this has been one of the biggest surprises coming from work in rodents. Many studies report a sizable group of non-responsive neurons in mouse V1 (e.g., as few as 37% of V1 neurons being responsive in at least 25% of the trials according to de Vries et al., Nat Neurosci, 2020). Our fraction of visually responsive neurons is small because it couples a conservative SNR metric with a high trial-reliability threshold.

      As the reviewer notes, a peak-based metric based on any stimulus would be a less conservative criterion that would increase the fraction of neurons labeled responsive.

      R1.13 Line 113. Why not also give an exact percentage number?

      We have given the exact percentage number (lines 125-126).

      R1.14 Line 128. Is this just because L2/3 has more neurons? If so, then isn't this trivial?

      Our intention was to illustrate the best prediction performance we could get in either direction, which means including all L2/3 neurons. We have reworded our text to clarify (lines 149-151).

      R1.15 Line 134. Isn't this expected, since V1 has more units than V4?

      The reviewer is correct. As discussed in R1.7 in mice, we sought to report the best prediction performances in either direction. We have edited our text for clarity (lines 149-151).

      R1.16 Line 165-168. What's the logical connection between these two sentences? If the former is true, we should expect to see differences. Also, why the same population? Shouldn't you include non-visual neurons?

      The two sentences in question are: “The difference in predictability in the absence of a stimulus could in principle change according to the directionality in inter-laminar interactions.” and, “There was no statistically significant difference in the EV fraction between laminar directions (L4→L2/3 vs. L2/3→L4) using the same control population as in Figure 3B (Figure 5A-C and Figure Supplement 2H).”. The key point here was to control for similar reliability values in order to make fair comparisons. We have added an additional comparison between directionalities focusing on nonvisual neurons (SNR<2 & r<0.8), and have also found no statistically significant difference between direction of predictability (Supplemental Figure 3A, right, lines 221-224).

      R1.17 Table 2. The information of which session corresponds to which experiment can be put in the table, which would be easier to read.

      We have added which sessions correspond to which experiments in Table 2.

      R1.18 Figure 1, Captions for panel c and d. I don't see any colored arrows in the figure.

      We removed the color descriptions (Figure 1C-D).

      R1.19 Figures 3, 4, and others. The annotations of "n.s." are very hard to see.

      We changed the color so that it is easier to see now (Figures 3, 4, 6, and Supplementary Figures 1-4, 6, and 8-10).

      R1.20 Figure 5, panel A. The legend is too small.

      We increased the legend size (Figure 5A).

      R1.21 Figure S5, panel D. Why are some of the data points connected?

      The paired connections are illustrated specifically in the highly predictable neurons to highlight the two separate distributions of neurons. One group, the highly predictable and highly reliable group, maintains its inter-laminar predictability after projecting out the “non-visual” activity (lines 327-330), whereas the highly predictable yet unreliable group shows a sharp decrease in inter-areal predictability, which corroborates the idea of non-visual components influencing neurons in mouse V1, as shown by Stringer et al. 2019b and consistent with our results.

      R1.22 l.91 "Ope" -> open?

      We fixed the typo (line 100).

      R1.23 Fig. 3C+D: Why is only one session used for this?

      One session was used to illustrate the distribution of split-half reliability values per area. Figure 3D contains information about all 5 stimulus sessions (see legend to Figure 3D).

      R1.24 "Even without controlling for the number of predictors or their respective split-half correlation values (627-688 sites in V1, 86-115 sites in V4), we found better predictability in the V1 to V4 direction than the reverse ( 𝑝 < 0.001, Figure Supplement 2I)." -> What does "even" mean here? Isn't this simply the null result if there is no true difference and the real reason the authors controlled for size?

      The reviewer’s understanding is correct. We have edited our text for clarity (lines 157-160).

      R1.25 "We could predict V1 and V4 activity across all stimulus types ( 𝑝 < 0.001, paired permutation test of prediction vs. shuffled frames prediction)." -> better than chance? For all neurons on average? What does this mean? Isn't it trivial and 100% expected that neural activity in the visual cortex is above chance related to the visual input?

      We stated that sites in V1 and V4 could predict each other across all stimulus types before describing the differences between them. We agree that this observation is to be expected and indicated so now in the text (lines 185-186).

      R1.26 "The predictability was the highest in both directions for neuronal activity in response to a full field checkerboard images (Figure 4D). In the V1 → V4 direction, the EV fraction was higher when predicting a slow-moving small thin bar compared to a fast-moving large thick bar (Figure 4D, left), whereas the opposite was true for the V4 → V1 direction (Figure 4D, right)." -> What does this mean? Is this expected or not? Under what theories of cortical processing?

      The differences between EV prediction directions (V1→V4: slow thin bars > fast thick bars; V4→V1: fast thick bars > slow thin bars) could be because V4 responses are more reliable for the slow thin bars whereas V1 responses are more reliable for the fast thick bars (Supplemental Figure 5H–I). To account for this possibility, we controlled for differences in target-related properties by regressing out covariates like SNR, split-half correlation, and variance. In monkey L, after regressing out reliability/drive within each direction using these covariates, the V4→V1 difference between slow thin bars and fast thick bars was not significant, and the difference in the V1→V4 direction was reduced (Supplemental Figure 5K, lines 198-203). This suggests that the asymmetry primarily reflects stimulus‑dependent reliability of the target population rather than a strong directional selectivity.

      To the best of our knowledge, there are no clear predictions that match these observations from existing theories of visual cortical processing, especially given the paucity of computational models that include stimulus velocity when describing the responses in area V4. There has been extensive work on theories of surround suppression, but it seems unlikely that the thick bars would elicit surround suppression given the size of the V4 receptive fields. Many current computational models that aim to fit the responses of neurons in the visual cortex use neural networks that take an image as visual input and yield activations. Most of these models do not incorporate stimulus movement, and even those that do incorporate stimulus dynamics map only indirectly onto interlaminar or between-area stimulus transformations. We hope that the results in this manuscript will help inspire and constrain better models of visual cortical processing.

      R1.27 Shouldn't all the predictability analysis be done conditioned on the stimulus in order to tell us more than the trivial "both V1 and V3, or L2/3 and L4, are driven by visual inputs"? (The spontaneous activity analyses are essentially that, for a small subset of the stimuli.)

      The key goal of this study is to quantify inter-areal interactions both under visual input and without visual input. This type of analysis is important because inter-areal interactions may depend both on visual inputs but also on neuronal inputs that are not triggered by visual signals. For example, extensive work in mice has now shown that neuronal responses in V1 depend on an animal’s running speed, independently of any visual input. Even within the visual input conditions, we present analyses where we shuffle trial order (e.g., Figure 7, Supplementary Figure 11) to estimate the contribution of trial-by-trial variations that are independent of visual inputs and other analyses where we project out non-visual activity (e.g., Supplementary Figure 7).

      R1.28 "In visually responsive neurons, there was a significant reduction in EV during gray screen compared to visual stimulus presentation" -> perfectly expected. But the report-worthy result here is how much is left, not whether EV is decreased!

      We have changed the wording on the results to highlight the sustained predictability (lines 211-212). It is important to note that, although the reduction in EV during gray screen may be expected, this observation does not hold for all neurons. In fact, there are some neurons for which the EV during visual presentation is comparable to that during gray screen (Figure 5B,C,E: neurons that lie on the diagonal line).

      R1.29 "Similar to the conclusions drawn from the mouse data, the predictability of neuronal activity was higher in response to stimulus presentation than to gray screen presentations" -> Really? Conditioned on stimulus, or explainable by the well-known fact that both V1 and V4 are visually driven?

      As discussed in R1.28, in mice, there are many neurons where the EV during gray screen is comparable to that during stimulus presentation. In monkeys, most sites were visually driven. As the reviewer points out, we expected that EV during stimulus presentation would be higher than during gray screen; this observation is a reasonable sanity check. The difference between unshuffled trials and shuffled trials (Figure 7, Supplementary Figure 11) provides an estimate of the interactions that are not purely explained by visual inputs alone in monkeys.

      R1.30 "Unlike the mouse, macaque correlation of visual predictability between stimulus presentation and spontaneous activity was high across all types of spontaneous conditions" -> Why? Is this simply explainable by a lower mean response in the spontaneous condition in the mouse? Are these mouse and monkey experiments truly comparable? Isn't it surprising that spontaneous activity in the monkey visual cortex compared to evoked activity is higher than in the mouse?

      With respect to the question of whether spontaneous activity (or stimulus-evoked activity) in monkeys is higher than in the mouse, it is difficult to make these comparisons. We emphasize in the text the multiple differences between the experiments in both species. Our goal is not to perform any quantitative comparison across species (see R1.4). We changed the wording to remove any inference of comparison between species (lines 248-250).

      R1.31 Occasionally imprecise presentation. Ex "To further examine the non-stimulus driven component, we reasoned that if the shared information between areas were strictly driven by the visual stimulus, then using the activity of a stimulus presentation repeat to one specific image could be used to predict the responses to any other stimulus repeat of the same image. On the other hand, if the shared activity does not have any stimulus-response information, then the prediction model would not work when considering responses across repeated presentations of identical stimuli in different trials. To test these two opposing ideas, we compared the inter-areal prediction EV fractions using unshuffled versus shuffled trials." -> Sets up two extreme strawmen (100% driven by stimulus vs 0% driven by stimulus). What does "model would not work" mean? EV=0? Hypotheses not ideas.

      Our intent was to set up two extreme hypotheses, not to claim that neurons must fall exclusively into one or the other. The two extremes help better interpret the results.

      The reviewer indicates that these are straw-man hypotheses. This may well be the case. But note the responses to R1.12, R1.27, R1.28, and R1.29. The reviewer seems to assume that all or most neurons in the visual cortex should be mostly or exclusively driven by visual stimuli.

      We also replaced “ideas” with “hypotheses”, as suggested. We have expanded the discussion of these points in the manuscript (lines 480-493). Many neurons occupy intermediate positions between these two extreme hypotheses. We clarified that “model would not work” refers to prediction accuracy approaching chance (EV ≈ 0).

      R1.32 "In both species and in both directions, inter-areal prediction EV fraction persisted (𝑝 < 0.001," Doesn't persist mean EV is unchanged? But the test is EV>0 or not in both cases.

      We meant that EV values remained significantly above chance, not that they were unchanged. The statistical test was indeed whether EV > 0 as the reviewer indicated. We have revised the text accordingly (lines 375-380).

      R1.33 "In mice, neurons showed a bimodal distribution in terms of their response predictability in shuffled and unshuffled trials" -> I don't see any bimodality in the figure, nor is there a statistical test provided for bimodality.

      In Figure 7C, one group of neurons lies essentially along the horizontal axis, whereas the other group is dispersed closer to the diagonal line. Specifically, the neurons that lie on the horizontal axis are also the ones whose responses are best predicted during gray screen activity. We have changed the text to clarify this point (lines 380-382).

      R1.34 "In the macaque V4 → V1 direction, there was a large proportion of neurons with peak EV when considering 25 ms to 50 ms offsets in the positive direction (i.e., V4 after V1, Figure 7I, right)." -> So what does this mean? Is this compatible with anything we know? This is the anti-causal direction so some kind of explanation would be warranted.

      In the V4→V1 panel, a positive offset means we use V4 at t+Δt to predict V1 at t (and conversely in the V1→V4 panel). Therefore, the fact that the peak EV occurs at +10–20 ms indicates that V1 leads V4 by ~10–20 ms: in other words, V1’s earlier response best predicts V4’s slightly later response. This observation is not anti-causal; rather, it is consistent with the canonical, largely feed-forward V1→V4 latency (e.g., Schmolesky et al., 1998 among many others). We clarified this in the text (lines 400-404).
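
      The sign convention can be illustrated with a toy sketch. The code below is not the analysis used in the manuscript: it replaces the regression model with a plain Pearson correlation, and the names `pearson` and `offset_sweep` as well as the simulated traces are hypothetical. It builds a V4 trace that copies V1 two samples later, then correlates V4 at t + k with V1 at t. Under these assumptions the best offset is positive, which corresponds to V1 leading V4.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def offset_sweep(predictor, target, max_offset):
    """Correlate predictor[t + k] with target[t] for k in -max_offset..max_offset."""
    n = len(target)
    scores = {}
    for k in range(-max_offset, max_offset + 1):
        # Keep only time points where the shifted index is valid.
        pairs = [(predictor[t + k], target[t]) for t in range(n) if 0 <= t + k < n]
        xs, ys = zip(*pairs)
        scores[k] = pearson(xs, ys)
    return scores


# Toy data (assumption): V4 repeats V1 with a delay of 2 samples.
rng = random.Random(0)
v1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
v4 = [0.0, 0.0] + v1[:-2]

# "V4 -> V1 panel": V4 is the predictor, V1 is the target.
scores = offset_sweep(v4, v1, max_offset=5)
best_offset = max(scores, key=scores.get)
print(best_offset)
```

      A positive best offset in this panel therefore reflects the later area following the earlier one, not the later area driving it.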

      R1.35 L. 307: "In monkeys," plural!?

      While this was not correct in the original version, we have now added data from two more monkeys.

      R1.36 L. 313: "we observed an approximately bimodal distribution of neuronal responses, with a large subset of neurons that do not show reliable responses to visual stimuli both in L4 and L2/3" -> where?

      The bimodal distribution can be appreciated in Figure 6B (1-vs-rest r^2, third panel, note neurons along the y-axis, see also R1.33) and Supplementary Figure 7B (lines 307-312). Additionally, as stated in R1.3, we have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test (lines 310-313); see also Supplementary Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings), the same test yielded non-significant results, confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.37 Random subsampling to control for population size done with how many subsamples? How are they combined? Variability across subsamples interpreted how?

      We performed 10 permutations and used the median distributions across permutations (line 621).
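
      As a rough illustration of this control, the sketch below draws ten random predictor subsets of a fixed size, scores each one, and keeps the median. It is a minimal sketch under stated assumptions, not the code used for the manuscript: `median_over_subsamples`, `score_fn`, and the toy scoring function are hypothetical names, and `score_fn` stands in for whatever prediction metric is computed for a given target site.

```python
import random
import statistics


def median_over_subsamples(predictor_ids, n_keep, score_fn, n_draws=10, seed=0):
    """Score n_draws random predictor subsets of size n_keep; return the median score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_draws):
        subset = rng.sample(predictor_ids, n_keep)  # sample without replacement
        scores.append(score_fn(subset))
    return statistics.median(scores)


# Toy usage (assumption): 600 predictor sites, matched down to 100 per draw.
# The placeholder score is the mean site index; a real score would be a
# prediction metric such as explained variance.
sites = list(range(600))
toy_score = lambda subset: sum(subset) / len(subset)
print(median_over_subsamples(sites, 100, toy_score))
```

      Taking the median rather than the mean keeps a single unusual draw from dominating the summary.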

      Reviewer #2 (Public Review):

      R2.0: “Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.”

      R2.1: To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, the weaknesses of calcium dynamics (which has lower temporal resolution and misses some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains), underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      We agree with these points, and we discuss these specific limitations in capturing the variability of individual neurons in the Discussion section (lines 500-504). We have now also added analyses based on local field potentials (LFP). LFPs do not directly reflect the activity of individual neurons either.

      R2.2: From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      We agree that these datasets do not lend themselves well to directly separating and quantifying all the different sources of the predictive signals. We expand on this point in the Discussion section (lines 509-511).

      R2.3: The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons.

      We also agree with this comment. We emphasize that our goal is not to attempt a direct quantitative comparison across species (lines 497-499).

      R2.4: Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contribution of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could be driving or a reflection of much of the predictability observed and explain differences in spontaneous and visual activity predictions.

      We have expanded on the Discussion section to explicitly state the points raised by the reviewer (lines 494-509).

      In mice, we have now also analyzed a separate dataset in which behavioral measurements were available, including running speed and facial motion (FaceMap SVDs). We used these to build behavioral-only and combined models to predict neural activity. We found that behavioral variables explained a modest but consistent portion of the variance across both spontaneous and stimulus conditions (Supplementary Figure 10A,C, lines 268-273).

      For the macaque data, we analyzed pupil size, the only available behavioral measure in that dataset. We focused specifically on the “resting state, eyes open” condition, where both neural activity and pupil measurements were available. Using ridge regression, we assessed the extent to which pupil size predicted neural activity in V1 and V4. Pupil size alone explained only a small fraction of the variance (Supplementary Figure 10E, lines 274-276).
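
      For readers unfamiliar with this type of model, a single-predictor ridge regression and an explained-variance score can be sketched as follows. This is an illustrative sketch only: the function names, the penalty value `alpha`, and the toy numbers are assumptions, and explained variance is taken here as 1 - SS_res / SS_tot, which may differ from the EV fraction defined in the manuscript.

```python
def ridge_fit_1d(x, y, alpha=1.0):
    """Ridge regression with one centered predictor; the intercept is not penalized."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    intercept = mean_y - slope * mean_x
    return slope, intercept


def explained_variance(y_true, y_pred):
    """1 - SS_res / SS_tot (an assumed definition, see the note above)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy usage (assumption): "pupil" is a behavioral trace, "activity" a neural trace.
pupil = [0.0, 1.0, 2.0, 3.0]
activity = [1.0, 3.0, 5.0, 7.0]
slope, intercept = ridge_fit_1d(pupil, activity, alpha=5.0)
predicted = [intercept + slope * p for p in pupil]
print(explained_variance(activity, predicted))
```

      In practice the score would be computed on held-out samples rather than on the data used for fitting, as in this toy example.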

      R2.5: Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales.

      Without going into specifics about the numbers, it is hard to fully address this question. As the reviewer noted in R2.1, the mouse data analyzed here do not lend themselves to evaluating predictability on scales of tens of milliseconds. In the macaque data, we have now conducted additional analyses where we binned the activity across a range of bin sizes (10 ms to 200 ms). The new analyses are shown in Supplementary Figure 4, and described in lines 140-143, 160-163.

      R2.6: The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7–13).

      Second, we note that our mouse preprocessing standardized responses by spontaneous mean and SD per neuron, controlling baseline scale across conditions (lines 535-538). Because of this standardization, spontaneous traces have unit scale (mean = 0, SD = 1).

      To test whether differences in variance underlie our findings, we calculated the variance for both species. For mice, we computed variance across repeats (visual) and across timepoints (lines 286-291). For the macaque moving-bar sessions, we computed variance across the concatenated held-out samples pooling timepoints, repeats, and bar identities (lines 291-292).

      The V4 population showed a higher overall variance distribution compared to the V1 population (Supplementary Figure 2I-J), and L2/3 variance was also overall higher than L4 (Supplementary Figure 2D-E). We also see a modest monotonic relationship between EV fraction and this variance (mouse visual: Spearman ρ = 0.43–0.52, p < 0.001; macaque stimulus responses: ρ = 0.50–0.56, p < 0.001; macaque gray-screen responses: ρ = 0.38, p < 0.001, Figure 6A,D), indicating variance contributes to (but is not the primary driver of) EV prediction fraction. We then adjusted for variance by fitting, within each stimulus condition, a linear regression of EV on variance (excluding shuffled-control rows) and conducted all comparisons on the resulting residual EV values, thereby isolating effects not attributable to variance (see Supplementary Figure 3E-G, lines 165-171).
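
      The variance adjustment described above amounts to replacing each EV value with its residual from a linear fit of EV on variance. The sketch below shows that step for a single stimulus condition; it is a minimal illustration with a hypothetical function name and toy values, not the analysis code behind the reported figures.

```python
def residualize(ev, variance):
    """Residuals of an ordinary least-squares fit ev ~ intercept + slope * variance."""
    n = len(ev)
    mean_x = sum(variance) / n
    mean_y = sum(ev) / n
    sxx = sum((x - mean_x) ** 2 for x in variance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(variance, ev))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What remains of EV once the linear effect of variance is removed.
    return [y - (intercept + slope * x) for x, y in zip(variance, ev)]


# Toy usage (assumption): four neurons from one stimulus condition.
ev_values = [0.10, 0.40, 0.20, 0.50]
variances = [1.0, 2.0, 3.0, 4.0]
residual_ev = residualize(ev_values, variances)
print(residual_ev)
```

      Comparisons between directions or stimulus types would then be run on the residual values, so that any remaining difference is not attributable to a linear dependence on variance.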

      Reviewer #2 (Recommendations for the authors):

      R2.7 Overall I found this manuscript to be very clearly written and the results compelling, although I found myself wanting a little more. I believe these datasets also include information about eye movements, pupil diameter, and maybe locomotion and whisking in the rodent work. I think it could be informative to ask the degree to which the predictability, particularly during the spontaneous activity, is attributable to these other known sources of variance in trial-by-trial measures. My concern is that during visual stimulation, the space of cortical responses is limited to a very narrow scope (observing a visual stimulus during fixation) whereas spontaneous activity includes a broader range of possibilities (different states of arousal, eye movement).

      We analyzed the role of behavioral variables that could explain the neural activity in mouse V1 (including the variables suggested by the reviewer: running speed and FaceMap SVDs). The authors of the open dataset warned against using pupil size because the measurements were not accurate in the dark. In terms of the contribution to the predictability of mouse V1 activity, these behavioral variables showed a weak yet significant contribution (Supplementary Figure 10A,C, lines 260-270).

      R2.8 By controlling for eye movements or pupil diameter during spontaneous measurements, would you improve your measure of predictability?

      When predicting neural activity in the lights-off eyes open condition, combining neural data of the predictor population with information of pupil size did not result in a statistically significant increase in EV fraction when predicting the target population (Supplementary Figure 10E, lines 276-278).

      R2.9 Also, there is work that shows feed-forward correlations between V1 and higher visual areas are observed in higher frequency activity, whereas feedback is associated with lower frequency activity. If you compared your predictability measure over bandpasses with different timescales, would you find the direction of V1-V4 interactions changes consistent with this previous work?

      To address this question, we extended our analyses to the local field potential signals (LFPs) in monkeys, using band-limited LFP power (2–12, 12–30, 30–45, 55–95 Hz). We reran the lag sweep analyses (10-ms steps; 200-ms windows slid every 10 ms) in both directions. The gamma band showed a feed-forward signature in the early evoked period: the V1→V4 predictability peaked at negative offsets (∼10–30 ms; V1 leads), and the V4→V1 predictability peaked at positive offsets, consistent with previous findings. The results for low and beta frequency bands are also presented in the text (Supplemental Figure 13, lines 412-423).

      Reviewer #3 (Public review):

      R3.0: Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.

      R3.1: "A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4."

      We have now added the second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights-off condition. In addition, we collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Carlos Ponce lab (monkey A: see lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental Figures 1-6, 8, 11, 12, and 13; monkey D: see lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental Figures 1-2, 4, 6, 9, 11, and 12). The conclusions for the new monkeys are qualitatively similar to the ones reported previously. The main quantitative differences are due to the very large difference in the number of predictor sites (Table 2, lines 127-134).

      R3.2: The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7-13).

      In Figure 4, differences in target-population responsiveness could influence predictability across stimulus types, as the reviewer points out. We therefore controlled for this by modeling EV as a function of the following neuron properties: split-half r, SNR, one-vs-rest r^2, and response variance. Regression was performed within each direction, and we then used the residuals for inference. When comparing residuals, the predictability of checkerboard responses remained statistically higher than the predictability of the responses to moving bars (p<0.001, permutation test, Supplementary Figure 5K, lines 196-203), suggesting that the differences in predictability cannot be exclusively attributed to differences in the target population neuronal properties.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

      We appreciate the positive reception to our manuscript. In the revised manuscript, we have further clarified the logic underlying the task design, adopted a more cautious tone in interpreting the behavioral and neuroimaging results, and enhanced the transparency of reporting and presentation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

      We thank the reviewer for these helpful comments. In the revised manuscript, we have provided additional clarification regarding the SI and RI conditions (page 29), expanded the justification for the behavioral trial exclusion criteria (page 32), and clarified how the null condition was modeled and incorporated into the analyses (page 29). In addition, we have revised the description of the behavioral results to adopt more cautious wording, particularly given the absence of a significant interaction effect. For detailed responses to these specific points, please refer to the "Recommendations for the Authors" section below.

      Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not well reasoned for a non-expert reader to follow.

      We thank the reviewer for raising these important points. In the revised manuscript, we have further clarified the rationale of the experimental design and the motivation for introducing cognitive conflict.

      First, we clarified that previous neuroimaging studies relied primarily on SOA-based contrasts, which capture the temporal dynamics of attentional orienting but do not directly distinguish the functional processes of integration and segregation. We therefore established the direct comparison between cued and uncued targets in the long SOA as the critical test required by the theory, as these conditions are hypothesized to engage integration and segregation processes, respectively (pages 6-7, “The Challenge of Neural Verification”). Crucially, to successfully implement this comparison, we highlighted the specific methodological advantage of our study: the use of a Genetic Algorithm (GA) to optimize the stimulus sequence. We explained how this design maximizes statistical power specifically for contrast detection (i.e., cued vs. uncued) while maintaining high estimation efficiency, thereby directly overcoming the power constraints that had likely obscured these subtle neural signatures in prior ER-fMRI work (pages 7-8).

      Second, we clarified that the manipulation of cognitive conflict was introduced with the additional aim of examining IOR expression mechanisms, specifically investigating how spatial attention modulates ongoing cognitive processing after target onset, rather than the generation of IOR itself. We have now provided a clearer rationale for embedding a modified Stroop task within the cue-target paradigm, and explained how this design allows us to dissociate semantic and response conflicts while avoiding methodological confounds present in previous studies (page 8).

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

      We thank the reviewer for pointing out the interpretational challenges in Figure 4. To address this, we have revised Figure 4 and provided a clearer and more precise interpretation of these interaction effects in the manuscript.

      First, we have added explicit panel titles to Figure 4 (page 17). Panel A is now clearly labeled as the “Effect of IOR on Semantic Conflict”, while Panel B is labeled as the “Effect of IOR on Response Conflict”. We hope this visual labeling helps readers clearly identify the IOR modulation effects specific to each conflict type.

      Second, we have revised the figure caption to explicitly define the interaction contrasts used to quantify these modulations, providing specific formulas (e.g., [Uncued-RI – Uncued-SI] > [Cued-RI – Cued-SI] for response conflict) to ensure transparency.

      Finally, regarding the reviewer’s comment on “deactivation”, we realized that our original figure terminology (e.g., “IOR effect under...”) might have caused confusion by mixing the interaction effect with the IOR effect itself. We have clarified that Figure 4 specifically illustrates the “Effect of IOR on the Semantic Conflict and the Response Conflict” (i.e., interaction effect between IOR and cognitive conflict). To interpret this interaction, we further examined the simple effects of conflict under each cueing condition. Specifically, we analyzed the neural signatures of semantic conflict (SI minus NE) and response conflict (RI minus SI) separately for the cued and uncued targets. Importantly, regarding the nature of the IOR effect itself (as displayed in Figure 3, page 14), it is not simply a uniform deactivation. Instead, by directly comparing the cued and uncued conditions for the neutral words, we observed neural changes in two directions: some specific regions exhibited an increased activation (Cued > Uncued), while others showed a reduced activation (Uncued > Cued). These differential patterns involved distinct brain networks and corresponded to the distinct integration and segregation mechanisms, respectively, rather than a global loss of activation (pages 20-21).
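
The interaction contrasts described above can be written as weights over the six Congruency × Cue Validity conditions. The following is a minimal sketch, not the authors' analysis code: the condition ordering and the helper name `interaction_weights` are assumptions made for illustration, and the semantic-conflict contrast is written by analogy with the response-conflict formula quoted in the caption (its direction is assumed here).

```python
# Condition order is an assumption made for this sketch.
CONDITIONS = ["Cued-NE", "Cued-SI", "Cued-RI", "Uncued-NE", "Uncued-SI", "Uncued-RI"]


def interaction_weights(high, low):
    """Weights for the contrast [Uncued-high - Uncued-low] > [Cued-high - Cued-low]."""
    weights = dict.fromkeys(CONDITIONS, 0)
    weights[f"Uncued-{high}"] += 1
    weights[f"Uncued-{low}"] -= 1
    weights[f"Cued-{high}"] -= 1
    weights[f"Cued-{low}"] += 1
    return [weights[name] for name in CONDITIONS]


# Response conflict is RI minus SI; semantic conflict is SI minus NE.
response_conflict = interaction_weights("RI", "SI")
semantic_conflict = interaction_weights("SI", "NE")  # direction assumed by analogy

for name, w_resp, w_sem in zip(CONDITIONS, response_conflict, semantic_conflict):
    print(f"{name:10s} response={w_resp:+d} semantic={w_sem:+d}")
```

Each weight vector sums to zero, so the contrast tests only whether the conflict effect differs between the cued and uncued targets and is insensitive to the overall activation level.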

      Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single versus separate functional clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics, especially regarding the global-variance threshold, would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

      We thank the reviewer for the positive and constructive assessment of our study. In response to the concerns raised, we have carefully revised the manuscript and addressed all points in detail below. In brief, we have clarified key interpretation issues in the Discussion section, including the complementary roles of dACC activation and the distinction between statistical clustering and functional interpretation of PHG and STG activations (pages 20-21). We have also improved transparency and reporting throughout the manuscript by providing more detailed sample characteristics, clarifying exclusion criteria and global variance computation, adding illustrative supplementary figures, and standardizing statistical reporting and abbreviations (pages 28, 33). Finally, we have added a concise paragraph on limitations of the study to provide a more balanced interpretation of the findings (pages 26-27). Detailed, point-by-point responses to all specific comments are provided below (see the “Recommendations for the authors” Section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) The figure caption contains an unclear sentence (lines 195-196): "The target was a 450-ms colored Chinese character presented 600 ms after the fixation cue onset at the two target locations with equal probabilities." This description is ambiguous and should be revised for clarity.

      Thanks for pointing this out. In the revised manuscript, we have rephrased the figure caption to improve clarity as follows (pages 9-10):

      “Each trial started with a 150-ms non-informative cue presented at one of the two peripheral boxes. After a 150-ms interstimulus interval (ISI), a 150-ms fixation cue was presented at the central fixation box. Following a further 450-ms ISI, the target, a colored Chinese character, appeared at one of the two target locations with equal probabilities and remained on the screen for 450 ms. The trial ended with a variable intertrial interval (ITI) of 850, 1050, 1250, or 1450 ms (with equal probabilities).”
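
The timing in the revised caption can be checked with a few lines of arithmetic. This is a minimal sketch: the event labels and the helper `onsets` are hypothetical names, while the durations are taken directly from the caption.

```python
# Event durations (ms) in presentation order, as stated in the revised caption.
TRIAL_EVENTS = [
    ("peripheral cue", 150),
    ("ISI 1", 150),
    ("fixation cue", 150),
    ("ISI 2", 450),
    ("target", 450),
]
ITIS_MS = [850, 1050, 1250, 1450]  # equally probable intertrial intervals


def onsets(events):
    """Return each event's onset time (ms from trial start) and the summed duration."""
    t = 0
    table = {}
    for name, duration in events:
        table[name] = t
        t += duration
    return table, t


event_onsets, fixed_duration = onsets(TRIAL_EVENTS)
cue_target_soa = event_onsets["target"] - event_onsets["peripheral cue"]
fixation_to_target = event_onsets["target"] - event_onsets["fixation cue"]
trial_durations = [fixed_duration + iti for iti in ITIS_MS]

print(cue_target_soa, fixation_to_target, trial_durations)
```

Under these durations the target appears 600 ms after the fixation cue onset, which matches the figure given in the original caption, and 900 ms after the onset of the peripheral cue.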

      (2) Please provide a more detailed and clearer description of the SI and RI experimental conditions in the Methods section.

      Thanks for this helpful suggestion. We have revised the Methods section to provide a more detailed description of the SI and RI conditions. Specifically, we have further described the stimulus-response mapping and clarified how the SI and RI conditions are defined based on whether the ink color and the character meaning fall into the same or different response categories under this mapping. In addition, we have added a clarification in the Methods section to make it clearer that SI trials involve semantic conflict without response conflict, whereas RI trials involve both semantic and response conflicts (page 29).

      (3) As the data were collected across two research centers, please clarify the number of participants enrolled at each site.

      Thanks for this suggestion. We have now explicitly stated in the Apparatus and Data Acquisition section that 16 participants were enrolled at each site. The revised text reads (page 31):

      “The imaging data were acquired at two research sites following comparable protocols, with equal numbers of participants scanned at each site (n = 16 per site).”

      (4) In the behavioral data analysis, please provide the rationale or justification for the criteria used to exclude trials.

      Thanks for this comment. In the revised manuscript (page 32), we have clarified that reaction times (RTs) shorter than 150 ms were excluded as anticipatory responses, and RTs longer than 1,300 ms were excluded to limit the influence of unusually slow responses. These exclusion criteria are commonly adopted in RT research and were applied consistently across all conditions (Ratcliff, 1993; Whelan, 2008).
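
The exclusion rule can be sketched as a simple filter. The 150-ms and 1,300-ms bounds come from the response above, and the exclusion of incorrect responses from the breakdown reported later in this response. The function name, the order in which the two criteria are applied, and the example trials are assumptions for illustration only and are not the study's data.

```python
RT_MIN_MS = 150    # faster responses are treated as anticipatory
RT_MAX_MS = 1300   # slower responses are treated as unusually slow


def summarize_exclusions(trials):
    """trials: list of (rt_ms, correct) pairs. Errors are counted first (assumed order)."""
    n = len(trials)
    errors = sum(1 for _, correct in trials if not correct)
    outliers = sum(
        1 for rt, correct in trials
        if correct and not (RT_MIN_MS <= rt <= RT_MAX_MS)
    )
    kept = [rt for rt, correct in trials if correct and RT_MIN_MS <= rt <= RT_MAX_MS]
    return {"error_rate": errors / n, "outlier_rate": outliers / n, "kept_rts": kept}


# Hypothetical trials, invented for this example.
example_trials = [(120, True), (480, True), (650, False),
                  (1400, True), (150, True), (1300, True)]
summary = summarize_exclusions(example_trials)
print(summary)
```

Counting errors before RT outliers keeps the two exclusion sources from overlapping, so their rates can be reported separately.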

      (5) Given that the behavioral interaction effect was not statistically significant, the conclusion on lines 236-237, "These data clearly isolated the two distinct conflict effects in the Stroop effect, namely the semantic conflict (SI-NE difference) and the response conflict (RI-SI difference)" appears overstated and should be softened accordingly.

      We thank the reviewer for this important comment. We have clarified that our original statement was intended to highlight the successful isolation of conflict types based on the significant main effects of congruency (validating the task design), rather than implying a significant interaction effect. However, we agree that the original phrasing appeared unclear in this context. We have therefore revised the sentence to adopt a more cautious tone in the revised manuscript (page 12):

      “These data demonstrated typical Stroop interference effects (Veen & Carter, 2005) in both the semantic (SI-NE difference) and response conflicts (RI-SI difference).”

      (6) The statement on lines 281-282, "Although the IOR effect showed no effect on either the semantic conflict difference (SI-NE) or the response conflict difference (RI-SI) in the behavioral performance" lacks supporting statistical evidence. Please report the relevant test statistics.

      We appreciate the reviewer’s careful reading and note that the relevant statistical evidence was missing from the original manuscript. This has now been added in the revised version. Specifically, we examined the interactions between cue validity and semantic conflict (SI vs. NE) as well as between cue validity and response conflict (RI vs. SI). Neither interaction was significant (see revised Results for full statistics on page 12), supporting our original statement that cue validity did not modulate either conflict component in behavioral performance.

      (7) The manuscript mentions that a null condition (with no Chinese character presented) was included to increase statistical power for detecting differences across conditions. However, it is unclear how this null condition was actually used in the data analyses. Please clarify the role of the null condition in both the behavioral and neuroimaging analyses.

      Thanks for this comment. We regret that this was not sufficiently clear in the original manuscript. The null condition was included for neuroimaging purposes and was not used in the behavioral analyses, as no response was required in these trials. In the fMRI analyses, null trials served as the implicit baseline and were not modeled as regressors of interest. Task-related activities for all experimental conditions were therefore estimated relative to this null baseline, facilitating estimations of task-related responses in randomized event-related designs (Burock et al., 1998; Friston et al., 1999; Liu, 2004). We have clarified this point in the revised manuscript (page 29).

      References

      Burock, M. A., Buckner, R. L., Woldorff, M. G., Rosen, B. R., & Dale, A. M. (1998). Randomized event-related experimental designs allow for extremely rapid presentation rates using functional MRI. NeuroReport, 9(16), 3735-3739. https://doi.org/10.1097/00001756-199811160-00030

      Friston, K. J., Zarahn, E., Josephs, O., Henson, R. N. A., & Dale, A. M. (1999). Stochastic designs in event-related fMRI. NeuroImage, 10(5), 607-619. https://doi.org/10.1006/nimg.1999.0498

      Liu, T. T. (2004). Efficiency, power, and entropy in event-related fMRI with multiple trial types: Part II: design of experiments. NeuroImage, 21(1), 401-413. https://doi.org/10.1016/j.neuroimage.2003.09.031

      Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3), 510-532. https://doi.org/10.1037/0033-2909.114.3.510

      Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3), 475-482. https://doi.org/10.1007/BF03395630

      Reviewer #2 (Recommendations for the authors):

      (1) The paper is a bit too lengthy, with a lot of information that is hard for non-experts to grasp.

      We thank the reviewer for this comment. We realized that the Introduction was the most challenging section for general readers. In the revision, we refined the text in the Introduction for a better structure and more reader-friendly wording to improve readability. In addition, following the reviewer’s suggestion (Recommendation 4 below), we have added short subsection titles to the Introduction, Results, and Discussion sections to better organize the content and highlight the main ideas. We hope these revisions make the manuscript more accessible and easier for a broader audience to follow.

      (2) Please double-check the stats, as some of the results presented in the main text do not align well with the figures. Take Figure 2 as an example.

      We appreciate the reviewer’s concern and have double-checked all statistics. All the results are consistent between the figures and the main text. Taking Figure 2 as an example (page 12), the perceived discrepancy probably arose because the descriptive values reported in the main text are marginal means for the main effects (i.e., the overall average of one factor, collapsed over the other factor), whereas Figure 2 shows the mean for each Congruency × Cue Validity condition (i.e., the cell means underlying the simple effects).
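
The distinction between marginal means and per-condition means can be illustrated with a small sketch. The values below are hypothetical numbers invented for this example and are not the study's results; equal weighting of cells is also an assumption.

```python
from statistics import mean

# Hypothetical cell means (ms), invented for illustration only.
cell_means = {
    ("NE", "cued"): 600.0, ("NE", "uncued"): 580.0,
    ("SI", "cued"): 630.0, ("SI", "uncued"): 610.0,
    ("RI", "cued"): 660.0, ("RI", "uncued"): 640.0,
}


def marginal_mean(factor_index, level):
    """Average the cell means over the other factor (equal cell weights assumed)."""
    return mean(v for key, v in cell_means.items() if key[factor_index] == level)


congruency_marginals = {lvl: marginal_mean(0, lvl) for lvl in ("NE", "SI", "RI")}
validity_marginals = {lvl: marginal_mean(1, lvl) for lvl in ("cued", "uncued")}

print(congruency_marginals)
print(validity_marginals)
```

A marginal mean for a given congruency level generally matches neither of the two bars plotted for that level, which is why text and figure can appear to disagree while reporting the same data.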

      (3) The reasoning that the neuroimaging findings support the dissociation between integration and segregation needs to be improved.

      We thank the reviewer for this important comment. In the revised Discussion (pages 19-21), we have strengthened the reasoning linking our neuroimaging findings to the dissociation between the integration and segregation processes. Specifically, we make it clear how the distinct activation patterns observed for the cued and uncued targets map onto the different functional demands proposed by the integration-segregation theory. The cued targets were theorized to recruit the frontoparietal attentional control networks, consistent with the re-engagement of an existing object file (integration). On the other hand, the uncued targets should engage the medial temporal and temporal association regions responsible for novelty detection and episodic encoding, consistent with the creation of a new object file (segregation). We hope the reviewer finds that the revision offers a clearer explanation of how the observed neural patterns are consistent with a dissociation between the integration and segregation processes.

      (4) Please use short section titles to organize the introduction, results, and discussion sections. For instance, the discussion section is a long chunk of text (almost 9 pages) and is pretty dense, making it hard to quickly grasp the ideas the authors want to convey.

      Thanks for this helpful suggestion. Following the reviewer’s recommendation, we have now added short subsection titles to the Introduction and Discussion sections to improve structure and readability. For the Results section, we have maintained and further refined the existing subheadings to ensure consistent organization.

      Reviewer #3 (Recommendations for the authors):

      I found this manuscript to be a timely and substantive contribution to the study of attention and cognitive neuroscience. To my knowledge, it provides the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention, a framework that has been influential in behavioral work for more than two decades but has lacked clear neural support. The study is conceptually well motivated, methodologically solid, and generally clearly reported. The findings differentiate neural substrates associated with integration and segregation processes and further show how inhibition of return (IOR) interacts with semantic and response conflicts at the neural level.

      The manuscript is well organized, the writing is mostly clear, and the progression from theory to hypotheses and methods is easy to follow. The combination of IOR with a modified Stroop paradigm is a clever choice that extends the theoretical scope of exogenous attention research. The use of an optimized event-related fMRI design based on a genetic algorithm is also a strength and reflects careful attention to design efficiency.

      The main results are internally consistent and theoretically meaningful. Integration-related activity in the fronto-parietal attention network (including FEF, IPS, TPJ, and dACC) and segregation-related activity in medial temporal areas (PHG and STG) fit well with the proposed framework, and the pattern of activations is coherent across analyses.

      Overall, I think this is a carefully executed study that offers much-needed neural evidence bearing on the integration-segregation theory of exogenous attention. I would recommend the following revisions.

      Suggestions:

      (1) In the Discussion (pp. ~17-18), dACC activation is described both in terms of general cognitive control demands and as reflecting a possible inhibitory bias toward the cued direction. It would help the reader if you could briefly indicate whether you see these as complementary (e.g., dual roles within the same region) or as more competing interpretations.

      We thank the reviewer for this helpful comment. We have clarified in the revised manuscript that the two accounts of dACC activation, meeting general cognitive control demands and biasing against the cued direction, are complementary rather than competing interpretations. Specifically, we describe how the dACC is involved in both the cognitive control required for target integration and the inhibitory bias toward the cued location, thereby highlighting its dual roles within the same region. The revised section reads as follows (page 20):

      “Furthermore, the observed increase in the left dACC activity under the cued relative to the uncued condition likely reflected the engagement of cognitive control mechanisms (Botvinick et al., 2004; Chung et al., 2024; Mayer et al., 2012; Veen & Carter, 2005), particularly in resolving the conflict between the task-driven requirement of target integration and the reduced accessibility of the cue-initiated representation. In this context, the heightened activation of dACC may also reflect its role in fulfilling the inhibitory bias toward the cued location (Mayer et al., 2004) and discouraging inefficient integration attempts at a location marked as less relevant.”

      (2) In the Discussion, you could consider adding a short paragraph explicitly acknowledging a few limitations and how they might constrain generalization of the findings. A concise reflection of this kind would give a more balanced picture without undermining the main conclusions.

      We appreciate this helpful suggestion. In the revised manuscript, we have added a concise paragraph explicitly addressing a key limitation of the present study (pages 26-27). Specifically, we acknowledge that the absence of behavioral interactions alongside clear neural effects requires cautious interpretation. We discussed how this dissociation may reflect differences in measurement sensitivity between behavioral and neural indices, consistent with prior findings (Chen et al., 2006; Wilkinson & Halligan, 2004). We also note that the use of a GA-optimized sequence, while improving statistical efficiency, may have introduced unintended regularities in event order that could influence behavioral strategies.

      (3) Since the dataset is hosted on GitHub, adding a short note in the Data Availability section about whether the repository will also include analysis scripts or future replication data would further enhance transparency and long-term usefulness.

      Thanks for this helpful suggestion. We have revised the Data Availability section (page 35) to clarify that the GitHub repository contains the processed data used in the final analyses. Analysis scripts and additional materials for replication are available from the authors upon reasonable request.

      (4) In the Results section, the formatting of statistics is not fully consistent. For example, some reports use spaces around symbols (e.g., "η<sup>2</sup> = 0.301") whereas others do not (e.g., "p< .001"). It would be good to standardize this (e.g., "p < .001", "η<sup>2</sup> = .30") across the manuscript.

      Done as suggested.

      (5) A few abbreviations appear before they are defined; for instance, SPC (superior parietal cortex) shows up in the Results (response conflict section) before the full name is given. Ensuring that each abbreviation is defined at first mention would help readers who may be less familiar with all of the regional acronyms.

      Thanks for this comment. We have conducted a thorough check of the manuscript and ensured that all abbreviations are defined upon their first occurrence.

      (6) The text sometimes refers to "PHG/STG" as a combined cluster, while at other points, PHG and STG are described separately. It would be useful to clarify under what circumstances they are treated as a single functional cluster versus distinct regions of interest, and to keep the nomenclature as consistent as possible between the main text and the tables.

      Thanks for raising this point. In the revised manuscript, we have clarified this issue by distinguishing between statistical clustering and functional interpretation. In the whole brain analysis, activations in the left hemisphere formed a single continuous cluster spanning the PHG and STG; therefore, this cluster is labeled as “PHG/STG” in Table 1. We have explicitly noted the continuous nature of this cluster in the Results section (page 15) to ensure clarity:

      “Notably, in the left hemisphere, these activations formed a continuous cluster spanning both regions (labeled as PHG/STG in Table 1).”

      (7) It would be helpful to provide a bit more detail about the sample characteristics (e.g., age range, handedness, and inclusion/exclusion criteria) and to state explicitly how many participants, if any, were excluded from the analyses and for what reasons. This would help readers better evaluate data quality and generalizability.

      Thanks for this helpful suggestion. We have revised the Participants section (page 28) to provide the full details regarding our sample:

      “32 healthy participants with normal or corrected-to-normal vision and normal color vision were recruited. All participants were right-handed and reported no history of neurological or psychiatric disorders. Data from three participants were excluded due to excessive head movements and high global variances (see fMRI Data Analysis), leaving 29 participants for analysis (18 female, 11 male; aged 18-30 years, M = 22.69, SD = 2.58).”

      Furthermore, we have provided a clearer description of the exclusion criteria in the Data Analysis section (pages 33-34) as follows:

      “Runs with motions exceeding one voxel length in any direction were excluded (resulting in the exclusion of two runs) …Runs with global variance equal to or over 0.1% were excluded, resulting in the exclusion of eight runs (see Supplementary Information for details). Ultimately, three participants were excluded because neither run met the quality criteria. All remaining participants retained both runs, except for three individuals who each contributed only one valid run.”

      (8) Given that participants were excluded based on global variance exceeding 0.1%, it would be very informative to include, in the Supplementary Materials, an illustrative figure showing the signal time series (or global signal variance over time) for excluded participants.

      We appreciate this valuable suggestion. In the revised Supplementary Materials, we have included a new figure (Figure S2) that plots the global signal time series for the excluded runs to illustrate the signal patterns that led to their exclusion based on global variance.

      (9) Relatedly, it may help to more explicitly describe how global variance was computed (e.g., over which time window, after which preprocessing steps, and whether it was calculated on whole-brain signal or within specific masks). A concise clarification would make the exclusion criterion easier to interpret.

      Thanks for this helpful suggestion. We have now clarified in the manuscript how global variance was computed (page 33) and have also provided a more detailed description of the computation procedure in the Supplementary Materials (page 4). Specifically, after the standard preprocessing (slice timing correction, 3D motion correction, spatial smoothing, linear trend removal, and high-pass temporal filtering), the global signal was computed for each run as the mean signal across voxels with intensity values greater than 100 in each volume. Global variance was then quantified as the temporal variance of this run-wise global-signal time course across all volumes, providing a quality-control index of signal stability.
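
The computation described above can be sketched as follows, using plain Python lists in place of preprocessed image volumes. This is only an illustration under stated assumptions: the function names are hypothetical, the sample variance is used because the estimator is not specified here, and the normalization that would express the result as a percentage (for comparison with the 0.1% criterion) is not given in this response and is therefore not implemented.

```python
from statistics import mean, variance

INTENSITY_THRESHOLD = 100  # only voxels with intensity above this value contribute


def global_signal(volumes):
    """Mean intensity of supra-threshold voxels, computed separately for each volume."""
    return [
        mean(v for v in volume if v > INTENSITY_THRESHOLD)
        for volume in volumes
    ]


def global_variance(volumes):
    """Temporal variance of the run-wise global-signal time course (sample variance assumed)."""
    return variance(global_signal(volumes))


# Toy run of three volumes with three voxels each; values are invented for illustration.
toy_run = [
    [50.0, 200.0, 300.0],
    [90.0, 210.0, 310.0],
    [10.0, 190.0, 290.0],
]
toy_signal = global_signal(toy_run)
toy_variance = global_variance(toy_run)
print(toy_signal, toy_variance)
```

In the toy run the first voxel never exceeds the threshold, so the global signal is the mean of the other two voxels in every volume.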

      (10) Rather than only reporting a single overall exclusion rate (e.g., 5.52% of total trials), it would be informative to break this down by source, reporting separately the proportion of trials excluded as RT outliers and the proportion excluded due to response errors. This would further improve transparency regarding the behavioral preprocessing pipeline.

      Thanks for this helpful suggestion. We have now broken down the overall exclusion rate by source in the revised manuscript. Specifically, we reported that 4.29% of trials were excluded due to incorrect responses, and 1.24% of trials were excluded as RT outliers (page 32).

      References

      Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: an update. Trends in Cognitive Sciences, 8(12), 539-546. https://doi.org/10.1016/j.tics.2004.10.003

      Chen, Q., Wei, P., & Zhou, X. (2006). Distinct neural correlates for resolving Stroop conflict at inhibited and noninhibited locations in inhibition of return. Journal of Cognitive Neuroscience, 18(11), 1937-1946. https://doi.org/10.1162/jocn.2006.18.11.1937

      Chung, R. S., Cavaleri, J., Sundaram, S., Gilbert, Z. D., Del Campo-Vera, R. M., Leonor, A., Tang, A. M., Chen, K.-H., Sebastian, R., Shao, A., Kammen, A., Tabarsi, E., Gogia, A. S., Mason, X., Heck, C., Liu, C. Y., Kellis, S. S., & Lee, B. (2024). Understanding the human conflict processing network: A review of the literature on direct neural recordings during performance of a modified Stroop task. Neuroscience Research, 206, 1-19. https://doi.org/10.1016/j.neures.2024.03.006

      Mayer, A. R., Seidenberg, M., Dorflinger, J. M., & Rao, S. M. (2004). An event-related fMRI study of exogenous orienting: supporting evidence for the cortical basis of inhibition of return? Journal of Cognitive Neuroscience, 16(7), 1262-1271. https://doi.org/10.1162/0898929041920531

      Mayer, A. R., Teshiba, T. M., Franco, A. R., Ling, J., Shane, M. S., Stephen, J. M., & Jung, R. E. (2012). Modeling conflict and error in the medial frontal cortex. Human Brain Mapping, 33(12), 2843-2855. https://doi.org/10.1002/hbm.21405

      Veen, V. V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497-504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Wilkinson, D., & Halligan, P. (2004). The relevance of behavioural measures for functional imaging studies of cognition. Nature Reviews Neuroscience, 5(1), 67-73. https://doi.org/10.1038/nrn1302

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public reviews:

      Reviewer #1 (Public review):

      The sample size for the ex vivo electrophysiology conducted on the calb1+ lamina I projection neurons (Figure 5) is limited to a total of six recorded neurons. Given the difficulty and complexity of the preparation, this is understandable. Notably, since approximately 87% of lamina I projection neurons heavily innervated by Trpm8+ terminals are calb1+, these six recordings of such neurons in Figure 4E could also be calb1+.

      As noted in our initial resubmission, we fully accept that the sample size is limited. We have already toned down the related statements to say that our findings “strongly suggest” that the cells with dense Trpm8 input are cold-selective (in both the Abstract and the Results).

      Reviewer #2 (Public review):

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors acknowledge that, technically, this is a very difficult preparation with very low yield as far as obtaining successful recordings. Moreover, the tissue needs to be maintained at room temperature which is obviously not ideal when characterizing cold thermoreceptors due to the unavoidable effects of low temperature on cold-activated receptors.

      Please see our response to Reviewer #1 (Public review).

      Reviewer #3 (Public review):

      The main limitation remains the relatively small number of neurons that could be recorded electrophysiologically. While understandable given the complexity of the preparation, this necessarily limits generalization.

      Again, please see our response to Reviewer #1 (Public review).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 609. The authors used the Trpm8Flp;RCE:FRT;Ai9 mice in some electrophysiological experiments. What is the function of the Ai9 allele (a Cre-dependent reporter) in this cross? Should there not be a Cre line as well?

      One of the mice used for electrophysiological experiments was Trpm8Flp;RCE:FRT;Ai9, and this animal received an injection of AAV encoding Cre into the caudal ventrolateral medulla, resulting in tdTomato expression in spinal projection neurons. This part of the Methods was inadvertently omitted from the resubmitted version (see next point). This has been corrected, and in addition, this information is shown in the cartoon in Fig 4A and is explained in the figure legend.

      (2) Line 860. Phrase is incomplete

      We apologise for this – 3 lines from the original version had been deleted inadvertently. This has now been corrected.

      (3) Line 103 "These results are therefore consistent with the transcriptomic findings described above (36,37)."

      I would revise the references used to support this claim. Reference 37 is a transcriptomic atlas of the brain. I could not find TRPM8 expression data in DRG in this reference.

      Figure S4 of reference 37 deals with the mouse peripheral nervous system and describes Trpm8 classes of primary afferent. More detail on these cells (including expression of VGLUT3, Tac1, Calca and Trpv1) can be found in the associated website: mousebrain.org/adolescent/genesearch.html. We have therefore left this reference as it is.

      (4) Line 242. "neurons with dense Trpm8 input had significantly lower sEPSC frequencies compared to those that lacked dense Trpm8 input".

      This is an interesting paradox because cold thermoreceptors (i.e. the presumed direct monosynaptic input to these projection neurons) are known to be spontaneously active at physiological skin temperatures. This is well characterized in trigeminal corneal endings (DOI: 10.1038/nm.2264). In fact, the decrease in this spontaneous activity can be used by mice to faithfully detect warm stimuli (DOI: 10.1016/j.neuron.2020.02.035). This reviewer would like to remark that this low spontaneous frequency may be due to the non-physiological temperature of this preparation, leading to partial adaptation/desensitization of the afferents. Perhaps it also influences the amplitude (e.g. release probability) of EPSPs (I do not expect you to do anything about my remark).

      These are interesting points, but we do not feel that we can add anything here.

      (5) Figure 3A. It would be useful to include orientation references (dorso-ventral, mediolateral) in the images. Same comment applies to Figure 5C.

      Since these are horizontal sections, the axes are medio-lateral and rostro-caudal. Corresponding orientation markers have been added to both figures.

      (6) Figure 3F. If I understood correctly, the light pulse used for optogenetic activation is delivered directly through the objective used for recording the cell. Thus, the distance between pre and postsynaptic neuron should be minimal. That being the case, I do not understand how a monosynaptic input can have a delay of 5 or 7 ms. Am I missing something?

      The relatively long latency is likely to reflect a slow rise time of depolarisation in the Trpm8 terminals, so that although channels will open very rapidly, there is a delay until the boutons reach action potential threshold. Hachisuka et al (2016) recorded from Nts<sup>Cre</sup>;Ai32 mice (i.e. coding for channelrhodopsin) and found typical latencies of >5 ms (Fig 5E in that paper). We believe that this delay is exacerbated by the low levels of expression of ChR2 that we were able to achieve with the neonatal i.p. injection approach. We have provided a brief explanation for this, and cited the reference in the Results section (lines 197-198).

      (7) Figures 4E/H. To be meaningful, the pie charts should include the n (total number of neurons). See, for example figure 5J.

      Numbers have been added to the pie charts.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Speed and mismatch between locomotion and visual stimulation.

      The authors do not clearly describe the definition of locomotion versus the resting state. The speed should, by itself, have an impact on neuronal responses, especially at the onset of locomotion. Several published studies show that the mismatch between a visual stimulus and the speed of the animal induces specific responses in V1, both in excitatory and subtypes of inhibitory neurons. The authors should address these points upfront in the manuscript, since it is likely a major variable explaining their results.

      We will clarify in the methods that a trial was considered as locomotion when an animal ran at a minimum of 3 cm/s for at least 80% of the 10 s stimulus presentation, and was considered rest when running under 3 cm/s during the same fraction of time. Trials with abrupt changes from locomotion to rest were rare and excluded following these criteria.
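The trial criterion stated above can be written as a short rule. This is a sketch under stated assumptions, not the authors' code: the function name, the evenly sampled speed trace, the inclusive treatment of the 3 cm/s threshold, and the "excluded" label for mixed trials are all hypothetical choices made for illustration.

```python
def classify_trial(speeds_cm_s, threshold=3.0, min_fraction=0.8):
    """Label one trial from speed samples spanning the stimulus presentation.

    Assumes evenly spaced samples, so the fraction of samples equals the
    fraction of time. Returns "locomotion", "rest" or "excluded".
    """
    n = len(speeds_cm_s)
    running = sum(1 for s in speeds_cm_s if s >= threshold)
    if running / n >= min_fraction:
        return "locomotion"
    if (n - running) / n >= min_fraction:
        return "rest"
    # Neither state holds for at least 80% of the trial, e.g. an abrupt
    # switch between running and standing still.
    return "excluded"


print(classify_trial([5.0] * 9 + [0.0]))      # mostly running
print(classify_trial([0.0] * 10))             # stationary throughout
print(classify_trial([5.0] * 5 + [0.0] * 5))  # mixed trial
```

Under this rule a trial can only be dropped when the animal switches state partway through, which matches the statement that such trials were rare and excluded.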

      Locomotion speed and visuomotor mismatch can influence neuronal responses in V1, but in the large majority of our trials mice either ran continuously at a stable speed or remained still, i.e. locomotion onsets or offsets did not occur (see Hinojosa et al. 2026 for example running traces). Furthermore, sensitizing and depressing neurons were typically recorded simultaneously within the same field of view, experiencing identical locomotor behaviour. For these reasons, we think it is unlikely that differences in speed or mismatch alone can account for the different increase in amplitude observed between depressors and sensitizers.

      To directly address this point and further explore the role of speed on V1 neurons, we will quantify the relationship between running speed and amplitude increase in both PCs and interneurons, and include these analyses in the revised version of the manuscript.

      (2) Use of deconvolution with MLSpike.

      Some results (Figure 2) exclusively depend on the deconvolution of calcium signals into spikes (since the initial peak is not seen in calcium transients). The authors should validate this result either with electrophysiological recordings or with the use of another deconvolution method (e.g. CASCADE), emphasising the limitations of this approach and the limitations of the time resolution of calcium imaging.

      A similar initial increase in amplitude followed by fast depression has been observed previously with electrophysiological recordings in V1 (Chance et al., 1998; Jin & Glickfeld, 2020; Varela et al., 1997). We will further validate our results using an alternative spike inference method like CASCADE (Rupprecht et al., 2021), as well as expanding on the limitations of our approach.

      (3) The manuscript is centred around a specific increase in visual responses in sensitizing neurons during locomotion, both in the fraction of responsive neurons and response magnitudes.

      It is hard to tell whether this difference is due to a greater scaling effect of locomotion, a difference in responses during the resting state, or both. The manuscript should further explore and discuss the differences in responses between sensitizing and depressing neurons, both during the resting state and locomotion. Adding metrics and direct comparisons of the magnitudes of fast responses, slow responses, and time integrals between sensitizing and depressing neurons in resting and locomotion states would help to clarify this. Same for fractions of responsive neurons of each type in each condition. E.g., the slow phase is harder to judge from the plots, but the DeltaF/F integral shown in Figure 1G seems to suggest the difference in response magnitude between sensitizing and depressing neurons is largest in locomotion state, rather than resting state. How do these integrals look for inferred firing rates shown in Figure 2?

      We will further explore the response dynamics of adaptive types within the locomotion and resting state, highlighting the differences between calcium signals and inferred spikes. We will then include our findings in the new version.

      (4) There is something counterintuitive about how the changes in inhibition onto sensitizing and depressing neurons during locomotion explain the reported activity changes.

      Sensitizers receive reduced SST input and increased PV input during locomotion. If SSTs depress and PVs sensitize (and this is the main reason why sensitizers, which receive dominant input from SSTs sensitize, and vice-versa), how is it possible that this switch does not alter the sensitizing or depressing nature of these neurons' responses in locomotion? Are these changes insufficient to flip the dominant SST-PV drive? Figure 6D-E seems to show there is a flip, at least for sensitizers. How do authors explain this? Do authors think this is related to the narrowing of the adaptive index distribution shown in Figure 1C?

      This result is only counterintuitive if we consider exclusively the internal connections within V1. The PV:SST ratio changes from 0.9 during rest, dominated by SST-induced sensitization, to 1.2 during locomotion, dominated by PV-induced depression. Although adaptation in PCs is strongly driven by the opposing inhibition from PVs and SSTs during locomotion, its origin is more easily explained by an external input (SS) that targets VIPs, PVs and PCs. As a result, when locomotion increases the drive coming from the SS input, it injects a source of sensitization that partly balances the shift in the PV:SST ratio, preventing a switch in the adaptive properties of sensitizers, which, although reduced, remain sensitizing. We will include these calculations in the revised version.

      (5) Presentation of the experimental data and the model.

      The manuscript introduces the results of interneuron recordings during the description of the model. Similarly, the results of optogenetic manipulations are presented inside the model's description. It would be clearer to present all experimental data first and introduce the model later, fitting it to all experimental evidence previously presented.

      We understand that a clear separation between experimental and modelling results is often preferred in papers that combine these approaches, but in our case modelling and experimental data are highly interdependent and we believe that an overlapping presentation makes it easier for the reader to appreciate the links. One example is Fig. 2G-L, which shows experimental results validating a key feature of the model - the use of average response dynamics for each population of interneurons. Similarly, the results in Fig. 3 validate the use of the VIP response dynamics as the template for the slow modulatory input to layer 2/3. Then the results of optogenetic experiments in Fig. 4 are used to narrow down fits to the model. For these reasons, we have chosen to present experimental results and the model in this more integrated manner.

      Reviewer #2 (Public review):

      In the model, they postulate that synapses within the 6-cell-type network - sensitizing, intermediate, and depressing E cells, and PV, SST, and VIP I cells - and from three sources of external input to each of the six types all change between rest and locomotion (except that connections between the E cells don't depend on their types). There are a lot of degrees of freedom, and this makes interpretation of the results difficult. I would have liked to have seen more efforts to constrain the degrees of freedom. For example, there seems to be very little difference between the three E cell types in any of the three types of external input received. Why not constrain them all to get the same external input and see if it significantly affects model fit? Or what if synapses from the three types of external input are left unchanged, and only change their strengths between rest and locomotion? How well could this do? During optimization, why not constrain the changes between rest and locomotion, for example, by putting an L1 penalty on the changes or the relative changes, trying to force them to be sparse, and see whether there are roughly equally good fits? And then, if the main changes are in a small set of synapses, can the authors isolate changes to that small set and do roughly equally well? What about looking at the principal components of the weight changes across models, to isolate patterns of change that are most important?

      To reduce the number of degrees of freedom and ease interpretation, we did limit the model fitting for adaptive subtypes by fixing the PC-PC weight (𝑤<sub>𝑃𝐶_𝑃𝐶</sub>) and restricting the external input weights (𝑤<sub>𝐹𝐹_𝑃𝐶</sub>, 𝑤<sub>𝑆𝑆_𝑃𝐶</sub>, 𝑤<sub>𝐹𝐵_𝑃𝐶</sub>) to changes of ± 10 %. We will explicitly explain these constraints in the methods and discuss their limitations.

      We thank the reviewer for their suggestions of testing different conditions to find those providing the best fit for sensitizing and depressing PCs. We tried an approach similar to that described by Dipoppa et al. 2018 by using the locomotion weights as initial conditions for the rest traces and introducing penalties at later stages. However, the local optimization algorithms failed to reach distant regions of parameter space containing minimum solutions for the rest condition. We finally opted for repeating the same process of initial condition searching for locomotion and rest, making the L1 penalty approach impracticable in our case. We believe this approach is effective because it has both allowed us to describe circuit changes during internal-state transitions (the present paper) and, more recently, it has made a series of predictions about different learning states that have been confirmed by optogenetic tests (Hinojosa et al., 2026). We will nevertheless explore this and other suggestions from the reviewer to further optimize the fitting in the revised manuscript.

      In terms of comparing to previous works, when optogenetic manipulations of SST and PV are done to test various hypotheses, I would like to see some discussion of what is already known from the authors' 2022 paper and what they are adding or testing that wasn't known or tested from that paper. And Dipoppa et al (2018) also found weight changes to account for the difference between rest and locomotion. They were looking at a fixed point of responses of neurons across retinotopic space to stimuli of various sizes with only one E-cell type, whereas they are accounting for trajectories across time considering 3 E-cell subtypes but without variation in stimuli or retinotopic position of neurons, so the efforts are somewhat different, but still, it would be good to see a bit more discussion of what is in agreement or in contradiction in the conclusions.

      Thanks for this prompt. We will add further discussion of this work in light of the Heintz et al. (2022) and Dipoppa et al. (2018) papers.

      (1) The main result is that sensitizers increase their responses with locomotion ~2X (for dF/F) or about 3.5X (for spikes) more than depressors. But there are other differences between sensitizers and depressors, for example sensitizers have smaller initial stimulus responses at rest, and depressors have larger. What if cells were divided into tertiles by initial stimulus response at rest? Would the authors see the same differences in the effects of locomotion? If so, can they establish whether the difference is really attached to the adaptation properties rather than to, for example, the initial responses, for example, by comparing the regression of response increase against AI vs the regression of response increase against initial resting response? And there might be other controls to be done for other features in which sensitizers and depressors differ.

      We will explore the possibility that initial response influences the increase in amplitude. Preliminary data suggest that initial amplitude is higher in depressors than in sensitizers.

      (2) Lines 103 and following: the authors refer to a "second notable change" which is the narrower distribution of adaptive effects, but I think this is trivial. The adaptive index is AI=(R1-R2)/(R1+R2), where R1 is response 0.5-2.5s after stimulus onset and R2 over 8-10s. But if the change is additive, as suggested by the dF/F figures (and I believe the distributions of AI here are based on dF/F measurements) -- adding the same constant to R1 and R2 will shrink |AI| without changing the sign of AI. So this would seem to just be a signature of a change that is primarily additive rather than multiplicative.

      Also, if the authors do decide that they are going to focus on spikes after showing the raw dF/F, then this analysis should be repeated for spikes.

      We agree with the reviewer and will change the text accordingly to highlight the additive nature of the change in amplitude. We will also show the analysis with spikes (this shows similar results as the calcium data).
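The reviewer's argument can be checked numerically. The sketch below uses the adaptive index as defined in the comment, AI = (R1 - R2)/(R1 + R2); the response values and the offset are made-up numbers chosen only to illustrate the arithmetic. Adding a constant c > 0 to both responses gives (R1 - R2)/(R1 + R2 + 2c), so the numerator is unchanged while the denominator grows.

```python
def adaptive_index(r1, r2):
    """AI = (R1 - R2) / (R1 + R2), with R1 the early and R2 the late response."""
    return (r1 - r2) / (r1 + r2)


def shifted_index(r1, r2, offset):
    """AI after adding the same constant to both responses (additive change)."""
    return adaptive_index(r1 + offset, r2 + offset)


# Hypothetical responses: a depressor (early > late) and a sensitizer.
depressor = (3.0, 1.0)
sensitizer = (1.0, 3.0)

for r1, r2 in (depressor, sensitizer):
    at_rest = adaptive_index(r1, r2)
    additive = shifted_index(r1, r2, offset=1.0)
    multiplicative = adaptive_index(2.0 * r1, 2.0 * r2)
    print(at_rest, additive, multiplicative)
```

With these numbers the additive shift moves AI from ±0.5 towards zero without changing its sign, whereas a purely multiplicative change leaves AI untouched, which is why a narrower AI distribution points to an additive component.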

      (3) Figure 2, F is supposed to be D minus E, but it doesn't look like it. For example, the initial response under locomotion is very similar in sensitizers and depressors, so the initial difference in F should be small, but it's not; and at rest, depressors initially have larger responses than sensitizers, whereas later depressors have smaller responses than sensitizers, yet the difference at rest is positive at all times. Something seems wrong here.

      We apologize for the confusion this has caused. Figure 2F does not represent the difference between sensitizing and depressing PCs from panels D and E. Instead, it shows the time-varying difference between locomotion and rest states of sensitizers (blue, in figure 2D) and depressors (green, in figure 2E). Thus, panel F shows within-population modulation by behavioural state, rather than differences between sensitizing and depressing neurons. We will amend the figure legend and main text to explain this point and avoid misinterpretation.

      Reviewer #3 (Public review):

      (1) Key concern is the usage of dF/F signals for all analyses, especially when comparing responses.

      (1a) Figure 1G: Comparison of sensitisers and depressors. It is important to consider what the baseline rates are when making these comparisons, especially when comparing the degree of effects between different cell types. For example, if baseline rates for sensitizers were overall higher, it would mean the difference in gain of response would be lower, and could affect the results in the opposite direction of what is claimed. One option to account for this would be to z-score the overall responses, using the same normalization for locomotion and rest. We also suggest plotting differences in sensitisers, intermediates, and depressors as a function of firing rate. Matching for firing rate across each PC categorization and calculating delta AI for each matched firing rate bin.

      (1b) Figure 2A-F: The above is an even more significant issue when it comes to estimating spiking rates. The methods do not state how dF/F is calculated. If these are based on using the pre-stim as the reference, the algorithms for spike rate used might not be appropriate if this were used. Using pre-stimulus referencing could result in the estimate going into the wrong range in the calculation of the spike rate.

      (1c) In both cases above, it could be a problem if baseline firing rates are different between cell types, or states (locomotion/stationary). The latter is established to have effects on many cell types measured, and so needs to be accounted for very carefully.

      The DF/F0 trace was calculated using the mode of the whole trace as F0. While this approach is less sensitive to biases than subtracting the pre-stimulus baseline, it does not consider noise levels as the z-score suggested by the reviewer does. We will, therefore, normalize the calcium traces to z-scores to further account for changes in the baseline. Spike inference using MLSpike, however, explicitly models baseline noise and subtracts its effect from that of the spikes calculated from the calcium signal (Deneux et al., 2016). This transformation preserved the difference in amplitude triggered by locomotion between depressing and sensitizing PCs while revealing their similar baseline activity (see Figs. 2D, E and F). These results indicate that the distinct changes in response amplitude between sensitizing and depressing PCs during locomotion are not driven by baseline differences. We will add this explanation to the methods section.

      We will also plot the changes in activity with locomotion across cell types as a function of firing rate and add these results to the revised manuscript.

      (1d) It would be informative to see per-neuron comparison for adaptive indices during rest and locomotion states. This could be visualized using a scatter plot with AI-rest vs. AI-locomotion for Figures 1D-1F and 2J-2L.

      (1e) Are neurons that are more strongly modulated between locomotion and rest also more likely to experience a shift in AI (i.e. delta AI)? Is there a correlation between the change in firing rate between behavioral states and Delta AI (Loco-Rest)? If so, is this present for all neuron subtypes (e.g. VIP, SST, and PV)?

      Sorting was carried out separately on locomotion and rest data sets to capture the adaptive properties of the network under each condition. When assessing the change in adaptive index in individual cells, there was a weak but significant correlation (r = 0.10, p<0.05), probably due to trial-to-trial stochasticity in the network, which has been shown to be present in V1 (Carandini, 2004; Lee et al., 2010). Although adaptation profiles of individual PCs are not fully conserved across rest and locomotion, the observed overlap exceeds that expected by chance, suggesting that stochastic fluctuations modulate an underlying, stable circuit organization. Even when the stochastic component of the responses is included, the conclusions hold: sensitizers undergo a larger gain modulation than depressors. We will include this analysis and the correlation between change in firing rate and Delta AI in the revised version of the paper.

      (1f) Optogenetic inhibition of VIP neurons on average abolished the slow depressive effects of adaptation in SST (Figure 3). The strength and prevalence of this effect are unclear. Perhaps one can bootstrap the control and opto AI indices and calculate whether AI was significantly reduced following optogenetic inhibition, and if so, on average, how likely this was to occur for the recorded SST neurons? This is important in knowing that the average effects (Figure 3D) aren't driven by a portion of SST neurons, especially as this is later used to confirm the region of parameter space and affects the subsequent results in Figure 4.

      The strength and prevalence of the effect are reflected in the distribution of AI changes across SST neurons, which is centred at AI = -0.3 ± 0.3, indicating a consistent reduction in AI across the population instead of being driven by a small portion of SST neurons. To further clarify this, we will report the proportion of SST neurons showing a reduction in AI and include statistical analyses on the changes.
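One way the two quantities discussed here (a bootstrap test on the AI change and the proportion of SST neurons showing a reduction) could be computed is sketched below. This is an illustration only: it assumes paired per-neuron AI values (opto minus control), uses a percentile bootstrap of the mean, and runs on invented numbers; none of these choices is taken from the manuscript.

```python
import random
from statistics import mean


def bootstrap_mean_ci(deltas, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean AI change.

    `deltas` holds one value per neuron (AI_opto - AI_control, assumed paired).
    """
    rng = random.Random(seed)
    n = len(deltas)
    # Resample neurons with replacement and record the mean each time.
    means = sorted(mean(rng.choices(deltas, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def fraction_reduced(deltas):
    """Proportion of neurons whose AI decreased under optogenetic inhibition."""
    return sum(1 for d in deltas if d < 0) / len(deltas)


# Hypothetical per-neuron AI changes, for illustration only.
example_deltas = [-0.5, -0.2, -0.4, -0.1, -0.3, -0.6]
low, high = bootstrap_mean_ci(example_deltas)
print(low, high, fraction_reduced(example_deltas))
```

If the upper bound of the interval lies below zero, the mean reduction is unlikely to be a resampling artefact, and the fraction reports how widespread the effect is across neurons rather than how large it is on average.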

      (2) Statistics for the effects. There is a mention of linear mixed models, but no information is given on the actual models being used and tested. This is particularly the case for Figure 1G, where there is a comparison of effect sizes between different populations. What precise significance test is being used? Are the stats on paired cells when considering locomotion and rest?

      We used linear mixed models to test for statistical significance between different conditions composed of hundreds of cells from several mice, i.e. a nested analysis (cells nested within mice; see Judd et al., 2017). For analyses such as Fig. 1G, we considered locomotion state, adaptive type and their interaction (loco × adap) as fixed effects and mouse number as the random effect. The p-values depicted in the legend refer to the interaction between locomotion and adaptive type, i.e. the increase in amplitude during locomotion is significantly different in sensitizers compared to depressors (p < 0.0001). We will revise the Methods section and figure legends to explicitly describe the model and statistical test used.

      (3) Model parameters: It is acknowledged that there is a large range of parameters that can model the responses effectively, up to 11% of initial conditions. At 9000 initial conditions, this is around 1000. The parameter estimates are then considered as the mean of each parameter. This seems like a strange choice for a few different reasons:

      (3a) A mean solution might not be one of the solutions. Let's say the parameters range over a large dimensional space. They could occupy non-overlapping / discontinuous subspaces. In that case, the mean parameters do not necessarily fall within the solution subspaces. Therefore, this reduction to means might not be valid.

      (3b) Compare distributions rather than means. There are multiple distributions of parameters between conditions. All stats should be on the comparison of distributions rather than just the means.

      To test for the presence of subsets of solutions grouped around different parameter values, we plotted the distribution of each parameter across all the good solutions found. Most of the weights followed a Gaussian distribution centred on the mean and, most importantly, none of them had two peaks. Furthermore, after computing the mean weight values, we ran the model with them and obtained a good fit, as shown in the figures. We will include those distributions in the new version and base the overall comparison on these distributions.

      (4) Visualizing weight matrices: It is very challenging to interpret the weight matrices. Furthermore, it appears that the stationary and locomotion conditions are fit independently, and given the large parameter spaces, it is even harder to interpret. Can the fitting instead be done by fitting on one and using those as the initial conditions for the other state? Figure 7 shows an intuitive cartoon, but it is not clear how the matrices in Figures 5 and 6 lead to the summary shown in Figure 7. It is also not clear why the connections between inhibitory neurons are not shown in Figure 7. One option is to perhaps run some kind of dimensionality reduction on the parameter space to better interpret the data. When showing deltaWeights, was the model initialised with 'Rest' weights and allowed to change? It is not obvious what the difference is between 'relative change in connection weights' and 'relative change in synaptic weights'. This needs to be clarified.

      Thanks for raising this concern. We will firstly try to make the weight matrices clearer to interpret.

      Regarding the fitting of rest and locomotion conditions, we fitted the locomotion traces first and used those solutions as initial conditions for the rest traces. However, this rendered no good solutions as minimums in the parameter space were too far from the initial starting points. We opted, therefore, for repeating the same process of initial condition searching for locomotion and rest. This approach is less biased in satisfying our aim of finding solutions that fit the data and can explain their dynamics, which are different for each condition. We believe this approach is effective, as not only has it allowed us to describe circuit changes during internal-state transitions but has also made a series of predictions under different learning states that were confirmed by optogenetic tests (Hinojosa et al., 2026).

      We simplified Fig. 7 for clarity, but we will make it more accurate and explain it in more detail in the legend, including connections between interneurons.

      Interpreting high-dimensional parameter spaces can be challenging. In this study, we focused on low-dimensional summaries of the parameter space (e.g., average connection weights and their distributions across populations), which revealed consistent and interpretable differences between sensitizing and depressing neurons. Importantly, our conclusions do not rely on individual parameter values, but rather on systematic differences across populations that are robust across solutions. Additionally, we ran clustering analysis and found that there is no parameter that can be removed. We focused, therefore, on the larger and more robust differences. We will explore additional dimensionality reduction approaches and include these results if they provide further insight beyond the current analyses.

      Finally, the change in weights was calculated with equation 4, in which the weights from locomotion and rest, obtained through independent fits, were used to calculate the relative change from rest to locomotion. These were either connection weights (equation 2), which consider the strength of the connection between cell j and i, or synaptic weights (equation 3), which express the weight of individual synapses by dividing connection weights by the number of presynaptic cells and the probability of connection. This distinction arises because we used average traces from all the neurons imaged to fit the model, so the number of cells must be taken into account to obtain the strength of individual synapses. We will add this explanation to the results and methods sections.

      (4a) Model parameters were reduced differently for locomotion and rest (Figure 4). We suggest evaluating the results for locomotion and rest using the same chi-square value of 3 for both behavioral states (at least in controls).

      Thank you for raising this important point, which we tried to resolve during our analysis. We used the reduced chi-square to evaluate model fits within the locomotion and rest conditions independently. As defined in equation 12, the reduced chi-square is inversely proportional to the standard error of the data, which is higher in the rest dataset. As a consequence, setting the same threshold across conditions would not correspond to an equivalent goodness-of-fit criterion and would impose a disproportionately strict constraint on the condition with lower variability, where deviations between model and data are more heavily penalized. For this reason, we used condition-specific thresholds to ensure comparable fit quality relative to the noise level in each condition. In addition, to enable direct comparison across conditions independent of their noise levels, we used the RMSE as a complementary metric.
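      The argument above can be illustrated with a small numerical sketch. It uses the textbook definition of the reduced chi-square (squared residuals scaled by the squared standard error, divided by the degrees of freedom), which is an assumption and may differ in detail from equation 12 of the manuscript. The data, model values and standard errors are invented for illustration.

```python
import math


def reduced_chi_square(data, model, standard_error, n_params):
    """Textbook reduced chi-square: sum(((d - m) / se)^2) / (N - n_params)."""
    dof = len(data) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((d - m) / s) ** 2
               for d, m, s in zip(data, model, standard_error))
    return chi2 / dof


def rmse(data, model):
    """Root-mean-square error; does not depend on the noise estimate."""
    return math.sqrt(sum((d - m) ** 2 for d, m in zip(data, model)) / len(data))


# Invented example: identical residuals, two different noise levels.
data = [1.0, 2.0, 3.0, 4.0]
model = [1.1, 1.9, 3.1, 3.9]
se_low = [0.05] * 4    # low-variability condition
se_high = [0.10] * 4   # high-variability condition

chi_low = reduced_chi_square(data, model, se_low, n_params=1)
chi_high = reduced_chi_square(data, model, se_high, n_params=1)

# The same misfit is penalized more heavily where the standard error is smaller,
# whereas the RMSE is identical for both conditions.
print(chi_low, chi_high, rmse(data, model))
```

      In this sketch a single threshold on the reduced chi-square would accept the fit in one condition and reject the same residuals in the other, which is why a noise-independent metric such as the RMSE is useful for cross-condition comparison.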

      References

      Carandini, M. (2004). Amplification of trial-to-trial response variability by neurons in visual cortex. PLoS Biol, 2(9), E264. https://doi.org/10.1371/journal.pbio.0020264

      Chance, F. S., Nelson, S. B., & Abbott, L. F. (1998). Synaptic Depression and the Temporal Response Characteristics of V1 Cells. The Journal of Neuroscience, 18(12), 4785–4799. https://doi.org/10.1523/JNEUROSCI.18-12-04785.1998

      Deneux, T., Kaszas, A., Szalay, G., Katona, G., Lakner, T., Grinvald, A., Rózsa, B., & Vanzetta, I. (2016). Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications, 7(1), 12190. https://doi.org/10.1038/ncomms12190

      Dipoppa, M., Ranson, A., Krumin, M., Pachitariu, M., Carandini, M., & Harris, K. D. (2018). Vision and Locomotion Shape the Interactions between Neuron Types in Mouse Visual Cortex. Neuron, 98(3), 602–615.e608. https://doi.org/10.1016/j.neuron.2018.03.037

      Heintz, T. G., Hinojosa, A. J., Dominiak, S. E., & Lagnado, L. (2022). Opposite forms of adaptation in mouse visual cortex are controlled by distinct inhibitory microcircuits. Nature Communications, 13(1), 1031. https://doi.org/10.1038/s41467-022-28635-8

      Hinojosa, A. J., Dominiak, S. E., Kosiachkin, Y., & Lagnado, L. (2026). Distinct Disinhibitory Circuits Link Short-Term Adaptation to Familiarity and Reward Learning in Visual Cortex. bioRxiv, 2026.03.24.713929. https://doi.org/10.64898/2026.03.24.713929

      Jin, M., & Glickfeld, L. L. (2020). Magnitude, time course, and specificity of rapid adaptation across mouse visual areas. J Neurophysiol, 124(1), 245–258. https://doi.org/10.1152/jn.00758.2019

      Judd, C. M., Westfall, J., & Kenny, D. A. (2017). Experiments with More Than One Random Factor: Designs, Analytic Models, and Statistical Power. Annu Rev Psychol, 68, 601–625. https://doi.org/10.1146/annurev-psych-122414-033702

      Lee, J., Kim, H. R., & Lee, C. (2010). Trial-to-trial variability of spike response of V1 and saccadic response time. J Neurophysiol, 104(5), 2556–2572. https://doi.org/10.1152/jn.01040.2009

      Rupprecht, P., Carta, S., Hoffmann, A., Echizen, M., Blot, A., Kwan, A. C., Dan, Y., Hofer, S. B., Kitamura, K., Helmchen, F., & Friedrich, R. W. (2021). A database and deep learning toolbox for noise-optimized, generalized spike inference from calcium imaging. Nat Neurosci, 24(9), 1324–1337. https://doi.org/10.1038/s41593-021-00895-5

      Varela, J. A., Sen, K., Gibson, J., Fost, J., Abbott, L. F., & Nelson, S. B. (1997). A Quantitative Description of Short-Term Plasticity at Excitatory Synapses in Layer 2/3 of Rat Primary Visual Cortex. The Journal of Neuroscience, 17(20), 7926–7940. https://doi.org/10.1523/JNEUROSCI.17-20-07926.1997

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you very much for the positive and constructive feedback on our manuscript. We have revised the manuscript accordingly, added a substantial number of additional experiments and extended the data.

      The reviewers' questions focused mostly on mechanistic insight into organoid formation, touching on the following aspects of lens organoid formation presented in the manuscript:

      - Cellular arrangements/re-arrangements during the process of lens formation, including the potential contribution of differential adhesion-mediated cell sorting to the cellular arrangement in the organoid and characterization of the individual contributions of lens- and retina-committed progenitors to this process.

      - Activity of BMP and FGF signaling pathways during organoid formation, namely identification of the tissue responding to the signaling within forming organoids.

      - Contribution of externally supplemented Matrigel to the differentiation process and cellular arrangements in ocular organoids. 

      To address those points in detail, we included additional experiments that are now presented in the revised version of the manuscript, namely in revised Figure 2-figure supplement 1 (addressing the contribution of Matrigel); new Figure 4-supplement 1/Video S5 (addressing the contribution of differential adhesion-mediated cell sorting); revised Figure 4/Video S6/Video S7 (addressing the contribution of lens-committed progenitors); and revised Figure 6 (addressing BMP and FGF signaling pathway activities).

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media containing HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker, LFC, starts to be expressed. In addition, foxe3:GFP+ cells show EdU incorporation, indicating that foxe3:GFP+ cells have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye cup-like structure, although this "inside-out" morphogenetic mechanism differs from the cellular "outside-in" mechanism of eye cup formation in vivo. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from that of in vivo morphogenesis.

      Overall, the manuscript is nicely presented. However, some points regarding the underlying mechanism remain unclear. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by an outer cell sheet of retinal precursor cells. I wonder whether the formation of this structure may be explained by differential cell adhesive activity or mechanical tension between lens core cells and the retinal cell sheet, as in the previous study by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may be integrated inside the retinal cell mass by cell sorting through direct interaction between retinal cells and lens cells, or between lens cells and the culture media. After day 1, the movement of the lens core towards the surface of retinal organoids could also be explained if the adhesive/tensile state of lens core cells is changed by secretion of extracellular matrix. I wonder whether the authors could measure the physical properties, such as adhesive activity and solidity, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they show the same patterning of an internal lens core covered by the outer retinal cell sheet?

      The question, whether different adhesive activity is involved in cell sorting and lens formation is indeed very intriguing.

      To address this point, we included additional experiments in the revised manuscript. As proposed by the reviewer, we performed dissociation and re-aggregation experiments on day 1 organoids at the time point when retinal cell fate is already established and the first cells with early lens fate (Foxe3::GFP-positive) start appearing (see new Figure 4-figure supplement 1).

      After dissociation, we followed Foxe3::GFP cells over time and observed that initially equally dispersed GFP<sup>+</sup> lens-committed cells gradually sort and establish contact with other GFP<sup>+</sup> cells, ultimately resulting in the formation of a central GFP<sup>+</sup> sphere within a retinal neuroepithelium (AcTub<sup>+</sup>) localized on the surface of the organoid (see new Figure 4-figure supplement 1e and new Video S5). These data show that differential adhesive properties of lens/retinal precursor cells can enable the formation of a spherical lens in the center of the organoid. This is now clearly stated in the revised version of the manuscript.

      (2) The optic cup evaginates from the lateral wall of the neuroepithelium of the diencephalon. In zebrafish, cell movement occurs from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder whether the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Looking into the detail of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We provide evidence that the formation of the cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid – retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the “cup-like” shape is acquired by an extrusion-like process of the lens from the center of the organoid.

      To address the cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4d, new Videos S6, S7 and S8). Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) established in the periphery display repeated short distance movements restricted to the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina. In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery. These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4d, new Videos S6, S7 and S8) in the revised version of the manuscript.

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation in day 1-2, whereas lens formation seems to be intact in the presence of an FGF receptor inhibitor in day 0-1. I suggest that the authors examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, which is a downstream target of the ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling (cross-reference to Reviewer #3) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid responds to FGF signaling. To do so, we analyzed the presence of phosphorylated ERK (pERK1/2) as an FGF signaling target in ocular organoids from day 1 to day 2. At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure 6b). Only half a day later (at day 1.5), a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure 6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the size of the organoid core, the dimensions of which were fully comparable to the control (please see Figure 6d).

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What kind of cell-type is occupied in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells, but we currently do not have data that clarify their fate with the required certainty. Rather than speculating, we are following up on this question by scRNA sequencing; however, we consider that to be beyond the scope of the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of Rx3 expressing cell sheet, which covers the central lens core in the organoids? If so, I wonder if total cell number of Rx3 expressing cell sheet may be different in each seeded-cell number, because thickness is the same across each seeded-cell number, but the surface area size may be different depending on underneath the lens core size. Please clarify this point.

      The referee is right, figure 5e indicates the thickness of the cell sheet expressing Rx3 positioned at the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript. We have taken care to remove ambiguities related to that point in the revised version of the manuscript.

      (6) Noggin application inhibits lens formation at day 0-1. BMP signaling regulates formation of lens placode and olfactory placode at the early stage of development. It is interesting to examine whether Noggin-treated organoid expands olfactory placode area. Please check forebrain territory markers.

      What tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question.

      To address this point, we labeled Noggin-treated organoids at day 2 and day 3 with forebrain and olfactory placode markers. We identified an increase in the domains expressing Lhx2, HuC/D and Otx2 in Noggin-treated organoids, indicating a shift towards preferential differentiation of neurons of anterior forebrain identity (see attached figure for reviewer). However, the available markers Lhx2, HuC/D and Otx2, which are found in the olfactory placode, are also expressed in other neuronal cell types of the anterior forebrain. While the speculation is tempting, the shift in expression does not allow us to conclusively state that the olfactory placode is expanded.

      Author response image 1.

      Expression of forebrain and olfactory placode markers.

      I have no minor comments

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable and provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, the pigment epithelium and the optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, under the unconstrained embryo-free environment.

      Limitation:

      The description is adequate, but the analysis is not very deep. More analysis at the molecular and cellular level is needed, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. Need some conceptual advance, which impact cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience communities, perhaps translational/clinical rather than basic biology. To reach beyond these specific fields, the study needs to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process - an "inside-out" mechanism where the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Looking into the detail of how the optic cup-like arrangement of ocular organoids is achieved on the cellular level is indeed highly interesting. In the revised manuscript we now provide evidence that the formation of cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at distinct regions of the organoid – retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the cup-like shape is acquired by an extrusion process of the lens from the center of the organoid.

      To address cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4e, new Videos S6, S7 and S8).

      Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) display repeated short distance movements within the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina.

      In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery.

      These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4e, new Videos S6, S7 and S8) in the revised version of the manuscript.

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      We agree with the reviewer that this is a highly interesting question, and in the revised manuscript we followed the advice and dedicated a part of the discussion to this topic. We believe that the arrangement is due to the induction of central lens fates by signals emanating from the retinal epithelium, and we discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments addressing the target tissues of FGF and BMP signaling in the organoid have been provided in response to Reviewer #1. Interestingly, interfering with FGF signaling, which is essential for lens fiber cell differentiation, did not affect lens size, arguing against an immediate proliferative effect. Although analysis of the respective proliferation rates at the surface or in the central region of the organoid might show some differences, we do not have any indications that the proliferation rate itself would be instructive for, or upstream of, the cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation is primarily dependent on the acquisition/specification of Foxe3-expressing lens placode progenitors. In the absence of Foxe3 expression, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens is formed in unperturbed conditions (measured by the expression of crystallin proteins). Organoids that do not have a lens do not contain Foxe3-expressing cells.

      In the absence of a lens, the organoid is composed of retinal neuroepithelium that does not form an optic cup-like shape (for details of such phenotypes please see Zilova et al., 2021, eLife). We took care to state that clearly in the revised manuscript.

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock). How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observations. Indeed, Matrigel is not required for the acquisition of lens fate, as demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium indeed negatively affects the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel affects the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e). Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions, with and without Matrigel supplementation, Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed very intriguing and currently under investigation. HEPES is mainly used to regulate the pH of the culture media, which on its own might have an impact on multiple cellular processes. Addressing the potential HEPES-triggered molecular mechanisms affecting lens formation (cross-reference with Reviewer #3) will require a significant time investment and goes beyond the scope of the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      We followed the reviewer’s advice and have included a systematic analysis of the contribution of ECM (Matrigel) to the process of lens formation. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observations. Indeed, Matrigel is not required for the acquisition of lens fate, as demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium in turn negatively affects the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel affects the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e).

      Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions (with and without Matrigel supplementation), Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed intriguing and currently under investigation. HEPES is mainly used to adjust the pH of the culture media, which on its own might have an impact on multiple cellular processes. Addressing the potential HEPES-triggered molecular mechanisms affecting lens formation (cross-reference with Reviewer #2) will require a significant time investment and clearly goes beyond the scope of the current manuscript.

      The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      We have taken care to show corresponding stages in embryo and organoid side by side. We provide additional data to highlight the expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens and lens placode) markers at earlier developmental stages. For the presumptive eye field within the region of the anterior neural plate (S16, late gastrula), Rx3 represents one of the earliest markers (see revised Figure 3-figure supplement 1). Even before an apparent lens placode is formed (see revised Figure 3d), Foxe3::GFP expression is detected within the presumptive lens ectoderm, demonstrating that Foxe3 is ideally suited as an early marker for placodal progenitors in medaka. The onset of Rx3- and Foxe3-driven reporter expression is clearly early enough to support the claim of separate origins of the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids, as now represented in the revised figures.

      The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Assessing the activity of BMP and FGF signaling (cross-reference to Reviewer #1) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to BMP and FGF signaling. To do so, we analyzed the presence of phosphorylated SMAD1/5/8 (pSMAD1/5/8) and phosphorylated ERK (pERK1/2) as readouts of BMP and FGF signaling, respectively, in ocular organoids from day 1 to day 2. BMP signaling activity was detected in the center of the organoid (the region where lens-committed progenitors are established (Figure 3e)) at day 1 (see revised Figure 6a). At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure S6b). Only half a day later, a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain, at day 1.5), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure S6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoid, the dimensions of which were fully comparable to the control (please see Figure 6b).

      Regarding the presence of the corresponding ligands, we can state that they are indeed expressed in the organoids at the matching stages based on RNA-seq and RT-PCR analyses; however, we could not localize their expression to a specific region. This may be due to widespread, ubiquitous expression or may simply reflect technical problems.

      While we can state with confidence that the ligands are present at the relevant time points and trigger the downstream pathways in a localized manner, the question whether the response is due to a localized signal or localized competence remains to be addressed.

      The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the shift of the lens in vivo is indeed a very relevant suggestion, and we have taken care to address this in the revised manuscript.

      To clarify this process, we included additional experiments and followed the movements of lens cells (see new Figures 4c, 4d and 4e, new Videos S6 and S7). Foxe3::GFP lens progenitor cells were found to actively move over long distances from the center to the organoid periphery. This movement was accompanied by profound cell shape changes of lens progenitor cells, with the active extension of lamellipodia and filopodia strongly arguing for an active movement of lens cells to the organoid periphery (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their time and consideration of the manuscript. We have added new data to Figure 5 (Figure 5a) to address concerns regarding the conservation of the Hsp70 phosphorylation in yeast. Additionally, we have changed the title of the manuscript to “Hsp70 is phosphorylated in a conserved response to DNA damage and contributes to cell cycle control” to more accurately represent the conclusions we draw.

      Public Reviews:

      Reviewer #1 (Public review):

      The strength of evidence of the mechanistic and "conserved checkpoint" claims that this site is directly activated by DNA damage is inadequate and fundamentally incorrect.

      We respectfully disagree with the reviewer’s characterization of our conclusions. Our data demonstrate that DNA damage induces this phosphorylation in a cell-cycle–dependent manner. We do not claim to have defined the direct kinase or full mechanistic pathway; rather, we establish that site activation is damage-responsive and functionally linked to cell-cycle regulation. Consistent with this, phospho-mutants in yeast exhibit clear cell-cycle defects, supporting a conserved functional role. We address each of the reviewer’s specific concerns below.

      Specific comments:

      (1) Activation of T495:

      The author's premise for the site being activated by DNA damage is Albuquerque et al, where PTMs on MMS treated yeast are analyzed. T492 (the yeast equivalent of human T495) is observed as phosphorylated. However, the authors fail to note that there is no untreated sample analysis in this study, and it is likely that T492 phosphorylation is also present in untreated cells. This is also backed up by later evidence from the same lab (Smolka et al), where they do not identify T492 as being dependent on Mec1/Tel/Rad53 kinases.

      We agree with this assessment of the Albuquerque study. Accordingly, we used their data to generate the hypothesis that this site is phosphorylated, and we took it upon ourselves to more rigorously demonstrate phosphorylation with appropriate controls. The validated antibody that we had previously generated[1] to track pHsp70 was the enabling technology to directly track this phosphorylation event. We now directly show phosphorylation of this site (Figure 5a, lines 276-284). Of note, as Reviewer 1 suggested, there is a smaller amount of pHsp70 in the untreated cells, which corresponds with findings from Holt et al 2009 [2]. This could reflect a baseline role of Hsp70 phosphorylation for normal growth that is accentuated upon MMS insult.

      (2) The kinase(s) directly responsible for T495 phosphorylation are not identified. Instead, the authors show that knockdown or pharmacological inhibition of DNA-PKcs, ATM, Chk2, and CK1 attenuate pHsp70.

      We agree with reviewer 1 that identifying the direct kinase would be an exciting finding, and we believe our manuscript will provide the foundation for future studies to address these questions. While such a finding would be impactful, we do not believe its absence detracts from the observations we have made.

      (3) ATM siRNA knockdown has no effect, while ATM inhibitors do, which the authors acknowledge but do not resolve. This discrepancy raises concerns about off-target drug effects.

      We agree with reviewer 1 that off-target drug effects are always a concern when employing pharmacological inhibitors. To that end, we tested structurally distinct inhibitors of ATM (Figure 3b) to decrease the likelihood of the same off target effect. While complementing this with a genetic knockdown would be ideal, the discrepancies between pharmacological and genetic inhibition of ATM have been well reported (lines 214-216).[3,4] Parallel discrepancies in other kinases have been mechanistically explored by other groups.[5] The preponderance of pharmacological evidence in conjunction with RNAi suggests the most likely interpretation of our data is that ATM is involved in signaling upstream of Hsp70 phosphorylation. Thus, our data compel future work to use more sophisticated genetic methods to more specifically determine how ATM connects with pHsc70.

      (4) No in vitro kinase assays, motif analysis, or phosphosite mapping confirming these kinases as direct T495 kinases are presented. Thus, the proposed signaling cascade remains speculative.

      We agree that we should carefully circumscribe our conclusions about the potential signaling cascade. To communicate our conclusions more clearly, we rewrote lines 223-226 to highlight that our findings implicate these kinases in upstream signaling rather than direct phosphorylation of Hsp70.

      (5) Smolka and many other labs characterized DDR sites as SQ/TQ motifs, and T492 doesn't fit that motif.

      We agree, and our response to comment 4 addresses this point. Briefly, we do not claim that Hsp70 is a direct target for DDR. Notably, the SQ/TQ motifs mentioned specifically pertain to ATM and DNA-PK[6], though we would like to note several studies have demonstrated DNA-PK phosphorylation outside of these motifs.[7] Chk2 and CK1 do not prefer SQ/TQ motifs[9]. Additionally, Chk2 is known to phosphorylate non-consensus sequences as well[10].

      (6) No genetic tests in yeast (e.g., BER mutants) are used to connect Ssa1 T492 phosphorylation to BER in that system, despite the strong BER-centric model.

      We agree that it would be interesting to study BER mutants in yeast, and we believe this will be an exciting prospect for future studies to better establish the signaling cascade. We have included a Western blot (Figure 5a) showing that MMS treatment causes increased Hsp70 phosphorylation in yeast. MMS damage is repaired through BER in S. cerevisiae,[11] and the pathway itself is highly conserved.[12] Our experiments demonstrate that the phosphorylation of Hsp70 occurs as a conserved response to alkylation damage, which is the major conclusion of our paper.

      (7) Overexpression of MPG gives only a modest increase in pHsp70, while APE1 overexpression has no effect, and Polβ overexpression does not decrease pHsp70. These mixed results weaken the central claim that Hsp70 phosphorylation is a tuned sensor of BER burden.

      We appreciate this incisive question. Though not immediately intuitive, we do not believe these results are necessarily ‘mixed’. The absence of an effect of APE1 over-expression could be attributed to APE1 activity being necessary for the phosphorylation, but not rate-limiting. Regarding Polβ, it is important to note that its dRP lyase activity, rather than its binding, is rate-limiting in base excision repair.[13] As such, if binding sites are already saturated or nearly saturated, but the lyase activity remains slow, we may not observe a decrease in BER intermediates. While we do claim that phosphorylation of Hsp70 is triggered by BER intermediates (lines 193-194), we do not claim that pHsp70 is a tuned sensor of BER burden.

      (8) A major concern is that pHsp70 is only convincingly detected after very high, prolonged MMS (10 mM, 5 h) or 0.5 mM arsenite treatments. Other DNA-damaging agents (bleomycin, camptothecin, hydroxyurea) that robustly activate DDR kinases do not induce pHsp70. This suggests to me that the authors are observing a side effect of proteotoxic stress. This is likely (see Paull et al, PMID: 34116476).

      Our data indicate that pHsp70 specifically occurs downstream of base excision repair. Therefore, it is not surprising that drugs that do not activate BER (bleomycin, camptothecin, hydroxyurea) do not elicit the same response. While pHsp70 may arise due to DSBs generated through BER, the fact we do not see phosphorylation after bleomycin treatment could be explained by the cell-cycle dependencies we report (Figure 4e). It is also important to note that MMS-induced pHsp70 occurs primarily in the nucleus, and Western blots of whole cell lysate will contain large amounts of cytosolic Hsp70 that could dilute the signal. Indeed, in our nuclear extraction (Figure 4d), we see faint pHsp70 signal as soon as 1 h after treatment, though it increases in robustness as the time-course progresses. These data are both concordant with a model in which high BER-induced lesion burden in mitosis leads to Hsp70 phosphorylation in late M/G1.

      We would like to add that, in the review article cited by Reviewer 1, the authors specifically cite studies implicating a loss-of-function in DDR pathways leading to increased proteotoxic stress (e.g. ATM-deficient cells producing higher levels of aggregated proteins compared to WT). However, we find that inhibition of DDR kinases decreases, rather than increases, Hsp70 phosphorylation. We thus believe that DNA damage, rather than proteotoxic stress, is the most parsimonious explanation for Hsp70 phosphorylation.

      (9) A recent study in Nature Communications (Omkar et al., 2025) demonstrates rapid phosphorylation of yeast T492 in a pkc1-dependent manner, diminishing the impact of these findings.

      We were excited to see this paper when it was published three months after we posted a preprint on bioRxiv, which was released three weeks after our submission to eLife. Rather than diminishing the impact of this paper, we believe that independent lines of evidence from different groups mutually reinforce the impact of the work. We have added a sentence to say that during the review of our work, this group independently observed this phosphorylation event in response to a different stress (lines 421-423). We believe in celebrating the scientific process arriving at consistent results, and the editorial policies of eLife reinforce that philosophy by offering ‘scoop protection.’

      We would also like to highlight several differences between the scope of our papers. The phosphorylation reported by Omkar et al. appears highly constrained to yeast as part of the Cell Wall Integrity pathway, whereas ours occurs as a more highly conserved response. Additionally, our paper provides biochemical insight into the consequences of this phosphorylation, which is lacking in Omkar et al. If anything, this paper highlights the important regulatory capacity of this residue on Hsp70, and suggests it may serve multiple functions in the cell.

      (2) Downstream Effects of T492/T495:

      (10) The manuscript's central conceptual advance is that pHsp70 is a cell-cycle-regulated brake on G1/S. Yet in mammalian cells, the authors show only that pHsp70 appears late, after cells have traversed mitosis, and that blocking CDK1 (G2/M) prevents its accumulation.

      We would like to clarify the central contribution of this study. Prior work identified this phosphorylation in yeast, but its existence and conservation in human cells had not been established. A primary advance of our study is demonstrating that this site is phosphorylated in mammalian cells and that its accumulation is cell-cycle regulated — coinciding with late M/G1.

      We further show that phosphorylation depends on cell-cycle progression, as CDK1 inhibition prevents its accumulation. While these data establish regulation, we agree that they do not by themselves define causality in mammalian cells. To address functional consequences, we leveraged the genetic tractability of S. cerevisiae. Phosphomimetic Ssa1 T492E increases the proportion of G1 cells in the absence of MMS and enforces a stronger G1 arrest following MMS treatment. Together, these findings support a conserved, cell-cycle–linked role for this phosphorylation and provide a foundation for future mechanistic work in mammalian systems.

      (11) There is no functional test in human cells: no knockdown/rescue experiments with T495A or T495E, no cell-cycle profiling upon altering Hsp70 phosphorylation state, and no demonstration that pHsp70 actually causes any delay in S-phase entry, rather than simply correlating with late damage responses. The strong conclusion that pT495 "stalls cell cycle progression" (e.g., Figure 6 model) is therefore not supported in the human system.

      We agree that we did not directly test the functional consequences of Hsp70 phosphorylation in human cells. Our intent was not to claim that we have demonstrated causality in the mammalian system, but rather to establish that this conserved phosphorylation exists in human cells and is cell-cycle regulated.

      We instead used S. cerevisiae to interrogate this due to its increased genetic tractability. In this system, phosphomimetic mutation increases the proportion of G1 cells under basal conditions and enhances G1 arrest following MMS treatment, mirroring the damage-associated phenotype observed in human cells. These findings support a conserved functional role for this modification, although we agree that direct mechanistic testing in mammalian cells will be important for future work.

      We intended the cartoon model to be a speculative illustration of what may be occurring, in order to motivate future studies. We now see how this may lead to confusion, so to improve clarity, we have removed Figure 6 from the manuscript.

      (12) All functional conclusions rely on T492A/E point mutants at the endogenous SSA1 locus, usually in an ssa2Δ background, in a family of highly redundant Hsp70s. Without showing that this site is actually modified during their MMS treatments, the assignment of phenotypes to loss of a physiological phospho-switch is premature. The authors need to repeat their studies in an Ssa1-4 background, as in https://pubmed.ncbi.nlm.nih.gov/32205407/.

      Thank you for this feedback. We have added a Western blot to Figure 5 (Figure 5a) addressing this comment. Briefly, we show that, in yeast, Hsp70 phosphorylation increases upon MMS treatment and is not detectable in the point mutants in the ssa2∆ background. The latter data suggest that Ssa3-4 modification is negligible in our system.

      (13) The authors infer that T495E "locks" Hsc70 in a pseudo-open state based on reduced J-protein-stimulated ATPase activity, unchanged ATP binding, altered trypsin sensitivity, and retained tau binding. However, there is no direct comparison of phosphorylated vs T495E protein (e.g., via in vitro phosphorylation with LegK4 followed by side-by-side biochemical assays, or structural analysis). Thus, it remains unclear to what extent the glutamate substitution mimics a phosphate at this position.

      We previously showed that phosphorylation impacts the ATPase cycle of Hsp70.[1] In this paper, with the phosphomimetic mutant, we see an even greater decrease in activity. This is consistent with the incomplete phosphorylation obtained in vitro with LegK4.[1] Because of this incomplete phosphorylation in vitro, we determined that the phosphomimetic mutant would be more useful for the assays we performed, as they rely on bulk readouts.

      (14) No client release kinetics, co-chaperone binding assays, or in vivo chaperone function tests are provided, yet the discussion builds a detailed model of a "pseudo-open" state that simultaneously resembles ATP-bound conformation and allows persistent substrate engagement.

      We have shown that the conformational cycle of Hsp70 (T495E) is uncoupled from nucleotide state, and that the overall conformation resembles ATP-bound Hsp70. This is consistent with prior studies on AMPylation of the same residue.[14] Additionally, we demonstrate that substrate engagement is similar between WT and T495E. This is consistent with our previously published work showing increased pHsp70 on polysomes,[1] as well as our observations that the phosphomimetic mutant in yeast exerts a phenotype even in the presence of the compensatory isoform SSA2. This dominant-like phenotype is consistent with those seen in mutations locking Hsp70 in a ‘closed’ conformation.[15] We agree that examining client release kinetics and co-chaperone binding would be useful in future structural studies validating and elaborating on our findings.

      Reviewer #2 (Public review):

      Weaknesses:

      The kinase(s) responsible for the phosphorylation have not been identified (and hence remain inaccessible to experimental i.e., genetic or pharmacological manipulation). The mechanistic links to DNA damage repair and the fitness benefits of this proposed adaptation remain obscure. Of greater concern, the data provided in the paper fail to exclude the trivial possibility that the phosphorylation event described (and characterized through biochemical proxies) is biologically neutral, reflecting nothing more than a bystander event in which kinase(s) activated by application of high concentrations of a powerful alkylating agent (MMS) phosphorylate, at meaninglessly low stoichiometry, an abundant protein (Hsp70) on a surface exposed residue. Failure to exclude this (plausible) scenario is this paper's weakness.

      We agree that we have not directly quantified the absolute stoichiometry of Hsp70 phosphorylation. However, several lines of evidence argue against the interpretation that this represents a biologically neutral, bystander modification.

      First, our pulse-chase experiment (Figure 4e) shows that, after MMS removal, pHsp70 levels increase as cells progress through the cell cycle. Notably, total Hsp70 levels remain constant. This indicates that the fraction of phosphorylated Hsp70 increases in a regulated, cell-cycle dependent manner, rather than through a bystander event during acute stress.

      Second, functional perturbation of the homologous site in yeast produces phenotypic consequences. The phosphomimetic Ssa1(T492E) mutant exhibits reduced growth, increased G1 accumulation, and impaired cell-cycle re-entry following MMS treatment (Figure 5). These phenotypes argue that the modification of this residue is functionally consequential.

      While the upstream kinase remains to be identified, the genetic and cell-cycle phenotypes observed upon site perturbation argue that this phosphorylation is functionally consequential.

      Reviewer #2 (Recommendations for the authors):

      (1) The biochemical characterization of the phosphomimetic mutation (T495E) is thorough, relying on ATPase assays and conformational analysis. Figure 1b demonstrates reduced J-protein-stimulated ATPase activity, and Figure 1d shows an ATP-like proteolysis pattern consistent with an open conformation. As the authors are well aware, Hsp70 chaperones act on their substrates via a dynamic cycle that includes binding, ATP hydrolysis, and conformational shifts. One wonders, therefore, at the relevance of the measurement shown in Figure 1f. While it is highly plausible that the T495E mutation mimics the phosphorylation event (BiP T518E mimics key aspects of AMPylation), the lack of a biochemical characterisation of Hsp70 with pThr495 is an important limitation of this paper. Even if such a preparation cannot be accomplished with the endogenous kinase(s) whose identity remains unknown, a characterisation of LegK4-phosphorylated Hsp70 should suffice.

      We agree with Reviewer 2 that the rationale for Figure 1f does not logically follow from the results of Figures 1b and 1d. Rather, this experiment was motivated by the prior finding that phosphorylation of Hsp70 by L.p. led to increased occupancy on polysomes[1] (lines 137-139). We sought to better understand the discrepancy between this finding and our own by assaying the capacity of the T495E mutant to bind substrate.

      Reviewer 2 raises a valid point in that phosphomimetic proteins do not necessarily behave the same as truly phosphorylated proteins. Previous work from our lab characterized the ATPase activity and in vitro folding capacity of Hsc70 that had been directly phosphorylated by LegK4[1] (lines 114-115). We were motivated to turn to a phosphomimetic mutant because LegK4 only phosphorylates around half of the Hsc70 present in solution[1] (line 116); this mixture of species makes batch analysis difficult. As we had previously published results with the in vitro phosphorylated Hsc70, we did not believe it necessary to include it alongside our subsequent analyses.

      (2) As noted, the kinase(s) that phosphorylate T495 remain to be identified and is inaccessible genetically. The phenotypic consequences of impaired pThr495 are therefore assessed by a T495A mutation. This most certainly eliminates phosphorylation at that site however, Figure 5C shows quite clearly that the T/A mutation is not neutral. This is expected, given the role of an H-bond network centered upon the homologous residue in the ADP-bound configuration of Hsp70's. Importantly, the biochemical non-neutrality of the T/A mutation also compromises the interpretation of the associated phenotype, as this cannot be attributed solely to a loss of phosphorylation; it may reflect features of the T/A mutations exposed by MMS, but unrelated to the inability of the residue to undergo regulated phosphorylation.

      We appreciate this insightful critique. We agree that the alanine substitution may perturb the local H-bond network, and have added a sentence to our discussion to highlight this caveat (lines 379-381). That being said, our conclusions do not solely rely on the T to A mutant. The phenotypes observed in our phosphomimetic mutant overlap with the TA mutant (increased sensitivity to MMS; defects in cell cycle re-entry after MMS treatment) (Figure 5). While the alanine mutation may not represent a purely ‘loss-of-phosphorylation’ state, our findings do implicate the importance of this residue in cell cycle control after DNA damage.

      (3) It thus remains formally possible that pThr495 arises as an irrelevant side reaction due to activation of a kinase (with other relevant substrates).

      This dismal interpretation of the data would be dispelled somewhat if the stoichiometry of pThr495 were substantial, whereas very low stoichiometry of phosphorylation should leave one wary of the possibility that the surface-exposed Thr495 of ATP-bound Hsc70 is a physiologically irrelevant bystander target of a kinase activated in DNA-damaged cells.

      We have included a Western blot in Figure 5 showing pHsp70 in our yeast samples. Here we can see low abundance of Hsp70 phosphorylation in untreated WT yeast, with a clear increase in MMS treated yeast. Additionally, as mentioned in a previous response, Figure 4e shows the accumulation of pHsp70 in human cells even after MMS removal, indicating it is not simply the byproduct of over-activation of the DNA damage response.

      Unfortunately, the study does not quantify the stoichiometry of Hsp70 phosphorylation; detection relies on phospho-specific immunoblotting, leaving open the question of whether this modification occurs at physiologically significant levels. This worry is compounded by Figure 2a,f that suggests that phosphorylation occurs only under high-dose MMS or arsenite, raising concerns about physiological relevance.

      We agree that we did not quantify absolute phosphorylation stoichiometry. While a precise measurement would be informative, our conclusions are based on regulated dynamics and functional perturbations rather than magnitude alone. Specifically, our pulse-chase (Figure 4e) shows that total Hsp70 levels remain constant while pHsp70 increases in a cell-cycle dependent manner following MMS removal. This indicates a regulated modification rather than a side-effect of kinase over-activation during acute stress. Additionally, perturbation of the homologous site produces cell-cycle phenotypes (Figure 5) in yeast, supporting functional relevance.

      As mentioned in our response to Comment 3, our pulse-chase assay in Figure 4e indicates that the stoichiometry of pHsp70 increases after MMS removal in a cell-cycle dependent manner. Furthermore, as discussed in response to Reviewer 1 Comment 8, Figure 4d highlights a technical limitation with regard to detection of pHsp70 by Western blotting. Namely, as pHsp70 accumulates in the nucleus, the signal appears to be diluted by unmodified Hsp70 in the cytosol when whole-cell lysate is probed, thereby reducing detection capacity. It is therefore possible that less stringent doses do lead to phosphorylation, but because the experiments were run in asynchronous cells and on whole-cell lysate, we failed to detect it.
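      The dilution argument can be illustrated with a short back-of-the-envelope calculation. The sketch below is purely illustrative: the function name and every number in it are hypothetical and were not measured in this study. It assumes that the phosphorylated fraction in a whole-cell lysate is the average of the nuclear and cytosolic phosphorylated fractions, weighted by how much of the total Hsp70 pool sits in each compartment.

```python
def phospho_fraction(nuclear_share, nuclear_phospho, cytosolic_phospho=0.0):
    """Fraction of total Hsp70 that is phosphorylated in a whole-cell lysate.

    nuclear_share:     fraction of total cellular Hsp70 located in the nucleus
    nuclear_phospho:   fraction of nuclear Hsp70 that is phosphorylated
    cytosolic_phospho: fraction of cytosolic Hsp70 that is phosphorylated
    """
    for value in (nuclear_share, nuclear_phospho, cytosolic_phospho):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be fractions between 0 and 1")
    # Weighted average over the two compartments.
    return (nuclear_share * nuclear_phospho
            + (1.0 - nuclear_share) * cytosolic_phospho)


# Hypothetical values chosen only to illustrate the arithmetic.
nuclear_share = 0.10    # assume 10% of cellular Hsp70 is nuclear
nuclear_phospho = 0.30  # assume 30% of nuclear Hsp70 is phosphorylated

whole_cell = phospho_fraction(nuclear_share, nuclear_phospho)
fold_dilution = nuclear_phospho / whole_cell

print(f"nuclear extract:   {nuclear_phospho:.1%} phosphorylated")
print(f"whole-cell lysate: {whole_cell:.1%} phosphorylated")
print(f"fold dilution:     {fold_dilution:.1f}x")
```

      Under these assumed values, and with no cytosolic phosphorylation, the fold dilution equals the reciprocal of the nuclear share, so the smaller the nuclear Hsp70 pool, the weaker the whole-cell signal relative to a nuclear extract.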

      Reviewer #3 (Recommendations for the authors):

      Major Comments:

      (1) Figure 1e - Which antibody was used to probe this blot?

      Thank you for catching this omission. This was stained with Coomassie. We have edited the figure legend to reflect this.

      (2) Figure 1c- Do the authors have the data of the WT and T495E with DJA2?

      The assay was performed with increasing concentrations of DJA2 for both constructs (from 0 µM to 4 µM) (lines 118-119, Figure 1c).

      (3) Figure 2- The labeling of the right side of the immunoblots is missing.

      We apologize for the confusion. The labeling is on the left. The lines on the right are intended to demarcate blots that came from the same membrane (for easier comparison of loading controls).

      (4) Figure 2d- Does MMS treatment lead to a heat shock response?

      We have not directly tested this. However, we do not see the massive upregulation of HSPs that would be expected from a heat shock response.

      (5) Figure 4c and e - Total protein level of some of the phospho-proteins is missing.

      We used housekeeping proteins as loading controls. We do not have antibodies for all the non-phospho proteins. For those we have, blots not included in the publication do not show any marked discrepancies between the non-phospho form and the housekeeping proteins.

      (6) Figure S1A- Although the authors suggest that the phosphorylation event is reversible, they have not integrated it into the final model in Figure 6.

      In line 403 we postulate that dephosphorylation may permit client release. In the interest of clarity, we have now removed the model figure.

      (7) Yeast genotype is missing.

      We used W303a yeast (line 612).

      (8) It is unclear which phosphatase inhibitor was used in their assay (Figure S1A).

      We repeated the experiment with both Halt Phosphatase Inhibitor Cocktail (Thermo Scientific 78440) and Roche PhosStop (Roche 04906837001) (lines 524-525).

      (9) Please add this most recent and up-to-date reference (PMID: 40976416) related to your study.

      We have now added that reference.

      (10) Can the authors speculate on whether Hsp70- T495E is expected to primarily reside in the nucleus?

      We have no data to indicate whether or not phosphorylation at T495 or a phosphomimetic mutation at this site would directly affect nuclear import or export. In cells expressing the Legionella kinase LegK4, pHsp70 exists in the cytoplasm,[1] indicating that the phosphorylation in and of itself does not force nuclear localization. We thus imagine that the nuclear localization seen in Figure 4d is more likely due to the location of the kinase rather than a consequence of the phosphorylation. In an over-expression system or in the case of a genomic mutation, we believe the protein is most likely to exist in both the cytoplasm and the nucleus, though we did not directly test this.

      References

      (1) Moss, S. M. et al. A Legionella pneumophila Kinase Phosphorylates the Hsp70 Chaperone Family to Inhibit Eukaryotic Protein Synthesis. Cell Host Microbe 25, 454-462.e6 (2019).

      (2) Holt, L. J. et al. Global Analysis of Cdk1 Substrate Phosphorylation Sites Provides Insights into Evolution. Science 325, 1682–1686 (2009).

      (3) Choi, S., Gamper, A. M., White, J. S. & Bakkenist, C. J. Inhibition of ATM kinase activity does not phenocopy ATM protein disruption. Cell Cycle 9, 4052–4057 (2010).

      (4) Menolfi, D. & Zha, S. ATM, ATR and DNA-PKcs kinases—the lessons from the mouse models: inhibition ≠ deletion. Cell Biosci. 10, 8 (2020).

      (5) Weiss, W. A., Taylor, S. S. & Shokat, K. M. Recognizing and exploiting differences between RNAi and small-molecule inhibitors. Nat. Chem. Biol. 3, 739–744 (2007).

      (6) Kim, S.-T., Lim, D.-S., Canman, C. E. & Kastan, M. B. Substrate Specificities and Identification of Putative Substrates of ATM Kinase Family Members*. J. Biol. Chem. 274, 37538–37543 (1999).

      (7) Jette, N. & Lees-Miller, S. P. The DNA-dependent protein kinase: A multifunctional protein kinase with roles in DNA double strand break repair and mitosis. Prog. Biophys. Mol. Biol. 117, 194–205 (2015).

      (8) O’Neill, T. et al. Determination of Substrate Motifs for Human Chk1 and hCds1/Chk2 by the Oriented Peptide Library Approach*. J. Biol. Chem. 277, 16102–16115 (2002).

      (9) Fulcher, L. J. & Sapkota, G. P. Functions and regulation of the serine/threonine protein kinase CK1 family: moving beyond promiscuity. Biochem. J. 477, 4603–4621 (2020).

      (10) Craig, A. et al. Allosteric effects mediate CHK2 phosphorylation of the p53 transactivation domain. EMBO Rep. 4, 787–792 (2003).

      (11) Xiao, W., Chow, B. L. & Rathgeber, L. The repair of DNA methylation damage in Saccharomyces cerevisiae. Curr. Genet. 30, 461–468 (1996).

      (12) Memisoglu, A. & Samson, L. Base excision repair in yeast and mammals. Mutat. Res. Fundam. Mol. Mech. Mutagen. 451, 39–51 (2000).

      (13) Srivastava, D. K. et al. Mammalian Abasic Site Base Excision Repair: Identification of the Reaction Sequence and Rate-Determining Steps*. J. Biol. Chem. 273, 21203–21209 (1998).

      (14) Preissler, S., Rato, C., Perera, L. A., Saudek, V. & Ron, D. FICD acts bifunctionally to AMPylate and de-AMPylate the endoplasmic reticulum chaperone BiP. Nat. Struct. Mol. Biol. 24, 23–29 (2017).

      (15) Fontaine, S. N. et al. Isoform-selective Genetic Inhibition of Constitutive Cytosolic Hsp70 Activity Promotes Client Tau Degradation Using an Altered Co-chaperone Complement*. J. Biol. Chem. 290, 13115–13127 (2015).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the extent to which phase-amplitude coupling (PAC) of respiratory and electrophysiological brain activity recordings was related to episodes of life-threatening apnoea in human newborns.

      Strengths:

      I want to commend the authors for acquiring unique and illuminating data; the difficulty in recording and handling these data has to be appreciated. As far as I can tell, Zandvoort and colleagues are the first to provide robust evidence for respiration-brain coupling in newborns. Their creative use of the phase-slope index for peripheral-central interactions is innovative and credible. If proven to be robust, the authors' findings have important implications well beyond the field of brain-body research.

      Weaknesses:

      While the analyses were overall competently conducted and well-justified, I was not entirely convinced by a few methodological choices, specifically i) the computation of PAC surrogates, ii) details of the linear mixed-effects model, and iii) the electrode selection for linking phase-amplitude coupling to apnoea frequency.

      Thank you for your kind comments and helpful review of our paper. We have now clarified computation of PAC surrogates, added further details of the linear-mixed effects models and calculated the link between the strength of the cortico-respiratory coupling (phase-amplitude coupling) and apnoea rate with data acquired at all electrodes. We provide further details for each of these in response to your ‘Recommendations for the authors’.

      Reviewer #2 (Public review):

      Summary:

      The authors' central hypothesis was that the strength of cortico-respiratory coupling in infants is negatively associated with apnoea rate. To prove this, they first investigated the existence of cortico-respiratory coupling in premature and term-born infants, the spatial localisation of the cortical activity and its relationship with the phase of the respiratory cycle, and the directionality of coupling.

      Strengths:

      The researchers used synchronised EEG and impedance pneumography to detect the phase amplitude coupling.

      They have studied a wide range of gestations, from 28 weeks to 42 weeks, including males and females. Their exclusion criteria ensured that healthy babies were studied and potential confounders of impaired respiratory activity were avoided. Their sequential approach in addressing the objectives was appropriate.

      Weaknesses:

      As a neonatal clinician and neuroscientist, I have commented based on my expertise. I have not commented on signal processing.

      I did not identify any major weaknesses in the study. Some minor weaknesses include:

      (1) Data relating to the cortical oscillations and the respiratory phase is given. However, whether this would lead to their hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate is unclear. What preceding data enabled the authors to link the strength of coupling to the rate of apnoea?

      (2) If we did not know of data showing the existence of cortico-respiratory coupling in newborn infants, then should it not be the first research question to examine?

      (3) What are the characteristics of the infants who contributed data to establish the cortico-respiratory coupling (Figures 2 and 3)?

      (4) Although it is the most plausible direction of the relationship, with neural activation driving respiratory muscle contraction, how can the authors prove this with their data? Given that they show coherence between signals, how do we know that the cortical signal precedes the respiratory muscle contraction?

      (5) Apgar score is an ordinal variable. The authors should summarise this as median (range).

      Thank you for your useful comments. We have revised the manuscript to address these comments and improve the clarity.

      (1) We agree that the preceding data leading to the hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate are limited. We have clarified in the introduction that adult studies have previously suggested that cortical motor activity may prevent the hypoventilation and apnoea seen in patient groups. We have also added further clarification to our hypothesis. In the introduction we now state:

      “In adults with congenital central hypoventilation syndrome or obstructive sleep apnoea, a respiratory-linked increase in cortical motor activity suggests that the motor cortex plays an important role in maintaining autonomous respiration, with the authors postulating that cortico-respiratory drive whilst participants are awake may prevent the hypoventilation/apnoea observed in these patients whilst they are asleep.”

      And later:

      “We hypothesised that cortico-respiratory coupling occurs in newborns and that the strength of cortico-respiratory coupling is negatively associated with apnoea rate (in line with the suggestions made from studies of adults with congenital central hypoventilation syndrome[6] and obstructive sleep apnoea[7]).”

      (2) We agree that this was the first research question we examined. We have clarified this in the introduction, now re-writing the hypothesis and aims to state “We hypothesised that cortico-respiratory coupling occurs in newborns and that the strength of cortico-respiratory coupling is negatively associated with apnoea rate (…). To this end, we first examined whether cortico-respiratory coupling exists in both premature and term infants.”

      (3) Figures 2 and 3 used the full dataset. We have clarified this in the Figure captions by stating: “For all panels, data included is from 68 infants (28-42 weeks postmenstrual age [PMA] at time of recording) on 104 recording occasions. See Table 1 for further clinical and demographic characteristics.”

      (4) We used a cross-frequency version of the phase-slope index to quantify the directionality and strength of information flow between cortical and breathing time series (Figure 3C,D). The phase-slope index investigates phase lags and how these change over narrow frequency ranges by examining the slope of the phase spectrum of their complex coherency. This indicates whether one signal leads or trails another signal (and thus indicating directionality). However, we agree (and as was also noted by Reviewer 3) that this analysis does not ‘prove’ directionality as other factors may influence the analysis. We have added the following to the text to address this point:

      “However, caution is needed in the interpretation of these results as signal processing techniques such as the phase-slope index estimate directionality but do not confirm causality. Rather, they show a statistical relationship which can be influenced by a multitude of factors (e.g., signal-to-noise ratio and preprocessing steps). Nevertheless, the results suggest that cortical activity may precede respiration in newborns. Future work is needed to confirm this association by, for example, employing other metrics to estimate directionality, such as the time-lagged cross-correlation and Granger causality and through direct mechanistic studies.”
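      For reference, the standard (same-frequency) phase-slope index can be written as below. This is a sketch of the textbook definition only: the cross-frequency variant used in the manuscript is not specified in this response, and the symbols (x for the cortical signal, y for respiration, F for the frequency band, δf for the frequency resolution) are labels assumed here for illustration.

```latex
% Complex coherency between signals x and y, from cross- and auto-spectra
C_{xy}(f) = \frac{S_{xy}(f)}{\sqrt{S_{xx}(f)\,S_{yy}(f)}}

% Phase-slope index over a frequency band F with resolution \delta f
\Psi_{xy} = \Im\!\left( \sum_{f \in F} C_{xy}^{*}(f)\, C_{xy}(f + \delta f) \right)
```

      The sign of Ψ reflects the slope of the coherency phase across neighbouring frequencies, and hence which signal tends to lead. It remains a statistical estimate of temporal ordering, not evidence of causation, which is the caveat stated in the quoted text.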

      (5) We have revised Table 1 so that Apgar scores are provided as median and interquartile range.

      Reviewer #3 (Public review):

      Summary:

      This is a strong and important report that presents a framework for understanding cortical contributions to neonatal respiration. Overall, the authors successfully achieved their goal of linking cortical activity to respiratory drive. Despite the correlational nature of this study, it is a crucial step in establishing a foundation for future work to elucidate the interaction between cortical activity and breathing.

      Strengths:

      (1) The introduction and use of workflows that establish correlational relationships between breathing and brain activity.

      (2) The execution of these workflows in human neonates.

      Weaknesses:

      Interpretations related to causal inference, confounds of sleep and caffeine, and the spatial interpretation of EEG data need to be addressed to ensure that the data appropriately support the conclusions.

      Thank you for your useful comments. We have now substantially revised the manuscript in relation to causal inference and our interpretations of the data, in particular adding further detail to the discussion with regards to the limitations of our approach and revising wording that has causal implications. We provide more detail in response to your ‘Recommendations for the authors’.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I want to elaborate on the three points of methodological criticism, and my apologies in case I have some misconceptions:

      (1) It seems like the surrogate distribution to determine PAC significance was computed by shuffling EEG segments and recomputing PAC each time. Surrogate computations are a difficult topic when handling signals as regular as respiration time series. However, random shuffling of data segments is almost always an overly liberal approach (except for trial-based data) since it destroys the temporal autocorrelation of the underlying signal. As the resting-state data in the present study were per se continuous (and just segmented for analytical purposes), I am not convinced that random shuffling provides an adequate control. Could the authors either a) provide convincing evidence that the temporal autocorrelation of verum and surrogate time series did not differ from one another, or b) conduct additional control analyses based on an alternative approach, e.g., by constructing surrogate respiration phase vectors and recomputing PAC accordingly? We have had good experiences with the IAAFT approach (outlined in Kluger et al., Nat Comms 2023), but others certainly exist.

      Thank you for this important comment on the construction of surrogates. We agree that any surrogate approach must destroy the cross-signal coupling whilst preserving the signals’ internal structure (e.g., autocorrelation, spectral profile, and non-stationarities) as much as possible. We apologise for not describing this more clearly in the initial manuscript, but we want to clarify that in the surrogate analysis we did not shuffle time points or segments within the EEG trials themselves. Instead, we permuted the trial order so that respiration trial T1 was paired with an EEG trial other than T1. This leaves the 4-s segments used in the PAC analysis unaltered. This surrogate technique preserves the important internal properties of each signal (within-trial autocorrelation, auto-spectra and power distribution of the signals) while destroying the cross-signal alignment across trials, and thus the trial-wise phase locking (e.g., coherence) between respiration and EEG. We have clarified this in the manuscript as follows:

      “The surrogate analysis was performed by randomly permuting the trial (4-s segment) order of the EEG amplitude while leaving the respiration trial order unchanged (i.e., respiration segment S1 was paired with an EEG segment Sj, j ≠ 1). Importantly, no temporal samples were shuffled within segments. Thus, the full within-segment temporal structure, including autocorrelation and spectral profile (auto-spectra), was preserved for both signals. This permutation destroys trial-wise cross-signal phase alignment (and therefore coherence) while retaining the intrinsic dynamics of each signal.”
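      The segment-order permutation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the coupling statistic is a simple mean vector length (a hypothetical stand-in for the PAC or coherence measure), every segment is re-paired with a different segment (one reading of "j ≠ 1"), and all function names and the toy data generator are invented for the example.

```python
import cmath
import math
import random


def derangement(n, rng):
    """Random permutation of range(n) with no fixed points (perm[i] != i)."""
    if n < 2:
        raise ValueError("need at least two segments to re-pair")
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def pac_strength(phase_segments, amp_segments):
    """Mean vector length over all samples (hypothetical stand-in for the PAC metric)."""
    total, count = 0j, 0
    for phases, amps in zip(phase_segments, amp_segments):
        for ph, a in zip(phases, amps):
            total += a * cmath.exp(1j * ph)
            count += 1
    return abs(total) / count


def surrogate_distribution(phase_segments, amp_segments, n_surrogates, seed=0):
    """Re-pair whole amplitude segments with other phase segments; samples stay in order."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_surrogates):
        perm = derangement(len(amp_segments), rng)
        repaired = [amp_segments[j] for j in perm]  # segment order changes, contents do not
        values.append(pac_strength(phase_segments, repaired))
    return values


def make_toy_segments(n_segments, n_samples, rng):
    """Toy data (assumed): amplitude peaks at phase 0; start phase and rate vary by segment."""
    phases, amps = [], []
    for _ in range(n_segments):
        start = rng.uniform(0.0, 2.0 * math.pi)
        step = 2.0 * math.pi * rng.uniform(0.6, 0.9) / 25.0  # cycles/s at 25 samples/s
        ph = [start + k * step for k in range(n_samples)]
        phases.append(ph)
        amps.append([1.0 + 0.5 * math.cos(p) for p in ph])
    return phases, amps
```

      Comparing the observed statistic with the surrogate distribution (for example, against its 95th percentile) then gives a significance threshold. Because only the pairing between segments changes, the within-segment autocorrelation and spectrum of each signal are untouched.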

      (2) The LMEM approach is very sound, but it seems like ID was the only random effect included in the model. Could the authors clarify whether multiple sessions from individual neonates were considered or whether each ID was only represented once? In case of the former, one possibility would be to include 'session' as an additional random effect; otherwise, the group statistic could be biased. Many thanks in advance for providing insight on this.

      Thank you for this important point. Of the 68 infants included in the study, 49 had only a single session. The remaining 19 infants had between 2 and 5 sessions included. Given that most infants had only a single session, it is not possible to estimate random effects of session reliably, and so we have not included session as a random factor. Moreover, postmenstrual age [PMA] (which is related to session order within a subject and is likely a more reliable indicator of variance given that sessions were not at fixed intervals) is already included as a factor in the analysis. Indeed, session ID is not a distinct source of clustering and will be indistinguishable from subject and PMA variance.

      In relation to this question, we carefully checked the analysis and realised that we had included infant as a random effect on both slope and intercept. Given that most infants have only one session, the random effect on the slope cannot be estimated, and so we have now removed it from the analysis, leading to very minor changes in the results (and no changes in the interpretation). We have clarified in the manuscript that “Infant ID was included as a random effect acting on the intercept.”
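      The two specifications discussed here can be written schematically as below. This is a sketch under assumptions: y stands for the modelled outcome for infant i at session j, PMA is shown as the only fixed effect for brevity, and the symbols are chosen for illustration; the full set of fixed effects is not listed in this response.

```latex
% Random intercept only (the revised analysis)
y_{ij} = \beta_0 + \beta_1\,\mathrm{PMA}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Random intercept and random slope (the earlier specification)
y_{ij} = (\beta_0 + u_i) + (\beta_1 + b_i)\,\mathrm{PMA}_{ij} + \varepsilon_{ij}
```

      With a single session for most infants, the per-infant slope deviation b_i cannot be separated from the intercept deviation and residual terms, which is why the slope term was dropped.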

      (3) It is not entirely clear to me why the authors selected the two electrodes with the strongest overall PAC for the analysis of apnoea frequency. Why not consider all electrodes individually? What is the worry/hypothesis regarding electrodes with low PAC - would one not expect simply to find no relationship with apnoea frequency, and would that information not be instructive? Again, I want to thank the authors in advance for their take on this comment.

      We initially included only the two electrodes with the strongest coupling as we would not expect a relationship with apnoea rate at those electrodes without significant coupling (as you say). For completeness, we have now included the relationships with all electrodes individually in Supplementary Figure S4. As expected, the relationship between apnoea rate and coupling (coherence) was not significant for the electrodes without strong coupling.

      Reviewer #3 (Recommendations for the authors):

      Major Comments:

      (1) Causal Language and Overinterpretation are evident throughout the manuscript. The manuscript repeatedly uses language suggesting causality (e.g., "cortical motor activity reduces apnoea"), despite the correlational nature of the findings.

      It is recommended that the authors reframe their claims in the abstract and discussion to clarify that the observed associations do not establish causal influence. For example, Abstract: "...revealing novel mechanistic insight....". This correlational observation does not reveal a mechanism but rather supports the concept of mechanistic interactions.

      Thank you for this important point. We have now rephrased the manuscript throughout, particularly in the abstract and results/discussion. We have also added the following sentences to the discussion to address the point on causation:

      “Nevertheless, it is important to recognise that a limitation of this analysis is that correlation does not imply causation, and future mechanistic studies are required to determine whether and how cortico-respiratory coupling plays a role in reducing apnoea in infants.”

      And later:

      “The limitations of our study need to be considered, and in particular, directionality of the cortico-respiratory coupling, improved spatial localisation, and a direct mechanistic link between cortico-respiratory coupling and apnoea rate, should be investigated in future work.”

      (2) Potential Confounding by Sleep State and Caffeine. Sleep state is a significant determinant of apnoea occurrence and EEG frequency composition, yet no objective sleep-state classification is incorporated. Similarly, caffeine, administered in ~50% of recordings, is a potent respiratory stimulant. A reanalysis of the data, incorporating sleep proxies (e.g., EEG spectral ratios, delta/theta dominance) and caffeine exposure as covariates or stratification factors in the PAC-apnoea models, should be performed.

      Sleep state: A limitation of our work is that we did not record sleep state and unfortunately, we do not have anyone trained to annotate sleep states from EEG recordings in our research group. We have added the following to the discussion to address this:

      “It is known that most apnoeas in infants occur during active sleep[6][30] and delta- and theta-band frequencies in EEG are strongly related to sleep state[31]. A limitation of our study is that we did not record the sleep state of the infant.”

      Caffeine: We agree that caffeine is a respiratory stimulant and, hence, it is important to consider this effect. Moreover, those infants prescribed caffeine are those who are at greatest risk of apnoea and so it is of interest to determine whether the relationship between PAC and apnoea rate occurs in those infants receiving caffeine treatment. We conducted a stratified analysis to address this point, now providing an additional Supplementary Figure.

      (3) Directionality Inference from Phase-Slope Index. While PSI suggests a lead-lag relationship, it does not confirm causality and may be influenced by signal-to-noise or preprocessing steps. Validate PSI findings using additional metrics (e.g., time-lagged cross-correlation or Granger causality) or, at a minimum, temper interpretations of cortical "driving" respiration.

      We agree that the PSI (and other metrics such as Granger causality) may be influenced by a range of factors. We have therefore changed the wording throughout and also added the following:

      “However, caution is needed in the interpretation of these results as signal processing techniques such as the phase-slope index estimate directionality but do not confirm causality. Rather, they show a statistical relationship which can be influenced by a multitude of factors (e.g., signal-to-noise ratio and preprocessing steps). Nevertheless, the results suggest that cortical activity may precede respiration in newborns. Future work is needed to confirm this association by, for example, employing other metrics to estimate directionality, such as the time-lagged cross-correlation and Granger causality and through direct mechanistic studies.”

      (4) Limited EEG Spatial Resolution. The attribution of CRC to "cortical motor areas" is overstated, given the use of only 8 EEG electrodes, which provides limited spatial coverage. Avoid overly precise interpretations regarding cortical localization unless source localization or higher-density EEG data are available.

      We have added the following to specifically address this limitation.

      “It is important to note that the number of electrodes in our montage is limited (with only 8 recording electrodes), and so source localisation was not possible; higher-density recordings are warranted to confirm whether the motor cortex plays a role.”

      We have also changed the wording in the summary paragraph and abstract to add this limitation and reworded throughout the manuscript to highlight the limitations of our study.

      Minor Comments

      (1) Consider color-coding individual points in Figure 4A by PMA or caffeine status to visually disambiguate potential age-related or pharmacological effects.

      We agree that this provides additional visual information and have colour-coded the points in Supplementary Figure S6 according to caffeine status.

      (2) Clearly define PAC versus CRC. These are used interchangeably. Readers may benefit from a more consistent and precise usage, especially in the abstract.

      Thank you for noticing this. We have revised the terms where necessary throughout, and the abstract and introduction to read:

      “Using simultaneous electroencephalography (EEG) and impedance pneumography we investigated interactions between cortical and respiratory activity (known as cortico-respiratory coupling) using phase-amplitude coupling.”

      “Recently, it was proposed that communication between the cortex and lungs, known as cortico-respiratory coupling, can be identified and quantified through phase-amplitude coupling. This functional coupling arises when the amplitude of cortical activity is modulated by the respiration phase, or vice versa. Phase-amplitude coupling is typically quantified using non-invasive recordings capturing respiratory and neural activity (e.g., magneto- or electroencephalography [EEG]).”

      (3) Clarify the overlap with previously published datasets (line 358). Are any EEG-apnoea associations re-analyses of data published in Zandvoort et al., 2024?

      We have amended this sentence to explain that the previous study did not investigate respiration/apnoea. We now state:

      “Parts of this dataset have been reported earlier in Zandvoort et al. [33] to address a different research question (this study investigated the development of sensory-evoked potentials, which were also recorded in these infants; it did not explore respiration).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Joint Public Review:

      Summary:

      The authors investigate how stochastic and deterministic factors are integrated in cell fate decisions, using Dictyostelium discoideum as a model system. They show that cells in different cell cycle phases (a deterministic factor) are predisposed to different fates, albeit with deviations, when exposed to the same environmental stimulus. However, gene expression variability (a stochastic factor) enhances the robustness of cellular responses to environmental cues that disrupt the cell cycle.

      Using a simple, tractable mathematical model, the authors demonstrate that cell fate decisions in D. discoideum depend on a combination of deterministic and stochastic factors, i.e., cell cycle phase and gene expression variability, respectively. They then identify Set1 - a key regulator of gene expression variability - indicate the mechanism through which it modulates this variability, and link it to a phenotype in D. discoideum development. Finally, they confirm that gene expression variability contributes to the robustness of the cell's response to environmental disruptions that interfere with the cell cycle.

      Strengths:

      The authors are careful in the choice of their experiments and in measuring gene expression variability, using methods that account for expected trends with average gene expression.

      Weaknesses:

      However, in terms of mathematical modelling, it would be important to rule out sources of stochasticity (other than gene expression variability), and also to consider cases where stochastic factors are not necessarily completely independent of the deterministic ones.

      We thank you and the reviewers for the insightful comments that have helped clarify the findings presented. We have addressed all comments and feel that the revised manuscript is much improved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Minor typographical mistakes:

      (a) in the title: Linage -> lineage

      Corrected as suggested

      (b) on page 19: use a full stop in "...are biased towards the stalk fate, Use of the cell cycle position..."

      Corrected as suggested

      (c) on page 20: become -> becoming in "...(and end up biased towards become stalk)..."

      Corrected as suggested

      (d) on page 16: "mu = G p k". Perhaps it should be x instead of k?

      Corrected as suggested

      (2) Regarding the abstract:

      (a) This work tries to outline general principles (coordination/integration of deterministic and stochastic factors) in cell fate choice, especially when cells are faced with (near) identical environmental conditions. Perhaps the abstract, especially the first line, could be rephrased to reflect the generality of symmetry breaking and differentiation that is studied in this article/work. e.g., as was done in the first paragraph of the discussion.

      Corrected as suggested

      (b) It might be worthwhile clarifying what "this" is in the sentence "We suggest this represents an adaptive mechanism that increases developmental robustness against perturbations that affect deterministic signals." in the abstract.

      Corrected as suggested

      (3) Regarding the model:

      (a) The model tries to combine the stochastic and deterministic parts to explain the propensity for stalk fates. It is assumed that the cell cycle-associated factors (CCAF) provide the deterministic part while the cell cycle-independent factors (CCIF) provide the stochastic part. The net result is an addition of the two, which is then compared against a threshold to decide the propensity for stalk fates. However, another simple way to introduce stochasticity would be to make the CCAF decay stochastic. Reasons to consider this scenario would be: (i) the decay process (especially in the biological context) is generally stochastic, (ii) it would not be inconsistent with the fact that cell cycle dependent genes are also variable, and (iii) this way of introducing stochasticity would also provide expression level characteristics/plots similar to the ones outlined in Figure 1C, i.e. with a probability distribution of CCAF values for a given amount of time after mitosis. Would there be arguments or experimental evidence to rule this possibility out? For instance, would the results shown in Figure 7 contradict this model?

      We agree that there could be stochasticity in the CCAF decay process. In this scenario, the expected value of CCAF (which would reflect the mean of a noisy distribution) would show a deterministic pattern of decay through time, representing the average value of CCAF across cells that are in the same phase of the cell cycle. The noisiness around such a pattern of deterministic decay in the mean value of CCAF (i.e., the residual variation) would then represent CCIF since it would be, by definition, cell-cycle independent. Hence, the present model is fully consistent with this possibility since it would still lead to some variation being cell-cycle associated and some variation being cell-cycle independent. Therefore, this scenario could be viewed as a different functional/biological process leading to the same ultimate distribution we model. To clarify this, we have added text justifying the hypothesis that the noisy distribution is due to gene expression differences, rather than decay itself:

      “Protein levels can vary widely between cells because they are regulated at multiple levels, including transcription, translation and stability. The position of the noisiest step in a pathway affects the overall noise dramatically, because each step usually amplifies noise in the previous steps (Alon 2007). Consistent with this idea, theory and single-cell experiments have shown that a major contributor to cell-cell variation is the bursty expression of low-copy mRNAs. We therefore hypothesized that this noisiness across cells arises from stochastic expression of a set of genes contributing to CCIF levels.”
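      The additive threshold logic discussed in this exchange can be sketched as follows. This is an illustration under assumptions, not the manuscript's model: the exponential form of the decay, the parameter values, and the function names are hypothetical, and which fate corresponds to exceeding the threshold is deliberately left open.

```python
import math
import random


def ccaf(t, initial=1.0, decay_rate=0.5):
    """Deterministic cell-cycle-associated level at time t after mitosis (exponential form assumed)."""
    return initial * math.exp(-decay_rate * t)


def prob_above_threshold(t, threshold=0.5, ccif_sd=0.2):
    """P(CCAF(t) + CCIF > threshold) when CCIF ~ Normal(0, ccif_sd**2)."""
    z = (threshold - ccaf(t)) / ccif_sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


def simulate_fraction_above(t, n_cells=20000, threshold=0.5, ccif_sd=0.2, seed=1):
    """Monte Carlo version: draw one Gaussian CCIF value per cell and count threshold crossings."""
    rng = random.Random(seed)
    level = ccaf(t)
    hits = sum(1 for _ in range(n_cells) if level + rng.gauss(0.0, ccif_sd) > threshold)
    return hits / n_cells
```

      In this sketch the cell-cycle position sets the mean of the summed signal and the Gaussian term sets the spread around it, so cells at the same cell-cycle position can still fall on either side of the threshold. Treating noise in the decay itself as the residual around the mean would give the same picture, which is the point made in the response.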

      (b) On page 7, the formula for total CCIF variance assumes independence of the genes g_i. Is this a reasonable assumption?

      This concerns the argument that a set of stochastically expressed genes will yield an approximately Gaussian distribution of CCIF. Our results do not depend on the solution for the mean and the variance, only on the fact that noisy genes will generally yield such a Gaussian distribution. This is because independence is not strictly required for the central limit theorem to yield a Gaussian distribution. The distribution will still be Gaussian under a broad range of conditions (especially since gene expression is bounded, so there is no chance of the total ending up generating an infinite variance). The primary requirement is that the expression of any given gene is independent from that of most other genes. As a result, most of the variation in expression across genes is independent (even if any given gene is not independent from all other genes).

      The most likely pattern of non-independence will be the case in which gene expression is ‘modular’, where there are co-expressed blocks, meaning that non-independence is limited in scale so that genes within a co-regulated block show correlated expression, but their expression is uncorrelated to genes in other blocks. This pattern is functionally analogous to what is known as m-dependence in sequences of random variables (e.g., time series), where variables close together in sequence are correlated (but otherwise uncorrelated). Derivations of the central limit theorem have shown that the means (and hence the sum) of these sorts of variables still follow an approximately Gaussian distribution over a broad range of scenarios. In the case of non-independent gene expression, this means that we can view the independent random variable as being the expression value of a group of co-expressed genes (instead of individual genes). Hence, the means (or sums) of these values will still conform to the central limit theorem.

      This problem is addressed in:

      Diananda, P. H. 1955. The central limit theorem for m-dependent variables. Proc. Cambridge Philos. Soc. 51:92-95

      Hoeffding, W. & H. Robbins. 1948. The central limit theorem for dependent random variables. Duke Math. J. 15:773-780

      Orey, S. 1958. A central limit theorem for m-dependent random variables. Duke Math. J. 25:543-546

      Rosén, B. 1967. On the central limit theorem for sums of dependent random variables, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 7:48-82

      To clarify this, we have added the following text and references:

      Although this derivation implicitly assumes that stochastically expressed genes are independent, this assumption is not strictly required for the distribution of CCIF to be approximately normal. If stochastically expressed genes show clustered co-expression owing to shared regulation, then the sum across these co-expressed blocks is still expected to be approximately normally distributed (as long as there is a reasonably large number of co-expressed clusters) (Diananda 1955; Hoeffding and Robbins 1948; Rosén 1967).
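
      The block argument above can be illustrated with a small simulation. This is a minimal sketch, not the authors' model: the function names, the number of blocks, the block size, and the expression probability are all hypothetical choices made for illustration. Genes within a block share one on/off draw (so they are perfectly correlated), blocks are independent of each other, and the total across all genes is checked for roughly Gaussian behaviour.

```python
import random
import statistics


def simulate_total_expression(n_blocks=200, genes_per_block=5, p=0.3, rng=None):
    """Total number of expressed genes when genes in a block switch on together.

    All parameter values are illustrative assumptions, not taken from the manuscript.
    """
    rng = rng or random.Random()
    total = 0
    for _ in range(n_blocks):
        # One shared draw per block: genes inside a block are perfectly
        # correlated with each other and independent of every other block.
        if rng.random() < p:
            total += genes_per_block
    return total


def sample_skewness(values):
    """Standardised third moment; close to 0 for a symmetric, Gaussian-like sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
totals = [simulate_total_expression(rng=rng) for _ in range(5000)]

mean_total = statistics.fmean(totals)
sd_total = statistics.pstdev(totals)
# For a Gaussian, about 68% of draws fall within one standard deviation of the mean.
within_1sd = sum(abs(t - mean_total) <= sd_total for t in totals) / len(totals)
skew = sample_skewness(totals)

print(f"mean={mean_total:.1f} sd={sd_total:.1f} within_1sd={within_1sd:.3f} skew={skew:.3f}")
```

      In this toy setting the within-block correlation inflates the variance of the total relative to fully independent genes, but the shape of the distribution stays close to Gaussian because the number of independent blocks is large.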

      (4) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells":

      Regarding the statement: "We first determined the coefficient of variation (CV<sup>2</sup>) of expression for all genes. As expected, this tends to decrease as average expression level increases (Supplementary Figure 2).":

      It would be good to specify how the "expected variation" was calculated exactly. For instance, it was hard to discern from Supplementary Figure 2 how CV^2 decreasing with average expression levels was used in the calculation of expected variation.

      This is described in the methods on page 38

      “A trend line was fitted to the data using non-linear least squares regression (Scran v1.15.9). Genes were defined as variable (2073 genes) based on a one-sided test assuming a normal distribution around the trend but one where deviation changed depending on the mean expression of a given gene (Scran v1.15.9 - modelGeneCV2) with a FDR of < 0.05.”

      (5) In section "Stochastically expressed genes are associated with cell fate determination"

      (a) For readers unfamiliar with the organism ‘Dictyostelium discoideum’, a short description of its life cycle with growth and development/differentiation phases would be useful to provide the right context.

      Corrected as suggested

      (b) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells", it was shown that cell cycle dependent genes are also highly variable (in other words, ‘stochastic’). It would, therefore, be useful to elaborate on the definitions of "stochastically expressed genes, cell cycle-associated genes, and non-variable genes", as used in this section. Admittedly, the distinction does get clearer towards the last section of Results, but some elaboration here would make the reading smoother.

      Corrected as suggested

      (c) If the "cell cycle associated genes" are the same as "cell cycle dependent genes", it would be good to use one term consistently.

      Corrected as suggested

      (d) The developmental index is divided into 10 bins from 0 to 1. Is there a rationale for the choice of a number of bins? Would this choice affect significance tests for "stochastic" vs others? (The same question may apply to the "Cell type index")

      Significance is robust to the number of bins chosen (e.g. 5-25). Of course, if there are too many bins (low number of genes) or too few bins (addition of noisy data) significance falls. In the case of developmental index, our choice of bins is also based on previous analyses (de Oliveira, et al 2019), which developed the index we used, and showed that a threshold of >0.9 can be used to identify ‘developmentally expressed genes’.

      (6) In Figure 5:

      (a) Does the statement "*** binomial test, p<0.01." (as seen in caption for part C) actually refer to part D?

      Corrected as suggested

      (b) Could the authors please specify what "mis-expressed" means in Figure 5D? Are these genes that are upregulated, downregulated, or both? From what set of genes was the random sampling done?

      Corrected as suggested

      (c) In Figure 5F, is the decrease in CV^2 explained entirely by the increase in mean (as shown in Figure 5E)?

      We appreciate the point made by the reviewer and recognise that disentangling changes in gene expression variation from changes in expression levels is extremely difficult (any changes in burst frequency will necessarily affect expression level). However, we do not think this affects our conclusions, which are supported by results with representative Set1-dependent reporter genes (Figure 5G and H). These suggest that the number of cells expressing (rather than the expression level in each cell) is affected, in these cases at least.

      (7) In Figure 6A: Could the authors please elaborate on the difference between the rows labelled "WT" and "set1-"? Are they two different types of chimera?

      Corrected as suggested

      (8) In Section "Cell cycle position and gene expression variation interact to control cell type proportioning":

      Is there a graph corresponding to the statement "However, the level of GFP expression in each responding cell did not significantly change."?

      Corrected as suggested

      (9) In section "Influence of stochastic variation on sensitivity to cell cycle perturbations" of the Supplementary text:

      (a) The model for cell cycle bias is not entirely clear. For instance, is the quantity N(t) = U(t) + Q_t U(t) also a probability distribution, like U(t) is? If so, there must be a normalization factor. It was difficult to understand the procedure behind this calculation. Perhaps some more elaboration (with words or a small schematic) on this model/method would help.

      U(t) was originally used to denote the uniform probability density function, but for clarity this has been changed to follow the convention that U[a,b] denotes the uniform distribution over the interval from a to b (in this case, U[0, 1]), while f(t) now denotes the probability density, with f(t) = 1 across the interval. Because the uniform density necessarily integrates to 1 over the defined range, it does not need to be normalised. The confusion here perhaps arises from interpreting the expression f(t) = 1 as the probability of sampling a particular value of t (in a continuous distribution, probabilities can only be defined over an interval) rather than as the probability density over the interval from a to b, where f(t) = 1/(b – a); hence, over the interval from 0 to 1, f(t) = 1.

      To help clarify this issue, this section has been rewritten and a new figure (which appears as Supplementary Figure 12) has been added that illustrates the resulting probability density functions for biased sampling from the cell cycle.
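
      The distinction between a density and a probability can be written out explicitly. The following is a short worked statement for the unbiased uniform case only; the symbol T for the sampled cell cycle position and the interval endpoints t_1 and t_2 are notation assumed here for illustration, not taken from the manuscript.

```latex
f(t) = \frac{1}{b-a} \quad \text{for } a \le t \le b,
\qquad
\int_a^b f(t)\,dt = 1,
\qquad
P(t_1 \le T \le t_2) = \int_{t_1}^{t_2} f(t)\,dt = \frac{t_2 - t_1}{b-a}.
```

      With a = 0 and b = 1 this gives f(t) = 1 and P(t_1 ≤ T ≤ t_2) = t_2 − t_1, so the density equals 1 everywhere on the interval while the probability of any single point remains zero.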

      (b) References to Figure 8A, B seem to be indicating Supplementary Figure 12 instead. 

      Corrected as suggested

      Reviewer #2 (Recommendations for the authors):

      This manuscript seems quite interesting, but many sections are so unclear that I cannot follow what has been done. I would suggest slowly going through the manuscript and carefully explaining things. This will probably considerably increase the size of the manuscript, but many sections are too terse to follow even after many, many readings of the Results and figure legend.

      Corrected as suggested

      Some specific comments (this is not at all comprehensive, but rather illustrative)

      Page 2 - 'genes strongly associated with fate choice' - can you explain this a bit more - genes associated with one cell type or another, or genes that somehow regulate the choice?

      Corrected as suggested

      Page 2 - this abstract is quite vague, I would suggest being more specific to reflect what is in the manuscript.

      Corrected as suggested

      Page 3 - 'exhibit bivalent H3K4me3..' please explain 'bivalent' a bit more.

      Corrected as suggested

      Page 7 - 'Bernoulli process with probability that (meaning that is scaled to the size of the temporal interval)' (non-copying symbols deleted) could be simplified.

      Corrected as suggested

      Page 7 - please define all variables/ equation components. What is N? What is x bar? What is s2? The middle paragraph is very difficult to follow.

      This paragraph has been rewritten and a definition of the distribution added for clarity.

      Page 7 - 'genes might logically vary in the value of pi, such variability does not impact our results.' Trying to decipher this paragraph, it seems that pi is a function of time, so this could affect the results.

      pi is the probability that a stochastically expressed gene is actually expressed in whatever interval is being considered for all genes. pi will necessarily increase if the time interval considered is increased. The key point is we are considering the probability that any given gene is expressed in the same time interval. In this case, genes could vary in pi, and thus some burst more often and others less often.

      Page 9 - '(it is 98.35 times more likely' there may be too many significant figures here.

      Corrected as suggested

      Page 10 - for the Area Under the Receiver Operating Characteristic Curve (AUROC), what are you classifying? AUROC is typically used for diagnostic tests to determine how well the test can discriminate between two completely different outcomes. What is the input, and what are the outcomes?

      Corrected as suggested

      Figures:

      What are the dashed lines in Figure S2A?

      Corrected as suggested

      What are the X-axes in Figure S3?

      Corrected as suggested

      I do not understand what you are showing in Figure S3.

      Corrected as suggested in results

      In Figure 2B, I cannot find in the text or figure legend any description or explanation of 'Group 1', 'Group 2', or 'Group 3'.

      Corrected as suggested

      Figure 3D needs a lot more explanation; I cannot understand this based on the text and the figure legend.

      Corrected as suggested

      The Set1 work should discuss the work in PMID: 39242621

      Corrected as suggested

      Figure 8 D needs a size bar

      Corrected as suggested

    1. Author response:

      The following is the authors’ response to the original reviews

      Public review:

      Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGN, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      We thank the reviewer for this comment. We have added more detailed information about the methods to clarify our procedure. In addition to the original description of our threat learning paradigm in humans, we added the following to pages 39-40:

      “Experimental procedure

      Threat learning: Please see the original description in the manuscript.

      Shock level: The shock intensity used in the fear learning paradigm was determined during a pre-experiment calibration. Electrodes were attached to the participant’s right hand, and stimulation began at a low level (0.1 mA), gradually increasing in small increments. After each increment, participants verbally rated their discomfort. The procedure continued until the participant identified a level they described as “highly annoying but not painful.” This individualized intensity was then used for that participant throughout the experiment. For safety and ethical reasons, the maximum intensity was capped at 20 mA, and no participant received a shock above this limit.

      Instructions to the participants: Each visual stimulus in our paradigm was first shown to participants for 6 seconds. This initial presentation served as habituation, allowing us to isolate the responses to genuinely new stimuli. Before the experiment began, participants were informed that they would see pictures illuminated with different colored lights, such as red or blue. During the experiment, some pictures might be paired with an electric shock, while others might not. Participants were instructed to pay attention to whether a specific color or pattern was associated with the shock. These instructions were adopted from previous studies in which our group developed this paradigm and found them highly effective for human learning. We therefore used the same approach in the current experiment. These instructions were provided throughout all phases of threat learning, and participants were informed that any shocks delivered would be at the same intensity determined on Day 1.”

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

      We thank the reviewer for this suggestion. We agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results in Figure 7). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across thalamic nuclei, rather than the distribution of individual data points per se. For this purpose, bar plots with standard errors provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated the thalamus activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar - potentially a novel discovery, but that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      We acknowledge the limitations of fMRI studies and agree with the reviewer that causal claims cannot be made based on correlational neuroimaging evidence. Accordingly, we revised the text to reduce causal interpretations. Specifically, we reworded the sentence identified by the reviewer in the Results section and systematically updated language throughout the manuscript.

      Page 9: “At the block level, both the anterior pulvinar and MD showed increased activation to CS+ vs. CS− (anterior pulvinar: t<sub>(292)</sub> = 4.41, p = 0.00001, d = 0.25; MD: t<sub>(292)</sub> = 6.41, p = 5.83x10<sup>-10</sup>, d = 0.37; Fig. 1b–c), suggesting a possible involvement of these regions in early associative threat learning.”

      Throughout the manuscript, we replaced terms such as “reflects” with “likely reflects” and “indicating” with “consistent with,” and introduced explicitly correlational phrasing where appropriate (e.g., “apparently,” “closely align,” and “seems to”). All revisions are highlighted in green in the revised manuscript.
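
      As a quick consistency check on the effect sizes quoted above, the reported t statistics can be converted to a standardised effect size. This is a sketch under an explicit assumption: that the CS+ vs. CS− contrast is a paired (within-subject) comparison and that d is the paired-samples d_z = t / sqrt(n), with n = df + 1. The response does not state which variant of d was used, and the function name below is hypothetical.

```python
import math


def cohens_dz_from_t(t_value, df):
    """Paired-samples effect size d_z = t / sqrt(n), assuming n = df + 1.

    Assumption: a within-subject (paired or one-sample) t-test; other
    definitions of Cohen's d would give different values.
    """
    n = df + 1
    return t_value / math.sqrt(n)


# t statistics and degrees of freedom as quoted in the revised Results text.
d_pulvinar = cohens_dz_from_t(4.41, 292)
d_md = cohens_dz_from_t(6.41, 292)

print(f"anterior pulvinar d_z = {d_pulvinar:.3f}")
print(f"MD d_z = {d_md:.3f}")
```

      Under this assumption both values land close to the reported d = 0.25 and d = 0.37, which suggests the quoted statistics are internally consistent.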

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      We thank the reviewer for highlighting this important pattern in Trial 3. The CS+ vs. CS− contrast in the third trial in the mediodorsal thalamus remained statistically significant after FDR correction and was correctly reported in the Supplementary Tables. However, we acknowledge that the statistical marker was inadvertently omitted from Figure 1. We have now corrected the figure to include the appropriate significance annotation.

      In addition, we now explicitly describe the attenuation of the CS+ vs. CS− difference by the third trial in the pulvinar but not in the mediodorsal thalamus (page 32):

      “Rapid initial acquisition of the predictive value of the CS+ is thought to be pronounced during the first two trials. The attenuated CS+ vs. CS− differentiation on the third trial, specifically in the pulvinar, may reflect a decreased requirement for differential thalamic engagement once the initial association has been acquired or an initial survival fear reaction has been expressed. Notably, the MD sustained the BOLD response to the CS+ in the third trial, which may indicate involvement of this nucleus in the consolidation or stabilization of the learned association. This aligns with the well-established MD-PFC circuit involved in cognitive processes (Wolff and Halassa, 2024). Additionally, in a previous study using a similar paradigm, we observed sustained CS+ vs. CS− differentiation on the third trial in the nucleus reuniens as well (Tuna et al., 2025). These findings suggest that trial-dependent learning dynamics may vary across thalamic nuclei rather than reflecting a uniform thalamic learning signal. Together, while our paradigm does not inherently distinguish between different stages of learning, such as early acquisition and stabilization, our findings are consistent with stronger associative learning–related engagement during the first two trials, with a reduced differential response by the third trial that may reflect the involvement of different neural processes.”

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Figure 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with the visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar subnuclei, but there are few strong excitatory projections between these subnuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      We thank the reviewer for this insightful comment. We agree that the network analysis in Figure 3 does not provide a direct anatomical account of pulvinar connectivity and cannot distinguish between direct inter-nuclear interactions and indirect coupling mediated via corticothalamic and thalamocortical pathways, including visual cortex.

      Our intention with this analysis was to characterize functional statistical dependencies among pulvinar divisions during conditioning, rather than to infer monosynaptic anatomical connectivity. Accordingly, the observed network structure should not be interpreted as evidence for direct excitatory projections between pulvinar subnuclei.

      We agree that including visual cortical regions in the network would substantially increase model complexity and could alter the inferred network structure. However, doing so would require a trial-wise, multiregional modeling framework that goes beyond the scope of the present intra-pulvinar analysis.

      In response to this comment, we have now explicitly clarified the assumptions, interpretational limits, and alternative explanations of the network model in the Discussion (page 33):

      “Yet, these intrapulvinar relationships should be understood as a functional and computational model, reflecting statistical dependencies among pulvinar divisions during threat learning, rather than as evidence of direct monosynaptic anatomical connections. Because detailed inter-nuclear anatomical connectivity within the pulvinar remains incompletely characterized, our analysis does not presuppose strong direct excitatory projections between subnuclei. Instead, our findings are intended to highlight candidate functional relationships within the pulvinar during conditioning with different levels of data processing, rather than to provide a definitive anatomical map.”

      We also included the following in the Limitations and Future Directions section (page 36):

      “The observed relationships among pulvinar divisions during conditioning are purely functional and do not distinguish direct inter-nuclear interactions from indirect coupling mediated by corticothalamic and thalamocortical pathways, including visual cortical regions. Thus, the pulvinar model may reflect indirect cortical loops, weak or currently undocumented inter-nuclear interactions, or a combination of both.”

      Finally, we added this note to the legend of Fig. 3:

      “Note: The functional relationships among pulvinar divisions during threat learning should be interpreted as computational dependencies derived from statistical associations. These effects may reflect indirect interactions mediated by corticothalamic and thalamocortical pathways (e.g., via visual cortex), rather than direct inter-nuclear connectivity. Elucidating the underlying anatomical mechanisms will require future studies.”

      (3) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      We thank the reviewer for this suggestion. In this study, each phase (conditioning, extinction, recall, and renewal) was analyzed separately to characterize thalamic function within that specific phase. Our primary conclusions focus on differences between CS+ and CS− within each phase, rather than comparisons across phases or sessions. Direct statistical comparisons across phases were therefore not performed, as they fall outside the scope of our main hypotheses.

      We have clarified this in the revised manuscript to make the rationale for our analytic approach explicit. Added to page 8:

      “The purpose of this study is to investigate thalamic function during each learning phase separately, focusing on CS+ vs. CS− differences within phases rather than comparing activation across phases. This phase-specific approach allows us to characterize thalamic functional dynamics within each stage of learning and memory, avoiding potential confounds arising from the distinct processes of conditioning, extinction, and recall.”

      (4) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      We thank the reviewer for this point. Reciprocal connections between the visual cortex and the pulvinar are established, but the precise projections to specific pulvinar divisions remain unknown. We have added a note to the Figure 8a caption to clarify this (Figure 7a in the original version).

      “Note (panel a): Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      (5) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      We thank the reviewer for this comment. We aimed to integrate our empirical findings within a broader conceptual framework to provide a complementary narrative, rather than presenting isolated observations without connecting them to theoretical context. This approach is intended to strengthen the interpretive value of the study while remaining grounded in primary data.

      (6) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      We thank the reviewer for raising this important point. We fully agree that the fMRI BOLD signal reflects large-scale changes in population activity and may fail to capture more subtle or distributed neural representations. Accordingly, the absence of a CS+ vs. CS− BOLD difference should not be interpreted as evidence that a region is not involved in discriminating these stimuli. Rather, our inferences are limited to differences in aggregate activation at the spatial and temporal resolution of fMRI.

      To partially address this limitation, we analyzed anatomically defined thalamic subregions; however, we acknowledge that finer-scale subdivisions and cell-type-specific processing likely exist that are not currently resolvable in human fMRI. Such distinctions may be better investigated using invasive recordings or circuit-level approaches in rodents or non-human primates. This limitation has now been explicitly acknowledged in the Limitations section of the manuscript (page 36):

      “Pulvinar divisions, MD, and LGN each contain diverse neuron subtypes and finer anatomical subdivisions that may serve distinct functions. Importantly, the absence of CS+ vs. CS− differences in BOLD activity should not be interpreted as a lack of stimulus-specific processing, as such distinctions may occur without changes in overall activation detectable by fMRI…”

      (7) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings - this is not a criticism, as these experiments and subject matter are extremely complex.

      We thank the reviewer for this constructive suggestion. In response, we have revised the discussion to present our interpretations as possible or plausible explanations, rather than definitive conclusions, to better reflect the strength of the current evidence. The changes are marked in green throughout the Discussion section.

      This study continues to validate the power and utility of this human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It's possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

      Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, they examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning - automatic survival mechanisms versus more deliberate processes - and to propose a hierarchical pulvinar model of fear conditioning. They also try to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      (a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      We thank the reviewer for raising this point. While the spatial resolution of fMRI (3 mm isotropic) does limit voxel-wise examination of very small nuclei, our analyses were not performed at the single-voxel level. Instead, signals were extracted using anatomically defined masks for each thalamic nucleus, which is a standard and widely used approach for studying small subcortical structures with fMRI. This strategy increases signal-to-noise ratio and mitigates partial-volume effects by aggregating activity across voxels belonging to the same anatomical region.

      (b) fMRI was normalized to standard space. Analyzing the data in individual-subject space would have given you the options of avoiding altering every participant's brain and of using a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      We thank the reviewer for pointing out the availability of specialized thalamic atlases. In our study we used the Automated Anatomical Labelling Atlas 3 (AAL3), which includes thalamic subdivisions (including pulvinar and other nuclei) among its 150+ whole-brain regions and is widely used for ROI extraction in normalized fMRI analyses. This choice allowed us to define consistent ROIs across the entire brain, such as the amygdala and hippocampus, within the same parcellation framework and to extract functional signals at the resolution of our preprocessed fMRI data.

      While histology-informed probabilistic atlases offer finer microanatomical segmentation of the thalamus, they are implemented primarily for structural segmentation pipelines (e.g., FreeSurfer) and do not change the fact that AAL3’s thalamic subdivisions are established and anatomically reasonable ROIs for functional studies at standard fMRI resolutions. AAL3 thus provides a practical and valid choice for our whole-brain activation and connectivity analyses.

      (c) On top of the two previous points, the authors decided to smooth the data to 6mm, which means that every single voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (if they followed the standard SPM12 normalization resampling default which resamples, or upsamples the data in this case, to 2 x 2 x 2mm). Given the strong changes in structural connectivity and function that can occur, especially in the thalamus, on voxels of this size, this and the previous 2 decisions do not favor anatomical precision.

      We thank the reviewer for raising this concern regarding anatomical precision. The data were resampled to 2 × 2 × 2 mm resolution in SPM12, and a 6 mm FWHM Gaussian smoothing kernel was applied. Gaussian smoothing does not uniformly mix immediately adjacent voxels; rather, it applies distance-weighted averaging with a standard deviation of approximately 2.55 mm (FWHM = 2.355σ). At 2 mm resolution, this corresponds to ~1.3 voxels, meaning that signal contribution decreases smoothly with spatial distance rather than reflecting simple voxel averaging. Moreover, all statistical analyses were conducted at the ROI level using anatomically defined masks, rather than voxel-wise inference within nuclei.
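
      The kernel arithmetic above can be checked with a short sketch. The FWHM-to-σ relation is the standard Gaussian identity; the helper names `fwhm_to_sigma` and `relative_weight` are hypothetical, and the weights are relative to the kernel centre along one axis.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM to its standard deviation (FWHM = 2*sqrt(2*ln 2)*sigma)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def relative_weight(distance_mm, sigma_mm):
    """Gaussian weight at a given distance, relative to the weight at the centre."""
    return math.exp(-distance_mm ** 2 / (2.0 * sigma_mm ** 2))

sigma_mm = fwhm_to_sigma(6.0)   # about 2.55 mm for a 6 mm FWHM kernel
sigma_vox = sigma_mm / 2.0      # about 1.3 voxels on a 2 mm grid
# Relative contribution of voxels 0, 1, 2, and 3 grid steps away along one axis.
weights = {d: relative_weight(d, sigma_mm) for d in (0.0, 2.0, 4.0, 6.0)}
```

      By construction the weight falls to one half at a distance of FWHM/2 (3 mm here), so contributions decay smoothly with distance rather than mixing neighbouring voxels equally.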

      To empirically assess whether smoothing may have introduced boundary-driven spillover effects, we divided the mediodorsal (MD) thalamus into medial and lateral divisions and examined the CS effect separately in each. The CS effect did not differ between subdivisions (MD subdivision × CS interaction: F<sub>(1, 292)</sub> = 0.50, p = 0.48).

      Additionally, across trials, the CS+ vs. CS− effect was observed in both subdivisions and showed comparable magnitudes (see Author response image 1). The effect sizes were also comparable across MD divisions (see Author response table 1).

      Author response image 1.

      Mean activation in MD subdivisions during threat learning

      Author response table 1.

      Point estimates and 95% confidence intervals of effect sizes (Cohen’s d) for CS+ vs. CS− contrasts in MD, MDm, and MDl During Early Threat Learning

      If smoothing had artificially driven the MD effect via boundary spillover, one would expect consistent asymmetry or substantially larger effects in one subdivision relative to the other. Instead, the CS effect was distributed across both medial and lateral MD, supporting the interpretation that the observed activation reflects intrinsic MD signal rather than smoothing-related contamination.

      (d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      Our analyses are within-subject, so each participant serves as their own control, minimizing the impact of motion differences across conditions. Functional data were preprocessed with fMRIPrep 20.0.2, which estimates motion parameters. The motion estimations are included in the GLM to account for residual motion-related variance in SPM12. The connectivity analyses were conducted in CONN, which also includes these motion parameters as regressors and applies additional denoising steps to further reduce motion-related effects. Together, these procedures make it highly unlikely that motion systematically influenced the observed condition differences.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      We thank the reviewer for this important comment. We have now explicitly reported the number of participants and trials contributing to each analysis throughout the manuscript, including the main text, figure captions, and supplementary materials.

      Specifically, under Materials and Methods (page 38), we now clarify the sample sizes for each learning phase:

      “We analyzed fMRI data from 293 participants during fear conditioning, 320 during extinction, 412 during extinction recall, and 312 during threat renewal.”

      In addition, all figure captions now report the corresponding sample sizes and trial numbers. For example, the caption to Figure 1 (pages 7–8) states:

      “…Block-level comparisons were assessed using paired t-tests, while trial-level effects were examined using a 2 × 2 repeated-measures ANOVA, followed by post hoc comparisons between CS+ and CS− across four trials. Multiple comparisons were controlled using false discovery rate (FDR) correction. Conditioning sample size: n = 293. Detailed statistical parameters are provided in Supplementary Tables 1–2.”
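
      As an illustration of the FDR step mentioned in the caption, the sketch below applies the Benjamini-Hochberg procedure to a set of post hoc p-values. The response does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four trial-level CS+ vs. CS- comparisons.
raw_p = [0.001, 0.02, 0.03, 0.04]
adj_p = benjamini_hochberg(raw_p)
```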

      (3) It is not clear either, why, given the large sample size, some of the results were not conducted using reproducibility strategies such as dividing the sample into 2 or 3 groups or using further cross-validation strategies.

      Cross-validation strategies were applied to the mediation analyses, which are regression-based and can be sensitive to extreme values or overfitting, ensuring that observed effects generalize beyond the sample. In contrast, the repeated-measures ANOVA tests within-subject condition differences and is inherently robust to between-subject variability. For these inferential tests, cross-validation or sample-splitting is not typically applied.

      However, following the reviewer’s recommendation, we conducted a cross-validation analysis focusing on the anterior pulvinar and the mediodorsal thalamus, the primary regions of interest in this study. The full sample (N = 293) was randomly divided into three subsamples (n<sub>1</sub> = 106, n<sub>2</sub> = 91, n<sub>3</sub> = 96). For each iteration, we conducted a repeated-measures ANOVA (RM-ANOVA) within one subsample and then examined the stability of the CS+ vs. CS− difference in the remaining two subsamples combined. The CS+ vs. CS− difference was statistically significant in most folds for both the mediodorsal thalamus and the anterior pulvinar. Importantly, effect sizes were comparable across folds within each nucleus, indicating stable estimates of the CS effect.

      Finally, we observed a comparable pattern of CS+ vs. CS− differences at the trial level in both the mediodorsal thalamus and the anterior pulvinar. Critically, the effect sizes of these differences were stable across most cross-validation folds.
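
      The sample-splitting scheme can be sketched as below. Only the subsample sizes (106, 91, 96) and the one-versus-remaining-two structure come from the text above; the seed, the integer participant IDs, and the function name `split_sample` are assumptions made for illustration.

```python
import random

def split_sample(ids, sizes, seed=0):
    """Randomly partition ids into non-overlapping subsamples of the given sizes."""
    ids = list(ids)
    if sum(sizes) != len(ids):
        raise ValueError("subsample sizes must add up to the sample size")
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    rng.shuffle(ids)
    folds, start = [], 0
    for size in sizes:
        folds.append(ids[start:start + size])
        start += size
    return folds

# Hypothetical participant IDs 0..292; sizes as reported in the response.
folds = split_sample(range(293), sizes=(106, 91, 96))

# Each iteration pairs one subsample with the other two combined.
iterations = []
for k, fold in enumerate(folds):
    rest = [pid for j, other in enumerate(folds) if j != k for pid in other]
    iterations.append((fold, rest))
```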

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm. (just one example: in the analysis reported in Figures 1-2; are there other correlations between the activation of the anterior pulvinar and MD with other pulvinar nuclei? only the MD–anterior Puv is reported).

      We thank the reviewer for raising this important point. We would like to clarify that the analyses were not limited to a single, selectively reported association. The relationship between the MD and the anterior pulvinar was evaluated while explicitly accounting for other pulvinar subdivisions, as well as for thalamic input outside the pulvinar.

      Specifically, potential contributions from other pulvinar nuclei were controlled by including them in the regression model (Fig. 2 in the manuscript), and the LGN was included as an additional control region. These analyses therefore test whether the MD–anterior pulvinar association is specific, rather than reflecting a more general thalamic or pulvinar-wide effect. With respect to hypothesis testing, the study was explicitly hypothesis-driven, grounded in functional evidence motivating a specific prediction about MD–anterior pulvinar interactions.

      Still, in response to the reviewer’s suggestion, we further examined pairwise relationships among thalamic subregions. Specifically, we assessed the association between the MD and each pulvinar subdivision using partial correlations, controlling for the remaining pulvinar subdivisions in each analysis. For example, the partial correlation between the MD and the lateral pulvinar was computed while controlling for the activation of the anterior, inferior, and medial pulvinar subdivisions.

      The partial correlation between the MD and the anterior pulvinar was consistent across all four trials of threat learning, whereas the other pulvinar subdivisions did not exhibit a consistent pattern. To evaluate the robustness of these effects, we applied a bootstrap procedure (10,000 resamples) to estimate 95% confidence intervals for each partial correlation. As presented in Figure 4b, only the anterior pulvinar–MD association remained robust, with confidence intervals that did not include zero. In contrast, the confidence intervals for most other pulvinar subdivisions included zero, indicating non-robust associations.

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      We thank the reviewer for this constructive suggestion. While the original manuscript already discussed key limitations in the Discussion section (page 36; e.g., “Although distinct thalamic roles in threat learning have been proposed, fMRI data do not fully capture the complexity of this structure…”), we agree that these considerations would benefit from clearer organization and visibility.

      To address this point directly, we have now added a dedicated “Limitations and Future Directions” subsection to the manuscript. This subsection explicitly summarizes the principal limitations of the study—including methodological constraints of fMRI and anatomical resolution—and outlines specific avenues for future research to address them. This change makes the limitations more transparent and allows readers to more easily incorporate them into their evaluation of the findings.

      (6) Data should be made available to the scientific community. Code too. Even if you just used standard fMRI toolboxes, any code used to run analyses will be helpful to the community, or if someone decides to try to replicate your findings.

      We thank the reviewer for this important suggestion and fully agree with the value of data and code sharing for transparency and reproducibility.

      The data supporting the findings of this study are derived from a larger, actively used database that is currently involved in ongoing projects. For this reason, the full dataset cannot yet be publicly released. However, the data underlying the reported analyses are available upon reasonable request from the corresponding author, subject to standard data-use agreements.

      To facilitate reproducibility, all analysis scripts and pipelines used in this study, including preprocessing and analysis workflows implemented in SPM12 and CONN, are available upon request and can be shared with researchers seeking to replicate or extend the reported findings.

      We have clarified this data and code availability statement in the manuscript (page 46).

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the road for further testing of the functional dynamics among these regions and circuitries, and modeling testing.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editors for this important note. Full statistical reporting, including test statistics, degrees of freedom, exact (raw and corrected) p-values, effect sizes, and 95% confidence intervals, is provided for all key analyses in Supplementary Tables 1–9. In addition, uncertainty estimates and major statistical tests are now explicitly reported throughout the main text, as recommended by the reviewers, irrespective of statistical significance.

      During this revision process, we conducted a comprehensive internal consistency check of all reported statistics and figure annotations. We identified and corrected minor discrepancies between some statistical annotations in the figures and the corresponding results reported in the Supplementary Tables. All figures have now been updated to ensure full consistency with the reported analyses. These corrections do not alter the results or conclusions of the study.

      Reviewer #1 (Recommendations for the authors):

      (1) What is the significance of using two different head coils? Were the data comparable from each coil? How did the authors determine this?

      We thank the reviewer for this important question. Data were acquired using two different receiver head coils across participants. Receiver coils primarily influence signal-to-noise ratio (SNR) and spatial sensitivity profiles, rather than the physiological basis of the BOLD response itself (Triantafyllou et al., 2011).

      Importantly, all analyses were based on within-subject contrasts (CS+ vs. CS−), which are robust to global signal scaling differences that may arise from coil sensitivity variations. In addition, standard preprocessing procedures—including intensity normalization, spatial normalization, and nuisance regression—further minimized potential coil-related variability.

      To empirically evaluate whether acquisition differences influenced our results, we conducted a repeated-measures ANOVA testing the Trial × CS × Site interaction (where Site reflects acquisition location and associated scanning setup, including receiver coil configuration) during fear conditioning (N = 293). As shown in Author response table 2, none of the thalamic nuclei demonstrated a significant interaction effect, and all effect sizes were negligible (η<sub>p</sub><sup>2</sup> ≤ .01).

      Author response table 2.

      Repeated-measures ANOVA results for the Trial × CS × Site interaction across all relevant thalamic nuclei during fear conditioning.

      (2) Why were the data smoothed? This could have a negative impact on the specificity of the signals averaged within the pre-defined thalamic ROIs.

      Spatial smoothing was applied to improve signal-to-noise ratio and statistical stability in small, deep thalamic subregions, which are particularly susceptible to noise. We acknowledge that smoothing can reduce spatial specificity. However, our analyses were based on anatomically predefined thalamic ROIs and focused on average activation within each region rather than voxel-wise localization. Under this approach, modest smoothing (i.e., a 6-mm full-width at half-maximum smoothing kernel, rather than the commonly used 8-mm kernel) primarily increases reliability while any signal mixing across adjacent regions would be expected to reduce regional specificity and bias effects toward the null, rather than produce spurious or false-positive differences.

      Additionally, we conducted robustness analyses to examine whether spatial smoothing artificially influenced our results. Specifically, we subdivided the mediodorsal thalamus into medial and lateral anatomical regions and compared activation across these subregions. The activation patterns were comparable across both subdivisions, indicating that the observed mediodorsal thalamus effect is unlikely to reflect boundary spillover resulting from smoothing. If smoothing had driven the effect, we would expect differential signal patterns across the subdivisions rather than comparable activation. (See full response to Weakness C, Reviewer 3, as well as Author response image 1 and Author response table 1 in our response).

      (3) Did the authors consider using any null models to determine whether the observed PPI results could have been observed by chance? E.g., block-resampling nulls scramble temporal order while preserving temporal autocorrelation, and can determine whether subtle differences in autocorrelation across regions can give rise to the observed signatures.

      We thank the reviewer for this thoughtful suggestion. All PPI analyses were conducted using the default CONN toolbox pipeline. In this framework, PPI effects are estimated within a GLM at the first level following standard denoising procedures that reduce motion- and physiology-related variance and apply temporal filtering. Importantly, PPI effects are modeled as subject-level contrast terms rather than computed from raw timeseries correlations.

      Group-level inference was performed on these subject-level contrast estimates using paired t-tests with FDR correction across regions. To further assess whether the observed effects could arise by chance, we additionally performed 10,000 bootstrap resamples of the CS+ vs. CS− differences to evaluate the stability of the effects. While we did not implement explicit block-resampling null models that preserve temporal autocorrelation, the combination of first-level GLM modeling following denoising, large sample size (N ≈ 300), and convergent inferential and resampling procedures provides a rigorous and standard assessment of PPI effects. We have revised the manuscript to clarify these procedures and their rationale.
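
      The bootstrap check can be illustrated with a percentile bootstrap of the mean CS+ minus CS− difference. This is a minimal sketch under stated assumptions: the input values are made up, the seed and function name are hypothetical, and a percentile interval is assumed because the response does not specify the interval type. As noted above, resampling participants in this way does not preserve temporal autocorrelation and is not a block-resampling null.

```python
import random

def bootstrap_mean_ci(differences, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean paired difference."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n = len(differences)
    # Resample participants with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choices(differences, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[round(alpha / 2 * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical per-participant CS+ minus CS- contrast values (not study data).
differences = [0.3 + 0.1 * ((i % 7) - 3) for i in range(42)]
ci_low, ci_high = bootstrap_mean_ci(differences)
excludes_zero = ci_low > 0 or ci_high < 0
```

      An interval that excludes zero indicates that the direction of the effect is stable under resampling of participants.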

      We added this language to directly address the reviewer’s concern and revised the connectivity analyses section to clarify the workflow (page 44):

      “Following standard denoising procedures—including regression of motion- and physiology-related confounds and temporal filtering—condition-dependent connectivity effects were inferred from subject-level generalized psychophysiological interaction (gPPI) contrast estimates rather than from raw timeseries correlations. This GLM-based framework reduces the likelihood that observed PPI effects reflect differences in temporal autocorrelation or spectral properties across regions rather than genuine task-dependent interactions.”

      (4) The authors may wish to report results in text, as there are currently many demonstrative statements that are not associated with requisite uncertainty estimates, making inference challenging.

      We thank the reviewer for this helpful suggestion. We have revised the Results section to explicitly report statistical outcomes in the main text for all key findings, including appropriate uncertainty estimates (e.g., test statistics, effect sizes, and p-values) alongside demonstrative statements. This ensures that all inferences in the text are directly supported by quantitative evidence.

      Additionally, the full statistical details, including test statistics, degrees of freedom, effect sizes, 95% confidence intervals, and both raw and FDR-corrected p-values, are provided in Supplementary Tables 1–9. These changes improve clarity and transparency while avoiding redundancy. Newly added text in the Results section is highlighted in green.

      (5) I could not find any information about the EBICglasso model in the Methods section, nor information about how the centrality measures were estimated. Given the lack of transparency, I recommend down-weighting the often overly-strong language regarding the conclusions of this analysis.

      We have revised and added these details along with other details to the Statistical tests section on pages 42-44:

      “Statistical tests

      All statistical tests were conducted using JASP versions 0.18.3 and 0.19.3 (JASP Team, 2024).

      Activation Differences across all phases of threat learning

      In each threat learning phase, we used paired t-tests to examine the differences in activation of the thalamic nuclei in response to CS+ vs. CS− at the block level (average activation across trials), and a 2 × 2 RM-ANOVA to estimate the differences in activation at the trial-wise level. Assumptions of sphericity were checked, and Greenhouse-Geisser corrections were applied where necessary. This model was followed by post hoc tests to estimate the differences at the trial level, and false discovery rate (FDR) correction was applied for each question.

      Network analyses of the within pulvinar relationships during conditioning

      The network analyses examined functional relationships between pulvinar divisions. Nodes corresponded to block-level activation estimates of the CS+ minus CS− contrast for each pulvinar division, yielding four nodes (one per division). Networks were estimated using a Gaussian graphical model with EBICglasso (LASSO regularization) based on Pearson correlation matrices, with the EBIC tuning parameter set to γ = 0.5. Edge weights represent partial correlations.

      Three centrality measures were computed on the estimated weighted partial-correlation network: node strength, defined as the sum of the absolute edge weights directly connected to a node; closeness, defined as the inverse of the average shortest path length from a node to all other nodes; and betweenness, defined as the proportion of shortest paths between all pairs of nodes that pass through a given node. Shortest paths were computed using inverse edge weights, consistent with standard practice for weighted networks. Centrality indices were normalized.

      Network accuracy and centrality stability were assessed using nonparametric bootstrapping (10,000 iterations) to estimate confidence intervals for edge weights and centrality measures. All analyses were conducted in JASP (versions 0.18.3 and 0.19.3) using default settings unless otherwise specified, following the procedures described in Epskamp, Borsboom, and Fried (2018).

      Mediation analyses of within pulvinar relationships during conditioning

      Mediation models of the relationships between the activations in pulvinar divisions were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. All variables were z-standardized prior to analysis. Block-level activation estimates from the inferior and lateral pulvinar were entered as predictors, activation in the medial pulvinar was specified as the mediator, and activation in the anterior pulvinar was specified as the outcome variable.

      To assess the robustness and generalizability of the mediation effects, we conducted 3-fold cross-validation. The full sample (N = 293) was randomly partitioned into three non-overlapping sub-samples (n = 91, 96, and 106). In each iteration, the mediation model was estimated in one sub-sample, while the remaining sub-samples were used to assess the stability of parameter estimates and indirect effects. This procedure resulted in six cross-validation iterations, allowing evaluation of whether the direction and magnitude of the indirect effect were consistent across independent subsets of the data. Mediation models were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. Indirect effects were evaluated using bias-corrected percentile bootstrap confidence intervals based on 10,000 resamples, as recommended by Biesanz, Falk, and Savalei (2010). An indirect effect was considered significant when the 95% confidence interval did not include zero (p < 0.05).”
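
      To make the centrality definitions quoted above concrete, the sketch below computes node strength and closeness on a small weighted network. It is an illustrative sketch only: the edge weights are invented, the function names are hypothetical, the values are left unnormalized, and betweenness is omitted for brevity. It does not reproduce the JASP implementation.

```python
import math

def node_strength(weights, nodes):
    """Sum of absolute edge weights attached to each node."""
    return {
        a: sum(abs(w) for (u, v), w in weights.items() if a in (u, v))
        for a in nodes
    }

def closeness(weights, nodes):
    """Inverse of the average shortest-path length, with distance = 1 / |weight|."""
    dist = {a: {b: (0.0 if a == b else math.inf) for b in nodes} for a in nodes}
    for (u, v), w in weights.items():
        if w != 0:
            dist[u][v] = dist[v][u] = 1.0 / abs(w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    n = len(nodes)
    return {a: (n - 1) / sum(dist[a][b] for b in nodes if b != a) for a in nodes}

# Hypothetical partial correlations between pulvinar divisions (not study results).
nodes = ["anterior", "medial", "lateral", "inferior"]
edges = {
    ("anterior", "medial"): 0.5,
    ("medial", "lateral"): 0.25,
    ("lateral", "inferior"): 0.5,
}
strength = node_strength(edges, nodes)
close = closeness(edges, nodes)
```

      In this toy chain the two middle nodes have the highest strength and closeness, which is the sense in which a division can be described as central to the network.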

      (6) Bar plots are not effective ways to report group-level data. I recommend replacing all bar plots with visualisations that expose the distribution of the data, such as a violin plot or a raincloud plot.

      We thank the reviewer for this suggestion. In general, we agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across conditions, rather than the distribution of individual data points per se. For this purpose, bar plots provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      (7) The thought bubbles are atypical of scientific figures.

      The figure has been revised to remove the thought bubbles.

      (8) Figure 7 - there are many connections not shown in this figure, suggesting that it is sufficiently oversimplified as to be potentially misleading. For instance, the authors offer no anatomical connections between pulvinar and the cortical hierarchy; however, these connections are ample and (likely) highly important for the functionality assessed here. Similarly, there is no room in the figure for the integration of the shock stimuli (presumably via the spinothalamic tract) and the visual stimuli (via the retina/LGN).

      We agree that the pulvinar has extensive cortical and sensory input/output connections that are not depicted in Figure 7. Our intention was not to provide a complete anatomical wiring diagram, but rather a simplified functional model derived from observed statistical dependencies. We have revised the figure and added an explicit note to the legend clarifying that pulvinar–cortical and sensory pathways (e.g., retina/LGN and spinothalamic inputs) are intentionally omitted due to incomplete subnuclear-level anatomical characterization, and that their omission should not be interpreted as a lack of importance. We added this to the Figure 7 legend:

      “Note (panel a):

      Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      Reviewer #2 (Recommendations for the authors):

      (1) It's somewhat confusing that Figures 1,4,5 D and E are not in the text until later in the results section. Perhaps these should be presented in the figures in the same order they are discussed in the text, although this is a stylistic issue.

      We thank the reviewer for this comment. To improve clarity and align the figures with the structure of the Results section, we reorganized the figures. Specifically, we added a new figure (Figure 7) that consolidates all connectivity analyses. Figures 1, 4, and 5 now focus exclusively on activation results, while Figure 7 presents connectivity results only. This reorganization allows the figures to follow the flow of the text more closely and makes the narrative of each figure clearer.

      (2) Stylistic: I would strongly recommend adding n numbers and describing the basics of statistical tests used and how multiple comparisons were accounted for in the legend for Figures 1,4, and 5.

      We thank the reviewer for this recommendation. We have added the sample sizes (n) and brief descriptions of the statistical tests used, including how multiple comparisons were handled, to the legends of Figures 1, 4, and 5. In addition, we direct the reader to the Supplementary Tables, which were submitted with the original manuscript and provide full statistical details, including test statistics (t, F), degrees of freedom, effect sizes, 95% confidence intervals, raw p values, and corrected p values. Finally, we further elaborated on the statistical tests on pages 42–44, as detailed in our response to Recommendation 5 (Reviewer 1).

      Reviewer #3 (Recommendations for the authors):

      As previously indicated, please note that no information is included in the manuscript about data and code availability. Although you mainly use toolboxes for data analyses, any script(s) that you have used to run things would be great to upload for reproducibility purposes.

      Also, it would be good to include a limitations subsection in the manuscript.

      Thank you for these recommendations. We added a limitations subsection to the manuscript. See our responses under Comments 5 and 6 (Reviewer 3, Public Review).

      In terms of data analyses:

      (1) It would be ideal if you quantify in-scanner motion for the different conditions to see if there were no differences in motion due to the task.

      Head motion was estimated at each time point as part of standard preprocessing, and motion parameters were included as nuisance regressors in all first-level models. Because motion estimates are defined per volume rather than per experimental condition, condition-specific motion metrics were not explicitly computed. Importantly, this approach removes motion-related variance uniformly across the time series and therefore controls for potential motion effects across all task conditions. Any residual motion would be expected to increase noise rather than systematically bias condition contrasts.

      (2) You also may want to indicate if normalization followed the SPM 12 default and the data was resampled to 2 x 2 x 2 mm, or kept the same. It is not stated in the data preprocessing subsection of the methods.

      We thank the reviewer for this suggestion. We have now clarified this point in the manuscript (page 41):

      “In addition, spatial normalization was performed with data normalized to Montreal Neurological Institute (MNI) space and resampled to a 2 × 2 × 2 mm<sup>3</sup> voxel grid, followed by spatial smoothing with a 6-mm full-width at half-maximum Gaussian kernel.”

      (3) It is important to indicate how many subjects went into each analysis. Also, it is not clear, based on the current methods section, how many observations per condition were used. That can be reported in the text or the figures.

      We thank the reviewer for this comment. This information has now been added to the Methods section and the relevant figure legends, as described in our response to Comment 2 (Reviewer 3, Public Review).

      References

      Triantafyllou C, Polimeni JR, Wald LL. 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55:597–606. DOI: https://doi.org/10.1016/j.neuroimage.2010.11.084, PMID: 21167946

    1. Author response:

      eLife Assessment

      This manuscript reports an important study in which the authors apply smFRET imaging to probe HIV-1 Env conformational dynamics in the presence of antibodies. Previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Through the cutting-edge application of smFRET imaging, the study provides convincing insights into the mechanisms of action of relevant antibodies.

      We appreciate this positive assessment and thank the reviewers for their time and constructive comments. We will make the following changes in the revised manuscript.

      (1) Clarify the distinction between suppression efficiency and functional cost.

      (2) Add controls: smFRET experiments in the presence of monovalent 10E8.4 and iMab individually and compare results with the bivalent 10E8.4/iMab that we currently have.

      (3) Increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling.

      (4) Add discussion on conformational populations probed by smFRET versus structural analyses, Env conformational heterogeneity, ligand effects, and how these approaches complement each other.

      (5) Further clarify the assignments of multiple conformational states by smFRET, the heterogeneity of Env spikes and virion morphology by cryoET, and the focus of the current smFRET-focused storyline.

      Please find below our provisional responses to the public reviews. We will provide detailed point-by-point responses upon submission of the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have considered a panel of antibodies that target epitopes at the gp120/gp41 interface (8ANC195 and PGT151), the fusion peptide in the gp41 domain (VRC34), and the MPER region of gp41 (DH511.2_K3 and VRC42). They also investigate 10E8.4/iMab, which is an engineered bispecific antibody that targets the MPER and the CD4 receptor. On a technical note, they have applied a double amber codon-readthrough strategy to incorporate the non-natural TCO*A amino acid, which gets labeled through click chemistry. This approach should result in less disruption of the native Env structure as compared to the peptide insertion previously used for smFRET imaging of Env. Furthermore, previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Altogether, through the cutting-edge application of smFRET imaging, the study provides novel insights into the mechanisms of action of interesting and clinically relevant antibodies.

      Thank you for the positive comments!

      In validating the functionality of the S401TAG/R542TAG Env, the authors performed infectivity assays and observed 20% infectivity as compared to wild-type (Figure S2A). However, the text equates this with "20% dual-amber suppression efficiency". This would benefit from some explanation. Why do the authors interpret infectivity as reporting on amber suppression efficiency, and not the functional cost of modifying Env, which is probably unavoidable? Or a combination of both? Is there data to suggest that 100% amber suppression would leave Env 100% functional? If so, this would be valuable to show. If not, the text should be clarified.

      We acknowledge this concern and will clarify the distinction between suppression efficiency and functional cost in the revision. The observed reduction in infectivity does not primarily reflect a loss of Env function; rather, it largely reflects the limited efficiency of amber suppression (one of the critical limitations of applying genetic code expansion in mammalian cells), as evidenced by reduced Env expression and incorporation on virions (Fig. 1B). In support of the preservation of Env functionality, tag-free and dual-ncAA-incorporated Env virions exhibited similar dose-dependent neutralization sensitivity against trimer-specific neutralizing antibodies (Fig. 1D). We have previously discussed several limitations of amber suppression in mammalian cells combined with smFRET viral systems (PMID: 38232732; PMID: 40716060). In brief, orthogonal tRNA/aaRS pair–mediated amber suppression (reassigning amber stop codons to non-canonical amino acids) of the ambers introduced into the target protein (Env in our case) must compete with the cellular translation machinery, particularly release factors that recognize amber codons and terminate translation. Readthrough of endogenous amber codons in virus-producing cells (in our case, HEK293T) can disrupt normal protein expression and virus production. Similarly, readthrough of preexisting amber codons in HIV-1 ORFs other than the targeted ambers in Env can disrupt virus assembly, which we addressed by generating an amber-free provirus (PMID: 38232732). Introducing two amber codons into Env further reduces efficiency, as dual suppression requires two sequential successful suppression events within the same Env molecule.
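
      The last point, that dual suppression requires two successful events in the same molecule, can be sketched numerically. This is a purely illustrative calculation that assumes the two suppression events are independent and equally efficient; neither assumption is stated in the response, and the function names are hypothetical. The 20% figure is the relative infectivity discussed above, used here only as an example input.

```python
import math

def dual_suppression(p_site1: float, p_site2: float) -> float:
    """Probability that both amber codons are read through (independence assumed)."""
    return p_site1 * p_site2

def per_site_from_dual(p_dual: float) -> float:
    """Per-site efficiency implied by a dual efficiency, assuming equal sites."""
    return math.sqrt(p_dual)

# If full-length dual-ncAA Env were produced at 20% of the wild-type level,
# each site would need roughly 45% efficiency under these assumptions.
implied_per_site = per_site_from_dual(0.20)

# Conversely, two sites at 50% each would give 25% full-length product.
example_dual = dual_suppression(0.5, 0.5)
```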

      The authors state that the contour plots in Figure 2E reveal "dynamic sampling" of the observed FRET states. Strictly speaking, as presented, the contour plots (and FRET histograms) provide no information on dynamics per se. They indicate only the relative thermodynamic stabilities of the FRET states; transitions between states are a matter of interpretation. The TDPs, shown later in Figure 5A, nicely display the dynamics. More importantly, interpretation of the contour plots is challenging, as some seem to suggest an evolution toward lower FRET states. This is especially evident in Figures 2F and 3D, which suggest that the system evolves into a stable 0.1-FRET state (CO) after about 3 sec. Unless the authors want to conclude something from this, I would suggest that they consider removing the contour plots, since their interpretations are fully supported by the FRET histograms alone.

      We agree and will remove the contour plots, as they do not add meaningful information beyond what the histograms show.

      The data indicating that Env conformation is manipulated by 10E8.4/iMab is interesting. If I understand correctly, 10E8.4/iMab is an engineered antibody with one Fab targeting MPER and the second Fab targeting CD4. In the absence of CD4, could the difference between 10E8.4/iMab and the other MPER antibodies be due to 10E8.4/iMab being monovalent with respect to MPER binding?

      We appreciate this question. To answer this, we will perform smFRET experiments in the presence of 10E8.4 and iMab individually and compare those with the bivalent 10E8.4/iMab.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Xu and co-workers unveil two distinct modes of neutralisation by gp41-targeted broadly neutralizing antibodies on HIV-1 Env. So far, it was unclear as to how the mechanism of neutralisation occurred for this subset of neutralising antibodies (that can target the fusion peptide or the membrane proximal external region of the gp41 subunit). Thanks to single-molecule FRET, the authors show that the majority of broadly neutralizing antibodies stabilize the closed Env conformation (named State 1 since the original work by Munro and colleagues PMID: 25298114). Interestingly, the bivalent 10E8.4/iMab stabilized in turn a CD4-bound open state of Env. The two modes of neutralization described for these antibodies show previously unknown allosteric mechanisms that stabilize closed and open Env conformation, stressing the importance of Env conformational dynamics and its efficiency during the process of fusion.

      Strengths:

      The article is well-written, and the figures fully depict the data in a convincing way. The authors have used smFRET, which is now established in the field as a good tool to assess Env dynamics.

      We appreciate these positive comments!

      Weaknesses:

      (1) The limited controls on how click chemistry affects Env (as labelled Env HIV virions were not evaluated).

      We agree. Our validation focused on ncAA-incorporated Env HIV-1 virions, not on the fluorescently labeled virions. To address this, we will increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling. However, we expect these experiments to be technically challenging as an additional layer of functional validation in live cells, because of the additional handling time required for labeling, the centrifugation steps needed to remove free dyes (which can deform or disrupt viral membranes and degrade virions), and the low dual-amber suppression efficiency. On a related note, we have previously performed real-time tracking of the internalization and trafficking of single click-labeled Env virions in live cells (PMID: 38232732), which supports the retained functionality of click-chemistry-labeled Env.

      (2) Photobleaching of donor and acceptor molecules occurs right after 10sec exposure.

      We acknowledge this limitation and will include it in the corresponding section.

      (3) Other limitations are well described in the corresponding section.

      We appreciate this comment.

    1. Author response:

      Many thanks to the three reviewers and the editors for their comments and review. These are fair, consistent (across positives and negatives), and largely expected comments. On behalf of my coauthors, I use this letter as a provisional response to indicate what we can and intend to change in a revised manuscript.

      (1) A major comment from all three referees is that our single-nucleus RNA-seq data should be validated. The reviewers differ in the detail of exactly what they think should be validated, but they refer, individually, to (1) the discovery of ‘cell types’ themselves, (2) pathways inferred from trajectory analysis, (3) differentially expressed genes in plucked vs control condition at four time points and/or (4) inferred ligand-receptor pairs from cell-cell communication analysis, across the same time course. 

      I think we’re actually on pretty good footing for 1-3, because of work we’ve published in the cichlid fish model.

      I tally that in references cited in the manuscript, and highlighted below (References 1, 10, 11, 29, 30, 31), we present 29 figures with 273 individual figure panels of histology, in situ hybridization and immunohistochemistry featuring genes expressed across stages of tooth development and replacement. These genes are markers of dental competency and regenerative potential.

      In addition, in multiple of these papers, we use pharmacology to manipulate the role of key pathways (Hh, BMP, Wnt, Notch) in cichlid tooth development and replacement. Identification and validation of cell types make use of these published data in cichlids (for markers matched to mouse), as well as an unbiased computational approach (SAMap) that draws homology between cichlid and mouse dental cell types, based on shared global patterns of gene expression.

      In short, experiments to validate cell types, gene expression and pathways active in cichlid teeth are published and referenced herein. I noticed that these references (some of which include Gareth Fraser as an author, when he was a postdoc in my group; for Reviewer 2) were cited in the Introduction and not the Rationale/Methods or Results section (such that reviewers may have missed them). We will be clearer about this in the revision. 

      We have neither validated nor functionally analyzed the ligand-receptor pairs inferred from cell-cell communication analysis across the four time points of accelerated replacement. This work is beyond the scope of the current paper, and we will include a statement that these computational inferences represent hypotheses to be tested (although many of these ligand-receptor pairs have been noted in other ‘tooth’ publications that we cite).

      (2) The biggest weakness of our manuscript, noted by referees, is that we do not provide serial histology to accompany our snRNA-seq time course after plucking. We describe this as a limitation in the “Study limitations and future direction” section of the Discussion, but we can and will be stronger about why this is a weakness (for instance, we do not know the degree of damage done to tissue in the plucking paradigm). We do know that the jaw recovers quickly, but we do not know how different the plucked side is from the control side (which is also undergoing active replacement and remodeling). Uniting reviewer comments 1 and 2 here, the best future approach is a spatial transcriptomics reference at distinct stages of the plucking and recovery paradigm, as we framed in the Discussion section, because this simultaneously addresses the state of dental/jaw tissue and the in situ expression of thousands of genes.

      (3) Reviewers asked about the presence of stromal cells in our snRNA-seq data. Because of this and another comment on the posted preprint version of our manuscript, we will take another look at the mesenchymal compartment of the snRNA-seq data and trajectories built from it.

      (4) Multiple (minor) suggestions for clarification in text and figures will be adopted. 

      Generally, I don’t think we’ll require reviewer re-engagement on the revision; editor review should be sufficient.

      References cited in the manuscript, highlighted here:

      (1) Fraser, G. J. et al. An Ancient Gene Network Is Co-opted for Teeth on Old and New Jaws. PLoS Biol. 7, e1000031 (2009).

      (10) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. Common developmental pathways link tooth shape to regeneration. Dev. Biol. 377, 399–414 (2013).

      (11) Bloomquist, R. F. et al. Developmental plasticity of epithelial stem cells in tooth and taste bud renewal. Proc. Natl. Acad. Sci. 116, 17858–17866 (2019).

      (29) Streelman, J. T., Webb, J. F., Albertson, R. C. & Kocher, T. D. The cusp of evolution and development: a model of cichlid tooth shape diversity. Evol. Dev. 5, 600–608 (2003).

      (30) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. A periodic pattern generator for dental diversity. BMC Biol. 6, 32 (2008).

      (31) Bloomquist, R. F. et al. Coevolutionary patterning of teeth and taste buds. Proc. Natl. Acad. Sci. 112, (2015).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Gao et al. describes the effect of capsaicin on the NRF2/KEAP1 pathway. The authors carried out a set of in vitro and in vivo experiments that addressed the mechanisms of the protective effect of capsaicin on ethanol-induced cytotoxicity.

      The authors conclude that capsaicin activates NRF2, which leads to the induction of cytoprotective genes, preventing oxidative damage. The paper shows that capsaicin may directly bind to KEAP1 and that it is a noncovalent modification of the Kelch domain.

      The authors also designed new albumin-coated capsaicin nanoparticles, which were tested for.

      I appreciate the authors' experimental efforts to strengthen the study's conclusions. However, in my opinion, the paper is still not fully technically sound, which weakens the strength of the evidence.

      Thank you very much for your constructive review. We are truly gratified by your recognition of our key findings: that capsaicin activates NRF2 by disrupting the KEAP1–NRF2 interaction, as demonstrated through multiple methods including Pull-down, Co-IP, CETSA, SPR, BLI, deuterium exchange MS, MS simulations and target gene expression assays, and that albumin-coated capsaicin nanoparticles exhibit therapeutic effects in vivo. Your technical suggestions were particularly valuable. In this revised version, we have carefully and thoroughly addressed the points raised by you and the other reviewer by providing additional data, including nuclear-cytoplasmic fractionation assays performed with an alternative NRF2 antibody, to strengthen and clarify the supporting evidence. We believe this revision has significantly enhanced the overall quality and rigor of the manuscript. We have also explained in the main text the limitation posed by the small number of animals used in this study. We have made this revision with our utmost effort, and we hope it meets your expectations.

      Reviewer #2 (Public review):

      Summary:

      In this paper the authors wanted to show that capsaicin can disrupt the interaction between Keap1 and Nrf2 by directly binding to Keap1 at an allosteric site. The resulting stabilization of Nrf2 would protect CAP-treated gastric cells from alcohol-induced redox stress and damage as well as inflammation (both in vitro and in vivo).

      Strengths:

      One major strength of the study is the use of multiple methods (CoIP, SPR, BLI, deuterium exchange MS, CETSA, MS simulations, target gene expression) that consistently show for the first time that capsaicin can disrupt the Nrf2/Keap1 interaction at an allosteric site and lead to stabilization and nuclear translocation of Nrf2.

      Moreover, the efforts to show causal involvement of the Keap1/Nrf2 axis in the observed cellular effects, as well as to address potential off-target effects of the polypharmacological CAP, are appreciated.

      One point that still hampers a bit of full appreciation of the capsaicin effect in cells is that capsaicin is not investigated alone, but mostly in combination with alcohol only.

      Moreover, the true add-on value of the developed nanoparticles remains obscure.

      The relatively high NRF2 levels in some putatively unstressed cells call into question the validity of the models used.

      The rationale for switching between different CAP concentrations is unclear or not entirely convincing.

      The language and introduction could be improved.

      Overall, the authors are convinced that capsaicin (although weakly) can bind to Keap1 and releases Nrf2 from degradation, with relevance for biological settings. With this, the authors provide a significant finding with marked relevance for the redox/Nrf2 as well as natural products /hit discovery communities.

      Thank you very much for your positive assessment of our work and for the constructive suggestions to make it better. We also believe that capsaicin (CAP) offers new insights into the activation of NRF2. In this revision, we have addressed the shortcomings with the following efforts:

      (1) After careful revision, the inclusion of a capsaicin (CAP)-only treatment group, covering the same doses and time points as the ethanol co-treatment, revealed that CAP alone can directly inhibit the KEAP1–NRF2 interaction (Figure 3d, 3e and Figure 4g) and promote the entry of NRF2 into the nucleus (Figure 2c), resulting in moderate NRF2 activation (Figure 3h and Figure 4d). However, this effect was significantly enhanced in the presence of ethanol. We attribute this result to the ROS-enriched environment generated by ethanol. Given that KEAP1 is a sensor highly susceptible to oxidative modification, the combination of CAP's allosteric regulation and ethanol-induced oxidative stress promotes a more robust and persistent dissociation of the KEAP1–NRF2 complex. These findings align fully with the established model in which KEAP1–NRF2 dissociation is markedly facilitated under oxidative stress conditions.

      (2) From a translational and industrial perspective, nanoparticle formulations offer improved palatability compared with CAP itself; based on firsthand experience, the nano formulation is more tolerable than CAP. When preparing pure CAP, the pungency often causes irritation, whereas HSA@CAP nanoparticles are milder and demonstrate superior safety in mice following oral gavage. Moreover, ELISA results indicate that HSA@CAP nanoparticles exhibit enhanced anti-inflammatory activity compared with CAP alone (Figure 8d). In light of these findings, we prefer to retain this part of the data.

      (3) Your question is well taken. In GES-1 (Fig. 1i) and UC-MSC (Fig. 1l) cells, NRF2 expression was low under unstressed conditions, and the transcription and translation of its downstream targets indicate no functional activation, supporting the validity of our model. We acknowledge, however, that the control groups in some experiments were suboptimal. We repeated these experiments with additional biological replicates and used early-passage cells; the discrepancies likely relate to high passage numbers and serum batch effects, but they do not affect our main conclusions. We have replaced the relevant data in the revised manuscript (Fig. 2c and Fig. 3h) and hope this addresses your concern.

      (4) In GES-1 cells, 8 μM consistently yielded the optimal effect, and we therefore maintained this concentration in other experiments in the same cells. Some other experiments, however, required co-transfection of multiple plasmids. Transfection efficiency was poor in GES-1 cells, so we switched to the commonly used HEK-293T cell line. In 293T cells, 2 and 8 μM were suboptimal, so we ultimately used 32 μM (Figure 3h), consistent with other 293T experiments (Co-IP and Pull-down) that also used 32 μM. Accordingly, 8 μM was insufficient in Fig. 2g, a result we observed across many repeats. This likely reflects cell line–specific differences and the experimental context in 293T cells, including transfection and overexpression of NRF2 and Ub-K48-Myc, which necessitated a relatively higher CAP concentration.

      (5) Thank you very much for noting that the language and Introduction could be further improved. We have rechecked the manuscript for grammar and style and revised the Introduction with a more comprehensive, up-to-date description of the NRF2 pathway. The main changes include rewriting the third and fourth paragraphs of the Introduction and consolidating or removing irrelevant sections for greater clarity and concision. We hope these updates meet your expectations.

      Figure 2C: It is still not clear why naïve (unstressed/untreated) cells already show rather high nuclear abundance of Nrf2 (shouldn't Nrf2 be continuously tagged for degradation by Keap1?).

      Thank you for your constructive comments. In response to the concern raised, we repeated the nuclear–cytoplasmic fractionation experiments in early-passage GES-1 cells and performed three independent replicates using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The results showed low nuclear NRF2 levels under basal conditions, consistent with the KEAP1-mediated continuous degradation mechanism. Accordingly, we have updated Figure 2C. NRF2 could still be detected in the control group, which is broadly consistent with the reported baseline levels of NRF2 in GES-1 cells and other cell lines [1,2,3]. Therefore, this does not indicate a failure of the model.

      References:

      (1) Wang, R. et al. Costunolide ameliorates MNNG-induced chronic atrophic gastritis through inhibiting oxidative stress and DNA damage via activation of Nrf2. Phytomedicine 130, 155581, doi:10.1016/j.phymed.2024.155581 (2024).

      (2) Li, Y. F. et al. Construction of Magnolol Nanoparticles for Alleviation of Ethanol-Induced Acute Gastric Injury. J Agric Food Chem 72, 7933-7942, doi:10.1021/acs.jafc.3c09902 (2024).

      (3) Li, M., Wang, J., Xu, Z., Lin, Y. & Dong, J. Atraric acid attenuates chronic intermittent hypoxia-induced brain injury via AMPK-mediated Nrf2 and FoxO3a antioxidant pathway activation. Phytomedicine 148, 157261, doi:10.1016/j.phymed.2025.157261 (2025).

      Figure 2G-H: Why switch to rather high concentrations?

      To validate ubiquitin-mediated degradation in Figure 2G-H, we needed to co-transfect multiple plasmids. Transfection efficiency was poor in GES-1 cells, so we switched to the commonly used HEK-293T cell line. In 293T cells, 2 and 8 μM were suboptimal, so we ultimately used 32 μM, consistent with other 293T experiments (Co-IP and Pull-down) that also used 32 μM. These choices reflect intrinsic cell line properties and protein overexpression in 293T, but do not affect our investigation of capsaicin’s mechanism.

      Figure 2I: In the images of mitochondria, the control mitochondria look far more punctate (likely fissioned) than those treated with EtOH or EtOH + CAP. Wouldn't one expect that EtOH leads to mitochondrial fission and that CAP can prevent it?

      Thank you very much for your comments. We re-acquired and analyzed the mitochondrial morphology images using a Leica STELLARIS 5 confocal microscope platform, which was not available at our institution at the time of the original experiments. The earlier wide-field fluorescence images lacked sufficient magnification and resolution, which obscured details and may have caused confusion. In the revised manuscript, we have replaced them with confocal images showing EtOH-induced mitochondrial abnormalities, whereas CAP treatment restored the reticular network, as expected. We also added a CAP-only group, which shows no discernible effect on mitochondrial morphology.

      Figure 3H: High basal Nrf2 levels in unstressed/untreated HEK WT cells, why?

      Thank you for raising this concern. We repeated the experiments in HEK-293T (WT) cells in better condition, and validated the results using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The data consistently show relatively low NRF2 expression under basal conditions, in line with the KEAP1-mediated continuous degradation mechanism. We have corrected the corresponding figures accordingly.

      Figure 4a: Inclusion of an additional Keap1 binding protein (one with a ETGE motif) would have been desirable (to get information on specificity/risks of off-target (unwanted) effects of CAP).

      Thank you for this valuable suggestion. We have added CETSA experiments for DPP3, which contains an ETGE motif. The results show that endogenous DPP3 expression was low in GES-1 cells. DPP3 also does not appreciably bind CAP in vitro: BLI experiments indicated a KD above 1 mM (Supplementary Figure 4h and 4i). Consistent with this, CAP does not thermally stabilize DPP3 at the cellular level. Therefore, the risk of off-target effects via binding to ETGE-containing proteins like DPP3 appears minimal.

      Figure 4D: Why is there no stabilization of Nrf2 by CAP in lane 2?

      Thank you for raising this concern. We repeated the experiment in GES-1 cells and performed three independent replicates using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The data show that CAP alone increases NRF2 expression to some extent. We have updated Figure 4D accordingly.

      Figure 4f: 5% DMSO is a rather high solvent concentration, why so high (the solvent alone seems to have quite marked effects!)

      Thank you for raising this concern. Our original figure legend was misleading and has been corrected. Only the highest CAP concentration (500 μM) contained 5% DMSO as the vehicle; the other CAP concentrations, prepared by serial dilution in complete medium, did not contain such high solvent levels (e.g., 65.5 μM CAP contained 0.625% DMSO). This experiment used transient overexpression of NRF2-HA because purified recombinant NRF2 protein is prohibitively expensive (10 μg costs about 900 GBP from Abcam). We therefore conducted a preliminary assay by incubating purified Kelch-Flag protein with lysates of cells overexpressing NRF2-HA and measuring NRF2 levels in the supernatant and pellet (Figure 4f). Nevertheless, the conclusion that CAP disrupts the NRF2–KEAP1 interaction is better supported by SPR (Figure 3d), Co-IP (Figure 3e) and Pull-down (Figure 4g).
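
      The relationship between CAP concentration and vehicle concentration in a serial dilution can be sketched as follows. This is a minimal illustration that assumes a twofold series starting from 500 μM CAP in 5% DMSO and diluted in DMSO-free medium; the dilution factor is an assumption (the response does not state it), and the function name `dmso_percent` is hypothetical. Under that assumption, the 0.625% DMSO level quoted above is an eightfold dilution of the 5% vehicle, which corresponds to 62.5 μM CAP.

```python
def dmso_percent(cap_um: float,
                 top_cap_um: float = 500.0,
                 top_dmso_percent: float = 5.0) -> float:
    """DMSO (% v/v) carried along when the top CAP stock is diluted in DMSO-free medium."""
    return top_dmso_percent * cap_um / top_cap_um

# Assumed twofold series: 500, 250, 125, 62.5 uM CAP.
series_um = [500.0 / 2 ** k for k in range(4)]
table = [(cap, dmso_percent(cap)) for cap in series_um]

for cap, dmso in table:
    print(f"{cap:6.1f} uM CAP -> {dmso:.3f}% DMSO")
```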

      Figure 6/7: not expert enough to judge formulations and histology scores. However, the benefit of the encapsulated capsaicin does not become entirely clear to me, as CAP and IRHSA@CAP mostly do not significantly differ in their elicited response.

      Thank you very much for the valuable suggestion. Although histopathology suggests only modest differences between the two treatments, the nanoparticle group showed markedly lower inflammatory cytokine levels than pure CAP: IL-1β, IL-6, TNF-α, and CXCL-1 were significantly reduced, while the anti-inflammatory cytokine IL-10 was significantly increased (Figure 8d). These changes are important for maintaining a healthy gastric environment and may better support digestive function in vivo. Accordingly, from a translational and industrial perspective, nanoparticle formulations also offer improved palatability compared with capsaicin itself. Based on firsthand experience, the nano formulation is more tolerable than CAP. When preparing pure CAP, the pungency often causes irritation, whereas HSA@CAP nanoparticles are milder and demonstrate superior safety in mice following oral gavage.

      Figure 7: Rebamipide was introduced as positive control in the text with an activating effect on Nrf2, but there is no induction of hmox and nqo in Figure 7f, why? It does not look as the positive control was wisely chosen.

      Thank you for your insightful comment. We agree that this result was suboptimal and sincerely apologize for the oversight. We are currently facing significant constraints: the original cDNA is depleted, and funding cuts have severely limited our resources for reagents and animal studies. A full repetition of the rat experiment at the original scale and quality is not feasible in the short term. To ensure the scientific rigor of the paper, we have made the difficult decision to remove Figure 7f. We believe this is necessary to base our conclusions on the most robust evidence. We apologize for any inconvenience and hope this solution is acceptable. We have revised the manuscript accordingly and appreciate your understanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors did not provide data validating the NRF2 antibody for in vitro studies, particularly for IF data where there is no molecular mass indication for NRF2. The IF data suggest that NRF2 is primarily located in the cytoplasm under control conditions (Fig. 2A), whereas the WB data show a strong band in the nucleus (Fig. 2C). What is the reason for this inconsistency?

      We sincerely appreciate your valuable comments. Previously, we used an NRF2 antibody from Proteintech (Cat. No. 16396-1-AP). The vendor’s data show that shRNA knockdown in HeLa cells markedly reduces NRF2 at the expected molecular weight, and IF data in HepG2 cells show a trace amount of cytoplasmic localization in controls and clear nuclear translocation after MG-132 treatment. This indicates that the antibody is suitable for reporting the subcellular localization of NRF2 by immunofluorescence (IF), and our experimental results in Figure 2A are in line with expectations. In addition, to address the reviewer's concern, we purchased another NRF2 antibody (Cat. No. 12721, Cell Signaling Technology), which is also extensively validated. In this version, we repeated the nuclear-cytoplasmic fractionation experiments and other important experiments using this antibody. Together, these data confirm the low basal level of NRF2 in the absence of stress and also show that CAP increases NRF2 expression. We have corrected Figure 2C so that the WB and IF results are now consistent. We wish to reiterate our deep appreciation for the professionalism and rigor of your review.

      (2) Additionally, I could not find Supplementary Figure 4F-I, which concerns TRPV1. These figures are mentioned in the response to reviewers but are missing from the manuscript-please double-check.

      The supplementary figures were initially submitted as a compressed archive, and we recognize that there may have been an issue with the transfer of this file to the reviewers. As shown in Supplementary Figure 4f to 4i, we further examined TRPV1 and DPP3 to assess potential off-target effects of CAP. Capsazepine (CAPZ), a TRPV1 receptor antagonist, did not affect the protective effect of CAP on GES-1 cells (Fig. S4f and S4g), which suggests that CAP activation of NRF2 does not depend on TRPV1. We used BLI to measure the binding of CAP to DPP3, which contains an ETGE motif and can bind to KEAP1. The KD between CAP and DPP3 was 1.653 mM (>100 μM), whereas CAP bound KEAP1 considerably more strongly, with a KD of about 31.45 μM (Fig. S4h and S4i). This suggests that the potential for off-target effects of CAP through DPP3 is low.
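
      To put the two dissociation constants above side by side, the following sketch computes their ratio and, under a simple 1:1 binding model, the fractional occupancy of each protein at a given free CAP concentration. The 1:1 model, the use of 8 μM as the free ligand concentration, and the function names are assumptions made for illustration only; the KD values are those quoted in the response.

```python
def fold_difference(kd_weak_um: float, kd_strong_um: float) -> float:
    """How many times weaker the first interaction is than the second."""
    return kd_weak_um / kd_strong_um

def fractional_occupancy(ligand_um: float, kd_um: float) -> float:
    """Fraction of protein bound at free ligand concentration L (1:1 model): L / (L + KD)."""
    return ligand_um / (ligand_um + kd_um)

KD_KEAP1_UM = 31.45    # CAP-KEAP1, from the response
KD_DPP3_UM = 1653.0    # CAP-DPP3 (1.653 mM), from the response

fold = fold_difference(KD_DPP3_UM, KD_KEAP1_UM)

# Hypothetical free CAP concentration of 8 uM (the dose used in GES-1 cells).
occ_keap1 = fractional_occupancy(8.0, KD_KEAP1_UM)
occ_dpp3 = fractional_occupancy(8.0, KD_DPP3_UM)
```

      Under these assumptions the DPP3 interaction is roughly fifty-fold weaker than the KEAP1 interaction, and DPP3 occupancy stays below 1% at a CAP concentration where KEAP1 occupancy is about 20%.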

      (3) I am also somewhat unconvinced by the data obtained from NRF2 KO mice. First, it appears that some NRF2 KO mice respond to CAP treatment well while others do not, resulting in a high standard deviation. To strengthen the conclusions, it would be advisable to use a larger number of animals to confirm or exclude the effect. This is precisely why I still believe that three rats per group are insufficient for the in vivo studies. Please emphasize in the manuscript that a limitation of this study is the use of only three rats per group for the in vivo experiments.

      Thank you very much for your question and suggestions. As for the rat experiments in Figure 7 and Figure 8, there are many other references available, as noted in the introduction: “Recent experiments conducted in rats have demonstrated that red pepper/capsaicin (CAP) possesses significant protective effects on ethanol-induced gastric mucosal damage, and the mechanisms involved may relate to the promotion of vasodilation[6,7], increased mucus secretion[8] and the release of calcitonin gene-related peptide (CGRP)[9,10]. However, it is important to note that the specific role of the antioxidant activity of CAP has not been thoroughly investigated.” Therefore, we conducted extensive literature research and preliminary experiments to ensure that our formal experiment with 8 groups could yield relatively stable results. Of course, we acknowledge that using more rats in vivo would make the conclusion more reliable. Unfortunately, the project was delayed due to funding issues. We are currently facing significant resource constraints: reductions in research funding from the National Natural Science Foundation have severely limited our funding for reagents and animal experiments in this study. As a result, it is not possible to fully repeat all animal experiments at the original scale and quality in the short term. To supplement the NRF2 knockout animal-related experiments (n=6), we have already spent approximately 70,000 RMB (about 10,000 USD). We have made every effort to ensure the scientific rigor of the paper, and we sincerely apologize for any inconvenience caused. At the same time, we fully recognize the importance of increasing the sample size in animal experiments for this study. We have explicitly acknowledged this as a limitation of our work in the Discussion section and have revised the manuscript accordingly. We greatly appreciate your understanding.

      (4) Furthermore, please double-check the blot in Fig. 9D. Tubulin and P-p65 bands appear very similar, and tubulin disappears in response to EtOH and EtOH/CAP treatment in KO mice. Is it the case? I am not sure the quantitative data reflect the WB bands. Please verify that.

      We sincerely appreciate your valuable feedback on our manuscript. In our haste, we may have included bands that do not meet the requirements, and we are very grateful to you for pointing this out; it was a significant oversight on our part. We will pay closer attention to careful checking in the future. In response, we have repeated the experiments using the preserved tissue samples and have updated Figure 9d accordingly. Thank you for your insightful suggestions.

      Reviewer #2 (Recommendations for the authors):

      Presentation:

      The data with the encapsulated CAP appear to be something of a side arm that does not bolster your main message (consider taking them out and elaborating on this topic more extensively in another manuscript).

      We sincerely thank the reviewer for this suggestion. However, based on the ELISA results demonstrating that nano-capsaicin exerts a significantly stronger anti-inflammatory effect than pure capsaicin (CAP), and considering its superior sensory profile for industrial applications (confirmed by our sensory evaluations), we believe these data provide valuable insights. Therefore, we would prefer to retain this section in the manuscript and hope for your understanding.

      Revise the introduction on the Nrf2 signaling pathway. As it is written at the moment, someone outside the Nrf2 field might have trouble understanding it.

      Thank you for the valuable suggestion again. We have rewritten the introduction to the NRF2 signaling pathway to improve accessibility for readers outside the field.

      “The Kelch-like ECH-associated protein 1 (KEAP1)–Nuclear factor erythroid 2–related factor 2 (NRF2)–antioxidant response element (ARE) pathway is a core defense mechanism against oxidative and electrophilic stress[11]. Under homeostatic conditions, KEAP1 acts as a linker protein for the Cul3-E3 ubiquitin ligase complex, continuously promoting the ubiquitination and proteasomal degradation of NRF2, thereby maintaining NRF2 at basal levels[12]. When oxidative or electrophilic stress occurs, critical cysteine residues in KEAP1 are modified, or the interaction between the ETGE/DLG motifs on NRF2 and the Kelch domain of KEAP1 is disrupted, allowing NRF2 to escape degradation, accumulate, and translocate to the nucleus. There, NRF2 forms heterodimers with small Maf proteins and binds to ARE, inducing the expression of antioxidant and cytoprotective genes such as those involved in glutathione metabolism, NADPH regeneration, phase II detoxifying enzymes, and drug efflux transporters, thereby restoring redox balance within the cell and reducing oxidative damage[13].

      Classical NRF2 agonists, such as sulforaphane, are small molecules that bind to KEAP1 and covalently modify its cysteine residues, thereby altering the binding affinity between KEAP1 and NRF2 [14]. However, traditional covalent agonists may induce sustained overactivation of NRF2, leading to adverse side effects and limiting clinical application [15]. Consequently, recent efforts have shifted toward the development of non-covalent NRF2 agonists, which are generally associated with lower toxicity and greater translational potential, enabling more controlled enhancement of NRF2 activity and offering new insights and therapeutic opportunities in antioxidant-related interventions.”

      The authors should check and review extensively for improvements to the use of English to get rid of awkward phrases /wording.

      Thank you very much for this helpful comment. We sincerely appreciate the suggestion and have carefully re‑read and further polished the entire manuscript to remove awkward phrasing and improve the readability of expressions and phrases. We hope these revisions address your concern, and we remain grateful for your guidance.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue whose study has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

      We thank the reviewer for the positive assessment of our manuscript as well as for the many constructive comments during its revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss of function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      My comments:

      Technically sound: As stated in my previous review, the work is technically excellent (the authors use state-of-the-art genetic engineering to manipulate the enhancer and combine it with genetic analysis through RNAi and CRISPR/Cas9 and phenotypic characterization to functionally validate their findings), figures are nicely done and cartoons are self-explanatory.

      We thank the reviewer for these positive comments.

      Poor paper writing: The paper is too long and difficult to read/understand, many grammatical mistakes are found, and formatting is in some cases heterodox.

      We thank the reviewer for this assessment. We have carefully revised the manuscript to improve clarity, readability, and consistency throughout. Specifically:

      (1) Streamlined several sections to improve narrative flow, especially the abstract, model, and dCas9 sections.

      (2) Corrected grammatical issues across the manuscript. As the reviewer pointed out, there were many in the text. We are grateful that the reviewer was insistent on this point.

      (3) Harmonized formatting and terminology. Many small inconsistencies were found in the figure legends, most of which have been corrected.

      We believe these changes substantially improve the accessibility and overall presentation of the work. However, we have not shortened the manuscript, as we want to convey the complexity of attempting to dissect non-coding regions and to avoid oversimplifying the phenotypes obtained.

      Science:

      (1) The question of "who is locating the relative position of the AP and DV boundaries in the developing wing?" is not resolved. I would then change the intro or reduce the tone of this question. Having said that, I agree that these results shed light on the wing phenotypes of some apterous alleles related to AP identity and growth and, as such, I congratulate the authors.

      We appreciate this important point. We agree that our study does not fully resolve the upstream mechanisms that ultimately position the AP and DV boundaries. Our goal was instead to determine how the ap early enhancer (apE) contributes to the correct spatial relationship between these boundaries. To address the reviewer’s concern, we have revised the Introduction and Discussion to soften the framing of this question and to more clearly state the scope of our conclusions. We now emphasize that our work provides mechanistic insight into how apE function impacts DV/AP boundary organization, rather than claiming to fully resolve the upstream positioning mechanism.

      (2) Identification of two TFs (Grain and Antp) mediating the regulation of apterous expression is interesting but some contextualization might be required. Data on Antp is not as convincing as data on Grn. I wonder whether Antp data can be removed at all.

      We thank the reviewer for this thoughtful evaluation. We agree that the genetic evidence for Grain (Grn) is stronger and more direct than for Antennapedia (Antp). In response, we have revised the manuscript to more carefully calibrate the strength of our conclusions regarding Antp.

      Specifically, we have:

      Softened the language throughout to describe Antp as a candidate HOX input,

      Explicitly stated that direct binding to the m3 site remains to be demonstrated biochemically, and

      Clarified in the Discussion that our data support an early contributory role for Antp rather than establishing it as the definitive HOX factor acting at apE.

      We believe retaining the Antp data is important because:

      (1) The m3 site shows strong HOX dependency in vivo,

      (2) Early Antp depletion produces clear defects in ap expression, and

      (3) Recent literature supports an early requirement for Antp in wing development.

      Together, these observations provide a coherent working model while appropriately acknowledging current limitations. We hope the reviewer agrees that the revised framing now appropriately reflects the strength of the evidence.

      (3) I am not sure whether the term hemizygous is used properly

      We use the term hemizygous as in classical genetics, in which an individual carrying an allele opposite a chromosomal deletion is considered hemizygous at that locus. See, for example, the entry for the ap<sup>4</sup> mutant in the red book (Lindsley and Zimm, The Genome of Drosophila melanogaster):

      “… ap4 /Df(2L) M4IA-54 hemizygote has nearly normal complement of bristles but otherwise resembles ap4 homozygote (Butterworth and King, 1965).”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, and likely to benefit the research community.

      Comments on revisions:

      The concerns raised have been addressed. The heparin separator-based cfDNA method described in this study is likely to benefit the research community. I have no further scientific concerns.

      We appreciate the encouragement and recognition.

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      (5) It is important to point out that depth-based CNV calling is just one of the CNV calling methods. Other CNV calling software that uses SNVs, paired reads, split reads, and coverage depth, such as Conserting, would be severely affected by low-quality WGS data. The authors need to evaluate at least two different software tools with distinct CNV calling algorithms on the current WGS data.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.
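      The bin-size concern in point (4) can be made concrete with a back-of-the-envelope calculation. The sketch below is an idealization and not the authors' analysis: it assumes reads are spread uniformly across the genome, that counts per bin follow Poisson sampling, and a hypothetical read length of 150 bp (the read length is not stated in the review). It ignores GC bias, mappability, and tumor fraction, all of which reduce sensitivity further.

```python
import math

def expected_reads_per_bin(coverage, bin_size_bp, read_length_bp=150):
    """Idealized mean read count per bin for uniformly distributed reads.

    read_length_bp=150 is an assumed value, not one reported in the review.
    """
    return coverage * bin_size_bp / read_length_bp

def poisson_cv(mean_reads):
    """Relative sampling noise (SD / mean) of a Poisson-distributed count."""
    return 1.0 / math.sqrt(mean_reads)

# Bin sizes suggested in the review, plus the 500 kb bins the authors used,
# evaluated at the ~5X coverage mentioned in point (4).
for bin_kb in (1, 10, 50, 100, 250, 500):
    reads = expected_reads_per_bin(coverage=5, bin_size_bp=bin_kb * 1000)
    print(f"{bin_kb:>4} kb bin: ~{reads:,.0f} reads, CV ~{poisson_cv(reads):.1%}")
```

      Under these assumptions a 1 kb bin holds only a few dozen reads, so sampling noise alone is large relative to the depth change produced by a single-copy gain or loss in a low-tumor-fraction sample, whereas noise in 500 kb bins is below one percent.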

      Comments on revisions:

      As suggested previously, the Pearson correlation analysis tends to be overstated; please replace it with Spearman correlation in the whole manuscript. Currently, the authors include both of them in the abstract, method, results, and graphics, all of which are required to be updated to only use Spearman correlation results.

      I don't have other concerns about the manuscript.

      We entirely agree and have removed all instances of Pearson correlation from the paper, including the abstract, methods, results, and graphics. Only Spearman’s correlation is now used.

      We appreciate your efforts and helpful comments.
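      The reviewer's point about outliers can be illustrated with a small, self-contained comparison of the two coefficients. The data below are hypothetical values chosen to include one extreme point; they are not taken from the manuscript, and the helper functions are an illustrative pure-Python sketch rather than the software used by the authors.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: five loosely related points plus one extreme outlier.
x = [1, 2, 3, 4, 5, 50]
y = [3, 1, 5, 2, 4, 60]
print(f"Pearson  r   = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```

      In this toy example the single extreme point pulls the Pearson coefficient close to 1, while the rank-based Spearman coefficient stays moderate, which is the kind of overstatement the reviewer describes.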

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines how different parts of the brain's reward system regulate eating behavior. The authors focus on the medial shell of the nucleus accumbens, a region known to influence pleasure and motivation. They find that nerve cells in the front (rostral) portion of this region are inhibited during eating, and when artificially activated, they reduce food intake. In contrast, similar cells at the back (caudal) are excited during eating but do not suppress feeding. The team also identifies a molecular marker, Stard5, that selectively labels the rostral hotspot and enables new genetic tools to study it. These findings clarify how specific circuits in the brain control hedonic feeding, providing new entry points to understand and potentially treat conditions such as overeating and obesity.

      We thank Reviewer 1 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      (1) Conceptual advance: The work convincingly establishes a rostro-caudal gradient within the medNAcSh, clarifying earlier pharmacological studies with modern circuit-level and genetic approaches.

      (2) Methodological rigor: The combination of fiber photometry, optogenetics, CRISPR-Cas9 genetic engineering, histology, FISH, scRNA-seq, and novel mouse genetics adds robustness, with complementary approaches converging on the central claim.

      (3) Innovation: The generation of a Stard5-Flp line is a valuable resource that will enable precise interrogation of the rostral hotspot in future studies.

      (4) Specificity of findings: The dissociation between appetitive and aversive conditions strengthens the interpretation that the observed gradient is restricted to feeding.

      We thank Reviewer #1 for their supportive feedback.

      Weaknesses and points for clarification

      (1) Role of D2-SPNs: Since D1 and D2 pathways often show opposing roles in feeding, testing or discussing D2-SPN contributions would provide an important control and context. Since the claim is that Stard5 is expressed in both D1- and D2-MSNs, it seems to contradict the exclusive role of D1R MSNs in authorizing food intake.

      We agree that D2-SPNs represent an important and relevant cell population in the context of our study. The Stard5-Flp line labels a mixed population of D1- and D2-SPNs, and we agree that dissecting the distinct contributions of Stard5<sup>+</sup> D1-SPNs and Stard5⁺ D2-SPNs to feeding behavior would be both interesting and informative.

      Although we understand the point raised by the Reviewer, we do not entirely agree that the expression of Stard5 in both D1- and D2-SPNs contradicts the established role of D1-SPNs in authorizing food intake. In the medNAcSh, D1- and D2-SPNs do not exert opposing functions. D2-SPNs project densely to the ventral pallidum and more sparsely to the lateral hypothalamus and, like D1-SPNs, are predominantly reward-inhibited at the population level (Domingues et al. 2025; Pedersen et al. 2022).

      We added the following in the discussion: “Additionally, a new study showed that manipulation of D2-SPN cell bodies in the medNAcSh modulates reward preference, self-stimulation, and palatable food intake in a frequency- and context-dependent manner (Requejo-Mendoza et al., 2025). Together, these findings suggest that D1- and D2-SPNs within the medNAcSh play complementary rather than opposing roles in reward processing. Hence, the potential role of rostral and caudal medNAcSh D1- and D2-SPNs in food-related behaviors beyond the act of consumption could be addressed in future work.” We also acknowledge that not investigating rostro-caudal gradients of D2-SPN in reward and aversion processing “represents a limitation of this work”.

      We fully agree that disentangling the specific contributions of Stard5<sup>+</sup> D1- and Stard5<sup>+</sup> D2-SPNs is an important next step. We have now crossed the Stard5-Flp line with Drd1-Cre and A2a-Cre lines. In a pilot experiment (not shown), we injected Flp+,Cre+, Flp+,Cre- and Flp-,Cre+ mice with 4 different FlpOn-CreOn AAVs to determine if any of these AAVs demonstrate specific expression. However, all AAVs exhibited moderate to strong leaky expression of the Cre, preventing reliable cell-type-specific targeting. This was not seen with Flp-only or Cre-only AAVs. The leakiness mentioned is a known challenge of FlpOn-CreOn AAVs and requires additional troubleshooting (e.g. reduce the titer). As this proved to be more challenging than anticipated, this work is ongoing and will be addressed in a future study rather than in the present revisions.

      (2) Behavioral analyses:

      (a) In Figure 2, group differences in consumption appear uneven; additional analyses (e.g., lick counts across blocks and session totals) would strengthen interpretation.

      The group differences in consumption that appear uneven likely reflect an overall lower total lick count per session in the Control group. We have now added analyses of average lick counts per block and session totals in the newly included Supplementary Figure S7, which support the results shown in Figure 2.

      Although we observe a difference in total lick count across the entire session between Control and Rostral ChrimsonR mice (Supplementary Figure S7d), we consider the comparison of total session lick counts not particularly informative here. Instead, we would argue that the laser-on epoch is the most meaningful comparison. During this period, optogenetic activation had no effect on licking behavior in control mice, showed a nonsignificant trend toward reduced consumption in caudal ChrimsonR mice, and produced a significant reduction in lick counts when rostral medNAcSh D1-SPNs were activated (Figure 2g-i and Supplementary Figure S7c).

      We added in the discussion the following explanation:

      “In addition, comparison of licking behavior during the laser-off blocks revealed an interesting effect: following cessation of opto-stimulation, Rostral ChrimsonR mice licked more than Caudal ChrimsonR and Control mice, suggesting a possible compensatory overconsumption. One possible interpretation is that the optogenetic parameters used suppressed consummatory behavior without reducing the motivation to obtain the reward. Furthermore, consistent with the RTPPA results, activation of rostral D1-SPNs may be experienced as aversive and termination of the optogenetic stimulation could produce relief, which in turn reinforces the licking behavior. Further investigations are required to test these possibilities.”

      (b) The design and contribution of aversive assays to the main conclusions remain somewhat unclear and could be better justified.

      We appreciate the Reviewer’s comment regarding the design and contribution of the aversive assays. The rationale for including these experiments was to determine whether the rostro–caudal functional segregation observed for reward-related feeding also applies to aversive processing.

      First, using foot shock, we tested whether D1-SPNs in the rostral versus caudal medNAcSh respond differently to an aversive stimulus. In contrast to reward-related responses, both populations responded similarly, exhibiting excitation. Second, to ensure that this effect was not specific to a single stressor, we tested a second aversive stimulus (tail lift) and again observed comparable excitatory responses in rostral and caudal D1-SPNs. Third, we assessed whether optogenetic activation of these neurons is perceived as rewarding or aversive. Using a real-time place preference/aversion assay, we found that optogenetic stimulation of D1-SPNs in both subregions induced place aversion.

      Together, these experiments show that while D1-SPNs display region-specific effects on reward-related feeding behavior, their activity responses to aversive stimuli and the avoidance response to optogenetic activation are similar across rostral and caudal medNAcSh. This contrast strengthens our conclusion that the D1-SPN rostro-caudal gradient is specific to appetitive contexts.

      We added the following in the discussion:

      “Here, we further tested the existence of rostro-caudal gradients for aversion, asking whether D1-SPNs in the rostral vs. caudal medNAcSh respond differently to aversive stimuli. To ensure that any observed effects were not specific to a single stressor, we tested two distinct aversive stimuli (foot shock and tail lift). In both cases, we found no rostro-caudal differences, as D1-SPNs in both subregions responded with excitation. We also asked whether optogenetic activation of these neurons is perceived as aversive. Stimulation of D1-SPNs in both rostral and caudal medNAcSh promoted aversive behavioral responses in the RTPPA experiment. Hence, in contrast to the pharmacological inhibitions mentioned above, we did not detect differences in aversive behaviors according to the rostro-caudal medNAcSh site.”

      (c) The scope of behavior is mainly limited to consumption; testing related domains (motivation, reward valuation, and extinction) could broaden the significance.

      We thank the Reviewer for the suggestion to examine additional behavioral domains such as motivation, reward valuation, and extinction. We focused our efforts on consumption given the large body of literature demonstrating a very important role of the medNAcSh in reward consumption. However, we fully agree that feeding encompasses multiple phases, from appetitive and goal-directed behaviors to consummatory behavior, and that the NAc in general, and to some extent the NAcSh, is involved in behaviors across this spectrum. For instance, prior work has shown that the medNAcSh is involved in reward preference and that this follows a rostro-caudal gradient (e.g. Pedersen et al. 2022).

      While it would be informative to directly test motivational processes using operant paradigms (e.g., nose-poke or lever-press tasks), our current experimental setup did not allow for these assays. Instead, we performed exploratory experiments manipulating the animals’ internal state with food deprivation. As expected, under food deprivation, total licking increased robustly in control mCherry and Rostral ChrimsonR medNAcSh mice as compared to ad libitum feeding (25 min session with 5 alternating on-off blocks: ad libitum Control = 692 and Rostral ChrimsonR = 1280 average total licks per session, see Figure 2g-h and Supplementary Figure S7d; food-deprived Control = 2428 and Rostral ChrimsonR = 2390 total licks averaged for N=9 Control, N=12 Rostral). Moreover, similar to ad libitum feeding, optogenetic activation of rostral D1-SPNs suppressed licking in food-deprived mice, albeit to a lesser extent than under ad libitum feeding conditions (Figure 2).

      These preliminary observations suggest that internal state modulates the role of rostral D1-SPNs in reward consumption, potentially reflecting an interaction between homeostatic and hedonic feeding circuits. However, as this line of investigation was exploratory and not pursued further in the present study, these data are not included in the main manuscript.

      Author response image 1.

      In vivo optogenetic stimulation of rostral medNAcSh inhibits reward consumption to a lesser extent after overnight food deprivation. a. Quantification of the average lick count per 5 min block in mCherry control mice vs. ChrimsonR (rostral) mice, showing a lower lick count in rostral medNAcSh ChrimsonR mice during the opto-stimulation epoch. Blocks of 5 min with or without opto-stimulation were alternated (on/off/on/off/on) for a total of 5 blocks. b. Quantification of mean lick counts in the opto-stimulation vs. non-opto-stimulation epochs shows a significant decrease in lick counts following stimulation of rostral medNAcSh D1-SPNs and no significant difference in the control mice. 2-way RM-ANOVA (group x epoch). Main effects: epoch F (1, 28) = 6.027, p=0.0206; group F (2, 28) = 1.448, p=0.2520; group x epoch F (2, 28) = 8.123, p=0.0017. Sidak post-hoc opto-stimulation vs. non-opto-stimulation: Control on vs. off t(28) = 1.856, p=0.2061; Rostral medNAcSh on vs. off t(28) = 3.054, p=0.0147. N=9 for Control mCherry; N=12 for Rostral medNAcSh ChrimsonR. c. Pie charts showing % of mice showing food intake inhibition (mean Δlick counts non-opto/opto>0) in each group: 42% of ChrimsonR rostral medNAcSh mice, 20% of controls. Data are mean ± SEM. *p<0.05; **p<0.01; ***p<0.001.

      (3) Molecular profiling:

      (a) Stard5 expression is present in both D1- and D2-SPNs; comparisons to bulk calcium signals and quantification of percentages across rostral and caudal cells would be helpful. The authors should establish whether these cells also express SerpinB2, an established marker of LH projecting neurons.

      We thank the Reviewer for this relevant point. In the photometry experiments (Figure 7) using Stard5-Flp mice, we acknowledge that the recorded signals reflect a mixed population of D1- and D2-SPNs. Based on quantification in a separate set of brains, we estimate that Stard5 is expressed in a variety of cell types, of which 35% are D1-SPNs and 30% are D2-SPNs (Supplementary Figure S3). While Liu et al. 2024 reported no overlap between Stard5 and Drd2, a canonical marker for D2-SPNs, available transcriptomic data (Chen et al. 2021) and our own histological and RNA-based analyses (Figure 6 and Supplementary Figure S3) found Stard5 to be expressed in both D1-SPNs and D2-SPNs. Hence, Stard5 indeed labels a mixed population.

      We provide here the quantification of percentages of Stard5 expression across rostral and caudal cells: for instance, in the dorsal rostral medNAcSh, 79% of D1-SPNs and 76% of D2-SPNs express Stard5; in the ventral rostral medNAcSh the percentages are 47% and 55%, whereas the same percentages drop to 39% and 31% in the dorsal caudal medNAcSh and 15% and 20% in the ventral caudal medNAcSh.

      As suggested by the Reviewer, we also performed further analysis of the publicly available scRNA-seq dataset from Chen et al. 2021, which shows that 4.4% of all Stard5-expressing cells are also Serpinb2+, while 1.8% of all sequenced NAc cells are Stard5+/Drd1+/Serpinb2+ and 0.21% are Stard5+/Drd2+/Serpinb2+.
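      For readers who want to reproduce this kind of co-expression tally on their own data, the sketch below shows one way such percentages could be computed. It is an illustration only and not the analysis code used for the study: the function names, the per-cell gene-set representation, and the toy cells are all assumptions, and a gene is treated as "expressed" simply if it appears in a cell's set (any real detection threshold would have to be chosen separately).

```python
from typing import Dict, Iterable, Set


def percent_of_marker_cells(
    cells: Dict[str, Set[str]], marker: str, co_markers: Iterable[str]
) -> float:
    """Percent of marker+ cells that also express every gene in co_markers."""
    co_markers = list(co_markers)
    marker_cells = [genes for genes in cells.values() if marker in genes]
    if not marker_cells:
        return 0.0
    hits = sum(1 for genes in marker_cells if all(g in genes for g in co_markers))
    return 100.0 * hits / len(marker_cells)


def percent_of_all_cells(cells: Dict[str, Set[str]], markers: Iterable[str]) -> float:
    """Percent of all cells that express every gene in markers."""
    markers = list(markers)
    if not cells:
        return 0.0
    hits = sum(1 for genes in cells.values() if all(m in genes for m in markers))
    return 100.0 * hits / len(cells)


# Hypothetical toy data: cell ID -> set of detected genes (not real measurements).
TOY_CELLS: Dict[str, Set[str]] = {
    "c1": {"Stard5", "Drd1", "Serpinb2"},
    "c2": {"Stard5", "Drd1"},
    "c3": {"Stard5", "Drd2"},
    "c4": {"Stard5"},
    "c5": {"Drd1"},
    "c6": {"Drd2", "Serpinb2"},
    "c7": set(),
    "c8": {"Drd2"},
}

if __name__ == "__main__":
    # Share of Stard5+ cells that are also Serpinb2+ (denominator: Stard5+ cells).
    print(percent_of_marker_cells(TOY_CELLS, "Stard5", ["Serpinb2"]))
    # Share of all cells that are triple-positive (denominator: all cells).
    print(percent_of_all_cells(TOY_CELLS, ["Stard5", "Drd1", "Serpinb2"]))
```

      The two helpers differ only in their denominator, which mirrors the distinction in the text between "of all Stard5-expressing cells" and "of all sequenced NAc cells".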

      (b) Verification of the Stard5-2A-Flp line (specificity, overlap with immunomarkers) should be documented more thoroughly.

      We agree with the Reviewer that a more detailed characterization of the Stard5-2A-Flp mouse line would be relevant for the validation of the line.

      In our study, we identified Stard5 as a marker gene that enables selective targeting of the rostral medNAcSh, as it is strongly enriched in the rostral medNAcSh (Figure 5-7). Stard5-Flp mice injected with Flp-dependent AAV in rostral medNAcSh, NAc core and dorsal striatum show specific AAV expression only in the rostral medNAcSh (Figure 7).

      Moreover, we show that the line is specific as injection of a Flp-dependent AAV in a Stard5-Flp negative line does not lead to expression (Figure 7c).

      However, re-analysis of the published scRNA-seq dataset (Chen et al. 2021) indicates that Stard5<sup>+</sup> cells comprise a heterogeneous population, including D1-SPNs (~35%), D2-SPNs (~30%), local interneurons (~18%), glial cells (~12%), and other cell types (Suppl. Fig. S3).

      Together, these data validate the Stard5-2A-Flp line as a spatially specific genetic entry point for the rostral medNAcSh, while highlighting the cellular heterogeneity of Stard5-expressing cells. Given the limited brain material left, we were not able to add additional colocalization analyses with immunomarkers, but agree this would be important to include in future studies.

      (c) The molecular analysis is restricted to a small set of genes; broader spatial transcriptomics could uncover additional candidate markers. See also above.

      We thank the Reviewer for this suggestion. Broader spatial transcriptomic analyses would indeed be highly valuable for identifying additional candidate markers. Our aim for the present study was to identify molecular landmarks to selectively target the rostral medNAcSh, but in a future study, we would be highly interested in building on our initial findings and providing an exhaustive molecular characterization of the region using spatial transcriptomics. We would be particularly motivated to do so, given the important functional specificity of the rostral NAcSh identified in the present publication.

      Reviewer #2 (Public review):

      Summary:

      Marinescu et al. combine in vivo imaging with circuit-specific optogenetic manipulation to characterize the anatomic heterogeneity of the medial nucleus accumbens shell in the control of food intake. They demonstrate that the inhibitory influence of dopamine D1 receptor-expressing neurons of the medial shell on food intake decreases along a rostro-caudal gradient, while both rostral and caudal subpopulations similarly control aversion. They then identify Stard5 and Peg10 as molecular markers of the rostral and caudal subregions, respectively. Through the development of a new mouse line expressing the flippase under the promoter of Stard5, they demonstrate that Stard5-positive neurons recapitulate the activity of D1-positive neurons of the rostral shell in response to food consumption and aversive stimuli.

      We thank Reviewer 2 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      This study brings important findings for the anatomical and functional characterization of the brain reward system and its implications in physiological and pathological feeding behavior. It is a well-designed study, technically sound, with clear and reliable effects. The generation of the new Stard5-Flp line will be a valuable tool for further investigations. The paper is very well written, the discussion is very interesting, addresses limitations of the findings, and proposes relevant future directions.

      We thank Reviewer #2 for their supportive feedback.

      Weaknesses:

      At this stage, identification and characterization of the activity of Stard5-positive neurons is a bit disconnected from the rest of the paper, as this population encompasses both D1- and D2-positive neurons as well as interneurons. While they display a similar response pattern as D1-neurons, it remains to be determined whether their manipulation would result in comparable behavioral outcomes.

      We agree that this represents an important limitation of the current study. In our search for molecular markers of the rostral feeding hotspot, we identified Stard5 as a marker enriched in the rostral medNAcSh; however, Stard5 labels a heterogeneous population that includes D1- and D2-SPNs as well as other cell types. While Stard5<sup>+</sup> neurons display activity patterns similar to D1-SPNs, we acknowledge that whether their direct manipulation would produce comparable behavioral effects to D1-SPNs remains to be determined. Moreover, it remains to be determined how the activity and function of Stard5<sup>+</sup> neurons compare to those of D2-SPNs.

      To specifically isolate Stard5<sup>+</sup> D1-SPNs, we generated a Stard5-Flp;Drd1-Cre mouse line via breeding. However, the 4 CreON/FlpON AAVs which we tested exhibited leaky expression, including ectopic expression in Cre-positive but Flp-negative cells. This prevented reliable, cell-type-specific manipulation. We are actively working to overcome this common technical limitation of Flp/Cre AAVs, and these experiments will be addressed in a future study.

      Recommendations for the authors:

      Editor's note:

      Readers would also benefit from coding individual data points by sex and noting N/sex in the figure legends.

      We thank the editor for the note. We have now stated the N and sex of the mice in each figure legend.

      Reviewer #1 (Recommendations for the authors):

      (1) Integration of results: The manuscript reads as two partly disconnected halves (functional gradient vs. molecular profiling). A more precise articulation of how the molecular findings (Stard5, Peg10) directly relate to the functional data would improve coherence.

      We thank the Reviewer for raising this important point. We agree that clearer integration between the functional gradient and the molecular findings would strengthen the manuscript. In the present study, Stard5 and Peg10 are not introduced as mechanistic drivers of behavior, but as molecular landmarks that map onto the functional rostro-caudal organization of the medNAcSh.

      Stard5 expression is enriched in the rostral medNAcSh, where we identify a functional hotspot for reward-related feeding, whereas Peg10 marks more caudal territories. Thus, the molecular profiling provides an independent axis that aligns with and supports the functional gradient revealed by photometry and optogenetic experiments. Whether these genes themselves contribute causally to feeding or aversive behaviors remains an open and interesting question for future studies.

      To improve clarity, we have explicitly articulated this link in the Discussion:

      “Importantly, our results indicate that spatial organization also defines functional specialization in the medNAcSh, and that molecular markers such as Stard5 provide access to these spatially defined subterritories rather than labeling a single, homogenous neuronal subtype.”

      “Having established a robust functional dichotomy of D1-SPNs along the rostro-caudal axis in reward consumption, we next asked whether this functional organization is mirrored by differences in molecular composition across the medNAcSh. Using multiple anatomical techniques, we find strong differences in the molecular composition of the rostral vs. caudal medNAcSh, which in turn could explain behavioral differences between these brain subregions.”

      “This makes Stard5 a spatial molecular landmark that captures the cellular ensemble of the rostral feeding hotspot, rather than a marker defining a single functional cell class. It is interesting that Stard5, a START-domain protein implicated in cholesterol metabolism and cellular stress responses (Alpy and Tomasetto, 2005; Rodriguez-Agudo et al., 2012; Calderon-Dominguez et al., 2014), and Peg10, an imprinted gene with roles in embryonic development and cancer (Mou et al. 2025), mark distinct rostro-caudal domains of the medNAcSh. Whether these genes themselves causally contribute to appetitive and consummatory behaviors, or aversive processing in this region remains an important question for future studies.”

      (2) Injection site specificity: Given prior work on NAc manipulations, it is essential to ensure precise targeting. Representative images from both rostral and caudal placements, including verification of fiber/injection confinement, would increase confidence.

      We thank the Reviewer for this important point regarding injection site specificity. Optic fiber placement was validated by identifying the coronal section in which the fiber tip was centered and aligning it to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). We have so far validated a total of 14 brains, shown in the newly added Supplementary Figure S10.

      The primary source of variability across animals could be the extent of the viral spread and the size of the optic implants, which were 400 μm for the photometry experiments and 200 μm for the optogenetic studies. We acknowledge that this limits the spatial precision with which the individual subregions can be isolated. This limitation is explicitly discussed in the manuscript.

      Importantly, despite this limitation, we detected robust and reproducible differences between rostral and caudal medNAcSh in reward-consumption photometry and optogenetic assays. This argues against injection site proximity or fiber misplacement being a major confounding factor for the main conclusions. Nonetheless this comment is a valid point, and in future studies we plan to establish targeting methods with reduced viral volumes and/or tapered optic fibers (Pisanello et al. 2017). This will allow finer spatial restriction and more precise dissection of medNAcSh subregions.

      (3) Minor clarifications:

      (a) Provide explicit definitions of "rostral" and "caudal" coordinates.

      We adjusted Figure 1 and added the coordinates.

      (b) Consider alternative wording to "gradient" since only two rostro-caudal positions are tested.

      RNA-seq and MERFISH data indicate that molecular markers in the NAcSh are organized along a continuous rostro–caudal gradient rather than discrete boundaries (Chen et al. 2021; Stanley et al. 2020). Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled two representative positions along this continuum.

      We added the following sentence in the discussion for clarification:

      “Of note, in this paper we decided to use the term “rostro-caudal gradient”, motivated by converging evidence from prior pharmacological studies (see below) and scRNA sequencing data (Chen et al., 2021; Stanley et al., 2020), which show continuous molecular and functional changes along the rostro-caudal axis of the medNAcSh rather than sharply defined boundaries. Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled only two representative positions along this continuum.”

      (c) Enhance representative images (e.g., stronger DAPI, zoom-ins, bregma coordinates).

      To improve clarity, we have adjusted Figure 1 by adding schematic representations including stereotaxic surgery coordinates, which facilitate interpretation of rostro–caudal targeting.

      (d) Report trial numbers in figure legends, injection site details (e.g., S1 mouse), learning curves, and rationale for low-pass filtering in photometry.

      We thank the Reviewer for these suggestions. The average number of successful trials is now reported in the figure legends (Figure 1 and Figure 7). Injection site details are described in the Methods and are now also illustrated in Figure 1a and validated in Supplementary Figure S10. In addition, we have added Supplementary Figure S8 showing the learning curves of the Drd1-Cre and Stard5-Flp mice included in this study.

      Regarding the low-pass filtering in photometry analysis: low-pass filtering (1 Hz) was applied to the signal to remove high-frequency noise and isolate slow calcium-dependent fluorescence fluctuations that reflect population-level neural activity as we have done before (Labouesse et al. 2023, 2024). Low-pass filtering is a commonly-used analysis in fiber photometry and often shows a better artifact-corrected signal (Zhang et al. 2023; Keevers and Jean-Richard-dit-Bressel 2025).
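      To make the effect of a 1 Hz low-pass filter concrete, here is a minimal sketch. It is not the filter implementation used for the photometry analysis, whose type and order are not specified here: it assumes a simple first-order (single-pole) recursive filter, and the function names, the 100 Hz sampling rate, and the synthetic sine-wave inputs are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def low_pass(
    samples: Sequence[float], sampling_rate_hz: float, cutoff_hz: float = 1.0
) -> List[float]:
    """First-order low-pass filter (illustrative; not the study's filter)."""
    if not samples:
        return []
    dt = 1.0 / sampling_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the chosen cutoff
    alpha = dt / (rc + dt)                  # smoothing factor between 0 and 1
    filtered = [float(samples[0])]
    for x in samples[1:]:
        prev = filtered[-1]
        # Move a fraction alpha of the way toward the new sample.
        filtered.append(prev + alpha * (x - prev))
    return filtered


def sine(freq_hz: float, duration_s: float, sampling_rate_hz: float) -> List[float]:
    """Unit-amplitude sine wave, used here only as a synthetic test signal."""
    n = int(duration_s * sampling_rate_hz)
    return [math.sin(2.0 * math.pi * freq_hz * i / sampling_rate_hz) for i in range(n)]


if __name__ == "__main__":
    fs = 100.0  # hypothetical sampling rate in Hz
    slow = low_pass(sine(0.1, 20.0, fs), fs)   # slow fluctuation, well below cutoff
    fast = low_pass(sine(10.0, 10.0, fs), fs)  # fast fluctuation, well above cutoff
    print(max(abs(v) for v in slow[1000:]), max(abs(v) for v in fast[500:]))
```

      With these settings, a 0.1 Hz oscillation passes almost unchanged while a 10 Hz oscillation is strongly attenuated, which is the behavior the filtering step relies on. A first-order causal filter also introduces a phase lag, which is one reason analysis pipelines often prefer higher-order or zero-phase designs.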

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) As mentioned, I find the part on Stard5-positive neurons a bit disconnected. Ideally, as mentioned in the discussion, the author could cross Stard5-Flp mice with D1-cre to selectively monitor and/or manipulate these neurons. Alternatively, do they have any data regarding D2-positive neurons of the rostral part to show whether they behave differently from D1-positive neurons?

      We thank the Reviewer for this suggestion and agree that selectively monitoring or manipulating Stard5<sup>+</sup> D1-SPNs using an intersectional approach would strengthen the link between the molecular and functional findings. We are pursuing this strategy by crossing Stard5-Flp mice with Drd1-Cre mice; however, as noted above, currently available CreON/FlpON viral tools exhibited leaky expression (a commonly known problem for such AAVs), preventing reliable cell-type–specific targeting. As a result, these experiments are ongoing (including reducing the titers) and will be addressed in a future study.

      At present, we do not have equivalent functional data for D2-SPNs in the rostral medNAcSh. Investigating whether rostral D2-SPNs behave differently from caudal D2-SPNs is an important and interesting question, which we hope to address in a future study. This limitation is acknowledged in the discussion.

      (2) Do the authors have any data on locomotor activity when they manipulate D1-expressing neurons? Lower food consumption as well as lower activity in the stimulated compartment - interpreted as aversion - could be related to diminished locomotor activity.

      We thank the reviewer for the relevant point about locomotion. We ran new analyses of locomotor activity during the feeding task (operant boxes) using a machine-learning model. A small subset of frames (136 frames from 10 video recordings) was manually annotated to define the animal’s body center and nose, as well as the four corners of the operant box. These annotations were used to train a pose estimation model based on YOLO (Redmon et al. 2015). Locomotion metrics, such as total distance moved, were subsequently derived from the temporal integration of positional data and aligned to opto-on and opto-off epochs of the feeding task. During licking periods, the animal’s body center remains largely stationary, which could lead to an overestimation of immobility. Nevertheless, we quantified the total distance traveled in the entire operant box across epochs, shown in Supplementary Figure S9 a-b. In our proof-of-concept experiment (Figure 2c-e), locomotion was increased in rostral ChrimsonR mice compared to controls (Supplementary Figure S9a), similar to the effect seen with chemogenetic activation of D1-SPNs (Zhu, Ottenheimer, and DiLeone 2016). In our full experimental cohort, locomotion did not differ between control, rostral and caudal ChrimsonR mice across laser-on and laser-off epochs. These results indicate that reduced reward consumption during stimulation of rostral D1-SPNs is not due to decreased locomotor activity. Notably, whereas the inhibitory effect on consumption is specific to rostral D1-SPN activation, locomotor effects are similar for both rostral and caudal D1-SPN stimulation, indicating they are at least partly dissociated from one another.
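      The step from tracked positions to distance per epoch can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for the study: it takes body-center coordinates as already extracted (one (x, y) pair per video frame, in arbitrary units such as pixels), and the function name, epoch format, and toy trajectory are hypothetical. No pose-estimation library is called.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Epoch = Tuple[str, float, float]  # (label, start in seconds, end in seconds)


def distance_per_epoch(
    positions: Sequence[Point], frame_rate_hz: float, epochs: Sequence[Epoch]
) -> Dict[str, float]:
    """Sum frame-to-frame displacement of the body center within each epoch.

    Distances from epochs sharing a label (e.g. all "on" blocks) are pooled.
    """
    totals: Dict[str, float] = {}
    for label, start_s, end_s in epochs:
        first = int(round(start_s * frame_rate_hz))
        last = min(int(round(end_s * frame_rate_hz)), len(positions) - 1)
        dist = 0.0
        for i in range(first, last):
            (x0, y0), (x1, y1) = positions[i], positions[i + 1]
            dist += math.hypot(x1 - x0, y1 - y0)
        totals[label] = totals.get(label, 0.0) + dist
    return totals


if __name__ == "__main__":
    # Hypothetical trajectory at 1 frame/s: the animal moves, then stays still.
    toy_positions: List[Point] = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 0), (3, 0), (3, 0)]
    toy_epochs: List[Epoch] = [("on", 0.0, 3.0), ("off", 3.0, 6.0)]
    print(distance_per_epoch(toy_positions, 1.0, toy_epochs))
```

      Converting the result to physical units would require a calibration factor, for example from the annotated box corners, which is left out of this sketch. Frame-to-frame tracking jitter also adds spurious distance, so real analyses typically smooth the trajectory or apply a minimum-displacement threshold first.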

      Moreover, in the RTPPA task, it is accepted that the percentage of time spent in the light-paired chamber reflects the preference or aversiveness to optogenetic stimulation. We additionally quantified total distance traveled (Supplementary Figure S9c). While optogenetic stimulation of both rostral and caudal D1-SPNs reduced time spent in the light-paired chamber (Figure 4), total distance traveled was unchanged, indicating that the observed aversion is not due to reduced locomotion.

      We added the following to the Results section: “To determine whether the reduced reward consumption observed in Rostral ChrimsonR mice could be explained by changes in locomotion, we quantified the total distance traveled during this task. Optogenetic stimulation led to an increase in locomotion in the small cohort of Rostral ChrimsonR mice in the reward consumption experiment shown in Figure 2d-e (Supplementary Figure S9a), while no change in locomotion was observed across epochs in mCherry controls, ChrimsonR Rostral and Caudal mice (Supplementary Figure S9b, related to Figure 2g-i)”

      And

      “Quantification of locomotion showed no reduction in distance traveled in the light-paired chamber (Supplementary Figure S9c), indicating that the avoidance was not driven by impaired locomotion. These data indicate that medNAcSh D1-SPNs generally promote aversion without affecting locomotion and without major differences along the rostro-caudal axis”

      Additionally, we added the following sentence to the Discussion: “Importantly, our behavioral effects of rostral D1-SPNs in the reward consumption and RTPPA assays could not be explained by reduced locomotor activity. Indeed, optogenetic stimulation of D1-SPNs during the reward consumption task did not reduce locomotion; instead, locomotion was either unchanged or increased in a small cohort of Rostral ChrimsonR mice. The increased locomotion likely reflected appetitive behavior and is consistent with past chemogenetic studies (Zhu et al., 2016). In the RTPPA no locomotion differences were detected.”

      (3) It would be useful to provide a schematic (or pictures) for the location of fiber implantation in all animals for both photometry and optogenetics.

      We validated optic fiber placement in 14 animals by identifying the coronal section in which the fiber tip was centered and aligning this section to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). Representative optic fiber placement and viral spread are shown in the newly added Supplementary Figure S10.

      Minor Comments:

      (1) Figure 6e and g seem mislabeled: "Drd1+ (D2-SPNs)".

      Yes, thank you. We corrected it.

      (2) Line 395-397: the authors mention minimal Flp leakage, but could it be low activity of the Stard5 promoter in the core and dorsal striatum that allows little expression of the flippase that could be sufficient for recombination?

      We thank the Reviewer for this insightful point. We cannot fully distinguish between these possibilities in the current study; however, the overall recombination outside the target region remains minimal, supporting the utility of the Stard5-Flp line for selective targeting of the rostral medNAcSh. Injection of a Flp-dependent AAV into the lateral shell, core and dorsal striatum showed no expression, therefore we think this is unlikely. Moreover, this aligns with Stard5 expression patterns derived from the scRNAseq data (Chen et al. 2021), Allen Brain Atlas quantifications (Figure 5) and our RNAscope analysis (Figure 6). Nevertheless, we acknowledge that histology alone cannot definitively exclude this possibility, and quantitative approaches such as qPCR would be required.

      References

      Alpy, Fabien, and Catherine Tomasetto. 2005. “Give Lipids a START: The StAR-Related Lipid Transfer (START) Domain in Mammals.” Journal of Cell Science 118(13):2791–2801. doi:10.1242/jcs.02485.

      Calderon-Dominguez, Maria, Gregorio Gil, Miguel Angel Medina, William M. Pandak, and Daniel Rodríguez-Agudo. 2014. “The StarD4 Subfamily of Steroidogenic Acute Regulatory-Related Lipid Transfer (START) Domain Proteins: New Players in Cholesterol Metabolism.” The International Journal of Biochemistry & Cell Biology 49:64–68. doi:10.1016/j.biocel.2014.01.002.

      Chen, Renchao, Timothy R. Blosser, Mohamed N. Djekidel, Junjie Hao, Aritra Bhattacherjee, Wenqiang Chen, Luis M. Tuesta, Xiaowei Zhuang, and Yi Zhang. 2021. “Decoding Molecular and Cellular Heterogeneity of Mouse Nucleus Accumbens.” Nature Neuroscience 24(12):1757–71. doi:10.1038/s41593-021-00938-x.

      Domingues, Ana Verónica, Tawan T. A. Carvalho, Gabriela J. Martins, Raquel Correia, Bárbara Coimbra, Ricardo Bastos-Gonçalves, Marcelina Wezik, Rita Gaspar, Luísa Pinto, Nuno Sousa, Rui M. Costa, Carina Soares-Cunha, and Ana João Rodrigues. 2025. “Dynamic Representation of Appetitive and Aversive Stimuli in Nucleus Accumbens Shell D1- and D2-Medium Spiny Neurons.” Nature Communications 16(1):59. doi:10.1038/s41467-024-55269-9.

      Keevers, Luke J., and Philip Jean-Richard-dit-Bressel. 2025. “Obtaining Artifact-Corrected Signals in Fiber Photometry via Isosbestic Signals, Robust Regression, and DF/F Calculations.” Neurophotonics 12(02). doi:10.1117/1.NPh.12.2.025003.

      Labouesse, Marie A., Arturo Torres-Herraez, Muhammad O. Chohan, Joseph M. Villarin, Julia Greenwald, Xiaoxiao Sun, Mysarah Zahran, Alice Tang, Sherry Lam, Jeremy Veenstra-VanderWeele, Clay O. Lacefield, Jordi Bonaventura, Michael Michaelides, C. Savio Chan, Ofer Yizhar, and Christoph Kellendonk. 2023. “A Non-Canonical Striatopallidal Go Pathway That Supports Motor Control.” Nature Communications 14(1):6712. doi:10.1038/s41467-023-42288-1.

      Labouesse, Marie A., Maria Wilhelm, Zacharoula Kagiampaki, Andrew G. Yee, Raphaelle Denis, Masaya Harada, Andrea Gresch, Alina-Măriuca Marinescu, Kanako Otomo, Sebastiano Curreli, Laia Serratosa Capdevila, Xuehan Zhou, Reto B. Cola, Luca Ravotto, Chaim Glück, Stanislav Cherepanov, Bruno Weber, Xin Zhou, Jason Katner, Kjell A. Svensson, Tommaso Fellin, Louis-Eric Trudeau, Christopher P. Ford, Yaroslav Sych, and Tommaso Patriarchi. 2024. “A Chemogenetic Approach for Dopamine Imaging with Tunable Sensitivity.” Nature Communications 15(1):5551. doi:10.1038/s41467-024-49442-3.

      Liu, Yiqiong, Ying Wang, Zheng-dong Zhao, Guoguang Xie, Chao Zhang, Renchao Chen, and Yi Zhang. 2024. “A Subset of Dopamine Receptor-Expressing Neurons in the Nucleus Accumbens Controls Feeding and Energy Homeostasis.” Nature Metabolism 6(8):1616–31. doi:10.1038/s42255-024-01100-0.

      Mou, Dachao, Shasha Wu, Yanqiong Chen, Yun Wang, Yufang Dai, Min Tang, Xiu Teng, Shijun Bai, and Xiufeng Bai. 2025. “Roles of PEG10 in Cancer and Neurodegenerative Disorder (Review).” Oncology Reports 53(5):1–9. doi:10.3892/or.2025.8893.

      O’Connor, Eoin C., Yves Kremer, Sandrine Lefort, Masaya Harada, Vincent Pascoli, Clément Rohner, and Christian Lüscher. 2015. “Accumbal D1R Neurons Projecting to Lateral Hypothalamus Authorize Feeding.” Neuron 88(3):553–64. doi:10.1016/j.neuron.2015.09.038.

      Pedersen, Christian E., Raajaram Gowrishankar, Sean C. Piantadosi, Daniel C. Castro, Madelyn M. Gray, Zhe C. Zhou, Shane A. Kan, Patrick J. Murphy, Patrick R. O’Neill, and Michael R. Bruchas. 2022. “Medial Accumbens Shell Spiny Projection Neurons Encode Relative Reward Preference.”

      Pisanello, Ferruccio, Gil Mandelbaum, Marco Pisanello, Ian A. Oldenburg, Leonardo Sileo, Jeffrey E. Markowitz, Ralph E. Peterson, Andrea Della Patria, Trevor M. Haynes, Mohamed S. Emara, Barbara Spagnolo, Sandeep Robert Datta, Massimo De Vittorio, and Bernardo L. Sabatini. 2017. “Dynamic Illumination of Spatially Restricted or Large Brain Volumes via a Single Tapered Optical Fiber.” Nature Neuroscience 20(8):1180–88. doi:10.1038/nn.4591.

      Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2015. “You Only Look Once: Unified, Real-Time Object Detection.”

      Requejo-Mendoza, Nikte, José-Antonio Arias-Montaño, and Ranier Gutierrez. 2025. “Nucleus Accumbens D2-Expressing Neurons: Balancing Reward and Licking Disruption through Rhythmic Optogenetic Stimulation” edited by J. M. Dominguez. PLOS ONE 20(2):e0317605. doi:10.1371/journal.pone.0317605.

      Rodriguez-Agudo, Daniel, Maria Calderon-Dominguez, Miguel Angel Medina, Shunlin Ren, Gregorio Gil, and William M. Pandak. 2012. “ER Stress Increases StarD5 Expression by Stabilizing Its MRNA and Leads to Relocalization of Its Protein from the Nucleus to the Membranes.” Journal of Lipid Research 53(12):2708–15. doi:10.1194/jlr.M031997.

      Stanley, Geoffrey, Ozgun Gokce, Robert C. Malenka, Thomas C. Südhof, and Stephen R. Quake. 2020. “Continuous and Discrete Neuron Types of the Adult Murine Striatum.” Neuron 105(4):688-699.e8. doi:10.1016/j.neuron.2019.11.004.

      Zhang, Yan, Márton Rózsa, Yajie Liang, Daniel Bushey, Ziqiang Wei, Jihong Zheng, Daniel Reep, Gerard Joey Broussard, Arthur Tsang, Getahun Tsegaye, Sujatha Narayan, Christopher J. Obara, JingXuan Lim, Ronak Patel, Rongwei Zhang, Misha B. Ahrens, Glenn C. Turner, Samuel S. H. Wang, Wyatt L. Korff, Eric R. Schreiter, Karel Svoboda, Jeremy P. Hasseman, Ilya Kolb, and Loren L. Looger. 2023. “Fast and Sensitive GCaMP Calcium Indicators for Imaging Neural Populations.” Nature 615(7954):884–91. doi:10.1038/s41586-023-05828-9.

      Zhu, Xianglong, David Ottenheimer, and Ralph J. DiLeone. 2016. “Activity of D1/2 Receptor Expressing Neurons in the Nucleus Accumbens Regulates Running, Locomotion, and Food Intake.” Frontiers in Behavioral Neuroscience 10. doi:10.3389/fnbeh.2016.00066.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      (1) LH levels were not measured in many mice or in robust temporal detail, such as every 30 or 60 min, to allow a more detailed comparison between the fine-scale timing of RP3V neuron activation with onset and timing of LH surge dynamics.

      Please see “Recommendations for Authors” below.

      (2) The authors report that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation. However, it is likely, and indeed evident from the 2 example LH patterns shown in Figures 3A-B, that LH values start to increase several hours before the peak LH. This earlier rise in LH levels ("onset" of the surge) occurs much closer in time to the first RP3V kisspeptin neuron oscillatory activation, and as such, the ensuing LH secretion may not be as delayed as the authors suggest.

      Please see “Recommendations for Authors” below.

      (3) The authors nicely show that there is some variation (~2 hours) in the peak of the first oscillation in proestrus females. Was this same variability present in OVX+E2 females, or was the variability smaller or absent in OVX+E2 versus proestrus? It is possible that the variability in proestrus mice is due to variability in the timing and magnitude of rising E2 levels, which would, in theory, be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice may have less variability between mice for the onset of RP3V kisspeptin activity.

      Please see “Recommendations for Authors” below.

      (4) One concern regarding this study is the lack of data showing the specificity of the AAV and the GCaMP6s signals. There are no data showing that GCaMP6s is limited to the RP3V and is not expressed in other Kiss1 populations in the brain. Given that 2ul of the AAV was injected, which seems like a lot considering it was close to the ventricle, it is important to show that the signal and measured activity are specific to the RP3V region. Though the authors discuss potential reasons for the low co-expression of GCaMP6 and kisspeptin immunoreactivity, it does raise some concern regarding the interpretation of these results. The low co-expression makes it difficult to confirm the Kiss1 cell-specificity of the Cre-dependent AAV injections. In addition, if GFP (GCaMP6s) and kisspeptin protein co-localization is low, it is possible that the activation of these neurons does not coincide with changes in kisspeptin or that these neurons are even expressing Kiss1 or kisspeptin at the time of activation. It is important to remember that the study measures activation of the kisspeptin neuron, and it does not reveal anything specific about the activity of the kisspeptin protein.

      Please see “Recommendations for Authors” below.

      (5) One additional minor concern is that LH levels were not measured in the ovariectomized females during the expected time of the LH surge. The authors suggest that the lower magnitude of activation during the LH surge in these females, in comparison to proestrus females, may be the result of lower LH levels. It's hard to interpret the difference in magnitude of neuronal activation between EB-treated and proestrus females without knowing LH levels. In addition, it's possible that an LH surge did not occur in all EB-treated females, and thus, having LH levels would confirm the success of the EB treatment.

      Please see “Recommendations for Authors” below.

      (6) This kisspeptin neuron peak activity is abolished in ovariectomized mice, and estradiol replacement restored this activity, but only partially. Circulating levels of estradiol were not measured in these different setups, but the authors hypothesize that the lack of full restoration may be due to the absence of other ovarian signals, possibly progesterone.

      Please see “Recommendations for Authors” below.

      (7) Recordings in several mice show inter- and intra-variability in the time of peak onset. It is not shown whether this variability is associated with a similar variability in the timing of the LH surge onset in the recorded mice. The authors hypothesized that this variability indicates a poor involvement of the circadian input. However, no experiments were done to investigate the role of the (vasopressinergic-driven) circadian input on the kisspeptin neuron activation at the light/dark transition. Thus, we suggest that the authors be more tentative about this hypothesis.

      Please see “Recommendations for Authors” below.

      Recommendations for the authors:

      (1) The study measured LH levels over time in just 5 female mice, a small sample size given the variability between mice. Having said that, n=5 is an OK starting point but the LH values are only shown for 2 mice, and there are no graphs or presentation of mean LH levels over time for all 5 mice. Figure 3 would greatly benefit from graphing and statistical analyses of the LH levels for all 5 mice (mean line graphs over time or similar). The authors report the mean "peak LH" level in the text, but it would be important to show and graph all the LH values over time (either by clock time or time relative to start of first RP3V oscillation or both), to allow the reader to compare the LH pattern to the RP3V kisspeptin neuron activity over time.

      We share the Reviewer’s frustration regarding the lack of detailed LH time points to correlate with the changes in GCaMP signal. Certainly, it was our intention to do better. However, with the benefit of actually being able to monitor surge progress through RP3V neuron activity in real time, we found that frequent blood sampling could often interfere with the normal dynamic of surge activity. On some occasions, the RP3V kisspeptin neuron oscillations would stop abruptly mid- or early-surge, while on others they would stop and then start again. Knowing that this was not the normal profile, we resorted to taking as few blood samples as possible, trying primarily to get what we thought might be the “peak” LH surge level. We acknowledge that this is not ideal, and leaves open the important question around the precise relationship of the beginning of RP3V kisspeptin oscillations with LH secretion. Although not answering the question directly, this was part of the motivation for the last figure, which emphasizes how the RP3V kisspeptin neuron activity and GnRH neuron dendron activity are essentially identical at the time of the surge. We have re-written the relevant section of the Discussion to be more circumspect.

      (2) The authors report and discuss that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation but it is likely, and indeed evident from the 2 example LH patterns shown in Figs 3A-B, that LH values start to increase several hours earlier, well before the peak LH. Thus, the rise in LH levels during the surge starts much closer in time to the first RP3V kisspeptin neuron oscillatory activation, which the authors don't analyze. For example, the 2nd LH value for the 2 representative mice shown in Figure 3 is notably higher than the 1st LH value of those mice, even though the peak value has not yet been attained. Even with the LH levels only being measured here every couple of hours, this "first detected rise in LH" should at least be graphed and/or analyzed relative to the timing of kisspeptin neuron activity, and commented on in the Discussion.

      As above.

      (3) It is unclear if the variation (~2 hours) in the peak of the first oscillation in proestrus females is the same as in OVX+E2 females, or was the variability smaller or absent in OVX+E2 females versus proestrus? The variability observed in proestrus mice is likely due to variability in timing and magnitude of rising E2 levels, which may be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice might display less variability for the timing of the RP3V kisspeptin activity "onset". This measure would be important to analyze here and to discuss, given that many labs around the world often use an OVX+E2 model.

      This is an interesting point given the dogma surrounding the role of the SCN in initiating the surge. Three of the five OVX+E2 mice exhibited clearly discernible GCaMP oscillations that started at approximately noon, 1pm and 2pm. While this sample is very small, it does suggest that the onset of RP3V kisspeptin neuron activity is variable as found in proestrous mice. We have indicated this cautiously given the sample size.

      (4) If looking at kisspeptin immunoreactivity is problematic, is it possible to look at Kiss1 RNA levels or to look at Cre-recombinase protein levels? While the Cre-recombinase would just be a proxy for Kiss1/kisspeptin, it may result in higher expression and better co-localization with the GCaMP6s.

      Yes, RNAscope would likely be the ideal method to settle this long-running issue of apparently poor Kiss-cre targeting in the RP3V. Unexpectedly, however, we found that the mCherry probe bound to Kiss1 in our attempts at an RNAscope evaluation. The use of Cre as a proxy for identifying kisspeptin neurons would almost certainly generate better co-localization, as Cre is being used to target GCaMP.

      Minor

      (1) It was not clear in the manuscript how many cells were counted or contributed to the neuronal activation data. Is it the entire population of RP3V Kiss1 cells? Just a subset? How much variability is there in the number of cells measured/counted between animals? Presumably, the brains were extracted to confirm the placement of the optic fiber. Were there neuroanatomical studies also done on these animals to confirm how many cells express GFP (GCaMP6) and the correct placement and specificity of the AAV? Is there any potential that cells in the BnST or even the ARC took up the virus and were included in these measurements?

      It is very difficult if not impossible to establish just how many RP3V kisspeptin neurons contribute to the GCaMP population signal using fibre photometry. This will depend on levels of AAV transfection, distance from the optic fibre, and the numbers of RP3V kisspeptin neurons actually involved in the surge mechanism. Of note, C-Fos data suggest that only around one-third of RP3V kisspeptin neurons are activated at the time of the surge. All fibre placements were subsequently shown to be running alongside GCaMP-expressing AVPV/anterior periventricular nucleus cells (now noted), but the numbers of transfected cells were not quantified. As shown in Fig.4, the GCaMP signal was very similar across all mice suggesting little variation in the relationship between transduction, fibre placement and distance.

      The RP3V region is approximately 4-5 mm from the ARN. We felt that the possibility that an AAV injection in the RP3V would spill over into the ARN was so remote that we did not assess GCaMP expression in ARN kisspeptin neurons. We have previously determined for the ARN that recordable GCaMP fluorescence only occurs if the optic fibre is within 0.5 mm from GCaMP-expressing neurons. Ultimately, proof that we are not recording from ARN kisspeptin neurons comes from the very different activity patterns reported here for RP3V neurons compared to the kisspeptin pulse generator. We did not see any GCaMP expressed in the BNST.

      (2) If it is possible to measure LH levels in the EB surge animals, it would be helpful, at least to confirm that they did surge and to support the proposed idea that LH surge levels are lower in that model.

      Unfortunately, as acknowledged in the original text, we did not take blood samples from these mice and so do not have the data. However, as noted, other studies undertaken by us using the same EB surge paradigm show that peak LH levels are much lower compared to proestrus. In retrospect, we agree that this would have been useful, particularly to establish whether each mouse did show a surge, as two of the OVX+EB mice failed to show typical surge-associated oscillations. We have noted this in the Discussion.

      (3) For Figure 4F, please add a gray shaded box to the graph to denote the "dark" period (lights off), as was done for Figures 2 and 3. This is important because Figure 4F is making the point that there is a consistent 90-minute oscillation event right before lights off, so it would be helpful to denote the period of lights off on the graph.

      There was in fact a very light grey shade, but we have now added a grey bar to make the dark period clearer.

      (4) The Title of the paper should include the brain region because this is specifically the RP3V (or preoptic area "POA") kisspeptin neurons that are studied, not other kisspeptin cell populations.

      We have added “preoptic area” to the title to clarify.

      (5) The graphs in Figure 3C-D are from different mice and address a different question than the graphs in Figure 3A-B. This was a bit confusing, and it is recommended that the LH + RP3V kisspeptin activity experiment (Figures 3A-B) be its own figure, and the graphs looking at the detailed oscillatory patterns in Figures 3C-D be their own figure, as the latter are addressing a different question and don't have any LH data.

      We have split the figure as requested.

      (6) The font size of the X and Y axes of Figures 2 and 3 is very small and hard to read. Can this text please be increased in size a little? By comparison, the font size of the X and Y axes of Figure 4 is bigger and more legible.

      Changed.

      (7) In the methods for fiber photometry, there is a sentence saying "Twenty two-hour recordings were made..." This was confusing, as it read as if there were twenty 2-hour recordings, when in fact it was one 22-hour recording. The authors should reword or use "22-hour" in this sentence.

      Changed.

      (8) It's a bit hard to see the difference in color between proestrus 1 and proestrus 2 (both blues) in Figure 6, especially when they overlap. It might be helpful to select a different color for one of them.

      Changed.

      (9) Is the virus from Addgene or just the plasmid? Did Addgene insert the plasmid into the virus, or was that done elsewhere? For purposes of replication, it might be helpful to state the plasmid that was used and the virus that was used, and their origins (e.g., if made by Addgene or donated by another investigator). I was not able to find the virus based on the Addgene number in the manuscript and was getting plasmids with different Addgene #s.

      Apologies, the numbering was incorrect. We have now amended to 100842-AAV9 that was packaged by Addgene.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their study the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely and the manuscript is well written, albeit in some cases details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease of the DON content associated with deletion of FgDML1: Although some growth data are shown in figure 6 - indicating a severe growth defect - the DON production presented in figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to a decreased growth and specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation on the same conditions to the DON amount detected. Only then a conclusion as to an altered production in the mutant strains can be drawn.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have done our best to revise the manuscript. Our point-by-point responses to the reviewer’s comments are listed below.

      Comments to the revised manuscript:

      The authors carefully revised the manuscript and provided explanations for methods in several cases. However, there are still some problems - probably due to misunderstanding - that need revision.

      (1) A major problem of the first version of the manuscript was the lack of appropriate description of biomass analysis and the consideration of the respective results for evaluation of production of DON and other metabolites. Although the authors provide some explanation in the response to reviews, I could not find a corresponding explanation or description in the manuscript. It is not sufficient to explain the problem to me, but a detailed explanation and description of the method has to be provided in the manuscript along with the definition of one "unit of mycelium". It is still not entirely clear to me what such a "unit of mycelium" is.

      Please clarify this and any other uncertainties that were commented on by me and other reviewers in the manuscript, not only in the response to reviews. Also adjust the reference list accordingly.

      Thank you very much for your advice. We appreciate the reviewer’s continued attention to the potential impact of biomass differences on DON production, particularly in light of the reduced growth rate observed in the mutant strain.

      We acknowledge that the mutant exhibits slower growth compared to the wild-type strain. However, it is important to emphasize that the reduction in DON levels reported in this study cannot be attributed to decreased fungal biomass. In our experimental design, DON production was normalized to mycelial dry weight, and toxin levels are expressed as μg DON per g dry mycelium. Therefore, differences in total mycelial accumulation among strains were explicitly accounted for and eliminated during data analysis.

      By expressing DON production on a per-unit-biomass basis, the measured values reflect the intrinsic DON biosynthetic capacity of the mycelium rather than the overall growth rate or total biomass. Consequently, the observed reduction in DON content in the mutant indicates a genuine impairment in DON biosynthesis per unit of fungal biomass, rather than a secondary effect resulting from reduced mycelial growth.

      To avoid ambiguity, we have clarified this point in the revised manuscript by explicitly stating the normalization strategy and the definition of the mycelial unit in the Materials and Methods section, and by emphasizing in the Results/Discussion section that DON levels were compared on a biomass-normalized basis.

      We hope that this clarification adequately addresses the reviewer’s concern and clearly distinguishes growth-related effects from alterations in toxin biosynthesis.

      “DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018). Under toxin-producing conditions (28 °C, 145 rpm), fungal strains were cultured in TBI medium for 7 days. Cultures were initiated using freshly grown mycelia. After incubation, mycelia and culture filtrates were separated by filtration. The culture filtrates were collected for DON determination, while the mycelia were harvested for biomass analysis. The collected mycelia were washed with sterile distilled water and dried at 60 °C to constant weight. The dry weight of mycelia was recorded and used for normalization of DON production. One mycelial unit was defined as 1 g of dry mycelial biomass. DON concentration in the culture filtrates was quantified using an enzyme-linked immunosorbent assay (ELISA). Briefly, 50 μL of culture filtrate or DON standard solution was added to wells of a 96-well microplate pre-coated with DON antigen, followed by the addition of enzyme conjugate and antibody working solution according to the manufacturer’s instructions. After incubation and washing, color development was achieved using substrate solution and terminated by stop solution. Absorbance was measured at 450 nm using a microplate reader. A standard curve was generated using log<sub>10</sub>-transformed DON concentrations of the standards and the corresponding percentage absorbance values. DON concentrations in the samples were calculated based on the standard curve. Total DON production was calculated according to the culture volume (30 mL) and subsequently normalized to mycelial dry weight. DON production was expressed as μg DON per g dry mycelium. Each treatment group contains three biological replicates and three technical replicates.”
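
      As an illustration only, the calculation described in the passage above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis script used in the study: the standard curve is taken to be a straight-line fit of percentage absorbance against log10-transformed DON concentration, concentrations are assumed to be expressed in μg/mL, and all function names and numerical values are hypothetical placeholders. Only the 30 mL culture volume and the per-gram dry-weight normalization come from the methods text.

```python
# Illustrative sketch only: function names, units (ug/mL) and all numbers
# are hypothetical; only the 30 mL volume and dry-weight normalization
# are taken from the methods text.
import math


def fit_standard_curve(standard_concs, percent_absorbances):
    """Least-squares line: percent absorbance = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in standard_concs]
    ys = list(percent_absorbances)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def don_concentration(percent_absorbance, slope, intercept):
    """Invert the standard curve to get the DON concentration of a sample."""
    return 10 ** ((percent_absorbance - intercept) / slope)


def don_per_gram(conc_ug_per_ml, culture_volume_ml, dry_weight_g):
    """Total DON in the culture, normalized to mycelial dry weight (ug per g)."""
    if dry_weight_g <= 0:
        raise ValueError("dry weight must be positive")
    total_don_ug = conc_ug_per_ml * culture_volume_ml
    return total_don_ug / dry_weight_g


# Hypothetical standards (ug/mL) and their percentage absorbance values.
slope, intercept = fit_standard_curve([0.1, 1.0, 10.0], [80.0, 50.0, 20.0])
sample_conc = don_concentration(50.0, slope, intercept)
normalized = don_per_gram(sample_conc, culture_volume_ml=30.0, dry_weight_g=0.5)
print(f"{normalized:.1f} ug DON per g dry mycelium")
```

      Because the total toxin amount is divided by dry weight, two strains with the same filtrate concentration but different biomass yield different normalized values, which is the distinction between growth effects and per-biomass production drawn in this response.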

      (2) Another problem was that the authors considered FgDML1 a regulator of DON production. As mentioned by me and reviewer 3, FgDML1 is crucial to numerous functions in F. graminearum and its lack causes a plethora of problems for fungal physiology. Hence, although it is clear that the lack of FgDML1 causes alterations in DON production, it is not appropriate to designate this factor as a "regulator".

      It seems to me that the authors are afraid that if FgDML1 would not be a "regulator" that this would decrease the value of their study, which is not the case. This is a matter of correct wording. Therefore, please revise the wording accordingly, starting with the title:

      ...FgDML1 impacts DON toxin biosynthesis...

      Moreover, the manuscript would certainly benefit from a more detailed description of the whole cascade leading from FgDML1 to DON biosynthesis and production of the other metabolites that change upon deletion. Such an explanation can help the reader grasp the relevance of FgDML1 for regulatory processes as well as distinguish more general from specific effects.

      Thank you very much for your advice. We fully agree that, given the pleiotropic functions of FgDML1 in F. graminearum and the broad physiological defects caused by its deletion, it is not appropriate to designate FgDML1 as a direct or specific “regulator” of DON biosynthesis.

      We acknowledge that the use of the term “regulator” in the previous version was imprecise. Following the reviewer’s suggestion, we have revised the wording throughout the manuscript to more accurately reflect the role of FgDML1. Specifically, we now describe FgDML1 as a factor that impacts or affects DON toxin biosynthesis rather than directly regulating it. The title has been revised accordingly to read:

      “Mitochondrial protein FgDML1 impacts DON toxin biosynthesis and cyazofamid sensitivity in F. graminearum by affecting mitochondrial homeostasis”

      Importantly, we would like to emphasize that our intention was not to overstate the specificity of FgDML1 in DON regulation, but rather to highlight its influence on secondary metabolism in the context of its broader biological functions. To address this more clearly, we have expanded the Discussion section to provide a more detailed and cautious interpretation of the potential cascade linking FgDML1 deletion to altered DON biosynthesis and changes in other metabolites.

      “Secondary metabolite biosynthesis is generally regarded as an energy-intensive process that is tightly coupled to cellular energy metabolism. ATP serves as the primary energy currency supporting enzymatic reactions, macromolecule synthesis, and subcellular organization required for secondary metabolism. Disruption of ATP generation has been shown to directly impair toxin biosynthesis: for example, silencing of ATP synthase subunit α (AtpA) significantly reduces ATP synthesis and inhibits the production of the TcdA and TcdB toxins (Marreddy et al., 2024). Similarly, in plants, ATP depletion leads to a metabolic shift in which growth and basic physiological processes are prioritized at the expense of energetically costly secondary metabolites, including toxins (Xiao et al., 2024). Together, these findings highlight ATP availability as a key determinant of secondary metabolite production across biological systems.

      In filamentous fungi, mitochondria play a central role in sustaining cellular ATP levels through oxidative phosphorylation and are therefore critical for biosynthetic and stress-adaptive processes. In F. graminearum, mutants defective in mitochondrial components, such as the voltage-dependent anion channel (mitochondrial porin), exhibit aberrant mitochondrial morphology, reduced ATP production, and markedly decreased DON accumulation and virulence (Han et al., 2022). These observations establish a direct link between mitochondrial energy metabolism and secondary metabolite output, supporting the notion that intact mitochondrial function and adequate ATP supply are prerequisites for robust DON production.

      Consistent with this energy-dependent framework, biosynthesis of the mycotoxin DON in F. graminearum requires substantial ATP input. In the present study, ATP content in the ΔFgDML1 mutant was significantly lower than in the wild-type PH-1 and the complemented strain ΔFgDML1-C, and DON production was concomitantly reduced (Fig. 4A). Importantly, DON levels were normalized to mycelial dry weight, indicating that the observed reduction reflects a decreased biosynthetic capacity per unit biomass rather than a secondary consequence of reduced fungal growth. This distinction demonstrates that impaired DON production in the ΔFgDML1 mutant arises primarily from metabolic limitations.

      At the cellular level, ATP depletion compromises multiple energy-dependent steps required for DON biosynthesis. The formation of toxisomes, which are specialized subcellular structures responsible for the spatial organization of DON biosynthetic enzymes, is essential for efficient mycotoxin production and is an ATP-dependent process. Reduced ATP levels disrupt toxisome assembly, and accordingly, the ΔFgDML1 mutant was unable to form functional toxisomes (Fig. 4C). In parallel, western blot analysis revealed a marked reduction in the abundance of the DON biosynthetic enzyme FgTri1 (Fig. 4D). In addition, ATP-dependent processes are directly involved in the biogenesis of the DON biosynthetic machinery: the ATPase activity of myosin I (FgMyo1) is required for efficient translation of key DON biosynthetic enzymes, and disruption of its ATPase function results in reduced DON production (Tang et al., 2018). These findings further underscore the dependence of DON biosynthesis on cellular energy status.

      DON production is also regulated at the transcriptional level by the TRI gene cluster, with Tri5 and Tri6 serving as core components of the biosynthetic pathway. Tri5 encodes trichodiene synthase, which catalyzes the first committed step of DON biosynthesis. In the ΔFgDML1 mutant, expression levels of FgTri5 and FgTri6 were significantly downregulated (Fig. 4B), suggesting that impaired energy metabolism indirectly affects transcription of DON biosynthetic genes. Although no direct regulatory role of DML family proteins in gene expression has been reported in Saccharomyces cerevisiae or Drosophila melanogaster, their established functions in cell division and microtubule organization raise the possibility that FgDML1 indirectly influences gene expression through effects on chromatin organization or cell-cycle progression (Schulze and Wallrath, 2007).

      In addition to reduced ATP levels, deletion of FgDML1 resulted in a significant decrease in acetyl-CoA content (Fig. 5C), a key precursor for trichothecene biosynthesis. Acetyl-CoA links central carbon metabolism with secondary metabolite production, and its depletion further constrains DON biosynthesis by limiting substrate availability. Broader metabolomic studies support this relationship, showing that perturbations in TCA cycle intermediates and central carbon metabolism are closely associated with altered DON production, reinforcing a mechanistic linkage between energy generation and toxin biosynthesis (Atanasova-Penichon et al., 2018).

      “Taken together, these results support a model in which FgDML1 influences DON production indirectly by maintaining mitochondrial energy metabolism. Reduced ATP availability in the ΔFgDML1 mutant restricts energy-dependent biosynthetic processes, disrupts toxisome formation, diminishes DON biosynthetic enzyme abundance and gene expression, and limits precursor supply, ultimately leading to a substantial reduction in DON biosynthesis that is independent of fungal biomass effects.” (in L284-350). In this revised discussion, we explicitly distinguish between general physiological effects caused by the loss of FgDML1 and more specific consequences on secondary metabolic pathways.

      We believe that this revised wording and the expanded mechanistic discussion more accurately reflect the biological role of FgDML1 and improve the conceptual clarity of the manuscript, without overstating its function as a dedicated regulator of DON production.

      Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be added and corrected.

      Comments on revisions:

      The author has addressed my questions.

      We greatly appreciate the time you spent on our paper and your helpful suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole-to-pole oscillation whereby a time average minimum of the Min proteins at mid-cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports the biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.

      Strengths:

      In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations were nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.

      Weaknesses:

      While the study shows that MinD in B. subtilis utilizes a different (MinE-independent) activation mechanism, it remains to be determined the extent to which MinJ and/or MinC play a role.

      Reviewer #2 (Public review):

      Summary:

      Feddersen & Bramkamp determined important characteristics of how MinD protein binds/dissociates to/from the membrane, and dimerizes in relation to its ATPase activity. The presented data clearly shows the differences in function of MinD homologs from B. subtilis and E. coli.

      Strengths:

      The work presents well-executed experiments that lead to interesting conclusions and a new model of how Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.

      Weaknesses:

      The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments, it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained. Although the work is detailed and nicely compares the function of the B. subtilis Min system with the E. coli Min system, it lacks a comparison of Min system function in other rod-shaped Gram-positive bacteria. I would suggest including in the Discussion the complexity of other Min systems. This complexity is especially evident in other rod-shaped spore formers such as Clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one as in B. subtilis.

      Reviewer #3 (Public review):

      Experimentally, this study provides sufficient data to support the authors' conclusion that MinD dimerization but not ATPase activity is both necessary and sufficient for concentrating it and its binding partner, the division inhibitor MinC, at cell poles. Biochemical data appears to be rigorously acquired and includes proper controls. Although cytological data are consistent with the authors' model, quantitative information on MinD localization in a statistically relevant set of cells is missing (e.g. Figure 2B).

      The study's other major conclusion, as outlined in their discussion, that a reaction-diffusion model explains MinD localization in wild-type cells, is unsubstantiated. If they would like to make this a major conclusion of the final manuscript, they will need to include modeling that takes into account biochemical and cytological data. From a presentation perspective, the manuscript is challenging to read and will require substantial rewriting and revision prior to publication.

      We thank the reviewers for their detailed and constructive comments on our work. We particularly acknowledge that the initial version of our manuscript was difficult to read and might have given the impression that the aim was to formulate a new mathematical model of Min dynamics in B. subtilis. However, our work aimed at providing solid (and first) biochemical evidence for the MinD ATPase cycle and the nature of the ATPase stimulation. Furthermore, we aimed at corroborating the in vitro findings with single-molecule microscopy data that provided a detailed in vivo picture of the Min dynamics in living cells. Together, this work combines for the first time in vitro and single-molecule in vivo data. During the revision, we generated a wealth of new data that aimed at unraveling the potential effects of MinC and MinJ on MinD dynamics. A major problem during the revision was the problematic purification of MinJ. The membrane integral MinJ has been shown to be highly susceptible to proteolytic decay during purification attempts. Despite various attempts, we did not succeed in the purification of full-length MinJ. These efforts also led to the unusually long revision time. We therefore turned to the purification of the soluble part of MinJ, namely the PDZ domain. The revised work now contains in vitro data showing the impact of MinC and MinJ-PDZ on MinD ATPase activity and membrane binding. Furthermore, we now provide single-molecule tracking data of MinD in minC and minJ deletion mutant backgrounds. Importantly, the new data show that MinC has no effect on MinD activities, while the PDZ domain has a mild stimulating effect on MinD’s ATPase activity. In summary, a detailed picture of how MinD dynamics function mechanistically in B. subtilis emerges.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is important to evaluate MinD ATPase activity, PL binding, and release in the presence of MinC and MinJ. In E. coli, MinD recruits MinC to phospholipids. The presence of MinC could change the on/off rates. It is unknown if MinC or MinJ could alter the ATPase rates or dynamics. Presuming that MinD alone drives the complete dynamic story because stimulation is observed in vitro with phospholipids, it follows that Michaelis Menten kinetics is insufficient. It is acknowledged that MinJ is difficult to purify, but one could test a small cytoplasmic subdomain or MinJ-enriched membranes for MinD recruitment and release.

      Indeed, it is unknown whether MinC or MinJ have an impact on the ATPase rates or protein dynamics of MinD in B. subtilis. To address the potential influence of MinC and MinJ on MinD’s ATPase activity and dynamics, we conducted a series of experiments. MinC was successfully purified, and subsequent BLI and ATPase assays revealed no significant impact on MinD activity in our system, except for a modestly reduced ATPase activity (Figure S 5).

      With regard to MinJ, multiple constructs and purification strategies were attempted. While full-length MinJ could not be purified, we isolated the C-terminal PDZ domain to probe potential interactions. In ATPase assays, the PDZ domain reproducibly increased MinD ATP hydrolysis rates, whereas BLI measurements did not reveal detectable changes in MinD membrane-binding kinetics under these conditions. We agree with the reviewer that membrane-integrated MinJ could exert additional effects on MinD recruitment or release that are not captured by the isolated PDZ domain, and we now discuss this limitation in the revised Discussion.

      Furthermore, we performed single-molecule localization and tracking analyses of MinD in ∆minC and ∆minJ backgrounds. These experiments, found in a newly added Results section and summarized in Fig. S 12, demonstrate that MinJ appears to play a role in maintaining dynamic MinD membrane cycling and preventing excessive confinement or aggregation, whereas MinC has no obvious effect on MinD dynamics.

      (2) It is important to show the reduced ATP hydrolysis by MinD mutant proteins (line 243). Stating that they are catalytically inactive without showing the data is presumptuous, and there may be differences between the mutants. Although I am sure that the authors evaluated activity with phospholipids, it should be shown.

      We have now quantified the ATPase activity for all MinD mutants from the respective EnzChek assay data. These experiments confirm that the G12V, K16A, and D40A mutations effectively abolish catalytic activity, yielding phosphate release rates that are essentially at the background detection limit of the assay. We have included these data in Figure S 7 C and updated the text to reflect these findings.

      (3) The shoulder on MinD-K16A suggests that it is capable of forming a dimer at low equilibrium. The suggestion that it is due to interaction with the inert SEC matrix (line 242) raises more concerns, although this is highly unlikely, given that G12V elutes as a single peak. The possibility of a dimer here also demonstrates the necessity of reporting precise ATPase rates for the mutants.

      Thank you for this comment. Since we shared some of your concerns, we made sure to gather enough evidence before making the respective claims. We conducted both in vivo (single-particle tracking, widefield microscopy) experiments and in vitro experiments with the respective K16A mutant of MinD. Most convincingly, K16A is completely catalytically inactive (see previous answer), while both positive and negative controls behave as expected. Both in vivo and in vitro experiments suggest that the protein still binds membrane despite not being able to form dimers. Similar observations were made in a study conducted by colleagues in parallel (Bohorquez et al., 2024). Furthermore, K16A exchanges in other Walker motif-containing proteins, including E. coli MinD and RecA, and B. subtilis ParA/Soj, abolish dimer formation completely.

      There are many possible explanations for why the observed shoulder could appear during elution, which we did not spell out in the Results section. These include possible conformational heterogeneity, as the protein may adopt multiple stable or semi-stable conformations that differ slightly in hydrodynamic volume. It is also possible that the shoulder represents small protein aggregates from degradation products or proteolysis, which we indeed observe in the respective SDS-PAGE/Blot (Fig. S6). As written in the text, interactions with the SEC column through, e.g., exposed hydrophobic patches are not uncommon, as the surface charges of the mutant protein are different from those of the wild-type version. On the same note, the buffer may subtly affect surface properties such as charge and hydrophobicity differently than in the wild-type protein, and thus its interaction with the column. In conclusion, we are confident that the orthogonal methods used point towards dimer abolishment in a K16A mutant of MinD, despite the small shoulder observed during SEC elution.

      (4) BLI data - were the kon and koff rates also determined without ATP, since it is assumed that MinD-K16A does not bind ATP, but has a strong Kd (Table 1). Does ATP modify Kd of wt MinD for PLs?

      Without ATP, MinD neither interacted properly with the sensor-bound liposomes nor followed regular binding kinetics. Therefore, kinetic constants could not be determined, as fitting of the curves is not possible. In addition to the respective figure (Fig. S8), we attached the graph of the raw/unfitted data in the supplement (Fig. S 13; MinD2 dataset).

      (5) Local MinJ interactions are proposed to alter the dynamic localization of MinD wt and variants in vivo (line 349-358), which could occur through regulation of ATP hydrolysis, PL binding, or release by MinJ or MinC. Localization dynamics should be measured in minC and minJ mutant strains.

      We thank the reviewer for this important suggestion. In response, we have now directly measured MinD localization dynamics in both ∆minC and ∆minJ backgrounds. We performed single-molecule localization microscopy (SMLM) and single-molecule tracking (SMT) of Halo-MinD expressed from its native locus in these mutant strains, using the same experimental and analytical pipeline applied throughout the study. These new experiments are presented in a newly added Results section and summarized in Figure S12, where we quantitatively compare MinD localization, mobility, diffusion states, and confinement between wild type, ∆minC and ∆minJ cells. The data show that deletion of minJ leads to a pronounced increase in the confined/static MinD fraction and reduced dynamic cycling, whereas deletion of minC causes only subtle changes in MinD dynamics. These findings support a specific role of MinJ in maintaining dynamic MinD membrane cycling in vivo, while MinC has a more modest modulatory effect. We have integrated these results into the Discussion to refine our model of how MinJ and MinC differentially influence MinD dynamics and localization.

      (6) Considering the single molecule population counting and a lack of error presented for the binning of tracks (confined/slow/fast); it is difficult to rationalize why G12V and K16A are defective. The relative proportions of confined/slow/fast between wt, G12V, and K16A seem quite similar (i.e., bubble plot). And the static localization in Fig. 2B does not seem dramatically perturbed. This seems to invoke other cellular regulators as critical for the system's operation in the cell, further pointing to important regulatory roles by MinJ and/or MinC.

      First, regarding the apparent lack of error estimates for the population binning, the uncertainties associated with the SMT-based population fitting are intrinsically very small and fall below the graphical resolution of the plots. This reflects the large number of tracks analyzed and the robustness of the fitting procedure, rather than an omission of error analysis.

      Second, we respectfully disagree that the diffusion-state distributions and static localization patterns of G12V and K16A are similar to those of the wild type. In the context of SMT data, the observed shifts in population sizes are substantial and biologically meaningful. Moreover, the static localization of these mutants is markedly altered: instead of forming a graded enrichment at poles and septa, both mutants display a uniform membrane distribution, similar to e.g. a membrane stain (also see Fig. 2 B). This indicates a loss of regulated recruitment, consistent with impaired interaction with MinJ. Importantly, our biochemical analyses, together with extensive data on conserved Walker-type ATPases carrying analogous G12V and K16A mutations, strongly support the conclusion that these variants are functionally defective despite retaining membrane association.

      Third, we agree about the importance of MinC and MinJ, and have now directly tested the contribution of these interactors by analyzing MinD dynamics in ∆minC and ∆minJ backgrounds. These new data, presented in a newly added Results section and summarized in Fig. S12, support our interpretation by showing that MinJ has a pronounced effect on MinD confinement and dynamic cycling in vivo, whereas MinC has a more modest influence. Together, these findings reinforce the conclusion that the defects of G12V and K16A arise not only from impaired regulatory cycling caused by the mutations, but also from impaired interaction with MinJ.

      (7) Interesting that they stored the His-MinD protein at 4C for up to one week and not at -80C as it was in 10% glycerol. Was MinD inactivated by freezing? Did this contribute to the observed aggregation (line 695)?

      We thank the reviewer for raising this point. Prior to this comment, we routinely worked with freshly purified MinD and therefore had not systematically compared storage at 4 °C and -80 °C. In response to the suggestion, we have now directly compared the activity of MinD stored at 4 °C for one week with that of MinD stored at -80 °C for four weeks. We did not observe any significant difference in ATPase activity or overall biochemical behavior between the two storage conditions. These results indicate that freezing does not inactivate MinD and that the aggregation observed in some preparations is unlikely to be caused by storage at 4 °C. We have clarified this point in the materials and methods part of the manuscript and thank the reviewer for prompting this.

      (8) Line 109 - Typo. Change "component" to "components".

      (9) Page 4, line 52: change 'machinery' to 'machine'.

      (10) Page 13, line 248: change 'manifested' to 'displayed'.

      Thank you for pointing out these typos, which have all been corrected.

      Reviewer #2 (Recommendations for the authors):

      I suggest making changes to the sentence in Lines 60-62: "In rod-shaped model bacteria like Escherichia coli and Bacillus subtilis, division site selection is governed by two protein systems (15-17): nucleoid occlusion and the Min system." However, it was shown previously that upon deletion of both systems in B. subtilis, division site selection wasn't disturbed, and another mechanism was suggested to be involved.

      We agree that this information should be part of the introduction. Therefore, we included the following sentence at the indicated position:

      “However, it was previously shown that simultaneous deletion of both systems in B. subtilis did not disturb division site selection, suggesting additional mechanisms to be involved (Rodrigues and Harry, 2012).”

      I suggest changing the sentence in Lines 85-86: "Dimerized MinD recruits MinC and activates it to prevent FtsZ dynamics (46)". It would be more precise to say: "Dimerized MinD recruits MinC and activates it to inhibit FtsZ oligomerization (46)".

      Thank you, we agree and changed the sentence accordingly.

      In Figure S2 mark the two mentioned peaks 31 and 62 kDa to which elution volumes correspond.

      We thank the reviewer for this point. We ran the standards for this column again and fitted them to our peaks (see updated Fig. S2), now demonstrating that the shoulders are indeed not at a size where dimers would elute but rather around ~44.3 kDa. We note that both the Ni-NTA eluate and SEC fractions contain multiple His-tagged degradation products (see revised Fig. S2 and His-MinD blot in Fig. S1). Because the SEC run was performed with excess ADP to suppress ATP-dependent dimerization, we interpret the minor shoulder at ~44.3 kDa as arising from sample heterogeneity due to these degradation products, either by co-elution of fragments or by transient fragment:full-length MinD assemblies, rather than full-length MinD dimerization. This is now also described in the respective Results section.

      Reviewer #3 (Recommendations for the authors):

      The quality of the written manuscript is poor, making it difficult to read and appreciate. Specifically: The introduction is quite long. It takes almost three pages until the primary objective of the paper, identifying determinants of MinD localization in B. subtilis, is clearly stated. The introduction should be shortened to focus specifically on Min system function across species-i.e. prevent aberrant polar septation events. Three or four paragraphs should be sufficient. E.g. 1. Introduction to Min systems generally, 2. A summary of the mechanism underlying MinD oscillation in E. coli, 3. An explanation of similarities and differences between E. coli and B. subtilis, and 4. A paragraph outlining the specific questions to be addressed in this study.

      We have substantially revised the Introduction to address this concern. The revised version is considerably shorter and more focused, and now follows the structure proposed by the reviewer. As a result, the main objective of the manuscript is now stated much earlier, and the overall readability and clarity of the Introduction have been substantially improved.

      The results section is challenging to read, in part due to the inclusion of methods as well as some issues with organization. For example, this section begins with a single sentence describing the need to investigate MinD's ATPase cycle in vitro. This sentence is followed by a header and an entirely new section describing the methods used to purify MinD for biochemical analysis. These details should be in the methods section. Similarly, the first paragraph of the following section, which focuses on the ATPase activity of MinD in the presence and absence of liposomes, describes how the commercially available EnzChek phosphate assay works. This is, again, something that belongs in methods, not results.

      We have revised the Results section extensively in response to this comment. In the revised manuscript, we have removed or relocated substantial methodological detail from the Results to the Methods section and streamlined the overall organization. Descriptions of protein purification procedures and standard assay principles, including details of the EnzChek phosphate assay, have been condensed or moved to the Methods where appropriate.

      At the same time, we have retained limited methodological information in the Results where it is essential for understanding the interpretation of non-standard experimental setups or key controls, like SMLM. In these cases, brief methodological context is provided to ensure clarity without requiring frequent cross-referencing to the Methods section.

      Overall, the Results section has been substantially condensed and reorganized to improve readability, while additional experiments added in response to reviewer comments necessarily increase the scope of the section. We believe the revised structure now clearly separates experimental outcomes from methodological detail and improves the flow of the Results.

      The discussion section, at 7 pages, is overly long and includes substantial extraneous information. For example, it begins with a 2.5 page long paragraph that includes a summary of pattern formation during embryogenesis in animals, followed by a brief description of Turing's reaction-diffusion model, and finally, repeating parts of the introduction, a summary of the mechanism underlying MinCDE localization in E. coli. It is only in the middle of this paragraph - near the end of the second page - that the authors turn their attention back to MinD localization in B. subtilis, albeit with a focus on reaction-diffusion-based behaviors of other ParA homologues. A revised discussion section should focus on the primary conclusion of the authors, based on data presented in the results. If the authors would like to make the case that their data fit the Turing reaction-diffusion model, they will need to include mathematically based modeling that demonstrates this point in their results.

      We have substantially revised and condensed the Discussion in response to this comment. In the revised manuscript, we removed the extended introductory material on general pattern formation, embryogenesis, and Turing reaction-diffusion theory, as these topics extended beyond the scope of the present study. We also eliminated redundant summaries of the E. coli MinCDE system that overlapped with the Introduction. The revised Discussion now focuses on the primary conclusions supported by our experimental data, namely the biochemical and in vivo mechanisms governing MinD membrane binding, ATPase activity, and dynamic localization in B. subtilis, as well as the regulatory roles of MinJ and MinC. Importantly, we would like to clarify that we did not intend to claim that the B. subtilis Min system follows a Turing-type reaction-diffusion mechanism. References to general reaction-diffusion concepts were meant to provide contextual background and not to imply a specific mathematical framework for the system studied here. To avoid any possible ambiguity, we have removed these references from the Discussion.

      While the overall length of the Discussion is now comparable to the previous version, this reflects the inclusion of substantial new experimental data added during revision. Importantly, the structure and content of the Discussion have been streamlined to prioritize interpretation of the results rather than general background, resulting in a more focused and cohesive narrative.

      Experimental comments:

      Line 213: Please provide a rationale for the ATPase experiments. What is the expected result for each mutant and why?

      We have clarified the rationale for the ATPase experiments in the revised manuscript by briefly outlining the expected behavior of each MinD mutant. The anticipated ATPase properties of G12V, K16A, and D40A are based on well-established studies of conserved Walker-type ATPases and were implicit in the original experimental design, as they should all be catalytically inactive. To avoid any ambiguity, we now state these expectations explicitly in the manuscript.

      Line 243: ATPase data for the mutant proteins should be included in the supplement.

      We have now quantified the ATPase activity for all MinD mutants from the respective EnzChek assay data. These experiments confirm that the G12V, K16A, and D40A mutations effectively abolish catalytic activity, yielding phosphate release rates that are essentially at the background detection limit of the assay. We have included these data in Figure S 7 C and updated the text to reflect these findings.

      Figure 2B: Please include transverse section fluorescence data for all variants as well as quantitative data on average MinD positioning.

      The quantitative information requested is already provided by our single-molecule localization and tracking (SMLM/SMT) analysis of Halo-MinD and its variants (Fig. 4 A and now S 12 A). This approach represents the averaged spatial distribution of individual MinD localizations collected from dozens of cells per condition and provides substantially higher spatial resolution and quantitative precision than transverse fluorescence profiles obtained by conventional widefield microscopy.

      We therefore believe that the SMLM-based analysis is superior to transverse section fluorescence measurements and more accurately captures average MinD positioning across the cell population. To avoid redundancy, we have retained the SMLM analysis as the quantitative framework for MinD localization.

      Figure 2B: I am not convinced that punctate and membrane-associated are mutually exclusive. Quantitative data on protein localization from transverse fluorescent sections is necessary to make this point.

      Please see the answer above and Fig. 4 A.

      Figure 2B: It is impossible to assess the functionality of individual mutants without quantitative data on minicell frequency and cell length.

      We have addressed this point by quantitatively measuring both cell length and minicell frequency for all relevant strains. These analyses were performed on a minimum of n = 430 cells per strain and are now presented in Table S 5. The added data provide a quantitative assessment of mutant functionality, support the phenotypic interpretations shown in Fig. 2B, and are also integrated into the Results section.

      Other comments:

      Line 109: should read "components".

      Thank you, corrected.

      Line 135: Why is this sentence outside the major section of the results?

      It now has been integrated into the major section.

      Line 197: I am not sure I understand this sentence.

      We have revised this sentence to improve clarity and readability.

      Line 218: I do not understand this paragraph.

      We have also rephrased and rewritten this paragraph for clarity and readability.

      Line 223: To make this section focused on the results rather than the method, the authors could simply say "To determine the role of ATP mediated dimerization, we...." (If I am understanding this section correctly).

      We followed this suggestion and revised the text accordingly to focus on the experimental outcome rather than methodological detail.

      Line 273: "depicted" not depictured.

      Thank you, corrected.

      Figure 4: The single-cell data look good in the figure, however, the description of these results and their meaning are nearly impossible to follow in the text.

      We acknowledge that the single-molecule data presented in Fig. 4 are complex. While we have made minor clarifications to improve the flow and wording of the text, we did not substantially reduce the level of detail, as the description of the analytical framework is required for correct interpretation of the results.

      At the same time, we aimed to avoid repeating extensive methodological explanations that are already described in the Materials and Methods section, in line with other reviewer comments. We therefore retained a concise but technically accurate description in the Results to ensure that the biological conclusions drawn from Fig. 4 can be properly understood.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Strengths:

      (1) The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      (2) The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      (3) By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Thank you for recognizing the strengths in our work.

      Weaknesses:

      (1) The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      We acknowledge that PF4 is not an exclusively macrophage-restricted marker. Yet, among meningeal immunocytes, it is almost exclusively expressed in macrophages (1, 2). Furthermore, in the adult mouse meninges, PF4<sup>Cre</sup>-based reporter lines label nearly all dural and leptomeningeal macrophages and almost no other cells (3, 4). This Cre line has also been used to target border-associated macrophages (2, 4). Moreover, a recent study suggests that the bacterial artificial chromosome used to generate the PF4<sup>Cre</sup> line does not affect meningeal macrophage activity (4). Nonetheless, in the revised version, we discuss a potential limitation of the Pf4Cre-based labeling approach for studying meningeal macrophages’ Ca<sup>2+</sup> signaling, namely that a very small population of other meningeal immune cells may also be labeled.

      (2) The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      In our discussion, we indicated that “the exact link between the distinct Ca<sup>2+</sup> signal properties of meningeal macrophage subsets observed herein and their homeostatic function remains to be established”. In the revised discussion part, we acknowledge that this is primarily a descriptive study that provides a foundational landscape of Ca<sup>2+</sup> dynamics in meningeal macrophages.

      (3) The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation).

      In the results section, we indicate that our data suggest that dural perivascular macrophages are functionally coupled to locomotion-driven dural vasomotion, either responding to it or mediating it. Furthermore, we discussed the possibilities that 1) macrophages sense vascular-related mechanical changes and 2) macrophage Ca<sup>2+</sup> signaling regulates dural vasomotion. Moreover, we explicitly state that studying causality will require an experimental approach that has yet to be developed, enabling selective manipulation of dural perivascular macrophages.

      (4) The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved.

      Thank you for this suggestion. In the revision, we indicate that further studies are required to resolve the exact source of synchrony.

      (5) A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. Minimally, the discussion should be more deeply engaged with possible interpretations and implications of this finding.

      While we propose that the decrease in macrophage Ca<sup>2+</sup> signaling following CSD could indicate that a hyperexcitable cortex dampens meningeal immunity, in the revised discussion, we indicate that further studies are needed to determine whether this reduction in meningeal macrophage Ca<sup>2+</sup> activity reflects altered viability or reduced immune function that could interfere with the macrophage’s ability to restore homeostasis and dampen local inflammation.

      (6) The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions.

      Although n=3 is common in intravital imaging of the meninges, including experiments employing pharmacological manipulations, such as RAMP1 inhibition (5-7), a larger sample size will increase confidence in the results. We further acknowledge that our pharmacological data indicate only a potential role for RAMP1 signaling in meningeal macrophages and that CGRP/RAMP1 signaling in other meningeal immune or vascular cells may also play a role.

      Reviewer #2 (Public review):

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      Thank you.

      I have only minor comments.

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text.

      We did not measure the exact distance of the perivascular macrophages from the blood vessels, but defined them as such based on previous data showing that these cells reside along the abluminal surface and maintain tight interactions with mural cells (8). We now provide this information in the revised manuscript, including the dextran tracer approach used to label the vasculature.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

      We have added more background and the method for inducing CSD (i.e., a pinprick in the frontal cortex) in the Results section.

      Reviewer #3 (Public review):

      Strengths:

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Thank you for recognizing the strengths of our paper.

      Weaknesses:

      (1) The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included.

      Please see previous responses regarding the specificity of the PF4Cre line for targeting macrophages. The specificity of the RAMP1 antagonist we used (BIBN4096, Olcegepant) has been confirmed by its developer Boehringer Ingelheim, and has been used to target CGRP signaling in numerous studies, including those targeting meningeal macrophage and vascular signaling (2, 7). A section on the study’s limitations has been added.

      References:

      (1) H. Van Hove et al., A single-cell atlas of mouse brain macrophages reveals unique transcriptional identities shaped by ontogeny and tissue environment. Nat Neurosci 22, 1021-1035 (2019).

      (2) F. A. Pinho-Ribeiro et al., Bacteria hijack a meningeal neuroimmune axis to facilitate brain invasion. Nature 615, 472-481 (2023).

      (3) G. L. McKinsey et al., A new genetic strategy for targeting microglia in development and disease. eLife 9, (2020).

      (4) H. J. Barr et al., The circadian clock regulates scavenging of fluid-borne substrates by brain border-associated macrophages. bioRxiv, (2025).

      (5) T. L. Roth et al., Transcranial amelioration of inflammation and cell death after brain injury. Nature 505, 223-228 (2014).

      (6) M. V. Russo, L. L. Latour, D. B. McGavern, Distinct myeloid cell subsets promote meningeal remodeling and vascular repair after mild traumatic brain injury. Nat Immunol 19, 442-452 (2018).

      (7) K. L. Monaghan et al., Highly dynamic dural sinuses support meningeal immunity. Nature, (2026).

      (8) H. Min et al., Mural cells interact with macrophages in the dura mater to regulate CNS immune surveillance. J Exp Med 221, (2024).

    1. Author response:

      In the review, the critique was focused mainly on the functional results, which show that interpatch neurons in mouse V1 are more strongly modulated by locomotion than patch neurons. The anatomical results that patch and interpatch modules are recurrently connected in three interareal subnetworks were considered solid.

      We acknowledge the limitations of our work. Specifically, the number of recorded neurons could be higher, the mapping of neurons onto patch and interpatch modules could be more direct, and the asymmetric distribution of locomotion-modulated responses in layer 2/3 may be confounded by selective masking of GCaMP signals by surface blood vessels. In experiments that are not included in the manuscript, we found no systematic spatial relationship between the M2AChR pattern and the vascular marker CD31, ruling out that masking contributed to the imaging results. Unfortunately, we are unable to revise the manuscript to the extent recommended by the reviewers because the collaborators have left the lab, which closed in 2024.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

      Acquired resistance to KRAS<sup>G12C</sup> inhibitors such as sotorasib or adagrasib remains a significant clinical challenge. Therefore, the identification of mechanisms of acquired resistance, along with the development of alternative therapeutic strategies, including combination therapies with KRAS inhibitors, represents an urgent unmet clinical need. The emergence of secondary KRAS mutations or new mutations in other oncogenic drivers has been observed as a primary cause of acquired resistance in a fraction of patients. No identifiable mutations were detected in more than half of the tumors from patients who developed acquired resistance after treatment with sotorasib or adagrasib.

      Using a discovery-based approach that integrated global proteomic and phosphoproteomic analyses in the TC303AR and TC314AR PDX models, we identified distinct protein signatures associated with KRAS reactivation, upregulation of mTORC1 signaling, and activation of the PI3K/AKT/mTOR pathway. These findings prompted further investigation into these mechanisms of resistance and evaluation of novel therapeutic combinations to overcome resistance. Notably, the combination of sotorasib with copanlisib (a PI3K inhibitor), or with AZD8055 or sapanisertib (dual mTORC1/2 inhibitors), demonstrated strong potential for future clinical use. These regimens effectively restored sotorasib sensitivity in both in vitro and in vivo models and produced robust, synergistic antitumor effects across various acquired resistance models.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      Whole-exome sequencing was performed on resistant cells or PDX models to confirm retention of the KRAS<sup>G12C</sup> mutation and to assess for potential secondary KRAS mutations. While our study focused on secondary KRAS mutations and KRAS-specific signaling pathways, we acknowledge that additional resistance mechanisms may be involved. These will be the focus of future investigations.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in non-small lung KRASG12C mutant cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines PDXs and organoids) and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, made this collection especially relevant, as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      We revised the abstract to ensure that the key findings and overall message are clearly communicated and easily understood by readers.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      This has been done in the revised version.

      (3) In the different parts of the result section where the models shown in Figure 2 are described the authors indicate "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected" however, it is not indicated where this data is shown and in not all the cases there is explanation to other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES made publicly available.

      WES was performed to check for additional secondary mutations in KRAS and to verify retention of the KRAS<sup>G12C</sup> mutation in these AR models. The WES data have been provided as supplements.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      The text has been revised accordingly.

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      The raw data associated with the proteomics and global proteomics analyses have been added as supplements.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant tables should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. In MS-based Phosphoproteome could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      The authors agree that other pathways may be involved, and this will be the subject of future study. The raw data has been added as supplements for the readers' interest.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      The data have been uploaded according to the journal guidelines.

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      We agree that KRAS activation can be assessed through various methods. In this manuscript, which primarily focuses on mechanisms of resistance, pathway analysis revealed upregulation of KRAS signaling. This finding correlated with the incomplete inhibition of p-ERK by sotorasib in resistant cells. Notably, p-ERK status is widely recognized and routinely used as a surrogate marker for KRAS pathway activation.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase shown in resistant cells shown in parental or is it exclusive for resistant cells only (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      H23 and H358 cells are highly sensitive to sotorasib, as demonstrated by the cell viability assays presented in Figure 2. As shown in Figure 3—figure supplement 3, sotorasib treatment led to complete inhibition of p-ERK in these parental cell lines. In contrast, p-ERK inhibition was incomplete in the resistant H23AR and H358AR cells, highlighting a distinct signaling behavior that prompted us to investigate the AR cells further. Moreover, these AR cells were continuously cultured under sotorasib pressure to maintain resistance.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358: 10.1016/j.jtcvs.2005.06.051; H23: 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      We agree. Based on our in vitro (Figure 4) and in vivo (Figure 7) data, copanlisib was able to overcome sotorasib resistance, demonstrating either synergistic or additive effects depending on the specific model. These findings support the potential of combining PI3K inhibition with KRAS<sup>G12C</sup> inhibition as a promising strategy to address acquired resistance.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

      Both H23 and H23AR cells demonstrated high sensitivity to copanlisib, as shown in Figure 4. Combination index analysis for the copanlisib + sotorasib treatment (Figure 7A) revealed synergistic effects on cell viability at specific concentrations. However, in the in vivo experiment (Figure 7G), we did not observe a clear synergistic effect of the combination treatment against H23AR xenografts. This may be attributed to the dose of copanlisib used, which was potentially sufficient on its own to produce a strong antitumor response, thereby masking any additional benefit from the combination.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To strengthen the scientific rigor and overall presentation of the study, the authors should consider the following:

      (1) Perform additional functional validations, including reconstitution experiments after PI3K and 4E-BP1 knockouts, to more definitively demonstrate the role of these targets in mediating resistance.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. The acquired-resistant H23AR and H358AR isogenic cells overexpressed PI3K and 4E-BP1 proteins, whereas expression of these proteins was normal in the parental cell lines (H23 and H358). These two pairs of cell lines (H23 vs H23AR and H358 vs H358AR), along with multiple knockout clones from each cell line, were used in every functional assay, and together represent cells or clones with normal expression, overexpression, and no expression of the target proteins (Figure 5B, D-F & Figure 6D-E). Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      (2) Improve experimental quantifications, particularly for western blot analyses, and ensure all key findings are supported by statistically significant comparisons.

      The changes observed on the Western blots were not subtle and were obvious without quantification.

      (3) Clarify enrichment analysis by directly comparing resistant and sensitive models and use appropriate FDR thresholds (<0.05) when claiming significant pathway activation.

      The mass spectrometry data were analyzed by the Department of Biostatistics, and the methodology for the statistical analysis is explained in the Methods section. The enriched pathways were identified by pre-ranked GSEA using a gene list ranked by log-transformed P values, with signs set to positive or negative for a fold change of >1 or <1, respectively, from the global proteomics and phosphoproteomics data. All enriched pathways were ranked by their enrichment scores and considered significant at an FDR value <0.05. Each enrichment plot in Figure 2 is marked with its respective FDR q-value and nominal p-value (Figure 2D-E). The Results section (page 14) has also been revised for clarification.
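
      The ranking step described above can be sketched as follows. This is a minimal illustration only, not the analysis code used for the manuscript: it assumes the "log-transformed P value" is -log10(p), and the function names and example values are hypothetical.

```python
import math

def signed_log_p(p_value, fold_change):
    """Ranking score for pre-ranked GSEA (assumed form): -log10(p),
    positive when fold change > 1 and negative when fold change < 1.
    Requires p_value > 0."""
    magnitude = -math.log10(p_value)
    if fold_change > 1:
        return magnitude
    if fold_change < 1:
        return -magnitude
    return 0.0  # no change in either direction

def rank_genes(stats):
    """stats maps gene -> (p_value, fold_change).
    Returns (gene, score) pairs sorted from most up- to most down-regulated."""
    scored = [(gene, signed_log_p(p, fc)) for gene, (p, fc) in stats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical values, for illustration only (not data from the study).
example = {
    "GENE_A": (0.001, 2.0),  # strongly up in resistant vs sensitive
    "GENE_B": (0.01, 0.5),   # down in resistant vs sensitive
    "GENE_C": (0.5, 1.2),    # weak, non-significant increase
}
ranked = rank_genes(example)
```

      A ranked list of this kind is what a pre-ranked GSEA run takes as input; the FDR q-values are then computed by the GSEA tool itself.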

      (4) Address alternative mechanisms of resistance, such as secondary mutations or KRAS overexpression, through deeper genetic and proteomic profiling.

      The authors agree that other pathways may be involved, and this will be the subject of future research. Our WES analysis of H23AR and H358AR cells, shown in Figure 2 Supplement 1, did not find any additional mutations in KRAS. Some SNPs and indel mutations were detected, but these were considered outside the scope of the current study. The upregulation of KRAS signaling found in the gene enrichment analysis (Figure 3D) was validated through ERK phosphorylation status (Figure 3-supplement 3).

      (5) Improve data presentation by enhancing figure quality, ensuring consistent labeling, and providing complete figure legends and descriptions.

      Revised

      (6) Revise and polish the manuscript text for clarity, accuracy, and consistency, paying special attention to avoiding contradictory statements and strengthening mechanistic interpretations.

      Revised

      Major Comments:

      (1) In Figure 1A, the authors state that "four PDX models were selected for evaluating sotorasib sensitivity based on their distinct co-mutation patterns," but it is unclear whether these patterns are common, clinically significant, or selected for another specific reason. Clarification is needed regarding the rationale for model selection.

      The models have co-mutations that are common in clinical specimens and are associated with drug resistance (Skoulidis, Ferdinandos, et al. "Co-occurring genomic alterations define major subsets of KRAS-mutant lung adenocarcinoma with distinct biology, immune profiles, and therapeutic vulnerabilities." Cancer Discovery 5.8 (2015): 860-877). Out of 11 PDX models with KRAS<sup>G12C</sup> mutations, 4 were selected for in vivo evaluation of sotorasib sensitivity based on their distinct co-mutation status. Co-mutations in p53, STK11, or KEAP1 are the most common in NSCLC and make treatment more challenging in the clinic. All four PDXs selected for the in vivo study harbor at least one of these co-mutations together with the KRAS<sup>G12C</sup> mutation.

      (2) Whole-exome sequencing (WES) results for TC303 AR and TC314 AR are mentioned but not shown in the supplementary material. These results should be included.

      Included as Figure 1-figure supplement 1.

      (3) In Figure 2 - Figure Supplement 1, H23 AR and H358 AR acquired multiple SNPs and indels compared to their sensitive counterparts. The authors need to address whether these genetic alterations could contribute to resistance.

      The authors agree that other pathways may be involved, and this will be the subject of future research. Our WES analysis of H23AR and H358AR cells, shown in Figure 2 Supplement 1, did not find any additional mutations in KRAS. Some SNPs and indel mutations were detected, but these were considered outside the scope of the current study. The upregulation of KRAS signaling found in the gene enrichment analysis (Figure 3D) was validated through ERK phosphorylation status (Figure 3-supplement 3).

      (4) In Figure 3D-E, in the enrichment analysis, the authors describe enrichment of mTORC1 signaling in resistant PDXs without sufficiently comparing with the sensitive counterparts. They need to clarify whether the enrichment is unique to resistant cells.

      The comparison is between sensitive and resistant cells (Figure 3C). In Figure 3D-E, all enrichment data were derived from global and phosphoproteomic analyses of the sotorasib-acquired-resistant TC314AR PDX compared with its sensitive counterpart, the TC314 PDX (Figure 3D), and of the combined TC314AR + TC303AR PDXs compared with their combined sensitive counterparts, the TC314 + TC303 PDXs (Figure 3E). We revised the text to make this clear.

      (5) In Figure 3F, the FDR values of 0.5 and 1.0 are too high to support conclusions of significant pathway activation. Similar issues exist for Figure 3 - Figure Supplement 2 (FDR q-values of 1.0, 0.989, and 0.813).

      We agree that the FDR values are high in the enrichment analysis of the phosphoproteomic data, but not in that of the proteomics data. However, these enrichment scores indicate pathway activation. The higher FDR is most likely due to the low number of phosphoproteins enriched in the designated pathways. Significant FDR values were found when the enrichment analysis was done on the global proteomics data.

      (6) In Figure 3H, PI3K upregulation is inferred from RPPA quantification. An independent validation, such as immunoblotting, should be provided.

      In addition to the sotorasib-acquired-resistant PDX samples, PI3K upregulation was also shown by immunoblotting in the sotorasib-resistant isogenic cell lines (H23AR and H358AR cells) in Figure 4B.

      (7) In Figure 4B, increased PI3K (p85) levels alone do not support pathway activation, as p-AKT levels remain unchanged. Functional downstream markers (e.g., p-S6, p-4EBP1) should be assessed.

      We agree. The status of other downstream markers, such as p-S6 and p-4EBP1, is shown in Figure 4H and Figure 5E & 5F.

      (8) In Figure 4D, PI3K inhibition does not reduce colony formation in AR cells relative to parental cells. The data do not support the conclusion that PI3K inhibition sensitizes AR cells.

      These experiments show that the drugs are equally effective in the presence or absence of resistance to sotorasib. The specific role of PI3K is shown in the knockout experiments (Fig. 5), as explained in the Results section on pages 18-19. H23AR and H358AR cells showed over 600- and 200-fold resistance to sotorasib compared with their sensitive counterparts (Figure 2A), with IC50 values of 20 µM and 6 µM, respectively. In contrast, copanlisib, a PI3K inhibitor, significantly sensitized the AR cells, with IC50 values of 0.39 µM and 0.06 µM in H23AR and H358AR cells, respectively, making them as sensitive as the parental cells. PI3K signaling was significantly upregulated in AR cells, and inhibition of PI3K-AKT-mTOR signaling, either through CRISPR-Cas9 PI3K knockout (Figure 5) or through inhibition of PI3K or downstream molecules by copanlisib, everolimus, or AZD8055, sensitized the AR cells, either as single agents or synergistically with sotorasib (Figure 6H & Figure 7A).
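
      For readers less familiar with the fold-resistance figures quoted above, the arithmetic can be sketched as below. The sketch assumes the usual definition (IC50 of the resistant line divided by IC50 of the parental line). The parental IC50 values it returns are back-calculated from the numbers in this response, not reported measurements, and because resistance is stated as "over" 600- and 200-fold they should be read as approximate upper bounds. The function names are hypothetical.

```python
def fold_resistance(ic50_resistant_um, ic50_parental_um):
    """Fold resistance, assuming the usual definition:
    IC50 of the resistant line / IC50 of the parental line."""
    return ic50_resistant_um / ic50_parental_um

def implied_parental_ic50(ic50_resistant_um, fold):
    """Back-calculate the parental IC50 (in µM) implied by a stated fold resistance."""
    return ic50_resistant_um / fold

# Sotorasib IC50 values and fold changes quoted in the response above.
h23_parental = implied_parental_ic50(20.0, 600)   # about 0.033 µM
h358_parental = implied_parental_ic50(6.0, 200)   # about 0.03 µM
```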

      (9) In Figures 4D-F, single or combination inhibition of PI3K, AKT, and mTORC1 in H23/H23AR and H358/H358AR cells shows no significant difference in colony formation between resistant and parental lines. Therefore, the conclusion that PI3K inhibition sensitizes sotorasib-resistant cells is not supported by the data.

      See response to (8).

      (10) In Figure 4G, copanlisib does not significantly inhibit p-mTOR (S2448) in H23 AR cells, and total mTOR levels decrease slightly. Quantification should be added.

      Added as a supplement

      (11) In Figure 4G, western blot results for p-PDK and PDK are not quantified, and effects vary between H23^AR and H358^AR cells. Quantification needs to be added.

      Added as a supplement

      (12) In Figure 6H, cell viability curves for H23AR/PI3K KO 3-3 cells start from <60%, suggesting pre-existing poor cell health. This casts doubt on conclusions regarding dual drug effects.

      All cell viability remained at or close to 100% at the no-treatment control condition, and the cell viability at the starting point was lower than 100% only in the combination treatment group, where the cells were treated with at least one drug. Here, a fixed dose of AZD8055 (50nM or 100nM) was combined with different doses of sotorasib. The dual drug effects are assessed by the combination index, which takes viability factors into account. Combination effects were confirmed by in vivo experiments.

      (13) The manuscript claims that mTORC1 inhibition alone is insufficient to suppress resistance (page 23), yet earlier reports that the mTORC1 inhibitor everolimus significantly reduces colony formation (page 17). This inconsistency needs to be addressed.

      Revised. On p. 23, we are referring to 4E-BP1-mediated resistance.

      (14) In Figure 7G, since copanlisib alone appears as effective as combination therapy, the authors should revise the conclusion to emphasize the sufficiency of PI3K inhibition alone.

      Agreed. Copanlisib treatment alone appeared to be very effective in the H23AR xenograft model, most likely because the copanlisib dose used in this model had a strong antitumor effect that overshadowed the combination effect. However, synergistic antitumor activity of copanlisib with sotorasib was found in the H358CDX and TC314AR PDX models (Figure 7D and I).

      (15) In Figure 7I, statistical comparisons (P-value) comparing combination therapy to copanlisib monotherapy are missing. Without statistical significance, the conclusion regarding the combination efficacy cannot be justified.

      Revised

      Minor Comments:

      (1) Figure 1D is not described in the main text.

      Revised

      (2) On page 12, "FigG" and "FigH" should be corrected to "Figure 2G" and "Figure 2H," respectively.

      Revised

      (3) On page 17, the section title "copanlisib modulates PI3K-AKT-mTOR signaling..." should capitalize the first word.

      Revised

      (4) In Figure 7, "sotorasib" and "AMG510" are used interchangeably but refer to the same drug; consistent labeling should be used to avoid confusion.

      Revised

      (5) In Figure 7 - Figure Supplement 2A-B, the rationale for switching from AZD8055 to sapanisertib, another dual mTORC1/mTORC2 inhibitor, is unclear and should be explained.

      Revised

      Reviewer #2 (Recommendations for the authors):

      Please review all the figures and labels, as there are many mistakes. Also, check the way that the figures are presented and, if necessary, increase the definition.

      Revised

      (1) Figure 2 seems to be squashed.

      Revised

      (2) RPPA experiment "PI3K-AKT-mTOR signaling pathway compared to their sensitive counterparts. Specifically, the expression levels of MEK1, p-MEK1, p-MAPK, PDK1, p-PRAS40, p-GSK-3β, p-4E-BP1, p-PI3K, p-Akt, p-PRAS40, p-p38-MAPK, p-AMPK, and p-MAPK were markedly increased in resistant TC303AR and TC314AR PDXs." Several of these proteins are not really part of the PI3K-AKT-MTOR pathway, as such, but the MAPK pathway, and this is masked by not mentioning this. It is also necessary to explain which proteins are called MAPK and why there are 2 p-MAPK.

      Revised

      (3) Figure 3 - Figure Supplement 3. The images seem saturated for some of the blots. Is there still a decrease in ERK activity in the resistant cells? Lower exposure blots should be included, and if possible, some quantification performed.

      Quantification added

      (4) Figure 4I, review the title of the left graph, as this is not only sensitivity to everolimus.

      Revised

      (5) The figure legends need extensive review and rewriting. For instance, in Figure 6, the times for how long the treatments were performed in the different graphs have to be specified. The figure legends must allow interpretation of the data without reading the material and methods or text.

      Revised

      Materials and Methods

      This section needs special attention for typos and style, for instance:

      (1) Correct "KRASG12G inhibitors including sotorasib, adagrasib," to G12C.

      Revised

      (2) Use appropriate symbols i.e., "3 ul sgRNA (30 uM), 0.5 ul Cas9 (20 uM), and 3.5 ul Buffer R were mixed"

      Revised

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate Reviewer #1’s very positive feedback. Incorporating the perspective of ‘incidental’ sensory signals is a valuable suggestion that aligns perfectly with our findings. We agree that this perspective significantly strengthens the impact of our paper.

      In the revised version, we will update the manuscript to bridge these perspectives (the functional role of ‘incidental’ sensory signals and the role of retinal flow in navigation). In addition, we will elaborate on the potential predictions of the model and possible manipulations that might affect the integration between sensory evidence (curl signal) and the straight-ahead prior.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s feedback regarding the formalization of our reference frames. We agree that certain definitions were implicitly assumed rather than explicitly stated. We will revise the manuscript to provide all necessary self-contained information, ensuring that the geometry of the task response and the definition of heading are unambiguous. Also, we will address the gap between the task response (in world coordinates) and the functional role of the controller, as well as the other points raised by the reviewer.

      Major issues:

      (1a), (2a) Clarification of Reference Frames

      The reviewer asks: “To ‘directly estimate heading’ relative to what?”

      In our study, participants were instructed to report their “perceived direction of self-motion” by aligning a rotational encoder (steering wheel) with the direction they felt they were moving within the 3D simulated scene. Consequently, participants reported their instantaneous heading in a world-centered reference frame, from which the 3D trajectories were reconstructed. Since the reviewer had to infer this information, it should be clarified to ensure it is immediately evident.

      Participants were informed that the initial heading (i.e. θ<sub>0</sub> in our controller nomenclature) was oriented “straight ahead” relative to their body which was aligned longitudinally with the experimental room. We will modify Figure 1B and revise the Methods section to explicitly clarify this initial alignment and the instructions provided to participants.

      In the revised manuscript, we will clarify that while the participant’s report is world-centered, the retinal curl provides a gaze-relative heading signal. Although this was already mentioned, we will emphasize this point. In natural navigation toward a fixated target, a world-centered vector is often unnecessary; an error signal indicating heading relative to fixation is sufficient (as the reviewer also notes). However, the initial alignment of the heading within the 3D scene allows the brain to “calibrate” this internal controller, mapping the retinal curl signal onto the 3D world coordinates required for the task.

      The reviewer also asks how we can be certain that participants were reporting in world coordinates rather than an alternative frame, such as “heading relative to the fixation target.” We believe our “Cancelled Curl” (and over-cancelled) conditions provide the most compelling evidence to rule out this alternative. In these conditions, the physical position of the fixation target in the scene remained identical to the unaltered flow condition. If participants were simply reporting heading relative to the fixation target’s spatial location, the observed biases should have persisted regardless of the flow manipulation. Instead, the bias vanished when the curl was removed. This causal evidence proves that the bias is driven by the retinal motion signal (curl) rather than the spatial orientation of the eyes or the target’s position in the scene. Furthermore, the temporal evolution of the response supports a world-centered integration. For simulated straight paths, the perceived heading remains straight for the first few seconds (consistent with the initial world-centered alignment), with biases only emerging after approximately 3 seconds of integration (a point we elaborate on in our response to Reviewer #3). Had participants been responding based on a simple gaze-relative reference frame from the onset, these biases would have manifested significantly earlier. We will incorporate these points into the revised Discussion to better frame our findings alongside other cues, such as the Focus of Expansion (FOE), that contribute to heading estimation.

      (1b) The reviewer notes that we must be clear about the relationship between curl and heading (relative to fixation) and the variables that affect curl.

      Beyond the discrepancy between heading (θ) and gaze (ψ), curl is geometrically determined by translational self-motion speed (υ), eye height (h), and pitch (α). More specifically, curl = (υ sin ψ cos α)/h. The derivation will be included in the Supplementary Information. Since h = d sin α, where d is the 3D distance to the fixation point, we could express cos α as a function of distance. Certainly, there is no 1:1 mapping from the curl signal to heading relative to gaze (e.g. θ – ψ): participants would need to know υ and eye height, plus extra-retinal information. Frenz et al. (2003, Vis Res) showed that people can estimate self-motion directly from optic flow across different simulated eye heights and gaze angles; extra-retinal information can, in addition, provide knowledge of ψ and α. It is then plausible that the visual system can transform the curl signal from a qualitative directional cue (i.e. steering left or right of fixation) into a quantitative steering command. By combining curl with knowledge of gaze orientation and eye height, the visual system can resolve ambiguities in the flow field and utilize curl as a more precise error signal for locomotor control. These aspects will be included in the new version.
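
      As an illustration of why the mapping from curl to heading is not one-to-one, the following minimal sketch evaluates the relation stated above, curl = (υ sin ψ cos α)/h, together with its equivalent form after substituting h = d sin α. The function names, units, and numerical values are assumptions chosen for illustration only; they are not taken from the manuscript.

```python
import math


def retinal_curl(speed, psi_deg, pitch_deg, eye_height):
    """Curl from the stated relation: v * sin(psi) * cos(alpha) / h."""
    psi = math.radians(psi_deg)
    alpha = math.radians(pitch_deg)
    return speed * math.sin(psi) * math.cos(alpha) / eye_height


def retinal_curl_from_distance(speed, psi_deg, pitch_deg, distance):
    """Same quantity after substituting h = d * sin(alpha)."""
    psi = math.radians(psi_deg)
    alpha = math.radians(pitch_deg)
    return speed * math.sin(psi) / (distance * math.tan(alpha))


# Hypothetical values: speed in m/s, angles in degrees, eye height in m.
psi_a = 10.0
curl_a = retinal_curl(speed=1.0, psi_deg=psi_a, pitch_deg=20.0, eye_height=1.5)

# Doubling the speed while halving sin(psi) yields exactly the same curl,
# so curl alone cannot identify the angle without knowing speed and eye height.
psi_b = math.degrees(math.asin(math.sin(math.radians(psi_a)) / 2.0))
curl_b = retinal_curl(speed=2.0, psi_deg=psi_b, pitch_deg=20.0, eye_height=1.5)

# The distance-based form agrees when d = h / sin(alpha).
d = 1.5 / math.sin(math.radians(20.0))
curl_d = retinal_curl_from_distance(1.0, psi_a, 20.0, d)
```

      In this sketch, two different combinations of speed and angle produce the same curl value, which is the ambiguity that knowledge of υ, h, and extra-retinal signals would be needed to resolve.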

      (2b) Mismatch between task and controller

      We thank the reviewer for this point. We have addressed the alignment of the reference frames in our response to Issues 1a and 2a. Once the initial orientation (θ<sub>0</sub>) is established in the world frame, the controller model generates steering adjustments that directly translate into heading predictions within that same world reference frame. By treating the perceptual report as an output of the locomotor controller, we resolve the discrepancy between the steering task and the reported heading.

      (2c) No raw data provided

      We respectfully disagree with the reviewer’s interpretation regarding data smoothing. The thin lines in Figure 2 represent the mean 3D paths derived directly from the response variable (θ<sub>0</sub>) across trials of identical conditions for each participant (as detailed in the ‘Computation of Perceived Path’ section). No smoothing or filtering has been applied to these plotted trajectories other than computing the mean across trials. We also wish to remind the reviewer that the raw data and analysis code remain publicly accessible for further inspection. Regarding the visual representation: in earlier versions of the manuscript, we included shaded 95% Confidence Intervals (CIs) in Figure 2. However, this addition rendered the plot overly cluttered and obscured the individual trajectories. We therefore elected to present individual participant means (thin lines) alongside group averages (thick lines) to emphasize inter-subject variability. For clarity, the 95% CIs are explicitly displayed in Figure 3, where the data density is more conducive to shaded areas.

      (3) Difference with Matthis et al (2022)

      While Matthis et al. (2022) described the existence of retinal curl during walking and the information it can provide relative to gaze, our paper provides the causal link: we manipulate the curl in real time (the ‘cancelled’ and ‘over-cancelled’ curl conditions), providing critical evidence that perceived heading is affected by this signal.

      (4) Eye movements analysis

      We thank the reviewer for noting that retinal slip (velocity error) is a more critical metric than positional gaze error. We agree that tracking inaccuracies can introduce translational noise into the flow field. The 3° threshold was established based on the eye tracker’s specifications and the naturalistic setup (1-meter viewing distance without head stabilization). Across all participants, the mean positional error ranged from 1.016° to 1.5° (1 deg is 2.08 cm in our setup). We also calculated retinal slip values, which ranged from 0.12 to 0.27 deg/s (X dimension) and 0.12 to 0.23 deg/s (Y dimension). These values are comparable to natural oculomotor drift (Kowler et al., 1979) and are understandably small given the low velocity of the fixation target. Consequently, it is highly unlikely that retinal slip influenced the results. Furthermore, assuming that tracking error remained consistent across fixation conditions, any present retinal slip cannot explain why the bias followed the retinal curl manipulation as predicted by the controller. We therefore consider retinal slip to be an unlikely confounding factor.

      (5) The separate and joint fits

      We thank the reviewer for the opportunity to clarify the logic behind our modeling choices. We acknowledge that the “separate fits” are inherently less informative due to the high number of free parameters relative to the data. Our primary scientific goal was not to achieve perfect descriptive accuracy via 30 parameters, but to test a specific functional hypothesis through the “joint fit.”

      The Logic of the Joint Fit:

      We agree with the reviewer that the joint fit misses some paths in some conditions. Of course, the joint fit reflects a significant compromise. The “Gain” (the weighting of the curl signal) is likely not a static constant but is dynamically tuned based on task demands, confidence in the visual signal, simulated speed, and so on. By using a single Gain parameter, we intentionally ignore this contextual variability to see how much of the behavior can be explained by a “minimalist” controller. In this sense, the 2-parameter joint model is a deliberate attempt to test this limit. By forcing a single Gain parameter to account for all conditions across both straight and curved paths within one flow manipulation (e.g. unaltered flow), we are asking whether a single, fixed linear relationship between retinal curl and steering effort/gain can explain the results. We view the joint fit not as a “perfect” model, but as a stronger test of the curl-based control theory. The fact that a 2-parameter model can capture the direction and scale of biases across such a diverse set of conditions (straight/curved paths, five fixation eccentricities) suggests that retinal curl is a robust signal. Upon closer analysis, the discrepancies between the joint model and the data are most pronounced in the over-cancelled condition, which is the one in which sensory evidence becomes more ecologically inconsistent with the extra-retinal information (gaze direction). While the joint fit successfully demonstrates that a single parameter can capture the general functional role of curl, it fails to account for the complex sensory re-weighting that occurs in ecologically inconsistent conditions (like ‘over-cancelled’ flow). We will update the manuscript to discuss these limitations, framing the model as a parsimonious first-order approximation based on a minimal set of parameters rather than a complete description of human heading perception.
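
      The trade-off between separate and joint fits can be sketched with a deliberately simplified toy model. The example below assumes a purely linear relation, bias = gain × curl, with a single gain and no dynamics; this is not the 2-parameter controller used in the manuscript, and the condition names and numbers are synthetic values invented for illustration.

```python
def fit_gain(xs, ys):
    """Closed-form least-squares gain g for the model y = g * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sse(xs, ys, gain):
    """Sum of squared errors of the model y = gain * x."""
    return sum((y - gain * x) ** 2 for x, y in zip(xs, ys))


# Synthetic (curl, bias) pairs per condition; hypothetical values only.
conditions = {
    "straight": ([-2.0, -1.0, 1.0, 2.0], [-1.1, -0.4, 0.6, 0.9]),
    "curved": ([-2.0, -1.0, 1.0, 2.0], [-1.6, -0.9, 0.7, 1.7]),
}

# Separate fits: one gain per condition (more free parameters).
separate = {name: fit_gain(x, y) for name, (x, y) in conditions.items()}
sse_separate = sum(sse(x, y, separate[name]) for name, (x, y) in conditions.items())

# Joint fit: a single gain shared by all conditions.
all_x = [x for xs, _ in conditions.values() for x in xs]
all_y = [y for _, ys in conditions.values() for y in ys]
joint = fit_gain(all_x, all_y)
sse_joint = sse(all_x, all_y, joint)
```

      By construction, the shared gain can never fit better than the per-condition gains, so the residual of the joint fit is a measure of how much a single fixed relation leaves unexplained, which is the sense in which the joint fit is the stronger test.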

      (6) On the neural simulations

      We acknowledge that the presentation of the neural model requires more clarity regarding its objectives and its relationship to the behavioral data.

      We first wish to clarify the intended scope of the neural ring-attractor model. Our primary goal was not to provide a comprehensive account of behavioral performance across all conditions (which is the role of the controller model), but rather to demonstrate a biologically plausible mechanism that explains the emergence of the “Opposite-to-Gaze” bias. While the controller demonstrates that the bias follows a specific control law, the neural model shows how such a law can emerge from known primate neurophysiology, specifically, spiral-tuned MSTd neurons, gaze-contingent inhibition, and an egocentric “straight-ahead” prior.

      Why Straight Paths are Sufficient for this Objective. The reviewer asks why only straight paths were simulated. In our study, the straight-path condition with eccentric gaze is the purest test of the bias mechanism. Simulating the straight paths allowed us to isolate the interaction between foveal inhibition and the straight-ahead prior without the confounding variable of path-curvature flow. Given the complexity of the neural network’s parameter space, we focused on these conditions to provide a clear neuro-plausible explanation.

      Units: Pixels vs. Degrees. We acknowledge that the use of “pixels” in the plots of internal neural dynamics may appear awkward. Because the neural network operates on input stimuli that are defined by the pixel resolution of the videos used in the simulations, we used pixels as the native coordinate system to describe the movement of activity peaks within the network’s internal “map.”

      Behavioral Output (Meters): Importantly, the final heading estimates produced by the network are not left in pixels. We use a pinhole camera model to reconstruct the 3D trajectories from the neural activity. These results are expressed in meters, allowing for a direct comparison with the human behavioral data.

      Addressing Wild Oscillations and Smooth Paths. The oscillations observed in the instantaneous heading estimates reflect the stochastic nature of the population peak when tracking high-frequency sensory inputs. In our model, the synaptic time constant (τ) was kept relatively small to ensure a fast, low-latency response to changes in self-motion. While increasing τ would have produced smoother internal dynamics, it would also have introduced delays into the control loop. Instead, we chose to maintain this high sensory responsiveness and later applied a temporal moving average to the network’s decoded output to reconstruct the 3D trajectories.

      In addition, the neural activity over time is shown in two ways. The first is the heatmap, which shows activity across neurons ordered by preferred heading; here one can see more oscillations, especially when the fixation point is closer to the centre (eccentricities -2 and 2), due to stronger competition between the sensory evidence and the straight-ahead prior. The second is the decoded heading. In the ring-attractor model, the decoded heading is not determined by a single neuron but is calculated using a population vector average (equation 19). By summing across the entire population, the decoder effectively integrates sensory evidence from many neurons simultaneously. One can appreciate (see e.g. Fig. 5B) that averaged decoding leads to a smoother estimate (the white dashed line, whose visibility will be improved in the revised version). Behavioral work by Burr and Santoro (2001) suggests that global motion signals (divergence and rotation in optic flow) are integrated over much longer timescales (roughly 1000 ms to 3000 ms) than local motion units (~200 ms).
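
      To make the contrast between the peak neuron and the decoded heading concrete, here is a minimal sketch of population-vector decoding on a ring. It uses the standard rate-weighted circular mean; whether this matches equation 19 of the manuscript exactly is an assumption, and the number of neurons, bump shape, and perturbation are hypothetical.

```python
import math


def population_vector_heading(preferred_deg, rates):
    """Decode heading as the angle of the rate-weighted vector sum."""
    x = sum(r * math.cos(math.radians(p)) for p, r in zip(preferred_deg, rates))
    y = sum(r * math.sin(math.radians(p)) for p, r in zip(preferred_deg, rates))
    return math.degrees(math.atan2(y, x))


def bump(preferred_deg, center_deg, width_deg=30.0):
    """Hypothetical Gaussian-shaped activity bump using wrapped angular distance."""
    rates = []
    for p in preferred_deg:
        diff = (p - center_deg + 180.0) % 360.0 - 180.0
        rates.append(math.exp(-0.5 * (diff / width_deg) ** 2))
    return rates


# 36 neurons with preferred headings from -180 to 170 degrees, 10 degrees apart.
preferred = [i * 10.0 - 180.0 for i in range(36)]
rates = bump(preferred, center_deg=15.0)
decoded = population_vector_heading(preferred, rates)

# Perturb a single neuron: the peak jumps to it, the population vector barely moves.
noisy = list(rates)
noisy[preferred.index(40.0)] = 1.2
peak_noisy = preferred[noisy.index(max(noisy))]
decoded_noisy = population_vector_heading(preferred, noisy)
```

      In this toy case a fluctuation in one neuron relocates the activity peak by tens of degrees, whereas the population-vector estimate shifts only slightly, which is one way to see why the decoded trace is smoother than the heatmap.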

      See also our comment on temporal integration in the responses to reviewer #3.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for the comments regarding the definition of heading at different time scales, the role of the gait cycle, and the temporal integration of the curl signal. They will help us refine the manuscript’s core arguments.

      We agree that “heading” must be precisely defined within the context of the differing temporal demands of balance and steering. While instantaneous retinal motion provides the high-frequency feedback necessary for momentary postural adjustments and balance, our study is concerned with heading as a gaze-relative signal used for the continuous control of a locomotor trajectory. As such, we will revise the manuscript to specify that the perceived heading measured in our task reflects a signal integrated over the gait cycle to filter out the oscillatory noise induced by head bob and sway.

      The reviewer correctly notes that gait-induced head bob and sway produce high-frequency oscillations in the curl signal, yet our behavioral results show smooth, slowly evolving biases. The visual system does not react to “instantaneous” curl, which would lead to jittery, unstable heading estimates. Instead, it integrates flow over a timescale roughly commensurate with a full gait cycle (~500–1000 ms). This implies a significant temporal integration process, consistent with evidence (Burr and Santoro, 2001, Vis Res) indicating that optic flow signals (radial and rotational components) are integrated over windows of up to approximately 3 seconds to ensure perceptual stability. Neurally, this likely involves the projection from area MSTd to the Ventral Intraparietal area (VIP), a pathway where fast, eye-centered sensory inputs are transformed into stable, body-centered representations suitable for guiding long-term steering behavior (Chen et al. 2011, J Neurosci). By grounding our definition of heading in these specific temporal and neural constraints, we aim to clarify how the visual system exploits retinal curl for goal-directed action in natural, dynamic environments and relate our findings to recent studies addressing the role of retinal motion on balance (Powell et al. 2026, bioRxiv).

      In our implementation, we explicitly address the high-frequency noise introduced by gait dynamics by smoothing the retinal curl signals computed from the stimulus videos before they are fed into the controller. This temporal filtering allows the fit of the controller’s prediction to the response data while remaining robust to the rapid fluctuations of head bob and sway. In contrast, the neural ring-attractor model would not require an external smoothing step; instead, the integration is an emergent property of the system’s architecture that can be controlled with different parameters. The dynamics of the synaptic weights and the characteristic “leak” in the population activity naturally implement a leaky integration of sensory evidence, ensuring that the decoded heading reflects a sustained estimate rather than an instantaneous response to visual noise.
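
      The effect of leaky integration on gait-frequency fluctuations can be illustrated with a generic first-order leaky integrator. This is a stand-in chosen for illustration, not the smoothing filter or the ring-attractor dynamics used in the manuscript; the time constant, step frequency, and signal amplitudes are hypothetical.

```python
import math


def leaky_integrate(signal, dt, tau):
    """First-order leaky integrator, tau * dy/dt = -y + x, stepped with forward Euler."""
    y = signal[0]
    out = []
    for x in signal:
        y += (dt / tau) * (x - y)
        out.append(y)
    return out


def peak_to_peak(xs):
    """Range of a sequence, used here as a simple ripple measure."""
    return max(xs) - min(xs)


dt = 0.01                                  # s, hypothetical sample interval
t = [i * dt for i in range(1000)]          # 10 s of signal
steady = 0.5                               # hypothetical slowly varying curl component
gait_hz = 2.0                              # hypothetical step frequency
raw = [steady + 0.4 * math.sin(2.0 * math.pi * gait_hz * ti) for ti in t]

smooth = leaky_integrate(raw, dt, tau=1.0)

# Compare ripple over the last 5 s, after the initial transient has decayed.
raw_ripple = peak_to_peak(raw[500:])
smooth_ripple = peak_to_peak(smooth[500:])
smooth_mean = sum(smooth[500:]) / len(smooth[500:])
```

      With a time constant on the order of a second, the oscillation at the step frequency is strongly attenuated while the slowly varying component is preserved. The cost is a lag in the response, which is the trade-off against low latency discussed above for the synaptic time constant.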

    1. Author response:

      Reviewer 1:

      Porte et al. investigate how observers form confidence judgments about the presence vs absence of near-threshold audiovisual stimuli. In two psychophysical detection experiments, human participants judged whether a stimulus (visual, auditory, or audiovisual) was present or absent, reported amodal confidence, and then gave modality-specific detection and confidence ratings using a bidimensional scale. The authors report that audiovisual (AV) stimuli are detected more accurately than unimodal stimuli, but that multisensory stimulation does not improve metacognitive efficiency. Participants are more confident in absence than in presence judgments. They extend a previously proposed model to an audiovisual setting, assuming evidence is available only for presence and that absence is inferred via counterfactual detectability. Detection is modeled with a disjunctive integration rule across modalities, while confidence is explained by a combination of conjunctive (for presence) and disjunctive/negation-of-disjunction (for absence) rules.

      We thank the reviewer for thoroughly evaluating our work.

      There are several points I wish to have clarified, outlined below:

      (1) Framing of bimodal vs unimodal detection

      On p.3, the introduction states that "Adults typically show higher detection rates and faster reaction times for bimodal than for unimodal stimuli." This is broadly consistent with the literature, but as written, it obscures the fact that these effects depend critically on experimenter-defined stimulus strengths. It is trivial to construct cases where a strong unimodal stimulus is more detectable than a bimodal stimulus made of two very weak unimodal stimuli. If "bimodal" is understood as the co-presentation of two unimodal components matched in detectability, then Bayes-rule-based arguments indeed predict better detection for the bimodal case; how much better is theoretically interesting, but not quantified in this paper. There is an entire literature on the combination of two unimodal stimuli, which is not touched on. For a pertinent reference, see Ernst & Banks 2002. I recommend clarifying that the statement assumes comparable unimodal intensities.

      We will clarify that when discussing bimodal stimuli, we mean the co-presentation of two unimodal stimuli of similar intensity. We will add references to the literature on discrimination tasks showing that multisensory cue combination follows Bayes-rule integration (e.g., Ernst & Banks, 2002; Battaglia et al., 2003; Alais & Burr, 2004) and clarify how our work differs from this rich body of work and what novel contributions it provides.

      (2) Relationship to signal detection theory and counterfactual perceptibility

      In the introduction, the authors write, "If sensory evidence is only available for presence," motivating counterfactual perceptibility as a necessary ingredient to infer absence. However, standard signal detection theory (SDT) already provides a widely accepted framework in which a continuous internal response is present on both signal and noise (absent) trials, with absence corresponding to the noise distribution and decisions implemented by a criterion. Thus, there is no logical need to invoke counterfactual perceptibility simply to define absence; rather, the Mazor-style framework adds an explicit belief model about detectability and an optimal stopping policy. It would strengthen the paper to more clearly state how the proposed model goes beyond SDT conceptually, acknowledge that SDT can account for presence/absence decisions without counterfactuals, and position the counterfactual account as a hypothesis about how observers actually compute absence/confidence, not as a necessity.

      One of the central claims of the paper is that detection in the case of absence requires counterfactual reasoning. The authors should demonstrate whether or not an SDT-based generative model can describe these amodal and uni- and bi-modal stimulus decisions. In such a model, the noise distribution would be shared across conditions, and unimodal vs bimodal differences would be captured by changes in the mean or variance of the signal+noise distribution.

      We will clarify that our framework explains how absence judgments (and related confidence) are formed, and what it adds to SDT models, including the reproduction of reaction times and a normative explanation of criterion placement (results about RTs are available in the supplementary materials). We will also run additional model comparisons assessing how an SDT-based generative model performs compared to our Bayesian model based on counterfactual perceivability.
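
      For readers less familiar with the SDT baseline under discussion, the sketch below computes sensitivity (d′) and criterion (c) under the standard equal-variance Gaussian model with a noise distribution shared across conditions. The hit and false-alarm rates are hypothetical placeholders, not the study’s data, and the equal-variance assumption is only one of the variants the reviewer mentions.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z-transform used in SDT).
_z = NormalDist().inv_cdf


def dprime(hit_rate, fa_rate):
    """Equal-variance SDT sensitivity: d' = z(H) - z(FA)."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Equal-variance SDT criterion: c = -(z(H) + z(FA)) / 2."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))


# Hypothetical rates: one shared false-alarm rate (shared noise distribution),
# with condition differences carried entirely by the hit rate.
fa = 0.10
hits = {"AV": 0.80, "V": 0.60, "A": 0.40}

d = {cond: dprime(h, fa) for cond, h in hits.items()}
c = {cond: criterion(h, fa) for cond, h in hits.items()}
```

      In this formulation, “absent” responses arise whenever the internal response falls below the criterion, so no counterfactual computation is needed to define them; the model comparison would test whether that account or the counterfactual account better describes the choices and confidence reports.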

      (3) Confidence vs performance: is AV confidence special?

      The paper's central claims about multisensory confidence and metacognition would be stronger if the authors showed that AV confidence deviates from what is expected given performance alone. From the reported results, AV accuracy is around 80%, with visual and auditory at about 60% and 40%, respectively. Given that confidence typically monotonically scales with accuracy, the first question is whether AV confidence is entirely explained by improved performance, or whether there is an additional multisensory contribution. A simple, informative analysis would be for each subject, plot mean confidence vs per cent correct for AV, V, A, and absent conditions, and to test whether AV confidence lies above the trend predicted by accuracy alone.

      This is an excellent suggestion, and we will conduct the proposed analysis.

      (4) Metacognitive measures: logistic regression slopes vs meta-d′/d′

      In the "Multisensory effects on metacognitive performance" section, the authors define "metacognitive sensitivity" as the slope of a Bayesian logistic regression predicting accuracy from confidence. There is substantial literature showing that logistic-slope measures of metacognitive sensitivity are criterion-dependent and can be affected by both task and confidence criteria (for one example, see Rausch & Zehetleitner, 2017). In contrast, meta-d′/d′ was specifically developed to provide a bias-invariant measure of metacognitive efficiency. Though this, too, is dated (see Boundy-Singer et al., 2023). Given that the authors already estimate HMeta-d-based M-ratios, it is unclear why they rely on logistic regression slopes as their primary "metacognitive sensitivity" metric in Figure 4A. I suggest either replacing the logistic-slope metric with SDT-based measures (meta-d′, meta-d′/d′) or providing a clear justification for using logistic slopes, along with a discussion of their known limitations.

      Additionally, Figure 3 reports M-ratios without showing the corresponding d′ or meta-d′ for judge-present vs judge-absent conditions. Presenting these would help contextualize the metacognitive efficiency results and clarify whether differences are driven mainly by changes in metacognitive sensitivity, changes in task performance, or both. The d' values per condition could be added to Figure 2A.

      All typical measures of metacognitive sensitivity are influenced by metacognitive bias and task performance to some extent, and none of them is a pure measure of type-2 sensitivity (e.g., see Rahnev, 2025). Here, we chose logistic regression because it enables modeling interactions with other predictors in a factorial design with a limited number of trials.

      We will clarify the limitations of metacognitive sensitivity measures and better explain why we then used the M-ratio to estimate metacognitive performance while controlling for underlying task performance.

      Thank you for this suggestion. We will add the d′ values per condition to Figure 2A.

      (5) Interpretation of confidence in absence vs presence

      The authors emphasise that it is surprising subjects are more confident in absence than in presence judgments, both at amodal and modality-specific levels. However, Figure 2B suggests that absent responses are very accurate: absent is reported as present only in about 10% of absent trials, implying a high correct rejection rate. If confidence tracks outcome probability, higher confidence for absence may be at least partly expected. Before attributing this asymmetry primarily to counterfactual reasoning, it would be important to explicitly relate confidence to accuracy for hits, misses, false alarms, and correct rejections and show whether absence confidence remains elevated relative to presence after controlling for accuracy differences across judgment types and conditions. Without this, the interpretation that higher absence confidence is inherently "unexpected" seems overstated.

      This higher confidence for absence judgments than for presence judgments was observed while controlling for response accuracy. We will clarify this in the main text.

      (6) Model: integration rules, confidence, and evidence strength

      The modeling section extends the Mazor et al. ideal observer to two modality-specific sensors, with disjunctive integration for detection and then disjunctive vs conjunctive integration rules for confidence. I have a few comments.

      First, the detection rule is disjunctive and is reported as a finding. However, the conclusion that detection relies on a disjunctive rule ("present if A or V") closely mirrors the task instructions: participants are explicitly told to respond "present" if they detect the stimulus in any modality. As such, this seems more like a sanity check than a novel empirical finding. Relatedly, the conjunctive detection rule is a weak null. The conjunctive rule ("present only if both A and V") is behaviorally implausible given the task instructions. A more informative baseline would be an SDT-style scalar-evidence model (see comment 2), rather than a conjunctive rule that participants would have to actively violate the instructions to follow.

      Second, confidence in the model is defined as the probability of being correct at the time of the detection decision. However, this implies a fixed amount of evidence at decision time unless additional mechanisms are invoked. This issue is well known in diffusion modeling (see Kiani et al. 2014) and deserves explicit discussion; otherwise, it is unclear how the model produces graded confidence from a bound-crossing rule alone.

      Third, the authors do not consider a straightforward evidence-strength account of confidence. When both modalities indicate presence, there is, on average, more total sensory evidence than in unimodal trials, making correct decisions more likely and, under most frameworks, confidence higher. Likewise, weak evidence in both modalities can be stronger evidence for absence than moderate in one and weak in the other. Many of the patterns that motivate the presence-conjunctive/absence-disjunctive mix could arise from a model where confidence simply reflects the amount of evidence for the chosen option, without positing distinct logical integration rules for presence vs absence. As the authors note, purely disjunctive or purely conjunctive confidence rules fail to capture the trends in confidence reports in Figure 7, leading them to adopt a combined presence-conjunctive/absence-disjunctive rule. A more parsimonious alternative, namely that confidence scales with evidence magnitude and cross-modal agreement, should be explicitly considered and, ideally, implemented as a competing model. Finally, if the model is intended as a good account of the data, it would be useful to report whether it also reproduces the metacognitive efficiency patterns (M-ratios) beyond the mean confidence patterns shown in Figures 7-8. At present, the model appears systematically over-confident, which should be acknowledged and quantified.

      Indeed, the disjunctive rule was expected, given our design; we will clarify this. As mentioned above, we will directly compare the results of our current model with those of a more traditional SDT-based generative model, as suggested by the reviewer.

      Contrary to a classical drift diffusion model, the model does not assume a fixed decision boundary, but derives an optimal stopping policy per time point and belief state. As a result, and depending on beliefs about perceptual evidence and the temporal discounting factor, optimal decision boundaries can be asymmetric and may collapse asymmetrically toward 0. Furthermore, given the asymmetry in the information value between sensor activations and inactivations, and differences in the information value of sensor activations of the two modalities, boundary crossing can lead to belief states that are far or close to the decision boundary, depending on the nature of the evidence. Together, even without an explicit modeling of post-decisional evidence, the model can account for variability in the total accumulated evidence at decision time.

      From our understanding, the proposed alternative is equivalent to our current model, in which confidence scales with evidence magnitude.

      The model was not fitted to confidence data, which could explain its overall overconfidence. To further test our model, we will assess its ability to reproduce patterns of metacognitive efficiency (M-ratios).

      (7) Confidence asymmetry index (CAI) and modality weighting

      The confidence asymmetry index (CAI) is defined as the difference between auditory and visual confidence on AV vs absent trials, and the authors report strong correlations between observed and simulated CAI across participants. They interpret this as evidence that subjects place different weights on auditory vs visual signals. Several questions arise. First, does CAI capture asymmetries beyond what is expected from accuracy differences between modalities and conditions? Second, because the simulated data are generated from model fits to the observed data, a correlation between observed and simulated CAI is expected: the model is built to reproduce the individual patterns it is then compared to. A stronger test would compare CAI from data simulated with modality-specific belief parameters, versus CAI from data simulated with constrained equal belief parameters (same θs). Relatedly, the paper would benefit from a plot showing the distribution of θs for A- and V-present stimuli across subjects. These values could also be related to unimodal sensitivity measured in the calibration/training phases. A natural prediction is that higher unimodal sensitivity should correspond to higher belief parameters for presence.

      The model was not fitted to either the modality-specific responses or the confidence ratings, so the correlation between observed and simulated CAI was not expected and provides a good test of our model's ability to reproduce the observed patterns. We will test whether the same correlations hold when using the difference in accuracy instead of the confidence.

      We found that the best model is the one with the same belief across the visual and auditory sensors. Given this, we cannot investigate how modality-specific belief parameters are linked to unimodal sensitivity for each participant.

      Reviewer 2:

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1 . In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      This is an excellent suggestion. We will run this analysis and refit the d’ and criterion response using unequal-variance models to see whether we observe the same results.
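
      The zROC check suggested by the reviewer can be sketched as follows. Under Gaussian SDT, the slope of the zROC estimates the ratio of the noise to the signal standard deviation, so a slope near 1 is consistent with equal variances. The rating counts are invented, the function name is hypothetical, and an ordinary least-squares fit is used here for simplicity, whereas a maximum-likelihood unequal-variance fit would be preferable for inference.

```python
from statistics import NormalDist

def zroc_slope(signal_counts, noise_counts):
    """zROC slope from rating counts ordered 'sure absent' -> 'sure present'."""
    z = NormalDist().inv_cdf
    n_signal, n_noise = sum(signal_counts), sum(noise_counts)
    z_hit, z_fa = [], []
    # Each boundary between adjacent rating categories acts as one criterion.
    for k in range(1, len(signal_counts)):
        z_hit.append(z(sum(signal_counts[k:]) / n_signal))
        z_fa.append(z(sum(noise_counts[k:]) / n_noise))
    # Ordinary least-squares slope of z(hit rate) regressed on z(false-alarm rate).
    mean_fa = sum(z_fa) / len(z_fa)
    mean_hit = sum(z_hit) / len(z_hit)
    sxy = sum((x - mean_fa) * (y - mean_hit) for x, y in zip(z_fa, z_hit))
    sxx = sum((x - mean_fa) ** 2 for x in z_fa)
    return sxy / sxx

# Hypothetical counts per rating category (4 categories) for present and absent trials.
slope = zroc_slope([10, 20, 30, 40], [45, 30, 15, 10])
print(f"zROC slope: {slope:.2f}")
```

      A slope reliably below 1 would indicate a wider stimulus-present distribution and would favour the unequal-variance measures mentioned by the reviewer.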

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      We will clarify this point in the manuscript. In Experiment 2, the stimulus was absent on 50% of the trials. As a result, the 50% of stimulus present trials were split into the three possible conditions, resulting in a sixth of the trials being auditory, a sixth visual, and a sixth audiovisual; we will make these proportions clearer in the text.

      We did not have any a priori hypotheses about the response criteria for Experiment 1. The reviewer is right: the proportion of absent versus present trials can indeed have an impact on response bias. In fact, one of the goals of Experiment 2 was to test whether the low frequency of absent trials compared to present ones could explain both the response bias and the higher confidence in absence observed in Experiment 1. We found that this was not the case, as we did not observe a difference between the two experiments. We will clarify this in our revision.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      We will add individual data to Figure 2.

      Indeed, staircases failed to achieve 50% detection on average; participants for whom psychometric curves did not converge were excluded, as were those at floor level in one of the two modalities.

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that this data wasn't produced by bots? There are recent high-profile publications in PNAS and Behavior Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7 ). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      Data were collected on the Prolific platform, which has been shown to yield high-quality data (Kay, 2025). However, we agree that this is a potential concern and will add a note of caution in the revised manuscript, even if the risk that the data do not come from humans but from bots is low (Huskey et al., 2026; Chetverikov, 2026).

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      We will clarify this aspect of the analysis. Data were binned into four quartile bins based on the overall distribution of confidence values across participants, following the binning used in the example in Fleming (2017). We will examine whether changing the number of bins changes the results (Dayan, 2023).
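
      The binning step described above can be sketched with the standard library. The confidence values are invented, the function name is hypothetical, and the treatment of ratings that fall exactly on a cut point is an assumption of this sketch. Varying `n_bins` gives the robustness check requested by the reviewer.

```python
from bisect import bisect_right
from statistics import quantiles

def bin_confidence(values, n_bins=4):
    """Map continuous confidence ratings to ordinal bins 1..n_bins using pooled quantiles."""
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    # A rating equal to a cut point goes to the upper bin (assumption).
    return [bisect_right(cuts, v) + 1 for v in values]

# Hypothetical confidence ratings pooled across participants (0 to 1 scale).
pooled = [0.05, 0.10, 0.22, 0.35, 0.41, 0.58, 0.63, 0.77, 0.89, 0.95, 0.99, 1.00]
bins4 = bin_confidence(pooled, n_bins=4)
bins3 = bin_confidence(pooled, n_bins=3)
print(bins4)
print(bins3)
```

      Because the cut points come from the pooled distribution, individual participants can end up with unequal counts per bin, which matters for meta-d' estimation when some cells are empty.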

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      Yes, logistic regression has been used to quantify metacognitive sensitivity before. We will add the relevant papers as references (e.g., Sandberg et al., 2010; Norman et al., 2011; Siedlecka et al., 2016; Wierzchoń et al., 2012; Faivre et al., 2018; Pereira et al., 2023).
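
      To illustrate the idea behind this measure, the sketch below fits a plain maximum-likelihood logistic regression of accuracy on confidence and reads the slope as an index of metacognitive sensitivity. This is a deliberately simplified, non-Bayesian, single-participant version: the study used a Bayesian logistic regression with additional predictors, and the data and function name here are invented for illustration.

```python
import math

def logistic_slope(confidence, correct, n_iter=25):
    """Fit P(correct) = sigmoid(b0 + b1 * confidence) by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(confidence, correct):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * x      # gradient w.r.t. b1
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical trials: confidence ratings and whether the response was correct.
conf = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
acc = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
b0, b1 = logistic_slope(conf, acc)
print(f"slope = {b1:.2f}")
```

      A steeper positive slope means that confidence discriminates better between correct and incorrect responses. Like the other measures discussed above, this slope is not independent of task performance or confidence bias.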

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])." Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      The number of participants was determined using a Bayesian optional stopping rule, as preregistered. The reviewer is right that the delta values are very similar in the two experiments. Given that a difference was found in only one experiment, we decided not to draw conclusions from it.
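
      The reviewer's point can be made concrete with a rough calculation on the two DeltaM values quoted in comment (7). The sketch assumes that the reported intervals are 95% intervals, that the posteriors are approximately normal, and that the two experiments are independent. None of these assumptions is stated in the text, so the result is only indicative.

```python
import math

def approx_se(lower, upper, z=1.96):
    """Approximate standard error from an interval assumed to be 95% and roughly normal."""
    return (upper - lower) / (2 * z)

# Values quoted in the review: estimate and interval per experiment.
exp1 = {"delta_m": -0.25, "ci": (-0.59, 0.10)}
exp2 = {"delta_m": -0.28, "ci": (-0.54, -0.02)}

# Difference between the two experiments and its approximate uncertainty.
diff = exp2["delta_m"] - exp1["delta_m"]
se_diff = math.hypot(approx_se(*exp1["ci"]), approx_se(*exp2["ci"]))
interval = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print(f"difference = {diff:.2f}, approximate interval = [{interval[0]:.2f}, {interval[1]:.2f}]")
```

      Under these assumptions the interval for the between-experiment difference comfortably includes zero, which supports the decision not to draw conclusions from the effect found in only one experiment.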

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      We will examine how the proportion of miscategorisation changed throughout the task.

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      Our main goal was to investigate how participants form integrated, supramodal confidence judgments on the basis of multisensory sources of information. Therefore, the amodal confidence judgments are required here.

      Moreover, the model was fitted to response times that corresponded to the amodal judgment. Because we had no meaningful response times for the modality-specific judgment, we could not use them to fit the model.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      Indeed, the model does not capture the full behavioral effects reflecting multisensory interference in the modality-specific responses. We suppose that the model does not reproduce these interferences, as it is only fitted to amodal detection accuracy, and as the two sensors are completely independent from one another. We will clarify this aspect in the text.

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      In addition to the d’ and criterion comparison between the observed and simulated data, we will compare modality-specific d’ and the correlations between observed and simulated confidence.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      To further support this claim, we will add a quantitative comparison of the different confidence fits.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      Thank you for this remark. Indeed, the difference in metacognitive efficiency may be driven by differences in the d’ values, and so a lower d’ for auditory stimuli can lead to higher metacognitive efficiency for a similar metacognitive sensitivity.

      Reviewer 3:

      This study used a pre-registered novel behavioural paradigm and computational modelling to investigate multi-sensory influences on detection and confidence. Participants performed amodal detection of auditory and visual stimuli (indicating that a stimulus was there when either an auditory stimulus or a visual stimulus or both were present), followed by amodal and unimodal confidence ratings. Detection was higher when both stimuli were present, and the presence of one modality increased the confidence in the presence of the other modality. In contrast to previous detection studies, confidence was higher for absent than for present judgements, but metacognitive efficiency was higher for present judgements. Metacognitive sensitivity was higher for bimodal stimuli, but this was not the case for metacognitive efficiency, suggesting that the sensitivity might be driven by first-order performance. The computational model showed that both detection and confidence in absence followed a disjunctive evidence integration rule, while confidence in presence followed a conjunctive integration rule.

      We thank the reviewer for engaging with our work.

      Strengths:

      The paper has several major strengths. Firstly, it addresses a novel research question using an innovative and well-controlled paradigm. Furthermore, the paradigm and analyses were pre-registered, and all effects that were interpreted were replicated in two independent samples. Finally, the paper uses an advanced computational model to capture counterintuitive patterns in the data.

      Weaknesses:

      The major weakness of the paper is the narrative structure. It is not always clear how the different analyses relate to the main research question. Many different effects are reported in terms of detection accuracy, bias, confidence and metacognition, as well as cross-modal and unimodal versus bimodal effects. It would help readability if the paper were streamlined in terms of the research question that is being answered, which I believe is specifically about multimodal absence judgements. Relatedly, for a reader not intimately familiar with the metacognition literature, the difference between MRatio, metacognitive sensitivity and metacognitive efficiency is not obvious. It would be good to clarify this more in the manuscript.

      We will improve the narrative structure so that each result clearly relates to the research question.

      We will also add a clearer definition of the various metacognition metrics to improve readability.

      In general, the conclusions drawn by the authors seem to be supported by the results. However, I was missing quantitative model comparisons between the conjunctive and the disjunctive models and an explanation of why the models systematically overestimated the confidence ratings. Furthermore, the 'perceptual multisensory interference' section reports on very interesting effects, but these are not supported by statistical tests in the main text. It would help to assess the strength of the claims if the statistical evidence in favour of these claims were presented together in the main text.

      The model was not fitted to confidence data, which could explain its overall overconfidence. As stated in previous responses, we will perform additional analyses to evaluate the model’s ability to reproduce confidence ratings. As some of the results were not replicated across experiments, we decided to put all statistical results related to multisensory interference in the supplementary materials and to focus only on consistent results across experiments.

      One other concern is that in real-world multi-sensory perception, such as the mosquito example in the introduction, the auditory and visual signals have a strong natural association, which means that if you hear the auditory signal, you expect that you will see the visual signal soon and vice versa. As far as I understood, this association was not present in the current paradigm, which might influence the type of effects that one would expect to see.

      The relation here is indeed artificial; we try to reinforce it as much as possible in the instructions of the task by indicating to the participants that they have to “detect a mosquito” that could be present auditorily, visually, or both. But we acknowledge that the association between the visual and auditory stimuli is artificial, which may indeed influence our results.

      References

      Alais, D., & Burr, D. (2004). The Ventriloquist Effect Results from Near-Optimal Bimodal Integration. Current Biology, 14(3), 257-262. https://doi.org/10.1016/j.cub.2004.01.029

      Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. JOSA A, 20(7), 1391-1397. https://doi.org/10.1364/JOSAA.20.001391

      Chetverikov, A. (2026). Online behavioral studies are safe for now: Unusual RTs do not imply bots (A reply to Van der Stigchel et al., 2026) (Gjw5u_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/gjw5u_v1/

      Dayan, P. (2023). Metacognitive Information Theory. Open Mind: Discoveries in Cognitive Science, 7, 392-411. https://doi.org/10.1162/opmi_a_00091

      Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429-433. https://doi.org/10.1038/415429a

      Faivre, N., Filevich, E., Solovey, G., Kühn, S., & Blanke, O. (2018). Behavioral, Modeling, and Electrophysiological Evidence for Supramodality in Human Metacognition. Journal of Neuroscience, 38(2), 263-277. https://doi.org/10.1523/JNEUROSCI.0322-17.2017

      Fleming, S. M. (2017). HMeta-d: Hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neuroscience of Consciousness, 2017(1), nix007. https://doi.org/10.1093/nc/nix007

      Huskey, R., Zhao, Z., Parry, D. A., & Fisher, J. T. (2026). An AI agent can complete the Attention Network Test with human-like behavioral signatures: Implications for the bot-or-not debate (T2jru_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/t2jru_v1/

      Kay, C. S. (2025). Why you shouldn’t trust data collected on MTurk. Behavior Research Methods, 57, 340. https://doi.org/10.3758/s13428-025-02852-7

      Norman, E., Price, M. C., & Jones, E. (2011). Measuring strategic control in artificial grammar learning. Consciousness and Cognition, 20(4), 1920-1929. https://doi.org/10.1016/j.concog.2011.07.008

      Pereira, M., Skiba, R., Cojan, Y., Vuilleumier, P., & Bègue, I. (2023). Preserved Metacognition for Undetected Visuomotor Deviations. Journal of Neuroscience, 43(35), 6176-6184. https://doi.org/10.1523/JNEUROSCI.0133-23.2023

      Rahnev, D. (2025). A comprehensive assessment of current methods for measuring metacognition. Nature Communications, 16(1), 701. https://doi.org/10.1038/s41467-025-56117-0

      Sandberg, K., Timmermans, B., Overgaard, M., & Cleeremans, A. (2010). Measuring consciousness: Is one measure better than the other? Consciousness and Cognition, 19(4), 1069-1078. https://doi.org/10.1016/j.concog.2009.12.013

      Siedlecka, M., Paulewicz, B., & Wierzchoń, M. (2016). But I Was So Sure! Metacognitive Judgments Are Less Accurate Given Prospectively than Retrospectively. Frontiers in Psychology, 7, 218. https://doi.org/10.3389/fpsyg.2016.00218

      Wierzchoń, M., Asanowicz, D., Paulewicz, B., & Cleeremans, A. (2012). Subjective measures of consciousness in artificial grammar learning task. Consciousness and Cognition, 21(3), 1141-1153. https://doi.org/10.1016/j.concog.2012.05.012

    1. Author response:

      We sincerely thank the Reviewing Editor (Dr. Florent Ginhoux), Senior Editor (Dr. Satyajit Rath), and both reviewers for their thoughtful and constructive evaluation of our manuscript. We appreciate the recognition that our study provides a valuable observation regarding the TLR7-independent effects of imiquimod (IMQ) via the unfolded protein response (UPR) and Gelsolin in psoriasis-like dermatitis. Importantly, we acknowledge that the current framing may overemphasize direct relevance to human psoriasis. In the revised manuscript, we will reposition the study to focus on IMQ-induced skin inflammation as a model of chemical- and stress-induced inflammatory responses, rather than a direct representation of human plaque psoriasis. We also acknowledge that the mechanistic link between Gelsolin and skin inflammation remains incomplete, and we are committed to addressing the key concerns raised.

      Below, we outline our planned revisions in response to the public reviews. We will submit a revised version after performing the additional experiments and textual improvements.

      Reviewer #1 (Public review):

      We fully agree that the exclusive use of the IMQ model has limitations in fully recapitulating human plaque psoriasis, which is primarily driven by the IL-23/IL-17 axis involving Th17/Tc17 cells. We will substantially temper our claims regarding direct translational relevance to human psoriasis and clearly discuss the IMQ model as a tool to study innate immune-driven and chemical stress-induced inflammation in the skin (new Discussion section). In addition, we will strengthen the rationale for focusing on Gelsolin by incorporating available human data suggesting altered Gelsolin expression in inflammatory conditions.

      (1) We will add a dedicated paragraph in the Introduction and Discussion acknowledging the differences between IMQ-induced dermatitis and human psoriasis (citing key references such as PMID: 28945199).

      (2) For keratinocyte experiments, we will revise the text to avoid implying that keratinocytes stimulated with IMQ represent a psoriasis model, and instead position this system more conservatively. Specifically, we will treat keratinocytes as a system to assess AMP and chemokine induction rather than as a direct model of psoriasis. We will therefore incorporate stimulation with IL-17A (100 ng/ml) ± TNF-α (10 ng/ml) to establish AMP/chemokine induction, and additionally examine the effect of UPR activation by co-treatment with DTT (or other UPR inducers). This will allow us to determine whether UPR activation enhances IL-17A/TNF-α-driven AMP and chemokine expression.

      (3) We will expand the Methods section with full details on RNA-seq dataset selection, normalization, cross-species mapping, and statistical analysis, and re-evaluate key analyses where necessary to ensure robustness and reproducibility. Canonical psoriasis signature genes (e.g., S100A8/A9, IL-17C, IL-36g) will be validated by qRT-PCR in the revised manuscript.

      (4) Vehicle controls (including Aldara-specific effects) will be clearly described and shown in all relevant figures.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the strengths in demonstrating TLR7-independent UPR induction and Gelsolin as an IMQ-binding protein.

      (1) To strengthen the mitochondrial Ca<sup>2+</sup> signaling data (Fig. 1B), we will add an orthogonal approach (e.g., pharmacological inhibition or alternative Ca<sup>2+</sup> probe) in a new supplementary figure.

      (2) For Gelsolin-IMQ interaction specificity (Fig. 7E-G), we will perform additional experiments comparing IMQ versus RSQ (resiquimod) effects on the observed phenotypes, as recommended.

      We believe these revisions will substantially address the key concerns raised by the reviewers and strengthen the overall quality of the manuscript.

      We again thank the reviewers and editors for their time and valuable feedback, which will significantly improve the manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected because we prioritized the highest possible resolution and therefore collected small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the WB are similar to our previous results (Piedade 2020) and, we believe, arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures. The assay was performed in monolayer culture, and we did not observe any differences in total cell number between the different transfections. As such, we do not feel that proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All CP measurements (actin, cdhr1a, or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1, or PNA) to provide a landmark and to ensure association with the proper cell type. Our new figure 7 now includes cone OS counterstaining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin, cdhr1a, or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1, or PNA) to ensure proper measurement and association with the proper cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we had an n=5 (individual fish) analyzed for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using ANOVA followed by Tukey post hoc tests to control for multiple pairwise comparisons. This re-analysis did not materially change the statistical significance outcomes of the study.
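
      The independence concern raised in this exchange can be illustrated with a minimal sketch. It assumes invented OS length values, hypothetical fish identifiers, and helper names (`per_fish_means`, `one_way_anova_f`) that do not come from the manuscript; it is not the authors' analysis pipeline. It shows only how collapsing repeated measurements to one mean per fish changes the degrees of freedom of a one-way ANOVA compared with pooling every measurement.

```python
from statistics import mean


def per_fish_means(measurements):
    """Collapse repeated measurements into one mean per fish.

    measurements: dict mapping group -> dict mapping fish id -> list of lengths.
    Returns a dict mapping group -> list of per-fish means.
    """
    return {
        group: [mean(values) for values in fish.values()]
        for group, fish in measurements.items()
    }


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: dict mapping group -> list of observations treated as independent.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    group_means = {g: mean(values) for g, values in groups.items()}
    ss_between = sum(
        len(values) * (group_means[g] - grand_mean) ** 2
        for g, values in groups.items()
    )
    ss_within = sum(
        (v - group_means[g]) ** 2
        for g, values in groups.items()
        for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical OS lengths: 2 genotypes x 3 fish x 3 measurements per fish.
data = {
    "wt": {"fish1": [10.0, 11.0, 12.0], "fish2": [11.0, 12.0, 13.0], "fish3": [12.0, 13.0, 14.0]},
    "mut": {"fish4": [8.0, 9.0, 10.0], "fish5": [9.0, 10.0, 11.0], "fish6": [10.0, 11.0, 12.0]},
}

# Fish-level analysis: n is the number of fish per genotype.
fish_level = per_fish_means(data)
f_fish, df_between_fish, df_within_fish = one_way_anova_f(fish_level)

# Pooled analysis: every measurement is (incorrectly) treated as independent.
pooled = {g: [v for values in fish.values() for v in values] for g, fish in data.items()}
f_pooled, df_between_pooled, df_within_pooled = one_way_anova_f(pooled)

print("fish-level:", f_fish, df_between_fish, df_within_fish)
print("pooled:    ", f_pooled, df_between_pooled, df_within_pooled)
```

      In this toy data set the pooled analysis has four times as many within-group degrees of freedom as the fish-level analysis, which is the inflation the reviewer cautions against. A post hoc procedure such as Tukey's test would then be applied to the fish-level values.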

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree, and this point has been addressed in our revised manuscript. We have also included data from 1- and 2-year-old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and have revised the discussion accordingly.
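
      The reviewer's definition of synergy (both single mutants shift the measurement in the same direction as the double mutant, and the double-mutant effect exceeds the sum of the single effects) can be written as a short check. This is a sketch under stated assumptions: the function name `classify_interaction` is hypothetical, the numbers are invented solely to mirror the qualitative pattern described for Figure 7, and statistical uncertainty is ignored entirely.

```python
def classify_interaction(wt, single_a, single_b, double):
    """Compare a double-mutant effect with the sum of the single-mutant effects.

    All arguments are group means on the same scale (for example OS length).
    Effects are expressed as differences from the wild-type mean.
    """
    effect_a = single_a - wt
    effect_b = single_b - wt
    effect_double = double - wt
    additive_expectation = effect_a + effect_b

    # Both single mutants must change the measurement in the double mutant's direction.
    same_direction = all(e * effect_double > 0 for e in (effect_a, effect_b))
    # The double-mutant effect must be larger in magnitude than the additive expectation.
    exceeds_additive = abs(effect_double) > abs(additive_expectation)

    return {
        "effect_a": effect_a,
        "effect_b": effect_b,
        "effect_double": effect_double,
        "additive_expectation": additive_expectation,
        "synergistic": same_direction and exceeds_additive,
    }


# Invented means: mutant A longer than WT, mutant B unchanged, double mutant about 50% shorter.
figure7_like = classify_interaction(wt=10.0, single_a=12.0, single_b=10.0, double=5.0)

# Invented means for contrast: both single mutants slightly shorter, double mutant much shorter.
textbook_synergy = classify_interaction(wt=10.0, single_a=9.0, single_b=9.0, double=5.0)

print(figure7_like["synergistic"], textbook_synergy["synergistic"])
```

      Under this definition the first pattern is not classified as synergistic, because one single mutant moves the measurement in the opposite direction to the double mutant and the other does not move it at all.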

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy the authors find that like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely appose PCDH15b which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see our responses to reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in cones, while in rods it may serve to regulate the release of newly formed discs (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. And whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an error overlooked during figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made relative to a counterstain for the cone/rod OS, so that only CP-associated actin signal was measured. We have included the counterstain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed additional SIM imaging as well as immuno-gold TEM to support our conclusions; see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the short length of rod CPs most likely reflects their interaction with newly forming discs rather than a connection with mature discs, which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references where rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counter stain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in supplemental figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added text to our discussion highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model, as the mutation is lethal by 15dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      We have changed the pcdh15b color to blue.

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdhr1b in cdhr1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bonafide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and runs along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on a co-stain for either rod or cone OS.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      This data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) has been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The materials and methods section now specifies that all of our graphs depict mean values.

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. Author response:

      Thank you very much for your careful evaluation of our manuscript entitled “Cross-Species BAC Transgenesis Reveals Long-Range Regulation Drives Variation in Brain Oxytocin Receptor Expression and Social Behaviors.” We sincerely appreciate the insightful and constructive comments from both reviewers.

      We are particularly encouraged by the positive assessment that our study provides a useful experimental framework and resource for understanding how regulatory variation contributes to diversity in brain expression patterns and social behaviors. We have carefully considered all comments and outline below the key revisions we will implement in the revised manuscript.

      Conceptual clarification: We will clarify the conceptual framework of the study. While our initial aim was to test whether prairie vole regulatory elements could recapitulate vole-like Oxtr expression patterns in mice, the generation of multiple independent Koi lines revealed that such expression is not faithfully reproduced but instead varies across lines. This observation led us to refocus the study on how regulatory architecture gives rise to diverse expression patterns and their functional consequences. Accordingly, we will revise the manuscript to emphasize that the goal is not to reconstruct prairie vole circuits, but to test how variation in Oxtr expression distribution drives variation in social behaviors.

      Quantification of expression patterns: We will include quantitative analyses of Oxtr expression in both brain and mammary gland tissues. These additions will provide an objective basis for comparing tissue-specific expression and support the conclusion that brain expression is more variable, whereas mammary gland expression is broadly conserved. We will include qRT-PCR data to support mammary gland comparisons.

      Behavioral interpretation: We will clarify that the behavioral analyses are designed to assess how distinct Oxtr expression patterns influence social behaviors within a controlled mouse system, rather than to directly replicate prairie vole phenotypes. We will refine the manuscript to clearly distinguish between partial resemblance to prairie vole expression and the broader goal of linking regulatory variation to behavioral diversity.

      Technical clarification and limitations: We will revise the manuscript to more carefully interpret the roles of genomic integration site and transgene copy number, noting that while integration site likely plays a major role, contributions from copy number cannot be excluded. In addition, we will explicitly acknowledge that our analyses of 3D chromatin architecture are correlative in nature, and that establishing causality would require direct perturbation of chromatin structure, which is beyond the scope of the current study.

      Presentation improvements: We will improve figure clarity, include representative reference images from prairie vole brain to facilitate qualitative comparison, and refine descriptions in the Results and Methods sections to enhance clarity and readability.

      We thank the reviewers again for their insightful and constructive feedback, which we believe will significantly strengthen the manuscript. We look forward to submitting a revised version incorporating these improvements.

    1. Author response:

      General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous nonquantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      Point-by-point description of the revisions

      Reviewer #1:

      Significance

      At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis will merit an independent, in-depth investigation. The current version has been very much toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.

      Evidence, reproducibility and clarity

      The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      We thank the reviewer for this observation. The SNPs exclusive to either one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. fewer than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either ‘lost’ (filtered out) or ‘gained’ (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at some point in their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
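
      The effect of the cutoff described above can be illustrated with a minimal sketch. The SNP positions and frequencies are invented for illustration, and `passing_snps` is a hypothetical helper rather than part of the analysis pipeline used in the study; only the 10% frequency cutoff is taken from the response.

```python
# Hypothetical illustration: a frequency cutoff can make low-frequency SNPs
# appear exclusive to one stage even though they are present in both.
CUTOFF = 0.10  # 10% frequency cutoff, as stated in the response


def passing_snps(freqs, cutoff=CUTOFF):
    """Return the set of SNP positions whose frequency is above the cutoff."""
    return {pos for pos, freq in freqs.items() if freq > cutoff}


# Made-up SNP frequencies (position -> allele frequency) for the two stages.
ama = {"chr1:1000": 0.52, "chr1:2000": 0.12, "chr1:3000": 0.08}
pro = {"chr1:1000": 0.49, "chr1:2000": 0.07, "chr1:3000": 0.14}

ama_pass = passing_snps(ama)
pro_pass = passing_snps(pro)

shared = ama_pass & pro_pass    # plotted in both stages
ama_only = ama_pass - pro_pass  # appears 'lost' in promastigotes
pro_only = pro_pass - ama_pass  # appears 'gained' in promastigotes
```

      In this toy example, the two low-frequency SNPs are present in both stages but fall on opposite sides of the cutoff, so each is plotted with a frequency of zero in one stage.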

      Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than ±1.5-fold) as significant. The reversal of the signal for chr 26 sample H1 eludes us (but again, these fluctuations are minor and not observed at the mRNA level).

      The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      We thank the reviewer for this important comment. We replaced ‘regulon’ throughout the manuscript by ‘co-regulated, functional gene clusters’ (or similar).

      It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences of pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'O-modifications (Rajan et al., Nature Com, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to take into account the stage-regulated snoRNAs identified in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms ‘ribosome biogenesis’, ‘rRNA processing’ and ‘RNA methylation’ shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      This control is now included in Fig S7 and we have added the corresponding description to the text.

      I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      (1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Rectified, thanks for pointing this out.

      (2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Rectified, thanks for pointing this out.

      (3) Fig.1D: "SNP frequency" is the preferred term in English.

      Corrected.

      (4) Fig.2A: not sure what "counts}1" mean.

      This figure has been replaced.

      (5) Ln 685: "Transcripts with FC < 2 and adjusted p-value > 0.01 are represented by black dots" > This sentence is inaccurate. The intended wording might be: "Transcripts with FC < 2 OR adjusted p-value > 0.01 are represented by black dots"

      We thank the reviewer and corrected accordingly.  

      (6) Ln 698: Same as ln 685 mentioned above.

      We thank the reviewer and corrected accordingly.
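
      The difference between the two wordings of the legend rule discussed in points (5) and (6) can be made concrete with a short sketch. The function names are hypothetical, and treating the fold-change threshold symmetrically on the log2 scale (so that a 4-fold decrease counts the same as a 4-fold increase) is an assumption not stated in the legend.

```python
import math

FC_THRESHOLD = 2.0  # fold-change threshold from the legend
P_THRESHOLD = 0.01  # adjusted p-value threshold from the legend


def is_black_dot(fold_change, adj_p):
    """Corrected rule: a transcript is non-significant (black) if the
    fold change is too small OR the adjusted p-value is too large."""
    small_change = abs(math.log2(fold_change)) < math.log2(FC_THRESHOLD)
    return small_change or adj_p > P_THRESHOLD


def is_black_dot_and(fold_change, adj_p):
    """Original wording (AND), kept only to show where the two rules differ."""
    small_change = abs(math.log2(fold_change)) < math.log2(FC_THRESHOLD)
    return small_change and adj_p > P_THRESHOLD


# A transcript with a large fold change but a poor adjusted p-value:
large_fc_poor_p = (4.0, 0.5)
```

      Under the corrected OR rule, `is_black_dot(*large_fc_poor_p)` is true, whereas the AND wording would leave the same transcript outside the black category.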

      (7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10<sup>-7</sup> in the other scale?

      This was corrected in the figure and the legends were updated accordingly.

      (8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      (9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      This figure was removed and the legend modified.

      (10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Thank you, this is now rectified. To ensure readability, we reduced the number of protein kinase examples shown.

      Reviewer #2 Evidence, reproducibility and clarity:

      In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of these could be more explicitly stated).

      We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      The authors provide a clear and comprehensive description of the data at each stage of the results and this is woven together in the discussion allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      (1) As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the minimum necessary. We thank the reviewer for this important comment; the new data not only confirm the previous results (providing a high level of robustness to our data) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      (2) There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      (3) I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has previously been applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information in the Results section.

      (4) If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      See response to comment 1 above.

      (5) As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      This control has now been added to Fig S7.

      (6) It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      (7) All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      (8) Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      See response to comment 1 above.

      (9) Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      (10) It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      (11) Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals to obtain statistically significant results is tolerated (and necessary to obtain permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC–MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR<1%), (ii) robust phosphosite localisation probabilities (localisation probability >0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.
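
      The inclusion rule for imputation described in this response can be sketched as follows. This is a simplified illustration under stated assumptions: `impute_condition` is a hypothetical helper, and the mean of the observed values is used as a stand-in for the MLE-based algorithm actually applied in the study, which is not reproduced here. What the sketch preserves is the rule itself: missing values are filled only when at least one value was observed in that condition, and the fill value lies in the neighborhood of the observed intensities.

```python
from statistics import mean


def impute_condition(values):
    """Impute missing intensities (None) for one phosphosite in one condition.

    Stand-in for the MLE imputation: missing values are replaced by the mean
    of the observed values, and only if at least one value was observed.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # No observation in this condition: nothing is imputed.
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up log2 intensities across four replicates of one condition.
two_observed = impute_condition([None, 20.0, None, 22.0])
one_observed = impute_condition([None, None, None, 18.5])
none_observed = impute_condition([None, None, None, None])
```

      The `one_observed` case corresponds to the scenario raised by the reviewer, in which three of four values in a condition are imputed from a single measurement.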

      (12) For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      We thank the reviewer for these positive comments. We have now included the information on replicates in the overview figure (Figure S1).

      As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0–100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach; see doi: 10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to the one described in the “Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome” section of a previously published paper (DOI: 10.1016/j.mcpro.2022.100428).

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term “phosphostoichiometry” is imprecise and not correct in this context.

      In response, we (i) replaced the term “phosphostoichiometry” throughout the manuscript with a more accurate description, such as “normalized phosphorylation level”, or “relative phosphorylation change normalized to protein abundance”, and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording will improve conceptual accuracy without altering the validity or interpretation of the results.
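
      The ratio-of-ratios normalization discussed in this response can be written out as a minimal sketch. The function and argument names are hypothetical, the abundances are invented, and working on the log2 scale is an assumption (suggested by the fact that the quantity is tested against zero); the statistical test itself is not reproduced.

```python
import math


def normalized_phospho_change(phos_a, phos_b, prot_a, prot_b):
    """log2 change of a phosphopeptide between conditions A and B,
    minus the log2 change of its parent protein (ratio of ratios)."""
    log2_fc_phos = math.log2(phos_b / phos_a)
    log2_fc_prot = math.log2(prot_b / prot_a)
    return log2_fc_phos - log2_fc_prot


# Phosphopeptide up 4-fold while the protein is up 2-fold.
net_increase = normalized_phospho_change(100.0, 400.0, 50.0, 100.0)
# Phosphopeptide and protein both double: no net phosphorylation change.
no_change = normalized_phospho_change(100.0, 200.0, 50.0, 100.0)
# Phosphopeptide drops 4-fold at constant protein level.
net_decrease = normalized_phospho_change(400.0, 100.0, 100.0, 100.0)
```

      The resulting values can be negative or greater than 1, which illustrates why this quantity is a normalized change and not an occupancy bounded between 0 and 1.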

      From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting pvalues thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      The references have been formatted.
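
      For readers unfamiliar with the multiple-testing correction named in the quoted Methods passage, the following is a minimal sketch of Benjamini-Hochberg adjusted p-values. It is not the cp4p implementation: `bh_adjust` is a hypothetical function, and the proportion of true null hypotheses `pi0` is passed in as a parameter rather than estimated with the Pounds method. With `pi0=1.0` the sketch reduces to the standard (non-adaptive) procedure.

```python
def bh_adjust(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally scaled by pi0.

    pi0 is an externally supplied estimate of the proportion of true nulls
    (assumption: its estimation is outside the scope of this sketch).
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pi0 * pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four tested phosphopeptides.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

      Lowering `pi0` below 1 makes the adjusted p-values proportionally smaller, which is the sense in which the adaptive variant is less conservative.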

      Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein–protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, coexpression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The information on stringency has been added to the Methods section and the Figure legends (adding the information on confidence score cutoff).

      We decided not to submit the Cytoscape files as they were generated with previous versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables, it will be very easy to recreate these networks with the new versions for any follow-up study.

      The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      We apologize and have translated the text into English.

      I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      We have added the following statement to the legend: ‘Confidence values were derived as described in Supplementary Methods’.

      Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      We have added the following justification to the results section (see also response above to comment 3 for reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408 as the latter’s potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient from driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      We very much appreciate these interesting suggestions, which will be considered for ongoing follow-up studies.

      In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      The links have now been rectified and validated. For those data that are still under quarantine, here is the login information: To access the data:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d 0

      RNAseq data: https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f4c06-b4bd-bc10f2dc0b14

      Proteomic data:  http://www.ebi.ac.uk/pride

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Significance

      Strengths:

      (1) The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      We thank the reviewer for this positive assessment of our work.

      (2) The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      We thank the reviewer for the acknowledgment of our relevant hamster system, for which we face many challenges (financial, ethical, administrative as protocols need to be approved by the French government).

      Limitations:

      Potential lack of appropriate replication (see above).

      See response to comment 1.

      Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      See response to comment 2 above.

      The study applies well-established techniques without any particular technical step change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      We thank the reviewer for these positive comments.

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:

      (1) Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships by more neutral expressions such as ‘coexpression’.  

      (2) Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      We agree with the reviewer and have replaced ‘regulon’ with ‘co-regulated gene clusters’ (or similar).

      (3) LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value < 0.01), 1534 of which were exclusively detected in either ama or pro." If a protein is exclusively detected in one stage, then by definition it should not be detected in that number of replicates at both stages. This apparent contradiction should be clarified.

      We fully agree with the reviewer: an LFQ intensity of 0 may have several causes. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: ‘Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value < 0.01), and 1534 that were exclusively detected in either ama or pro (Figure 3A left panel, Table 6).’ We also modified the legend of Figure 3B. Concerning missing values, which could be either missing not at random (MNAR) or missing completely at random (MCAR), rather than introducing potentially misleading imputed values, we chose to treat these missing values as genuine stage-specific differences (presence/absence): quantitative statistics are restricted to proteins with measurable LFQ in both stages, while proteins with consistent presence in one stage and non-detection in the other are reported as stage-restricted detections. We believe this strategy is transparent and minimizes modeling assumptions, while still highlighting robust stage-specific signals. Our approach is supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.
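
      The presence/absence strategy described above can be sketched as a simple classification of each protein from its replicate LFQ values. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the threshold of at least 3 of 4 replicates for "consistent presence" is an assumption borrowed from the wording of the original manuscript text quoted by the reviewer.

```python
def detected(values):
    """Count replicates with a measurable (non-missing, non-zero) LFQ intensity."""
    return sum(1 for v in values if v is not None and v > 0)


def classify(ama, pro, min_reps=3):
    """Assign a protein to a category from its replicate LFQ values.

    min_reps is an assumed threshold for consistent detection.
    """
    ama_ok = detected(ama) >= min_reps
    pro_ok = detected(pro) >= min_reps
    if ama_ok and pro_ok:
        return "quantifiable"      # eligible for quantitative statistics
    if ama_ok and detected(pro) == 0:
        return "ama-restricted"    # reported as stage-restricted detection
    if pro_ok and detected(ama) == 0:
        return "pro-restricted"
    return "unclassified"          # sporadic detection, no call made
```

      Under this scheme no value is imputed; a protein seen sporadically in one stage simply receives no call rather than an estimated intensity.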

      (4) L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      We thank the reviewer for this comment. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’ We also deleted the ‘infinity’ symbol from the Figure.
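
      The selection rule quoted above can be written as a short predicate. This is only a sketch under stated assumptions: the function and argument names are hypothetical, and a protein with no quantitative comparison is represented by `protein_p=None`.

```python
def keep_for_plot(rna_p, protein_p=None, detected_in=frozenset(), alpha=0.01):
    """Return True if an expression change would be shown under the stated rule.

    rna_p       : p-value for the RNA-level change
    protein_p   : p-value for the protein-level change, or None if the
                  protein could not be compared quantitatively
    detected_in : set of stages ("ama", "pro") in which the protein was detected
    """
    if rna_p >= alpha:
        return False  # a significant RNA change is required in both cases
    if protein_p is not None and protein_p < alpha:
        return True   # significant at both RNA and protein levels
    # Otherwise keep only proteins detected in exactly one of the two stages.
    return len(detected_in) == 1
```

      Proteins kept through the second branch have no finite fold change, which is why they are drawn at a fixed position at the top or bottom of the graph.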

      Minor Comments:

      Methods

      L132: Typo: "A according" should be "according."

      The ‘A’ refers to RNase A. We added a comma for clarification (…RNase A, according to…)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      We thank the reviewer for this comment. Aside from the already quite detailed description in Methods and the reference there to the paper describing the pipeline, we now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: “The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the mean coverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics.” The description then goes into further detail.
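
      The normalization described in the quoted passage can be sketched as follows. This is not the giptools implementation: the function names and the toy coverage values are hypothetical, and multiplying the normalized coverage by 2 to obtain a somy score is an assumption that the bulk of the genome is disomic.

```python
from statistics import median


def somy_scores(bin_coverage, baseline_ploidy=2):
    """Convert per-bin mean coverage into per-bin somy scores.

    bin_coverage    : dict mapping chromosome name -> list of bin coverages
    baseline_ploidy : assumed somy of a bin sitting at the genome-wide median
    """
    all_bins = [c for bins in bin_coverage.values() for c in bins]
    genome_median = median(all_bins)  # median coverage of all bins
    return {
        chrom: [baseline_ploidy * c / genome_median for c in bins]
        for chrom, bins in bin_coverage.items()
    }


def chromosome_somy(bin_coverage):
    """Summarize each chromosome by the median of its bin somy scores."""
    return {chrom: median(s) for chrom, s in somy_scores(bin_coverage).items()}


# Toy data: two chromosomes at baseline coverage, one at 1.5x coverage.
coverage = {
    "chr1": [30, 30, 30, 30],
    "chr2": [30, 30, 30, 30],
    "chr3": [45, 45, 45, 45],
}
somy = chromosome_somy(coverage)
```

      In this toy case the genome-wide median is set by the two baseline chromosomes, so the third chromosome is scored as trisomic.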

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Corrected to ‘validated’

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant {Ritchie, 2015; Smyth, 2005}.
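
      The decision rule stated above (a significant test result combined with a minimal fold change of 2) can be illustrated as below. The moderated t-test itself comes from the limma package and is not reimplemented here; the sketch assumes an adjusted p-value is already available, and the function name and the 0.01 significance level (taken from the thresholds quoted elsewhere in this response) are assumptions.

```python
from math import log2


def is_differentially_abundant(mean_a, mean_b, adj_p, alpha=0.01, min_fold=2.0):
    """Apply the two-part rule: adjusted p below alpha AND fold change >= min_fold.

    mean_a, mean_b : positive mean abundances in the two conditions
    adj_p          : adjusted p-value from the statistical test (computed elsewhere)
    """
    log2_fc = log2(mean_a / mean_b)
    # A minimal fold change of 2 in either direction means |log2 FC| >= 1.
    return adj_p < alpha and abs(log2_fc) >= log2(min_fold)
```

      For example, a 4-fold change with an adjusted p-value of 0.001 passes both criteria, whereas a 1.5-fold change fails regardless of its p-value.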

      Results

      L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      We thank the reviewers for these suggestions and have reformulated into: ‘In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.’

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      These lines have been added to Figure 1C (not 3A).
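
      For readers who want to reproduce such a trend line, a centred moving average is one simple option. The sketch below is an assumption about how a line of this kind could be computed, not a description of how the lines in Figure 1C were generated; the window size is hypothetical.

```python
def moving_average(values, window=3):
    """Centred moving average; the window shrinks at both ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

      Applied to read-depth ratios ordered by genomic position, a wider window gives a smoother line at the cost of flattening short-range fluctuations.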

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Thank you for the comment – we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      This has been corrected to ‘The discrepancies we observed in a sub-set of genes between….’.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      We added this information to the text (‘some of the most significantly enriched terms included …’).

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      This statement was too speculative and has been removed. Instead, we added ‘Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner’.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      We deleted the term as suggested and reformulated to ‘….our results confirm the important role of protein degradation….’.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion

      L555: As noted in L494, reconsider using the word "unexpected."

      Removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      We thank the reviewer for this comment and have followed the advice, reformulating the sentence as suggested.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Corrected to ‘some of the most significantly enriched GO terms’.

      Referee cross-commenting

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      (1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      (2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      (3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      (4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      (5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance):

      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.  

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al., aim to address the central question of how these parasites establish and maintain distinct life cycle stages in mostly the absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed a genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analysis to reveal the changes the parasites undertook between the two life stages.

      The main conclusion stated by the authors are:

      - The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.

      - RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and posttranscriptional regulation itself.

      - Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      We thank the reviewer for the time and effort dedicated to our manuscript.

      Further details are organised in order of appearance in the text:

      Material and Methods: while the authors are indicating some key parameters, providing the codes and scripts they used throughout the manuscript would improve reproducibility.

      We thank the reviewer for this comment and added the URL for the codes to the data availability section.

      Why only 2 biological replicates for RNA while the others layers have 3 or 4?

      We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable on longer culture derived promastigotes and sandfly derived promastigotes ?

      No, as published in Barja et al Nature Ecol Evol 2017 (PMID: 29109466) and Bussotti et al PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      We corrected this sentence to ‘Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2’.

      L349-355: have the mRNA presenting change in abundance between stages been normalised by their relative DNA abundance ? Said otherwise, can the wave patterns observed at the genome level explain the respective mRNA level ? Can the authors plot in a similar way the enrichment scores in regards to the position on the genome and can the authors indicate if there is a positional enrichment in addition to the functional one they observe ? This may affect the conclusion in L356-358.

      As noted above, we did not see any significant read depth changes at DNA level when comparing amastigotes and promastigotes. Thus there is no need to normalize the RNAseq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed on DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some overrepresented tandem gene arrays where all gene copies share the same location and GO term).

      L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not suggest causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      We thank the reviewer for this comment and have corrected the statement accordingly.

      Figure 3B, could the authors clarify what are the "unique genes" that are on the infinite quadrants? It seems these proteins are identified in one stage and not the other. This implies that the corresponding missing values are missing non-at random (MNAR). Rather than removing those proteins containing NMAR from the differential expression analysis, the authors should probably impute those missing values. Methods of imputation of NMAR and MAR can be found in the literature. Indeed, the level of expression in one stage of those proteins is now missing, while it could strongly affect the conclusions the authors are drawing in figure 4E regarding the proteins targeted for degradation and rescued in presence of the proteasome inhibitor.

      We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification.

      More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other.

      In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.

      To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’
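
The selection rule quoted in the modified text can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `classify` and `plot_log2_fc`, the argument names, and the choice to represent "upper" and "lower" placement as plus and minus infinity are all hypothetical. Only the p < 0.01 cutoff and the two retention criteria come from the response above.

```python
import math

P_CUTOFF = 0.01  # significance threshold quoted in the revised text


def classify(rna_p, prot_p, detected_ama, detected_pro):
    """Return how a gene is retained, or None if it is filtered out.

    rna_p / prot_p: p-values of the RNA and protein comparisons
    (prot_p may be None when no protein comparison is possible).
    detected_ama / detected_pro: whether the protein was detected
    in amastigotes / promastigotes. All names are hypothetical.
    """
    if rna_p >= P_CUTOFF:
        return None  # an RNA change is required in both criteria
    if detected_ama and detected_pro:
        # Criterion 1: significant at both RNA and protein levels.
        if prot_p is not None and prot_p < P_CUTOFF:
            return "both_levels"
        return None
    if detected_ama != detected_pro:
        # Criterion 2: protein detected in only one stage, no imputation.
        return "ama_only" if detected_ama else "pro_only"
    return None  # protein not detected in either stage


def plot_log2_fc(category, log2_fc):
    """Assumed plotting convention: stage-unique proteins get an infinite value."""
    if category == "ama_only":
        return math.inf   # placed at the upper part of the graph
    if category == "pro_only":
        return -math.inf  # placed at the lower part of the graph
    return log2_fc
```

Keeping the stage-unique proteins in a separate category, instead of assigning them an imputed intensity, makes explicit that their position on the graph is a convention and not a measured fold change.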

      L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and with little support by the data presented at this stage of the manuscript. It should be removed as Figure 3E or could be placed in the discussion and supplementary information.

      We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.  

      The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      We rephrased the conclusion to ‘In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.’ Please see the response to comment 9 regarding the unique proteins.

      L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      We agree with the reviewer and have toned this statement down by adding the statement ‘….or simply be a consequence of culture adaptation’.

      The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct link and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatase and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      A couple of typos:

      In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      L225 "...peptide match was disable." should be "...peptide match was disabled."

      Both corrected

      Reviewer #4 (Significance):

      While there is not too much novelty around the emphasis of gene expression at post-translational level in kinetoplastid organisms, the scale of the work presented here, looking at 5 layers of potential regulations, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro systems promastigote differentiation potentially limiting the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall out of the scope of this manuscript.

      We agree with the reviewer and have modified our manuscript throughout to remove any causal relationships. Indeed, this work is setting the stage for future investigations on dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely nonfunctional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

      This is a key question, which was one of the original motivations of our work. Both hypotheses are possible: ‘abrupt switches’ (punctuated equilibria, corresponding to distinct specificities) and more gradual changes (smooth transitions, through intermediates that exhibit mixed or intermediate specificity).

      Many natural specificity-switching events have probably resulted from the need to adapt to environmental change and selection for a different specificity, which can be compatible with an abrupt change in specificity. Others may reflect the gradual evolution of promiscuous ancestral sequences to more specialized ones, losing cross-reactivity. A molecular mechanism that could allow abrupt switching is gene duplication, a frequent mechanism for WW domain diversification, beyond standard mutation-driven evolutionary processes.

      As for the specificity-switching paths for WW domains found in this work, the presence of weakly responsive cross-reactive intermediates along the designed paths for I<->IV, and their absence in the I<->II path, suggests that designing promiscuous domains is hard (see also related response to point 3 of Reviewer 2) and that such domains are generally not selected by natural evolution (as seen from the clear clustering of extant proteins in different specificity classes).

      For a small domain such as WW, mutations that favor some specificity classes are known to have detrimental effects on fundamental properties, such as folding kinetics and stability, see Ref [72]. It is possible that larger, less constrained protein domains could allow for more cross-reactive variants and smoother specificity switching. However, experiments on fluorescent proteins looking for interpolation between two wavelengths have shown that the switch was abrupt [Poelwijk et al. Nature Communications (2019)].

      Our aim was to achieve a functional switch (imposed by the two extant end-points) through a path of designed, functional intermediates and to correctly predict, with our RBM model, the location of the specificity transition and of the cross-reactivity region (which we expected only along the I-IV path). This aim was achieved, as demonstrated by experiments.

      Reviewer #2:

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      We agree with Reviewer 2 that WW sequences are short and simple to handle from a computational point of view; they were chosen for this reason to test the design of full mutational paths (after having benchmarked the approach on lattice-protein models, see Refs. [30] and [44]). Our work gives additional support to the effectiveness of generative models learned from sequence data. This said, from a biological point of view, WW is a highly constrained domain, see comment by Reviewer 1 above and our answer.

      In longer and more complex proteins, we expect it will be more difficult to disentangle specificity-switching latent units, see Fernandez-de-Cossio-Diaz et al., Physical Review X 2023 for a discussion and a possible computational approach to this issue. Notice that, while relating the latent units to specificity classes was convenient, it was not used to generate the paths themselves. Therefore, we believe that our method is quite robust and easily generalizable to applications to more complex and longer proteins. As an illustration, we have recently used it to sample viral trajectories (more precisely, variants of the Receptor Binding Domain of the SARS-CoV-2 spike protein) capable of escaping antibody recognition, see Huot et al., PNAS 2026. In this recent work, we projected the paths onto the principal antigenic space, defined by the top two Principal Components of the viral variant binding affinities to 32 antibodies. In this representation, sampled paths displayed trends similar to natural paths, drawn from the sequences sampled during the pandemic. This finding supports the applicability and interpretation of our method for more complex proteins.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      We think that this finding, true for paths connecting classes I and IV, is not general. In a previous paper we benchmarked our path-designing approach on simple models of in silico lattice proteins and showed that indirect paths led to gains in the overall fitness (computed according to the ground-truth model) [Mauri, Cocco, Monasson, Physical Review E 2023, fig. 9-12].

      In general, we would expect that indirect paths could explore alternative mutations, important to compensate for transitory destabilizing mutations that could occur along the path. We speculate that these stabilizing mutations happen for non-direct paths at their extremity near the class-I wild type. A slight decrease in binding response to peptide C1 for the direct path is nevertheless observed (see Suppl Table 4), but our experimental detection, focused on binding response, is not tailored to directly detect a difference in stability. When approaching the class-IV anchoring point, we observe that paths interpolating between classes I and IV are very constrained and show limited diversity, going through a funnel in sequence space corresponding to the direct path. We agree with Reviewer 2 that a more exhaustive comparison with direct paths would be interesting, and will add a sentence in conclusion.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      Class I to Class IV paths and Class I to Class II paths fundamentally differ because the binding pocket in Class I WW domains is different from the one of Class IV WWs, while Classes I and II/III share the same binding region. This important difference may explain why class I specificity can switch to class IV specificity (steps 20-21) without completely losing affinity to the peptide of class I. To investigate whether the two binding regions are really independent, we have tested some additional specific mutations along the I-IV mutational paths. In our attempts to engineer cross-reactivity, we have observed that it is important to substantially lower affinity to class I peptide to acquire class IV specificity, in agreement with previous studies [72]. Moreover, the I to IV path seems to go through a funnel-like part in the region with no natural sequences, with the same transition intermediates obtained in several designed paths. This indicates that the Class I to Class IV functional switch is more constrained than the Class I to II switch. Let us also emphasize that our assessment of class specificity is based on one peptide for each class. It would be interesting to test multiple WW-binding peptides with similar biochemical properties to acquire a more complete view of the specificities.

      (4) On page 12, it is stated that the temperature was chosen to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      Section 3.5 explains that RBM sampling can be biased toward high-scoring sequences by lowering the sampling temperature to 1/3; such sequences are more likely to be functional, as shown in [Russ et al., Science 2020]. We acknowledge (as also noted by Reviewer 1) that this section comes at the end of the manuscript, while differences in scores along the path are shown before, so the discussion of this important point is somewhat delayed. We will add a sentence earlier in Results to explain this point.
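
As a rough illustration of why a lower sampling temperature favors high-scoring sequences, the sketch below reweights a toy distribution, assuming that temperature enters in the standard Boltzmann form, P_T(v) proportional to P(v)^(1/T). The three-outcome distribution and the function name `reweight` are hypothetical; the actual RBM sampling procedure used in the manuscript is not reproduced here.

```python
import math


def reweight(probs, temperature):
    """Return the distribution proportional to p ** (1 / temperature).

    Computed in log space for numerical stability. At temperature 1 the
    input is returned unchanged; below 1 the most probable outcomes gain weight.
    """
    logs = [math.log(p) / temperature for p in probs]
    shift = max(logs)  # subtract the maximum before exponentiating
    weights = [math.exp(value - shift) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Toy model probabilities for three hypothetical sequences.
model_probs = [0.5, 0.3, 0.2]
cold_probs = reweight(model_probs, 1 / 3)  # each probability is cubed, then renormalized
```

With T = 1/3 the highest-probability toy sequence takes a larger share of the samples than at T = 1, which is the sense in which low-temperature sampling trades diversity for score.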

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

      We agree with Reviewer 2 that the consensus sequence is an atypical sequence for an independent model with a large RBM score. We will update Figure 5 of the manuscript to show that this is also happening in our case. 

      We use Maximum Likelihood in ASR but our ASR path corresponds to all internal nodes of the reconstructed tree joining the two extant sequences, not only to the most ancestral node. Overall, the ancestral sequences along the ASR paths are different from the consensus sequence (mean identity of 76% and 60% respectively). The most ancestral nodes in the paths are also different from the consensus, having 81% (paths between type I and IV domains) or 54% (paths between type I and II/III domains) similarity, and an RBM score of -21 or -58, respectively. We agree that some ASR internal-node sequences have a higher score than the natural wild types (extant sequences). This is shown in Fig. 6: several points have a larger RBM score than the two anchoring points at the extremities of the path, possibly due to the fact that natural sequences are not always the most stable ones. As discussed in conclusion, ASR nodes moreover generally have better scores than the sequences obtained by sampling an independent model. Phylogenetic reconstruction implicitly takes into account some degree of co-variation between sites in natural sequences, as shown by the success of the use of the phylogenetic distance of a mutated sequence to the wild type for predicting the fitness effect of these mutations [Laine, Mol. Biol. Evol. 2019].

      To better show this effect we will update Figure 6, reporting also the scores of the "scrambled" sequences, which do not respect potential epistasis extracted by the RBM. It appears that ASR sequences generally have better scores than the scrambled sequences, and lower scores than RBM sequences (sampled at T=1/3). RBM models take into account multi-residue correlations, which could contribute to reaching better scores than ASR and BM models. Ongoing studies on larger proteins show that the score of sequences sampled from ASR reconstruction, including the Maximum Likelihood one, can still be improved according to the RBM score by a few mutations consistent with the ASR posterior probabilities (unpublished).

      Mistakes in the reference list will be amended in the updated version.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small molecule MIRO1.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

      We thank the reviewers for their thoughtful and constructive feedback. We appreciate their recognition of our work’s value and the improvements made in this revised version.

      We are particularly grateful to Reviewer 3 for their detailed and insightful comments, which identified errors we (and other reviewers) had unfortunately overlooked. To address these concerns and ensure the manuscript meets the high standards of clarity and rigor we aim for, we have made additional corrections and refinements.

      As part of this process, we conducted a thorough review of the original source files. This was especially important given that the project spanned from 2018 to 2025, and many co-authors have since left their previous positions.

      We appreciate the opportunity to resubmit this manuscript and are confident that these updates fully address the concerns raised by the reviewer and the editorial team.

      Reviewer #3 (Recommendations for the authors):

      (1) I still do not see the data in WB 2G reflecting the quantification in 2H and 2I. Moreover, the authors state they performed 1 additional experiment, but it appears not to have been included in the analysis of 2H and 2I since the graphs remained the same from the last version of the manuscript.

      We apologize for this oversight. The additional experiment has now been incorporated into the analysis for Figures 2H and 2I, and the graphs have been updated accordingly. While we had uploaded the new blot, we inadvertently forgot to update the analysis graphs. Thank you for bringing this to our attention.

      (2) The authors talk several times about "supercomplexes 1 and 2" without testing their precise composition (there is a ton of literature about SC species in several mouse cell types, and separate BN-PAGE immunoblotting of individual MRC complexes would precisely define them in this context)

      We agree with the reviewer that this is an important point. However, structural differences between supercomplexes were outside the scope of this paper, and we did not perform such analyses. That said, examining the precise composition of supercomplexes could be a valuable direction for future work.

      (3) Steady-state levels of MRC subunits do not match the observations from BN-PAGE results. That might be potentially interpreted and explained by the possible accumulation of intermediates but this is not explored.

      We appreciate the reviewer’s observation. There is indeed a strong possibility that differences in the expression of structural components of mitochondrial complexes exist between WT and Miro1 -/- cells. However, in this study, we chose to focus on assessing potential differences in the enzymatic activities of the complexes rather than examining their structural composition. Exploring the accumulation of intermediates and structural differences could be an interesting avenue for future investigations.

      (4) Citrate synthase normalization of kinetic enzyme activities is claimed, yet it is not shown in any graph and no description of the method is provided.

      We sincerely thank the reviewer for pointing out this discrepancy. Upon careful review, we realized that our statement regarding citrate synthase normalization of kinetic enzyme activities in the last revised version was made in error. This was a miscommunication between co-authors, and we did not perform citrate synthase normalization. Instead, the normalization was performed against protein concentration, determined by the BCA assay as described in the manuscript. We regret this oversight and appreciate the opportunity to clarify this.

      (5) Complex I activity is still wrongfully described as NADPH oxidation in the methods

      We corrected this error.

      (6) The authors state 'Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV'. I do not understand this, I find this justification insufficient and not substantiated by any experimental evidence. What buffer has been used for isolation? There are hundreds of protocols for isolation of intact mitochondria and MRC complexes. Also, DDM and digitonin are the gold-standard detergents for MRC complexes isolation and separation via BN-PAGE.

      We thank the reviewer for raising this important point. We have revised the response to clarify the exact experimental conditions and to provide supporting data.

      For BN-PAGE, mitochondrial fractions purified from cultured VSMCs or aortic tissue were prepared using a standard protocol (now explicitly detailed in the Methods). Briefly, mitochondria were resuspended in 6-aminocaproic acid (ACA) buffer containing 750 mM ACA, 50 mM Bis-Tris (pH 7.0), and protease inhibitors. Forty micrograms of mitochondrial protein were solubilized with 1.5% digitonin, using a final detergent-to-protein ratio of 8:1, and incubated on ice for 20 minutes prior to clarification by centrifugation at 16,000 g for 30 minutes at 4°C. Thus, consistent with established standards, digitonin—one of the gold-standard detergents for MRC complex solubilization and BN-PAGE—was used throughout.
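
      As a worked check of the solubilization arithmetic above, the following minimal sketch computes the digitonin mass implied by an 8:1 detergent-to-protein ratio for 40 µg of protein, and the volume in which that mass corresponds to 1.5% digitonin. It assumes the ratio is weight-to-weight and that 1.5% means weight per volume (1.5 g per 100 mL); neither is stated explicitly in the response. The function names are illustrative only.

```python
def digitonin_mass_ug(protein_ug: float, ratio: float) -> float:
    """Detergent mass (µg) for a detergent-to-protein ratio (assumed w/w)."""
    return protein_ug * ratio


def volume_at_percent_ul(mass_ug: float, percent_w_v: float) -> float:
    """Volume (µL) in which mass_ug equals percent_w_v (assumed g per 100 mL)."""
    ug_per_ul = percent_w_v * 10.0  # 1% w/v = 10 mg/mL = 10 µg/µL
    return mass_ug / ug_per_ul


mass = digitonin_mass_ug(protein_ug=40.0, ratio=8.0)
volume = volume_at_percent_ul(mass, percent_w_v=1.5)
print(f"digitonin: {mass:.0f} µg in {volume:.1f} µL at 1.5% (w/v)")
```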

      Despite using these widely accepted conditions, we found that detection of fully assembled Complex IV by BN-PAGE was inconsistent, a limitation that has been reported by others and is known to be sensitive to mitochondrial source, tissue type, and solubilization efficiency. To address this directly and avoid over-interpretation, we assessed Complex IV integrity by examining core subunits. As shown in Figure 6—figure supplement 1 (panels B and C), expression levels of MTCO1 and MTCO2, both essential core components of Complex IV, do not differ significantly between WT and Miro1-/- cells, supporting the conclusion that Complex IV abundance is not altered.

      We have revised the manuscript to clarify these methodological details and to explicitly state that conclusions regarding Complex IV are based on subunit analysis rather than BN-PAGE visualization alone.

      (7) Complex V IGA also does not seem to reflect its quantification.

      Thank you for highlighting this concern. To address it, we will include the numerical data alongside the figures to ensure clarity and alignment with our findings. We hope this will provide a more comprehensive understanding and resolve any ambiguity.

      (8) Figure 6 supplement 1, the authors state 'we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants'. I do not understand, what background is being used? what mutants are being expressed? all the figures refer to Miro1 -/- which is, according to standard genetic nomenclature, a loss-of-function allele (KO).

      Thank you for your comment. To clarify, we first infected MIRO1fl/fl VSMCs with an adenovirus expressing the DNA recombinase Cre or a control adenovirus. Cells infected with the adenovirus expressing Cre are labeled as MIRO1-/- cells. In these MIRO1-/- cells, we then introduced MIRO1 wild type (WT) and MIRO1 mutants via adenoviral expression.

      The mutants include one lacking the transmembrane domain (MIRO1-ΔTM), and another in which the two EF hands of MIRO1 were point-mutated (MIRO1-KK). MIRO1-WT is denoted as Ad WT, the mutant MIRO1-KK as Ad KK, and MIRO1-ΔTM as Ad ΔTM in the figures. We hope this explanation clarifies the experimental background and nomenclature used.

      (9) Figure 6 supplement 1B, no normalization is provided (e.g. VDAC, TOM20 etc.). Interestingly, VDAC is then used to normalize the data in C-D-E-F-G. Also, why is MIRO1 detected in lane 4? Is the mutant stable or not? There is zero signal in A.

      Thank you very much for pointing out that the immunoblot for VDAC1 was missing in Figure 6—Supplement 1B. This figure has been reviewed several times, and unfortunately, this error was not detected. We sincerely apologize for this oversight. We have now revised the figure to include the immunoblot for VDAC1 to address this issue.

      Regarding the detection of MIRO1 in lane 4, we confirm that the "mutant" is not stable. To generate MIRO1 knockout cells, aortic smooth muscle cells from MIRO1fl/fl mice were isolated and cultured, followed by infection with an adenovirus expressing Cre. As these are primary cells and the deletion was induced by Cre expression, the recombination efficiency can vary, which is reflected in the variability observed in lanes 2 and 4 of the immunoblot.

      (10) Why are COX4 levels so low in the 2nd replicate in 7A? The authors state 'We also performed anti-VDAC immunoblots on the same membranes as alternative loading control (see image below)'. I could not find the image.

      Thank you for your comment. The second pair of samples in Figure 7A is from a different preparation of mitochondria. In our experimental design, a control sample and a MIRO1 knockdown sample were processed side by side and run next to each other on the immunoblot.

      Regarding the anti-VDAC immunoblot, the image was included in our response to reviewers during the previous revision, as we did not believe it altered the message conveyed by the COX4 blot. However, to ensure clarity and address your concern, we have now included the anti-VDAC immunoblot directly in the figure. We hope this addition resolves any ambiguity and provides further confidence in the data presented.

      (11) The proposed interaction between MIRO1 and NDUFA9 is very difficult to reconcile, as the two proteins reside in distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane (OMM), with its functional domains facing the cytosol, whereas NDUFA9 is a matrix-facing accessory subunit of mitochondrial Complex I, positioned at the interface between the N- and Q-modules.

      We appreciate the reviewer’s comment and agree that MIRO1 and NDUFA9 occupy distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane with cytosol-facing domains, whereas NDUFA9 is a matrix-facing accessory subunit of Complex I at the N/Q-module interface.

      Our data do not suggest a stable, constitutive interaction within intact mitochondria. Rather, the observed association likely reflects an indirect, transient, or context-dependent interaction, potentially occurring during mitochondrial stress, remodeling, or turnover. Such associations may be mediated by multi-protein complexes spanning mitochondrial membranes, dynamic contact sites, or post-lysis interactions detected under experimental conditions. Increasing evidence supports functional coupling between outer mitochondrial membrane proteins and inner membrane or matrix pathways without direct physical binding.

      Additional comments:

      (12) All the raw data should be provided to the readers (uncropped and annotated WB, IHC images, numerical data with statistics applied).

      We agree with the reviewer and appreciate the emphasis on transparency. In accordance with eLife submission requirements, we have provided all raw data. The Source Data files associated with each figure now include uncropped and annotated immunoblots, as well as the numerical source data for all quantified analyses.

      During the compilation of these materials, we were unable to locate the original source files for Figure 2A. The control experiment depicted in the previous version, which demonstrates in vitro recombination, was performed in 2018. However, this experiment was repeated several times throughout the project. Therefore, to ensure the manuscript remains complete, we have replaced this panel with a representative immunoblot from a similar experiment. Additionally, during our review, we discovered a labeling error in Figure 3D and G. We have corrected these figures to ensure accuracy.

      All source files have been provided and carefully labeled to facilitate independent evaluation.

    1. Author response:

      Point-by-point description of the revisions

      Reviewer #1:

      Thank you very much for considering that our manuscript evaluates an important question and that the reagents used are well prepared and characterized. We also much appreciate that you consider the information generated as potentially useful for those studying HIV infection processes and strategies to prevent infection.

      (1) While a single particle tracking routine was applied to the data, it's not clear how the signal from a single GFP was defined and if movement during the 100 ms acquisition time impacts this. My concern would be that the routine is tracking fluctuations, and these are related to single particle dynamics. It appears from the movies that the density of the GFP-tagged receptors in the cells is too high to allow clear tracking of single molecules. SPT with GFP is very difficult due to bleaching and relatively low quantum yield. Current efforts in this direction that are more successful include using SNAP tags with very photostable organic fluorophores. The data likely does mean something is happening with the receptor, but they need to be more conservative about the interpretation.

      Some of the paradoxical effects might be better understood through deeper analysis of the SPT data, particularly investigation of active transport and more detailed analysis of "immobile" objects. Comments on early figures illustrate how this could be approached. This would require selecting acquisitions where the GFP density is low enough for SPT and performing a more detailed analysis, but this may be difficult to do with GFP.

      When the authors discuss clusters of <2 or >3, how do they calibrate the value of GFP and the impact of diffusion on the measurement? One way to approach this might be single-molecule measurements of dilute samples on glass vs in a supported lipid bilayer to map the streams of true immobility to diffusion at >1 µm<sup>2</sup>/s.

      We fully understand the reviewer’s apprehensions regarding the application of these high-end biophysical techniques, in particular the associated complexity of the data analysis. We provide below extensive explanations on our methodology, which we hope will satisfactorily address all of the reviewer’s concerns.

      We would first like to emphasize that the experimental conditions and the quantitative analysis used in our current experiments are similar to the established protocols and methodologies applied by our group previously (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022; Gardeta et al. Frontiers in Immunol., 2022; García-Cuesta et al. eLife, 2024; Gardeta et al. Cell. Commun. Signal., 2025) and by others (Calebiro et al. PNAS, 2013; Jaqaman et al. Cell, 2011; Mattila et al. Immunity, 2013; Torreno-Pina et al. PNAS, 2014; Torreno-Pina et al. PNAS, 2016).

      As SPT (single-particle tracking) experiments require low-expressing conditions in order to follow individual trajectories (Manzo & García-Parajo Rep. Prog. Phys., 2015), we transiently transfected Jurkat CD4<sup>+</sup> cells with CXCR4-AcGFP or CXCR4<sup>R334X</sup>-AcGFP. At 24 h post-transfection, cells expressing low CXCR4-AcGFP levels were selected by a MoFlo Astrios Cell Sorter (Beckman Coulter) to ensure optimal conditions for SPT. Using Dako Qifikit (DakoCytomation), we quantified the number of CXCR4 receptors and found ~8,500–22,000 CXCR4-AcGFP receptors/cell, which corresponds to a particle density of ~2–4.5 particles/µm<sup>2</sup> (Author response image 1) and is similar to the expression levels found in primary human lymphocytes.

      Author response image 1.

      Purified AcGFP monomeric protein was immobilized on glass at various concentrations. Dependency of the distribution of particle components on particle density was calculated; >95% were monomeric single particles at 2.0-4.5 particles/µm<sup>2</sup>. This range of particle density was used to analyze the dynamics of CXCR4-AcGFP, or CXCR4<sup>R334X</sup>-AcGFP single particles on JKCD4 cells.

      These cells were resuspended in RPMI supplemented with 2% FBS, NaPyr and L-glutamine and plated on 96-well plates for at least 2 h. Cells were centrifuged and resuspended in a buffer with HBSS, 25 mM HEPES, 2% FBS (pH 7.3) and plated on glass-bottomed microwell dishes (MatTek Corp.) coated with fibronectin (FN) (Sigma-Aldrich, 20 µg/ml, 1 h, 37°C). To observe the effect of the ligand, we coated dishes with FN + CXCL12, FN + X4-gp120 or FN + VLPs, as described in Materials and Methods; cells were incubated (20 min, 37°C, 5% CO<sub>2</sub>) before image acquisition.

      For SPT measurements, we use a total internal reflection fluorescence (TIRF) microscope (Leica AM TIRF inverted) equipped with an EM-CCD camera (Andor DU 885-CS0-#10-VP), a 100x oil-immersion objective (HCX PL APO 100x/1.46 NA) and a 488-nm diode laser. The microscope was equipped with incubator and temperature control units; experiments were performed at 37°C with 5% CO<sub>2</sub>. To minimize photobleaching effects before image acquisition, cells were located and focused using the bright field, and a fine focus adjustment in TIRF mode was made at 5% laser power, an intensity insufficient for single-particle detection that ensures negligible photobleaching. Image sequences of individual particles (500 frames) were acquired at 49% laser power with a frame rate of 10 Hz (100 ms/frame). The penetration depth of the evanescent field used was 90 nm.

      We performed automatic tracking of individual particles using a very well established and common algorithm first described by Jaqaman (Jaqaman et al. Nat. Methods, 2008). Nevertheless, we would stress that we implemented this algorithm in a supervised fashion, i.e., we visually inspect each individual trajectory reconstruction in a separate window. Indeed, this algorithm is not able to quantify merging or splitting events.

      We follow each individual fluorescence spot frame-by-frame using a three-by-three matrix around the centroid position of the spot, as it diffuses on the cell membrane. To minimize the effect of photon fluctuations, we averaged the intensity over 20 frames. Nevertheless, to assure the reviewer that most of the single molecule traces last for at least 50 frames (i.e., 5 seconds), we provide the following data and arguments. We currently measure the photobleaching times from individual CD86-AcGFP spots exclusively having one single photobleaching step to guarantee that we are looking at individual CD86-AcGFP molecules. The distribution of the photobleaching times is shown below (Author response image 2). Fitting of the distribution to a single exponential decay renders a t<sub>0</sub> value of ~5 s. Thus, with 20 frames averaging, we are essentially measuring the whole population of monomers in our experiments. As the survival time of a molecule before photobleaching will strongly depend on the excitation conditions, we used low excitation conditions (2 mW laser power, which corresponds to an excitation power density of ~0.015 kW/cm<sup>2</sup> considering the illumination region) and longer integration times (100 ms/frame) to increase the signal-to-background for single GFP detection while minimizing photobleaching.

      Author response image 2.

      Single molecule photobleaching times measured directly from single molecule trajectories of CD86-AcGFP, considering only traces that exhibit single molecule photobleaching steps. The experimental data are shown in gray bars (n=273 trajectories over 3 independent experiments). The red line corresponds to a single exponential decay fitting of the experimental data, from which t<sub>0</sub> has been extracted.
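
      The time constant of a single exponential decay can be estimated in several ways. The minimal sketch below is an illustration, not the analysis code used in the study: it uses the maximum-likelihood estimate for an exponential distribution (the sample mean) rather than a fit to the histogram. The photobleaching times are synthetic, drawn with an assumed t<sub>0</sub> of 5 s to mirror the value quoted above, and the sketch ignores truncation by the finite acquisition window.

```python
import random


def fit_single_exponential(times_s):
    """Maximum-likelihood time constant of an exponential decay (sample mean)."""
    if not times_s:
        raise ValueError("need at least one photobleaching time")
    return sum(times_s) / len(times_s)


# Synthetic stand-in for measured single-step photobleaching times (seconds).
rng = random.Random(0)
assumed_t0 = 5.0  # assumption, chosen to match the ~5 s quoted in the text
times = [rng.expovariate(1.0 / assumed_t0) for _ in range(273)]

t0_hat = fit_single_exponential(times)
print(f"estimated t0 = {t0_hat:.2f} s from {len(times)} synthetic traces")
```

      With real data, a histogram fit and the maximum-likelihood estimate should agree closely when the decay is truly mono-exponential; a marked disagreement would suggest more than one population.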

      To infer the stoichiometry of receptor complexes, we also perform single-step photobleaching analysis of the TIRF trajectories to establish the existence of different populations of monomers, dimers, trimers and nanoclusters and extract their percentage. Some representative trajectories of CXCR4-AcGFP with the number of steps detected are shown in new Supplementary Figure 1.  

      The emitted fluorescence (arbitrary units, a.u.) of each spot in the cells is quantified and normalized to the intensity emitted by monomeric CD86-AcGFP spots that strictly showed a single photobleaching step (Dorsch et al. Nat. Methods, 2009). We have preferred to use CD86-AcGFP in cells rather than AcGFP on glass to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. We have also previously shown pharmacological controls to exclude CXCL12-mediated receptor clustering due to internalization processes (Martinez-Muñoz et al. Mol. Cell, 2018) that, together with the evaluation of single photobleaching steps and intensity histograms, allow us to exclude the presence of vesicles in our data. Thus, the dimers, trimers and nanoclusters found in our data do correspond to CXCR4 molecules on the cell surface. Finally, distribution of monomeric particle intensities, obtained from the photobleaching analysis, was analyzed by Gaussian fitting, rendering a mean value of 980 ± 86 a.u. This value was then used as the monomer reference to estimate the number of receptors per particle in both cases, CXCR4-AcGFP and CXCR4<sup>R334X</sup>-AcGFP (new Supplementary Figure 1).
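
      The intensity-based estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the spot intensities are hypothetical, the estimate is a simple ratio to the 980 a.u. monomer reference, and rounding to the nearest integer before binning is an assumed convention.

```python
MONOMER_INTENSITY_AU = 980.0  # mean monomer intensity quoted in the text (± 86 a.u.)


def receptors_per_particle(spot_intensity_au, monomer_au=MONOMER_INTENSITY_AU):
    """Estimate receptor number as spot intensity over the monomer reference."""
    if monomer_au <= 0:
        raise ValueError("monomer reference must be positive")
    return spot_intensity_au / monomer_au


def size_class(n_receptors):
    """Bin an estimate into size classes (rounding rule is an assumption)."""
    n = round(n_receptors)
    if n <= 1:
        return "monomer"
    if n == 2:
        return "dimer"
    if n == 3:
        return "trimer"
    return "nanocluster (>3)"


# Hypothetical spot intensities in a.u.; not data from the study.
for intensity in (950.0, 2010.0, 2900.0, 5200.0):
    n = receptors_per_particle(intensity)
    print(f"{intensity:7.1f} a.u. -> {n:.2f} receptors -> {size_class(n)}")
```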

      (2) I understand that the CXCL12 or gp120 are attached to the substrate with fibronectin for adhesion. I'm less clear how the VLPs are integrated. Were these added to cells already attached to FN?

      For TIRF-M experiments, cells were adhered to glass-bottomed microwell dishes coated with fibronectin, fibronectin + CXCL12, fibronectin + X4-gp120, or fibronectin + VLPs. As for CXCL12 and X4-gp120, the VLPs were attached to fibronectin taking advantage of electrostatic interactions. To clarify the integration of the VLPs in these assays, we have stained the microwell dishes coated with fibronectin and those coated with fibronectin + VLPs with wheat germ agglutinin (WGA) coupled to Alexa647 (Author response image 3) and evaluated the staining by confocal microscopy. These results indicate the presence of carbohydrates on the VLPs and are, therefore, indicative of the presence of VLPs on the fibronectin layer.

      Author response image 3.

      Representative confocal images of microwell dishes coated with fibronectin (left panel) or fibronectin + VLPs (right panel) and stained with wheat germ agglutinin (WGA) coupled to Alexa647. Scale bar, 1 µm.

      Moreover, it is important to remark that the effect of the VLPs on CXCR4 behavior at the cell surface observed by TIRF-M confirmed that the VLPs remained attached to the substrate during the experiment.

      (3) Fig 1A - The classification of particle tracks into mobile and immobile is an overly simplistic description that goes back to bulk FRAP measurements and is not really applicable to single molecule tracking data, where it's rare to see anything that is immobile and alive. An alternative classification strategy uses sub-diffusion, normal diffusion and active diffusion (or active transport) as descriptions, and particles can transition between these classes over the tracking period. Fig 1B - this data might be better displayed as histograms showing distributions within the different movement classes.

      In agreement with the reviewer’s commentary, the majority of the particles detected in our TIRF-M experiments were indeed mobile. However, we also detected a variable, and biologically appreciable, percentage of immobile particles depending on the experimental condition analyzed (Figure 1A in the main manuscript). To establish a stringent threshold for identifying these immobile particles under our specific experimental conditions, we used purified monomeric AcGFP proteins immobilized on glass coverslips. Our analysis demonstrated that 95% of these immobilized proteins showed a diffusion coefficient ≤0.0015 µm<sup>2</sup>/s; consequently, this value was established as the cutoff to distinguish immobile from mobile trajectories. While the observation of truly immobile entities in a dynamic, living system is rare, the presence of these particles under our conditions is biologically significant. For instance, the detection of large, immobile receptor nanoclusters at the plasma membrane is entirely consistent with facilitating key cellular processes, such as enabling the robust signaling cascade triggered by ligand binding or promoting the crucial events required for efficient viral entry into the cells.
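
      The logic of deriving an immobility cutoff from an immobilized reference can be sketched as below. This is an illustration only: all diffusion coefficients are hypothetical, and the percentile is computed by linear interpolation between sorted values, which is one of several common conventions.

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    if not values:
        raise ValueError("empty input")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_by_mobility(d_values, cutoff):
    """Partition diffusion coefficients into immobile (<= cutoff) and mobile."""
    immobile = [d for d in d_values if d <= cutoff]
    mobile = [d for d in d_values if d > cutoff]
    return immobile, mobile


# Hypothetical apparent D values (µm²/s) for AcGFP immobilized on glass.
glass_d = [0.0002, 0.0004, 0.0005, 0.0007, 0.0008, 0.0009, 0.0010,
           0.0011, 0.0012, 0.0013, 0.0014, 0.0015, 0.0016]
cutoff = percentile(glass_d, 95)

# Hypothetical D values (µm²/s) from receptor trajectories on cells.
cell_d = [0.0009, 0.0100, 0.0014, 0.0450, 0.0020]
immobile, mobile = split_by_mobility(cell_d, cutoff)
print(f"cutoff = {cutoff:.5f} µm²/s; immobile: {len(immobile)}, mobile: {len(mobile)}")
```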

      Regarding the mobile receptors (defined as those with D<sub>1-4</sub> values exceeding 0.0015 µm<sup>2</sup>/s), we observed distinct diffusion profiles derived from mean square displacement (MSD) plots (Figure V) (Manzo & García-Parajo Rep. Prog. Phys., 2015), which were further classified based on motion, using the moment scaling spectrum (MSS) (Ewers et al. PNAS, 2005). Under all experimental conditions, the majority of mobile particles, ~85%, showed confined diffusion: for example, under basal conditions, without ligand addition, ~90% of mobile particles showed confined diffusion, ~8.5% showed Brownian-free diffusion and ~1.5% exhibited directed motion (new Supplementary Figure 5A in the main manuscript). These data have also been included in the revised manuscript to show, in detail, the dynamic parameters of CXCR4.
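
      For readers unfamiliar with these quantities, the sketch below shows one common way to compute a time-averaged MSD curve and a short-lag diffusion coefficient from a two-dimensional trajectory. It assumes that D<sub>1-4</sub> denotes the slope of a straight line fitted to the first four MSD points divided by 4 (the two-dimensional relation MSD = 4Dt + offset); the response does not spell this out. The trajectory is simulated, and the MSS classification step is not implemented here.

```python
import random


def msd(track, max_lag):
    """Time-averaged mean square displacement for lags 1..max_lag (in frames)."""
    out = []
    for lag in range(1, max_lag + 1):
        sq = [(track[i + lag][0] - track[i][0]) ** 2 +
              (track[i + lag][1] - track[i][1]) ** 2
              for i in range(len(track) - lag)]
        out.append(sum(sq) / len(sq))
    return out


def d_1_4(track, dt):
    """Short-lag diffusion coefficient from a line fitted to MSD lags 1-4.

    In 2D, MSD(t) = 4*D*t + offset, so D is the fitted slope divided by 4.
    """
    lags = [dt * k for k in range(1, 5)]
    values = msd(track, 4)
    mean_t = sum(lags) / 4
    mean_m = sum(values) / 4
    slope = (sum((t - mean_t) * (m - mean_m) for t, m in zip(lags, values)) /
             sum((t - mean_t) ** 2 for t in lags))
    return slope / 4.0


def simulate_brownian(n_frames, d, dt, rng):
    """2D random walk with diffusion coefficient d (µm²/s); test data only."""
    sigma = (2.0 * d * dt) ** 0.5  # per-axis step standard deviation
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_frames - 1):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        track.append((x, y))
    return track


# 500 frames at 100 ms/frame, matching the acquisition settings in the text.
rng = random.Random(1)
track = simulate_brownian(n_frames=500, d=0.05, dt=0.1, rng=rng)
print(f"D(1-4) = {d_1_4(track, dt=0.1):.4f} µm²/s")
```

      A confined trajectory would show an MSD curve that flattens at longer lags, whereas directed motion would curve upward; the short-lag fit is used because it is least affected by either deviation.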

      Due to the space constraints, it is very difficult to include all the figures generated. However, to ensure comprehensive assessment and transparency (for the purpose of this review), we have included below representative plots of the MSD values as a function of time from individual trajectories, showing different types of motion obtained in our experiments (Author response image 4).

      Author response image 4.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP detected by SPT-TIRF in resting JKCD4 cells showing different types of motion: A) confined, B) Brownian/Free, C) directed transport.

      (4) Fig 1C,D - It would be helpful to see a plot of D vs MSI at a single particle level. In comparing C and D I'm surprised there is not a larger difference between CXCL12 and X4-gp120. It would also be very important to see the behaviour of X4-gp120 on the CXCR4 deficient Jurkat that would provide a picture of CD4 diffusion. The CXCR4 nanoclustering related to the X4-gp120 could be dominated by CD4 behaviour.

      As previously described, all analyses were performed under SPT conditions (see previous response to point 1). Figure 1C details the percentage of oligomers (>3 receptors/particle) calibrated using Jurkat CD4<sup>+</sup> cells electroporated with monomeric CD86-AcGFP (Dorsch et al. Nat. Methods, 2009). The monomer value was determined by analyzing photobleaching steps as described in our previous response to point 1.

      In our experiments, we observed a trend towards a higher number of oligomers upon activation with CXCL12 compared with X4-gp120. This trend was further supported by measurements of Mean Spot Intensity. However, the values are also influenced by the number of larger spots, which represents a minor fraction of the total spots detected.

      The differences between the effect triggered by CXCL12 or X4-gp120 might also be attributed to a combination of factors related to differences in ligand concentration, their structure, and even to the technical requirements of TIRF-M. Both ligands are in contact with the substrate (fibronectin) and the specific nature of this interaction may differ between both ligands and influence their accessibility to CXCR4. Moreover, the requirement of the prior binding of gp120 to CD4 before CXCR4 engagement, in contrast to the direct binding of CXCL12 to CXCR4, might also contribute to the differences observed.

      We previously reported that CXCL12-mediated CXCR4 dynamics are modulated by CD4 coexpression (Martinez-Muñoz et al. Mol. Cell, 2018). We have now detected the formation of CD4 heterodimers with both CXCR4 and CXCR4<sup>R334X</sup>, and found that these conformations are influenced by gp120-VLPs. In the present manuscript, we did not focus on CD4 clustering as it has been extensively characterized previously (Barrero-Villar et al. J. Cell Sci., 2009; Jiménez-Baranda et al. Nat. Cell. Biol., 2007; Yuan et al. Viruses, 2021). Regarding the investigation of the effects of X4-gp120 on CXCR4-deficient Jurkat cells, which would provide a picture of CD4 diffusion, we would note that a previous report has already addressed this issue using single molecule super-resolution imaging, and revealed that CD4 molecules on the cell membrane are predominantly found as individual molecules or small clusters of up to 4 molecules, and that the size and number of these clusters increases upon virus binding or gp120 activation (Yuan et al. Viruses, 2021).

      (5) Fig S1D- This data is really interesting. However, if both the CD4 and the gp120 have His tags they need to be careful, as poly-His tags can bind weakly to cells and increasing valency could generate some background. So, they should make sure the control is fair here. Ideally, a non-His-tagged version of sCD4 and gp120 would be used, or they need a His-tagged Fab binding to gp120 that doesn't induce CXCR4 binding.

      New Supplementary Figure 2D shows that X4-gp120 does not bind Daudi cells (these cells do not express CD4) in the absence of soluble CD4. While the reviewer is correct to state that both proteins contain a Histidine Tag, cell binding is only detected if X4-gp120 binds sCD4. Nonetheless, we have included in the revised Supplementary Figure 2D a control showing the negative binding of sCD4 to Daudi cells in the absence of X4-gp120. Altogether, these results confirm that only sCD4/X4-gp120 complexes bind these cells.

      (6) Fig S4- Panel D needs a scale bar. I can't figure out what I'm being shown without this.

      Apologies. A scale bar has been included in this panel (new Supplementary Figure 6D).

      Reviewer #2:

      (1) This study is well described in both the main text and figures. Introduction provides adequate background and cites the literature appropriately. Materials and Methods are detailed. Authors are careful in their interpretations, statistical comparisons, and include necessary controls in each experiment. The Discussion presents a reasonable interpretation of the results. Overall, there are no major weaknesses with this manuscript.

      We very much appreciate the positive comments of the reviewer regarding the broad interest and strength of our work.

      (2) NL4-3deltaIN and immature HIV virions are found to have less associated gp120 relative to wild-type particles. It is not obvious why this is the case for the deltaIN particles or genetically immature particles. Can the authors provide possible explanations? (A prior paper was cited, Chojnacki et al Science, 2012 but can the current authors provide their own interpretation.)

      Our conclusion from the data is actually exactly the opposite. As shown in Figure 2D, the gp120 staining intensity was higher for NL4-3ΔIN particles (1,786 a.u.) than for gp120-VLPs (1,223 a.u.), indicating lower expression of Env proteins in the latter. Furthermore, analysis of gp120 intensity per particle (Figure 2E) confirmed that gp120-VLPs contained fewer gp120 molecules per particle than NL4-3ΔIN virions. These levels were comparable with, or even lower than, those observed in primary HIV-1 viruses (Zhu et al. Nature, 2006). This reduction was a direct consequence of the method used to generate the VLPs, as our goal was to produce viral particles with minimal gp120 content to prevent artifacts in receptor clustering that might occur using high levels of Env proteins in the VLPs to activate the receptors.

      This misunderstanding may arise from the fact that we also compared Gag condensation and Env distribution on the surface of gp120-VLPs with those observed in genetically immature particles and integrase-defective NL4-3ΔIN virions, which served as controls. STED microscopy data revealed differences in Env distribution between gp120-VLPs and NL4-3ΔIN virions, supporting the classification of gp120-VLPs as mature particles (Figure 2 A,B).

      Reviewer #3:

      We thank the reviewer for considering that our work offers new insights into the spatial organization of receptors during HIV-1 entry and infection and that the manuscript is well written, and the findings significant.

      (1) For mechanistic basis of gp120-CXCR4 versus CXCL12-CXCR4 differences. Provide additional structural or biochemical evidence to support the claim that gp120 stabilises a distinct CXCR4 conformation compared to CXCL12. If feasible, include molecular modelling, mutagenesis, or crosslinking experiments to corroborate the proposed conformational differences.

      We appreciate the opportunity to clarify this point. The specific claim that gp120 stabilizes a conformation of CXCR4 that is distinct from the CXCL12-bound state was not explicitly stated in our manuscript, although we agree that our data strongly support this possibility. It is important to consider that CXCL12 binds directly to CXCR4, whereas gp120 requires prior sequential binding to CD4, and its subsequent interaction is with a CXCR4 molecule that is already forming part of the CD4/CXCR4 complex, as demonstrated by our FRET experiments and supported by previous studies (Zaitseva et al. J. Leuk. Biol., 2005; Busillo & Benovic Biochim. Biophys. Acta, 2007; Martínez-Muñoz et al. PNAS, 2014). This difference makes it inherently complex to compare the conformational changes induced by gp120 and CXCL12 on CXCR4.

      However, our findings show that both stimuli induce oligomerization of CXCR4, a phenomenon not observed when mutant CXCR4<sup>R334X</sup> was exposed to the chemokine CXCL12 (García-Cuesta et al. PNAS, 2022).

      (1) CXCL12 induced oligomerization of CXCR4 but did not affect the dynamics of CXCR4<sup>R334X</sup> (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022). By contrast, X4-gp120 and the corresponding VLPs—which require initial binding to CD4 to engage the chemokine receptor—stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>.

      (2) FRET analysis revealed distinct FRET<sub>50</sub> values for CD4/CXCR4 (2.713) and CD4/CXCR4<sup>R334X</sup> (0.399) complexes, suggesting different conformations for each complex.

      (3) Consistent with previous reports (Balabanian et al. Blood, 2005; Zmajkovicova et al. Front. Immunol., 2024; García-Cuesta et al. PNAS, 2022), the molecular mechanisms activated by CXCL12 are distinct when comparing CXCR4 with CXCR4<sup>R334X</sup>. For instance, CXCL12 induces internalization of CXCR4, but not of mutant CXCR4<sup>R334X</sup>. Conversely, X4-gp120 triggers approximately 25% internalization of both receptors. Similarly, CXCL12 does not promote CD4 internalization in cells co-expressing CXCR4 or CXCR4<sup>R334X</sup>, whereas X4-gp120 does, although CD4 internalization was significantly higher in cells co-expressing CXCR4.

      These findings suggest that CD4 influences the conformation and the oligomerization state of both co-receptors. To further support this hypothesis, we have conducted new in silico molecular modeling of CD4 in complex with either CXCR4 or its mutant CXCR4<sup>R334X</sup> using AlphaFold 3.0 (Abramson et al. Nature, 2024). The server was provided with both sequences, and the interaction between the two molecules for each protein was requested. It produced a number of solutions, which were then analyzed using the software ChimeraX 1.10 (Meng et al. Protein Sci., 2023). CXCR4 and its mutant, CXCR4<sup>R334X</sup> bound to CD4, were superposed using one of the CD4 molecules from each complex, with the aim of comparing the spatial positioning of CD4 molecules when interacting with CXCR4.

      Author response image 5.

      CD4/CXCR4 complexes were superimposed with CD4/CXCR4 complexes (left panel) or CD4/CXCR4<sup>R334X</sup> complexes (right panels). Arrows indicate the CD4 molecule used as the reference for superimposition.

      As illustrated in Author response image 5, the superposition of the CD4/CXCR4 complexes was complete. However, when CD4/CXCR4 complexes were superimposed with CD4/CXCR4<sup>R334X</sup> complexes using the same CD4 molecule as a reference, indicated by an arrow in the figure, a clear structural deviation became evident. The main structural difference detected was the positioning of the CD4 transmembrane domains when interacting with either the wild-type or mutant CXCR4. While in complexes with CXCR4, the angle formed by the lines connecting residues E416 at the C-terminus end of CD4 with N196 in CXCR4 was 12°, for the CXCR4<sup>R334X</sup> complex, this angle increased to 24°, resulting in a distinct orientation of the CD4 extracellular domain (Author response image 6).

      Author response image 6.

      Comparison of the angle between the transmembrane domains of CD4 in CXCR4 WT and WHIM complexes. The angle between residues N196 from one CXCR4 molecule and E416 from the two CD4 dimer molecules was calculated for the CXCR4 WT (12°) and WHIM (24°) complexes to demonstrate the difference in CD4 positioning.
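
      As a hedged illustration of the kind of measurement described in this legend, the sketch below computes the angle at one vertex (standing in for N196 of CXCR4) between the lines drawn to two other points (standing in for E416 of each CD4 molecule). The function name and the coordinates are hypothetical and are not taken from the AlphaFold models; in practice the atom coordinates would be read from the predicted structures.

```python
import math

def angle_at_vertex(vertex, p1, p2):
    """Return the angle in degrees at `vertex` between the lines vertex->p1 and vertex->p2."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(a * a for a in v2))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates in angstroms (illustration only, not model data).
n196 = (0.0, 0.0, 0.0)        # vertex: stands in for N196 of CXCR4
e416_cd4_a = (10.0, 0.0, 0.0)   # stands in for E416 of the first CD4 molecule
e416_cd4_b = (10.0, 10.0, 0.0)  # stands in for E416 of the second CD4 molecule

angle = angle_at_vertex(n196, e416_cd4_a, e416_cd4_b)
print(f"angle at vertex: {angle:.1f} degrees")
```

      Applying the same function to the corresponding atoms of the wild-type and mutant models would give two angles that can be compared directly, provided the same atoms (for example, the Cα atoms) are used in both complexes.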

      To further analyze the models obtained, we employed PDBsum software (Laskowski & Thornton Protein Sci., 2021) to predict the CD4/CXCR4 interface residues. Data indicated that at least 50% of the interaction residues differed when the CD4/CXCR4 interaction surface was compared with that of the CD4/CXCR4<sup>R334X</sup> complex (Author response image 7). It is important to note that while some hydrogen bonds were present in both complex models, others were exclusive to one of them. For instance, whereas Cys<sup>394</sup>(CD4)-Tyr<sup>139</sup> and Lys<sup>299</sup>(CD4)-Glu<sup>272</sup> were present in both CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes, the pairs Asn<sup>337</sup>(CD4)-Ser<sup>27</sup>(CXCR4<sup>R334X</sup>) and Lys<sup>325</sup>(CD4)-Asp<sup>26</sup>(CXCR4<sup>R334X</sup>) were only found in CD4/CXCR4<sup>R334X</sup> complexes.

      Author response image 7.

      Interacting residues at the CD4/CXCR4 interface. The panel displays the interface residues from the CXCR4 and CD4 oligomer. CD4 residues labeled with a red sphere show the interacting residues present in both CXCR4-WT and -WHIM hetero-oligomers. The continuous red lines represent a salt bridge, while the blue lines indicate a hydrogen bond and the dashed red lines represent non-bonded interactions. As illustrated in the figure, half of the interacting residues differ between the WT and WHIM models, indicating that the interacting surfaces are also distinct.
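
      The comparison of interface residues between two models can be sketched with simple set operations. This is only an illustration under stated assumptions: the function name is hypothetical, the "fraction different" is defined here as one minus the shared fraction of the combined residue set (the response does not state how the 50% figure was computed), and the input lists contain only the four CD4 residues named in the hydrogen-bond examples above, not the full PDBsum output.

```python
def compare_interfaces(residues_a, residues_b):
    """Compare two lists of interface residues.

    Returns (shared, only_a, only_b, fraction_different), where
    fraction_different = 1 - |shared| / |union| (an assumed definition).
    """
    a, b = set(residues_a), set(residues_b)
    shared = a & b
    union = a | b
    fraction_different = 1 - len(shared) / len(union) if union else 0.0
    return sorted(shared), sorted(a - b), sorted(b - a), fraction_different

# Illustrative subset only: CD4 residues from the hydrogen-bond pairs named in the text.
cd4_in_wt_model = ["Cys394", "Lys299"]
cd4_in_r334x_model = ["Cys394", "Lys299", "Asn337", "Lys325"]

shared, only_wt, only_mutant, frac = compare_interfaces(cd4_in_wt_model, cd4_in_r334x_model)
print("shared:", shared)
print("only in R334X model:", only_mutant)
print(f"fraction different: {frac:.2f}")
```

      Because the inputs are a partial subset, the printed fraction is not the value reported in the response; the full residue lists from both models would be needed to reproduce it.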

      These findings, which are consistent with our FRET results, suggest distinct interaction surfaces between CD4 and the two chemokine receptors. Overall, these results are compatible with differences in the spatial conformation adopted by these complexes.

      (2) For Empty VLP effects on CXCR4 dynamics: Explore potential causes for the observed effects of Env-deficient VLPs. It's valuable to include additional controls such as particles from non-producer cells, lipid composition analysis, or blocking experiments to assess nonspecific interactions.

      As VLPs are complex entities, we thought that the relevant results should be obtained by comparing the effects of Env(-) VLPs with gp120-VLPs. Therefore, we would first remark that regardless of the effect of Env(-) VLPs on CXCR4 dynamics, the most evident finding in this study is the strong effect of gp120-VLPs compared with control Env(-) VLPs. Nevertheless, regarding the effect of the Env(-) VLPs compared with medium, we propose several hypotheses. As several virions can be tethered to the cell surface via glycosaminoglycans (GAGs), we hypothesized that VLP-GAG interactions might indirectly influence the dynamics of CXCR4 and CXCR4<sup>R334X</sup> at the plasma membrane. Additionally, membrane fluidity is essential for receptor dynamics; therefore, VLP interactions with proteins, lipids or any other component of the cell membrane could also alter receptor behavior. It is well known that lipid rafts participate in the interaction of different viruses with target cells (Nayak & Hu Subcell. Biochem., 2004; Manes et al. Nat. Rev. Immunol., 2003; Riethmüller et al. Biochim. Biophys. Acta, 2006) and both the lipid composition and the presence of co-expressed proteins modulate ligand-mediated receptor oligomerization (Gardeta et al. Frontiers in Immunol., 2022; Gardeta et al. Cell. Commun. Signal., 2025). We have thus performed Raster Image Correlation Spectroscopy (RICS) analysis to assess membrane fluidity through membrane diffusion measurements on cells treated with Env(-) VLPs.

      Jurkat cells were labeled with Di-4-ANEPPDHQ and seeded on FN and on FN + VLPs prior to analysis by RICS on confocal microscopy. The results indicated no significant differences in membrane diffusion under the treatment tested, thereby discarding an effect of VLPs on overall membrane fluidity (Author response image 8).

      Author response image 8.

      VLP treatment does not alter cell membrane fluidity. Diffusion values obtained by RICS from JKCD4X4 cells. (n = 3, with at least 10 cells analyzed per experiment and condition; n.s., not significant).

      Nonetheless, these results do not rule out other non-specific interactions of Env(-) VLPs with membrane proteins that could affect receptor dynamics. For instance, it has been reported that C-type lectin DC-SIGN acts as an efficient docking site for HIV-1 (Cambi et al. J. Cell. Biol., 2004; Wu & KewalRamani Nat. Rev. Immunol., 2006). However, a detailed investigation of these possible mechanisms is beyond the scope of this manuscript.

      (3) For Direct link between clustering and infection efficiency - Test whether disruption of CXCR4 clustering (e.g., using actin cytoskeleton inhibitors, membrane lipid perturbants, or clustering-deficient mutants) alters HIV-1 fusion or infection efficiency.

      Designing experiments using tools that disrupt receptor clustering by interacting with the receptors themselves is challenging, as these tools bind the receptor and can therefore alter parameters such as its conformation and/or its distribution at the cell membrane, as well as affect some cellular processes such as HIV-1 attachment and cell entry. Moreover, effects on actin polymerization or lipid dynamics can affect not only receptor clustering but also other molecular mechanisms essential for efficient infection.

      Many previous reports have, nonetheless, indirectly correlated receptor clustering with cell infection efficiency. Cholesterol plays a key role in the entry of several viruses. Its depletion in primary cells and cell lines has been shown to confer strong resistance to HIV-1-mediated syncytium formation and infection by both CXCR4- and CCR5-tropic viruses (Liao et al. AIDS Res. Hum. Retroviruses, 2021). Moderate cholesterol depletion also reduces CXCL12-induced CXCR4 oligomerization and alters receptor dynamics (Gardeta et al. Cell. Commun. Signal., 2025). By restricting the lateral diffusion of CD4, sphingomyelinase treatment inhibits HIV-1 fusion (Finnegan et al. J. Virol., 2007). Depletion of sphingomyelins also disrupts CXCL12-mediated CXCR4 oligomerization and its lateral diffusion (Gardeta et al. Front Immunol., 2022). Additional reports highlight the role of actin polymerization at the viral entry site, which facilitates clustering of HIV-1 receptors, a crucial step for membrane fusion (Serrano et al. Biol. Cell., 2023). Blockade of actin dynamics by Latrunculin A treatment, a drug that sequesters actin monomers and prevents their polymerization, blocks CXCL12-induced CXCR4 dynamics and oligomerization (Martínez-Muñoz et al. Mol. Cell, 2018).

      Altogether, these findings strongly support our hypothesis of a direct link between CXCR4 clustering and the efficiency of HIV-1 infection.

      (4) CD4/CXCR4 co-endocytosis hypothesis - Support the proposed model with direct evidence from live-cell imaging or co-localization experiments during viral entry. Clarification is needed on whether internalization is simultaneous or sequential for CD4 and CXCR4.

      When referring to endocytosis of CD4 and CXCR4, we only hypothesized that HIV-1 might promote the internalization of both receptors either sequentially or simultaneously. The hypothesis was based on several findings:

      a) Previous studies have suggested that HIV-1 glycoproteins can reduce CD4 and CXCR4 levels during HIV-1 entry (Choi et al. Virol. J., 2008; Geleziunas et al. FASEB J, 1994; Hubert et al. Eur. J. Immunol., 1995).

      b) Receptor endocytosis has been proposed as a mechanism for HIV-1 entry (Daecke et al. J. Virol., 2005; Aggarwal et al. Traffic, 2017; Miyauchi et al. Cell, 2009; Carter et al. Virology, 2011).

      c) Our data from cells activated with X4-gp120 demonstrated internalization of CD4 and chemokine receptors, which correlated with HIV-1 infection in PBMCs from WHIM patients and healthy donors.

      d) CD4 and CXCR4 have been shown to co-localize in lipid rafts during HIV-1 infection (Manes et al. EMBO Rep., 2000; Popik et al. J. Virol., 2002).

      e) Our FRET data demonstrated that CD4 and CXCR4 form heterocomplexes and that FRET efficiency increased after gp120-VLPs treatment.

      We agree with the reviewer that further experiments are required to test this hypothesis; however, we believe that this is beyond the scope of the current manuscript.

      Minor Comments:

      (1) The conclusions rely solely on the HXB2 X4-tropic Env. It would strengthen the study to assess whether other X4 or dual-tropic strains induce similar receptor clustering and dynamics.

      The primary goal of our current study was to investigate the dynamics of the co-receptor CXCR4 during HIV-1 infection, motivated by previous reports showing CD4 oligomerization upon HIV-1 binding and gp120 stimulation (Yuan et al. Viruses, 2021). We initially used a recombinant X4-gp120, a soluble protein that does not fully replicate the functional properties of the native HIV-1 Env. Previous studies have shown that Env consists of gp120 trimers, which redistribute and cluster on the surface of virions following proteolytic Gag cleavage during maturation (Chojnacki et al. Nat. Commun., 2017). An important consideration in receptor oligomerization studies is the concentration of recombinant gp120 used, as it does not accurately reflect the low number of Env trimers present on native HIV-1 particles (Hart et al. J. Histochem. Cytochem., 1993; Zhu et al. Nature, 2006). To address these limitations, we generated virus-like particles (VLPs) containing low levels of X4-gp120 and repeated the dynamic analysis of CXCR4. The use of primary HIV-1 isolates was limited, in this project, to confirm that PBMCs from both healthy donors and WHIM patients were equally susceptible to infection. This result using a primary HIV-1 virus supports the conclusion drawn from our in vitro approaches. We thus believe that although the use of other X4- and dual-tropic strains may complement and reinforce the analysis, it is far beyond the scope of the current manuscript.

      (2) Given the observed clustering effects, it would be valuable to explore whether gp120-induced rearrangements alter epitope exposure to broadly neutralizing antibodies like 17b or 3BNC117. This would help connect the mechanistic insights to therapeutic relevance.

      As 3BNC117, VRC01 and b12 are broadly neutralizing mAbs that recognize conformational epitopes on gp120 (Li et al. J. Virol., 2011; Mata-Fink et al. J. Mol. Biol., 2013), they will struggle to bind the gp120/CD4/CXCR4 complex and therefore may not be ideal for detecting changes within the CD4/CXCR4 complex. The experiment suggested by the reviewer is thus both challenging and complex. It would require evaluating antibody binding in two experimental conditions, in the absence and in the presence of oligomers. However, our data indicate that receptor oligomerization is promoted by X4-gp120 binding, and the selected antibodies are neutralizing mAbs, so they should block or hinder the binding of gp120 and, consequently, receptor oligomerization. An alternative approach would be to study the neutralizing capacity of these mAbs on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> complexes. Variations in their neutralizing activity could then be extrapolated to distinct gp120 conformations, which in turn may reflect differences between CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes.

      We thus assessed the ability of the anti-gp120 mAbs VRC01 and b12, which were available in our laboratory, to neutralize gp120 binding on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup>. Specifically, increasing concentrations of each antibody were preincubated (60 min, 37°C) with a fixed amount of X4-gp120 (0.05 µg/ml). The resulting complexes were then incubated with Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> (30 min, 37°C) and, finally, their binding was analyzed by flow cytometry. Although we did not observe statistically significant differences in the neutralization capacity of b12 or VRC01 for the binding of X4-gp120 depending on the presence of CXCR4 or CXCR4<sup>R334X</sup>, we observed a trend for greater concentrations of both mAbs to neutralize X4-gp120 binding in Jurkat CD4/CXCR4 cells than in Jurkat CD4/CXCR4<sup>R334X</sup> cells (Author response image 9).

      Author response image 9.

      Flow cytometry analysis of gp120 binding to Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> in the presence of different concentrations of the neutralizing anti-gp120 antibodies b12 (left panel) and VRC01 (right panel). AUC comparison by Welch’s t-test: p-values 0.2950 and 0.2112 for b12 and VRC01 respectively (n = 2).
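
      For readers unfamiliar with the analysis named in this legend, the following is a minimal sketch of an area-under-the-curve (AUC) comparison by Welch's t-test, using only the Python standard library. All numbers are hypothetical and are not the data behind Author response image 9; the function names, the trapezoidal rule for the AUC, and the concentration grid are assumptions made for illustration. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom, since converting these to a p-value requires the Student t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance

def trapezoid_auc(x, y):
    """Area under the curve y(x) by the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical antibody concentrations and % gp120 binding (two replicates per cell line).
conc = [0, 1, 2, 4]
wt_curves = [[100, 80, 60, 20], [100, 70, 50, 30]]
mutant_curves = [[100, 90, 70, 40], [100, 85, 75, 35]]

wt_auc = [trapezoid_auc(conc, y) for y in wt_curves]
mutant_auc = [trapezoid_auc(conc, y) for y in mutant_curves]
t_stat, dof = welch_t(wt_auc, mutant_auc)
print(f"WT AUCs: {wt_auc}, mutant AUCs: {mutant_auc}")
print(f"Welch t = {t_stat:.3f}, df = {dof:.3f}")
```

      With only two replicates per group, as in the legend, the degrees of freedom are close to one, so even a visible separation between curves can yield a non-significant p-value.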

      These slight alterations in the neutralizing capacity of b12 and VRC01 mAbs may thus suggest minimal differences in the conformations of gp120 depending on the coreceptor used. We also detected that X4-gp120 and VLPs expressing gp120, which require initial binding to CD4 to engage the chemokine receptor, stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>, but FRET data indicated distinct FRET<sub>50</sub> values between the partners, (2.713) for CD4/CXCR4 and (0.399) for CD4/CXCR4<sup>R334X</sup> (Figure 5A,B in the main manuscript). Moreover, we also detected significantly more CD4 internalization mediated by X4-gp120 in cells co-expressing CD4 and CXCR4 than in those co-expressing CD4 and CXCR4<sup>R334X</sup> (Figure 6 in the main manuscript). Overall, these latter data and those included in Author response images 5, 6 and 7 indicate distinct conformations within each receptor complex.

      (3) TIRF imaging limits analysis to the cell substrate interface. It would be useful to clarify whether CXCR4 receptor clustering occurs elsewhere, such as at immunological synapses or during cell-to-cell contact.

      In recent years, chemokine receptor oligomerization has gained significant research interest due to its role in modulating the ability of cells to sense chemoattractant gradients. This molecular organization is now recognized as a critical factor in governing directed cell migration (Martínez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022, Hauser et al. Immunity, 2016). In addition, advanced imaging techniques such as single-molecule and super-resolution microscopy have been used to investigate the spatial distribution and dynamic behaviour of CXCR4 within the immunological synapse in T cells (Felce et al. Front. Cell Dev. Biol., 2020). Building on these findings, we are currently conducting a project focused on characterizing CXCR4 clustering specifically within this specialized cellular region.

      (4) In LVP experiments, it would be useful to report transduction efficiency (% GFP+ cells) alongside MSI data to relate VLP infectivity with receptor clustering functionally.

      These experiments were designed to validate the functional integrity of the gp120 conformation on the LVPs, confirming their suitability for subsequent TIRF microscopy. Our objective was to establish a robust experimental tool rather than to perform a high-throughput quantification of transduction efficiency. It is for that reason that these experiments were included in new Supplementary Figure S6, which also contains the complete characterization of gp120-VLPs and LVPs. In such experimental conditions, quantifying the percentage of GFP-positive cells relative to the total number of cells plated in each well is very difficult. However, in line with the reviewer’s commentary and as we used the same number of cells in each experimental condition, we have included, in the revised manuscript, a complementary graph illustrating the GFP intensity (arbitrary units) detected in all the wells analyzed (new Supplementary Fig. 6E).

      (5) To ensure that differences in fusion events (Figure 7B) are attributable to target cell receptor properties, consider confirming that effector cells express similar levels of HIV-1 Env. Quantifying gp120 expression by flow cytometry or western blot would rule out the confounding effects of variable Env surface density.

      In these assays (Figure 7B), we used the same effector cells (cells expressing X4-gp120) in both experimental conditions, ensuring that any observed differences should be attributable solely to the target cells, either JKCD4X4 or JKCD4X4<sup>R334X</sup>. For this reason, in Figure 7A we included only the binding of X4-gp120 to the target cells, which demonstrated similar levels of the receptors expressed by the cells.

      (6) HIV-mediated receptor downregulation may occur more slowly than ligand-induced internalization. Including a 24-hour time point would help assess whether gp120 induces delayed CD4 or CXCR4 loss beyond the early effects shown and to better capture potential delayed downregulation induced by gp120.

      The reviewer suggests using a 24-hour time point to facilitate detection of receptor internalization. However, such an extended incubation time may introduce some confounding factors, including receptor degradation, recycling and even de novo synthesis, which could affect the interpretation of the results. Under our experimental conditions, we observed that CXCL12 did not trigger CD4 internalization whereas X4-gp120 did. Interestingly, CD4 internalization depended on the coreceptor expressed by the cells.

      (7) Increase label font size in microscopy panels for improved readability.

      Of course; the font size of these panels has been increased in the revised version.

      (8) Consider adding more references on ligand-induced co-endocytosis of CD4 and chemokine receptors during HIV-1 entry.

      We have added more references to support this hypothesis (Toyoda et al. J. Virol., 2015; Venzke et al. J. Virol., 2006; Gobeil et al J. Virol., 2013).

      (9) For Statistical analysis. Biological replicates are adequate, and statistical tests are generally appropriate. For transparency, report n values, exact p-values, and the statistical test used in every figure legend and discussed in the results.

      Thank you for highlighting the importance of transparency in statistical reporting. We confirm that the n values for all experiments have been included in the figure legends. The statistical tests used for each analysis are also clearly indicated in the figure legends, and the interpretation of these results is discussed in detail in the Results section. Furthermore, the Methods section specifies the tests applied and the thresholds for significance, ensuring full transparency regarding our analytical approach.

      In accordance with established conventions in the field, we have utilized categorical significance indicators (e.g., n.s., *, **, ***) within our figures to enhance readability and focus on biological trends. This approach is widely adopted in high-impact literature to prevent visual clutter. However, to ensure full transparency and reproducibility, we have ensured that the underlying statistical tests and thresholds are clearly defined in the respective figure legends and Methods section.

      Reviewer #4:

      We thank the reviewer for considering that this work is presented in a clear fashion, and the main findings are properly highlighted, and for remarking that the paper is of interest to the retrovirology community and possibly to the broader virology community.

      We also agree that the observation that X4-gp120 clusters CXCR4<sup>R334X</sup> is of interest, as it suggests that the binding mechanism of X4-gp120 differs from that of the natural ligand CXCL12, an aspect that we are now evaluating. These data also indicate that WHIM patients can be infected by HIV-1 similarly to healthy people.

      (1) The observation that "empty VLPs" reduce CXCR4 diffusivity is potentially interesting. However, it is not supported by the data owing to insufficient controls. The authors correctly discuss the limitations of that observation in the Discussion section (lines 702-704). However, they overinterpret the observation in the Results section (lines 509-512), suggesting non-specific interactions between empty VLPs, CD4 and CXCR4. I suggest either removing the sentence from the Results section or replacing it with a sentence similar to the one in the Discussion section.

      In accordance with the reviewer’s suggestion, the sentence in the result section has been replaced with one similar to that found in the discussion section. In addition, we have performed Raster Image Correlation Spectroscopy (RICS) analysis using the Di-4-ANEPPDHQ lipid probe to assess membrane fluidity by means of membrane diffusion, and compared the results with those of cells treated with Env(-) VLPs. The results indicated that VLPs did not modulate membrane fluidity (Author response image 8). Nonetheless, these results do not rule out other potential non-specific interactions of the Env(-) VLPs with other components of the cell membrane that might affect receptor dynamics (see our response to point 2 of reviewer #3).

      (2) In the case of the WHIM mutant CXCR4-R334X, the addition of "empty VLPs" did not cause a significant change in the diffusivity of CXCR4-R334X (Figure 4B). This result is in contrast with the addition of empty VLPs to WT CXCR4. However, the authors neither mention nor comment on that result in the results section. Please mention the result in the paper and comment on it in relation to the addition of empty VLPs to WT CXCR4.

      We would remark that the main observation in these experiments should focus on the effect of gp120-VLPs, and the results indicate that gp120-VLPs promoted clustering of CXCR4 and of CXCR4<sup>R334X</sup> and reduced their diffusion at the cell membrane. The Env(-) VLPs were included as a negative control in the experiments, to compare the data with those obtained using gp120-VLPs. However, once we observed some residual effect of the Env(-) VLPs, we decided to give a potential explanation, formulated as a hypothesis, that the Env(-) VLPs modulated membrane fluidity. We have now performed a RICS analysis using Di-4-ANEPPDHQ as a lipid probe (Author response image 8). The results suggest that Env(-) VLPs do not modulate cell membrane fluidity, although we do not rule out other potential interactions with membrane proteins that might alter receptor dynamics. We appreciate the reviewer’s observation and agree that this result can be noted. However, since the main purpose of Figure 4B is to show that gp120-VLPs modulate the dynamics of CXCR4<sup>R334X</sup> rather than to remark that the Env(-) VLPs also have some effects, we consider that a detailed discussion of this specific aspect would detract from the central finding and may dilute the primary narrative of the study.

      Minor comments

      (1) It would be helpful for the reader to combine thematically or experimentally linked figures, e.g., Figures 3 and 4.

      (2) Figures 3 and 4 are very similar. Please unify the colours in them and the order of the panels (e.g. Figure 3 panel A shows diffusivity of CXCR4, while Figure 4 panel A shows MSI of CXCR4-R334X).

      While we considered consolidating Figures 3 and 4, we believe that maintaining them as separate entities enhances conceptual clarity. Since Figure 3 establishes the baseline dynamics for wildtype CXCR4 and Figure 4 details the distinct behavior of the CXCR4<sup>R334X</sup> mutant, keeping them separate allows the reader to fully appreciate the specificities of each system before making a cross-comparison.

      (3) Some parts of the Discussion section could be shortened, moved to the Introduction (e.g., lines 648-651), or entirely removed (e.g., lines 633-635 about GPCRs).

      In accordance, the Discussion section has been reorganized and shortened to improve clarity.

      (4) I suggest renaming "empty VLPs" to "Env(−) VLPs" (or similar). The name empty VLPs can mislead the reader into thinking that these are empty vesicles.

      The term empty VLPs has been renamed to Env(−) VLPs throughout the manuscript to more accurately reflect their composition. Many thanks for this suggestion.

      (5) Line 492 - please rephrase "...lower expression of Env..." to "...lower expression of Env or its incorporation into the VLPs...".

      The sentence has been rephrased.

      (6) Line 527 - The data on CXCL12 modulating CXCR4-R334X dynamics and clustering are not present in Figure 4 (or any other Figure). Please add them or rephrase the sentence with an appropriate reference. Make clear which results are yours.

      (7) Line 532 - Do the data in the paper really support a model in which CXCL12 binds to CXCR4-R334X? If not, please rephrase with an appropriate reference.

      Previous studies support the association of CXCL12 with CXCR4<sup>R334X</sup> (Balabanian et al. Blood, 2005; Hernandez et al. Nat Genet., 2003; Busillo & Benovic Biochim. Biophys. Acta, 2007). In fact, this receptor has been characterized as a gain-of-function variant for this ligand (McDermott et al. J. Cell. Mol. Med., 2011). The revised manuscript now includes these bibliographic references to support this commentary. In any case, our previous data indicate that CXCL12 binding does not affect CXCR4<sup>R334X</sup> dynamics (García-Cuesta et al. PNAS, 2022).

      (8) Line 695 - "...lipid rafts during HIV-1 (missing word?) and their ability to..." During what?

      Many thanks for catching this mistake. The sentence now reads: “Although direct evidence for the internalization of CD4 and CXCR4 as complexes is lacking, their co-localization in lipid rafts during HIV-1 infection (97–99) and their ability to form heterocomplexes (22) strongly suggest they could be endocytosed together.”

    1. Author Response:

      We sincerely thank the reviewers for their insightful and constructive suggestions on our manuscript. We are encouraged by the positive recognition of our study’s conceptual significance, particularly the involvement of the mushroom body (MB) in nociceptive escape behavior and the utility of our ALTOMS behavioral platform.

      We fully agree with the reviewers’ assessments and have initiated several key revisions, additional experiments, and analytical refinements to strengthen the study.

      Below is a summary of our planned improvements:

      1. Experimental Revisions and Scope Expansion

      To address concerns regarding potential developmental compensation (Reviewers 1 and 2), we are performing new experiments using temporally precise manipulation tools to confirm the acute necessity of the identified circuits. Additionally, responding to Reviewer 3, we are conducting further behavioral assays to include necessary genetic controls (e.g., split-GAL4-only lines) and expanding our screen to cover all major MBON and DAN compartments using standardized lines to ensure a comprehensive functional map.

      2. Analytical Refinements and Methodological Transparency

      We are revising our quantitative and anatomical reporting to address several technical suggestions from all three reviewers. Specifically, we will implement a weighted “Behavioral Potency Level” that accounts for driver-specific expression intensity and specificity. Anatomical clarity will be enhanced by providing presynaptic expression patterns alongside trans-Tango signals and a neuron-centric data model for Figure 5. Furthermore, the Materials and Methods will be updated to explicitly detail habituation protocols, stimulation timing, and sample sizes, while incorporating a more nuanced discussion on the limitations of the tracing systems.

      We believe these revisions will significantly enhance the rigor and clarity of our manuscript. We look forward to submitting the revised version upon completion of these supplementary tasks.

    1. Author response:

      eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

      We thank the eLife team for taking the time to consider and assess our manuscript. Please find below our provisional author responses accompanying the first version of the Reviewed Preprint.

      We would like to clarify an important error regarding the battery model reported in the manuscript. We mistakenly referred to the CP1254-A3 (1.8 g), whereas the battery used for all devices is the CP9440 A4X (0.8 g).

      Importantly, this correction reduces the total device weight by approximately 1 g compared to the value assumed by Reviewer #3. We believe this directly addresses the concern raised regarding battery weight in both the individual review and the overall eLife assessment.

      We will correct this error in the revised manuscript and clearly report the exact battery model and total device weight.

      For reference, the official VARTA CoinPower catalog is available here:

      https://www.varta-ag.com/fileadmin/varta/industry/downloads/products/lithium-ion-cells/VARTA_CoinPower_EN_digital_221124_A5_6p.pdf

      The battery used in BlueBerry is listed on the last line of page 2.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless stimulation in a closed loop based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice along with some device validation results.

      We thank the reviewer for the statement.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

      We thank the reviewer for these valuable comments. We did consider implementing programmable control of stimulation power, for example using a digital potentiometer. However, in our current design this approach was not sufficient because the output current supported by typical digital potentiometers is too low for the high-power LEDs used in our system. For this reason, we did not include programmable stimulation current in the present version. We agree that this is a limitation and that further work is needed to identify a suitable solution for adjustable stimulation power, which we plan to pursue in future versions of the device. We will revise the manuscript to make this limitation and future direction clearer.

      We also agree that the use of a battery-powered wireless system introduces an important trade-off. We will revise the manuscript to discuss this limitation more explicitly.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the bluetooth enabled head-mounted devices - they also have built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments, a continuous Y-maze, which elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While authors claim that the ~2m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      We appreciate the reviewer’s enthusiasm. Thank you.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      We thank the reviewer for this important suggestion. We agree that this is a missing characterization in the current manuscript. In the revised version, we will include a more detailed analysis of the system’s power budget, including the maximum stimulation power supported by the BlueBerry device, the corresponding output currents, and the contribution of the main integrated circuits to overall current consumption.
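      As a rough illustration of the kind of power budget discussed here, the following minimal sketch estimates runtime from battery capacity and average current draw, where the average current is a baseline (microcontroller and radio) plus the LED current scaled by the stimulation duty cycle. The function name and every numeric value are illustrative assumptions, not measurements from the BlueBerry device or values from the battery datasheet.

```python
def estimated_runtime_min(capacity_mah, base_current_ma, led_current_ma, duty_cycle):
    """Estimate runtime in minutes for a simple two-term power budget.

    All arguments are hypothetical inputs chosen by the user:
      capacity_mah    -- usable battery capacity in mAh
      base_current_ma -- average current of everything except the LED, in mA
      led_current_ma  -- LED current while the LED is on, in mA
      duty_cycle      -- fraction of time the LED is on, between 0 and 1
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    # Average current is the baseline plus the LED current weighted by on-time.
    average_current_ma = base_current_ma + duty_cycle * led_current_ma
    if average_current_ma <= 0:
        raise ValueError("average current must be positive")
    # mAh divided by mA gives hours; convert to minutes.
    return capacity_mah / average_current_ma * 60.0


if __name__ == "__main__":
    # Illustrative numbers only: compare a sparse and a dense stimulation protocol.
    for duty in (0.1, 0.5, 1.0):
        print(duty, round(estimated_runtime_min(20.0, 5.0, 20.0, duty), 1))
```

      A model of this form ignores battery voltage sag and the dependence of radio current on message rate, which is why a measured characterization remains necessary.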

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      In the revised manuscript, we will clarify more explicitly that latency is a key limitation of the current system. We will also further investigate the source of this latency, including whether it can be reduced through additional configuration changes. In addition, we will include comparative latency measurements using different Arduino modules as the central BLE controller for the BlueHub device.
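      To make the polling hypothesis raised by the reviewer concrete, the sketch below simulates a receiver that services commands only at fixed polling ticks, with a constant processing overhead added to every command. Under this assumed model, latency can never fall below the fixed overhead and is spread over one polling interval above it. The model, the function name, and the parameter values are assumptions for illustration; they are not the measured behavior of the BlueHub or BlueBerry devices.

```python
import random


def simulate_latencies(poll_interval_ms, fixed_overhead_ms, n_trials, seed=0):
    """Simulate command latencies for a receiver that polls at fixed intervals.

    Hypothetical model: a command is issued at a uniformly random phase within
    one polling period, waits until the next polling tick, and then incurs a
    constant overhead (transmission plus processing).
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_trials):
        # Time already elapsed in the current polling period when the command is issued.
        phase_ms = rng.uniform(0.0, poll_interval_ms)
        # Remaining time until the receiver polls again.
        wait_ms = poll_interval_ms - phase_ms
        latencies.append(fixed_overhead_ms + wait_ms)
    return latencies


if __name__ == "__main__":
    # Illustrative values only: 10 ms polling interval, 12 ms fixed overhead.
    samples = simulate_latencies(10.0, 12.0, 10000)
    print(min(samples), max(samples), sum(samples) / len(samples))
```

      If measured latencies cluster in discrete blocks rather than spreading uniformly, that would suggest more than one periodic stage in the chain, which is one reason to compare different central controllers.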

      The programming process sounds quite complicated. It would be nice if they had OTA updates. But described and open source. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      We will investigate this matter and provide a simpler installation and configuration script to set up both the BlueHub and BlueBerry systems.

      It is unclear what the maximum number of devices that could be used without wireless interference is. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      Due to the current structure of the ArduinoBLE library used in BlueHub devices, each BlueHub unit can support active communication with a maximum of 3 BlueBerry units. We thank the reviewer for highlighting this point and will clarify it in the next version of the paper.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

      In the revised submission, we will deposit all code used to program both the BlueHub and BlueBerry devices, together with the Gerber files required for PCB fabrication, alongside the paper.

      Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the Blueberry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the Blueberry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was especially designed to be integrated with existing software used for closed-loop neuroscience behavioral experiments, validation of the system is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice for navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics among the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating Blueberry in closed-loop setups.

      We thank the reviewer for the statement.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc) are included only in the supplementary materials.

      As the reviewer correctly notes, these details are currently in the supplementary materials; in the revised manuscript, we will move the necessary quantifications to the main text.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are lacking sufficient detail.

      We will describe this integration in more detail in the revised manuscript.

      (3) System range (3 meters reported) is limited for a BLE device.

      The reported range is the distance over which we considered communication to be reliable. In the revised manuscript, we will quantify this by reporting the Received Signal Strength (RSS) value for multiple BlueBerry devices across varying distances.
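      For orientation, received signal strength as a function of distance is often approximated with the standard log-distance path-loss model, RSS(d) = RSS(d0) - 10 n log10(d / d0). The sketch below evaluates that model; the reference RSS, the path-loss exponent n, and the function name are illustrative assumptions and not values measured for the BlueBerry device.

```python
import math


def expected_rss_dbm(distance_m, rss_at_ref_dbm, path_loss_exponent, ref_distance_m=1.0):
    """Log-distance path-loss estimate of received signal strength in dBm.

    rss_at_ref_dbm and path_loss_exponent are hypothetical inputs that would
    normally be fitted to RSS measurements taken in the actual arena.
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    # Signal strength drops by 10 * n dB for every tenfold increase in distance.
    return rss_at_ref_dbm - 10.0 * path_loss_exponent * math.log10(distance_m / ref_distance_m)


if __name__ == "__main__":
    # Illustrative values only: -50 dBm at 1 m, free-space-like exponent of 2.
    for d in (0.5, 1.0, 2.0, 3.0):
        print(d, round(expected_rss_dbm(d, -50.0, 2.0), 1))
```

      Fitting the exponent to measured RSS values at several distances would give a more defensible estimate of the reliable range than a single cutoff distance.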

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      That is indeed a limitation of our system. We will investigate the feasibility of integrating programmable stimulation protocols in an updated version of the BlueBerry device.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      We thank the reviewer for this comment. In the revised manuscript, we will clarify that the thermal measurements reported here apply only to the specific superficial implant geometry and stimulation conditions used in this study. Because tissue heating depends strongly on implant design and on parameters such as optical power, pulse width, and stimulation frequency, a general safety statement cannot be made for all possible implant configurations. Since the primary goal of this work is to present the wireless device platform rather than to validate a particular implant design, thermal safety should be evaluated individually for each implant and stimulation paradigm.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

      In the revised manuscript, we will include a comparison table summarizing our system alongside currently available wireless optogenetic devices.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open and closed state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. C-domains form the interacting interface in the dimeric protein while the two N-domains are positioned on the outer sides of the core formed by C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of N-domains relative to the C-domains. The authors have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography and enzymatic assays to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion amongst the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated using cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction at 5.1 Å resolution, obtained using time-resolved cryo-EM experiments, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the groundwork for future exploration of protease structure-function relationships.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

      We are grateful for the dedication and constructive feedback provided by the editors and reviewers. We have revised our manuscript according to the suggestions by both reviewers.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The new version of the manuscript reads exceedingly well and the corrections the authors have made during their revision made the manuscript much easier to read and digest than the first version. Below are minor details that may be corrected:

      Abstract:

      Line 45-47: "IDE is known to transition between a closed state, poised for catalysis, and an open state, able to release cleavage products and bind a new substrate." (consider adding a)

      Fixed

      Line 48-50: "Combining cryo-EM heterogeneity analysis with all-atom molecular dynamics (MD) simulations, we identified the structural basis and key residues for IDE conformational dynamics that were not previously revealed by IDE static structures." (consider adding previously)

      Changed

      Line 52-54: "Our small-angle X-ray scattering analysis and enzymatic assays of an R668A mutant indicate a profound alteration of conformational dynamics and catalytic activity." (consider adding analysis)

      Changed

      Line 54: Consider leaving out "Upside" in the abstract (to avoid confusion when reading the abstract) and leave it to be introduced in the introduction when Upside MD simulations are first mentioned.

      Changed

      Results:

      Figure 5D: There seems to be an error in the legend for Figure 5D. It says "... presence of varying amounts of insulin", but this must be Aβ1-40. Please add info on whether the replicates are technical or biological.

      The legend has been revised as suggested.

      Line 125: Consider switching the order of "here" and "we"

      “here” has been removed.

      Line 128: Replace "5" with "five"

      Changed

      Line 137: Replace "when insulin is present" with "in the presence of insulin"

      Changed

      Line 228: Replace "5" and "6" with "five " and "six"

      Changed

      Line 229: Consider adding the word "form": "First, the open subunits did not close to form a singular structure."

      We have adjusted the sentence to read “close to a singular consensus structure”

      Line 327: Replace "2" with "two"

      Changed

      Line 276: Consider replacing "Conversely" with a more suitable connecting term as it implies that the observation presented in the two sentences are reverse or rephrase what is being compared. Is it the fact there is a dose dependency or not between the substrates or is it the actual kinetic parameters that are described. I just don't think conversely is fair with the current formulation as "the R668A mutant did not exhibit a dose-dependent response to the presence of Aβ" not that the Ki is reduced for WT compared to the R668A construct when looking at Aβ.

      The connecting term has been removed completely, beginning the sentence with “When Abeta…”

      Line 359: Replace "6" with "six"

      Changed

      Consider getting rid of possessive apostrophes to keep a formal tone, e.g. lines 211 (cryoSPARC's), 259 (IDE's) and 382 (IDE's). Exception to this is Alzheimer's disease.

      All instances of possessive apostrophes, aside from Alzheimer’s, have been replaced with more formal wording.

      Figure 7 supplement 1: The color scheme for the local resolution is missing the unit (Å).

      This has been corrected.

      Finally, the supplementary videos illustrating IDE conformational dynamics are difficult to interpret and somewhat redundant in their current form. The transitions occur very rapidly, making it hard to appreciate the described motions, and the uniform coloring of IDE further limits visual clarity. I apologize for not including this point in my initial review. I recommend either removing the videos or re-rendering them to improve interpretability, for example by slowing down the motion and applying the same domain color scheme introduced in the new Figure 1 (and used in the MD trajectory video). This would greatly aid readers in connecting the descriptions in the text to the visual representations in the movies.

      Figure 3 videos 1-4 were slowed down, simplified, and recolored to improve clarity.

      Reviewer #2 (Recommendations for the authors):

      Comments after first revision for authors:

      Thanks a ton to the authors for the detailed explanation on my comments. I believe the discussions will help a large group of audience, especially the non-experts. Please address the minor comment below:

      Minor comment:

      Please update Supplementary file 1 (Cryo-EM data collection, refinement, and validation statistics) regarding the new volume obtained by time-resolved cryo-EM. Kindly also check line 47 in the abstract: "Here, we present five cryo-EM structures" , which may need an update (six structures and resolution 3.0-5.1 Å) or rephrase the sentences accordingly. If similar instances are found in the manuscript, where list of all the structures are mentioned together, please update accordingly if necessary.

      The cryo-EM statistics for the time-resolved cryo-EM dataset are shown in Supplementary file 2 to differentiate the two datasets. The abstract has been changed, as has line 149.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between a male or female human. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

      They report on participant ages and sexes, but not on ethnicity, race, nor provide information about HLA typing of individuals. The experiments are heroic, yet do represent a relatively small sampling of diverse humans. They observed no differences in TCRbeta or TCRalpha usage, combinational diversity, or differences in the length of the CDR3 region, or amino acid usage in the CD3aa region between males or females. Though they observed some TCRbeta CD3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also compared TCRbeta sequences against those identified in the past databases using computational approaches to recognize cancer-, bacterial-, viral-, or autoimmune-antigens. They found little overlap of their sequences with these annotated sequences (depending on the individual, ranged from 0.82-3.58% of sequences). Within the sequences that were in overlap, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a generalizable finding in the human population.

      Strengths:

      It is a novel dataset that attempts to understand sex differences in the T cell repertoire in humans. Overall, the methodologies are sound and are the current state-of-the-art. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females. This is an important negative result.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. This reviewer recognizes the difficulty in obtaining samples for this experiment (which were from deceased donors), and this limitation was appropriately discussed. Their analysis was limited by the current availability of other TCR sequences. These weaknesses were appropriately discussed and considered.

      We thank this reviewer for his appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues. In particular, the majority of "autoimmunity-related TCRs" considered in this study are in fact specific to type 1 diabetes (T1D). Notably, T1D incidence is higher in males, which directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. Given this conceptual inconsistency, the evidence presented does not support the authors' conclusions.

      We disagree with the reviewer’s assertion that our findings create a conceptual inconsistency.

      Autoimmune diseases are multifactorial conditions in which multiple biological layers, including thymic selection, peripheral immune regulation, hormonal effects, environmental exposures, and tissue-specific vulnerability, contribute to disease incidence. These layers may influence sex ratios in different directions. Therefore, observing a higher frequency of TCRs annotated as T1D-associated in females does not imply that T1D incidence must also be higher in females.

      In fact, T1D incidence itself is not uniformly male-biased worldwide. Epidemiological analyses (reviewed in Qu and Hakonarson, Diabetes Obes Metab 2025) show that male predominance is mainly observed in high-incidence Northern European populations, whereas in several lower-incidence regions, including parts of East Asia and Africa, the sex ratio is balanced or even female-biased. Furthermore, another recent study highlights that T1D incidence and prevalence in women and men vary depending on the study period (PMC12544016).

      This heterogeneity indicates that disease incidence reflects context-dependent interactions between genetic load, environmental exposures, and sex-specific biological modifiers. Moreover, biological sex acts as a dynamic modifier of genetic risk and immune function in T1D, influencing central tolerance, peripheral immune activation, and β-cell intrinsic resilience (reviewed in Qu and Hakonarson, 2025). Experimental models further demonstrate estrogen-mediated protection of pancreatic β-cells (Kim et al., Biochem Biophys Res Commun 2025), indicating that disease incidence reflects the integration of immune, hormonal, and tissue-specific layers rather than central autoreactive TCR release alone. Sex hormones may exert distinct and sometimes opposing effects on thymic selection and on target-organ vulnerability, while environmental factors such as vitamin D status, infections, and microbiota composition further shape disease expression.

      Importantly, our study does not claim causality, nor does it aim to predict the epidemiology of any specific autoimmune disease. Our conclusions are limited to the observation that sex-dependent differences exist in thymic TCR selection.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We agree with the reviewer’s comment. As already stated in the previous revision and the "Data Availability" section of the manuscript, all raw sequencing data have been deposited and are publicly available on NCBI (BioProject PRJNA1379632): https://www.ncbi.nlm.nih.gov/sra/PRJNA1379632.

      Weaknesses:

      I thank the authors for their detailed responses to my previous comments. Several concerns were addressed satisfactorily; however, important issues remain unresolved, and a new major concern has emerged from the revised manuscript.

      Major concerns:

      (1) Autoimmune specificity is dominated by T1D, contradicting the study's premise. Newly added supplementary Table 3 shows that the authors considered only 14 autoimmune-related epitopes, of which 12 are associated with type 1 diabetes (T1D) and 2 with celiac disease (CeD). (I guess this is because identification of particular peptide autoantigens is an extremely difficult task and was only successful in T1D and CeD.) Thus conclusions of this work mostly relate to T1D. However, the incidence of T1D is higher in males than in females (e.g. doi:10.1111/j.1365-2796.2007.01896.x; doi:10.25646/11439.2). This directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. As a result, the authors' conclusions (a) cannot be generalized to autoimmune disease as a whole as the authors only considered T1D and CeD antigens and (b) are internally inconsistent with the stated objective of the study.

      (2) By contrast, CeD does show a female bias (~60/40 female/male; doi: 10.1016/j.cgh.2018.11.013). However, the manuscript does not allow evaluation of how much the reported "autoimmune TCR enrichment" derives from T1D versus CeD. Despite my previous request, the authors did not provide per-donor and per-epitope distributions of autoimmune-specific TCR matches. I therefore explicitly request a table in which: each row corresponds to a specific autoimmune antigen; each column corresponds to a donor (with metadata available including sex); each cell reports the number of unique TCRs specific to that antigen in that donor. Without such data, the conclusions cannot be evaluated.

      (3) It is scientifically inappropriate to generalize findings to "autoimmune diseases" when only T1D and CeD were analyzed. Moreover, given that T1D and CeD show opposite directions of sex bias, combining them into a single "AID" category is misleading. All analyses presented in Figure 8 and Supplementary Figure 16 should be repeated and shown separately for T1D and CeD, rather than combined.

      We acknowledge that currently available antigen-annotated TCR databases remain limited. This reflects the considerable experimental difficulty of defining TCRs’ antigen specificities and is a widely recognized limitation in the field.

      In the curated database used here, the autoimmune-associated entries correspond primarily to type 1 diabetes (T1D) and celiac disease (CeD), two autoimmune contexts for which antigen-specific TCRs have been experimentally characterized. However, focusing on the number of antigens alone does not accurately reflect the breadth of the dataset.

      Specifically, our analysis is based on 48 epitopes and nearly 200 annotated TRB sequences, providing substantially broader antigenic representation than suggested by antigen count alone.

      Author response table 1.

      Importantly, our analytical framework does not attempt to interpret each epitope specificity individually. Instead, we examine whether TCRs annotated as autoimmune-associated are differentially represented between sexes at the level of thymic selection.

      In our dataset we observe a stronger CD8⁺ thymic selection of TCRs annotated as autoimmune-associated in females. We interpret this as evidence that central tolerance mechanisms may contribute to sex-dependent differences in autoreactive repertoire composition, rather than as a determinant of any specific autoimmune disease pathophysiology.
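
      The per-donor, per-epitope table requested in comment (2) could be assembled along the following lines. This is only a minimal sketch, not the authors' pipeline: the function name, the input layout (one `(donor_id, antigen, cdr3)` tuple per database match), and the donor and antigen labels in the usage note are all hypothetical.

```python
from collections import defaultdict


def antigen_by_donor_table(matches, donors):
    """Count unique TCRs per antigen (row) and donor (column).

    matches: iterable of (donor_id, antigen, cdr3) tuples, one per match
             between a donor repertoire and the annotated database
             (hypothetical input layout).
    donors:  list of all donor ids, so donors without any match still
             appear as a column filled with zeros.
    """
    unique = defaultdict(set)
    for donor_id, antigen, cdr3 in matches:
        # A set collapses repeated observations of the same CDR3.
        unique[(antigen, donor_id)].add(cdr3)

    antigens = sorted({antigen for antigen, _ in unique})
    return {
        antigen: {d: len(unique.get((antigen, d), ())) for d in donors}
        for antigen in antigens
    }
```

      Calling `antigen_by_donor_table(matches, ["F1", "M1"])` returns a nested dictionary keyed first by antigen and then by donor; donor sex and disease (T1D or CeD) can then be attached as column and row metadata.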

      (4) The McPAS database contains TCRs associated with other autoimmune diseases (e.g., multiple sclerosis, rheumatoid arthritis), although the exact autoantigens in these contexts are unknown. Why didn't the authors perform the search for such TCRs? I believe disease association even without particular known antigen could still be insightful.

      For multiple sclerosis, the only antigen present in the database is myelin basic protein (MBP). In our thymic repertoire dataset, we could not detect any CDR3 sequence matching the MBP-annotated CDR3s from the database.

      For rheumatoid arthritis, the database contains only a small number of TRA sequences without corresponding TRB chains. Because our specificity analysis is based on TRBs, these entries could not be used in our analyses.

      (5) Misuse of the concept of polyspecificity. I appreciate the authors' reference to Don Mason's work; however, the concept of polyspecificity discussed there is fundamentally different from the authors' usage. Mason, Sewell (doi:10.1074/jbc.M111.289488), Garcia (doi:10.1016/j.cell.2014.03.047), and others demonstrated that individual TCRs can recognize multiple peptides, possibly around 1 million. Importantly, however, these peptides are not random but share a sequence motif. This is a general feature of TCRs, i.e. 100% of TCRs are polyspecific in this sense.

      In contrast, the authors define polyspecificity as TRB sequences annotated as specific to unrelated epitopes in TCR databases such as VDJdb. These databases are well known to contain substantial numbers of false-positive annotations (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract). The authors acknowledge that, under their definition, polyspecificity has been experimentally validated for only one (!) TCR (Quiniou et al.). In the absence of robust experimental validation, use of the term "polyspecificity" in this context is misleading. I strongly recommend removing all analyses and conclusions related to polyspecificity from the manuscript unless supported by independent functional validation.

      We agree with the reviewer that the concept of TCR polyspecificity is complex, controversial and not uniformly defined in the literature.

      For some, polyspecificity refers to the ability of individual TCRs to recognize multiple related peptides sharing structural motifs, as described by Mason, Sewell, Garcia, and others. With this definition, we agree that many/most TCRs exhibit some degree of cross-reactivity and would thus be defined as polyspecific.

      In contrast, our definition of polyspecificity arose from our observation, in large-scale repertoire analyses, that certain CDR3 sequences are repeatedly annotated across databases as recognizing distinct and unrelated antigenic categories. In our previous study (Quiniou et al.), we showed that these sequences display specific biochemical and repertoire features and may represent a particular class of TCRs involved in early or heterologous immune responses. Classic cross-reactivity based on shared structural motifs could not explain these results.

      We believe that the existence of such TCRs, rather than classic cross-reactive TCRs, has the potential to better explain why patients with extremely reduced TCR repertoires (around 3000 TCRs only) can respond well to various infectious challenges (https://doi.org/10.1073/pnas.97.1.274) or why there are T cells with memory phenotypes against viruses not previously encountered (https://pmc.ncbi.nlm.nih.gov/articles/PMC3626102/). We acknowledge that direct experimental validation of the function of such TCRs is currently limited; further work will help clarify the notion of polyspecificity and, we hope, improve understanding of the overlooked “heterologous immunity”.

      Of note, a recent paper in Nature Machine Intelligence (https://doi.org/10.1038/s42256-025-01096-6) described the in-silico generation of antigen-specific TCRs. Using our definition of polyspecificity (TCRs with higher generation probabilities, specific V/J gene preferences, shared CDR3s across individuals, and reactivity to multiple unrelated peptides), they showed that “multitask models preferentially sample polyspecific CDR3β sequences”. Therefore, we consider the debate on polyspecificity to be ongoing, and our discussion of polyspecificity in this paper to be part of this debate.

      (6) I agree that comparing specificity enrichment between sexes is meaningful. However, enrichment relative to the database composition itself is not biologically interpretable, as acknowledged by the authors in their response. I therefore recommend removing Supplementary Figure 15, which is potentially misleading.

      In the original manuscript, the comparison to the pooled database was intended as a descriptive assessment rather than as a biological enrichment analysis. Differences between an experimental thymic repertoire and a curated reference database are expected, given the structure and annotation biases inherent to the reference resource.

      The purpose of Supplementary Figures 15B and 15C was therefore twofold: (i) to provide a descriptive overview of how specificity categories are distributed in our thymic dataset relative to the curated database, and (ii) to evaluate whether deviations from database proportions were of similar magnitude in males and females, ensuring that database composition did not differentially bias one sex over the other. In addition, the donor-resolved representations demonstrate that these patterns are consistent across individuals and are not driven by a single donor.

      To avoid any potential misinterpretation, we have revised the manuscript to remove references to “enrichment” relative to database composition and eliminated quantitative comparisons to baseline database frequencies. The corresponding text and figure legends have been clarified to indicate that these analyses are descriptive and methodological in nature, while all biological interpretations rely exclusively on direct sex-specific comparisons within the thymic dataset.

      (7) In contrast, Supplementary Figure 16 represents the most convincing result of the study (keeping in mind that the AID group should be split into T1D and CeD, and that T1D and CeD have opposing directions of sex bias) and should be shown as a main figure, replacing Figure 8A-B, which is less convincing as it doesn't show the per-donor distribution.

      (8) The authors argue that applying mixed-effects modeling to Rényi entropy would require assuming a common sex effect across subsets. I do not find this assumption unreasonable. For example, if sex effects are mediated through AIRE-dependent negative selection, one would indeed expect a consistent direction of effect across subsets. The lack of statistical significance in Figure 3 may reflect limited sample size rather than true absence of the difference. Moreover, the title's phrasing "comparable TCR repertoire diversity" is vague: what is the statistical definition of "comparable"?

      The use of “comparable” to describe TCR repertoire diversity is indeed “soft” and was intended to indicate that there are no obvious dissimilarities.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Available HLA typing data for selected donors should be included as a table in the manuscript.

      The available low-resolution HLA typing data for the donors included in this study have been compiled and added as Supplementary Table 1 in the revised manuscript.

      (2) The authors' explanation for why external validation of gene usage biases was not possible should be concisely incorporated into the Discussion.

      We have incorporated a concise explanation in the Discussion clarifying why independent validation of the TRBV6-5 bias in external thymic datasets is currently not feasible, due to the absence of publicly available cohorts combining sorted thymic subsets, balanced sex representation, and sufficient sequencing depth.

      (3) The clarification that the sex-specific motifs considered are public should be included explicitly in the main text, not only in the figure legend and Methods.

      We now explicitly state in the main Results section that only public motifs, defined as motifs containing CDR3 sequences shared by at least two individuals, were retained in the analysis.

      (4) The statement "Thymocytes expressing TCRs with insufficient or excessive avidity are eliminated (negative selection)" is strictly speaking incorrect. Thymocytes with insufficient avidity are eliminated by death by neglect during positive selection.

      We thank the reviewer for pointing out this imprecision. The statement has been corrected.

      (5) Figure 8C is unclear - what does "80% of unique polyspecific TCRs" mean? In any case, I strongly recommend removal of all polyspecificity-related analyses.

      We apologize for the lack of clarity in the axis label of Figure 8C. To clarify, this analysis represents the proportion of polyspecific CDR3aa sequences among all sequences with an assigned specificity within an individual’s repertoire. Specifically, it measures how many unique TCR sequences, previously identified as having a known specificity in reference databases, are also categorized as polyspecific.

      To address the reviewer’s concern, we have updated the Y-axis label of Figure 8C to: "Proportion of polyspecific CDR3aa among antigen-specific sequences (%)".
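
      The quantity plotted in Figure 8C, as described above, can be illustrated with a short sketch. It assumes that each donor's repertoire and the two reference lists are available as collections of CDR3aa strings; the function and argument names are hypothetical and do not come from the manuscript.

```python
def polyspecific_percentage(repertoire, specific, polyspecific):
    """Percentage of polyspecific CDR3aa among antigen-annotated sequences.

    repertoire:   CDR3aa sequences observed in one donor (repeats allowed).
    specific:     CDR3aa sequences with any assigned specificity in the
                  reference databases.
    polyspecific: the subset of annotated sequences classified as
                  polyspecific.
    Returns None when the donor has no annotated sequence at all.
    """
    # Work on unique sequences, so clone size does not weight the result.
    annotated = set(repertoire) & set(specific)
    if not annotated:
        return None
    return 100.0 * len(annotated & set(polyspecific)) / len(annotated)
```

      Under this reading, a value of 80% means that 80 of every 100 unique antigen-annotated CDR3aa sequences in a donor also carry the polyspecific label.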

      (6) "However, no significant sex-based differences were found in the usage of hydrophobic, hydrophilic, or neutral aa at the critical p109 and p110 positions in TRB" - this Discussion statement is inconsistent with the new analysis on Fig. 4C.

      We regret that the Discussion still contained wording from a previous version of the analysis. The text has now been corrected to reflect the updated results showing a significant increase in hydrophobic amino acid usage at positions p109/p110.

      (7) In the Discussion the authors write: "the absence of age-related clustering in repertoire features (data not shown)". What is the reasoning for not showing the data?

      We understand the reviewer's point. This exploratory clustering analysis was performed on the data presented in the heatmaps (Figure 2B and Supplemental Figures 10-13). However, as it revealed no distinct patterns or clustering based on the donors' age (with samples from different age groups being interspersed throughout the clusters), we chose not to add an extra layer of annotation to Figure 2B to maintain clarity.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and, interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      (a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      We agree with the reviewer that both factors may contribute to the difference in lifespan effect. In the original paper presenting flySAM1.0 and flySAM2.0 (Jia et al., 2018), the authors first tested how flySAM1.0 overexpression (OE) phenotypes compare to several VPR (CRISPRa) and UAS:cDNA OE lines. They found that flySAM1.0 reliably outperforms VPR (i.e., produces stronger OE phenotypes) in most cases, and produces OE phenotypes that are comparable (i.e., generally equivalent) to UAS:cDNA (Jia et al., 2018). After determining how flySAM1.0 performance compares to VPR and UAS:cDNA, the authors next tested if flySAM2.0 also outperforms VPR; they found that, like flySAM1.0, flySAM2.0 outperforms VPR in most cases (Jia et al., 2018). In general, the data suggest that we should expect comparable overexpression phenotypes for our flySAM2.0 and UAS:cDNA lines.

      We chose to proceed with the DIP-β flySAM line for the climbing assays and snRNA-seq, as it gave a stronger lifespan effect and we thought it was likely to be the more robust OE line. While our glial cell-surface proteomics initially identified DIP-β isoform C as the candidate, it is possible that other DIP-β isoforms were also present (such as isoform F, which is identical in polypeptide sequence to isoform C) (FlyBase). Ultimately, we believe that the larger increases in lifespan observed for DIP-β flySAM are likely because flySAM targets all isoforms, whereas UAS:cDNA lines target only one isoform. Importantly, our UAS-DIP-β line was specific to DIP-β isoform C, which is the same isoform that was identified by our proteomics.

      We have made clarifications in the manuscript to address these comments.

      (b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      This is a good point. The flySAM lines do exhibit a shorter baseline lifespan compared to the traditional UAS lines. This is likely due to the specific genetic background of the flySAM transgenic insertions, or a low level of "leaky" expression, as previously noted in the literature (Jia et al., 2018).

      However, we believe that the lifespan extension we observed for DIP-β flySAM is a robust biological effect rather than an artifact of reduced viability, for the following reasons. First, by utilizing the GeneSwitch (GS) system, we can compare the lifespan of flies with the exact same genetic background (+/- RU-486). This ensures that the extension we report is specifically due to the induction of the transgene, rather than a comparison between disparate lines with different basal fitness levels. Second, if the lifespan extensions merely represented a recovery from lower baseline viability, we would expect to see similar improvements across other flySAM lines in our screen. However, DIP-β was the only candidate across our screen that significantly increased lifespan in both sexes (Extended Data Figs. 7 & 8). Third, the lifespan-extending effect of DIP-β was independently confirmed using a traditional UAS-cDNA line, which importantly does not share the same baseline viability issues as the flySAM lines.

      (c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      We have updated the legend of Figure 2 to include the missing statistical details and sample sizes.

      Specifically, for Fig. 2A: The reviewer is correct that with only two replicates of each time point (5d vs. 50d) in the initial proteomic screen, traditional p-value calculations lack the necessary power for meaningful interpretation. We have revised the legend to clarify that this panel represents a discovery-based screen. Candidates were selected based on biological relevance and specific enrichment thresholds to narrow the 872 proteins down to the 48 top candidates for screening (we were initially aiming to identify approximately 50 candidate genes for screening). For Fig. 2B: We have updated the legend to detail the parameters used for the Gene Ontology (GO) enrichment analysis.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      We thank the reviewer for this careful observation regarding the expression pattern of the GSG3285-1 line and acknowledge that the overlap between this driver and the Repo-positive cells is not absolute.

      Our selection of this specific GeneSwitch line was based on several critical experimental considerations: 1) To minimize background toxicity. We initially tested multiple Repo-GeneSwitch lines; however, we found they exhibited significant, genotype-dependent lifespan reductions upon RU486 administration, even in control crosses. This baseline toxicity confounded the interpretation of any potential lifespan effects. GSG3285-1 was chosen for this study, as it provided a robust control baseline and did not show lifespan effects with RU486 treatment in multiple control lines. This is essential for lifespan studies. 2) The driver breadth and specificity. As noted in its original characterization (Nicholson et al., 2008) and a later study (Catterson et al. 2023), GSG3285-1 is characterized as a pan-glial driver, though it may include a small population of sensory neurons. Furthermore, while Repo is a standard glial marker, its antibody does not label all glial subtypes with equal intensity. The "non-overlapping" signal observed in Figure 3A may reflect this staining bias. 3) The expression mosaicism. The fact that some glial cells do not show GFP expression suggests a degree of mosaicism, which is common to many GeneSwitch lines (Osterwalder et al., 2001). While we acknowledge this means our manipulations may target a broad subset of glia rather than every single glial cell, the fact that we still observed significant lifespan effects across two independent platforms (UAS and CRISPRa) suggests that the targeted population is sufficient to mediate these systemic effects.

      We have added a clarifying statement to contextualize the choice of the GSG3285-1 driver and its relationship to the Repo population.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      (a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      We agree that the sex-specific effects observed in our lifespan screen are one interesting aspect of this study. We have added a dedicated section to the Discussion exploring these differences from both a technical and biological perspective.

      On the technical side, the GeneSwitch inducer, RU486, can have sex-specific effects on metabolism and lifespan, depending on the nutritional environment (Dos Santos & Cocheme, 2024). Specifically, RU486 has been shown to counteract the lifespan-shortening effects of mating in females, an effect that is less pronounced in males (Landis et al., 2015; Tower et al., 2017). While we optimized our media and used the GSG3285-1 line to minimize these baseline effects, it remains possible that certain genotypes exhibited a sex-specific sensitivity to the inducer itself. Beyond the technical considerations, sex differences in aging are well-documented in Drosophila and other organisms (Regan et al., 2016; Austad & Fischer, 2016). Male and female flies exhibit distinct transcriptional trajectories and metabolic shifts as they age. Furthermore, recent studies have highlighted that glial function and the neuroinflammatory landscape can differ significantly between sexes, which may dictate how a specific genetic manipulation impacts the aging process in a sex-dependent manner (PMID: 40951920). While our screen identifies DIP-β as a rare candidate that extends lifespan in both sexes, the prevalence of female-specific hits in our data suggests that the female "aging program" may be more plastic or responsive to the specific glial pathways we targeted. These observations provide a valuable foundation for future studies into the mechanisms of sex-specific neuroprotection.

      (b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      It is a mixture of half male and half female flies. This information has been added to the main text, Fig. 1, and to the methods section.

      (c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      Agreed, this would be a great idea for future studies.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      We thank the reviewer for this insightful point. While our initial proteomic screen focused on the adult central brain, our behavioral validation used a pan-glial driver, which targets glia throughout the entire nervous system, including the ventral nerve cord (VNC). We have addressed the reviewer's comment as follows:

      Additional behavioral data: As suggested, we performed Drosophila Activity Monitoring (DAM) assays to evaluate circadian locomotor rhythms in 50-day-old DIP-β overexpression flies compared to negative controls. Interestingly, we did not detect significant changes in circadian activity at this time point.

      The difference between our climbing and circadian results highlights the complexity of age-related decline. In Drosophila, locomotor performance (i.e., climbing) and circadian coordination often decouple. For example, specific isoforms of human Tau (hTau) can induce severe cognitive and neurodegenerative deficits without affecting lifespan or motor coordination in the same manner (Sealey et al., 2017). Furthermore, motor-specific defects can emerge independently of systemic lifespan changes, as seen in certain SOD1 models of ALS (Hirth, 2010). It is possible that the 50-day timepoint represents a specific window where motor coordination is improved by DIP-β, while circadian circuits — governed by distinct glial-neuronal interactions — remain largely unaffected, or require a different temporal window for observation.

      We agree that identifying the specific glial populations (central brain vs VNC) responsible for the improved climbing would be highly informative. While the current study establishes the pro-longevity effect of DIP-β, future work utilizing in-situ proteomics on the fully intact CNS (including the VNC), or on the VNC specifically, will be essential to map the stereotyped progression of these effects across the peripheral and central nervous systems.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes? Instead of DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      We agree that the observed changes likely represent a combination of direct cell-cell interactions and a broader, more indirect maintenance of a "younger" physiological state.

      Direct: Among the DIP family, DIP-β exhibits some of the strongest and most promiscuous binding affinities, interacting with a wide array of partners including Dpr6, 8, 9, 15, and 21 (Cosmanescu et al., 2018; Sergeeva et al., 2020). This biochemical flexibility allows DIP-β to potentially interface with a much broader range of neuronal subtypes than other DIP family members, such as DIP-δ, which exclusively binds Dpr12 and did not extend lifespan in our screen. It is possible that by overexpressing DIP-β, we may be partially compensating for the global downregulation of CAMs that typically occurs during aging, thereby preserving essential glial-neuronal communication integrity.

      Indirect: By maintaining these primary glial functions and communication activities, DIP-β overexpression likely delays the overall "aging" of the brain. This preservation of neural health can have downstream effects on systemic physiology, such as the improved glia-fat body communication we observed in 50-day-old flies. In this model, the broad transcriptomic shifts are not necessarily all direct targets of DIP-β, but rather a signature of a brain that has successfully avoided the catastrophic breakdown of homeostasis typically seen in aged wild-type flies.

      We have expanded the Discussion to clarify this distinction, adding that DIP-β likely acts as a "scaffold" or “bridge” for maintaining a younger brain state, which in turn preserves multi-organ communication.

      Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      We agree that a more precise description of “improved” or “increased” cell-cell communication is necessary.

      Our conclusion that DIP-β overexpression is associated with “increased” cell-cell communication is based on the quantification of our CCC scores, which was performed using FlyPhoneDB2, a computational tool used to estimate cell-cell signaling from single-cell RNA-sequencing data (Liu et al., 2021; Qadiri et al., 2025). To infer cell-cell signaling, FlyPhoneDB2 and its predecessor, FlyPhoneDB, calculate “interaction scores,” comparing the expression levels of a curated list of ligand-receptor pairs between cell types (Liu et al., 2021; Qadiri et al., 2025). For example, if we detect a ligand in cell type A and its receptor in cell type B in DIP-β overexpression flies but did not detect both the ligand and the receptor in control flies, the CCC score is increased by 1. FlyPhoneDB2 additionally enables users to estimate signaling activity by also taking into consideration the expression of downstream reporter genes (Qadiri et al., 2025).

      “Improved cell-cell communication” is our interpretation based on the CCC analysis. It is important to note that the metric being used here (increased CCCs) is the number of predicted ligand-receptor interactions, and that our CCC analysis was based entirely on inferences from snRNA-seq data. We have added further clarification to our manuscript, which now further expands on the results of our CCC analysis (i.e., the increased expression for 61% and decreased expression for 39% of ligand-receptor pairs we observed in our DIP-β overexpression group, compared to our negative control), which ultimately led us to conclude that DIP-β overexpression is associated with improved cell-cell communication.
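
      The counting logic described above can be illustrated with a minimal sketch. This is not FlyPhoneDB2 code and does not reproduce its interaction-score calculation; it only shows the simplified "detected in treatment but not in control" tally from the example. The function names, the binary notion of "detected", the restriction to distinct sender and receiver cell types, and all gene and cell-type names are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical data model: for each cell type, the set of genes called "detected".
Expression = Dict[str, Set[str]]
LigandReceptor = Tuple[str, str]
Interaction = Tuple[str, str, str, str]  # (sender, receiver, ligand, receptor)


def predicted_interactions(expr: Expression,
                           pairs: Iterable[LigandReceptor]) -> Set[Interaction]:
    """Return every ligand-receptor pair whose ligand is detected in one
    cell type and whose receptor is detected in a different cell type.
    (Skipping same-type pairs is an assumption of this sketch.)"""
    pairs = list(pairs)
    hits: Set[Interaction] = set()
    for sender, sender_genes in expr.items():
        for receiver, receiver_genes in expr.items():
            if sender == receiver:
                continue
            for ligand, receptor in pairs:
                if ligand in sender_genes and receptor in receiver_genes:
                    hits.add((sender, receiver, ligand, receptor))
    return hits


def ccc_change(control: Expression, treated: Expression,
               pairs: Iterable[LigandReceptor]) -> Tuple[int, int]:
    """Count predicted interactions gained and lost in the treated group
    relative to control."""
    pairs = list(pairs)
    ctrl = predicted_interactions(control, pairs)
    trt = predicted_interactions(treated, pairs)
    return len(trt - ctrl), len(ctrl - trt)


# Toy example with made-up gene names: the receptor is detected only in treated flies.
toy_pairs = [("ligand1", "receptor1")]
toy_control = {"glia": {"ligand1"}, "neuron": set()}
toy_treated = {"glia": {"ligand1"}, "neuron": {"receptor1"}}
gained, lost = ccc_change(toy_control, toy_treated, toy_pairs)
```

      In this toy case one interaction (glia to neuron) is gained and none is lost, which corresponds to the "CCC score is increased by 1" example in the response.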

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      We appreciate the reviewer’s suggestion to improve the clarity of the lifespan screen results. We have significantly revised Figures 3D, 3E, and 3F to provide a more intuitive summary of the candidate gene manipulations. Figures 3D and 3E now explicitly include the effect sizes and p-values for each candidate gene, broken down by sex. We also added a new Figure 3G with a visual layout that has been streamlined to allow for quick identification of manipulations that successfully extended lifespan.

      The reviewer raises an important point regarding the use of positive controls to calibrate the magnitude of lifespan extension. We carefully considered adding a standard control (such as Rapamycin treatment); however, we opted against it for several methodological reasons:

      As noted in the literature, the magnitude of lifespan extension from standard controls can vary drastically depending on genetic background and lab environment. For instance, Rapamycin-induced extension ranges from ~10% (Schinaman et al., 2019), to over 80% (Landis et al., 2024). We felt that adding a single positive control might provide a false sense of "calibration" rather than a true universal benchmark.

      To ensure the robustness of our findings, we instead employed a dual-validation strategy. We confirmed the lifespan-extending effects of our candidates using both traditional UAS:cDNA and CRISPR-based overexpression. The fact that two independent genetic systems yielded consistent results provides strong internal evidence for the reported effects.

      We acknowledge that the effects of DIP-β are modest when compared to the baseline impact of RU486 feeding. We have added a section to the Discussion addressing this. While the effects are subtle, their reproducibility across different overexpression platforms suggests they are biologically relevant, even if they do not reach the dramatic shifts seen in some caloric restriction or drug-based models.

      We have further addressed this in the results section.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      We have modified and improved the labeling of these figures to increase the clarity. For Fig. 1D, we added the explanation to the Figure legends. In brief, in the Tandem Mass Tag (TMT) isobaric labeling system, 128N is one of many channels (126, 127N, 127C, 128N, 128C, etc.) used to index and compare up to 18 samples simultaneously, improving throughput and reducing missing values.

      Fig. 3A has been updated to clarify that Repo is the glial marker. Fig. 4A-D have been reordered so that the DIP-β lifespan results are presented before the control lifespan, which hopefully improves the narrative flow of this figure. The Fig. 4 references in the manuscript have also been updated to match these changes. Additionally, Fig. 5C has been updated to include the truncated x-axis and y-axis labels.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      These are good suggestions. We have added references for the phenoxy-radical labeling radius (Durojaye, 2021), the impact of H₂O₂ exposure (J. Li et al., 2021), and the binding specificity of neutravidin (J. Li et al., 2021). We have also removed the term “unbiased” from our manuscript.

      Regarding the request to further address the downregulation of synapse-related GO terms, we believe this indicates a lack of clarity on our part. We did not intend to suggest that our GO analyses, which were based on our proteomics data, were necessarily indicative of impaired neuron-glia communication. Our conclusions regarding altered neuron-glia communication come from our later snRNA-seq data and analyses. Prompted by this comment, we agree that our differential gene analysis may reflect transcriptional changes rather than impaired glia-neuron communication. We have added this alternative interpretation.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful. 

      Agreed! As the intended focus of this study was the more general changes occurring during normal brain aging, we chose to focus on the central brain for our glial cell-surface proteomics, which is responsible for most of the brain’s higher order functions, including learning and memory, signal integration, behavior, etc. As the optic lobes account for approximately half of all neurons in the adult Drosophila brain and are specialized to process visual stimuli (Robinson et al., 2025), we were concerned that including the optic lobes in our glial cell-surface proteomics could strongly bias our findings towards age-related changes in visual function, rather than the more general changes we intended to focus on. Such clarification has been added to the results section (Quantitative comparison of young and old proteomes).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 62: Can the authors expand on "several changes"?

      We have added a sentence expanding upon this in the manuscript draft.

      (2) Line 137: Can the authors provide a reference for the phenoxyl radical half-life?

      Thanks for catching this. We’ve added our reference for the phenoxyl radical half-life.

      (3) Figure 1B: The authors state that neutravidin stained glia; however, there is no glial marker (e.g., anti-Repo) in this panel.

      We acknowledge the reviewer’s point. The lack of anti-Repo staining in Figure 1B is due to the requirements of the Neutravidin-Alexa 647 detection method. Because this procedure bypasses traditional primary and secondary antibody incubation to preserve the biotin signal, co-staining with Repo was not technically feasible. Nevertheless, we utilized the Repo-GAL4 driver to express UAS-CD2-HRP; since this driver is well-documented and specific to glial cells, the Neutravidin signal serves as a functional readout of the targeted glial population.

      (4) Line 254: There is no Figure 2D.

      We’ve corrected this to Fig. 2C.

      (5) Lines 390-396: No reference to the respective figures.

      We’ve made a couple of corrections to reference all the respective figures.

      (6) Figure 5C: The X-axis is cut off.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Minor inconsistencies (e.g., figure references-line 254 references "Figure 2D" where none exists) should be corrected.

      We’ve corrected this to Fig. 2C.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers and editors for the second round of peer review. Following the editorial assessment and specific review comments, we now present new results comparing EDS and IDS behavior, and use a conventional standard for reporting statistics. We also request that the manuscript title be simplified to ‘Locus coeruleus modulation of prefrontal dynamics during attentional switching in mice’.

      Public Reviews:

      Reviewer #1 (Public review):

      In their response to reviewers, the authors say "We report p values using 2 decimal points and standard language as suggested by this reviewer". However, no changes were made in the manuscript: for example, "P = 4.2e-3" rather than "p = 0.004".

      We apologize for this misunderstanding. We initially interpreted this comment as a request to report two non-zero digits in p values. We have now corrected this in the revision. We also follow the editorial recommendation and use a standard convention to report statistics (e.g., p = 0.03, t(7) = -2.8).

      In their response to the reviewers, they wrote: "Upon closer examination of the behavioral data, we exclude several sessions where more trials were taken in IDS than in EDS." If those sessions in which EDSIDS. Most problematic is the fact that the manuscript now reads "Importantly, control mice (pooled from Fig. 1e, 1h, Supp. Fig. 1a, 1b) took more trials to complete EDS than IDS (Trials to criterion: IDS vs. EDS, 10 ± 1 trials vs. 16 ± 1 trials, P < 1e-3, Supp. Fig. 1c), further supporting the validity of attentional switching (as in Fig. 1c)" without mentioning that data has been excluded.

      The editor raised a similar concern. We apologize for this oversight, which was due to miscommunication within the lab. We have now revised the manuscript to include all data points without any exclusion in Fig. 1e, 1h, and Supp. Fig. 1a-c. With all data pooled and no exclusions, control mice still took more trials to complete EDS than IDS, supporting the validity of attentional switching (Trials to criterion: IDS vs. EDS, 11 ± 1 trials vs. 15 ± 1 trials, p = 0.006, Supp. Fig. 1c).

      The exclusion we initially meant to perform was to exclude sessions where task performance in IDS was beyond the 95% threshold inferred from the naïve control group (15 trials, Fig. 1c). Exclusions are now explicitly described. Of note, including or excluding these sessions does not change any of the conclusions presented in our manuscript. We have added this analysis in Supp. Fig. 1d, and the results remain robust. This panel could be removed if deemed unnecessary by the reviewers.

      Reviewer #3 (Public review):

      The authors overall do a nice job of addressing reviewer comments, and I believe the manuscript is significantly improved. Congratulations!

      We thank you for this positive assessment.

      Weaknesses are mostly minor, but there are some caveats that should be considered. First, the authors use a DBH-Cre mouse line and provide histological confirmation of overlap between HM4Di expression and TH immunostaining. While this strongly suggests modulation of noradrenergic circuit activity, the results should be interpreted conservatively as there is no independent confirmation that norepinephrine (NE) release is suppressed and these neurons are known to release other neurotransmitters and signaling peptides. In the absence of additional control experiments, it is important to recognize that effects on mPFC activity may or may not be directly due to LC-mPFC NE.

      We agree with this comment, and now further discuss this limitation in the Discussion, lines 255-259:

      “However, it is important to note that LC-NE neurons can co-release other neurotransmitters, such as dopamine and neuropeptides[73,75,76]. In the absence of further control experiments to confirm the suppression of NE release, the observed effects on mPFC may or may not be directly due to NE. Future studies are needed to better delineate the involvement of specific neurotransmitters, cell types and receptors in flexible decision making.”

      Another caveat is that the imaging analyses are entirely from the extradimensional shift session. Without analyzing activity data from the intradimensional shift (IDS) session, one cannot be certain that the observed changes are to some feature of activity that is specific to extradimensional shifts. Future experiments should examine animals with LC suppression during the IDS as well, which would show whether the observed effects are specific to an extradimensional shift and might explain behavioral effects.

      We also agree with this comment, and have given it careful thought. Technically, IDS has low trial numbers, especially incorrect trials, limiting the power of statistical comparisons. Conceptually, since EDS is always the last stage in our paradigm, comparing neural signals in EDS with previous stages may be confounded by the order of learning. That is, it would be unclear whether the observed differences in mPFC activity were due to the mPFC responding to different rules or to changes in mPFC responses over time/learning. We now discuss this point in the Discussion, lines 291-295:

      “Another limitation in the current study is that neurophysiological analyses were entirely from EDS. Without comparing with other task stages (e.g., REV, IDS), it is uncertain to what extent the observed neuronal changes are specific to EDS. Future experiments should examine the behavioral and neurophysiological effects with LC inhibition to determine the specificity of LC-NE modulation of the mPFC during attentional switching.”

      We are also actively collecting additional data to address this point, which requires considerable effort. We hope to report our findings in a follow-up study.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Genetically encoded fluorescent proteins expressed in specific cell types allow recognising them in vivo and, if the protein is a functional indicator, as in the case of genetically encoded calcium indicators (GECIs), to record activity from the same cellular ensemble. Ideally, if proteins (fluorophores) have perfectly distinct spectral properties, signals can be distinguished from as many cell types as the number of employed fluorophores. In practice, fluorescent proteins have non-negligible crosstalk both in absorption and emission bands. In addition, fluorescence contribution of each fluorophore normally varies from cell to cell and therefore spectral properties of cells expressing two or more proteins are different. The work of Phillips et al. addresses this challenge. The authors present an approach defined as "Neuroplex", allowing identification of up to nine cell types from the same number of fluorophores. The fingerprint of each cell is then associated with functional fluorescence from the GECI GCaMP, allowing recording calcium activity from that specific cell. The method is implemented in vivo using head-mounted miniscopes.

      The authors used a mouse line expressing GCaMP in cortical pyramidal neurons and developed an experimental pipeline. First, they injected the nine AAV viruses, causing expression of fluorophores in a different brain area. The idea was not to image that area, but a non-infected medial prefrontal cortex (mPFC) section where neurons could be infected by their axons projecting in an injected area, in this way being identified by their targeting region(s). A GRIN lens, allowing spectral analysis, was mounted in the mPFC section, and GCaMP fluorescence was then recorded during behavioural tasks and analysed to identify regions of interest (ROIs) corresponding to neuron somata. After functional imaging, the head of the mouse was fixed, spectral analysis was performed, and after necessary correction for chromatic distortions, the fluorophore contribution was determined for each ROI (neuron) from where GCaMP signals were detected. Notably, the procedures for estimation and correction of chromatic aberration and light transmission (described in Figure 2) were a major challenge in their technical achievements. The selection of the nine fluorophores was another big effort. This was done by combining computer simulations and direct measurement of spectra from individual proteins expressed in HEK293 cells. It is important to say that the authors could simulate arbitrary combinations of two or more different fluorophores and evaluate the ability of their algorithm to detect the correct proteins against wrong estimations of false-negative (absence of an expressed protein) or false-positive (presence of a non-expressed protein). Not surprisingly, this ability decreases with the level of GCaMP expression. The authors underline that most errors were false-negatives, which have a milder impact in terms of result interpretation, but the rate of false positives was, nevertheless, relevant in detecting a second fluorophore from a cell expressing only one protein. 
      The experimental profiles of fluorophores were dependent both on the specific fluorescent protein and on the projecting area, and the distribution of double-labelled neurons did not match anatomical evidence. This result should be taken as a limitation of the present pioneering experiments, presented as proof-of-principle of the approach, but Neuroplex may provide far improved precision under different experimental conditions.

      In my view, the work of Phillips et al. represents a significant advance in the state-of-the-art of the field. The rigorous analysis of limitations in the use of Neuroplex must be considered an important guideline for future uses of this approach.

      We appreciate the reviewer’s positive evaluation and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript introduces Neuroplex, a pipeline that integrates miniscope Ca²⁺ imaging in freely moving mice with multiplexed confocal and spectral imaging to infer projection identities of recorded neurons. This technical approach is promising and could broaden access to projection-resolved population imaging. However, the core quantitative analyses apply a winner-take-all single-label assignment per neuron even when multiple fluorophores exceed threshold, with additional labels treated descriptively as "secondary hits." While the authors acknowledge and simulate dual labeling, the extent to which this single-label decision rule affects subtype fractions and behavioural comparisons remains uncertain without a multi-label (or probabilistic) sensitivity analysis and propagation of classification uncertainty.

      We thank Reviewer #2 for the careful statistical perspective and focus on assignment strategy and uncertainty. Importantly, we emphasize that Neuroplex is presented as a methodological proof-of-principle, not as a definitive quantification of projection convergence.

      Strengths:

      (1) Conceptual advance and practicality: Decoupling acquisition from identity readout constitutes an innovative approach that is, in principle, applicable in laboratories currently using single-color miniscopes.

      (2) Engineering thoroughness: The manuscript offers detailed consideration of GRIN optics, spectral libraries, registration procedures, and simulations that address signal-to-noise ratio, background, and class imbalances.

      (3) Immediate community value: If demonstrated to be robust, the pipeline could enable projection-resolved analyses without reliance on specialized multicolor miniscopes.

      Weaknesses:

      (1) Single-label assignment in the main analyses: When multiple fluorophores exceed threshold for a neuron/ROI, the workflow applies a winner-take-all rule and assigns a single label (the fluorophore with the largest standardized beta), while additional above-threshold fluorophores are retained only as "secondary hits." This is a reasonable specificity-first choice, but because cortical excitatory neurons can collateralize, collapsing dual-threshold ROIs to one identity may under-represent dual-projecting cells and could bias estimated subtype fractions and behavioural comparisons.

      We thank the reviewer for raising this important conceptual point.

      We agree that cortical excitatory neurons frequently collateralize and therefore may legitimately express more than one retrograde fluorophore. Our use of a winner-take-all (WTA) rule in the primary analyses was an intentionally conservative methodological choice designed to prioritize specificity over sensitivity in this proof-of-principle study.

      As demonstrated in our simulations (Supp. Fig. 5–6), under realistic background and noise conditions, secondary assignments are more susceptible to false-positive errors than primary assignments. For this reason, we chose to assign a single primary identity for quantitative behavioral stratification while retaining additional above-threshold fluorophores as “secondary hits” and reporting their distribution separately (Supp. Fig. 7).

      We did not intend to imply that projections are exclusive. Rather, the WTA strategy provides a conservative lower-bound estimate of subtype proportions and avoids inflation of dual-label rates under conditions where spectral separability is imperfect.

      We agree that this rationale should be stated more explicitly in the manuscript, and that the potential impact of assignment strategy on subtype fractions and behavioral comparisons should be acknowledged clearly as a methodological trade-off rather than a biological claim.

      Importantly, the biological analyses presented in this manuscript are illustrative demonstrations of functional stratification capability and do not depend on exclusivity of projection identity. We have revised the manuscript to clarify this framing as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a winner-take-all rule. We emphasize that this assignment strategy does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal Cell Type and Behavior)

      “Cortical pyramidal neurons frequently collateralize to multiple downstream targets, and accordingly some ROIs exceeded threshold for more than one fluorophore. In this proof-of-principle implementation, we adopted a specificity-first winner-take-all assignment rule for primary analyses to minimize false-positive multi-label calls under realistic noise conditions. This strategy likely underestimates the true prevalence of dual-projecting neurons and should therefore be interpreted as a conservative stratification approach rather than a statement of projection exclusivity.” (Discussion)
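
      The winner-take-all rule quoted above can be sketched in a few lines. This is an illustrative reading of the rule as stated in the revised Methods text, not the authors' implementation: the function name, the strict "greater than" comparison, the threshold value, and the fluorophore labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def assign_identity(z_betas: Dict[str, float],
                    threshold: float) -> Tuple[Optional[str], List[str]]:
    """Assign a primary fluorophore identity to one ROI.

    z_betas maps fluorophore name -> z-scored unmixing coefficient (beta).
    Returns (primary, secondary_hits). primary is None when no fluorophore
    exceeds the threshold; secondary_hits lists the remaining above-threshold
    fluorophores, strongest first.
    """
    above = {name: z for name, z in z_betas.items() if z > threshold}
    if not above:
        return None, []
    # Winner-take-all: the largest z-scored beta becomes the primary identity.
    primary = max(above, key=lambda name: above[name])
    secondary = sorted((n for n in above if n != primary),
                       key=lambda n: above[n], reverse=True)
    return primary, secondary


# Hypothetical ROI in which two fluorophores exceed a made-up threshold of 2.0.
roi = {"fluor_A": 3.1, "fluor_B": 2.4, "fluor_C": 0.2}
primary, secondary = assign_identity(roi, threshold=2.0)
```

      Here the ROI is labeled with its strongest fluorophore, and the other above-threshold fluorophore is kept only as a secondary hit, which is why dual-projecting cells are undercounted under this rule.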

      (2) Dual-label detection is acknowledged but remains descriptive in vivo: the manuscript explicitly discusses the possibility of dual projection, evaluates dual-fluorophore detection in simulations (including performance under realistic noise/background), and reports in vivo rates of secondary hits. However, these dual-threshold events are not incorporated as co-identities in the main statistical analyses, making it difficult to judge how robust the principal biological conclusions are to the single-label decision rule.

      We thank the reviewer for this important clarification request.

      We agree that dual-projection neurons are biologically plausible and that dual-threshold ROIs were detected in vivo. In this manuscript, however, our primary goal was to establish the feasibility of high-dimensional spectral assignment and projection-resolved stratification, rather than to provide a definitive quantification of projection convergence.

      For this proof-of-principle study, we chose a conservative winner-take-all (WTA) framework for primary behavioral analyses in order to minimize false-positive multi-label assignments under realistic noise and background conditions, as demonstrated in our simulations (Supp. Fig. 5–6). Secondary hits were retained and reported descriptively (Supp. Fig. 7), but not incorporated into the primary statistical comparisons to avoid overinterpretation of potentially ambiguous dual-label calls.

      Importantly, the principal biological conclusions presented in the manuscript are qualitative demonstrations that projection-defined stratification is feasible within a single animal. These conclusions do not rely on projection exclusivity or on precise quantification of dual-projecting fractions.

      We agree that this distinction should be made clearer in the manuscript, and we have revised the text as follows:

      “Although dual-threshold ROIs were detected in vivo, these secondary assignments were not incorporated as co-identities in the primary behavioral analyses. This decision reflects a conservative specificity-first framework designed to minimize false-positive multi-label calls under realistic noise conditions. Accordingly, dual-label rates reported here should be interpreted descriptively. The present study focuses on demonstrating the feasibility of projection-resolved stratification, rather than providing definitive quantification of projection convergence.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “We then stratified these neurons by projection target and examined behaviorally selective activity across cell types. These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Behavioral Analysis)

      (3) Uncertainty is not propagated: False-positive/false-negative rates from simulations and uncertainty from registration/segmentation are not carried forward into quantitative confidence bounds on subtype proportions or behaviour-by-subtype effects.

      We agree that formal propagation of classification and registration uncertainty into subtype proportions and behavioral comparisons would be appropriate in a study primarily focused on precise anatomical quantification. However, the central goal of the present manuscript is methodological: to demonstrate that high-dimensional spectral identity can be reliably linked to miniscope-recorded functional activity within a single animal.

      Our simulations under realistic noise, background, and class imbalance conditions (Supp. Fig. 5–6) show that errors are predominantly false negatives rather than false positives. Nevertheless, the behavioral analyses are presented as qualitative demonstrations of the feasibility of projection-resolved stratification rather than as definitive quantitative anatomical measurements.

      In the revised manuscript, we clarified that 1) subtype proportions and behavioral effects are assignment-dependent estimates, 2) simulation-derived error rates provide guidance for experimental design rather than formal confidence intervals, and 3) future studies centered on precise quantification of projection fractions would benefit from formal uncertainty modeling, as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “Because classification uncertainty was not formally propagated into these analyses, behavior-by-subtype comparisons should be interpreted as qualitative demonstrations of functional stratification rather than precise quantitative estimates.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      Reviewer #3 (Public review):

      This manuscript presents Neuroplex, a technically rigorous and carefully validated pipeline that links miniscope calcium imaging in freely behaving animals with high-dimensional fluorophore-based cell-type identification using in vivo multiplexed spectral confocal imaging through the same implanted GRIN lens. The work overcomes a major practical limitation of head-mounted microscopy by enabling the identification of up to nine projection-defined neuronal populations within the same animal, without post-fixation histology. The approach is well motivated and supported by extensive calibration and simulation. While the biological results are primarily illustrative, the methodological contribution is clear and likely to be broadly useful.

      Major comments

      (1) The approach relies on the assumption that fluorophore identity assigned during anesthetized confocal imaging accurately reflects the identity of neurons recorded during prior behavioural sessions. While the use of the same GRIN lens and in vivo co-registration mitigates many concerns, the manuscript would benefit from a more explicit discussion, or empirical demonstration, if available, of the stability of fluorophore assignments across time. Even limited repeat spectral imaging in a subset of animals would strengthen confidence in longitudinal applicability.

      We thank the reviewer for highlighting this important conceptual assumption.

      Fluorophore identity in Neuroplex is genetically encoded via AAVretro delivery and therefore does not depend on transient physiological state. Spectral imaging is performed in vivo through the same GRIN lens and field of view used during behavioral imaging, and co-registration relies on anatomical landmarks. While repeat spectral imaging was not formally performed as a longitudinal experiment, the underlying fluorescent protein expression is stable over weeks, and there is no biological mechanism in this paradigm that would alter fluorophore identity across sessions.

      We revised the manuscript to explicitly state this assumption and clarify why identity stability is expected as follows:

      “…fluorophore signals and reduce unmixing fidelity, leading to an increased false positive rate. Fluorophore identity in this framework is genetically encoded via retrograde AAV delivery and is therefore expected to remain stable across behavioral and spectral imaging sessions. Because both functional and spectral data are acquired in vivo through the same GRIN lens and co-registered using anatomical landmarks, assignment stability is not expected to vary across time unless expression levels change substantially. While repeat spectral imaging was not performed as a formal longitudinal experiment in this study, the stability of fluorescent protein expression supports the assumption that fluorophore identity reflects a persistent cellular attribute.” (Discussion)

      (2) Fluorophore identity is determined using thresholding of linear unmixing coefficients relative to an empirically defined baseline, followed by a second adaptive pass for over-represented fluorophores. While this heuristic is extensively validated via simulations, it remains ad hoc from a statistical perspective. The authors should more explicitly justify this choice and discuss its limitations relative to probabilistic or likelihood-based classifiers, particularly with respect to uncertainty estimation at the single-ROI level.

      We agree that the dual-pass thresholding approach is heuristic rather than fully probabilistic. More formal probabilistic classifiers are possible but would introduce additional modeling assumptions and training requirements beyond the scope of this proof-of-principle study.

      We revised our manuscript to clarify this as follows:

      “The current classification framework relies on linear unmixing followed by empirically defined thresholding rather than full probabilistic inference. This approach provides transparency and practical robustness under realistic noise and background conditions but does not generate single-ROI posterior uncertainty estimates.” (Discussion)

      (3) Identifiability of fluorophores is demonstrated empirically, but the manuscript does not explicitly quantify spectral separability (e.g., similarity metrics between basis spectra or conditioning of the unmixing matrix). A brief analysis of spectral independence or sensitivity of beta estimates to noise would provide mathematical reassurance, especially given the reliance on linear regression in a high-dimensional feature space.

      We agree that spectral separability is conceptually important. In this manuscript, separability is demonstrated empirically through 1) in vitro fingerprint acquisition under identical optical conditions, 2) simulation under background and noise, and 3) successful in vivo classification across regimes. We did not compute formal matrix conditioning metrics, but we agree that the separability rationale should be described more explicitly. We revised our manuscript as follows:

      “While formal conditioning metrics were not explicitly computed, empirical fingerprint acquisition and simulation-based perturbation analyses demonstrate sufficient spectral independence for reliable linear unmixing under the tested regimes.” (Discussion)
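
      For readers who wish to check separability for their own fluorophore panel, a similarity metric of the kind the reviewer mentions can be sketched as below. This is an illustration only and was not part of the analyses in the manuscript: the spectra are made-up values rather than measured fingerprints, and the names `fluor_A`, `fluor_B`, `fluor_C`, `cosine_similarity`, and `pairwise_similarity` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two spectra (1.0 = identical shape)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity(spectra):
    """Cosine similarity for every unordered pair of basis spectra."""
    return {
        (a, b): cosine_similarity(spectra[a], spectra[b])
        for a, b in combinations(sorted(spectra), 2)
    }


# Hypothetical four-channel fingerprints (assumed values, not measured data).
spectra = {
    "fluor_A": [0.9, 0.4, 0.1, 0.0],
    "fluor_B": [0.1, 0.8, 0.5, 0.1],
    "fluor_C": [0.0, 0.1, 0.4, 0.9],
}

for pair, similarity in pairwise_similarity(spectra).items():
    print(pair, round(similarity, 3))
```

      Pairs with similarity close to 1 would be the ones most prone to confusion during linear unmixing; a complete treatment would also examine the conditioning of the full basis matrix.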

      (4) The spectral unmixing treats CNMF-derived ROIs as fixed supports. I wonder whether ROI boundaries, neuropil contamination, and partial overlap can introduce structured uncertainty that could bias spectral estimates. If so, the authors should acknowledge this dependency more explicitly and discuss how ROI quality or overlap might influence false negatives or false positives, particularly in densely labelled regions.

      We agree that ROI definition influences spectral extraction. Spectral fingerprints are derived by averaging all pixels within the ROI mask, and therefore neuropil contamination, partial ROI overlap, and dense labeling could influence beta estimates. In the revised manuscript, we have acknowledged these dependencies more explicitly.

      “Spectral unmixing operates on CNMF-derived ROI masks treated as fixed supports. Accordingly, segmentation quality, neuropil contamination, and partial overlap between neighboring cells can influence extracted spectral fingerprints and may contribute to false negatives or secondary assignments, particularly in densely labeled regions. These structured sources of uncertainty are expected to have the greatest impact under regimes of extreme class imbalance, low fluorophore brightness, strong neuropil signal, or pairing of spectrally overlapping reporters. Use of refined segmentation strategies or nuclear-localized reporters could reduce such structured uncertainty in future implementations.” (Discussion)

      (5) The manuscript reports meaningful rates of secondary fluorophore detection, but also nontrivial false-positive rates for secondary labels under realistic conditions. The authors appropriately caution against over-interpretation, but the Discussion should more clearly delineate when dual-label assignments are likely to be biologically interpretable versus methodologically ambiguous, and how experimental design (e.g., fluorophore pairing) should be optimized accordingly.

      We agree and will delineate interpretability boundaries explicitly.

      “Dual-label assignments are most reliable when fluorophores are spectrally well separated and when signal-to-noise ratios are high. In contrast, spectrally adjacent fluorophore pairs or densely labeled regimes increase ambiguity and false-positive risk. Experimental design should therefore prioritize pairing spectrally distant fluorophores when projection convergence is of primary interest.” (Discussion)

      (6) I suspect that Neuroplex will be most effective in certain regimes (moderate convergence, bright and spectrally distinct fluorophores) and less reliable in others. A more explicit discussion of best practices, anticipated failure modes, and experimental scenarios where the method may be inappropriate would increase the practical value of the paper for adopters.

      “More broadly, Neuroplex is expected to perform most robustly in regimes characterized by moderate projection convergence, balanced fluorophore representation, bright and spectrally distinct reporters, and adequate signal-to-noise ratio. Imaging directly within a projection target that has received dense retrograde labeling may introduce substantial class imbalance, which simulations predict will reduce detection sensitivity for the dominant fluorophore. In such cases, conservative assignment strategies, reduced spectral complexity, or refinement of ROI definition may improve interpretability. Careful fluorophore selection and pilot validation under intended imaging conditions are therefore recommended prior to large-scale application. Future implementations incorporating nuclear-localized reporters may further reduce segmentation-dependent ambiguity by constraining spectral signals to somatic compartments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address a few points that are not clear.

      (1) At the end of the Results, the authors assess their approach using only four fluorophores and conclude that Neuroplex works "even" under reduced complexity. There is something I am missing. In my mind, lower complexity should be easier and should work better. As a researcher, I would first assess a four-fluorophores scenario and then step up with complexity, but the authors did the opposite. Also, I think that the present Supplementary Figure 9 should be in the main text; I don't understand why the authors decided to relegate a clear result to the bottom of everything. The authors should give some explanations.

      We agree that reduced spectral complexity should, in principle, improve separability and classification performance. Our original presentation order was intended to first demonstrate feasibility under the most challenging condition (nine fluorophores plus GCaMP), thereby establishing maximal multiplexing capacity. The reduced-complexity experiment was included to demonstrate scalability and generalizability under more typical experimental regimes. However, we agree that this rationale was not sufficiently clear and that the reduced-complexity results merit presentation in the main text.

      Accordingly:

      We have moved former Supplementary Figure 9 into the main Results (Fig. 6).

      We have clarified explicitly why the nine-fluorophore condition was presented first as follows:

      “To evaluate the performance of Neuroplex under more typical experimental regimes with reduced complexity, we applied the pipeline to two GCaMP transgenic animals injected with a subset of four fluorophores.”

      (2) The question of relative expression is crucial. Among the infected regions, there is the contralateral mPFC and I imagine that if they image there, the contribution of the expressed protein might dominate all other components, preventing detection of other fluorophores, including GCaMP. But is it the case, or would it be possible to detect projecting neurons in that region? I would be surprised that the authors never tried it; this test would simply imply mounting the GRIN lens on the other hemisphere.

      This is an important conceptual point.

      Our simulations (Supp. Fig. 5) explicitly model over-representation of a single fluorophore. These results show that heavy class imbalance primarily increases false negatives (due to baseline normalization) rather than false positives.

      In the revised manuscript, we discussed this limitation more explicitly.

      “Relative fluorophore representation within the imaged field of view influences classification robustness. As demonstrated in our simulations of class imbalance (Supp. Fig. 5g–h), extreme over-representation of a single fluorophore primarily increases false-negative rates due to baseline normalization effects. In the present study, we intentionally avoided imaging directly within heavily infected projection targets (e.g., contralateral mPFC) in order to maintain moderate fluorophore representation across ROIs. Imaging in a densely labeled region would represent a more challenging regime, and we would expect reduced sensitivity for the dominant fluorophore under such conditions.” (Discussion)

      (3) The possibility to utilise Neuroplex goes beyond the type of experiment presented as proof-of-concept in this technical paper. In the Discussion, the authors mention genetically defined subtypes and activity-tagged neurons. But, if one changes the pipeline, can it be used by expressing GECIs with different spectra, or GECIs and genetically-encoded voltage indicators (GEVIs)? I would be very interested in knowing what the authors think about this putative "shortcut".

      We thank the reviewer for this forward-looking and insightful question.

      In principle, the Neuroplex framework could be extended to incorporate spectrally distinct genetically encoded functional indicators, including multi-color GECIs or combinations of GECIs and GEVIs. However, it is important to distinguish this from the identity-assignment strategy implemented in the present study.

      Simultaneous multi-color functional imaging under a head-mounted miniscope is optically more demanding than assigning cell identity from single-color functional recordings followed by high-dimensional spectral readout. Multi-color GECI or GEVI imaging requires real-time excitation and emission separation during dynamic recording, increases optical complexity, and is particularly sensitive to chromatic aberration, photon efficiency, and signal-to-noise constraints imposed by GRIN lenses.

      In contrast, Neuroplex decouples functional acquisition from spectral identity determination. Functional activity is recorded using a single optimized channel, while spectral separation is performed separately under controlled confocal conditions with multiplexed excitation and emission sampling. This design substantially reduces optical burden during behavioral imaging.

      While integration of multiple functional reporters is conceptually feasible within this framework, successful implementation would require careful validation of brightness, spectral separability, and temporal stability for each reporter combination.

      Reviewer #2 (Recommendations for the authors):

      (1) Implement a principled multi-label calling mode for cells with >1 above-threshold fluorophore (e.g., per-fluorophore FDR control or Bayesian posteriors). Report cell-wise weights and re-run key results three ways: single-label, hard multi-label, and soft (probabilistic) assignments; state explicitly how conclusions change.

      We appreciate this suggestion and agree that multi-label or probabilistic calling frameworks are well motivated, particularly for studies in which projection convergence is the central biological question. In the current manuscript, however, our goal is to establish a practically deployable proof-of-principle pipeline for linking miniscope functional recordings to a high-dimensional spectral-identity readout. Consistent with this scope, we used a conservative winner-take-all (WTA) strategy for primary analyses to prioritize specificity under realistic noise and background conditions, and we treated multi-hit events descriptively. Importantly, the qualitative conclusions regarding projection-resolved functional stratification are unchanged when secondary-hit distributions are examined.

      In the revised manuscript, we explicitly stated that: (i) single-label assignment is a conservative analysis choice rather than a biological claim of exclusivity, and (ii) multi-label or probabilistic calling is a natural extension for future work, as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)
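
      The winner-take-all rule quoted above can be illustrated with a minimal sketch. This is not the code used in the study: the function name `assign_identity`, the threshold value, and the fluorophore labels are hypothetical, and the input is assumed to be a mapping from fluorophore name to z-scored beta for a single ROI.

```python
from typing import Dict, List, Optional, Tuple


def assign_identity(
    z_betas: Dict[str, float], threshold: float
) -> Tuple[Optional[str], List[str]]:
    """Return (primary identity, secondary hits) for one ROI.

    Fluorophores whose z-scored beta exceeds the threshold are hits.
    The largest hit becomes the primary identity; the remaining hits
    are kept as secondary hits, ordered from strongest to weakest.
    """
    hits = [name for name, z in z_betas.items() if z > threshold]
    if not hits:
        return None, []  # ROI stays unassigned
    hits.sort(key=lambda name: z_betas[name], reverse=True)
    return hits[0], hits[1:]


# Hypothetical ROI with two above-threshold fluorophores (assumed values).
roi = {"fluor_A": 4.2, "fluor_B": 2.9, "fluor_C": 0.3}
primary, secondary = assign_identity(roi, threshold=2.5)
print(primary, secondary)
```

      Here `fluor_A` is assigned as the primary identity and `fluor_B` is retained as a secondary hit, mirroring the distinction between primary stratification and descriptively reported multi-hit events.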

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (2) Add ground truth for dual projectors in a subset (paired orthogonal tracers or staged injections) and provide a confusion matrix including dual-positives; use this to calibrate thresholds/priors.

      We agree that ground truth validation of dual projectors using orthogonal tracers or staged injections would be valuable, particularly for calibrating priors and enabling confusion-matrix-based evaluation. However, these experiments require additional cohorts and experimental design beyond the scope of the current proof-of-principle technical manuscript. Our goal here is to demonstrate the feasibility of multiplexed identification and projection-resolved stratification within a single animal, not to provide definitive anatomical quantification of collateralization.

      We have revised the manuscript to clearly state that dual-label in vivo observations are descriptive and that studies aimed at quantitative convergence mapping should incorporate orthogonal ground truth validation.

      “Accurate quantification of projection convergence would benefit from orthogonal ground-truth validation (e.g., paired tracers or staged injections) to establish confusion matrices for dual positives and to calibrate thresholds or priors.”

      (3) Propagate uncertainty from simulations and registration/segmentation to subtype fractions and behavior effects (error bars or sensitivity analyses).

      We agree that formal uncertainty propagation is appropriate for studies focused on precisely quantifying subtype proportions or effect sizes. In this manuscript, subtype fractions and behavioral comparisons are presented primarily as demonstrations of the feasibility of projection-resolved functional stratification, rather than definitive anatomical measurements. Simulation analyses are included to characterize expected performance under defined noise and background regimes, but we did not propagate these uncertainties into downstream confidence bounds in this proof-of-principle work.

      We have revised the manuscript to clarify this explicitly as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (4) Mitigate sources of spurious multi-hits (neuropil handling, ROI mask erosion, nuclear-localized reporters, spectral basis choices) and quantify their impact on dual-label recovery.

      We agree that neuropil contamination, ROI boundary choices, and spectral basis selection can influence multi-hit rates. In the current manuscript, we already implement background subtraction and evaluate multi-hit behavior through simulations under realistic background and noise regimes. Quantitative evaluation of additional mitigation strategies (e.g., ROI erosion comparisons) would require new analyses beyond the current scope.

      We have revised the Discussion to include concrete best-practice recommendations (e.g., fluorophore pairing, conservative interpretation of multi-hits, and potential use of nuclear-localized reporters).

      “Multi-hit events can reflect true biological collateralization but may also arise from structured sources of ambiguity such as neuropil contamination, partial ROI overlap, or imperfect ROI boundaries. These factors may bias spectral estimates and contribute to secondary assignments, particularly in densely labeled regions. Practical mitigation strategies include conservative assignment rules, improved segmentation, and use of nuclear-localized reporters to reduce neuropil contribution.”

      (5) Clarify claims in the main text/figures wherever exclusivity is implied; label which panels use single-label vs multi-label/soft assignments.

      We agree and thank the reviewer for emphasizing clarity. We did not intend to imply projection exclusivity. We have revised the manuscript text and figure legends to explicitly state where single-label (winner-take-all) assignment is used, and to avoid language that could be read as claiming exclusive projection identity as follows:

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a conservative winner-take-all rule. This assignment reflects the strongest spectral contribution and does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study addresses a critical and timely question regarding the role of a subpopulation of cortical interneurons (Chrna2-expressing Martinotti cells) in motor learning and cortical dynamics. However, while some of the behavior and imaging data are impressive, the small sample sizes and incomplete behavioral and activity analyses make interpretation difficult; therefore, they are insufficient to support the central conclusions. The study may be of interest to neuroscientists studying cortical neural circuits, motor learning, and motor control.

      We thank the reviewers and the editors for the insightful comments. We are pleased to report that the raised issues with the manuscript can be addressed by improving clarity in our writing of specific sections and by providing additional analysis. Specifically, it was not clear in the manuscript text that although we show illustrative data with a lower number of animals, our conclusions are supported by data with a larger and sufficient sample size. Also, the description of our control experiments has been improved to clarify our proper treatment controls. We therefore clarify below that our study presents compelling and sufficient evidence to support our conclusions. We have responded to all the comments, explaining how each concern has been addressed. All line and figure numbers mentioned here refer to the numbering of the reviewed manuscript version. All references are cited as DOIs.

      Reviewer #1 (Public review):

      There are many major issues with the study. The findings across experiments are inconsistent, and it is unclear how the authors performed their analyses or why specific time points and comparisons were chosen. The study requires major re-analysis and additional experiments to substantiate its conclusions.

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions.

      (1a) Behavior task - the pellet-reaching task is a well-established paradigm in the motor learning field. Why did the authors choose to quantify performance using "success pellets per minute" instead of the more conventional "success rate" (see PMID 19946267, 31901303, 34437845, 24805237)? It is also confusing that the authors describe sessions 1-5 as being performed on a spoon, while from session 6 onward, the pellets are presented on a plate. However, in lines 710-713, the authors define session 1 as "naive," session 2 as "learning," session 5 as "training," and "retraining" as a condition in which a more challenging pellet presentation was introduced. Does "naive session 1" refer to the first spoon session or to session 6 (when the food is presented on a plate)? The same ambiguity applies to "learning session 2," "training session 5," and so on. Furthermore, what criteria did the authors use to designate specific sessions as "learning" versus "training"? Are these definitions based on behavioral performance thresholds or some biological mechanisms? Clarifying these distinctions is essential for interpreting the behavioral results.

      We agree that success rate is a more conventional measure than the number of successful prehensions per minute. We have changed all behavior quantifications to success rate. Note that all behavioral conclusions drawn before are still valid under the new quantification (see Figures 1, 4, and 5). Importantly, the terms “learning,” “training,” and “retraining” were defined based on task structure and prior literature on motor learning stages rather than predetermined behavioral performance thresholds. These labels reflect progression through the task design (initial acquisition, continued practice under stable conditions, and adaptation to altered task demands), not biologically distinct or threshold-defined phases. We have revised the Methods section to make these definitions and transitions explicit to avoid ambiguity in interpreting the behavioral results.

      (1b) Judging from Figures 1F and 4B, even in WT mice, it is not convincing that the animals have actually learned the task. In all figures, the mice generally achieve 10-20 pellets per minute across sessions. The only sessions showing slightly higher performance are session 5 in Figure 1F ("train") and sessions 12 and 13 in Figure 4B ("CLZ"). In the classical pellet-reaching task, animals are typically trained for 10-12 sessions (approximately 60 trials per session, one session per day), and a clear performance improvement is observed over time. The authors should therefore present performance data for each individual session to determine whether there is any consistent improvement across days. As currently shown, performance appears largely unchanged across sessions, raising doubts about whether motor learning actually occurred.

      As described in the Methods section “Single pellet prehension task”, the elevated plate slot for pellet delivery in our setup box is at a challenging position, outside the slit and 2 cm to the right, forcing the mice to use the left paw. Mice therefore need to be trained at gradually harder positions, using a spoon to deliver the pellet instead of placing it directly at the plate slot. Because task difficulty increases gradually, the success rate curve remains flat while the total number of attempts and the number of successful prehensions per minute increase (Figure 1F-H). We therefore argue that motor learning did occur, with a relatively constant success rate on a gradually harder task. Further, the success rate and number of successful prehensions of our mice are within the levels previously reported for trained mice (10.3791/51238). We have added the precise plate slot position to the Methods section to clarify why a delivery method of gradually increasing difficulty is needed.
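
      The distinction between the two behavioral measures can be made concrete with a short calculation. The session values below are hypothetical and chosen only to show how the success rate can stay flat while successful prehensions per minute rise; they are not data from the study, and the function names are illustrative.

```python
def success_rate(successes, attempts):
    """Fraction of prehension attempts that succeeded."""
    return successes / attempts if attempts else 0.0


def successes_per_minute(successes, duration_min):
    """Successful prehensions per minute of session time."""
    return successes / duration_min


# Hypothetical sessions: (successes, attempts, session length in minutes).
sessions = [(10, 20, 10.0), (15, 30, 10.0), (20, 40, 10.0)]

for successes, attempts, minutes in sessions:
    print(
        success_rate(successes, attempts),
        successes_per_minute(successes, minutes),
    )
```

      In this toy example every session has a success rate of 0.5, whereas successes per minute rise from 1.0 to 2.0 because the animal makes more attempts in the same time.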

      (1c) The authors also appear to neglect existing literature on the role of SST-INs in motor learning and local circuit plasticity (e.g., PMID 26098758, 36099920). Although the current study focuses on a specific subpopulation of SST-INs, the results reported here are entirely opposite to those of previous studies. The authors should, at a minimum, acknowledge these discrepancies and discuss potential reasons for the differing outcomes in the Discussion section.

      We thank the reviewer for pointing this out. This was not neglect but a deliberate choice to discuss only previous literature that can be fairly compared with our findings. It is becoming increasingly clear, with mounting evidence from modern transcriptomic and connectomic studies, that the canonical “three-cardinal” interneuron populations (SST⁺, PV⁺, VIP⁺) represent oversimplified groupings that mask considerable heterogeneity. For example, in a comprehensive single-cell RNA-sequencing (scRNA-seq) study covering ~1.3 million cells from mouse cortex and hippocampus, the authors identified dozens of discrete GABAergic subtypes beyond the classical marker-defined classes, revealing continuous and graded variation in molecular identity across cortical and hippocampal regions (10.1016/j.cell.2021.04.021). Moreover, a recent study focusing on SST-expressing interneurons demonstrated that even within the SST class there are multiple subtypes with distinct laminar distributions, axonal projection patterns, and circuit connectivity (for instance, two different Martinotti subtypes vs. a non-Martinotti SST subtype targeting different pyramidal neuron types and dendritic compartments) (10.1016/j.neuron.2023.05.032). Finally, developmental single-cell transcriptomics shows that interneuron diversity is already apparent at early postmitotic stages, indicating that these subtypes are pre-specified rather than being mere activity-dependent states (10.1038/s41467-018-07458-1). These findings argue strongly that the traditional SST⁺ / PV⁺ / VIP⁺ classification, while useful as a coarse heuristic, fails to capture the rich diversity in molecular, morphological, and functional phenotypes that likely underlie distinct roles in circuit computation and behavior.

      Consequently, studies using any of these three markers must be interpreted cautiously, since in reality several quite different neuronal populations are studied at once, especially if no effort was made to determine which populations within the “cardinal” population contribute to the observed effects. Most likely, the reported results are based on a mixed population, in the worst case on populations with opposite effects. In any case, we have now included the role of SST-INs in motor learning and M1 circuitry in the Discussion section. We also respectfully disagree that our findings are the opposite of previous SST-IN studies. We show that increasing Ma2 excitability improved execution of an already learned movement, while 10.1038/nn.4049 showed that both activating (which is different from increasing excitability) and inhibiting SST-INs impaired the learning of a stereotyped movement. Similarly, 10.1016/j.neuron.2022.08.018 showed that increasing SST-IN excitability impairs motor learning, not execution of a previously learned movement. While we found that increasing the excitability of Ma2 cells did not affect motor learning, note that Ma2 cells are a subset of Martinotti cells with homogeneous electrophysiological and morphological properties (10.1371/journal.pbio.2001392), and Martinotti cells themselves are a subset of SST+ cells (10.1016/j.neuron.2023.05.032). The Discussion has been updated to include this reasoning.

      (2a) Calcium imaging - The methodology for quantifying fluorescence changes is confusing and insufficiently described. The use of absolute dF values ("detrended by baseline subtraction," lines 565-567) for analyses that compare activity across cells and animals (e.g., Figure 1H) is highly unconventional and problematic. Calcium imaging is typically reported as dF/F0 or z-scores to account for large variations in baseline fluorescence (F0) due to differences in GCaMP expression, cell size, and imaging quality. Absolute dF values are uninterpretable without reference to baseline intensity - for example, a dF of 5 corresponds to a 100% change in a dim cell (F0 = 5) but only a 1% change in a bright cell (F0 = 500). This issue could confound all subsequent population-level analyses (e.g., mean or median activity) and across-group comparisons. Moreover, while some figures indicate that normalization was performed, the Methods section lacks any detailed description of how this normalization was implemented. The critical parameters used to define the baseline are also omitted. The authors should reprocess the imaging data using a standardized dF/F0 or z-score approach, explicitly define the baseline calculation procedure, and revise all related figures and statistical analyses accordingly.

      The calcium imaging used here is 1-photon microendoscopic video data. To our knowledge, it is not possible to extract the true cell baseline over time from 1-photon data, since the background component includes signals from multiple sources, and usually has fluctuations larger than the neural signal itself. We agree that absolute dF values cannot be compared across cells, and that is not what we report here. The CNMF-E algorithm outputs the temporal activity of each neuron with the background component already removed (10.7554/eLife.28728) and therefore the baseline subtraction used in our study is already standardized (10.7554/eLife.38173). Note that although it is common in the literature to record 1-photon data and perform similar preprocessing (some form of baseline subtraction and/or normalization by noise std), referring to the resulting trace as dF/F, that is not entirely correct, since true F0 extraction is not possible. We thus chose to refer to the resulting preprocessed traces as what they actually are - dF detrended (raw trace with estimated background components removed). However, we agree that a better description of the process would be helpful in our manuscript, and that the nomenclature might be confusing to readers. We therefore expanded the methods section to better explain that we will now refer to F0 as the background component (and refer to our resulting traces as dF/F) and explain how it was determined. We also updated the example traces in Figure 1E to now show the raw traces, the estimated background components and the detrended traces.
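
      To make the units concrete, the short Python sketch below works through the reviewer's arithmetic for dF versus dF/F0 and adds one common alternative for traces where a true F0 is not available: scaling by a robust noise estimate. It is an illustration only and not the analysis code used in the study. The function names `df_over_f0` and `scale_by_noise` are hypothetical, and the median-absolute-deviation noise estimate is an assumption chosen for the example rather than the procedure described in the Methods.

```python
from statistics import median


def df_over_f0(df, f0):
    """Fractional fluorescence change relative to a known baseline f0."""
    if f0 <= 0:
        raise ValueError("baseline f0 must be positive")
    return df / f0


def scale_by_noise(trace):
    """Express a detrended trace in units of its own noise level.

    Assumption for this sketch: noise is estimated from the median absolute
    deviation (MAD), scaled by 1.4826 so that it approximates a standard
    deviation for Gaussian noise.
    """
    center = median(trace)
    mad = median(abs(x - center) for x in trace)
    if mad == 0:
        raise ValueError("trace has zero MAD; noise level is undefined")
    noise = 1.4826 * mad
    return [(x - center) / noise for x in trace]


# The same absolute change (dF = 5) in a dim cell and in a bright cell.
dim_cell = df_over_f0(5.0, 5.0)       # 100% change
bright_cell = df_over_f0(5.0, 500.0)  # 1% change
```

      The first function needs a trustworthy F0, which is the quantity argued above to be unavailable for 1-photon data. The second needs only the detrended trace itself, which is why noise-scaled traces are sometimes used for comparisons across cells.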

      (2b) Figure 1G - It is unclear why neural activity during successful trials is already lower one second before movement onset. Full traces with longer duration before and after movement onset should also be shown. Additionally, only data from "session 2 (learning)" and a single neuron are presented. The authors should present data across all sessions and multiple neurons to determine whether this observation is consistent and whether it depends on the stage of learning.

      We agree that it would be beneficial to show longer traces as an example of prehension-related activity, so we expanded Figure 1I to show a longer trace for a single neuron. We added to Supplemental Figure 2 plots showing longer traces from all sessions including all neurons for both genotypes.

      (2c) Figure 1H - The authors report that chemogenetic activation of Chrna2 cells induces differential changes in PyrN activity between successful and failed trials. However, one would expect that activating all Chrna2 cells would strongly suppress PyrN activity rather than amplifying the activity differences between trials. The authors should clarify the mechanism by which Chrna2 cell activation could exaggerate the divergence in PyrN responses between successful and failed trials. Perhaps, performing calcium imaging of Chrna2 cells themselves during successful versus failed trials would provide insight into their endogenous activity patterns and help interpret how their activation influences PyrN activity during successful and failed trials.

      The reviewer is correct to assume that increasing excitability of Ma2 cells would suppress PC activity. As shown in Supplemental Figure 2I, that is exactly what we observe when considering only non-prehension related activity. Thus, it is very interesting that the opposite effect is seen for prehension-related activity. This finding also aligns with our results from the assembly analysis showing that assembly activity is decreased within the prehension window compared to outside the prehension window. Unfortunately, imaging Ma2 cells would only add information about their influence on PCs if we imaged both populations simultaneously, which requires equipment and reagents we do not currently have. Fortunately, however, the endogenous activity patterns of Ma2 cells and the direct connectivity between Ma2 and pyramidal cells were previously investigated in detail (10.1371/journal.pbio.2001392). We therefore expanded the discussion to better explain that the differential changes in PC activity when increasing Ma2 excitability could be due to increased PC synchronization, since a single Ma2 cell connects to several PCs, and upon inhibition release all connected PCs fire synchronously.

      (2d) Figure 1H - Also, in general, the Cre+ (red) data points appear consistently higher in activity than the Cre- (black) points. This is counterintuitive, as activating Chrna2 cells should enhance inhibition and thereby reduce PyrN activity. The authors should clarify how Cre+ animals exhibit higher overall PyrN activity under a manipulation expected to suppress it. This discrepancy raises concerns about the interpretation of the chemogenetic activation effects and the underlying circuit logic.

      As explained above, increasing Ma2 excitability indeed decreased non-prehension related PC activity, and the proposed mechanism has been added to the discussion section. We also made clearer in the results section that we are referring to prehension-related PC activity, and emphasize that overall non-prehension related PC activity is decreased.

      (3) The statistical comparisons throughout the manuscript are confusing. In many cases, the authors appear to perform multiple comparisons only among the N, L, T, and R conditions within the WT group. However, the central goal of this study should be to assess differences between the WT and hM3D groups. In fact, it is unclear why the authors only provide p-values for some comparisons but not for the majority of the groups.

      We agree that a clearer description of the statistical analysis is warranted. We expanded the statistical analysis methods section to clarify, among other things, that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that only significant p-values are reported in the figures. The absence of a p-value for a comparison therefore means that it is not significant.
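
      The reporting convention described above can be illustrated with a minimal sketch. The response does not state which correction was applied, so the Holm step-down procedure below is an assumption made for illustration, and the condition labels and p-values are placeholders rather than results from the study.

```python
from itertools import combinations


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def significant_pairs(groups, raw_p, alpha=0.05):
    """Pair every comparison with its adjusted p-value; keep significant ones."""
    pairs = list(combinations(groups, 2))
    if len(pairs) != len(raw_p):
        raise ValueError("one raw p-value is needed per pairwise comparison")
    adjusted = holm_adjust(raw_p)
    return [(pair, p) for pair, p in zip(pairs, adjusted) if p < alpha]


# Placeholder labels and p-values, for illustration only.
groups = ["N", "L", "T", "R"]
raw_p = [0.001, 0.20, 0.04, 0.012, 0.50, 0.03]
reported = significant_pairs(groups, raw_p)
```

      With these placeholder values, four of the six raw p-values fall below 0.05, but only the N versus L comparison remains significant after adjustment. Under the convention above, it would be the only comparison to receive a significance bar in a figure.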

      (4a) Figure 4 - It is hard to understand why the authors introduce LFP experiments here, and the results are difficult to interpret in isolation. The authors should consider combining LFP recordings with calcium imaging (as in Figure 1) or, alternatively, repeating calcium imaging throughout the entire re-training period. This would provide a clearer link between circuit activity and behavior and strengthen the conclusions regarding Chrna2 cell function during re-training.

      Unfortunately, it is not possible in our setup to record calcium imaging and LFP simultaneously, since the implants needed for the miniscope occupy the entire space above the animal’s cranium. Calcium imaging during the execution of learned movements is also impractical. If the animals were implanted before the training phase, the signal would likely be too degraded for recordings after the training sessions, since the miniscope signal quality decreases over time and over successive miniscope attachments. If the animals were implanted between the training and retraining phases (as in the LFP group), the gap between training and retraining would be even larger, at least 28 days (as opposed to 16 days for the LFP group), which would affect performance in the task. Therefore, LFP recordings provide an understanding of the higher-level changes in neural activity when excitation is increased in Ma2 cells during the execution of learned movements. We respectfully disagree that the results from the LFP group cannot be interpreted in isolation, since we found that mice with increased excitability of Ma2 cells display increased low theta and gamma power during the prehension movement. As discussed in the manuscript, the increased high gamma band power when Ma2 cells are overexcitable, particularly for the successful trials in the planning phase, suggests that Ma2 cells may have a role in influencing theta and gamma oscillations during motor performance (lines 1348-1355).

      (4b) It is unclear why CLZ has no apparent effect in session 11, yet induces a large performance increase in sessions 12 and 13. Even then, the performance in sessions 12 and 13 (30 successful pellets) is roughly comparable to Session 5 in Figure 1F. Given this, it is questionable whether the authors can conclude that Chrna2 cell activation truly facilitates previously acquired motor skills?

      We understand that a source of confusion for the behavioral data in the LFP group was the absence of data from sessions 1-7, together with the missing explanation about the task changing from spoon to plate (as explained in answers to question 1a and 1b). Since the animals are getting pellets from the spoon in session 5 (easier) and from the plate in later sessions (harder), the fact that animals achieved the same performance on the plate as they had in the last spoon session indicates they relearned the movement. To further clarify the training development, we added the full set of sessions (1-13) to Supplemental Figure 7, indicating the spoon-to-plate switch after session 5 and the 16-day gap between sessions 7 and 8 (due to viral injection and electrode implant surgeries).

      (5) Figure 5 - The authors report decreased performance in the pasta-handling task (presumably representing a newly learned skill) but observe no difference in the pellet-reaching task (presumably an already acquired skill). This appears to contradict the authors’ main claim that Chrna2 cell activation facilitates previously acquired motor skills.

      We respectfully disagree that the results for the pasta-handling conflict with the finding that increasing Ma2 excitability facilitates previously acquired movements. The pasta handling specifically measures forepaw dexterity (as outlined in lines 442-444), therefore assessing forelimb function unrelated to learning. Mice perform a set of stereotyped movements to manipulate the pasta, therefore no learning is required (note that animals were habituated to the arena, followed by a single test session, with no training sessions). We do specifically mention in the results section that "we used the pasta handling task to assess forepaw dexterity that does not require learning" (lines 1137-1139). Our findings support our reported conclusion that "Ma2 cells may have a role in orchestrating precise forelimb movements that do not require previous specific training" (lines 1154-1156).

      (6) Supplementary Figure 1 - The c-Fos staining appears unusually clean. Previous studies have shown that even in home-cage mice, there are substantial numbers of c-Fos+ cells in M1 under basal conditions (PMID 31901303, 31901303). Additionally, the authors should present Chrna2 cell labeling and c-Fos staining in separate channels. As currently shown, it is difficult to determine whether the c-Fos+ cells are truly Chrna2+ cells.

      Our c-Fos staining works reliably, as we have refined the method over several of our projects. Unfortunately, we could not check the references mentioned in the comment, since the PMID points to a study that does not mention c-Fos (maybe an incorrect PMID code?). However, we found our images to have similar c-Fos levels in controls as other studies (for example 10.3389/fnana.2014.00013 Figure 1A and 10.1109/TBME.2024.3401136 Supplemental Figure 2C). Thus, we do find background activity of c-Fos in both Cre+ and control mice, but the c-Fos stain appears clean because of the strong up-regulation and fluorescent signal in exogenously activated hM3Dq+ cells. We also noticed that the manuscript was missing a methods section for the c-Fos experiments, and therefore added a section detailing the hM3Dq activation validation (lines 487-498). Further, the figure now displays separate channels for hM3Dq+ cells (magenta) and c-Fos (cyan) for better clarity.

      (7) Overall, the authors selectively report statistical comparisons only for findings that support their claims, while most other potentially informative comparisons are omitted. Complete and transparent reporting is necessary for proper interpretation of the data.

      As explained above (comment 3), we expanded the statistical description in the methods to explain that all possible pairwise comparisons were performed and appropriately corrected for multiple comparisons, and that omitted comparisons are non-significant.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure legends - The authors should provide more detailed information in the figure legends, such as N values. It is also not explained what the bold bars, as well as the highest and lowest bars, represent. Clear labeling is essential for proper interpretation of the data.

      We revised all figure legends to add n-numbers for all quantification plots, and expanded the Statistical analysis methods section to explain the labeling of all quantifications.

      (2) Presentation of plots - The authors need to improve the clarity and completeness of their figure presentations. For example:

      (a) In Figure 1F, it is unclear whether the results were obtained under chemogenetic activation, as this information is missing from both the figure and the legend. Currently, it could be a comparison of Cre+ mice with Cre- mice without any manipulations.

      (b) In Figure 1H, p-values are reported, but it is not specified which groups are being compared. As mentioned above, why are p-values only given to some comparisons? Does that mean the others are not significant?

      (c) In Figure 1D, a scale bar should be provided.

      (d) In Figure 1E, the y-axis (fluorescence) scale should be clearly indicated.

      We thank the reviewer for the attention to the figure details. We added the missing scale bars for Figures 1D-E. We also clarified in the results section that all miniscope recordings were performed under clozapine treatment. As answered above (comments 3 and 7), we expanded the methods section to state that although all comparisons were made and appropriately corrected for multiple comparisons, only significant comparisons were reported. As for the groups being compared, every significance bar clearly connects two groups, which are the ones being compared. We also expanded the Statistical Analysis section to state that “Significance bars without ticks represent pairwise comparisons, while significance bars with downward ticks represent an effect.”

      Reviewer #2 (Public review):

      The main limitation of the study lies in its small sample sizes and the absence of key control experiments, which substantially weaken the strength of the conclusions. Core findings of this paper, such as the lack of effect of Ma2 cell activation on motor learning, as well as the altered neuronal activity, rely on a sample size of n=3 mice per condition, which is likely underpowered to detect differences in behavior and contributes to the somewhat disconnected results on calcium activity, activity timing, and neuronal assembly activity.

      We understand that the source of confusion is the number of mice used for calcium imaging and the number of mice used for assessing the effect of Ma2 increased excitability in motor learning. The core finding that Ma2 increased excitability did not alter motor learning is supported by the data shown previously in Supplemental Figure 5 (now Figure 1F-H), with n=6 Cre+ and n=7 controls, which has enough statistical power to detect the effect of training session (F (3,33) = 9.254, power = 0.997) and should have enough power to detect the effect of group (estimated power of 0.835 for F(1,11)). The behavior performance of the miniscope-recorded mice was shown in the previous version for transparency, however no conclusion was drawn based on that data. To improve clarity, we now present data from the previous Supplemental Figure 5 as Figures 1F–H. This dataset clearly demonstrates that increased excitability of Ma2 cells did not affect motor learning. In addition, note that all quantification and conclusions drawn about neuronal activity are based on robust sample sizes: 1070 cells for controls and 403 for Chrna2-Cre+, or 70 assemblies for controls and 48 for Chrna2-Cre+. These sample sizes ensure sufficient statistical power, as demonstrated by the multiple significant effects and pairwise differences reported in our study. We reiterate that no underpowered tests were conducted in this study, and no conclusions were drawn on n = 3 controls and 3 Chrna2-Cre+ mice on behavioral outcomes.

      More comprehensive analyses and data presentation are also needed to substantiate the results. For example, examining calcium activity and behavioral performance on a trial-by-trial basis could clarify whether closely spaced reaching attempts influence baseline signals and skew interpretation.

      We agree and we performed a trial-by-trial analysis to verify the effect of adjacent prehensions in the trial signal. We found that only 17.7% of adjacent trials were affected by a previous trial. In addition we selected only trials not preceded by another trial for at least 6s, and evaluated whether activity immediately before the trial (-3 to -1s) is different from the activity long before the trial (-5 to -3s). The rationale is that if a trial would affect the baseline, then activity immediately before would be different from the activity long before the trial. In this analysis, we found no genotype- or session-related differences in baseline amplitude between epochs. Together these results confirm that prehension-related activity does not systematically alter non-prehension epochs. The results are shown in Supplemental Figure 3.
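
      The selection and comparison steps described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the assumption that the recording starts at t = 0 s, and the use of a simple mean per window are all choices made for the example. Only the 6 s isolation criterion and the two baseline windows (-5 to -3 s and -3 to -1 s) come from the text.

```python
def isolated_onsets(onsets, min_gap=6.0):
    """Keep onsets with no other onset in the preceding min_gap seconds.

    Assumption: the recording starts at t = 0, so the first trial is kept
    only if it occurs at least min_gap seconds into the recording.
    """
    kept = []
    previous = 0.0
    for onset in sorted(onsets):
        if onset - previous >= min_gap:
            kept.append(onset)
        previous = onset
    return kept


def window_mean(trace, fs, onset, start, stop):
    """Mean of trace samples in [onset + start, onset + stop) seconds."""
    first = int(round((onset + start) * fs))
    last = int(round((onset + stop) * fs))
    if first < 0 or last > len(trace) or first >= last:
        raise ValueError("window falls outside the recording")
    segment = trace[first:last]
    return sum(segment) / len(segment)


def baseline_shift(trace, fs, onsets):
    """Per-trial difference: late baseline minus early baseline."""
    shifts = []
    for onset in isolated_onsets(onsets):
        early = window_mean(trace, fs, onset, -5.0, -3.0)
        late = window_mean(trace, fs, onset, -3.0, -1.0)
        shifts.append(late - early)
    return shifts
```

      Shifts close to zero across isolated trials are what one would expect if preceding prehensions do not contaminate the baseline. The statistical test used to compare the two windows across genotypes and sessions is not specified here and is left out of the sketch.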

      The study uses cre-negative mice as controls for hM3Dq-mediated activation, which does not account for potential effects of Cre-dependent viral expression that occur only in Cre-positive mice. This important control would be necessary to substantiate the conclusion that it is increased Ma2 cell activity that drives the observed changes in behavior and cortical activity.

      Having a control group of Cre+ mice injected with a Cre-dependent control vector carrying, for example, only a fluorescent reporter would add one more layer of certainty that the effects observed here are due to CLZ-induced hM3Dq activation. We do not agree, however, that it is necessary to confirm our findings. Cre-dependent expression alone was already extensively demonstrated to have no effect by comparing a DREADD activator to a vehicle treatment (for example 10.7554/eLife.38052, 10.1523/JNEUROSCI.0537-18.2018, 10.7554/eLife.67822). We also showed this for our LFP group (Figure 4), further confirming no effect of Cre-dependent hM3Dq expression alone.

      An unspecific effect of clozapine, where the treatment affects animals without the hM3Dq receptor, would be much more likely. We do control for this by giving the same treatment to Cre+ and Cre- mice. Moreover, since we use a low dose of clozapine, a lack of hM3Dq activation would be more likely, which we also controlled for with the c-Fos experiment as explained in the answer to the Minor point 1. Nevertheless, we added to the discussion that although we find it highly unlikely that the effects found here are due to Cre-dependent viral expression, we have not recorded Cre+ animals expressing control vectors instead of hM3Dq (lines 1360-1375).

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) One of the main findings in this paper is that Chrna2-Cre cell activation did not affect learning of the prehension task; however, the presented data do not convincingly support this claim. Looking at Fig.1F, Cre+ mice appear to have an overall lower number of successful prehensions compared to control mice. If this is not statistically significant, it is likely because n=3 mice for each group is underpowered. To better judge the behavior of these mice, it would be necessary to plot success rate and overall number of prehensions over the entire course of training, in addition to successes per minute. Given that n=3, plotting all individual data points would make more sense than showing a violin plot. Relatedly, in Supplemental Figure 5, there appears to be a clear effect on reduced success rates in Cre+ mice, which is stated in the figure legends, whereas the result section states: we found no effect of genotype on prehension success rates (lines 895-896). The authors should ensure that these behavior experiments are sufficiently powered to detect potential differences in learning between groups and present the complete data and statistical analysis.

      As explained in Comment 1, the finding that Ma2 increased excitability did not alter motor learning is not based on the data in the previous Figure 1F (n=3 Cre+ and n=3 controls, shown for transparency). Instead, it is supported by the data in the previous Supplemental Figure 5, now Figures 1F-H, with n=6 Cre+ and n=7 controls, for which we found only overall effects of training session, but no effect of genotype, with no significant post-hoc pairwise comparisons. We agree that plotting the success rate, total number of prehensions and successful prehensions per minute, for all 6 sessions, allows better evaluation of the mice’s behavior. We moved Supplemental Figure 5 into Figure 1, plotting the three measures for the full set of sessions, with individual data points within the violin plots, and expanded the description of the statistical results in the main text. We reiterate that no underpowered tests were conducted in this study, and no conclusions were drawn on n = 3 controls and 3 Chrna2-Cre+ mice.

      (2) The authors mention that a significant fraction of prehension trials overlapped with a preceding prehension attempt. Were those attempts excluded from the analysis? The stark differences in calcium signals at baseline before prehension onset in some sessions (Figure 1G, Supplementary Figure 2D) suggest that trials preceding closely in time might play a role and could skew the analysis and interpretation.

      Overlapping trials were not excluded from the previous analysis. As summarized in our response to Comment 2, and expanded in the results section (lines 876-894), we found that only 17.7% of adjacent trials were affected by a previous trial, and that when selecting only trials not preceded by another trial for at least 6s, we found no effect of prehension-related activity in the baseline preceding the trials.

      (3) Relatedly, to test the differences in calcium activity before and after prehension onset, it would be clearer to use a delta F/F measure where the 1 second before onset is used as baseline.

      Since a large proportion of neurons are more active before the onset (in the movement planning phase, Figure 2C), the activity 1s before the movement onset cannot be considered as F0. Dividing the activity during the movement by the activity during the planning phase would generate a different measure, a form of execution/planning ratio. We performed this analysis as an additional measure and found a three-way interaction effect of genotype, session, and prehension accuracy, driven by genotype effects in early sessions, indicating that Ma2 activity might be involved in the planning/execution activity balance. Those results are now described in the results section and shown in Supplemental Figure 4.

      (4) For the experiments in which mice were trained prior to Ma2 cell activation (Fig.4), the behavior in sessions 8-10 does not seem to have reached a plateau yet, and the increase in successful prehensions in sessions 11-13 of Cre+ mice could just be a continuation of training. It would be more convincing to show the original training curve of those mice in sessions 1-7. Additionally, the authors should perform a two-way ANOVA test for the interaction of drug and genotype, rather than two separate one-way ANOVAs.

      We agree, and we now show the curve for sessions 1-7 in Supplemental Figure 7, showing that the success ratio for sessions 8-10 is similar to session 7. Also, a 2-way ANOVA was already performed, although the full report was missing from the manuscript. We switched from successful prehensions per minute to success ratio (see Reviewer #1 comment 1a) and now include the full report, in which we found an overall effect of session, and when grouping by genotype, we found an effect for Cre+ but not control mice (lines 1065-1072).

      Minor points

      (1) The validation experiment for the efficacy of hM3Dq is somewhat confusing. It is surprising that the few hM3Dq-mCherry expressing cells in the cre-negative mice did not show increased c-Fos staining since non-specific leaky hM3Dq expression would presumably still lead to a functional DREADD. The better control for validating the efficacy of hM3Dq-mediated Chrna2-Cre cell activation would be to show c-Fos staining in Cre+ mice with or without clozapine injection. This would control for non-specific c-Fos expression and neuronal activation purely by expression of the DREADD. In cre-negative control mice, the comparison should also be between mice with and without clozapine injection to control for non-specific neuronal activation regardless of hM3Dq expression.

      We thank the reviewer for raising this point and agree that validation of hM3Dq efficacy and specificity requires careful interpretation. In principle, any hM3Dq-expressing cell, including the few hM3Dq-mCherry+ cells observed in Cre– mice, could respond to clozapine. However, in practice, effective DREADD activation depends on sufficient receptor expression levels and on the pharmacodynamics of clozapine in the brain (Gomez et al., 2017, Science, 10.1126/science.aan2475). In our dataset, even in Chrna2-Cre+ mice, only ~76% of hM3Dq+ cells showed c-Fos induction after clozapine, indicating that receptor expression and/or ligand access is not uniform across cells. Consistent with this, the very sparse and weak hM3Dq expression observed in Cre- mice resulted in only 0.8% of hM3Dq+ cells showing c-Fos induction, which is in line with previous reports demonstrating that low-level “leaky” expression is insufficient to drive neuronal activation (e.g. 10.1038/s41467-019-12236-z; 10.1523/JNEUROSCI.0537-18.2018; 10.1523/ENEURO.0363-21.2021).

      The reviewer also suggests that an ideal validation would compare Cre+ mice with and without clozapine to control for any c-Fos induction driven purely by DREADD expression. We agree that such a comparison is informative, and note that in our experiments the c-Fos assay was designed specifically to test whether the low clozapine dose used (0.01 mg/kg) is sufficient to activate hM3Dq in Ma2 cells, rather than to assay baseline effects of viral expression.

      Importantly, non-specific effects of clozapine itself were controlled for throughout the study by administering the same clozapine dose to both Chrna2-Cre+ and Cre– mice in all behavioral and physiological experiments. Thus, any clozapine-driven neuronal activation independent of hM3Dq would be expected to appear in both groups.

      Together, these results indicate that (i) the clozapine dose used is sufficient to robustly activate hM3Dq-expressing Ma2 cells, (ii) sparse leaky expression in Cre– mice is not sufficient to drive measurable activation, and (iii) the effects reported in the manuscript are unlikely to be explained by non-specific clozapine actions or by viral expression alone.

      (2) The authors state in the methods section that "only neurons that displayed a significant change comparing the before onset and after onset phases" were included in the analysis. This appears to bias the data towards neurons that change their activity with the prehension movement. If this is the intention, the authors should clearly state this and their rationale in the results section and show what proportion of recorded neurons fall into this category.

      Yes, thanks for pointing this out; the explanation for this exclusion criterion was missing. We expanded the methods section “Neural activity around prehensions” to explain that since we are evaluating the role of Ma2 cells in the prehension-related activity of pyramidal cells, we excluded neurons with no prehension-related activity. We also stated in the expanded text that 15.97% of recorded neurons were excluded due to no prehension-related activity.

      (3) I don’t understand the peak PC activity latency shown in Figure 2D. How is it possible that there are negative peak latencies during the prehension phase, which is defined as >0sec, (upper right panel), and positive peak latencies in the before prehension phase, which is defined as <0sec, (lower right panel)?

      As stated in lines 939-941 and in the Figure 2C legend, neurons were sorted into "before prehension" or "during prehension" neurons according to their activity during the successful prehension. One of our main findings is that the temporal patterns of pyramidal cells were strongly affected by prehension accuracy (lines 941-944), meaning that a significant number of neurons shifted prehension phases when performing a failed prehension (as illustrated in Figure 2C; note how the temporal pattern is not kept from successful to failed prehensions). That is why, for failed prehensions, there are negative latencies for neurons that were classified as "during prehension" and positive latencies for neurons classified as "before prehension" in successful trials. We expanded the sorting explanation in the results section (lines 944-950) to better highlight the latency change between different prehension accuracies.

      (4) Please specify how baseline subtraction (detrending) was performed for the calcium image analysis.

      We expanded the methods section “Neural signal extraction” to better explain that we will now refer to F0 as the background component (and refer to our resulting traces as dF/F) and explain how it was determined (lines 614-619).

      (5) The authors state that they found a "dissociation between changes in neural activity and performance outcomes". Since they only analyzed motor performance by quantifying successful prehensions, this statement should be caveated with the notion that other aspects of the behavior (e.g., trajectories/speed) could be affected but were not measured.

      We agree, and expanded the discussion section to acknowledge that we focused the behavioral analysis on success ratio, and that other measures not investigated could also be affected (lines ????-????).

      (6) Are the differences in theta and gamma power specific to the prehension trials, or does Mα2 cell activation generally increase LFP activity in those bands?

      We thank the reviewer for the question, as we had not analyzed general LFP activity in the previous version. We performed the same analysis now including only LFP from epochs outside prehension windows across the full sessions. We found that Mα2 cell activation actually reduces LFP power across all bands specifically in Session 13 when no prehension is being performed. These findings are now included as Supplemental Figure 7.

      (7) Please define terms that might not be familiar to a typical reader in the field, such as "assemblies", when first introducing them in the text.

      We revised the introduction where we now define assemblies (lines 85-88).

      (8) Please specify the n-numbers for each figure throughout the manuscript. For example, in some figures, the number of trials or the number of neurons is used; however, it is not clear what this number is.

      We agree that although the n-numbers are stated in the text, it would be clearer to add them also to the figure legends. All figure legends now contain n-numbers for panels showing quantifications.

      (9) Relatedly, while the inclusion of supplemental tables with expanded statistical results is commendable, several statistical test details are missing, such as for Figure 5.

      We have fully revised the text to add any missing statistical details for the statements in the Supplemental Tables.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Nio and colleagues address an important question about how the cerebellum and ventral tegmental area (VTA) contribute to the extinction learning of conditioned fear associations. This work tackles a critical gap in the existing literature and provides new insights into this question in humans through the use of high-field neuroimaging with robust methodology. The presented results are novel and will broadly interest both the extinction learning and cerebellar research communities. As such, this is a very timely and impactful manuscript. However, there are several points that could be addressed during the review process to strengthen the claims and enhance their value for readers and the broader scientific community.

      (1) Reward Interpretation and Skin Conductance Responses (SCR)

      A central premise of the manuscript is that 'unexpected omissions of expected aversive events' are rewarding, which plays a critical role in extinction learning. The authors also suggest that the cerebellum is involved in reward processing. However, it is unclear how this conclusion can be directly drawn from their task, which does not explicitly model 'reward.' Instead, the interpretation relies on SCR, which seems more indicative of association or prediction rather than reward per se. Is SCR a valid metric of reward experienced during the extinction of feared associations? Or could these findings reflect processes tied more closely to predictive learning? Please, discuss.

      We thank the reviewer for raising this important point. We agree that skin conductance responses (SCRs) do not directly index reward. More generally, SCRs reflect autonomic arousal in response to salient or motivationally significant stimuli and are closely linked to expectancy and contingency awareness. In our study, SCRs served as a read-out of the participants’ expectation of a US, and were used to fit the hyperparameters of a reinforcement-learning-based deep learning model, which then provided per-trial estimates of prediction and prediction error values. These estimates capture predictive learning about the occurrence of the aversive US, rather than reward per se. The interpretation of unexpected US omissions as “reward-like” prediction errors relies on prior literature, particularly rodent studies showing that dopaminergic neurons in the VTA respond to omitted aversive stimuli and drive extinction learning via projections to the nucleus accumbens (Kalisch et al., 2019; Salinas-Hernández et al., 2018, 2023). We therefore interpret our cerebellar activations during unexpected omissions as being compatible with the processing of reward-like prediction errors, while acknowledging that this inference is indirect.

      To clarify this reasoning, we made revisions to the Introduction and Discussion to (i) state explicitly that SCRs do not directly measure reward but were incorporated into the reinforcement learning model as an index of autonomic arousal related to US expectancy and predictive learning, and (ii) consistently replace the term “reward prediction error” with “reward-like prediction error” throughout.

      (2) Reinforcement Agent and SCR Modeling

      The modeling approach with the deep reinforcement agent treats SCR as a personalized expectation of shock for a given trial. However, this interpretation seems misaligned with participants' actual experience - they are aware of the shock but exhibit evolving responses to it over time. Why is this operationalization useful or valid? It would benefit the manuscript to provide a clearer justification for this approach.

      This point is well taken. We did not collect trial-by-trial expectancy ratings, as frequent button-box responses would have induced cerebellar activations unrelated to fear (extinction) learning. Subjective expectancy was assessed only at the end of each experimental phase. As frequently done in the human fear conditioning literature, we used trial-by-trial SCR data (Lonsdorf et al., 2017). Although SCRs show correspondence with US expectancy ratings, they are inherently noisy and show substantial variability across trials and participants (Constantinou et al., 2021). Therefore, individual trial-by-trial responses cannot be used to directly infer US predictions. Accordingly, we used group-averaged SCR data to fit model hyperparameters in a grid search across parameter settings. The best-fitting hyperparameters were then applied to 100 randomly initialized agents, and their outputs were averaged to generate trial-wise estimates of predictions and prediction errors. These averaged values were used as parametric modulators in the fMRI analyses. We have revised the Introduction and Methods to make this procedure clearer.
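
      The fitting procedure described above (grid search against group-averaged SCRs, followed by averaging over 100 randomly initialized agents) can be illustrated with a minimal sketch. Note that this is not the authors' deep reinforcement-learning agent: a simple Rescorla-Wagner update is swapped in as a stand-in, and the function names, the `scale` parameter, and the range used for random initialization are all hypothetical choices made for illustration only.

```python
import itertools
import random


def rw_run(reinforced, alpha, v0=0.0):
    """Return per-trial US expectancy and prediction error (Rescorla-Wagner stand-in)."""
    v = v0
    predictions, errors = [], []
    for outcome in reinforced:
        predictions.append(v)          # expectancy held before the outcome
        error = outcome - v            # negative when an expected US is omitted
        errors.append(error)
        v += alpha * error             # update expectancy for the next trial
    return predictions, errors


def grid_search(group_scr, reinforced, alphas, scales):
    """Pick the (alpha, scale) pair minimizing squared error to group-averaged SCRs."""
    best = None
    for alpha, scale in itertools.product(alphas, scales):
        predictions, _ = rw_run(reinforced, alpha)
        loss = sum((scale * p - s) ** 2 for p, s in zip(predictions, group_scr))
        if best is None or loss < best[0]:
            best = (loss, alpha, scale)
    return best[1], best[2]


def average_agents(reinforced, alpha, n_agents=100, seed=0):
    """Average predictions and prediction errors over randomly initialized agents."""
    rng = random.Random(seed)
    n_trials = len(reinforced)
    mean_pred = [0.0] * n_trials
    mean_err = [0.0] * n_trials
    for _ in range(n_agents):
        v0 = rng.uniform(0.0, 0.1)     # hypothetical range for the random start
        predictions, errors = rw_run(reinforced, alpha, v0)
        for t in range(n_trials):
            mean_pred[t] += predictions[t] / n_agents
            mean_err[t] += errors[t] / n_agents
    return mean_pred, mean_err
```

      In this sketch the averaged `mean_pred` and `mean_err` series play the role of the trial-wise parametric modulators; fitting to the group average rather than to single trials reflects the point made above that individual SCRs are too noisy to infer US predictions directly.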

      (3) Clarity and Visualization of Results

      The results section is challenging to follow, and the visualization and quantification of findings could be significantly improved. Terms like 'trending' appear frequently - what does this mean, and is it worth reporting? Adding clear statistical quantifications alongside additional visualizations (e.g., bar or violin plots of group means within specific subregions within the cerebellum, or grouped mean activity in VTA and DCN) would enhance clarity and allow readers to better assess the distribution and systematicity of effects. Furthermore, the figures are overly complex and difficult to read due to the heavy use of abbreviations. Consider splitting figures by either phase of the experiment or regions, and move some details to the supplemental material for improved readability.

      We agree with the reviewer that the clarity of results can be improved and have revised the manuscript accordingly. Specifically:

      (1) We use “trend-level” to refer to uncorrected voxelwise t-maps at p < 0.05, and “significant” to refer to TFCE/FWE-corrected effects at p < 0.05. This distinction was not sufficiently clear in the original figures. To address this, uncorrected t-maps are now displayed with a grey striped background frame, and colorbar labels have been enlarged to emphasize whether TFCE/FWE-corrected or uncorrected t-values are shown.

      (2) We added a supplementary table (Table S7) reporting group-level summary statistics for all fMRI contrasts presented in the manuscript, including group means, standard deviations, effect sizes (Cohen’s d), and 95% confidence intervals for cerebellar cortex, cerebellar nuclei, and VTA VOIs. We hope that this helps with the interpretation of effect magnitude and variability across fMRI analyses.

      (3) To improve readability, we split overly complex figures: Figure 2 now separates CS-related prediction from US-related presentation contrasts (which are now revised Figures 4 and 5), and Figure 3 separates event-based and parametric modulation contrasts (which are now revised Figures 6 and 7).

      (4) We also reduced the number of abbreviations in the figures, and the main text and figure captions now give full definitions alongside the original abbreviations.
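
      As an illustration of the kind of summary statistics listed in point (2), the following sketch computes a group mean, standard deviation, one-sample Cohen's d, and a 95% confidence interval from per-participant contrast estimates. It is an assumption-laden example, not the authors' pipeline: the function name is hypothetical, Cohen's d is taken against zero, and the interval uses a normal approximation rather than a t distribution.

```python
import math
import statistics


def summarize(values, confidence=0.95):
    """Summarize per-participant contrast estimates for one VOI (illustrative only)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                 # sample standard deviation
    cohens_d = mean / sd                          # one-sample effect size against zero
    # Normal approximation; a t-based interval would be wider for small samples.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)
    return {
        "mean": mean,
        "sd": sd,
        "cohens_d": cohens_d,
        "ci": (mean - half_width, mean + half_width),
    }
```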

      We considered the suggestion to split figures further by region or by phase. However, we believe it is more informative to present the cerebellar cortex, nuclei, and VTA together for each contrast, and to keep all phases side by side, as this allows readers to directly assess commonalities across phases. We therefore chose to keep the same overall structure, but simplified the figures in other ways (e.g. splitting by contrast type) to improve overall readability. We hope that these changes address the reviewer’s concerns by simplifying the presentation, removing abbreviations, and providing clearer quantification of results.

      (4) Theoretical Context for Paradigm Phases

      The manuscript benefits from the comprehensive experimental paradigm, which includes multiple phases (acquisition, extinction, recall, reacquisition, re-extinction). This design has great potential for providing a more holistic view of conditioned fear learning and extinction. However, the manuscript lacks clarity on what insights can be drawn from these distinct phases. What theoretical framework underpins the different stages, and how should the results be interpreted in this context? At present, the findings seem like a display of similar patterns across phases without sufficient interpretation. Providing a stronger theoretical rationale and reorganizing the results by experimental phase could significantly improve readability and impact.

      We thank the reviewer for this constructive suggestion. We would first like to mention that the primary aim of this manuscript is not to analyze differences between phases, but rather to highlight the commonalities. Across different learning contexts, we consistently observed reward-like prediction error-related activations in the cerebellum and VTA. This consistency and connectivity between the cerebellum and VTA, despite phase-to-phase differences, is the most important finding of our study.

      We agree, however, that the manuscript did not sufficiently explain how each phase differs conceptually, which is important for readers to understand why the consistency of responses is notable. We therefore expanded the Introduction and Discussion to provide clearer theoretical context for each phase. More specifically, the phases can be understood as follows:

      Extinction (day 2): Because acquisition was conducted with a 100% reinforcement rate, unexpected US omissions during initial extinction trials maximize reward-like prediction errors and yield stronger, more uniform expectations across participants compared to a partial reinforcement rate. This phase should therefore provide the clearest opportunity to observe cerebellar-VTA contributions to the processing of reward-like prediction errors.

      Recall (day 3): Despite allowing for the consolidation of extinction learning, the recall test often still elicits conditioned fear responses to the CS+, that is, shows spontaneous recovery of the initial fear association (Bouton, 2002). In these trials, the non-occurrence of the US is unexpected. In this context, US omission-related activations reflect reward-like prediction errors during renewed fear responding in the presence of both a fear memory and an extinction memory. This contrasts with extinction training on day 2, where prediction errors arose primarily against the background of the recently acquired fear memory, without a competing extinction memory.

      Reacquisition (day 3): Unlike acquisition, reacquisition used a partial reinforcement rate, such that non-reinforced CS+ trials were interspersed between reinforced CS+ trials (similar to the partially reinforced phase used by Ernst et al., 2019). Because reacquisition occurs in the presence of savings, that is, the presence of a previously acquired fear memory, US expectancy increases rapidly following reinforced trials and relearning occurs faster (Bouton, 2004). Importantly, partial reinforcement maintains high US expectancy and therefore allows prediction errors to remain sustained across omission trials (Figure 9).

      Reextinction (day 3): Reextinction is an additional extinction phase but without a consolidation interval, and with an already established fear extinction memory. Because reextinction followed the partially reinforced reacquisition phase, prediction errors during early reextinction decayed more slowly than during extinction on day 2 (following the fully reinforced acquisition phase on day 1) (Figure 9). Together, reacquisition and reextinction were designed to maximize the number and persistence of unexpected US omissions, thereby providing additional opportunities to examine reward-like prediction-error signaling.

      By clarifying this framework, we aim to show that while the learning context and history differ across phases, the consistent cerebellum-VTA activation and connectivity related to unexpected US omissions underlines the robustness of the effect. We chose not to reorganize the Results by phase, as our central conclusion rests on similarities rather than differences. Instead, we have clarified the theoretical background in the revised manuscript to help readers interpret both the commonalities and the potential sources of variability.

      (5) Cerebellum-VTA Connectivity Analysis

      The authors argue that the cerebellum modulates VTA activity, yet they perform the PPI analysis in the reverse direction. Why does this make sense? In their DCM analysis, they found a bidirectional relationship (both cerebellum - VTA and VTA-cerebellum), yet the discussion focused on connectivity from the cerebellum to VTA. A more careful interpretation of the connectivity findings would be useful - especially the strong claims in the discussion on the cerebellum providing the reward signal to the VTA should be tempered.

      We thank the reviewer for highlighting this issue. In our primary analysis, we used the VTA as the PPI seed and observed trend-level connectivity with the cerebellum. When we reversed the analysis and used the cerebellar volume of interest (VOI) from the conjunction analysis as the seed, effects in the VTA were substantially weaker. We believe this reflects the broad connectivity profile of the cerebellar VOI (i.e., not specific to the VTA) as well as general limitations of PPI in our study, including the small number of unexpected omission trials and the lack of specificity to reward-like prediction errors (e.g., connectivity also appeared during US presentation). For transparency, we now report the cerebellar-seed PPI results in the Supplementary information (Figure S3). Given their limited robustness, we chose not to include the corresponding VTA maps in the main figures.

      Finally, we agree that our conclusions regarding cerebellum-VTA interactions should be framed more cautiously. While the DCM analyses support bidirectional connectivity, our original discussion placed disproportionate emphasis on cerebellum-to-VTA influences. We have revised the text to provide a more balanced interpretation that also considers VTA-to-cerebellum connectivity.

      Reviewer #2 (Public review):

      Summary

      Building upon the group's previous work, this study used a 3-day threat acquisition, extinction, recall, reextinction, and reacquisition paradigm with 7T imaging to probe the mechanism by which the cerebellum contributes to fear extinction learning. The authors hypothesize this may be via its connection to the VTA, a known modulator of fear extinction due to its role in reward processing. Using complementary analysis methods, the authors demonstrate that activity within the cerebellum, DCN, and VTA is modulated by predictions about the occurrence of the US, which shows regional specificity. They show trend-level evidence that there is increased functional connectivity between the cerebellum and VTA during all phases of the paradigm with unexpected omissions. They also present a DCM which indicates that the cerebellum could positively modulate VTA activity during extinction learning. This study adds to a growing literature supporting the role of the historically overlooked cerebellum in the control of emotions and suggests that an interaction between the cerebellum and VTA should be considered in the existing model of the fear extinction network.

      Strengths

      The authors address their research question using a number of complementary methods, including parametric modulation by model-derived expectation parameters, PPI, and DCM, in a logical and easily understood way. I feel the authors provide a balanced interpretation of their findings, presenting numerous interpretations and offering insight with regard to reward vs attention or unsigned prediction errors and the directionality of the interaction they identify. The manuscript is a timely addition to growing literature highlighting the role of the cerebellum in fear conditioning, and emotion generation and regulation more generally.

      Weaknesses

      Subjective and skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both domains persisted after extinction training. I do not feel that this negates the findings of this manuscript, though it raises questions about the parametric modulators used, and the interpretation of the neural mechanisms proposed if they do not strongly relate to updated subjective appraisals (the goal of extinction therapy). My interpretation of the manuscript suggests there are some key results based upon contrasts that have as few as three events; I am a little unsure about the power and reliability of these effects, though I await author clarification on this matter. There are a number of unaddressed deviations from the pre-registered protocol that I have asked the authors to elaborate upon.

      We thank the reviewer for the thoughtful and constructive evaluation of our work. We appreciate that the manuscript and methods were found to be clearly presented, and we welcome the suggestions for clarification and improvement. Below we address the specific concerns regarding extinction learning in behavioral measures, the reliability of event-based contrasts with few trials, and deviations from the preregistration.

      Extinction in self-reports and skin conductance responses (SCRs)

      The reviewer is correct that CS+/CS- differentiation persisted after extinction. Although there was no differentiation in SCRs at the end of extinction, post-extinction self-reports continued to differentiate between CS+ and CS-, albeit to a lesser degree, which is in line with previous literature on the dissociation of outcome measures during fear conditioning (Lipp et al., 2003). This residual subjective differentiation is also consistent with extinction forming an inhibitory memory trace that suppresses, rather than erases, the original fear association (Bouton, 2002; Milad & Quirk, 2012), and a single extinction session is often insufficient to eliminate differential responding (Craske et al., 2014; Vervliet et al., 2013). However, both measures showed significant effects of extinction learning.

      We included additional analyses of self-reports across phases. Importantly, CS+ ratings were significantly reduced during extinction and recall compared to acquisition (all p ≤ 0.001), whereas CS- ratings remained unchanged (all p > 0.532). This pattern demonstrates that the magnitude of the CS+/CS- difference was significantly reduced relative to acquisition, indicating that extinction learning did occur (Doubliez et al., 2025).

      For physiological responses, extinction learning was shown in PSRs but not conclusively in SCRs. PSRs showed a significant reduction of CS+ responses across extinction, while CS- responses remained unchanged. SCRs showed a reduction of CS+/CS- differentiation across extinction; however, this effect remained at trend level, as the Stimulus x Time interaction did not reach significance (p = 0.053). This pattern is consistent with early differentiation followed by rapid attenuation under the full reinforcement structure of the paradigm (100% reinforcement during acquisition and 0% during extinction). Under such conditions, participants rapidly learn that the US is no longer delivered during extinction, such that physiological responses are largely confined to the first few trials, leaving limited power to detect extinction effects in noisier measures such as SCRs. To address the lower robustness of SCR effects, as recommended by the reviewer, we therefore included PSRs in the main Results section, which provide converging physiological evidence for extinction learning.

      Of note, on day 3, both physiological measures and self-reports again showed CS+/CS- differentiation, consistent with spontaneous recovery, a well-established phenomenon reflecting the persistence of the original fear trace after consolidation (Bouton, 2002; Vervliet et al., 2013).

      Taken together, these findings demonstrate that the paradigm successfully induced both acquisition and extinction of conditioned fear, even though residual fear responses persisted.

      Reliability of event-based contrasts with three trials

      The initial decision to use three events for event-based contrasts was based on SCR and PSR data, which showed that differentiation between CS+ and CS- occurred almost exclusively in the first few trials of extinction and recall. Consistent with the full reinforcement described above, prediction errors were expected to be high in the very first extinction trials, and to decay rapidly. Thus, the usual half-block division (e.g., first eight trials) would have included many trials without meaningful prediction errors.

      We acknowledge that contrasts based on three trials provide limited statistical power. To address this concern, we added a supplementary table showing summary statistics for contrast estimates in the cerebellar cortex, cerebellar nuclei, and VTA VOIs across all fMRI analyses (Table S7), including both the event-based and parametric modulation approaches. Importantly, the event-based contrasts showed moderate to strong effects despite being restricted to the first three unexpected omission trials. Moreover, the parametric modulation analyses, which incorporate all available trials, yielded results that were consistent with the three-trial event-based contrasts and with the patterns shown in the main figures. This convergence between event-based and parametric approaches strengthens our confidence that the observed effects are reliable.

      Deviations from preregistration

      We acknowledge that deviations from the preregistered protocol were not fully documented and have now added this information. The main deviation concerned our event-based analyses: while the preregistration planned early vs. late block comparisons, in practice the rapid decay of SCRs under our 100% and 0% reinforcement rates rendered later trials uninformative for prediction error analyses. We therefore focused on the first three trials, when prediction errors are expected to be present. These behavioral findings are also consistent with Doubliez et al. (2025), who used the same paradigm and observed similar rapid SCR decay. Other deviations, such as not reporting exploratory whole-brain DCM analyses, are now clearly stated for transparency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Point - Paradigm Details

      Providing additional details about the experimental paradigm in the main text (e.g., the nature of the visual stimuli associated with shocks) would enhance the manuscript's clarity. Some of the information currently in supplementary Figure 5 could be incorporated into the main text to improve understanding of the paradigm.

      We agree that the current structure reduces clarity, as the paradigm is only explained in detail after the results. To improve readability, we have moved parts of Figure 5 (illustrating the paradigm and scanner setup) to the beginning of the manuscript (now revised Figure 1). In addition, information from Figure 5, including details of the visual stimuli, is now added to the Introduction.

      Reviewer #2 (Recommendations for the authors):

      Methods

      Can the authors please clarify what part of the task went into [US post CS+ > no US post CS-] contrast? Is this the time immediately after the CS presentations, when the US has just occurred/not occurred, or rather more like the CS+>CS- contrast except including trials confounded by the US (i.e. [CS+/US > CS -])?

      The contrasts are based on an event-related separation of CS and US. The CS was presented for 6 seconds, with its onset modeled in the GLM as a zero-duration event (delta function). The CS offset coincided with either the delivery or omission of the US, which was likewise modeled as a zero-duration event. Thus, CS onset and offset were modeled separately. The no-US events were further distinguished by whether they followed a CS+ or a CS-. Accordingly, we analyzed both CS and US-related contrasts; for example, the CS+ > CS- contrast reflects CS-related differentiation at CS onset (0 s), whereas [US post CS+ > no US post CS-] reflects (no-)US-related activity at CS offset (6 s; US delivered from 5.9-6.0 s). We have added further clarification to the Methods section.
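
      The event definition described above can be sketched as follows. This is an illustrative example rather than the authors' GLM code: the function name, the trial tuple layout, and the regressor labels are assumptions, and the sketch only derives zero-duration event times (CS onset, and US or no-US at CS offset 6 s later).

```python
CS_DURATION_S = 6.0  # CS presented for 6 seconds, as stated in the response


def build_events(trials):
    """Split trials into zero-duration CS-onset and (no-)US events at CS offset.

    trials: list of (cs_onset_s, cs_type, us_delivered), cs_type in {"CS+", "CS-"}.
    Returns a dict mapping a regressor label to a list of event times in seconds.
    """
    events = {
        "CS+": [],
        "CS-": [],
        "US post CS+": [],
        "no US post CS+": [],
        "no US post CS-": [],
    }
    for cs_onset, cs_type, us_delivered in trials:
        events[cs_type].append(cs_onset)            # CS modeled at onset (0 s)
        cs_offset = cs_onset + CS_DURATION_S        # US or omission modeled at offset
        if us_delivered:
            if cs_type != "CS+":
                raise ValueError("this sketch assumes the US only follows a CS+")
            events["US post CS+"].append(cs_offset)
        else:
            events[f"no US post {cs_type}"].append(cs_offset)
    return events
```

      With events separated this way, a contrast such as CS+ > CS- draws only on the onset regressors, whereas [US post CS+ > no US post CS-] draws only on the offset regressors.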

      I was a bit unclear on what this sentence of the methods meant "Notably, all single trials comprised CS+ trials, with CS- trials also being modeled as single trials to facilitate paired analysis", does this mean that some contrasts had 6 events in total - e.g. the first 3 unexpected omissions vs 3 x CS-. If so, which CS- were selected for the comparison?

      We agree that this sentence was unclear and have revised it. Our intention was to describe that when CS+ trials were modeled as single trials in the GLM (e.g., each CS+ onset and its associated [no-]US event modeled as separate regressors), the CS- trials were modeled in the same way. This ensured that paired analyses would be possible if required.

      For reacquisition and reextinction, single-trial modeling was necessary, as the last unexpected omission of reacquisition is also the first unexpected omission of reextinction. Modeling trials separately allows us to examine the first three unexpected US omissions in each phase independently.

      The event-based contrasts for unexpected US omissions were defined in line with a previous study of our group. For example, during extinction we contrasted the first three unexpected US omissions following CS+ with all expected omissions following CS- (i.e. [first 3 no US post CS+ > no US post CS-], corresponding to 3 vs. 16 events). The weights of events were automatically scaled by SPM12 so that both sides of the contrast carried equal total weight (e.g. positive events weighted 1/3, negative events weighted -1/16). This procedure matches the approach in Ernst et al. (2019), where in partially reinforced acquisition 6 unexpected omissions after CS+ were contrasted with 16 expected omissions after CS-.
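
      The weight scaling described above can be checked with a short arithmetic sketch. It does not call SPM12; it simply reproduces the stated weights (1/3 per positive event, -1/16 per negative event) and confirms that the two sides of the contrast carry equal total weight. The function name is hypothetical.

```python
from fractions import Fraction


def balanced_contrast(n_positive, n_negative):
    """Return contrast weights so each side sums to +1 and -1, respectively."""
    positive = [Fraction(1, n_positive)] * n_positive
    negative = [Fraction(-1, n_negative)] * n_negative
    return positive + negative


# First 3 unexpected omissions after CS+ versus 16 expected omissions after CS-.
weights = balanced_contrast(3, 16)
```

      Because the weights sum to zero, the contrast compares the mean response across the three unexpected omissions with the mean across the sixteen expected omissions, regardless of the unequal event counts.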

      More generally, can the authors please comment on the power and reliability of analyses that include only 3 events in a condition [e.g. the first 3 unexpected omissions]?

      It is not clear if the (US post CS+ > no US post CS-) phases were included. In your pre-registration you say "we will use a "no US post CS+ > no US post CS-" fMRI contrast, where "no US post CS+" designates unexpected omission events in early extinction, early recall (depending on behavioral data which might indicate a return of fear) and a volatile phase (where unexpected omissions occur in the first part of the volatile phase, i.e. reacquisition).", but my reading of the manuscript was that it included both early and late "see 1st level analysis = US post CS+, no US post CS+, no US post CS- separately for each phase; 2nd level = contrast included unexpected omission of the US (no US post CS+ > no US post CS-)". Please clarify and if necessary explain the deviation from preregistration.

      We agree that this point requires clarification. In the preregistration, we planned to divide phases into early and late blocks (no US post CS+ > no US post CS-). However, as already outlined in our response (Reviewer 2, public review response: Reliability of event-based contrasts with three trials), both our preliminary behavioral data and subsequent modeling analyses indicated that differentiation between CS+ and CS- declined extremely rapidly under the 100% reinforcement schedule, likely leaving little or no prediction error beyond the first few trials. Based on this, we adapted the event-based analyses to focus on the first three unexpected omission trials in extinction, recall, and reextinction, where prediction errors are expected to be present. In reacquisition, only three omission events occur by design (83% reinforcement), so this naturally constrained the analysis to three trials. We now explicitly describe this deviation from the preregistration in the revised manuscript.

      As outlined in the same response, we recognize that contrasts based on three trials provide limited statistical power. We addressed this point by providing additional summary VOI statistics of contrast estimates for both event-based and parametric modulation contrasts (see also Reviewer 1, public review response, point (3) Clarity and Visualization of Results). These statistics show moderate-to-strong effect sizes and convergence across methods, which we argue supports the reliability of using the first three trials.

      Finally, with regard to the reviewer’s specific question: yes, US post CS+ > no US post CS- contrasts were examined for acquisition training, primarily to demonstrate US-related activation (see revised Figure 3).

      Results

      Page 5 + 6: Including the interaction effects for pupil size responses during extinction and reextinction in the SCR section seems unjustified. I appreciate that the SCR data does not significantly support the key claim that extinction learning towards the CS+ occurred, but I do not feel it is acceptable to draw from the other measure for this effect alone. If the PSR measure is of primary/significant importance to support the validity of your paradigm, please consider adding all of these results to the main manuscript.

      We agree with this point and have moved the PSR analysis to the main manuscript. In addition, the SCR Results section no longer includes the PSR analyses, and clearly states the absence of a significant Stimulus x Time interaction effect in extinction (p = 0.053). For completeness, we additionally report trend-level post hoc tests showing CS+/CS- differentiation during early extinction but not during late extinction, consistent with an initial differentiation that attenuates across extinction training.

      Subjective and (some) skin conductance responses do not completely support the success of the learning paradigm. For example, CS+/CS- differentiation in both subjective domains and SCRs persisted after extinction training. Can the authors comment on how this might influence the interpretation of their results more generally? What does it mean if these expectations do not appropriately translate to updated subjective appraisals in your participants, contrary to what the model from which the parametric modulators were derived would predict?

      The persistence of CS+/CS- differentiation in self-reports after extinction, and the return of CS+/CS- differentiation in both self-reports and physiological measures during the recall test, is not unexpected. For self-reports administered after extinction, such persistent CS+/CS- differences are commonly observed in the human fear extinction literature (Hermans et al., 2006; see also Lipp et al., 2003), and may reflect that initial extinction learning establishes a new inhibitory association that suppresses, but does not erase, the original fear memory (Bouton, 2002). At recall on day 3, the remaining differentiation in both self-reports and physiological responses is consistent with spontaneous recovery, a well-documented phenomenon in extinction research (Bouton, 2002). As noted earlier (Reviewer 2, public review response: Extinction in self-reports and skin conductance responses (SCRs)), additional analyses showed that ratings were significantly reduced after extinction and recall compared to acquisition. Thus, while residual differentiation in self-reports remained after extinction and recall, its magnitude was diminished, indicating that extinction learning occurred but was incomplete. This pattern is consistent with partial updating of subjective appraisals in accordance with the reinforcement-learning model used to derive the parametric modulators, rather than a failure of updating.

      Figures

      Figure 1: Please ensure that the summary of your results in the figure legend is consistent with the quantitative results reported. Example 1: "On day 2, there was a loss of differentiation during extinction training.", however, a significant effect of the stimulus, and time remained (but no interaction). Please tone down this interpretation, or make it clearer how the difference in the initial extinction trials was quantified. If the ANOVA-type analysis was only performed in the first half, this was not clear. Example 2: "During initial reacquisition, there were again differential responses to the CS+ and CS-, which decreased in reextinction and the unexpected US phase". I appreciate that you refer to the difference decreasing, rather than disappearing altogether, but the magnitude of this difference is not reported in the manuscript, and there does remain a significant difference in the amplitude.

      We thank the reviewer for this helpful feedback. We have revised the figure legends to tone down overly strong statements and ensure that all descriptions are in correspondence with the quantitative results. For clarity, we have also added significance markers for (trend-level) post hoc comparisons (CS+/CS- differentiation within early and late blocks for each phase) to revised Figures 2 and 3 displaying SCRs and PSRs.

      Figure 2, 3, 4: I found it quite confusing to have uncorrected and corrected results displayed in the same way in the same figure. E.g. Figure 2A which, as far as I can tell shows trend-level results for the cerebellum, and corrected results for the VTA. For Figures 2 and 3 it was also not immediately clear which colour bar related to which map. Figure 4A appeared to be missing colour bars. I suggest the authors consider (as much as possible) standardising the colour bar scales, such that the maps across figures/sub-plots are more directly comparable, and differentiate more clearly between corrected and uncorrected results. The 3D renders in Figures 2 and 3 are a little hard to see - would it be possible to make it not so transparent?

      We use “trend-level” to refer to uncorrected voxelwise t-maps at p < 0.05, and “significant” to refer to TFCE/FWE-corrected effects at p < 0.05. This distinction was not sufficiently clear in the original figures. In the revised figures, uncorrected t-maps are displayed with a grey striped background frame. Colorbar scales were not standardized, as different panels display different statistical quantities (TFCE values versus t-values), and scaling was chosen to visualize variation within each contrast rather than enforce comparability across panels, which would have reduced interpretability. In addition, the missing colorbar in Figure 8A (formerly Figure 4A) has now been added; it matches the colorbar shown in Figure 8B. See also Reviewer 1, public review response, point (3) Clarity and Visualization of Results.

      Is it possible to annotate significant effects on Figure 1 and Supplement Figure 1? The use of square markers makes it quite hard to tell the value of each point, which, given the small scale of the y-axis is quite important for interpretation. Could the authors consider remaking these plots with smaller dots?

      We have added post hoc significance markers to Figures 2 and 3 displaying SCRs and PSRs to facilitate interpretation. These markers reflect post hoc comparisons of CS+/CS- differentiation within early and late blocks. In cases where the Stimulus x Time interaction was not significant, the corresponding post hoc markers are still shown but are indicated in red to denote their trend-level status. In addition, the plots have been remade with smaller dots to make individual values clearer.

      Discussion

      The authors state "Because aversive stimulus presentation results in pronounced cerebellar activations, we were unable to separate cerebellar activation related to the unexpected (initial acquisition trials) and the expected (late acquisition trials) presentation of the US." Could the authors compare between early[CS+>CS-] and late[CS+>CS-] acquisition (which I believe were created in the event-based analysis but results not reported), or between the first 3[CS+ with US>CS-] and later [CS+ with US>CS-] to assess this?

      In our terminology, the suggested comparisons (early vs. late [CS+ > CS-] or first three vs. last three [CS+ > CS-]) reflect changes in US prediction rather than prediction error. The statement in the Discussion refers specifically to cerebellar activation during US presentation, where distinguishing between expected and unexpected presentations is complicated by the strong cerebellar activation elicited by the electrical US itself. Moreover, when comparing early “unexpected” US presentations with later “expected” ones, the relatively higher activity in early trials could reflect habituation of the US sensation (i.e., non-associative learning) rather than a prediction error, making interpretation difficult.

      Because the current manuscript focuses on reward-like prediction errors, we did not report these US prediction or presentation contrasts in detail. In brief, the suggested comparisons of early versus late CS-related differentiation (CS+ > CS-) revealed only limited trend-level activity. In contrast, US-related responses during acquisition showed robust activations in the cerebellar cortex, DCN, and VTA across the acquisition phase. Comparisons between the first three US presentations and later US presentations showed broadly distributed and stronger responses during early acquisition than during later US presentations. This pattern seems to be more consistent with non-associative effects, such as sensory habituation to the electrical stimulation, than with prediction-error-related processing. We have therefore not included these contrasts in the manuscript, but would be open to providing them in the Supplementary Information if the editor or reviewers consider them essential.

      General

      In your pre-registered analysis plan you state "we will explore the use of DCM in a larger network that encompasses known constituents of the fear extinction network, in addition to the cerebellum and VTA.". You have plenty of results to discuss in the current manuscript and adding this may complicate the narrative, but that being said, please either perform and include this analysis as you proposed or explicitly mention why this was not completed. You could also consider adding a whole-brain activation map for the key phases of the experiment. Please also double-check other pre-registered points, for example - the sample size justification is also different.

      We decided not to include whole-brain DCM analyses in this manuscript and not to report whole-brain activation results extensively, as the study was primarily hypothesis-driven with a focus on cerebellum-VTA interactions. While we recognize that whole-brain analyses are of interest and plan to explore them in future work, they were considered outside the scope of the current paper. This deviation from the preregistration is now explicitly noted in the revised manuscript.

      Regarding the sample size justification, the preregistration contained an error: the parameters were reported incorrectly. The correct sample size justification was already provided in the original 2019 grant application and is correctly reported in the current manuscript. The underlying power analysis was the same, but with different alpha levels depending on whether the study involved healthy participants (where larger samples are feasible) or rare patient populations (where stricter alpha levels are not practical). We have clarified this point in the manuscript under deviations from the preregistration.

      Additional changes made in manuscript by authors

      To provide a complete overview, we also note changes made independently of specific reviewer comments:

      Methods

      In the computational modeling section, “reextinction” was mistakenly mentioned where “reacquisition phase” was intended (the initial phase of the volatile phase before experience replay). This has been corrected.

      The term “trial sequence” is used in computational modeling, whereas counterbalancing in the fear conditioning methods used different terminology. We added a clarifying sentence in the modeling section to make this consistent.

      References in the pupil size analysis section (Jentsch et al. 2020; Mathôt et al. 2017) were misplaced and have now been moved earlier in the sentence.

      The citation for MRIcroGL software was updated to the current Nature Methods reference.

      We added a reference to Doubliez et al. 2025 which used the same three-day paradigm in a behavioral study showing similar physiological responses.

      Supplementary information

      During revision, we noted that the SCR statistics had been computed on an earlier preprocessed dataset version, whereas the finalized corrected dataset was already used for plotting and for estimating prediction and prediction-error values in the reinforcement-learning model. We therefore recomputed the SCR statistics on the finalized dataset for the sake of consistency; this did not change any main effects, interactions, or conclusions, with the only difference being an exploratory late-acquisition CS+/CS- post hoc shifting from non-significant to p < 0.05 (interaction still non-significant). Updated statistics are reported in the Supplementary information.

      Post hoc significant differences in Table S3 are now marked in bold, as the formatting was missing previously.

      To align behavioral analyses more closely with the event-based fMRI approach, we additionally examined physiological responses using a first three versus last three trial division within each phase. These analyses yielded patterns consistent with those obtained using the original early/late block division and are reported in the Supplementary Information.
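
      The first three versus last three trial division described above can be sketched as follows. This is a minimal illustration for a single participant and a single phase, not the analysis code used in the study; the function name, the input arrays, and the choice of a simple mean difference as the differentiation score are all assumptions made for the example.

```python
import numpy as np

def first_last_differentiation(cs_plus, cs_minus, n=3):
    """CS+/CS- differentiation in the first n and last n trials of one phase.

    cs_plus, cs_minus : per-trial response amplitudes (e.g. SCR or PSR), in trial order.
    The inputs and the mean-difference score are hypothetical, for illustration only.
    """
    cs_plus = np.asarray(cs_plus, dtype=float)
    cs_minus = np.asarray(cs_minus, dtype=float)
    # Require non-overlapping first and last windows.
    if min(cs_plus.size, cs_minus.size) < 2 * n:
        raise ValueError("need at least 2 * n trials per stimulus")
    return {
        "first": cs_plus[:n].mean() - cs_minus[:n].mean(),
        "last": cs_plus[-n:].mean() - cs_minus[-n:].mean(),
    }

# Hypothetical amplitudes in which differentiation fades across the phase.
result = first_last_differentiation([3, 3, 3, 2, 2, 1, 1, 1], [1] * 8)
```

      With these made-up values the differentiation score is larger in the first three trials than in the last three, which is the kind of pattern the early/late comparison is meant to capture.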

      We added a new supplementary figure (Figure S4) showing the location of the cerebellar VOI on a SUIT flatmap and added a corresponding cross-reference in the Methods section (Volumes of interest (VOI) definition).

      References

      Bouton, M. E. (2002). Context, ambiguity, and unlearning: sources of relapse after behavioral extinction. Biological Psychiatry, 52(10), 976–986. https://doi.org/10.1016/S0006-3223(02)01546-9

      Bouton, M. E. (2004). Context and Behavioral Processes in Extinction. Learning & Memory, 11(5), 485–494. https://doi.org/10.1101/lm.78804

      Constantinou, E., Purves, K. L., McGregor, T., Lester, K. J., Barry, T. J., Treanor, M., Craske, M. G., & Eley, T. C. (2021). Measuring fear: Association among different measures of fear learning. Journal of Behavior Therapy and Experimental Psychiatry, 70, 101618. https://doi.org/10.1016/j.jbtep.2020.101618

      Craske, M. G., Treanor, M., Conway, C. C., Zbozinek, T., & Vervliet, B. (2014). Maximizing exposure therapy: An inhibitory learning approach. Behaviour Research and Therapy, 58, 10–23. https://doi.org/10.1016/j.brat.2014.04.006

      Doubliez, A., Köster, K., Müntefering, L., Nio, E., Diekmann, N., Thieme, A., Albayrak, B., Nicksirat, S. A., Erdlenbruch, F., Batsikadze, G., Ernst, T. M., Cheng, S., Merz, C. J., & Timmann, D. (2025). Dopaminergic drugs modulate fear extinction-related processes in humans, but effects are mild. Brain Communications, 7(5), fcaf333. https://doi.org/10.1093/braincomms/fcaf333

      Ernst, T. M., Brol, A. E., Gratz, M., Ritter, C., Bingel, U., Schlamann, M., Maderwald, S., Quick, H. H., Merz, C. J., & Timmann, D. (2019). The cerebellum is involved in processing of predictions and prediction errors in a fear conditioning paradigm. eLife, 8, e46831. https://doi.org/10.7554/eLife.46831

      Hermans, D., Craske, M. G., Mineka, S., & Lovibond, P. F. (2006). Extinction in Human Fear Conditioning. Biological Psychiatry, 60(4), 361–368. https://doi.org/10.1016/j.biopsych.2005.10.006

      Kalisch, R., Gerlicher, A. M. V., & Duvarci, S. (2019). A Dopaminergic Basis for Fear Extinction. Trends in Cognitive Sciences, 23(4), 274–277. https://doi.org/10.1016/j.tics.2019.01.013

      Lipp, O. V., Oughton, N., & LeLievre, J. (2003). Evaluative learning in human Pavlovian conditioning: Extinct, but still there? Learning and Motivation, 34(3), 219–239. https://doi.org/10.1016/S0023-9690(03)00011-0

      Lonsdorf, T. B., Menz, M. M., Andreatta, M., Fullana, M. A., Golkar, A., Haaker, J., Heitland, I., Hermann, A., Kuhn, M., Kruse, O., Meir Drexler, S., Meulders, A., Nees, F., Pittig, A., Richter, J., Römer, S., Shiban, Y., Schmitz, A., Straube, B., … Merz, C. J. (2017). Don’t fear ‘fear conditioning’: Methodological considerations for the design and analysis of studies on human fear acquisition, extinction, and return of fear. Neuroscience and Biobehavioral Reviews, 77, 247–285. https://doi.org/10.1016/j.neubiorev.2017.02.026

      Milad, M. R., & Quirk, G. J. (2012). Fear Extinction as a Model for Translational Neuroscience: Ten Years of Progress. Annual Review of Psychology, 63(1), 129–151. https://doi.org/10.1146/annurev.psych.121208.131631

      Salinas-Hernández, X. I., Vogel, P., Betz, S., Kalisch, R., Sigurdsson, T., & Duvarci, S. (2018). Dopamine neurons drive fear extinction learning by signaling the omission of expected aversive outcomes. eLife, 7, e38818. https://doi.org/10.7554/eLife.38818

      Salinas-Hernández, X. I., Zafiri, D., Sigurdsson, T., & Duvarci, S. (2023). Functional architecture of dopamine neurons driving fear extinction learning. Neuron, 111(23), 3854-3870.e5. https://doi.org/10.1016/j.neuron.2023.08.025

      Vervliet, B., Craske, M. G., & Hermans, D. (2013). Fear extinction and relapse: State of the art. Annual Review of Clinical Psychology, 9, 215–248. https://doi.org/10.1146/annurev-clinpsy-050212-185542

    1. Author Response:

      Reviewer #1 (Public review):

      Summary and Strengths:

      Shin et al deepen our understanding of high-frequency oscillations in the frontal cortex during REM in a manner that sheds important light on the roles of these events. In particular, they reveal that cortical HFOs are modulated by theta oscillations, occur in chains and recruit cortical neuronal activation patterns in a manner that is distinct from other high-frequency events during non-REM or in the hippocampus. They also show that these events occur during increased oscillatory cross-talk between hippocampus and cortex and may protect cortical neurons from downregulation of firing during sleep. Overall, this is important work with several novel observations pointing towards an important role for these events that will become increasingly understood over time.

      I also wanted to comment that 2D is a beautiful illustration of separate and essentially exclusive communication channels used during HF events in NREM vs REM. They almost perfectly complement each other's frequencies.

      We thank the Reviewer for the positive comments and for highlighting the importance of our work, especially the distinct communication patterns during NREM and REM cortical high-frequency events.

      Weaknesses:

      I have only one major scientific critique: I believe we need to see quantification of how phasic REM theta waves with versus without HFOs differ. What do REM HFOs add to the "normal" theta oscillation? Without this comparison, it is more difficult to interpret the meaning of these events. Given that HFO chains have IEIs around the time of a theta cycle duration, are the repeating spiking activities stronger during HFO repeats than during adjacent theta waves without HFOs?

      We agree with the Reviewer that comparing activity during HFOs with activity during theta in the absence of HFOs is important for determining whether activity during HFOs reflects a unique state of information processing during REM sleep or is redundant with theta oscillation signatures. We attempt to clarify this point in Figure S4I, where we examined PFC population activity during theta periods outside of HFOs. Here, we extracted REM theta periods at least 250 ms away from detected HFOs and split the theta cycles into quartiles based on the theta power at the preferred theta phase bin determined by theta-coupled HFOs (during that specific sleep session). We expect that using the preferred phase of HFOs is the most accurate choice for this comparison (compared to random phases). Lastly, we aligned PFC population activity to these theta phases and found that even in the highest theta power quartile, theta-modulated fluctuations in PFC population activity were absent without HFOs. This indicates that theta-associated HFOs are the primary driver or signature of the observed population activity patterns (Figures 1H, 3F, S4I). An explanation of this procedure can be found in the Methods section under “Control for periods of high theta power”.
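
      The selection step of this control (keeping theta cycles at least 250 ms from any detected HFO, then splitting them into theta power quartiles) can be sketched as below. This is a simplified illustration under stated assumptions rather than the procedure as implemented in the manuscript: the function name and inputs are hypothetical, each theta cycle is assumed to be summarized by one time stamp (the preferred phase) and one power value, and the quartile boundaries are computed with NumPy's default linear interpolation.

```python
import numpy as np

def select_control_cycles(cycle_times, cycle_power, hfo_times, min_gap=0.25):
    """Keep theta cycles >= min_gap seconds from every HFO and label power quartiles.

    cycle_times : time (s) of the preferred theta phase in each cycle (assumed input)
    cycle_power : theta power at that phase bin for each cycle (assumed input)
    hfo_times   : detected HFO times (s)
    Returns (kept_times, quartile), with quartile in 0..3 (3 = highest power).
    """
    cycle_times = np.asarray(cycle_times, dtype=float)
    cycle_power = np.asarray(cycle_power, dtype=float)
    hfo_times = np.sort(np.asarray(hfo_times, dtype=float))

    if hfo_times.size == 0:
        far = np.ones(cycle_times.size, dtype=bool)
    else:
        # Distance from each cycle to its nearest HFO on either side.
        idx = np.searchsorted(hfo_times, cycle_times)
        last = hfo_times.size - 1
        left = np.abs(cycle_times - hfo_times[np.clip(idx - 1, 0, last)])
        right = np.abs(hfo_times[np.clip(idx, 0, last)] - cycle_times)
        far = np.minimum(left, right) >= min_gap

    kept_times = cycle_times[far]
    kept_power = cycle_power[far]
    if kept_power.size == 0:
        return kept_times, np.array([], dtype=int)

    # Quartile edges are computed on the retained cycles only.
    edges = np.quantile(kept_power, [0.25, 0.5, 0.75])
    quartile = np.searchsorted(edges, kept_power, side="right")
    return kept_times, quartile
```

      Population activity would then be aligned to `kept_times` within each quartile, with quartile 3 corresponding to the highest theta power.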

      Regarding the comment “what REM HFOs add to the "normal" theta oscillation”, we hypothesize that generation of HFOs and associated population activity is the result of theta-mediated input from other brain regions that converge on PFC. CA1 is one candidate region, since we observed that theta frequency activity in CA1 leads PFC (Figure 4K, Phase slope index result). Additionally, the high concentration of acetylcholine and the high inhibitory tone in REM sleep are conducive to local suppression in response to external drive, as shown in the model and noted in the Discussion. Thus, we propose that HFOs delineate transient windows where sparse populations of PFC neurons are activated in the backdrop of overall suppression, potentially to link specific ensembles across PFC and other brain areas such as the hippocampus, a phenomenon that differs from baseline theta activity in REM.

      To address this point, we will provide additional analyses investigating PFC activity profiles during theta periods adjacent to HFOs. We will also reorganize the results and figures to highlight these important control analyses.

      What percentage of theta waves contain HFOs, and what is the firing rate during those theta waves with vs without HFOs? Is there differential firing rate modulation? The authors may even consider that all REM-HFO-specific quantifications should be shown as differential from phasic theta cycles without HFOs.

      To address these points, we will perform the requested analyses and explicitly quantify firing rate differences during HFO and non-HFO theta periods for further clarification.
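
      As an illustration of the kind of quantification requested here, the sketch below computes the percentage of theta cycles that contain at least one HFO and the pooled firing rate in cycles with versus without HFOs. It is a hypothetical example, not the planned analysis itself: the function name and inputs are assumptions, and theta cycles are assumed to be contiguous and defined by a sorted array of boundary times.

```python
import numpy as np

def compare_cycles(cycle_edges, hfo_times, spike_times):
    """Summarize theta cycles with vs without HFOs.

    cycle_edges : sorted array of n + 1 boundary times (s) for n contiguous cycles
                  (contiguity is an assumption of this sketch)
    hfo_times   : detected HFO times (s)
    spike_times : spike times (s) of one unit or a pooled population
    """
    cycle_edges = np.asarray(cycle_edges, dtype=float)
    durations = np.diff(cycle_edges)

    # Count HFOs and spikes falling inside each cycle.
    hfo_counts, _ = np.histogram(hfo_times, bins=cycle_edges)
    spike_counts, _ = np.histogram(spike_times, bins=cycle_edges)
    has_hfo = hfo_counts > 0

    def pooled_rate(mask):
        total = durations[mask].sum()
        return spike_counts[mask].sum() / total if total > 0 else float("nan")

    return {
        "percent_with_hfo": 100.0 * has_hfo.mean(),
        "rate_with_hfo": pooled_rate(has_hfo),
        "rate_without_hfo": pooled_rate(~has_hfo),
    }
```

      Pooling spikes and durations across cycles, rather than averaging per-cycle rates, keeps short cycles from dominating the estimate; either choice could be defended, and the per-cycle version would support a paired statistical test.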

      As a non-scientific comment on the manuscript itself: unfortunately, the paper is difficult to read and understand at times, requiring great effort by the reader. This is to an extent that communication is hindered. The paper is dense with changing methods, often from panel to panel. Unfortunately, the panel quantifications are not explained in the results section in a manner that readers can understand without going to read the methods, often for each individual panel. These measures should be explained in a way that lets readers understand the conclusions of each panel and what gross calculations were used to reach those. Instead, too much jargon is used rather than clear descriptions of the overall calculations being done for each panel.

      The point is well-taken and we apologize for the dense text and lack of methodological detail in the results section. We agree with the Reviewer that enhancing clarity and adding additional details about the quantitative methods within the main text and figure panels/legends would improve readability and make the manuscript more accessible for a wider audience.

      To address this point, we will include important details in the results section and legends to clarify the methods and calculations used. We will also reorganize the manuscript text and reorder some figure panels for readability, and update the Methods section to parallel the Results/Figure order to the extent possible.

      The authors mention in the discussion section that they see increased functional connectivity between mPFC and CA1, but most data suggesting this seems to be based on LFP rather than spiking. Functional connectivity is best defined by spiking-spiking relationships. And these authors have spiking data. So I believe either the descriptive language should be pulled back to something like "oscillatory coupling" or more analyses should be dedicated to showing spike-spike coordination across regions.

      To address this point, we will temper the claims of functional connectivity and replace all instances with “oscillatory coupling”.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigate high-frequency oscillations (HFOs) in the prefrontal cortex during REM sleep. They identify a specific pattern where these HFOs occur in "chains" that are phase-locked to theta oscillations, primarily during the "phasic" periods of REM. The study contrasts these events with isolated HFOs and NREM ripples, suggesting a unique role for these chains in coordinating activity between the prefrontal cortex and the hippocampus. Most notably, the authors report that a specific subset of hippocampal cells (those that co-fire with the prefrontal cortex during these HFOs) increase their firing rates over the course of sleep, suggesting a potential mechanism for selective memory consolidation.

      Strengths:

      The study addresses an under-explored area of sleep physiology: the fine-grained temporal coordination between the cortex and hippocampus during REM sleep. The identification of HFO "chains" and their association with higher theta power provides an interesting framework for understanding how the brain might organize information transfer outside of NREM sleep. The observation that specific hippocampal populations show differential firing rate changes based on their participation in these HFO events is a striking finding that warrants further investigation.

      We thank the Reviewer for finding our work interesting and for the positive comments regarding our manuscript.

      Weaknesses:

      The primary weakness of the study lies in the lack of a clear distinction between global brain states and the specific events being analyzed. Because the authors compare HFOs across different sleep stages (NREM, tonic REM, and phasic REM) without sufficient controls, it is difficult to determine if the observed differences are intrinsic to the HFOs themselves or simply a reflection of the different physiological states in which they occur.

      We appreciate this concern. We do agree that the generation of these ripples/HFOs in NREM and REM sleep is inextricably linked to global brain state (e.g., cholinergic tone, as shown in the model), which results in differing patterns of activity across sleep states. However, we also show that activity associated with ripples and HFOs in NREM and REM sleep, respectively, delineates unique periods that underlie intra- and interregional interactions that differ from activity associated with other phenomena, such as spindles or baseline theta periods, in each respective sleep state. Regarding NREM PFC ripples, in our previous publication (Shin and Jadhav 2024), we show that PFC ripples are strongly associated with spindles and slow oscillations, but when PFC activity was assessed by aligning to each of these events separately, we observed significant differences in activity profiles (Shin and Jadhav 2024), indicating that NREM PFC ripples are indeed periods of differential PFC activity during which local reactivation is particularly strong. Similarly, here, in REM sleep, we see that PFC HFOs are strongly coupled with gamma oscillations and that these two frequency bands separately engage PFC neurons (Figures 2C, S3J, differences in phase locking preference of PFC neurons to gamma and HFO). While we observed strong theta-modulated neuronal population activity in response to HFOs (Figure 1H), we did not observe the same for gamma events that were uncoupled from HFOs (Figure S3L, right). However, we did observe the population activity suppression when examining gamma events that were coupled with HFOs, but the theta-modulated activity was largely absent (Figure S3L, left), indicating that, in terms of higher frequency oscillations, precise alignment to HFOs drives the theta-modulated activity. Furthermore, we provide a control for baseline theta periods outside of HFOs to demonstrate that the phasic, theta-modulated activity (Figures 1H, 3F) is due to association with HFOs, and not a common feature during baseline theta activity (Figure S4I). Together, these results demonstrate that the theta-modulated, phasic PFC activity that we report is primarily associated with the presence of HFOs.

      To address this point, we will provide a more detailed explanation for the theta controls that we performed, and conduct additional analyses to control for different baseline periods during REM sleep, similar to the response to Reviewer 1’s first comment.

      Furthermore, the evidence for "structured reactivation" is not yet convincing. The temporal alignment of these reactivation events appears inconsistent, with peaks occurring well before the HFO itself, and the analysis does not sufficiently control for pre-existing cellular assembly strengths.

      We thank the Reviewer for raising these important points. Regarding the temporal alignment of assemblies during REM HFOs, since gamma activity is linked to and precedes HFO activity in REM (Figure S3F,G), we posit that assembly activation preceding HFO alignment may be gamma frequency driven. Indeed, we do observe gamma-associated peaks in PFC population activity temporally adjacent to the start of HFO chains in REM (Figure S5F), which we propose is driving the assembly activation.

      Related to our response to Reviewer 1, the hypothesis that we have regarding this finding is that theta-mediated input to PFC, possibly from several brain areas including the hippocampus, converges and elicits cross-frequency activity spanning gamma and HFO bands. We hypothesize that these gamma and HFO oscillations work in concert to evoke the structured reactivation.

      Furthermore, as the Reviewer accurately points out, we are not able to determine whether the assembly patterns active during the REM HFOs existed prior to their assessment during sleep. Since there was not enough REM sleep during the earlier sleep epochs, we were not able to investigate assembly activation patterns during REM in the first pre-task sleep session prior to W-Track exposure.

      To address these points, we will provide additional support for our claims, add clarification to major points, and expand on the methods used to assess structured reactivation. We will also analyze the spatial rate maps of assemblies during behavior on the W-Track and attempt to link these representations to assembly activity during REM HFOs. If sufficient controls cannot be provided, we will temper the claims of “reactivation” and replace all mentions with assembly “activation”.

      Additionally, some of the sleep architecture presented appears atypical, such as very short REM bouts and direct NREM-to-REM transitions that bypass standard progression, raising questions about the consistency of the sleep detection across animals.

      The Reviewer is presumably referring to the hypnograms in Figure S1H. In Figure S1H, we presented concatenated hypnograms across all 9 sleep sessions, regardless of whether they were included for analysis. Furthermore, these hypnograms illustrate the output of just the sleep scoring algorithm and do not take into account the secondary, manual inspection that is performed to confirm sleep epoch inclusion. Individual epoch sleep state plots (e.g. Figure S1B) were visually inspected to confirm robust increases in theta-to-delta (TD) ratio detected in the absence of movement; epochs where microarousals or persistent subthreshold fluctuations in animal movement induced noisy TD ratio increases, and thus inaccurate REM designation, were excluded. We also want to note that omitting the edge cases, which are a minor part of the REM sleep data, does not change any results.

      Another consideration is that these animals were running a strenuous learning task that required repeated traversal of multiple maze arms over multiple behavioral sessions, which likely increased sleep pressure and thus may have altered sleep state dynamics in a subset of animals (Leemburg et al. 2010; Yang et al. 2012).

      To address these points, we will provide updated hypnograms that explicitly highlight the epochs used in analysis to resolve ambiguities. We will also further demonstrate that our procedure for sleep state designation is accurate and consistent across animals with supporting materials, including additional sleep stage classification examples, and REM-specific sleep examples marking tonic and phasic REM.

      Finally, the study does not account for potential confounds like baseline firing rates when interpreting the behavior of "high-cofiring" neurons, which may simply be the most active cells in the population.

      When we compared low and high cofiring neurons in CA1, we did indeed compare baseline firing rates between the two groups and found no differences. We compared both mean firing rates across entire sleep sessions as well as mean firing rates restricted to REM sleep (Figure S7A). We apologize that this important control was not emphasized more clearly.

      To address this point, we will explicitly reference this figure in the main text as a standalone point.

      Reviewer #3 (Public review):

      Summary:

      Shin et al. examine hippocampal-prefrontal interactions during sleep using simultaneous CA1 and prefrontal cortex recordings in rats performing a spatial memory task. They identify high-frequency oscillation (HFO) events in PFC during REM sleep that occur in theta-modulated chains and are associated with increased CA1-PFC coherence and sequential, sparse reactivation of cortical ensembles. This pattern contrasts with the synchronous reactivation observed during NREM cortical ripples. Together with a simple cholinergic network model, the authors propose that REM HFO chains represent a distinct mechanism for hippocampal-cortical coordination that complements NREM ripple-mediated processing during sleep.

      Strengths:

      A major strength of the work is the extensive electrophysiological dataset, which includes simultaneous recordings of large neuronal populations in both hippocampus and prefrontal cortex across behaviour and subsequent sleep. The analyses linking high-frequency events to population dynamics, interregional coherence, and ensemble reactivation are technically sophisticated and provide an incredibly detailed description of REM-associated cortical activity patterns. In particular, the demonstration that REM HFOs occur in chains aligned to theta phase and organise sequential activation of cortical assemblies represents a potentially important advance in understanding the neural structure of REM sleep activity. The integration of experimental data with a computational model further provides a useful framework for interpreting the observed differences between REM and NREM network states in terms of neuromodulatory influences.

      We thank the Reviewer for finding our work important and for the positive comments regarding the manuscript.

      Weaknesses:

      While overall this study provides a highly valuable body of work, there are two primary limitations, which, if overcome, would provide substantially more significance to the overall characterisation of REM HFOs. Specifically:

      (1) Distinction from wake HFOs

      The results largely support the authors' claim that REM HFO chains represent a distinct pattern of neural coordination compared to NREM cortical ripples. The analyses consistently show differences between REM and NREM events in terms of neuronal modulation, ensemble structure, and interregional coupling. However, similar high-frequency events during wake are not examined. Since REM sleep shares several network features with wakefulness, including strong theta oscillations, evaluating whether comparable PFC HFOs occur during wake would provide clarity on whether these events are specific to REM sleep (and its associated functions) or represent a more general theta-associated phenomenon.

      We thank the Reviewer for this suggestion. Indeed, this is an important comparison to make, since electrophysiological patterns of activity are similar across wake and REM sleep states.

      To address this point, we will detect and analyze HFOs during running behavior on the W-Track to determine if they elicit similar, phasic population responses in PFC.

      (2) Link to memory consolidation

      The manuscript proposes throughout that REM HFO chains may contribute to memory consolidation by coordinating hippocampal-cortical reactivation, but the evidence for this functional role remains indirect. The authors do highlight this as a limitation of the study - the inability to link their findings to learning - but it is not clear why. Further details of the behaviour results should be included. If no learning occurred across the eight behavioural sessions, this should be reported. If learning did occur, but could not be linked to HFO events, this should also be reported.

      This point is well-taken and we will reduce emphasis on memory consolidation in the manuscript. We do want to note that the primary focus here was to investigate new cortical-hippocampal activity patterns during sleep states that are established to be important for memory consolidation, in this case, REM sleep. Indeed, several major discoveries of reactivation and cortical-hippocampal physiological patterns in rodent sleep and wake states thought to be important for memory consolidation were initially reported without a link to memory consolidation, e.g., NREM hippocampal reactivation and replay (Wilson and McNaughton 1994; Lee and Wilson 2002), cortical – hippocampal activity coordination in slow-wave sleep (Siapas and Wilson 1998; Ji and Wilson 2007), waking replay in hippocampus (Foster and Wilson 2006; Karlsson and Frank 2009), etc. As Reviewer 1 noted, we expect that an important role for these novel events reported here will become increasingly understood over time.

      The connection between learning and REM HFO activity is a line of investigation that we find very interesting. However, due to the experimental design and the rapid pace at which the animals learn this task (Shin, Tang, and Jadhav 2019), we were not able to robustly relate REM HFO activity to learning. Firstly, with our threshold criteria for REM sleep detection (>10 s) as well as a total REM sleep duration criterion for sessions, most of the sleep epochs included for analysis came from the later sessions when REM sleep was more abundant (Figure SF,G). Consequently, many of the sleep sessions following the earlier behavioral/learning sessions were excluded. Making a claim about the contribution of REM HFOs to the learning process requires the inclusion of REM sleep periods after each behavior session to examine incremental changes in response to learning. Furthermore, a comparison of these REM sleep periods to pre-task REM sleep (pre-task sleep session #1 prior to task exposure) is important to demonstrate that any changes are dependent on experience. However, we were unable to make this comparison due to lack of REM sleep in pre-task sleep session #1. It is likely that an investigation of the role of these novel events in memory consolidation may require rodent task designs that are known to require REM sleep, such as inference tasks (Abdou et al. 2024; Ellenbogen et al. 2007), motor learning (Nitsche et al. 2010), or emotional memory (van der Helm and Walker 2011; Cairney et al. 2015).

      To address this point, we will reinforce this as a limitation of our study, reduce emphasis on memory consolidation, and further clarify that we were not able to link REM HFO activity to learning. We will also include additional details about the behavioral results.

      References

      Abdou, K., M. Nomoto, M. H. Aly, A. Z. Ibrahim, K. Choko, R. Okubo-Suzuki, S. I. Muramatsu, and K. Inokuchi. 2024. 'Prefrontal coding of learned and inferred knowledge during REM and NREM sleep', Nat Commun, 15: 4566.

      Cairney, S. A., S. J. Durrant, R. Power, and P. A. Lewis. 2015. 'Complementary roles of slow-wave sleep and rapid eye movement sleep in emotional memory consolidation', Cereb Cortex, 25: 1565–75.

      Ellenbogen, J. M., P. T. Hu, J. D. Payne, D. Titone, and M. P. Walker. 2007. 'Human relational memory requires time and sleep', Proc Natl Acad Sci U S A, 104: 7723–8.

      Foster, D. J., and M. A. Wilson. 2006. 'Reverse replay of behavioural sequences in hippocampal place cells during the awake state', Nature, 440: 680–3.

      Ji, D., and M. A. Wilson. 2007. 'Coordinated memory replay in the visual cortex and hippocampus during sleep', Nat Neurosci, 10: 100–7.

      Karlsson, M. P., and L. M. Frank. 2009. 'Awake replay of remote experiences in the hippocampus', Nat Neurosci, 12: 913–8.

      Lee, A. K., and M. A. Wilson. 2002. 'Memory of sequential experience in the hippocampus during slow wave sleep', Neuron, 36: 1183–94.

      Leemburg, S., V. V. Vyazovskiy, U. Olcese, C. L. Bassetti, G. Tononi, and C. Cirelli. 2010. 'Sleep homeostasis in the rat is preserved during chronic sleep restriction', Proc Natl Acad Sci U S A, 107: 15939–44.

      Nitsche, M. A., M. Jakoubkova, N. Thirugnanasambandam, L. Schmalfuss, S. Hullemann, K. Sonka, W. Paulus, C. Trenkwalder, and S. Happe. 2010. 'Contribution of the premotor cortex to consolidation of motor sequence learning in humans during sleep', J Neurophysiol, 104: 2603–14.

      Shin, J. D., and S. P. Jadhav. 2024. 'Prefrontal cortical ripples mediate top-down suppression of hippocampal reactivation during sleep memory consolidation', Curr Biol, 34: 2801–11 e9.

      Shin, J. D., W. Tang, and S. P. Jadhav. 2019. 'Dynamics of Awake Hippocampal-Prefrontal Replay for Spatial Learning and Memory-Guided Decision Making', Neuron, 104: 1110–25 e7.

      Siapas, A. G., and M. A. Wilson. 1998. 'Coordinated interactions between hippocampal ripples and cortical spindles during slow-wave sleep', Neuron, 21: 1123–8.

      van der Helm, E., and M. P. Walker. 2011. 'Sleep and Emotional Memory Processing', Sleep Med Clin, 6: 31–43.

      Wilson, M. A., and B. L. McNaughton. 1994. 'Reactivation of hippocampal ensemble memories during sleep', Science, 265: 676–9.

      Yang, S. R., H. Sun, Z. L. Huang, M. H. Yao, and W. M. Qu. 2012. 'Repeated sleep restriction in adolescent rats altered sleep patterns and impaired spatial learning/memory ability', Sleep, 35: 849–59.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Here, the authors attempted to test whether the function of Mettl5 in sleep regulation was conserved in Drosophila, and if so, by which molecular mechanisms. To do so, they performed sleep analysis, as well as RNA-seq and ribo-seq, in order to identify the downstream targets. They found that the loss of one copy of Mettl5 affects sleep, and that its catalytic activity is important for this function. Transcriptional and proteomic analyses show that multiple pathways were altered, including the clock signaling pathway and the proteasome. Based on these changes, the authors propose that Mettl5 modulates sleep through regulation of the clock genes, both at the level of their production and degradation, possibly by altering the usage of the aspartate codon.

      Comments on revised version:

      The authors satisfactorily addressed my comments, even though the precise mechanism by which Mettl5 regulates translation of clock genes remains to be firmly demonstrated.

      Reviewer #3 (Public review):

      Xiaoyu Wu and colleagues examined a potential role in sleep of a Drosophila ribosomal RNA methyltransferase, mettl5. Based on sleep defects reported in CRISPR generated mutants, the authors performed both RNA-seq and Ribo-seq analyses of head tissue from mutants and compared to control animals collected at the same time point. A major conclusion was that the mutant showed altered expression of circadian clock genes, and that the altered expression of the period gene in particular accounted for the sleep defect reported in the mettl5 mutant. In this revision, the authors have added a more thorough analysis of clock gene expression and show that PER protein levels are increased relative to wild type animals at specific times of day, indicating increased stability of the protein. Given that PER inhibits its own transcription, the per RNA is low in the mutants. Efforts toward a more detailed understanding of how clock gene expression was altered in the mutants, as well as other clarification of sleep phenotypes throughout, are appreciated. As noted above, a strength of this work is its relevance to a human developmental disorder as well as the transcriptomic and ribosomal profiling of the mutant. However, there still remain some minor weaknesses in the manuscript. This reviewer is not in agreement with the interpretation of the epigenetic experiments. Specifically, co-expression of Clk[jrk] or per [01] with the mettl5 mutant recovered the nighttime sleep phenotype, but was additive to the daytime sleep phenotype such that double mutants showed higher sleep. This effect should be acknowledged and discussed. Overall, this is an interesting paper that indicates a molecular link between mettl5 and the circadian clock in regulation of sleep.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors misunderstood my original comment for Fig 1A. Please provide an explanation for the significance of the boxed region. There is little or no detail in the legend to help guide the reader.

      The information has been added to the figure legends for Figure 1A.

      Efforts toward improving analysis of circadian genes as well as sleep phenotypes (sleep onset time, rebound, etc.) are much appreciated, thank you. However, Figure S1H and G panel labels are mixed up; please label in the order that they appear and that they correspond to the main text. Why is Figure S1H labeled "ZT 14"?

      Sleep latency is defined as the time from preparing to sleep to actually falling asleep. In this study, it specifically refers to the time taken for each individual fly to reach the sleep phenotype (i.e., 25 minutes of continuous sleep). We noted that this label was misleading, as the actual time to reach the sleep phenotype varied among individual flies. Therefore, in the revised figures, we have removed the ZT14 label. In addition, we have corrected the labeling of Figures S1G and S1H to ensure they appear in the correct order and correspond accurately to the descriptions in the main text.

      Unfortunately, based on Fig S1A-C, I am not convinced that mettl5 localizes to neurons, as there are no cells that show double labelling. This figure does not support the statements: "we found expression in both neurons (colocalizing with ELAV staining: Figure S1A-C)" (lines 91-92), and "Mettl5-Gal4 is expressed in distinct neurons and glia that appear crucial for sleep regulation." (line 297). What "distinct" sleep related neurons were labeled? The staining in Fig S1A shows a different distribution from that in Fig S1D, and so it's possible this was a technical issue. Is there a better example?

      Thank you for your careful review and valuable comments. We agree that the colocalization of METTL5 with the neuronal marker ELAV is relatively sparse. However, as indicated by the arrows in Fig S1A–C, we did observe a few cells showing clear double labeling. These examples support the presence of METTL5 expression in neurons, albeit at a low frequency.

      In Figure 4G-H, please indicate the time of day of tissue collection.

      In Figure 4G-H, the tissue was collected at ZT0. We have now indicated this time point in the figure and legend to clarify the experimental timing.

      As noted in the public comment, I remain in disagreement with the assessment that "the double mutant showed the similar phenotype as downstream genes". The striking significant increase in daytime sleep in the double mutants remains unexplained. No further experiments are necessary, but this should be acknowledged in the text. Instead of an epistatic effect, given that overall sleep is high in the double mutants, another possible explanation is that the flies are sick and so are less active and sleeping more.

      Thank you for your suggestion. This has been acknowledged in the text: “Genetic epistasis experiments further supported this model, with clock gene mutants modifying Mettl5 mutant phenotypes, suggesting that both Clock and Per act downstream of Mettl5 (Figure 4I-N, Table 1). Secondary effects may account for the significant increase in daytime sleep in the double mutants.”

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Fisher et al describes the molecular mechanism underlying how G beta gamma subunits engage with the beta 3 isoform of PLC. The paper used a combination of cryo EM, BRET assays, and biochemical assays of PLC beta activity. A key discovery is that G beta gamma is not sufficient to drive membrane binding by itself, and instead promotes G alpha activation. The work is important, but suffers slightly from some ambiguity in the actual interface that is present in their cryo EM model, as crosslinkers could stabilise a transient and non-native complex. This is somewhat abrogated by the careful mutational analysis, which shows that mutation of any of these three sites does somewhat block PLC beta G beta gamma activation. However, there could be some improvement in the presentation of this data, as well as possible mutant selection. Overall, this paper is a nice complement to the Falzone et al paper, which showed the membrane-bound complex of PLCB3 on membranes; the present work builds on it, highlighting the importance this will have in our full understanding of PLC beta activation.

      Thank you for the positive feedback.

      Major concerns:

      My biggest concern is the potential that this interface is artefactual based on the crosslinking strategy utilised. Here are thoughts on how this could be better validated and presented in a more convincing way.

      (1) The authors' main claim is that there is a degree of plasticity of G beta gamma binding to the PLC beta 3 isoform, with three possible binding sites. The main complication of this is, of course, the possibility that the crosslinking stabilises a non-native complex, driven by a mutated cysteine.

      Because of this, any other additional details about this interface are going to be critical for the scientific audience to judge if this is accurate.

      What would greatly help Figure 1 is an evolutionary conservation analysis of the novel Gbg interface in PLC, to see how well this is conserved, and compare this to the conservation of the previously annotated sites. Conservation of these sites on both the G beta gamma and PLC side would help justify this as a native complex.

      This will also help orient the reader to the identity of the mutated residues assayed in Figure 3.

      We agree that crosslinking can result in the capture of a non-physiologically relevant interface. However, we do not observe any crosslinking between Gbg and a PLCb3 variant that retains a cysteine in the disordered region of the X–Y linker, nor crosslinking between PLCb3 and any other cysteine present in the Gbg heterodimer. The evolutionary conservation analysis is a great suggestion and will be included in the revision for both Gbg and PLCb.

      (2) The G beta gamma orientation is also different than what I have observed in previous G beta gamma effector structures. Is there any precedent for this as an effector interface? A supplemental figure comparing this structure to other G beta gamma interfaces from other enzymes, for example the recent Tesmer structure with PI3K, would be helpful.

      Yes, this is not the more typically observed Gbg–effector interaction, which is mediated by the narrow face of the Gbg toroid. We are not aware of other structures in which Gbg interacts with a binding partner in the same way. A supplemental figure comparing this Gbg–PLCb interaction to the Gbg–PI3K and Gbg–GRK2 structures will be included in the revision.

      (3) The mutational analysis in Figure 2D-G seems to give some strange results, and I have some questions about why certain residues were chosen rather than others. Mutation of the Gbg side will be more complicated, as of course that can affect any of the three surfaces. My main question is that, from the way Figure 2A is oriented, the main salt bridge in their novel interface to me looks like R199-D228, with K183 being in the wrong orientation to E226, and D167 being far from any charged residues. Why did the authors not make the corresponding R199 to D or E mutation?

      Thank you for pointing this out. We are in the process of testing the PLCb3 R199E mutant in our assays and will include the results in the revised manuscript.

      (4) To help the reader's interpretation of Figure 2A, I would recommend a supplemental figure showing the density for interfacial residues, as that also would increase confidence in the interface.

      Thank you for the suggestion, this will be included in the revised manuscript.

      Reviewer #2 (Public review):

      In this manuscript, the authors dissect how Gβγ potentiates PLCβ3 signaling in cells. Using engineered crosslinking to stabilize a Gβγ-PLCβ3 complex, single particle cryo-EM, and cell-based functional assays, they identify and map multiple putative Gβγ interaction surfaces on PLCβ3, including a previously unrecognized binding mode. Structure-guided mutagenesis supports the functional relevance of these interactions and suggests that Gβγ potentiation is not primarily mediated by PLCβ3 membrane recruitment, but instead enhances PLCβ3 activity after the lipase is already at the membrane.

      Previous reconstitution work on the membrane surface (Falzone & MacKinnon, 2023) proposed a recruitment/partitioning-centric model in which Gβγ increases PLCβ3 output largely by elevating its membrane surface concentration, whereas Gαq primarily increases catalytic turnover; under those reconstitution conditions, the two inputs can combine approximately multiplicatively. In receptor-driven cellular signaling, however, PLCβ3 is robustly recruited to the plasma membrane upon Gαq activation, which raises the question of whether Gβγ contributes mainly through additional recruitment or through a post-recruitment mechanism once PLCβ3 is already at the membrane.

      This manuscript helps address that gap by using membrane-anchored PLCβ3 and complementary cellular readouts to separate "getting PLCβ3 to the membrane" from "boosting activity once PLCβ3 is already there." Their results argue that, in cells, membrane recruitment is largely dominated by Gαq·GTP, while Gβγ can further potentiate PIP2 hydrolysis after membrane association, consistent with a modulatory role at the membrane rather than primary recruitment.

      Overall, the work provides a structural and mechanistic framework for Gβγ-PLCβ3 cooperation and helps clarify the basis of Gq pathway amplification. The manuscript is generally strong, but some issues need to be addressed.

      Thank you for the positive comments.

      Major comments:

      (1) BMOE/BM(PEG)2 crosslinking may enforce a non-native docking geometry, potentially compromising the physiological relevance and precision of the Gβγ-PLCβ3 interface as described. Although a >50% 1:1 crosslinked complex is formed and remains active, the solution maps show lower local resolution for Gβγ, consistent with a dynamic, potentially heterogeneous, interface. One interface is captured via a single engineered cysteine pair (PLCβ3 E60C-Gβ C271), which could potentially bias the pose. It would be helpful if the authors could provide additional orthogonal support (e.g., alternative crosslinked sites) and bolster the clarification of its uniqueness and relevance.

      We did attempt to isolate other crosslinked complexes. PLCb3-D892 self-crosslinked under all reaction conditions, while PLCb3-D892 XY<sub>Cys</sub>, which retains an endogenous cysteine within the X–Y linker (C516), did not result in any crosslinked product when incubated with Gbg. Only the PLCb3-D892 E60C crosslinked to Gbg as confirmed by SDS-PAGE and SEC. All experiments also used wild-type Gb which contains two solvent-exposed cysteines in the effector binding site (C204 and C271). The greatest number of particles correspond to crosslinking between Gb C271 and E60C in PLCb3-D892. Crosslinking between PLCb3-D892 E60C and other residues in Gbg is possible, but there are not sufficient particle numbers corresponding to these species for 2D classing and reconstruction. These observations, together with the high efficiency of crosslinking, are consistent with a stable and persistent interaction.

      (2) In the crosslinked structure, the authors report that GβD228 interacts with PLCβ3 R199 and K183. In Figure 2A, R199 appears closer to Gβ D228 than K183, yet only K183 is functionally tested. Testing R199 (e.g., R199E/R199A) would strengthen the structure-guided validation of this interface.

      We agree, and functional analysis of PLCb3 R199E will be included in the revision.

      (3) The mutagenesis strategy appears inconsistent across figures/assays, which makes it difficult to interpret phenotypes and directly link the functional data to the proposed interfaces. For example, in Figure 2E, we see R185L but R215E, while residue L40 is mutated to Gly in the IP accumulation assays but to Glu/Lys (L40E/K) in the BRET assays (Figures 3B/3D/3F). The authors should (i) clearly justify the rationale for each substitution (conservative vs charge-reversal, interface disruption, etc.) and (ii), where possible, test the same mutants across assays (or provide evidence that alternative substitutions yield consistent conclusions).

      The mutagenesis experiments were initially carried out independently in the Lambert and Lyon groups. As the study progressed, additional mutations were designed based on prior results. The L40G mutation is one such example. Given its modest impact on activity in the IP accumulation assay, the L40E and L40K mutants were designed to maximally disrupt the interface in the BRET experiments. The revision will include the rationale behind different substitutions and discussion of any potential differences.

      Reviewer #3 (Public review):

      Summary:

      PLCβ3 is activated by both Gαq and Gβγ subunits. This paper follows previous solutions and cryoEM studies of PLCβ3 / Gβγ, trying to understand the molecular details of activation using cellular BRET assays and cryoEM.

      Strengths:

      The authors find evidence for multiple binding sites on PLCβ3 for Gβγ and suggest that Gβγ is not a bona fide activator per se but enhances Gαq activation by positioning the catalytic site towards substrate, although this is not completely convincing. Although these sites may not naturally be operative, the authors might want to develop the potential role of these sites.

      The authors also find that this activation is not through recruitment of the enzyme to the membrane by Gβγ released upon G protein activation, in accord with other PLCβ enzymes, but not for PLCβ3, and again, the authors might want to develop this point further.

      Thank you for the suggestions.

      Weaknesses:

      (1) I'm confused as to why the authors feel that their mechanism is distinct from the two-state enzyme, the synergistic activation proposed by Ross in 2011, using a primarily thermodynamic argument. As written, the authors appear to be very reliant on structural and BRET studies that do not give the details that would disprove this interpretation. The main issue is that the authors' mechanism does not fully explain how Gβγ activation occurs for PLCβ2 in reconstituted systems in the absence of Gαq subunits.

      The reconstitution experiments rely on nM-mM concentrations of purified proteins and liposomes that contain up to 30% PI(4,5)P2. PLCb2 and PLCb3 show dose-dependent increases in activity with increasing concentrations of Gbg. PLCb enzymes that interact with the liposomes would encounter liposome-tethered Gbg subunits, which would in turn bind the lipase, tethering it to the membrane and helping position the active site for catalysis. While there is not yet experimental evidence that Gbg binding can displace the Ha2’ helix, it could facilitate interfacial activation given the net negative charge of PI(4,5)P2. In addition, PLCb2 is fundamentally different from the other PLCb isoforms in its sensitivity to heterotrimeric G proteins. Given its decreased sensitivity to Ga<sub>q</sub> and increased basal activity, it is possible that autoinhibition by the proximal CTD is weaker. PLCb2 is also abundantly expressed in neutrophils, along with more Gi-coupled receptors. Thus, it is possible that Gbg directly activates PLCb2 in these cells, but future experiments are required to definitively answer this question.

      (2) In a recent study, MacKinnon presents a model showing that Gαq and Gβγ activate PLCβ3 by two distinct pathways and that activation by Gβγ occurs through membrane recruitment. It is not surprising that the authors find that this is not true since the pelleting method used by MacKinnon is subject to error. The authors should directly address the limitations of this previous work and the changes in proteoliposomes with sedimentation that alter partition coefficients. That said, the inability of Gβγ to drive membrane binding is in accord with the quantitative studies of Scarlata, showing that the affinity of PLCβ3 to Gβγ is fairly weak as compared to the intrinsic membrane partition coefficient.

      Thank you for raising this point. The changes in composition, size, and structure when pelleting proteoliposomes may complicate data interpretation and will be discussed in the revision.

      (3) It was proposed many years ago that in signaling complexes Gαq - Gβγ may not have to fully dissociate when binding PLCβ, but rather shift their relative orientation when binding to PLCβ to allow activation. Is their model consistent with this? Is it possible that PLCβ3 keeps Gβγ from diffusing to enhance the rate of Gq / Gβγ re-association?

      The crosslinked complex is compatible with simultaneous binding of a Gbg –Gbg heterotrimer to the PLCb3 without disrupting the observed interface. It is possible that Gbg could interact with Gbg bound to the PH domain or the EF hands in the previously reported reconstruction. If so, the interaction would be mediated by the N-terminal helix of Gbg. Alternatively, the intrinsic GAP activity of PLCb3 may also prevent Gbg from diffusing to promote heterotrimer reassociation.

      (4) The authors find that Gβγ binds multiple sites, and it is clear that the PH domain site is the primary one in accord with previous work. Could these weaker sites be an artifact of the elevated concentrations used in cryoEM and BRET assays?

      Assuming the PH domain is the primary Gbg binding site, it is possible that the secondary EF hand site observed by Falzone and MacKinnon reflects high protein concentrations. However, it seems unlikely that we would reach these concentrations within cells. Our functional data are also consistent with the Gbg binding site in the EF hands playing a functional role in increasing PLCb activity.

      (5) Although their assays infer differences in binding affinities, it would strengthen the paper if the authors could estimate the association energies of these different binding sites. This estimation would also address the concern stated above.

      We appreciate this suggestion and will keep it in mind as we complete the revision.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study makes a fundamental contribution to our understanding of interocular suppression, particularly continuous flash suppression (CFS). Using neuroimaging data from two macaque monkeys, the study provides compelling evidence that CFS suppresses orientation responses in neurons within V1. These findings enrich the CFS literature by demonstrating that neural activity under CFS may prevent high-level visual and cognitive processing.

      Comments on revisions:

      The authors have addressed all my previous comments.

      Thanks for the very warm comments!

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to investigate the degree to which low-level stimulus features (i.e., grating orientation) are processed in V1 when stimuli are not consciously perceived under conditions of continuous flash suppression (CFS). The authors measured the activity of a population of V1 neurons at single neuron resolution in awake fixating monkeys while they viewed dichoptic stimuli that consisted of an oriented grating presented to one eye and a noise stimulus to the other eye. Under such conditions, the mask stimulus can prevent conscious perception of the grating stimulus. By measuring the activity of neurons (with Ca2+ imaging) that preferred one or the other eye, the authors tested the degree of orientation processing that occurs during CFS.

      Strengths:

      The greatest strength of this study is the spatial resolution of the measurement and the ability to quantify stimulus representations during CFS in populations of neurons preferring the eye stimulated by either the grating or the mask. There have been a number of prominent fMRI studies of CFS, but all of them have had the limitation of pooling responses across neurons preferring either eye, effectively measuring the summed response across ocular dominance columns. The ability to isolate separate populations offers an exciting opportunity to study the precise neural mechanisms that give rise to CFS, and potentially provide insights into nonconscious stimulus processing.

      Weaknesses:

      However, while this is an impressive experimental setup, the major weakness of this study is that the experiments don't advance any theoretical account of why CFS occurs or what CFS implies for conscious visual perception. There are two broad camps of thinking with regard to CFS. On the one hand, Watanabe et al., 2011 reported that V1 activity remained intact during CFS, implying that CFS interrupts stimulus processing downstream of V1. On the other hand, Yuval-Greenberg and Heeger (2013) showed that V1 activity is in fact reduced during CFS. By using a parametric experimental design, they measured the impact of the mask on the stimulus response as a function of contrast, and concluded that the mask reduces the gain of neural responses to the grating stimulus. They presented a theoretical model in which the mask effectively reduced the SNR of the grating, making it invisible in the same way that reducing contrast makes a stimulus invisible.

      In the first submission of the manuscript, the authors incorrectly described the Yuval-Greenberg & Heeger (2013) paper and Watanabe et al. (2011) papers, suggesting that they had observed the same or similar effects of CFS on V1 activity, when in fact they had described opposite results. Reviewer 1 also observed that the authors appeared to be confused in their reading of these highly relevant papers. In the revision, the authors have reworked this paragraph, now correctly describing these sets of opposing results. However, I still do not understand what the authors are trying to argue: "...these studies were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses." I do not understand what is meant by "pure" in this case.

      This is clarified as: “Nevertheless, these studies contrasted monocular and dichoptic masking conditions to equate stimulus input while manipulating perceptual visibility, which were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses, that is, the difference of BOLD signals between binocular masking and stimulus alone conditions.” (line 63)

      Regardless, it is clear that the measurements in the present study strongly support the interpretation of Yuval-Greenberg & Heeger (i.e., that V1 activity is degraded by CFS, 'akin' to a loss in the contrast-to-noise ratio of neural activity). It would be appropriate for the authors to communicate this clearly.

      We agree and added the following sentence in the text: “These results support the conclusion of Yuval-Greenberg and Heeger (2013) that V1 activity is degraded by CFS, ‘akin’ to a loss in the contrast-to-noise ratio of neural activity” (line 122)

      I continue to be of the opinion that this study is lacking an adequate model of interocular interactions that might explain the Ca2+ imaging. The machine learning results are not terribly surprising - multivariate methods, such as SVMs, are more sensitive than univariate approaches. So it is plausible that an SVM can support decoding of the coarse orientation information, even when no tuning is evident in the univariate analyses. However, the link between this result and the underlying neurophysiology is opaque. The failure to model the neural data with an explicit model is a missed opportunity.

      We agree and have put the section “An ocular-dominance-dependent gain control model” back into the text. Fig. 2D now shows the results of model fitting.

      (line 167)

      An ocular-dominance-dependent gain control model

      We developed an ocular dominance-dependent gain control model to account for the impact of CFS on V1 population orientation tuning. The model development followed two steps.

      Step I. Population orientation tuning functions before CFS

      The population orientation tuning functions due to monocular stimulation exhibited different amplitudes among OD groups (Fig. 2D, red curves), which could be simulated with Equation 1, an OD-weighted Gaussian basis function:

      where parameters A, σ, and B corresponded to the amplitude, standard deviation, and minimal response of the Gaussian basis function, respectively, and θ represented the preferred orientation of a bin of neurons relative to the actual orientation of the grating stimulus. The weight parameter w was the mean of linearly transformed ODIs of neurons in a neuronal group, which equated to (ODI + 1)/2 or 1 - (ODI + 1)/2, depending on contralateral or ipsilateral eye grating stimulation, and ranged from 0 to 1. Thus, a smaller w would indicate a higher preference for the eye seeing the grating, and a larger w would indicate a higher preference for the unstimulated eye (or the eye seeing the flashing masker under CFS). The value of w was 0.33, 0.50, and 0.67 in Monkey A, and 0.32, 0.50, and 0.68 in Monkey B, for the grating eye-preferring group, binocular group, and the masker eye-preferring group, respectively. The exponent s represented a nonlinear transformation.

      Equation 1 fitted the baseline data well (Fig. 2D, red curves), resulting in goodness-of-fit (R<sup>2</sup>) values at 0.94 and 0.95 for the two monkeys, respectively. This indicated that the equation captured the OD-dependent population orientation tuning characteristics of V1 neurons with monocular stimulation before CFS.

      Step II. The impacts of CFS

      In step II, the model introduced several binocular combination factors to account for population orientation tuning functions under CFS.

      To account for the OD-dependent changes of orientation tuning bandwidths under CFS, a w-dependent inhibition factor w<sup>t</sup> was introduced, which scaled the σ of the tuning functions, changing the monocular tunings R into R’:

      This allowed different groups of neurons to exhibit various degrees of orientation tuning function broadening, capturing the pattern in which neurons preferring the eye seeing the grating displayed a sharper population orientation tuning curve under CFS than those preferring the eye seeing the masker.

      Previous studies have shown that binocular neuronal responses can be modeled by incorporating interocular suppression and summation processes (Kato et al., 1981; Dougherty, Cox, Westerberg, & Maier, 2019; Zhang et al., 2024). Therefore, R’ was further normalized by the neural response to the flashing masker to simulate interocular suppression, which was the first component of Equation 3. Additionally, the neural response to the flashing masker was summed to simulate binocular summation, which was the second component of Equation 3. These two components, when summed, determined the final neural responses under CFS:

      where N was the empirical neural response to the monocularly presented flashing masker stimulation, a and b were scaling parameters, and k and m were nonlinearity parameters. The interocular normalization by masker response led to amplitude reduction of population orientation tuning functions for different groups of neurons, while the binocular summation with masker response elevated the minimal responses of tuning functions to their corresponding heights.

      During the step II model fitting, the parameters A, σ, and s were inherited from the monocular tuning fits derived from Equation 1 and served as inputs, while the parameters a, k, b, m, and t were optimized. The model captured the CFS modulation on population orientation tuning curves well, with R<sup>2</sup> = 0.99 and 0.98 for Monkeys A and B, respectively (Fig. 2D, red curves).

      Reviewer #3 (Public review):

      Summary:

      In this study, Tang, Yu & colleagues investigate the impact of continuous flash suppression (CFS) on the responses of V1 neurons using 2-photon calcium imaging. They report that CFS substantially suppressed V1 orientation responses. This suppression happens in a graded fashion depending on the binocular preference of the neuron: neurons preferring the eye that was presented with the masker stimuli were most suppressed, while the neurons preferring the eye to which the grating stimuli were presented were least suppressed. Binocular neurons exhibited an intermediate level of suppression.

      Strengths:

      The imaging techniques are cutting-edge.

      Weaknesses:

      The strength of CFS suppression varies across animals, but the authors attribute this to comparable heterogeneity in the human psychophysics literature.

      Comments on revisions:

      The authors have addressed my comments from the previous round of review, and I have no further comments

      Thanks!

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (i) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (ii) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping, across the whole genome, to ensure full understanding and clarity.

      In the revised manuscript, the authors have improved the presentation and analysis of their data, expanding the description of SNS-seq mapping across the genome, and more clearly assessing to what extent there is correlation between SNS-seq signal and previous mapping approaches to predict origins (by MFA-seq and ChIP-chip of ORC1/CDC6). With regard to the correlation between SNS-seq and ORC1/CDC6 ChIP-chip, it should be noted that the two datasets were generated in distinct strains of T. brucei (Lister 427 and TREU927, respectively), and it is unclear if the latter dataset can be accurately mapped to the strain used here. Notwithstanding this concern, these improvements clarify a number of aspects of the SNS-seq mapping: (1) the signal is more prevalent in the transcribed core of the genome than in the largely transcriptionally silent subtelomeres; and (2) whereas previous work revealed strong correlation between ORC1/CDC6 localisation and MFA-seq peaks at the ends of multigene transcription units, neither of these data show significant overlap with SNS-seq signal, which is not seen at transcription start or stop sites ('SSRs'; supplementary Fig.8D) and shows marked depletion at predicted ORC1/CDC6 sites (supplementary Fig.8C). To the authors' credit, they acknowledge this lack of correlation in the discussion.

      The authors have not provided any new data to substantiate their assertion that SNS-seq accurately detects origins in T. brucei, and therefore the work rests on a single experimental approach, without validation. As a result, the suggestion of abundant, previously undetected origins in the intergenic regions of multigene transcription units remains a prediction. One key untested limitation of the work lies in the observation that the very large majority of SNS-seq signal overlaps with previously mapped RNA-DNA hybrids; without an experimental test, the suggestion that the authors have 'disclosed for the first time a strong link between RNA-DNA hybrid formation and DNA replication initiation' remains conjecture.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of origins of replications. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Between the initial submission and this revision, the raised major concerns have not been resolved, and no additional validation has been provided.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript is concluded with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) There are substantial discrepancies between the origins identified here and those reported in previous studies. Given that the other studies precede this manuscript, it is the authors' duty to investigate these differences. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      We agree that orthogonal validation of origins detected by stranded SNS-seq is necessary and we are working on it.

      (2) I am concerned that up to 96% of all SNS-seq peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Upon request, the authors have performed a control, where randomly placed peaks were run through the same filtering process. Only approximately twice as many experimental peaks passed filtering compared to random peaks. While the authors emphasize reproducibility between replicates, technical artifacts from the protocol would also be reproducible. Moreover, in other SNS-seq studies, for example, Pratto et al. Cell 2021, Fig. 1B, + and − strand peaks always appear closely paired. This pattern contrasts strongly with Fig. 2A in this manuscript.

      The size and overlap of peaks depend on the length of the SNS. In our study, the width of the peaks corresponds to the size of the short nascent strands (0.5–2.5 kb) chosen as the starting material, whereas the width of the peaks in Pratto et al., Cell, 2021 is much larger (a few kb). This could be due to the longer SNS used in the Pratto et al. study. Consequently, the overlap of the longer SNS is more pronounced since the SNS fibres elongate in both directions: at the 3′ end by DNA polymerase and at the 5′ end by ligation of Okazaki fragments. Additionally, the genomic regions displayed in our Figure 2A and in Pratto et al., Figure 1B are presented at substantially different resolutions, with a roughly ten‑fold difference in scale.

      Further, I have some minor concerns that do not affect the main conclusions of the manuscript:

      - Fig 2C: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions. This data would be better presented with all regions stretched to the same size. This has not been addressed in the revision.

      As the reviewer suggested, we have produced scaled plots of the stranded SNS-seq origins over genic and intergenic regions (see Figure 3, which is attached along with the Reviewer #2 (Recommendations for the authors)). However, we would prefer to keep the unscaled versions in the manuscript and add a note in the text as part of the Version of Record, explaining that the origins are evenly distributed throughout intergenic regions rather than being centred within them.
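      The scaling discussed in this exchange can be illustrated with a minimal sketch. It assumes that regions are given as (start, end) intervals and origins as single midpoint coordinates on one chromosome; the function name `scaled_profile` and the bin count are hypothetical and do not reflect the authors' actual pipeline. Each region is stretched to the same relative length, so a peak or valley in the averaged profile can no longer arise merely from regions having different sizes.

```python
from typing import List, Tuple


def scaled_profile(regions: List[Tuple[int, int]],
                   origin_midpoints: List[int],
                   n_bins: int = 10) -> List[float]:
    """Mean number of origins per region in each of n_bins relative-position bins.

    Hypothetical helper: every region is rescaled to [0, 1) before binning,
    so regions of different lengths contribute on the same axis.
    """
    counts = [0] * n_bins
    n_regions = 0
    for start, end in regions:
        length = end - start
        if length <= 0:
            continue  # skip empty or malformed intervals
        n_regions += 1
        for pos in origin_midpoints:
            if start <= pos < end:
                rel = (pos - start) / length           # relative position in [0, 1)
                idx = min(int(rel * n_bins), n_bins - 1)
                counts[idx] += 1
    if n_regions == 0:
        return [0.0] * n_bins
    return [c / n_regions for c in counts]
```

      A flat profile across bins would be consistent with origins being evenly distributed within the regions, whereas a central peak that survives scaling would indicate genuine centring.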

      - Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length. This has not been addressed in the revision.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding potentially wider origins.

      We'll modify the text as part of the Version of Record.

      Are claims well substantiated?:

      The identification of origins via SNS-seq appears to be incompletely supported to me.
      All downstream analyses depend on the reliability of origin identification.

      Impact:

      This study has the potential to be valuable for two fields: In research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets. Further, this study would impact basic research analyzing DNA replication over the evolutionary tree, where T. brucei can be used as an early-divergent eukaryotic model organism.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      - Two other attempts to identify origins in T. brucei - ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154) - were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      - We also would like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TritrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      - MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between S- and G1-phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense we disagree with the McCulloch group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq locate within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks indicate the size of selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage refer to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
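
      The orientation rule described above can be illustrated with a minimal sketch. It is not the workflow used in the study: the function names, the single-chromosome coordinate tuples, and the `max_gap` value are all hypothetical, chosen only to show how a minus-strand peak is paired with the nearest non-overlapping plus-strand peak downstream of it.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (start, end) on a single chromosome; hypothetical format


def pair_divergent_peaks(
    minus_peaks: List[Peak], plus_peaks: List[Peak], max_gap: int
) -> List[Tuple[Peak, Peak]]:
    """Pair each minus-strand peak with the nearest plus-strand peak that
    starts downstream of it, without overlap, within max_gap bases."""
    pairs = []
    plus_sorted = sorted(plus_peaks)
    for m_start, m_end in sorted(minus_peaks):
        for p_start, p_end in plus_sorted:
            gap = p_start - m_end  # negative if the plus peak overlaps or lies upstream
            if 0 <= gap <= max_gap:
                pairs.append(((m_start, m_end), (p_start, p_end)))
                break  # plus peaks are sorted, so the first hit is the nearest
    return pairs


def origin_centre(pair: Tuple[Peak, Peak]) -> int:
    """Midpoint between the end of the minus peak and the start of the plus peak."""
    (_, m_end), (p_start, _) = pair
    return (m_end + p_start) // 2


# Toy example: only the first minus peak has a plus peak close enough downstream.
minus = [(100, 300), (5000, 5200)]
plus = [(400, 600), (9000, 9200)]
pairs = pair_divergent_peaks(minus, plus, max_gap=500)  # max_gap is an assumed value
print(pairs)
print([origin_centre(p) for p in pairs])
```

      Running the same pairing on randomly placed peaks, as in the control described above, gives the background rate against which the experimental pass rate can be compared.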

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern (broad MFA-seq regions containing multiple initiation sites) has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from methodological differences and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosomes phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied in Foulk et al., Genome Res., 2015. We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then subjected this SNS-enriched fraction to repeated treatments with high amounts of λ-exonuclease (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without the sucrose-gradient enrichment of SNS that we applied, and then performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false-positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We have added this point to the Discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Four broad issues need to be addressed.

      (1) The authors have attempted to test the overlap between ORC1/CDC6 (an ORC subunit) binding in the genome and SNS-seq. If there were an overlap, this would provide evidence that the SNS-seq signals represent origins. However, the analysis provided is inadequate: merely a statement that "we obtained an overlap of 4.2% between origins and ORC1/CDC6 binding sites within a window of ±2 kb and 6.2% in the window of ±3 kb". Nowhere are these data shown or properly discussed:

      a) The authors need to provide a diagram showing where in the genome the very small amount of overlapping SNS-seq and ORC1/CDC6 binding occurs, and to clearly show and state how many of the intergenic SNS-seq peaks are sites of ORC1/CDC6 binding. In the absence of such analysis, a key question is unanswered: is there any evidence of ORC1/CDC6 (or ORC more broadly) binding at the SNS-seq signals within the polycistronic transcription units?

      In the original version of the manuscript, these data were already presented as percentages in the text and as a metaplot (Supplementary Fig. 8C).

      We based our analysis on the set of 350 TbORC1/CDC6 binding sites available on TriTrypDB at the time of analysis. This dataset was a filtered subset of the originally reported TbORC1/CDC6 ChIP-on-chip peaks (personal communication, TriTrypDB). Since then, the unfiltered dataset has been made available. We therefore re-analyzed the overlap using this dataset, to which we applied a filter that yielded 990 binding sites, closely matching the 953 sites reported by the McCulloch group. We need to stress here that the original set of 953 sites reported by the McCulloch group (Tiengwe et al., 2012; PMID: 22840408) is no longer available, and that the authors:

      - do not provide genomic coordinates for the 953 binding sites and

      - do not release any scripts or methodology that would allow independent reproduction of the 953 sites.

      A similar remark also applies to the MFA-seq data (see below).

      To address the reviewer’s request, we have now:

      (1) Recalculated the overlap using the updated TbORC1/CDC6 dataset (990 binding sites) from TriTrypDB.

      (2) Added the absolute number of overlapping SNS‑seq origins and TbORC1/CDC6 binding sites in the Results section for clarity.

      (3) Included the TbORC1/CDC6 binding sites in the chromosomal overview (newly added to Supplementary Fig. 8A), so that their genomic localization relative to SNS‑seq peaks is visually accessible.

      (4) Revised the metaplots of TbORC1/CDC6 distribution around SNS‑seq origins using the updated dataset (Supplementary Fig. 8C).

      With these improvements, we now find that:

      - Within ±2 kb, 12.9% (253) of SNS‑seq origins overlap with 25.6% of TbORC1/CDC6 binding sites.

      - Within ±3 kb, 18.8% (370) of SNS‑seq origins overlap with 37.4% of TbORC1/CDC6 binding sites.
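
      The two percentages on each line have different denominators: one is the fraction of origins with a binding site nearby, the other the fraction of binding sites with an origin nearby. A minimal sketch of such a window-based count is given below. It is an illustration only, assuming point coordinates on a single chromosome; the function names and the toy positions are hypothetical and are not the data or scripts used in the analysis.

```python
from typing import List, Tuple


def count_window_overlaps(
    origin_centres: List[int], site_positions: List[int], window: int
) -> Tuple[int, int]:
    """Count origins with at least one site within +/- window, and
    sites with at least one origin within +/- window."""
    origins_hit = sum(
        any(abs(o - s) <= window for s in site_positions) for o in origin_centres
    )
    sites_hit = sum(
        any(abs(o - s) <= window for o in origin_centres) for s in site_positions
    )
    return origins_hit, sites_hit


def percent(part: int, total: int) -> float:
    """Percentage of total, returning 0.0 for an empty set."""
    return 100.0 * part / total if total else 0.0


# Toy positions (hypothetical): two binding sites sit near the same origin,
# so the two counts, and hence the two percentages, differ.
origins = [1_000, 10_000, 50_000]
sites = [2_500, 9_000, 9_500, 80_000]
n_origins_hit, n_sites_hit = count_window_overlaps(origins, sites, window=2_000)
print(n_origins_hit, percent(n_origins_hit, len(origins)))
print(n_sites_hit, percent(n_sites_hit, len(sites)))
```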

      The updated metaplot shows a clear depletion of TbORC1/CDC6 signal at the origin center, with modest enrichment ~5 kb upstream and downstream. The underlying reason for this pattern remains unknown, and we agree that additional studies will be needed to understand it.

      b) Equally, the authors need to explain what they conclude from this analysis. They make a comparison with T. cruzi ORC1/CDC6 and SNS-seq overlap, which does not illuminate what the data tell us. For instance, if there is no or minimal overlap between ORC1/CDC6 binding and SNS-seq peaks within the polycistronic transcription units, do they conclude that the major SNS-seq signal they detail is evidence for ORC-independent DNA replication? If there is no overlap, what further evidence can they provide that these signals truly are origins?

      First, we would like to clarify that, to date, there is no evidence supporting ORC‑independent DNA replication in T. brucei, and—importantly—no published data demonstrating that TbORC1/CDC6 is universally required for DNA replication initiation. Because of this, we consider that it would be inappropriate to conclude that regions lacking detectable TbORC1/CDC6 signal undergo ORC‑independent initiation. We would prefer not to speculate in the absence of supporting evidence and would gratefully consider any reference the reviewer wishes to provide on this subject.

      Second, the low overlap between TbORC1/CDC6 binding sites and SNS‑seq origins does not, in our view, invalidate our mapping of replication initiation sites. Multiple factors contribute to this:

      (1) Low overlap between ORC1/CDC6 and origin‑mapping techniques has been repeatedly reported across kinetoplastids. For instance, in T. cruzi, 88.2% of origins detected by DNAscent nanopore sequencing showed no overlap with TcORC1/CDC6–Ty1 ChIP signal within ±3 kb, and only 11.7% co‑localized. This is strikingly similar to our observations in T. brucei. Thus, our data are consistent with the broader pattern in trypanosomatids rather than an exception.

      (2) The origin topology detected by stranded SNS-seq is supported by several genomic characteristics frequently found in other eukaryotes, including:

      - A highly specific and polarized poly(dA)/poly(dT) sequence environment.

      - Strand‑specific G4 structures positioned around origin centers.

      - A conserved nucleosome‑depleted region flanked by well‑positioned nucleosomes.

      These features are absent from shuffled controls, appear at high significance, and recapitulate hallmark signatures of replication origins in other eukaryotes.

      Together, these findings give us confidence that the SNS‑seq peaks represent genuine origins - despite the incomplete overlap with TbORC1/CDC6 binding.

      Third, we fully agree with the reviewer that a definitive conclusion would require an additional, independent validation method.

      Given the lack of complete ORC subunit datasets and the unusual biology of trypanosomatid replication complexes, we believe that the cautious interpretation above is the most appropriate.

      c) The authors state (Discussion): "Validation of origins is generally a difficult task, particularly in trypanosomatids, where proteins involved in the initiation of DNA replication are difficult to determine. Few proteins have been described as potential ORC subunits (reviewed in 61), and none of them have been shown to be a specific marker that indicates the origins." There are two problems with the statement. First, most of the subunits of ORC have now been described in T. brucei; the authors should make this clear. Second, mapping of ORC1/CDC6 localisation, contrary to what the authors state here, shows precise correlation with the peaks of every MFA-seq signal described (see Tiengwe et al, Cell Reports, 2012); thus, ORC1/CDC6 binding provides evidence that MFA-seq is detecting origins, something that cannot be said for SNS-seq. The authors need to correct this misleading paragraph.

      As suggested, we have removed the paragraph from the Discussion to avoid confusion. However, we disagree with the reviewer's assessment and clarify below our position regarding the issues raised.

      First, we agree that five candidate ORC subunits have now been identified in T. brucei. Our intention was not to suggest the contrary, but rather to emphasize that, although candidate ORC components have been described, direct functional evidence for their roles in replication initiation is still limited. For this reason, we were cautious in referring to any ORC component as a definitive marker of replication origins.

      Second, regarding the reviewer’s statement that TbORC1/CDC6 binding “shows precise correlation with the peaks of every MFA‑seq signal”, we respectfully disagree based on several observations:

      (1) MFA‑seq does not identify individual origin centers, but rather broad replicated regions that often span hundreds of kilobases. By design, this method cannot define the number or position of discrete origins within each peak. For that reason, MFA-seq regions do not have the resolution required to validate TbORC1/CDC6 binding sites as individual origins.

      (2) In the published datasets (Tiengwe et al., Devlin et al.), no metaplots or locus-wide quantification of the overlap between MFA-seq peaks and TbORC1/CDC6 binding were provided. The coordinates, and the approach used to define the discrete regions designated as origins within the broad MFA-seq peaks, have never been described or made available, making it difficult to evaluate the claimed correspondence.

      (3) Notably, McCulloch’s group later reported that only 4.4% of the 953 TbORC1/CDC6 sites overlapped with their 42 MFA‑seq “origins”, underscoring that the degree of correspondence is in fact limited (PMID: 29491738).

      (4) Finally, as noted in our response to point (1b), low overlap between ORC1/CDC6 binding sites and origin‑mapping techniques is a consistent observation across kinetoplastids, including T. cruzi, where DNAscent‑mapped origins show only ~12% overlap with TcORC1/CDC6 ChIP signals. This suggests that the limited overlap we observe is not unique to our dataset.

      For these reasons, we are not convinced that the TbORC1/CDC6 binding sites have been shown to align precisely with MFA-seq peaks, nor that these datasets definitively validate origin mapping in T. brucei. Nevertheless, to avoid over-interpretation and potential confusion, we have removed the paragraph from the Discussion as requested. We hope this clarifies our position and improves the accuracy and neutrality of the manuscript.

      (2) Like for ORC1/CDC6 localisation, the authors' evaluation of the relationship between MFA-seq and SNS-seq mapping is inadequate, and the depth of the analysis and discussion needs to be improved:

      a) The authors state: "We found 28-42% stranded SNS-seq origins overlapped with early and 43-55% overlapped with late S-phase MFA-seq replicated regions (Supplementary Figure 8B)." This seems important and provides (limited) validation of both datasets, but cannot be discerned from the supplied figure. Please provide a metaplot of the two datasets centred on the MFA-seq loci, including the SNS-seq peak amplitude.

      We would like to emphasize that MFA‑seq is not a method designed to map individual origins, and this fundamentally limits the interpretability of metaplots centered on MFA-seq regions. MFA‑seq identifies broad replication‑enriched domains, typically spanning 100–500 kb, within which multiple origins may fire asynchronously across the cell population.

      This concern is reinforced by the original MFA‑seq publications (Tiengwe et al., 2012; Devlin et al., 2016), which:

      - do not provide positional data for the 42-47 MFA‑inferred origins,

      - do not describe the computational method used to derive individual origin coordinates from the broad peaks, and

      - do not release any scripts or methodology that would allow independent reproduction of the claimed origin positions.

      Because of this, it is not possible to reconstruct or validate how the 42 MFA‑seq “origin” sites were defined, nor to use those coordinates as anchors for metaplot analyses.

      Most importantly, we disagree with the underlying assumption that each MFA‑seq peak corresponds to exactly one origin. This assumption runs counter to the principle of the technique, which identifies regions of higher DNA content in replicating cells than in non-replicating cells; it is also contradicted by our stranded SNS‑seq data and by DNA combing measurements:

      - SNS‑seq detects multiple discrete origins within the same genomic regions that produce a single broad MFA‑seq peak.

      - DNA combing reveals inter‑origin distances of ~36–422 kb (median ~150 kb) (PMID: 26976742), which is far shorter than the ~400–600 kb replication domains identified by MFA‑seq.

      - Furthermore, with only 42 origins detected by MFA-seq, it is not possible to achieve complete genome replication in T. brucei during S-phase. DNA combing has found that the average speed of replication forks in procyclic forms is 1.9 kb/min (PMID: 26976742). Dividing the size of the Trypanosoma brucei brucei TREU927 genome (26.1 Mb) by 42 origins (PMID: 22840408) shows that about 621 kb per origin must be replicated during S phase. At the average fork speed of 1.9 kb/min, replicating 621 kb would take 327 min (5.45 hours) (621 kb / 1.9 kb/min = 327 min). However, this exceeds the estimated length of S-phase in Trypanosoma brucei procyclic forms, which is 2.31 hours (138.6 minutes) (PMID: 32397111, 31811174, 28258618) or less, 1.36 hours (PMID: 2190996, 10574712). Therefore, more than 42 origins are necessary to complete replication during the short S phase.

      This makes it unlikely that MFA-seq regions represent single functional origins. For these reasons, a metaplot centered on MFA‑seq “loci” may lead to misinterpretations and would not provide biologically meaningful information.
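
      The arithmetic in the last point can be reproduced with a short sketch. It follows the same simplification as the text (each origin's share of the genome is copied at the single-fork speed) and uses only the figures quoted above; the function names and the derived minimum-origin estimate are illustrative additions, not values reported in the manuscript.

```python
import math

GENOME_KB = 26_100            # T. b. brucei TREU927 genome, 26.1 Mb (from the text)
FORK_SPEED_KB_PER_MIN = 1.9   # average fork speed from DNA combing (from the text)
N_MFA_ORIGINS = 42            # origins inferred from MFA-seq (from the text)


def replication_time_min(genome_kb: float, n_origins: int, speed: float) -> float:
    """Minutes needed if each origin replicates an equal share of the genome."""
    return genome_kb / n_origins / speed


def min_origins(genome_kb: float, s_phase_min: float, speed: float) -> int:
    """Smallest number of origins that finishes within the given S-phase length,
    under the same simplification."""
    return math.ceil(genome_kb / (speed * s_phase_min))


kb_per_origin = GENOME_KB / N_MFA_ORIGINS
minutes = replication_time_min(GENOME_KB, N_MFA_ORIGINS, FORK_SPEED_KB_PER_MIN)
print(f"{kb_per_origin:.0f} kb per origin, {minutes:.0f} min ({minutes / 60:.2f} h)")

# Illustrative estimate for the two S-phase lengths quoted in the text.
for s_phase_h in (2.31, 1.36):
    print(s_phase_h, "h ->", min_origins(GENOME_KB, s_phase_h * 60, FORK_SPEED_KB_PER_MIN))
```

      Under these assumptions, the required time with 42 origins is more than twice the longer S-phase estimate, which is the basis for the conclusion that additional origins must exist.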

      We hope that the expanded explanation clarifies our interpretation of the relationship between these two complementary, but fundamentally different, methods.

      b) The authors state that "Our results showed that the origins are predominantly located in the intergenic regions within the PTUs (Figure 2C)'. This finding cannot be discerned from this figure, which does not show 'strand switch regions' (SSRs; transcription start/stop sites), where MFA-seq predicts all origins to localise. The authors need to acknowledge this difference and must show a comparison of SNS-seq data, including peak amplitude, around all SSRs (whether predicted by MFA-seq to act as origins or not, since all appear to bind ORC1/CDC6).

      We have now provided the metaplots showing the overlap between stranded SNS-seq origins and SSRs (see Supplementary Figure 8D). This difference has been acknowledged and discussed in the revised manuscript.

      c) Finally, the authors' interpretation that around 30-55% of SNS-seq peaks overlap with MFA-seq 'origins' is highly questionable. MFA-seq peaks are regions of increased DNA content in replicating cells relative to non-replicating cells, and so the entire region under the MFA-seq peak is not necessarily an origin, but is likely to be a more discrete locus (eg, the SSR, where ORC1/CDC6 mainly localises). They should correct the wording and discuss what significance they see in this overlap; for instance, do they think SNS-seq 'clusters' are more pronounced within the MFA-seq peaks and, if so, what might this mean, and why does it not correlate with ORC1/CDC6 localisation?

      As the reviewer notes, ‘MFA-seq peaks are regions of increased DNA content, and so the entire region under the MFA-seq peak is not necessarily an origin but is likely to be a more discrete locus’. This is exactly why MFA-seq is inappropriate for identifying discrete/individual origins: within these replicated domains, multiple origins can fire, as revealed by stranded SNS-seq mapping.

      Regarding the overlap between SNS‑seq origins and MFA‑seq peaks, we agree with the reviewer that this overlap should not be interpreted as validating MFA‑seq “origin positions.” Instead, we now describe it more accurately as the proportion of discrete SNS‑seq origins that fall within broader MFA‑seq replication domains. This is expected, because SNS‑seq identifies individual initiation events, whereas MFA‑seq identifies S‑phase replication domains averaged across a population. Our stranded SNS‑seq data do not show enhanced origin accumulation within MFA-seq regions, and we find no correlation with TbORC1/CDC6 positions. This is now discussed.

      Regarding SSRs, we do not share the view that they should be considered privileged initiation sites. After remapping the TbORC1/CDC6 ChIP‑on‑chip dataset (see above) to the T. brucei Lister 427–2018 genome (Supplementary Fig. 8A), we observed that TbORC1/CDC6 binding is distributed throughout the chromosomes, not restricted to SSRs. To quantify this, we analyzed the overlap between TbORC1/CDC6 sites and all annotated SSR classes (dSSRs, cSSRs, and head‑to‑tail regions, as defined in Kim et al. 2009). The results show that:

      Only 10% of TbORC1/CDC6 binding sites fall within 40% of all SSRs.

      At the level of individual SSR types:

      - TTS: 3.3% of TTS overlap with 0.3% of TbORC1/CDC6 sites.

      - TSS: 67% of TSS overlap with 6.1% of TbORC1/CDC6 sites.

      - Head‑to‑tail regions: 54.2% overlap with 3.6% of TbORC1/CDC6 sites.

      These analyses demonstrate that most TbORC1/CDC6 sites are not located at SSRs, contradicting the idea that SSRs represent primary or exclusive origin sites.

      Author response image 1.

      Overlap between TbORC1/CDC6-12Myc binding sites (Tiengwe 2012, Cell Reports) and strand-switch regions (SSRs). Venn diagram showing the overlap of 990 TbORC1/CDC6-12Myc binding sites (retrieved from TriTrypDB and filtered at score 22 to obtain a number of binding sites similar to the 953 published in Tiengwe 2012, Cell Reports) and SSR sites in the genome (Kim 2018, NAR). The intersection shows that 10.3% of TbORC1/CDC6 binding sites overlap with 41.8% of SSRs. The intersection is subdivided into TSS (orange), TTS (blue) and HT (green).

      (3) A key objection to the data presentation is the decision to limit SNS-seq mapping to the intergenic regions. In addition to overlooking the SSRs (see above, 2), so-called subtelomeres, which account for nearly 50% of the T. brucei genome and are largely untranscribed, are not shown or discussed at all. Providing this data will improve clarity and also provide a key test of one of the predictions that the authors make: "most origins are localized in actively transcribed regions, which could lead to collisions between DNA replication and the transcription machinery. This spatial coincidence implies that transcription and replication must occur in a highly ordered and cooperative manner in T. brucei."

      We do not understand why this reviewer concluded that we took 'the decision to limit the mapping of SNS-seq to intergenic regions'. This is a factual error.

      To make this clearer:

      (1) We now explicitly present the distribution of SNS-seq origins across core and subtelomeric regions in the revised Figure 2D, making clear that origin mapping was performed genome-wide.

      (2) We show that SNS-seq origins are also present in subtelomeric regions. We have revised the manuscript to avoid any implication that origin firing is restricted to actively transcribed regions. Our data show that most SNS-seq origins lie within intergenic regions of PTUs, but a minority are found outside these regions, including in subtelomeres and SSRs. The revised text reflects this nuance and highlights that the spatial relationship between transcription and replication is strong but not exclusive.

      These additions make the genome-wide nature of the SNS-seq analysis transparent to the reader and should therefore remove this reviewer's “key objection”.

      a) The authors must show SNS-seq mapping to the subtelomeres (in addition to around the SSRs; see comment (2)). If no SNS-seq peaks are detected in the subtelomeres, what do the authors conclude about how the genome is duplicated? If SNS-seq peaks are detected in the subtelomeres, do they correspond with the ordered nucleosomes in this part of the genome described by Maree et al (PMID: 28344657); if so, might SNS-seq signal localisation not be directed by transcription but chromatin?

      We have now presented the proportion of origins in subtelomeric regions (see Figure 2B).

      As illustrated in the metaplots in Author response image 2, the distribution of nucleosomes around the subtelomeric origins is similar to the distribution shown for all origins in the manuscript. We do not see the pattern of nucleosomes as described by Maree et al (PMID: 28344657) over ORC1/CDC6 binding sites in this part of the genome.

      Author response image 2.

      Metaplots showing the mean nucleosome signal over centred SNS-seq origins in subtelomeric regions. Two replicates from Maree et al 2019 (PMID: 28344657).

      We never claimed that transcription directs the localisation of the SNS-seq signal. We did not conduct experiments to address this issue. In contrast, we consider that the organisation of chromatin exerts a significant influence on the selection of active origins.

      (4) The major conclusion of the manuscript is that the SNS-seq signal corresponds very precisely to the locations of RNA-DNA hybrids (R-loops). Given all the limitations discussed above, can the authors rule out the possibility that SNS-seq is merely mapping DNA-DNA hybrids and is not, in fact, detecting origins?

      a) It is legitimate to speculate about the possibility that the very extensive overlap between SNS-seq and DRIP-seq signals within polycistronic transcription units (between ORFs) might suggest that DRIP-seq data detects nascent strands at replication origins, rather than R-loops at sites of pre-mRNA processing, as previously suggested by Briggs et al (PMID: 30304482). (eg, 'we disclosed for the first time a strong link between R-loop formation and DNA replication initiation'; 'The RNA:DNA hybrids are formed at initiation sites by RNA priming of SNS and Okazaki fragments'). However, the authors should acknowledge that alternative explanations for the localisation and potential functions of inter-CDS R-loops have been suggested,

      We do not find extensive overlap between stranded SNS-seq and DRIP-seq signals. We observed that only a minor proportion (1.7%) of the previously reported DRIP-seq signal overlaps with the origins detected by stranded SNS-seq. RNA-primed SNS must form RNA:DNA hybrids during the initiation of DNA replication, so an enrichment of these hybrids around origins is expected. Therefore, we legitimately speculated that this minor proportion of RNA:DNA hybrids enriched around origin centres could be due to origin activation.

      We agree that some of the DRIP-seq signals detected around the origins may be sites of pre-mRNA processing, as previously suggested by Briggs et al. (PMID: 30304482). Since there are no data showing that pre-mRNA processing is involved in DNA replication initiation, we prefer not to speculate about it.

      b) More importantly, the authors should provide experimental evidence that tests such a mechanistic prediction of R-loops and origins: for instance, have they attempted to remove R-loops, eg, by treatment with RNase H, and checked that the SNS-seq signal is unaltered? In the absence of such data, they cannot exclude the possibility that their work has revealed an overlooked problem with SNS-seq (which may not be limited to T. brucei; are matched DRIP-seq and SNS-seq datasets available to correlate these signals in a range of organisms?).

      We have not attempted RNase H treatment for a fundamental methodological reason: it seems highly improbable that RNA:DNA hybrids would persist through the multiple denaturation steps inherent to the SNS‑seq enrichment protocol. Published biophysical measurements show that RNA:DNA hybrids melt at ~95 °C (Roberts & Crothers, Science, 1992; PMID: 1279808), which is the temperature repeatedly applied during SNS isolation. Under these conditions, persistent RNA:DNA hybrids cannot remain intact and therefore cannot be responsible for the SNS‑seq peaks detected.

      We do not interpret our findings as revealing an “overlooked problem with SNS‑seq.” Instead, we consider that the enrichment of RNA:DNA hybrids around origins observed in DRIP‑seq is biologically meaningful and expected, given that replication initiation involves RNA‑primed nascent strands and that DRIP‑seq detects such structures.

      Reviewer #2 (Recommendations for the authors):

      I have some minor concerns that do not affect the main conclusions of the manuscript:

      (1) Figure 2B: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions.

      That is correct. The regions displayed in the heatmaps are genic and intergenic regions sorted by size. With this metaplot we did not intend to convey that origins accumulate at the centres of intergenic regions, but rather that genic regions are mostly devoid of origins and intergenic regions are enriched in them.

      (2) Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding origins that are potentially wider. Nevertheless, the violin plot shows that the majority of origins are shorter than 500 nt. In the end, the size of regions detected as the origin is not important. What gives the resolution of stranded-SNS-seq is the ability to identify the centre of the origin between the minus and plus peaks.

      (3) Data in the manuscript were sometimes not presented in an easy-to-read manner. In some cases, this was due to benign things, such as missing labels for the mean frequency plots (e.g., Figure 2B, blue and green) or very small fonts for axes (Figure 2B). Sometimes, due to the plot types that were chosen, such as pie-charts (Figure 2C, see https://medium.com/analytics-vidhya/dont-use-pie-charts-in-data-analysis-6c005723e657), stacked bar plots (Figure 6B), or showing cumulative distributions (Figure 5C, and Figure 2D) it makes it difficult to judge the actual distribution.

      Wherever possible, the size of the small fonts was increased to the maximum. Missing labels were added to the mean frequency plots. We increased the font size for the axes in the frequency plots.

      However, we found cumulative distributions useful. If you have a more specific proposal for replacing cumulative distributions, we would be very grateful to hear it. We also hope that magnifying the figures in TIFF format with a higher resolution will improve visibility.

      (4) Figure 2B: This data would be better presented with all regions stretched to the same size (the reason is explained in the public review).

      We performed the scaled plots for the stranded SNS-seq origins over the genic and intergenic regions as the reviewer suggested (see Author response image 3), but we prefer to keep the unscaled versions in the manuscript.

      Author response image 3.

      Distribution of mapped origins in scaled genic and intergenic regions. Scaled heatmaps present the distribution of the mapped origins and shuffled controls within scaled genic and intergenic regions (± 2 kb).
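
      As an illustration only (not the authors' plotting code), the scaling idea behind such a plot can be sketched in Python: every region is divided into the same number of equal-width bins, so features near region edges line up regardless of region length. The function name, the bin count and the toy coordinates are assumptions made for this sketch, and the ± 2 kb flanks are not modelled.

```python
def scaled_bin_counts(region_start, region_end, origin_positions, n_bins=10):
    """Count origins in n_bins equal-width bins spanning [region_start, region_end)."""
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    counts = [0] * n_bins
    for pos in origin_positions:
        if region_start <= pos < region_end:
            # Integer arithmetic maps a position to its relative bin index.
            counts[(pos - region_start) * n_bins // length] += 1
    return counts

# Toy example: a 1 kb and a 10 kb region, each with origins near both edges.
short_region = scaled_bin_counts(0, 1_000, [50, 950], n_bins=4)
long_region = scaled_bin_counts(0, 10_000, [500, 9_500], n_bins=4)
print(short_region, long_region)
```

      After scaling, the two toy regions give the same profile, which is why a size-driven cone shape would not be expected in a scaled heatmap.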

      (5) Line 149: "The number of origins in both cells was compared using normalised mapped reads": Supplementary Figure 2D mentions that conditions were subsampled to the same amount. I would mention that explicitly in the main text ("compared using normalized, subsampled mapped reads"), as 'normalizing' would not include 'subsampling' for me. Also, I could not find the methods section that the authors refer to here.

      Thanks for the suggestion. We changed the text to make this point clearer. In the methods section, the subsampling process was referred to as 'PCF down-sampling'; we have now renamed it 'Read sub-sampling' for consistency in the edited version of the manuscript.

      (6) Figure 2C: I struggled to understand what gDNA stands for. Maybe it could be replaced with something like distribution in genome?

      Thanks for this suggestion. It is changed to ‘distribution in genomic sequence’.

      (7) Figure 5C: I cannot see how a G4 30 kb from an origin could be relevant. This also does not fit the scale of the author's own model at all (Figure 8).

      The main goal of Figure 5C was to demonstrate the differences between origins and the nearest G4s compared to the shuffled controls. The graph shows that 50% of the origins have a G4 within 2010 bp, whereas the median for the shuffled control is 4154 bp in the case of non-stabilised G4s. Our model is based on Figure 5D, which illustrates the enrichment of G4s and poly(dA) around the centre of origins.
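
      A minimal sketch of the kind of comparison described here, assuming origin centres and G4 positions are available as integer coordinates on a single chromosome; the function names and toy coordinates are hypothetical and do not reproduce the authors' analysis. It computes, for each origin, the distance to the nearest G4 and reports the median, which can then be compared with the same statistic for shuffled positions.

```python
from bisect import bisect_left
from statistics import median

def nearest_distance(position, sorted_sites):
    """Distance (bp) from position to the closest entry of sorted_sites."""
    i = bisect_left(sorted_sites, position)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(sorted_sites[i] - position)      # nearest site at or to the right
    if i > 0:
        candidates.append(position - sorted_sites[i - 1])  # nearest site to the left
    return min(candidates)

def median_nearest_distance(query_positions, site_positions):
    """Median over queries of the distance to the nearest site."""
    sites = sorted(site_positions)
    return median(nearest_distance(p, sites) for p in query_positions)

# Toy coordinates, illustrative only (not real data).
g4_sites = [1_000, 5_000, 12_000]
origin_centres = [1_200, 4_000, 11_500]
shuffled_controls = [3_000, 8_000, 20_000]

origin_median = median_nearest_distance(origin_centres, g4_sites)
shuffled_median = median_nearest_distance(shuffled_controls, g4_sites)
print(origin_median, shuffled_median)
```

      A smaller median for origins than for shuffled controls is the pattern the response reports (2010 bp versus 4154 bp).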

      (8) Figure 6B: could be made supplementary in my opinion. All relevant data is repeated in panel D.

      It is true that Figures 6B and 6C contain some repetition. However, we would prefer to keep Figure 6B because it provides a quantification of the six indicated categories, along with the statistical tests. Figure 6B only presents the three categories that changed significantly. Figure 6D shows distribution but does not contain quantified data.

      (9) Figure 6D: This plot is repeating a lot, within single figures (Figure 6A, top) but also between figures (e.g., Figure 5D, Figure 4B). I'd prefer it if the initial plots of each figure were expanded a bit (here Figure 6A, top) to include some information from the previous figures. Then all these summary plots could be combined into a single figure at the very end (maybe still as different panels to reduce the number of lines in a single plot). Otherwise, each summary plot repeats the tracks of the previous, which becomes very repetitive.

      Our model is based on these summary plots, and we calculated the relative distances between the different elements using them. Two elements were repeated in each plot: the positions of poly(dA) and G4s. These two elements served as reference points to determine the relative positions of the other elements. Following your suggestion would result again in repetitive summary plots at the end, as one combined summary plot would be overloaded with lines and difficult to understand.

      (10) Figure 6D & Figure 7C: Both show predicted G4s; however, on the plus strand, one prediction has a two-peaked shape, the other only a single peak. Is this a mistake?

      The graphs for the predicted G4s do not have the same shape in the two plots because they were generated with different T. brucei reference genomes. Figure 6C uses the 427 reference genome: the MNase-seq data set was analysed in this reference genome, so we repeated the SNS-seq analysis and the G4 prediction in it to allow a direct comparison. In Figure 7C we compare origins, DRIP-seq signal and predicted G4s; in this case, all datasets could be compared in the 427-2018 reference genome.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, but in its current form, the study feels incomplete and requires additional work.

      We thank the reviewer for the encouragement and for recognizing the potential of clinical chemistry plasma as an accessible source for cfDNA-based analyses. To address concerns about incompleteness, we conducted additional controlled experiments and a more thorough literature review.

      My major concerns/suggestions are as follows:

      (1) Context and literature

      The introduction provides only limited background on prior attempts to use heparinized plasma for cfDNA work. It is well known that heparin can inhibit PCR and sequencing library preparation, which has historically discouraged its use. The authors should summarize the relevant literature more comprehensively and explain clearly why this approach has not been widely adopted until now, and how their work differs from or overcomes these earlier challenges.

      Thank you, we agree that the review of prior work requires expansion. In the revised manuscript, we expanded the introduction to focus on prior studies and their gaps (lines 53-80).

      (2) Genome-wide coverage

      The analyses focus on correlations in methylation patterns and fragmentation metrics, but there is no evaluation of sequencing coverage across the genome. For both WGS and WMS, it would be important to demonstrate whether cfDNA from heparin plasma provides unbiased coverage, or whether certain genomic regions are systematically under-represented. A comparison against coverage profiles from cell-derived DNA (e.g., PBMC genomic DNA) would help to put the results in context and assess whether the material is suitable for whole-genome applications.

      Thank you for raising this point. We agree that genome-wide coverage distributions should be evaluated alongside correlations in methylation and fragmentation metrics when assessing the effects of sample tube types.

      To address this, we pooled the five healthy subjects in the Tube Comparison Study by tube type to generate two high-depth reference BAMs (EDTA vs. heparin separator). We calculated the mean depth per 1Mb bin across Chr1-22 and normalized with z-score. Overall, the heparin separator samples showed coverage profiles comparable to the matched EDTA samples (Pearson’s r = 0.9988, Spearman’s ρ = 0.9994). The figure has now been added as Supplementary Figure 1.
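
      For readers who want to reproduce this type of check, the following is a minimal Python sketch under stated assumptions: the per-bin mean depths are already computed (the values below are invented), and the helper names are hypothetical rather than taken from the authors' pipeline. It z-scores each profile and computes Pearson's r and Spearman's ρ with standard-library code only.

```python
from statistics import mean, pstdev

def zscore(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical mean depth per 1 Mb bin for the two pooled BAMs.
edta_depth = [30.1, 28.4, 35.0, 31.2, 26.9]
heparin_depth = [15.2, 14.1, 17.8, 15.9, 13.3]

edta_z, heparin_z = zscore(edta_depth), zscore(heparin_depth)
r = pearson(edta_z, heparin_z)
rho = spearman(edta_z, heparin_z)
print(round(r, 4), round(rho, 4))
```

      Because z-scoring is a linear rescaling, it changes neither coefficient; its practical value is in putting profiles of different overall depth on a common scale for plotting.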

      We also appreciate the suggestion to compare against gDNA. However, cfDNA and gDNA are expected to exhibit different coverage patterns because cfDNA undergoes non-random fragmentation during its generation and degradation, which makes a direct cfDNA–gDNA comparison difficult to interpret in terms of tube-related bias.

      (3) Viral detection sensitivity

      The study shows strong concordance in viral detection between EDTA and heparin samples, but the sensitivity analysis is lacking. For clinical relevance, it is critical to demonstrate how well heparin-derived plasma performs in low viral load cases. A quantitative comparison of viral read counts and genome coverage across tube types would strengthen the conclusions.

      We agree that evaluating low viral loads is important for test development. Our goal is to evaluate the repurposing of residual plasma from heparin separator tubes rather than to establish analytical sensitivity. Nevertheless, we recruited additional paired cases (n=4), combined them with existing cases with viral reads below 10 RPM (n=12), and examined the correlation of viral read counts between EDTA and heparin separator tubes in this subset. As shown in Author response image 1, viral RPM is strongly correlated between tube types (Pearson’s r = 0.93, P < 0.0001), supporting that heparin-derived plasma yields quantitatively consistent viral reads relative to EDTA samples. We have updated our sample sheet in Supplementary Table 1 and Fig. 3 accordingly.

      Author response image 1.

      Viral load correlation in cases below 10 RPM
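
      As a hedged illustration of the quantity plotted here, the sketch below computes reads per million (RPM) and selects low-load pairs. It assumes RPM is normalised to the total read count of each library and that a pair counts as low-load when both tubes fall below the threshold; the response does not state either definition, and the case names and counts are invented.

```python
def reads_per_million(viral_reads, total_reads):
    """Viral reads scaled to a library of one million reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads

# Hypothetical paired counts: (viral reads, total reads) per tube type.
pairs = {
    "case_A": {"edta": (120, 40_000_000), "heparin": (95, 35_000_000)},
    "case_B": {"edta": (900, 30_000_000), "heparin": (1_100, 38_000_000)},
}

rpm = {
    case: {tube: reads_per_million(*counts) for tube, counts in tubes.items()}
    for case, tubes in pairs.items()
}

# Assumed rule: keep pairs in which both tubes are below 10 RPM.
low_load = {case: values for case, values in rpm.items() if max(values.values()) < 10.0}
print(sorted(low_load))
```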

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      We thank the reviewer for the insightful comments. In the revised manuscript, we added controlled experiments specifically designed to address the concerns regarding cfDNA degradation. We have also addressed other concerns in the responses below.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      We thank the reviewer for the encouraging comment.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      Thank you for this important point. We have expanded the introduction to include a thorough review of relevant prior studies (lines 53-80).

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      The concern about degradation is very reasonable based on the literature. In the revised manuscript, we added a controlled experiment mimicking the real-world clinical specimens unprocessed at room temperature.

      In the controlled experiment with delayed processing, paired EDTA and heparin separator tubes from the same blood draw from 6 volunteers underwent the first soft spin (1600g, 10 min) after delays of 0, 1, 3, and 24 hours at room temperature or 4°C, to simulate real-world delayed processing in the inpatient hospital setting. The original tubes were then kept at 4°C for a week before the second spin (16000g, 10 min) to simulate delayed processing at the research laboratory (Fig. 2). This simulation cannot mimic the outpatient or remote clinic setting, which requires transportation. Therefore, we noted this caveat in the Discussion and Abstract.

      From our results, EDTA samples remained largely stable across all test settings (Author response image 2). In contrast, heparin separator tubes held at room temperature showed a clear time-dependent shift in fragmentation, with the most pronounced degradation at 24 hours. Importantly, heparin separator samples processed within a short pre-centrifugation window (for example, within 3 hours) and maintained refrigerated thereafter showed only minimal changes relative to the time 0 controls (Author response image 3). We have updated the Discussion to emphasize this short window plus refrigeration condition as a practical boundary for fragmentomics in heparin separator tubes.

      We addressed the work of Barra et al. (2025, LabMed) in the introduction. In that study, whole blood in heparin tubes was first soft spun and then incubated at 37°C for 24 hours, leading to severe DNA fragmentation. Our data agree: two matched pairs of samples incubated at 37°C for 24 hours showed similarly severe fragmentation in heparinized blood (Author response image 4). However, this is not representative of routine (Stanford/UCSF) clinical transport and processing. We revised the manuscript to emphasize that heparin separator tubes are most suitable for downstream cfDNA fragmentomic analyses when the pre-centrifugation interval is minimized and samples are kept refrigerated before processing whenever feasible.

      Author response image 2.

      Size distribution and end motif rank concordance in EDTA tubes across conditions. Left panels show fragment size distributions. The right panels show the corresponding scatter plots comparing end-motif abundance rankings between conditions. E0, EDTA processed immediately; E4T24, EDTA incubated at 4°C for 24 h; ERT24, EDTA incubated at room temperature for 24 h.

      Author response image 3.

      Size distribution and end motif rank concordance in Heparin separators across conditions. Left panels show fragment size distributions. The right panels show scatter plots comparing end-motif abundance rankings between conditions. H0, heparin processed immediately; H4T1/H4T3/H4T24, heparin incubated at 4°C for 1, 3, or 24 h; HRT1/HRT2/HRT3/HRT24, heparin incubated at room temperature for 1, 2, 3, or 24 h.

      Author response image 4.

      Size distribution and end motif rank concordance in extreme incubation conditions. Left panels show fragment size distributions. The right panels show scatter plots comparing end-motif abundance rankings between conditions. H0, heparin processed immediately; H37T24, heparin incubated at 37°C for 24 h.

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      We appreciate the reasonable concerns regarding cfDNA degradation and agree that the methylation profile is not a metric for degradation. This point regarding measuring degradation is addressed with new experiments and in our above response to comment (2). We appreciate the suggestion to use Spearman correlation, and we have now incorporated Spearman’s ρ into the updated figures.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      We appreciate the reviewer’s constructive comments regarding the CNV analysis. We added an analysis using 50 kb as the bin size (data uploaded to Zenodo). Across matched CNV-positive samples, the CNV patterns remained consistent across tube types, although, as expected, noise was higher. We did not reduce the bin size to 1-10 kb because at ~5x coverage such resolution would mainly capture noise, rendering the results uninterpretable for CNV calling.

      We agree that illustrative examples alone are insufficient and that quantitative measures are required. To address this concern, we evaluated concordance across all paired cases by measuring the copy ratio and calculating the Spearman correlation (Fig. 4b). CNV-positive samples had high concordance (n = 6, Spearman’s ρ = 0.72-0.96) between tube types and were used primarily for interpretation. Low correlations in CNV-negative samples are not unexpected and were not used for interpretation. In these samples, log2 ratios across all bins cluster tightly around zero in both tube types, so correlation coefficients are highly sensitive to minor fluctuations and are not informative of biological concordance.
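
      The point about CNV-negative samples can be illustrated with a small simulation. This is a sketch under invented assumptions (200 bins, Gaussian noise with SD 0.05, a single gain of 0.5 in log2 ratio over half the bins), not the study data or the authors' code. Two noise-only profiles share no signal, so their rank correlation sits near zero, whereas two profiles sharing a real segment correlate strongly even with identical noise.

```python
import random

def spearman_no_ties(x, y):
    """Spearman's rho via the rank-difference formula (valid when there are no ties)."""
    def rank(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for position, index in enumerate(order, start=1):
            r[index] = position
        return r
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n_bins, noise_sd = 200, 0.05    # assumed values, for illustration only

def simulate(true_log2):
    """Add independent measurement noise to a true log2-ratio profile."""
    return [t + rng.gauss(0, noise_sd) for t in true_log2]

neutral = [0.0] * n_bins                                # CNV-negative sample
gain = [0.0] * (n_bins // 2) + [0.5] * (n_bins // 2)    # CNV-positive sample

rho_neutral = spearman_no_ties(simulate(neutral), simulate(neutral))
rho_gain = spearman_no_ties(simulate(gain), simulate(gain))
print(round(rho_neutral, 2), round(rho_gain, 2))
```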

      (5) It is important to point out that depth-based CNV calling is just one of the CNV calling methods. Other CNV calling software using SNVs, pair-reads, split-reads, and coverage depth for calling CNV, such as the software Conserting, would be severely affected by the low-quality WGS data. The authors need to evaluate at least two different software with specific algorithms for CNV calling based on current WGS data.

      We appreciate this suggestion. We used another popular and independent CNV caller, CNVkit, in addition to ichorCNA. Although both methods use sequencing depth, they differ in their segmentation algorithm. ichorCNA uses a hidden Markov model-based segmentation optimized for low-pass cfDNA WGS, whereas CNVkit uses circular binary segmentation by default and works well with targeted panels. The CNVkit results are also consistent across different tube types. We have added the CNVkit results to Supplementary Fig. 3.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.

      We thank the reviewer for highlighting somatic SNV detection as an important cfDNA application. Robust SNV benchmarking typically requires larger plasma input and substantially deeper, targeted sequencing than is feasible with remnant chemistry specimens. In routine workflows, chemistry testing leaves only ~0.5–2 mL residual plasma per tube, which limits the achievable depth for sensitive SNV calling. We have added this limitation to the Abstract and the Discussion (lines 281-285) and clarified that our goal is to repurpose heparin separator residual plasma as a complementary resource to expand biobanking, rather than to replace collection protocols optimized for mutation testing.

      Reviewer #2 (Recommendations for the authors):

      The manuscript does not seem to have been edited thoroughly prior to submission. For example, at lines 94-97, the line spacing is double, which is apparently different from the other surrounding lines. In addition, Figure 5a contains a wrong label of "|y=x" at its top. Figure 5b strongly suggests that Spearman, but not Pearson correlation, should be appropriate for the analysis.

      We thank the reviewer for carefully noting these formatting and labeling issues. Corrections for all points are made in the revised version.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the biological mechanism underlying the assembly and transport of the AcrAB-TolC efflux pump complex. By combining endogenous protein purification with cryo-EM analysis, the authors show that the AcrB trimer adopts three distinct conformations simultaneously and identify a previously uncharacterized lipoprotein, YbjP, as a potential additional component of the complex. The work aims to advance our understanding of the AcrAB-TolC efflux system in near-native conditions and may have broader implications for elucidating its physiological mechanism.

      Strengths:

      Overall, the manuscript is clearly presented, and several of the datasets are of high quality. The use of natively isolated complexes is a major strength, as it minimizes artifacts associated with reconstituted systems and enables the discovery of a novel subunit. The authors also distinguish two major assemblies-the TolC-YbjP sub-complex and the complete pump-which appear to correspond to the closed and open channel states, respectively. The conceptual advance is potentially meaningful, and the findings could be of broad interest to the field.

      Weaknesses:

      (1) As the identification of YbjP is a key contribution of this work, a deeper comparison with functional "anchor" proteins in other efflux pumps is needed. Including an additional Supplementary Figure illustrating these structural comparisons would be valuable.

      We have expanded the comparative analysis between YbjP and established anchoring or accessory components in other efflux pumps, and we have added Supplementary Figure S3 to illustrate these structural relationships.

      (2) The observation of the LTO states in the presence of TolC represents an important extension of previous findings. A more detailed discussion comparing these LTO states to those reported in earlier structural and biochemical studies would improve the clarity and significance of this point.

      In the revised manuscript we have expanded our discussion of the LTO conformations, including a direct comparison with previously reported structural and biochemical observations, to better contextualize the significance of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports the high-resolution cryo-EM structures of the endogenous TolC-YbjP-AcrABZ complex and a TolC-YbjP subcomplex from E. coli, identifying a novel accessory subunit. This work is an impressive effort that provides valuable structural insights into this native complex.

      Strengths:

      (1) The study successfully determines the structure of the complete, endogenously purified complex, marking a significant achievement.

      (2) The identification of a previously unknown accessory subunit is an important finding.

      (3) The use of cryo-EM to resolve the complex, including potential post-translational modifications such as N-palmitoyl and S-diacylglycerol, is a notable highlight.

      Weaknesses:

      (1) Clarity and Interpretation: Several points need clarification. Additionally, the description of the sample preparation method, which is a key strength, is currently misplaced and should be introduced earlier.

      We have reorganized the text to introduce the sample preparation strategy earlier and clarify the points that may cause ambiguity.

      (2) Data Presentation: The manuscript would benefit significantly from improved figures.

      We agree and have revised the figures to improve clarity, consistency, and readability. Additional schematic illustrations have been included.

      (3) Supporting Evidence: The inclusion of the protein purification profile as a supplementary figure is essential. Furthermore, a discussion comparing the endogenous AcrB structure to those obtained in other systems (e.g., liposomes) and commenting on observed lipid densities would strengthen the overall analysis.

      We appreciate these suggestions. We added the purification profile to Supplementary Figure S1 and expanded the comparison between our endogenous AcrB structure and previously reported structures from reconstituted systems, including a more detailed discussion of lipid densities.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Structural mechanisms of pump assembly and drug transport in the AcrAB-TolC efflux system" by Ge et al. describes the identification of a previously uncharacterized lipoprotein, YbjP, as a novel partner of the well-studied Enterobacterial tripartite efflux pump AcrAB-TolC. The authors present cryo-electron microscopy structures of the TolC-YbjP subcomplex and the complete AcrABZ-TolC-YbjP assembly. While the identification and structural characterization of YbjP are potentially novel, the stated focus of the manuscript-mechanisms of pump assembly and drug transport - is not sufficiently addressed. The manuscript requires reframing to emphasize the principal novelty associated with YbjP and significant development of the other aspects, especially the claimed novelty of the AcrB drug-efflux cycle.

      Strengths:

      The reported association of YbjP with AcrAB-TolC is novel; however, a recent deposition of a preceding and much more detailed manuscript to the BioRxiv server (Horne et al., https://doi.org/10.1101/2025.03.19.644130) removes much of the immediate novelty.

      Weaknesses:

      While the identification of YbjP is novel, the authors do not appear to acknowledge the precedence of another work (Horne et al., 2025), and it is not cited within the correct context in the manuscript.

      We thank the reviewer for raising this important point regarding the independent nature of our work.

      Our study indeed progressed independently. The process began with our purification of an endogenous protein sample containing the AcrAB-TolC efflux pump. During our cryo-EM analysis, we observed an unassigned density in the map, for which we built a preliminary main-chain model. A subsequent search of structural databases, including AlphaFold predictions, allowed us to identify this density as the protein YbjP. It was only after this identification that we became aware of the related preprint by Horne et al. on BioRxiv (Posted March 19, 2025).

      Therefore, our structural determination of YbjP was conducted entirely independently. We fully acknowledge and respect the work by Horne et al. and have already cited their preprint in our manuscript. While their detailed structural data, maps, and coordinates were not publicly available as of March 13, 2026, we have described their findings appropriately. We agree that our manuscript can better reflect this context and will carefully check for any missing citations to ensure that their contribution is properly and clearly acknowledged.

      We also believe that the two studies are mutually complementary and collectively reinforce the emerging understanding of YbjP.

      Several results presented in the TolC-YbjP section do not represent new findings regarding TolC structure itself.

      We agree that the TolC features we describe are consistent with previously reported structural characteristics. However, these observations could only be confirmed in the context of the newly determined TolC–YbjP subcomplex, which was not available prior to this study. We have clarified this point in the revision to avoid overstating novelty.

      The structure and gating behaviour of TolC should be more thoroughly introduced in the Introduction, including prior work describing channel opening and conformational transitions.

      We appreciate this suggestion and agree that a more comprehensive overview of TolC gating and conformational transitions will strengthen the Introduction. We have revised the text to incorporate relevant prior structural and functional studies.

      The current manuscript does not discuss the mechanistic role of helices H3/H4 and H7/H8 in channel dilation, despite implying that YbjP binding may influence these features.

      Thank you for this comment. The primary novel contributions of this manuscript are the identification of YbjP and the structural characterization of AcrB in three distinct states. The discussion of the dilation mechanism, while included because we observed the closed TolC-YbjP state, is a secondary point. In the revised manuscript, we have expanded this discussion as suggested.

      Only the original closed TolC structure is cited, and the manuscript does not address prior mutational studies involving the D396 region, though this residue is specifically highlighted in the presented structures.

      We appreciate the reviewer drawing attention to this oversight. We have added citations to the relevant mutational and mechanistic studies, including those involving the D396 region, and more clearly discussed these findings in relation to our structural observations.

      The manuscript provides only a general structural alignment between the closed TolC-YbjP subcomplex and the open TolC observed in the full pump assembly. However, multiple open, closed, and intermediate conformations of AcrAB-TolC have already been reported. Thus, YbjP alone cannot be assumed to account for TolC channel gating. A systematic comparison with existing structures is necessary to determine whether YbjP contributes any distinct allosteric modulation.

      We agree with the reviewer’s assessment and appreciate the constructive suggestion. In our revised manuscript, we have expanded the structural comparison to include previously reported open, closed, and intermediate AcrAB–TolC conformations. This expanded analysis more clearly positions our findings within the existing structural framework.

      The analysis of AcrB peristaltic action is superficial, poorly substantiated and importantly, not novel. Several references to the ATP-synthase cycle have been provided, but this has been widely established already some 20 years ago - e.g. https://www.science.org/doi/10.1126/science.1131542.

      We thank the reviewer for this comment. We fully acknowledge the foundational studies that established the AcrB functional cycle and its analogy to the ATP-synthase mechanism. While previous work indeed defined the LTO (Loose, Tight, Open) cycle of AcrB, those structures were obtained using AcrB in isolation. In contrast, our endogenous sample, which includes the native constraints of AcrA from above and the presence of AcrZ, reveals conformational changes in the transmembrane and porter domains that differ from those previously reported. We interpret these differences as reflecting a more physiologically relevant mechanism. In the revision, we have provided a detailed discussion to contextualize these distinctions within the existing literature.

      The most significant limitation of the study is the absence of functional characterization of YbjP in vivo or in vitro. While the structural association between YbjP and TolC is interesting, the biological role of YbjP remains unclear.

      To explore the potential physiological role of YbjP, we compared the viability of a ΔybjP mutant in the E. coli C600 background with that of the wild-type C600 strain under ciprofloxacin (CIP) stress. However, we did not observe a detectable difference in survival between the two strains under the tested conditions. This result is consistent with the assay reported in the preprint mentioned by the reviewer, although the stress conditions used in that study differ from ours.

      Author response image 1.

      To further address this point, we have added a new Supplementary Figure S3 comparing outer membrane proteins with structural and functional similarities to TolC. As shown in this analysis, many such proteins contain an extracellular loop that appears to help anchor or stabilize them within the outer membrane. Notably, TolC lacks such a loop, whereas YbjP contains a corresponding loop region, suggesting that YbjP may potentially play a role in stabilizing or positioning TolC in the outer membrane.

      While our current experiments did not reveal a clear phenotype under CIP stress, the structural observations still suggest that YbjP may have a physiological role. We have therefore expanded the Discussion to more carefully consider possible functional implications of YbjP and to explicitly acknowledge the limitations of the present study regarding its physiological characterization.

      Moreover, the manuscript does not examine structural differences between the presented complex and previously solved AcrAB-TolC or MexAB-OprM assemblies that might support a mechanistic model.

      We thank the reviewer for this suggestion. We now provide a more detailed comparative analysis with previously reported AcrAB–TolC and MexAB–OprM structures, highlighting both similarities and key differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) To address the probable role of YbjP, performing 3D variability analysis on the sub-complex and the complete complex would help clarify whether YbjP participates in channel opening and closing.

      YbjP does not participate in the opening or closing of the TolC channel. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC. The structural transition between the closed and open states of TolC has been thoroughly reviewed by Alav et al. (Chem. Rev. 2021).

      Although the particles for the two reconstructions were obtained from the same dataset, inspection of the raw micrographs and the corresponding 2D class averages clearly shows that the particles fall into two distinct populations: one containing only the TolC–YbjP sub-complex and the other containing the full AcrABZ–TolC–YbjP assembly. In other words, the particles correspond to two different complexes, distinguished by the absence or presence of the AcrABZ components, rather than representing two conformational states of a single complex.

      Three-dimensional variability analysis (3DVA) is most appropriate for analyzing structural heterogeneity arising from continuous or discrete conformational changes within the same macromolecular assembly. Because the heterogeneity in our dataset primarily reflects compositional differences between two assemblies rather than conformational variability within a single complex, we believe that applying 3DVA would not be appropriate for this dataset.

      (2) In addition to the above points, a few minor revisions would improve clarity and readability. Some of the representative density maps in the supplementary figures could be refined for clarity. Adjusting formatting elements (e.g., dashed line thickness) may improve visual presentation.

      Supplementary Figures S2, S5, and S6 have been redrawn to reduce the excessive thickness of the density map representations for better visualization.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Xiaofei and colleagues report the high-resolution cryo-EM structure of the TolC-YbjP-AcrABZ complex, as well as the structure of a subcomplex containing only TolC and YbjP. Additionally, they identify a previously unidentified accessory subunit that plays a role in the function of this complex. Overall, this represents an impressive effort in determining the complete endogenous complex from E. coli and performing systematic analyses. I have a few questions regarding the manuscript:

      (1) The authors use the term "native" several times (e.g., lines 24, 73, 157, 256) to refer to the complex reported here. This may cause confusion, given the use of detergent to extract endogenous complexes from E. coli. They should consider excluding the possibility that the subcomplex was formed during the purification process. The term "endogenous" should suffice in this context.

      We have replaced “native” with “endogenous”.

      (2) Lines 26-28: The phrase "its protomers" may lead to ambiguity, as it could refer to either YbjP or TolC.

      The sentence has been updated to “…bridging the TolC protomers at their equatorial domain.”

      (3) Lines 50-51: The text suggests that the assembly of AcrA and AcrB triggers TolC's transition from a closed to an open conformation. Please clarify this point.

      The introduction (lines 50-51) has been expanded to describe the assembly of TolC and AcrAB, as well as the gating transition between the closed and open states of TolC.

      (4) Lines 57-59: Cryo-EM may yield a low-to-medium resolution map, but the technique itself should not be described as "low-to-medium resolution cryo-EM".

      The sentence has been changed to “…prior studies using crystallography and cryo-EM have revealed low-to-medium resolution snapshots of the assembled pump.”

      (5) Line 73: The authors should consider briefly introducing how they prepared the samples for cryo-EM structural studies, as this is a highlight of the manuscript.

      A detailed, multi-step purification protocol has been added as Supplementary Figure S1A to illustrate the sample preparation procedure.

      (6) Lines 77-82: The authors should label these structural features in the corresponding figures for easier reference, particularly clarifying which part refers to the "equatorial domain."

      We have labeled these structural features in the corresponding figures for clarity, and specifically indicated which region corresponds to the equatorial domain.

      (7) Lines 92-93: The first α-helix of TolC is unclear; the authors should indicate the corresponding residues of this helix in the main text. Additionally, it would be beneficial to illustrate the interface in a figure for easier access.

      We have specified the residues corresponding to the participating α-helix of TolC in the main text and illustrated the interaction interface in a figure (Figure 1F) for better visualization.

      (8) Lines 99-100: Did the authors observe additional density for N-palmitoyl and S-diacylglycerol modifications in their cryo-EM density map? If so, they should highlight this in a figure to demonstrate the importance of these modifications.

      The N-palmitoyl and S-diacylglycerol modifications are embedded in the outer membrane but lack a consistent location within it. As a result, they were averaged out during cryo-EM reconstruction and are not visible in our final map.

      (9) Line 122: Please indicate the 33 nm height in the figure.

      The 33 nm height comprises a 14 nm TolC channel, a 14 nm periplasmic portion of AcrAB, and a 5 nm transmembrane portion of AcrB. These dimensions have been added to the right side of Figure 2B.

      (10) Lines 123-124: This sentence feels out of place. It would be more appropriate to move it to another location, such as the beginning of the Results section, to introduce how the samples were prepared.

      This sentence has been moved to the section “Structure of a TolC–YbjP closed-state complex” to describe the sample preparation.

      (11) Lines 127-128: This section needs to be rewritten for improved clarity.

      This sentence has been rewritten as “This tripartite architecture is stabilized by three distinct sets of interfaces: (i) contacts between the AcrB trimer and the basal regions of AcrA, (ii) extensive AcrA–AcrA lateral interactions within the hexameric ring, and (iii) tip-to-tip junctions formed between the upper AcrA α-helical hairpin and the periplasmic entrance of TolC (Figure 2D).”

      (12) Line 141: Please define terms like DN, DC, PN, and PC upon their first use.

      DN and DC (the N- and C-terminal subdomains of the docking domain) and PN and PC (the N- and C-terminal subdomains of the periplasmic, or porter, domain) are now defined where they first appear in the text.

      (13) The lα helix of AcrB is at least partially buried in the membrane (Liu H. et al, PNAS 2025). The authors should consider including this information in their figures, particularly Figure 2B and Figure 5. As the complex is endogenously purified, are there any differences in AcrB compared to those observed in liposomes, SMALP, or vesicles? Did the authors observe significant lipid densities?

      A structural comparison of the AcrB holocomplex with an AcrB structure determined in the native membrane environment (PDB: 9DXN) has been added as Supplementary Figure S8D. In the transmembrane region of AcrB, some sausage-like densities were observed; however, lipid molecules were not modelled in the study.

      (14) The protein purification profile should be included, at least as a supplementary figure.

      The protein purification profile has been added to Supplementary Figure S1A.

      Reviewer #3 (Recommendations for the authors):

      (1) The identification and structural characterization of YbjP as a novel TolC-associated lipoprotein is potentially interesting, and the cryo-EM structures of the TolC-YbjP subcomplex and the complete pump assembly represent a solid starting point. However, the manuscript currently does not sufficiently support the broader mechanistic conclusions implied by the title regarding pump assembly and drug transport. To strengthen the work, the manuscript would benefit from being refocused to highlight the novelty of YbjP, while also providing a clearer mechanistic rationale for its functional role.

      We thank the reviewer for this helpful comment. We have revised the manuscript to better highlight the novel features of YbjP and provide a clearer mechanistic explanation for its function.

      Most Gram-negative TolC homologs, including P. aeruginosa OprM and E. coli CusC, carry native lipid anchors that attach them to the outer membrane. However, E. coli TolC lacks this N-terminal lipidation site. We propose that YbjP, a dually lipidated protein modified with N-palmitoyl and S-diacylglycerol groups, tethers TolC to the outer membrane and functionally replaces the intrinsic lipid anchors found in other outer membrane factors.

      To support this mechanism, we have added Supplementary Figure S3, which compares the anchoring domains of six representative outer membrane components of efflux pumps.

      (2) The structural features and gating dynamics of TolC should be more thoroughly introduced, including prior work describing channel dilation and helix movements (e.g., PMID: 18406332; PMID: 21245342), and the manuscript should discuss how YbjP may influence these known conformational transitions. The relevance of the D396 region should also be considered in the context of previous mutational analyses (e.g., PMID: 32850959).

      All citations mentioned have been added. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC.

      (3) Structural interpretation of the YbjP-containing complexes needs to be strengthened by comparison with the extensive library of available AcrAB-TolC structures in open, closed, and intermediate states (e.g., PMID: 28355133; PMID: 24747401; PMID: 34506732). Such analysis is necessary to determine whether YbjP contributes any distinct allosteric or conformational effects.

      YbjP binds to the equatorial domain of TolC, distant from the tip of its coiled-coil helices. This binding therefore does not interfere with TolC’s functional role, but rather helps anchor TolC within the outer membrane in the correct orientation.

      (4) The speculations regarding the peristaltic nature of AcrB cycling as currently presented in the text and Figure 4 lack novelty and currently reiterate well-established AcrB L/T/O states without offering insight into how YbjP might influence long-range communication within the complex.

      We thank the reviewer for this valuable comment. We agree that the functional rotation mechanism of AcrB with loose, tight and open states has been well documented in previous work.

      In our endogenous intact complex, however, we identified substantial conformational changes in both the porter and transmembrane domains of AcrB that were not observed in earlier isolated structures. To highlight these differences, we have added Supplementary Figure S8 to compare our AcrB structure with all previously reported conformational states.

      On the basis of these structural observations, we have proposed a distinct drug efflux mechanism, which is now described in detail in the revised manuscript.

      (5) Specific clarification is needed regarding the proposed pathway by which YbjP could modulate AcrA or AcrB, given the spatial separation observed in the structures.

      YbjP binds to the equatorial domain of TolC, which has no effect on AcrA or AcrB.

      (6) The manuscript currently lacks functional validation of YbjP, either in vivo or in vitro. Incorporating even basic assays to test YbjP's contribution to efflux function, pump assembly, or antibiotic resistance would significantly enhance the conclusions.

      To explore the potential physiological role of YbjP, we compared the viability of a ΔybjP mutant in the E. coli C600 background with that of the wild-type C600 strain under ciprofloxacin (CIP) stress. However, we did not observe a detectable difference in survival between the two strains under the tested conditions. This result is consistent with the assay reported in the preprint mentioned by the reviewer, although the stress conditions used in that study differ from ours. (See Author response image 1).

      To further address this point, we have added a new Supplementary Figure (Fig. S3) comparing outer membrane proteins with structural and functional similarities to TolC. As shown in this analysis, many such proteins contain an extracellular N-terminal loop that appears to help anchor or stabilize them within the outer membrane. Notably, TolC lacks such a loop, whereas YbjP contains a corresponding loop region, suggesting that YbjP may potentially play a role in stabilizing or positioning TolC in the outer membrane.

      While our current experiments did not reveal a clear phenotype under CIP stress, the structural observations still suggest that YbjP may have a physiological role. We have therefore expanded the Discussion to more carefully consider possible functional implications of YbjP and to explicitly acknowledge the limitations of the present study regarding its physiological characterization.

      (7) The relationship to the prior BioRxiv work by Horne et al. (March 19, 2025) should be discussed more directly, particularly because it reports the same YbjP-TolC association across two different efflux systems and includes higher-resolution structures and functional evidence. The current citation should be revised to accurately acknowledge the precedence and overlap in findings.

      We thank the reviewer for this important suggestion. We have moved the citation earlier in the manuscript to properly acknowledge the work by Horne et al.

      We fully agree that a direct comparison between our structures and those reported by Horne et al. would be highly valuable. However, although nearly a year has passed since the preprint was posted, their atomic coordinates have not been released in the Protein Data Bank. No detailed structural coordinates or models are provided in the preprint itself, which prevents us from performing a meaningful, structure-based comparison with our own data at this stage.

      (8) The references used to support statements on allosteric pump activation (e.g., lines 182-183) should be updated to include more relevant full-complex studies (e.g., PMID: 28355133; PMID: 33009415; PMID: 33909410), and the manuscript should more clearly articulate any proposed mechanism for signal transmission involving YbjP.

      The citations have been added.

      YbjP does not participate in the opening or closing of the TolC channel. Indeed, the structure of TolC shows no conformational changes upon YbjP binding when compared to the free, closed form of TolC.

      (9) Overall, while the structural identification of YbjP is noteworthy, additional functional data and more rigorous structural comparison are needed to substantiate the proposed model of pump assembly and drug transport. Reframing the manuscript to emphasize the novelty of YbjP and clarifying its potential mechanistic role would strengthen the work significantly.

      We refer the reviewer to our earlier response for additional functional data. We have added Supplementary Figure S8 to compare our AcrB structure with all previously reported conformational states.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults in a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan, and in some cases even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and convincing evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      (1) Although the authors note this was outside the study's scope, the absence of a voxel-based morphometry (VBM) analysis limits anatomical and functional specificity. Such an analysis would clarify which functions are cerebellar-dependent rather than solely inferring this from prior neuropsychological literature.

      (2) As acknowledged in the Discussion, task classification (cerebellar-dependent vs. general measures) remains somewhat ambiguous. Some "general" measures may still rely on cerebellar processes based on the paper's own criteria - for example, tasks in which individuals with cerebellar degeneration show impairments.

      (3) Cerebellar-dependent and general measures may inherently differ in measurement noise, potentially biasing results toward detecting effects in general measures but not in cerebellar-dependent ones.

      We appreciate Reviewer #1's positive assessment of the study, including the acknowledgment of our large sample size, comprehensive behavioral battery, and verification of cerebellar atrophy using MRI. We address the concerns raised as follows:

      (1) Voxel-based morphometry (VBM) and anatomical specificity

      We agree that VBM would strengthen anatomical specificity. As noted in our response to private comments, we have carried out these analyses as part of a separate dedicated study, now available as a preprint (“Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function”, https://doi.org/10.64898/2026.02.13.705695). This work investigates region-level cerebellar aging and its relationship with behavior in detail, including both anatomical and functional parcellations. In short, the preprint demonstrates the absence of a structure-function relationship between cerebellar regions (from either anatomical or functional atlases) and cerebellar function. Given the scope of the present manuscript, which focuses primarily on behavioral evidence for cerebellar preservation, we chose not to expand this paper further with VBM results.

      (2) Task classification and cerebellar involvement

      We clarified in the revised manuscript that even “general” measures likely involve cerebellar processing to some extent. We have strengthened the discussion explaining that these measures do not primarily depend on cerebellar function, in contrast to the cerebellar-specific metrics derived from established models (e.g., clock variance in rhythmic tapping). We now explicitly caution against interpreting these general measures as cerebellar-independent.

      (3) Measurement noise and differential sensitivity

      To address the reviewer’s concern that measurement noise may differ between task categories, we now report split-half reliabilities for all measures in the Supplement. These data demonstrate no systematic reliability disadvantage for cerebellar-specific tasks that could explain the pattern of results.

      Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available. This is a well-executed and thought-provoking investigation, but there is also room for a bit more discussion.

      We appreciate the reviewer’s recognition of the methodological rigor of the study. The public review focuses on the structure-function relationship for the cerebellum. Given that the volume of the cerebellum is smaller in older adults but that the identified cerebellar functions are maintained, we conclude that there is no structure-function relationship. We agree with the reviewer that this could be tested further by examining different parcellations of the cerebellum and demonstrating the absence of an association between smaller cerebellar regions and the investigated cerebellar functions. Although this is an interesting question, we believe that it goes beyond the scope of this already extensive paper. For this reason, detailed analyses of the structure-function relationship are available in the preprint of another paper entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” (https://doi.org/10.64898/2026.02.13.705695). In this preprint, across multiple anatomical and functional parcellations, we found no meaningful association between cerebellar structure and cerebellar-specific behavioral measures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Prefacing these suggestions, I want to commend the authors for undertaking this Herculean effort, recruiting such a large sample and administering an extensive battery of tasks. This is an impressively comprehensive study!

      (1) Lesion-symptom mapping. The authors state that lesion-symptom mapping was beyond the scope of the study, but it is unclear why such an analysis could not be performed. Including it would strengthen inferences linking cerebellar structure to behavioral outcomes and help differentiate cerebellar-specific from general performance measures.

      (2) Inter-measure correlations. For cerebellar-dependent tasks, did the authors examine correlations among behavioral measures? If cerebellar aging effects are relatively uniform across the cerebellar cortex, performance across tasks engaging distinct cerebellar regions should, in theory, covary. Similar pairwise correlations for general measures could provide a useful comparison.

      1 + 2: We fully agree with these two points; however, we decided to address these analyses in a separate paper. In the current manuscript, our primary focus was on the behavioral aspects, as these are already quite extensive on their own. In our subsequent work, we conducted an in-depth investigation into the relationship between cerebellar-specific measures and cerebellar structure across distinct cerebellar regions (including anatomical regions and functionally defined regions according to the atlas of Nettekoven et al., 2024). We found that aging does not affect the cerebellum uniformly: some anatomical regions exhibit stronger age effects. For the functionally defined regions, however, the age effects were uniform. There was no relation between behavioral cerebellar-specific measures and regional gray matter structure.

      In this second paper we also analyzed inter-measure correlations between behavioral cerebellar-specific measures. We did not find any correlations between cerebellar outcomes of different tasks, which indeed could indicate that the different tasks engage distinct cerebellar regions. In addition, we did not find any relation between cerebellar outcomes and anatomically or functionally defined cerebellar regions.

      You can find a preprint of the second manuscript entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” here: https://doi.org/10.64898/2026.02.13.705695

      (3) Measurement sensitivity. Could differences in age effects reflect varying measurement noise between cerebellar-specific and general measures? For instance, even among younger participants, cerebellar-related measures (e.g., slope in mental rotation) might exhibit greater variability - given that they depend on more conditions, each with its own noise - than general metrics (e.g., baseline motor variability or choice reaction time estimated from a single condition). This could affect sensitivity to detect age-related change and bias results toward finding effects in general rather than cerebellar-specific measures.

      To address this concern, we computed split-half reliability for both cerebellar-specific and general sensorimotor measures and added these estimates to the supplementary materials. As can be seen from Author response table 1, there is no consistent pattern of lower reliability for cerebellar-specific measures that could plausibly account for the absence of age-related effects.

      Author response table 1.

      Split-half reliabilities
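
      For readers unfamiliar with the metric, the following is a minimal sketch of how a split-half reliability can be computed. It is not the authors' analysis code: the odd/even trial split, the Spearman-Brown correction, and the function names `pearson`, `spearman_brown`, and `split_half_reliability` are all assumptions made for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r):
    """Step a half-test correlation up to full-test length."""
    return 2 * r / (1 + r)


def split_half_reliability(scores):
    """scores: one list of trial-level values per participant.

    Assumed scheme: average odd- and even-numbered trials separately,
    correlate the two halves across participants, then apply the
    Spearman-Brown correction.
    """
    odd_half = [mean(row[0::2]) for row in scores]
    even_half = [mean(row[1::2]) for row in scores]
    return spearman_brown(pearson(odd_half, even_half))
```

      Under these assumptions, a measure whose two halves rank participants identically yields a reliability of 1, and noisier measures yield lower values, which is the comparison the table is meant to support.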

      (4) Task dependence on the cerebellum. It is difficult to argue that measures such as reach accuracy, choice reaction time, or rhythm deviation are non-cerebellar. Ataxia certainly impacts reach accuracy. Although patient evidence is mixed - and even when there is a lack of dissociation (e.g., prolonged choice reaction times in both cerebellar and PD groups) - this does not preclude cerebellar involvement in these measures. Indeed, as the authors stated, claims of cerebellar independence should therefore be made cautiously (can be addressed by VBM in comment 1).

      In the paper we tried to emphasize that the general sensorimotor measures still involve cerebellar functions, as is the case for many movement-related measures. However, we theorized that they do not primarily depend on cerebellar function. For example, rhythm deviation in the finger-tapping task is influenced by cerebellar timing mechanisms as well as by motor execution noise, attention, etc. In contrast, the cerebellar-specific measure from this task, the clock variance, has been shown to isolate the contribution of cerebellar-dependent timing mechanisms (Ivry & Keele, 1989).

      On p.37, we added the following paragraph:

      “Similarly, it is important to recognize that general sensorimotor performance is not independent of cerebellar processing. Many broad measures, such as movement accuracy, reaction time, likely reflect contributions from many different brain regions including the cerebellum. As a result, age‑related differences in general sensorimotor performance may emerge from multiple interacting systems rather than cerebellar function alone.”

      (5) Interpreting preserved or enhanced function. The finding of preserved - or even enhanced - performance in older adults is compelling. The authors interpret this as evidence for cerebellar reserve or compensation for cortical decline. An alternative explanation is that cerebellar structures simply decline more slowly than cortical ones, as their gray-matter data suggest; so rather than cerebellar activity revving up, it may remain the same: For example, following up on several of the authors' prior papers, Cisneros et al. (2024) reported enhanced implicit recalibration with age, potentially reflecting greater reliance on cerebellar forward models as sensory (especially proprioceptive) signals degrade. However, this may reflect reweighting rather than compensation - where cerebellar contributions are not enhanced, but rather preserved as other systems decline more rapidly. It would be valuable for the authors to clarify whether they view their findings as evidence of reweighting (slower decline) or compensation (increased contribution).

      We completely agree with this additional interpretation and have added a short section about it to the discussion. However, based on the structural cerebellar measures that we have, it is difficult to state whether the reweighting or the compensation account is more plausible. Either way, both are in line with the cerebellar reserve theory.

      Added to discussion (P. 35):

      Importantly, the relative preservation of cerebellar structure compared to other systems may itself contribute to the maintained cerebellar function observed in older age. Even if structural decline is present, the fact that it progresses more slowly than in many cortical and subcortical regions suggests that a form of structural reserve remains available in the cerebellum. This structural reserve could underlie the continued efficiency of cerebellar circuits and support their capacity to sustain motor functions across aging.

      (6) Mental rotation and the continuity hypothesis. The age-related decline in mental rotation performance, if cerebellar-dependent (see McDougle et al., 2022; note minor inconsistency in citation format throughout the paper), supports emerging theories that the cerebellum supports continuous mental simulations in both cognition and action, whether it's forward model simulation or interval-based timing in the motor control domain or mental rotation/intuitive physics in the cognitive domain (Tsay & Ivry, 2025). Given that mental rotation showed the strongest age effect, it would be fascinating to examine whether this correlates with structural loss in Crus I/II, regions most implicated in higher-order cognitive functions - related to Comment 1 above. Even on a crude level, without correlating with behaviour, do the authors have a map for which areas show greater degeneration than others?

      We also did this in the other paper mentioned above (Figure 5 of the new preprint). At first glance, the mental rotation outcomes show a strong positive correlation with Crus I and a negative correlation with Crus II. However, neither correlation was significant, and the fact that their signs are opposite suggests that they might be random. In the preprint, we also compare age-related changes in grey matter volumes for different anatomical and functional cerebellar regions (Figure 1).

      The inconsistencies in citation format have been fixed as well.

      (7) Continuous age analyses. An exploratory analysis correlating age (as a continuous variable) with each dependent measure might provide greater sensitivity than categorical group comparisons, revealing more graded relationships between age and performance.

      Our experiment was not designed to perform such an analysis. Testing for group differences provides more power than testing for correlations. For this reason, and given that our clearly separated age groups did not show any behavioral differences, we do not expect such an analysis to provide substantial additional insight. Because the paper is already very extensive, we have not performed this additional analysis.

      Congratulations on this comprehensive piece of work!

      Thank you for your kind words.

      Reviewer #2 (Recommendations for the authors):

      In the introduction, the authors note that the current literature on the cerebellum in aging has evidence from "studies that relied on single-task paradigms", including a citation to an eye-blink conditioning study. They then note "instead of capturing a broader range of specific cerebellar functions". What do they mean by this? Eye-blink conditioning, for example, when administered in a delay paradigm, is tied directly to the cerebellum and is arguably a cerebellar function or learning paradigm. Some clarity about this point is needed.

      The meaning of this is that most previous studies examining cerebellar function in older adults relied on a single task, or on tasks that were functionally very similar, such as balance and gait, to assess performance. In contrast, our study incorporated multiple tasks targeting different sensorimotor skills, allowing us to identify broader patterns in cerebellar sensorimotor performance in older adults.

      To make this clearer, we have rephrased the sentence (p.4):

      “However, much of the evidence supporting this theory comes from studies that narrowly focused on a single task (Boisgontier & Nougier, 2013; Miller et al., 2013; Woodruff-Pak et al., 2001) or on assessments within similar cerebellar domains such as balance and gait (Droby et al., 2021; Rosano et al., 2007), instead of capturing a broader range of specific cerebellar functions.”

      The authors note that many cerebellar tasks that are impaired in patients are preserved in older adults. The authors, however, seem to ignore delay eyeblink conditioning. Gerwig and colleagues (2010, Behav Brain Res) have shown that this is impacted in patients, and it is also robustly impacted in aging. Older adults still learn, but the age effects are highly replicable. A clear discussion of eye-blink conditioning and how it fits into this framework, and with your findings here, would be really helpful. It seems like a notable oversight not to have it discussed, given the age effects in this context, even if it was not included as a measure.

      Eye-blink conditioning is an interesting example that seems to contradict our theory: it is both affected by age and dependent on the cerebellum. However, while age-related changes in cerebellar structure evolve continuously with age, eye-blink conditioning performance remains unchanged between 40 and 80 years of age. Therefore, eye-blink conditioning suggests that age-related changes in cerebellar structure are not related to possible age-related changes in function. This discussion was already included in the manuscript on p. 36, which reads as:

      “Similarly, no eye-blink conditioning task was included, as it is heavily influenced by cognitive factors such as awareness and arousal, and fear conditioning (LaBar et al., 2004). Previous work has shown that many variables, such as blink reaction time and motor components of the eyeblink reflex, introduce substantial variability in responses at older age (Woodruff-Pak & Jaeger, 1998). In contrast, this study found that only performance on the rhythmic finger-tapping task, similar to what we included in our battery, emerged as a significant predictor of age-related differences in eye-blink conditioning. Furthermore, age-related differences appeared to plateau after early adulthood, with no significant variation in the percentage of correct responses between ages 40 and 80 (Woodruff-Pak & Jaeger, 1998). Practically, the extended duration of the training protocol also makes this task unsuitable for inclusion in a test battery (Winton et al., 2025).”

      This approach also does not consider variability within older adults. That is, on average, they may do better than patients. But, there are also individual differences in cerebellar metrics (structure, for example) within an older adult sample that are a critical consideration here. When looking at the behavioral plots that include the individual data points (which is a great addition and very helpful), it is clear that variability is prevalent. As noted below, it may still be that cerebellar metrics are associated with behavior, given the high degree of variability within the groups across aging.

      We agree with the reviewer that variability is prevalent, as it is in any experiment. In our latest preprint entitled “Aging is associated with uniform structural decline across cerebellar regions while preserving topological organization and showing no relation with sensorimotor function” (https://doi.org/10.64898/2026.02.13.705695), we investigated whether variability in cerebellar structure could predict variability in cerebellar functions. Across all our tasks, we did not find such an association, regardless of whether we defined cerebellar regions based on an anatomical atlas or a functional one.

      The use of 23 as the cut-off for MOCA scores is rather low. What was the justification for this within the literature? The authors note wanting to ensure task instructions and those with symptoms of potential MCI, but often 26 is used as a minimum score (with 25 and below being potential MCI).

      In the methods, we refer to the study of Carson et al. (2018) that recommends a cutoff score of 23/30 instead of 26/30 as it shows overall better diagnostic accuracy. We selected this cutoff to emphasize that our sample was not restricted to only the highest‑performing older adults. However, we agree that this is not sufficiently explained in the text, so we briefly clarified this (p.5):

      “We assessed cognitive functioning in both older and older‑old participants using the Montreal Cognitive Assessment (MoCA). A minimum score of 23 out of 30 was required for inclusion, following the recommendation by Carson et al. (2018), who demonstrated that this reduced cutoff yields fewer false positives and provides better overall diagnostic accuracy than the original 26/30 threshold. We adopted this criterion to ensure that our sample was not limited to only the highest‑performing older adults.”

      The authors note that the timing of the visits was adapted based on participant availability. It would be helpful to report the mean length of time between sessions, as well as the range.

      We added this to the method section (p.6):

      “There was no fixed interval between the two behavioral sessions. Ideally, both were scheduled within one week, but in practice, the timing was adapted to participants’ availability. Across all participants, this resulted in a mean inter-session interval of 7.40 days (± 9.03; range = 0-63 days). The average interval between the behavioral sessions and the MRI scanning was 6.86 days (± 8.90; range = 0-83 days).”

      The authors have anatomically defined cerebellar parcellations but have looked solely at total volume measures. What is the rationale for this? If there are differential impacts on cerebellar volume with age (Han et al., 2022; Bernard & Seidler, 2013), there may also be positive associations with behavior in regions that are less negatively impacted by volume. This would be consistent with the idea of reserve. One interesting set of correlations that could be considered is with respect to anterior lobules (I-IV and V) relative to the secondary motor representation in VIIIa and VIIIb, such that the latter may show a more robust association with behavior in the positive direction if volume in these regions is less impacted by aging.

      As mentioned in response to one comment from the other reviewer, we investigated this question in our latest preprint (https://doi.org/10.64898/2026.02.13.705695). In this analysis, we did not find any relation between cerebellar outcomes and anatomical or functional cerebellar regions.

      We consider this to be beyond the scope of the present paper, which focuses on the behavioral performances. The total cerebellar volume was added to show that the subject sample we used did actually exhibit atrophy in the cerebellum, but the purpose of the paper was not to focus on the link between structure and function.

      With respect to timing, I recognize that the clock variance is insignificant based on p=.06. However, this is a relatively "close" result. I am very much of the mindset that things are significant or not. Inclusion of Bayesian analyses helps this, but I don't find this particularly convincing. The larger sample of individuals over age 80 is certainly a strength, and I'm not especially concerned about power. But I do wonder about overinterpretation. I would also emphasize the large degree of variability here in the oldest sample. This raises questions about associations with cerebellar metrics. This argument for relative preservation/reserve may be strengthened by looking at individual differences in structure relative to behavior. That is, in areas of the cerebellum where structure is less impacted by aging (as this is not entirely uniform) does this volume predict better behavior in this sample?

      As noted earlier, the relationship between structure and function is examined in our other paper (https://doi.org/10.64898/2026.02.13.705695). Unfortunately, we were unable to include the 80+ group in that analysis because MRI data was available for only 20 older‑old participants and correlations/regression with 20 people are vastly underpowered.

      We also want to point out that the almost significant difference highlighted by the reviewer between age groups actually goes in the direction of the older participants performing better than the young participants.

      The note about the amount of variance in the older-old participants is fair, though.

      The comparison with the Cam-CAN data set seems to be largely qualitative. Why did the authors not make a direct comparison to determine relative similarity in their sample compared to Cam-CAN? This would be a bit more compelling, though I suspect the differences are not statistically reliable (they note the oldest-old in the Leuven sample have a slightly larger volume). I do realize there are sample size differences, but a matched random sub-sample could also be created out of Cam-CAN. Why did they not compute the quadratic model in the Leuven sample as well?

      A quadratic model was not considered very meaningful in the Leuven sample because age was not measured as a continuous variable but categorized into three discrete age groups (which provides more power to look at age-related differences). Our goal was not to determine whether absolute cerebellar volumes matched across datasets, for example, by creating comparable age groups in the Cam‑CAN dataset, but rather to assess whether the pattern of age‑related effects in our sample aligned with those seen in a larger dataset. In our opinion, the current approach sufficiently demonstrates that the age‑related trends we observe are consistent with those reported in Cam‑CAN.

      The analysis of relative cerebellar gray and white matter is quite interesting. However, what about regional patterns to this? It would be particularly interesting to know if some regions are more or less impacted or preserved relative to the cortex. The data are seemingly available based on the processing approach (at least for gray matter). Was a similar analysis also computed in Cam-CAN? Replicating this in an independent sample would also be of interest.

      We agree with the reviewer that this is indeed interesting for further analyses on this dataset. However, it falls beyond the scope of the present paper. Our preprint (https://doi.org/10.64898/2026.02.13.705695) looks at regional patterns for the cerebellum. Other papers have compared age-related decline in different cortical and subcortical regions as discussed on p.35 of our discussion:

      “Given that the cerebellum exhibited a relatively less pronounced structural decline compared to other brain regions as shown here and in another previous study (Taki et al., 2011), it seems more plausible that the cerebellum might compensate for deficits caused by structural changes in other areas rather than vice-versa. Age-related gray and white matter degeneration is usually faster in frontotemporal regions and subcortical regions, including the hippocampus, amygdala and thalamus than in the cerebellum (Fjell et al., 2013; Giorgio et al., 2010; Neufeld et al., 2022). Although this does not directly indicate functional implications, it suggests that cortical regions are less likely to compensate for cerebellar loss when they exhibit more severe degeneration.”

      The authors argue for cerebellar reserve and present compelling behavioral data in support of this with their many tasks. In instances where they look at largely cerebellar-mediated measures, they demonstrate that older adults and the >80 year old group show relatively intact behavior, even those in the group for total cerebellar gray matter volume (and white matter) is significantly smaller than in young adults. As noted, the behavioral data are very compelling, and as an individual who looks at aging populations in their research, seeing areas and domains of preservation is always interesting and useful. This pattern certainly may be consistent with cerebellar reserve. However, it would be more compelling if the authors also looked at these behaviors with respect to cerebellar volume. That is, there is still a great deal of variability in behavior in the older and >80 samples (though also in the young adults) that may still be associated with cerebellar volume. Poorer performance may be present in those with smaller volumes. This would also be somewhat consistent with the notion that these tasks are those that are derived from work in cerebellar degeneration samples. Associations between behavior and cerebellar measures would speak to this. If there are no associations with volume, this would be particularly interesting and compelling in the context of reserve. Alternatively, if there are differential impacts on cerebellar volume with age (Han et al., 2022; Bernard & Seidler, 2013), there may also be positive associations with behavior in regions that are less negatively impacted by volume. This would be consistent with the idea of reserve. One interesting set of correlations that could be considered is with respect to anterior lobules (I-IV and V) relative to the secondary motor representation in VIIIa and VIIIb, such that the latter may show a more robust association with behavior in the positive direction if volume in these regions is less impacted by aging. Not all individuals completed the scan (due to safety and comfort considerations), which would limit statistical power potentially, but this could be conducted in the subset of individuals that have both sets of data.

      This point overlaps with the issues raised by the other reviewer in comments 1 and 2, which highlights the importance of this point. Yet, we decided to address this analysis in a separate paper. In the current manuscript, our primary focus was on the behavioral aspects, as these are already quite extensive on their own. In our subsequent work (https://doi.org/10.64898/2026.02.13.705695), we conducted an in-depth investigation into the relationship between cerebellar-specific measures and cerebellar structure across distinct cerebellar regions (including anatomical regions and functionally defined regions according to the atlas of Nettekoven et al., 2024). We found that aging does not affect the cerebellum uniformly, but that some anatomical regions exhibit stronger age effects. For the functionally defined regions the age effects were uniform though. There was no relation between behavioral cerebellar-specific measures and anatomical or functional cerebellar regions.

      Some of the assertions the authors make in the discussion about the cerebellum having less pronounced structural decline relative to other brain regions would benefit from being tempered. They used relative measures here, and this is certainly interesting. But, how do other regions stack up? What would the hippocampus look like if such a measure were used? And as noted, does this pattern replicate in the Cam-CAN sample? Further, the authors cite Jernigan et al. (2001) in arguing that cerebellar changes are smaller than those in other brain regions, when in looking at their tables, in fact, the gray matter reductions of the cerebellum are comparable to those of the prefrontal cortex and second only to those of the hippocampus.

      We agree with the reviewer that this is an interesting question, but it needs to be addressed in a separate paper. We have also removed the citation to the Jernigan paper.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Comments on revisions: The authors addressed all my concerns.

      We thank you for the positive review and feedback throughout the review process.

      Reviewer #2 (Public review):

      Comments on revisions: We agree with the overall findings of the study and appreciate that the claims in text and title have been appropriately toned down. As additional suggestions e.g. for presentation, many of the graphics/labels are still too small to be useful. It would be interesting to see if this cell line is similar to the tumours in terms of all the phenotypes. The lapatinib experiment was good. I wonder how quick this drug affects the mitochondria. Also it would be interesting to see if these cells have higher OXPHOS than other non-transformed breast epithelial cells. The WB on oxphos components is good with ab110413 but this looks like many subunits are detected so this should be made clear.

      Thank you for these suggestions.

      We have clarified in the Methods section (lines 475–476) the specific OXPHOS subunits detected using the Ab110413 antibody cocktail.

      With respect to lapatinib, prior work has shown that lapatinib can alter the phosphoproteome within minutes to hours (PMID:22964224). In our experiments, however, NF639 cells were exposed to lapatinib for 24 hours - a timeframe in which transcriptional and translational remodeling are also expected to occur. Therefore, we cannot distinguish whether the observed suppression of OXPHOS reflects acute signaling effects or downstream changes in gene and protein abundance. Importantly, the purpose of this experiment was proof-of-principle: to determine whether HER2 signaling contributes to respiratory competency in a cell line derived from the same transgenic model as the intact tumor slices used in this study. Thus, while defining the precise kinetics of inhibition or comparing to benign/non-transformed cells would be interesting, these were not the primary objectives of the added experiments.

      We have increased figure label sizes across all main figures.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Frangos et al. used a transcriptomic and proteomic approach to characterise changes in HER2-driven mammary tumours compared to healthy mammary tissue in mice. They observed that mitochondrial genes, including OXPHOS regulators, were among the most down-regulated genes and proteins in their datasets. Surprisingly, these were associated with higher mitochondrial respiration, in response to a variety of carbon sources. In addition, there seems to be a reduction in mitochondrial fusion and an increase in fission in tumours compared to healthy tissues.

      Strengths:

      The data are clearly presented and described.

      The authors reported very similar trends in proteomic and transcriptomic data. Such approaches are essential to have a better understanding of the changes in cancer cell metabolism associated with tumourigenesis.

      Weaknesses:

      (1) This study, despite being a useful resource (assuming all the data will be publicly available and not only upon request) is mainly descriptive and correlative and lacks mechanistic links.

      We appreciate this point. While the primary goal of our study was to assess mitochondrial adaptations with HER2-driven tumorigenesis, we agree that strengthening the mechanistic interpretation would improve the impact of the data. To address this, we have added experiments demonstrating that HER2 inhibition with lapatinib in NF639 cells suppresses respiratory capacity, directly supporting the interpretation that HER2 activity regulates respiratory function (Figure 10). We have expanded the discussion accordingly (lines 378-394). Both raw RNA-seq and proteomic data were deposited through GEO and the PRIDE repositories (accession numbers included in Data Availability Statement).

      (2) It would be important to determine the cellular composition of the tumour and healthy tissue used. Do the changes described here apply to cancer cells only or do other cell types contribute to this?

      We thank the reviewer for this suggestion; we have added experiments that have directly addressed this concern.

      Cell type composition analysis by immunofluorescence was added (Figure 6) where we quantified epithelial, mesenchymal, endothelial, immune and stromal populations in our benign mammary tissue and tumor samples. We found no major shift in the dominant cell types that would confound transcriptomic data in whole tissues.

      We integrated immunofluorescence data with a publicly available scRNA-seq dataset from human breast tumors, which allowed us to estimate cell-type-specific expression of OXPHOS genes in our own samples. Despite the possibility of species differences, this is the only dataset of its kind, and we used it to generate an estimate of cell-type-weighted OXPHOS mRNA expression (Figure 6). This revealed that epithelial cells are likely the dominant contributors to OXPHOS gene expression for CI–IV. All calculations are delineated in the Methods section.
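
      One simple way to form a cell-type-weighted estimate of this kind is to multiply each cell type's tissue fraction by its cell-type-specific expression level and then express each product as a share of the total. The sketch below illustrates only this general idea; the function name, the cell-type labels and all numbers are hypothetical placeholders rather than values from the study, and the calculation actually used is the one described in the Methods section.

```python
def weighted_expression(fractions, expression):
    """Combine cell-type fractions with cell-type-specific expression.

    fractions:  cell type -> proportion of cells in the tissue
                (e.g. from immunofluorescence counts)
    expression: cell type -> mean expression of a gene set in that cell type
                (e.g. from an scRNA-seq reference)
    Returns the tissue-level weighted expression and each cell type's share of it.
    """
    total_fraction = sum(fractions.values())
    # Normalise fractions so that they sum to 1 before weighting
    contributions = {
        cell_type: (fractions[cell_type] / total_fraction) * expression[cell_type]
        for cell_type in fractions
    }
    tissue_level = sum(contributions.values())
    shares = {ct: c / tissue_level for ct, c in contributions.items()}
    return tissue_level, shares


# Hypothetical placeholder values, for illustration only
example_fractions = {"epithelial": 0.6, "immune": 0.4}
example_expression = {"epithelial": 10.0, "immune": 5.0}

tissue_level, shares = weighted_expression(example_fractions, example_expression)
print(tissue_level, shares)
```

      In this toy example the cell type that is both more abundant and has higher expression accounts for most of the weighted signal, which is the type of reasoning behind attributing tissue-level expression to a dominant cell type.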

      (3) Are the changes in metabolic gene expression a consequence of HER2 signalling activation? Ex-vivo experiments could be performed to perturb this pathway and determine cause-effects.

      Thank you for this suggestion – we have included an experiment directly testing this concept. We assessed mitochondrial respiration in NF639 HER2-driven mammary tumor epithelial cells in the presence or absence of the well-described dual tyrosine kinase inhibitor lapatinib. Lapatinib reduced basal, CI-linked and CI+II linked respiration without compromising mitochondrial integrity or coupling, demonstrating that HER2 activation regulates respiration in our model. This data is presented in Figure 10, and a new section has been added to the discussion describing the implications of this finding in the context of the current literature (lines 378-394).

      (4) The data of fission/fusion seem quite preliminary and the gene/protein expression changes are not so clear cut to be a convincing explanation that this is the main reason for the increased mitochondria respiration in tumours.

      We agree mitochondrial morphology and dynamics alone cannot fully account for the observed respiratory phenotype – this was emphasized in the discussion but has since been further clarified (lines 365-377). We retained the TEM and dynamics gene/protein data because they do support morphological differences consistent with enhanced fission. However, we have revised the tone of our interpretation to more explicitly acknowledge that these findings are correlative, and the updated discussion now emphasizes that the increased respiratory capacity in tumors is likely driven by multiple converging mechanisms.

      Reviewer #2 (Public review):

      Frangos et al present a set of studies aiming to determine mechanisms underlying initiation and tumour progression. Overall, this work provides some useful insights into the involvement of mitochondrial dysfunction during the cellular transformation process. This body of work could be improved in several possible directions to establish more mechanistic connections.

      (5) The interesting point of the paper: the contrast between suppressed ETC components and activated OXPHOS function is perplexing and should be resolved. It is still unclear if activated mitochondrial function triggers gene down-regulation vs compensatory functional changes (as the title suggests). Have the authors considered reversing the HER2-derived signals e.g. with PI3K-AKT-MTOR or ERK inhibitors to potentially separate the expression vs. functional phenotypes? The root of the OXPHOS component down-regulation should also be traced further, e.g. by probing into levels of core mitochondrial biogenesis factors. Are transcript levels of factors encoded by mtDNA also decreased?

      We appreciate this insight and agree that the discordance between mitochondrial content and function is fascinating and have addressed the concerns above in the following manner:

      - We have altered the title – we agree we cannot definitively say that the enhanced respiratory capacity observed is compensatory.

      - We have added experiments in NF639 cells in the presence of lapatinib, a tyrosine kinase inhibitor to interrogate whether HER2 is necessary for our functional outcome of interest – the enhanced respiratory capacity in the tumors. Lapatinib significantly suppressed respiration (Figure 10) demonstrating HER2 signaling directly regulates mitochondrial respiration.

      - We have expanded the discussion to provide further comment on potential explanations for increased respiratory function and low mitochondrial content.

      (6) The second interesting aspect of this study is the implication of mitochondrial activation in tumours, despite the downregulation of expression signatures, suggestive of a positive role for mitochondria in this tumour model. To address if this is correlative or causal, have the authors considered testing an OXPHOS inhibitor for suppression of tumorigenesis?

      Previous studies have eloquently highlighted that directly or indirectly inhibiting mitochondria can suppress growth in HER2-driven breast cancer (PMID:31690671) or, alternatively, that amplification of mt-HER2 enhances tumorigenesis (PMID: 38291340). In many solid tumors, this is the concept behind preclinical and clinical studies using IACS-010759 or similar inhibitors of OXPHOS, which do suppress growth but have significant off-target effects in healthy tissues (PMID: 36658425, 3580228). We have expanded the discussion to ensure the reader is aware of these previous contributions and highlighted the importance of future work delineating the role of enhanced respiratory function in HER2-driven mammary cancer (lines 378-394).

      (7) A number of issues concerning animal/tumour variability and further pathway dissection could be explored with in vitro approaches. Have the authors considered deriving tumour-derived cell cultures, which could enable further confirmations, mechanistic drug studies and additional imaging approaches? Culture systems would allow alternative assessment of mitochondrial function such as Seahorse or flow cytometry (mitochondrial potential and ROS levels).

      We thank the reviewer for this suggestion – we have addressed this in part by using the NF639 HER2-driven tumor epithelial line, which demonstrated that HER2 regulates our observed respiratory response. Unfortunately, the addition of tumor-derived cell cultures was not feasible or within the scope of our study. Animal and tumor variability has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (8) The study could be greatly improved with further confirmatory studies, e.g. immunoblotting for mitochondrial components with parallel blots for phospho-signalling in the same samples. It would be interesting if trends could be maintained in tumour-derived cell cultures. It is notable that OXPHOS protein/transcript changes are more consistent (Figure 5, Supplementary Figure 4) than mitochondrial dynamics/mitophagy factors (Figure 8). Core regulatory factors in these pathways should be confirmed by conventional immunoblotting.

      We thank the reviewer for this thoughtful comment. While we agree that additional confirmatory studies can be valuable, due to tissue quantity constraints and the number of assays required for our multi-omics analysis, extensive additional blots were not feasible. However, we had sufficient protein to provide select OXPHOS proteins to verify the proteomic data (now provided in S-Fig.4H). Furthermore, we have plotted the fold change of genes and proteins detected in both datasets and added this to Figure 4 (4A, B), further highlighting the consistency between our transcriptomic and proteomic findings. We believe that the highly consistent and concordant nature of our datasets collectively provides strong support for our central objective - determining whether mitochondrial content and respiratory function correlate in HER2-driven mammary tumors. The reproducibility of OXPHOS-related changes reinforces the robustness of our observations. We also appreciate the reviewer’s insight that OXPHOS alterations appear particularly consistent. In response, we have edited the discussion to further emphasize this point, especially in relation to the distinctive pattern observed for Complex V, which showed greater preservation relative to Complexes I–IV across several methods (lines 348-364). We comment on how this stoichiometric shift may contribute to intrinsic respiratory activation despite reduced mitochondrial content.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Further Minor points.

      (9) It would be helpful to know further details regarding the source of the tumour samples, particularly for the proteomics (N=5) and transcriptomics (N=6) datasets, since the exact timepoint of tissue harvest and number of tumours/mouse varied, according to the methods section. Were all samples from the omics studies from different mice (i.e. 11 mice)? B4 and B6 seem like outliers in mitochondrial transcriptomes. Are these directly paired, e.g. with T4 and T6? Are the side-by-side pairs of Ben and Tum samples for blots in Figure 1 and Supplementary Figure 1 from the same mouse?

      This has been clarified in the Methods section (lines 424-429). Mitochondrial respiration experiments were performed in paired tissue (benign and tumor from same mouse). Transcriptomic, proteomic and histological analyses were performed on tumors and benign samples from different mice due to tissue limitations.

      (10) Further references and details are needed to support the methodology of the mitochondrial function tests (e.g. nutrients vs pairing with complexes). What was the time point of nutrient supplementation? It would seem that the lipid substrates should take longer to activate OXPHOS than pyruvate/malate or succinate. Is this the case? Is there speculation as to why succinate supplementation is much more active than pyruvate+malate? What is +MD in Figure 6? The rationale for pooling data for Figure 7A is unclear since the categories appear to overlap: (pyruvate, malate, ADP) vs. (palmitoyl-carnitine, malate, ADP).

      Thank you for this comment. We have expanded the methods (lines 515-531) to provide additional detail on the mitochondrial respiration protocol. Briefly, permeabilized tissues were exposed to substrates delivered at supraphysiological concentrations in a sequential protocol lasting ~30–60 minutes. Under these conditions, mitochondrial respiration reflects the maximal capacity to utilize each substrate rather than the physiological time course of substrate mobilization or uptake that would occur in vivo with the influence of blood flow and transport/substrate availability limitations.

      (11) Many of the figures were blurry (Figure 1F, 2B) or had labels that were too small to be effective (Figures 1G, H, 2D-G, 3E-G, 5E-I, 7C, 8B).

      The font size of figure labels has been increased where possible and all figures have been exported to maximize resolution.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, they show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with e.g. the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages - from larval to adult zebrafish - the study provides an unprecedented comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults the presented findings add novel insights into how mural cell loss may lead to vascular instability.

      Weaknesses:

      The study uses quantitative tracer assays with multiple molecular weight dyes to evaluate blood-brain barrier (BBB) permeability. The study normalizes the intensity of tracer signals (e.g., 10 kDa, 70 kDa dextrans) in the brain parenchyma to the vascular signal of a 2000 kDa dextran tracer (assumed to remain within vessels). Intensity normalization is used to control for variations in tracer injection efficiency or vascular density. This method doesn't directly assess the absolute amount of tracer present in the parenchyma, potentially underestimating leakage severity. As the lack of BBB impairment is a "negative" finding, more rigorous controls or other methods might be needed to corroborate it.

      In response to these comments and those from other reviewers, we have now performed further carefully controlled analyses to test leakage of tracers with molecular weights ranging from 1 to 2000 kDa. We have applied an additional normalisation approach (new data in Fig. 2a–d), imaging tracer extravasation together with vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and using the transgenic reporter signal for normalisation (as suggested by Reviewer #2). The results of these experiments all supported our initial conclusions (revised Extended Data Fig. 3a–d), further validating the reliability of our method. Furthermore, as suggested by the reviewer, we also analysed the raw tracer intensity in the parenchyma with no normalization at all (see Author response image 1). This also supports our conclusion that the BBB is intact in young animals. Finally, we now use our methods to demonstrate that we can detect an immature leaky BBB at 3 dpf and a mature functional BBB at 7 dpf (Fig. 2e-f), a suitable positive control to show that our methods and analyses are reliable.

      Author response image 1.

      Raw intensity values from the parenchyma confirm findings in Figure 2 and Extended Data Figure 3. a–d, Raw mean fluorescence intensity values of extravasated tracers in the midbrain. (a–b) show unnormalized values corresponding to Extended Data Fig. 3a–d, and (c–d) show unnormalized values corresponding to Fig. 1a–d. Unpaired t-tests for 70 and 10 kDa at 14 dpf in (a–b), for 10 kDa at 7 dpf, and for 70 kDa at 14 dpf in (c–d). Mann-Whitney tests for 70 and 10 kDa at 7 dpf in (a–b), for 70 kDa at 7 dpf, and for 10 kDa at 14 dpf in (c–d), due to non-normal distribution. These data were all generated in genotype-blind assays, display embryo-to-embryo variance in signal arising from injection differences, and show no difference in BBB integrity between the genotypes analysed. Comparison with data normalised to the 2000 kDa tracer or to kdrl expression in endothelial cells (Fig. 2 and Extended Data Fig. 3) confirms that normalisation improves the analysis, effectively controlling for embryo-to-embryo differences in tracer delivery and imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult mice using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092). The study by Ando et al, 2021 did not report experiments assessing BBB leakage in pdgfrb mutants but in the review article by Ando et al (PMID: 34685412) it is stated that "indicating that endothelial cells can produce basic barrier integrity without pericytes in zebrafish."

      We thank the reviewer for their comments and pointing out literature that we had not cited (this has been corrected in our revised manuscript).

      As noted by other reviewers, our study goes beyond simply confirming previous literature. The section quoted by the reviewer from Ando et al 2021 regarding intact barrier integrity in pdgfrb mutants is a conclusion based on an apparent lack of haemorrhages in pdgfrb mutants[1]. Our work shows haemorrhages in older animals and as such is in line with these previously published results, but it also extends previous work, for the first time reporting detailed functional analysis to assess BBB integrity. Our study uses definitive tracer assays (now including extensive revisions) to identify an intact BBB in pdgfrb mutants in live animals. This has not been previously described and is important because it offers a new perspective on the evolutionary conservation (or otherwise) of pericyte control of BBB function. Furthermore, our study investigates the nature of hotspot leakage and haemorrhages in more detail than in previous work.

      Weaknesses:

      (1) The authors should avoid using violin plots, which show distribution. Instead, they should replace all violin plots in the figures with graphs showing individual data points and standard deviation. For Figure 2f specifically, the standard deviation in the analyzed cohort should be shown.

      This is a good point. We have replaced the violin plots with plots of individual data points and now show all data as mean ± SEM.
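
      Since the reviewer asked for standard deviation and the revised figures report the standard error of the mean, it may help to recall how the two summaries relate. The following is a minimal sketch only: the function name `mean_sd_sem` and the example values are hypothetical and are not taken from the manuscript.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sd_sem(values):
    """Return (mean, sample SD, SEM) for a list of measurements.

    SD uses the n - 1 denominator; SEM is SD divided by sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    m = mean(values)
    sd = stdev(values)
    return m, sd, sd / sqrt(n)


# Hypothetical normalised intensities from five animals.
m, sd, sem = mean_sd_sem([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"mean = {m:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      SEM shrinks as the number of animals grows whereas SD does not, so a figure legend should state which of the two the error bars represent.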

      (2) The authors have not shown the reduced PDGFRB protein or the effect of mutation on mRNA level in their zebrafish mutant.

      Our pdgfrb<sup>uq30bh</sup> mutant allele introduces a mutation predicted to generate a truncated protein very similar to previously validated alleles (see detail in revised Extended Data Fig. 1a and methods). Our pdgfrb<sup>uq30bh</sup> mutant also phenocopies previous pdgfrb mutants (sa16389 and um148 alleles)[2,3], displaying mural cell loss with multiple markers (Fig. 1a, new data in Extended Data Fig. 1b–c, Fig. 3b–c; Extended Data Fig. 4c–d) and the same typical morphological defects and survival rates (new data in Extended Data Fig. 1d–f). Thus, this phenocopy gives us confidence that our allele is most likely a null allele, in line with previous papers studying presumed null alleles[1].

      We believe this provides sufficient confidence in this allele of pdgfrb. Moreover, considering that our manuscript focusses on loss of mural cells and we show definitively that this mutant has robust loss of mural cells in the brain, our mutant is suitable for this study.

      (3) Statistical data analysis: Did the authors perform analyses to investigate whether the data has a normal distribution (e.g., Figures 1d, e)?

      We thank the reviewer for raising this and apologise for this oversight. All data have now been assessed for normality using the Shapiro-Wilk test and further statistical analyses have been performed accordingly. The specific quantifications referred to by the reviewer in Extended Data Fig. 3a–d (previously Fig. 1d-e) are normally distributed, except for the quantification of 70 kDa extravasation at 7 dpf; the Mann-Whitney test has therefore been used for this comparison. Further information can be found in figure legends and methods.
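
      For readers unfamiliar with the rank-based test used when normality is rejected, the sketch below shows a Mann-Whitney U computation in plain Python. It is an illustration under stated assumptions, not the authors' analysis: the function name `mann_whitney_u` is hypothetical, the p-value uses a normal approximation without continuity correction (statistics packages may instead report exact p-values for small samples), and the example numbers are invented.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value).

    Tied values receive average ranks and the variance of U is
    tie-corrected. The p-value comes from a normal approximation.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                    # size of this group of tied values
        avg_rank = (i + 1 + j) / 2   # mean of the 1-based ranks i+1 to j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * in_x
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                   # every value identical
        return u_x, 1.0
    z = (u_x - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Invented example: two groups of three measurements with no overlap.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))
```

      In practice the choice between an unpaired t-test and this test would follow the outcome of the normality check for each comparison, as described in the response above.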

      (4) Analysis of tracer extravasation. The use of 2000 kDa dextran intensity as an internal reference is problematic because the authors have not provided data demonstrating that the 2000 kDa dextran signal remains consistent across the entire vasculature. The authors have not provided data demonstrating that the 2000 kDa dextran signal in vessels exhibits acceptable variance across the vasculature to serve as a reliable internal reference. The variability of this signal within a single animal remains unknown. The presented data do not address this aspect.

      We thank the reviewer for their comment and agree that analysis was needed for showing 2000 kDa dextran as a reliable normalization signal.

      We now show the data in the following Figures that demonstrate the consistency of signal throughout the vasculature using this 2000-kDa tracer: Extended Data Fig. 2b, Extended Data Fig. 3a and c, Extended Data Fig. 5a, Extended Data Fig. 6. In fact, we observe that this 2000 kDa tracer provides a very reliable marker of large and small calibre vessels in larval, juvenile and adult animals, even in fixed and cleared whole tissues and animals (e.g. Extended Data Fig. 2d-e, Extended Data Fig. 5 and 6).

      Our further experiments and analysis support the use of this tracer as an ideal way to normalise for variation between animals. Coupled with improved masking of vessels using transgenic labels (e.g. Extended Data Fig. 2b), we can quantify across whole vascular networks to reduce the concern about variation within individual animals. We also find that the 2000 kDa tracer shows negligible leakage through the brain vessels at 2 hours post-injection (hpi) (new data, Extended Data Fig. 2b–c) and provide images in Extended Data Fig. 6b–b′′ showing detectable signals even at 6 hpi. Finally, results generated with this approach, with normalisation to transgenic markers, or even with raw parenchymal values of tracer intensity lead to the same conclusions. In addition, we point the reviewer to a recent pre-print from our team that further validates this method[4].

      Overall, we find the use of this tracer an ideal way to normalise for differences in injection volumes between animals and we recommend the use of this method to other groups assessing BBB leakage in zebrafish.

      Additionally, it's intriguing that the signal intensity in the parenchyma of the tested tracers presents a substantial range, varying by 20-30% in the analysed cohort (Figure 1g, Extended Figure 1e). Such large variability raises the question of its origin. Could it be a consequence of the normalization to 2000 kDa dextran intensity which differs between different fish? Or is it due to the differences in the parenchymal signal intensity while the baseline 2000 kDa intensity is stable? Or is the situation mixed?

      This is a good point raised by the reviewer.

      To address this, we have used the following approaches:

      (1) We provide additional experiments and normalisation methods that support the utility of our tracer studies (new data in Fig 2a–f and Extended Data Fig. 2b–c), discussed in detail below.

      (2) We provide graphs of the raw parenchymal distribution of tracer not normalised at all (also requested by reviewer 1). This is provided in Author response image 1 and further supports all our conclusions, showing that our normalisation methods generate meaningful data.

      Overall, the range of parenchymal intensity that we see after tracer injection and live imaging shows variations introduced during microinjection. However, these ranges are in-line with previous publications using similar methods (see studies by O’Brown et al 2019 and 2023)[5,6], allow reliable statistical comparisons to be drawn between control and mutants and allow us to detect both immature and functional BBB states during zebrafish development (new data in Fig. 2e-f).

      Of note, the variability we see is likely introduced during the injection process into tiny larval blood vessels and is the reason why we perform normalization of parenchymal tracers to a vascular dextran signal that doesn’t leak from brain vessels. In our studies, 2000-kDa dextran has been co-injected with the smaller size tracers, therefore any potential differences in injection volumes as well as imaging conditions (however consistent) should be reduced by this method.

      An alternative and potentially more effective approach would be to cross the pdgfrb mutant line with a line where endothelial cells are genetically labeled to define vessels (e.g. the line kdrl used in acquiring data presented in Figure 2a). Non-injected controls could then be used as a baseline to assess tracer extravasation into the parenchyma.

      We thank the reviewer for this suggestion.

      In response, we have performed new tracer leakage experiments at 7 and 14 dpf in siblings and pdgfrb mutants and quantified parenchymal tracer extravasation by normalizing to vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). The results were in line with the previously presented, independent experiments and showed indistinguishable phenotypes between siblings and pdgfrb mutants (new data, Fig. 2a–d). We also used uninjected controls to assess the baseline and saw consistent values approaching zero in these images; we did not include these controls in the revised paper.

      Furthermore, we have also used this approach in wild-type larvae at 3 dpf (immature BBB) and 7 dpf (functional BBB)[5]. We detected significantly higher parenchymal extravasation of 10 and 70 kDa tracers at 3 dpf compared to 7 dpf, demonstrating that our method can detect leakage (new data, Fig. 2e–f).

      We believe that both normalization approaches have advantages (as discussed above), therefore showing the same results with these two different approaches has further strengthened our findings.

      How is the data presented in Figure 3e generated? How was the dextran intensity calculated? It looks like the authors have used the kdrl line to define vessels. Was the 2000 kDa still used as in previous figures? If not, please describe this in the Materials and Methods section.

      We have moved this data to Fig. 4e (previously Fig. 3e).

      Previously, we had plotted raw data because the experiment was conducted on vibratome-sectioned tissue. The 2000 kDa tracer was not used. In response to this query and to be consistent with the new approach suggested by the reviewer, we have revised the quantification by normalizing the 10 kDa tracer extravasation to Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup> for this and the new experiments on juveniles (Fig. 5h–i). Please see the corresponding figure legends or revised methods (lines 464–472).

      (5) The authors state that both controls and mutants show extravasation of 1 kDa NHS-ester into the parenchyma. However, the presented images do not illustrate this; it is not obvious from these images (Extended Data Figure 1c). Additionally, the presented quantification data (Extended Data Figure 1e) do not show that, at 7 dpf, the vasculature is permeable to this tracer. Note that the range of signal intensity of the 1 kDa NHS-ester is similar to the 70 kDa dextran (Figure 1g and Extended Figure 1e). Would one expect an increase in the ratio in case of extravasation, considering that the 2000 kDa dextran has the same intensity in all experiments? Please explain.

      We thank the reviewer for raising this important point.

      To clarify, we have never claimed that “2000-kDa dextran has the same intensity in all experiments”. On the contrary, vascular 2000 kDa normalization has been used to account for potential differences caused by injection, as stated in the submitted supplementary materials and now made more clear in the revision.

      In response to this query, we conducted a more detailed analysis of tracer extravasation patterns based on molecular weight (new data, Extended Data Fig. 2b–c). This analysis showed that 1- and 10-kDa tracers have a much higher extravasation rate than 70- and 2000-kDa tracers. Interestingly, we did not find a significant difference between 1 and 10 kDa extravasation. Therefore, in the revised manuscript we used only 10 kDa in further experiments and have removed 1 kDa from the figures.

      To assess the tracers individually (new data in Extended Data Fig. 2c), parenchymal extravasation of each tracer was normalised to its own vascular signal (e.g. mean intensity of 10 kDa in midbrain / mean intensity of 10 kDa in vasculature), to account for potential differences in injection volume. This provides a suitable method to assess leakage in wild-type animals and is now in line with how previous studies have analysed such tracer injections[5,6]. Please see revised figure legends and supplementary materials for details.
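
      The ratio described above can be sketched in a few lines. This is an illustrative example rather than the authors' pipeline: the function name `normalised_extravasation` and the pixel values are hypothetical, and the vascular reference could equally be the co-injected 2000 kDa tracer or a transgenic endothelial reporter.

```python
from statistics import mean


def normalised_extravasation(parenchyma_pixels, vessel_pixels):
    """Mean parenchymal intensity divided by mean vascular intensity.

    Dividing by a vascular reference measured in the same animal cancels
    multiplicative differences such as injected volume or laser power.
    """
    vessel_mean = mean(vessel_pixels)
    if vessel_mean <= 0:
        raise ValueError("vascular reference signal must be positive")
    return mean(parenchyma_pixels) / vessel_mean


# Hypothetical animal A, and animal B that received twice the tracer dose.
ratio_a = normalised_extravasation([10, 20, 30], [100, 100])
ratio_b = normalised_extravasation([20, 40, 60], [200, 200])
print(ratio_a, ratio_b)
```

      The two hypothetical animals differ two-fold in raw intensity but give the same ratio, which is the property that makes this kind of normalisation useful for comparing genotypes across variable injections.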

      (6) The study would be strengthened by a more detailed temporal analysis of the phenotype. When do the aneurysms appear? Is there an additional loss of VSMC?

      We thank the reviewer for this suggestion, and we have now performed staged imaging of the pdgfrb mutants and siblings between 7 and 21 dpf using the TgBAC(acta2:EGFP)<sup>uq17bh</sup> transgene (new data, Fig. 3b-c; Extended Data Fig. 4a–d). Consistent with previous results, acta2:EGFP-positive cells surrounding the middle mesencephalic central arteries (MMCtA) were missing in pdgfrb mutants. At 21 dpf, we also observed a mild dilation of these vessels, likely the earliest change leading to aneurysms (new data, Fig. 3c).

      To extend the number of stages analysed in this study, we have also performed new tracer leakage experiments in juveniles (30 dpf) and found that aneurysms can be detected at this age when the 10 kDa tracer is used (new data in Fig. 5b–b′). Consistent with the adult stage phenotype, aneurysms were limited to the larger calibre vessels (arteries) in the brain. We have also observed hotspots, and upon quantification, we found fewer numbers in juveniles compared to adults, suggesting that severity of aneurysms and hotspots increase with age.

      Taken together, our results show that the aneurysms in pdgfrb mutants start appearing at late larval/early juvenile stages (~21 dpf) with observable dilations. By 30 dpf, aneurysms accompanied by small numbers of hotspots are observed, and the number of hotspots increases significantly by adulthood. This also correlates with reduced development and survival rate of pdgfrb mutants after 30 dpf (new data, Extended Data Fig. 1d–e).

      (7) The authors intended to analyze the BBB at later stages (line 128), but there is not a significant time difference between 2 months (Figure 2) and 3 months (Figure 3) considering that zebrafish live on average 3 years. Therefore, the selection of only two time-points, 2 and 3 months, to analyze BBB changes does not provide a comprehensive overview of temporal changes throughout the zebrafish's lifespan. How long do the pdgfb mutants live?

      Respectfully, zebrafish transition from juvenile stages to adulthood between 2 and 3 months and there are many significant differences in the physiology of this organism at these two ages. At 2 months, zebrafish are still juveniles undergoing metamorphosis with rapid growth and ongoing skeletal and vascular development. By 3 months, they are sexually mature adults and have much more developed cranioskeletal and vascular systems. Having said that, we take the reviewer's important point that further temporal resolution would improve the study.

      We have performed new experiments in 1-month-old animals and provided comprehensive analysis of the vascular phenotypes occurring in pdgfrb mutants. These were very informative experiments analysing leakage using 10-kDa tracer injections and have significantly improved the study. We had previously provided experiments in 5-month-old adults as well (previously Fig. 4a–b and Extended Data Fig. 4a), so the study now includes larval stages (7, 14 dpf), juveniles at 1 and 2 months and adults at 3 and 5 months. While the additional timepoints did not lead to any new conclusions, they significantly enhanced the body of work overall.

      Of further note, we provided survival data up to 90 dpf where survival of the pdgfrb mutants is significantly reduced compared to siblings (Extended Data Fig. 1e). We believe this is associated with the severity of the aneurysms and haemorrhages which probably lead to lethality in these mutants.

      (8) Why is there a difference in tracer permeability between 2 and 3 months (Figures 2 and 3)? Are hemorrhages not detected in 2-month-old zebrafish?

      In response to this and other queries, we have added new additional experiments that provide more detailed temporal analysis on tracer accumulation (new data in Fig. 5b–c, Fig. 5f–g).

      In short, we do not see obvious haemorrhages in 1- or 2-month-old fish at a gross level during dissections (not shown). We find that, using the 10-kDa tracer, we can detect small hotspots at aneurysms as early as 1 month, likely representing the earliest loss of integrity. We do not see obvious hotspots in 2-month-old animals when we use the 70-kDa tracer; this suggests to us that it is less sensitive for hotspot detection (in line with new Extended Data Fig. 2c). Finally, we find that the number of hotspots increases dramatically from juvenile to adult stages in our datasets, which we take as indicative of a progressive phenotype.

      Overall, tracer size matters for detecting hotspots, and hotspots become more apparent in older animals. We have added a note in the main text to cover these points (lines 200–205).

      (9) Figure 3: The capillary bed should be presented in magnified images as it is not clearly visible. Figure 3e shows that in the pdgfb mutant the dextran intensity is higher also in regions 6-10. How do the authors explain this?

      We thank the reviewer for raising this important point.

      Firstly, we now include enlarged views of the capillary beds for this experiment (Fig. 4d′) and new experiments mentioned below.

      Secondly, in relation to why there is higher tracer signal in lateral locations and not just at medial sites of haemorrhage, we believe that this is most likely due to the progressive spread of tracer from the medial hotspots. To test whether this is likely, we performed additional experiments and tested tracer accumulation at 2 different timepoints in brains collected at 0.5 or 6 hpi (new data in Fig. 5f–g, Extended Data Fig. 6a–b′′). Tracer accumulation at 0.5 hpi was minimal and was primarily limited to hotspots and nearby regions (new data in Fig. 5h), whereas higher tracer accumulation was observed across medial to lateral regions at 6 hpi (new data in Fig. 5i) in pdgfrb mutants. Comparing the data in Figure 4 (2 hpi) and new data in Figure 5i (6 hpi), the 10 kDa tracer appears to have spread to more lateral locations given the increased time allowed post injection.

      We cannot formally exclude the possibility that tracer leakage occurs more slowly through capillaries than at major hotspots, which might fit with the proposed model of slow leakage via increased EC transcytosis[7-9]. However, considering that we cannot detect increased tracer accumulation in pdgfrb mutants that lack aneurysms and haemorrhages at 7 and 14 dpf, such a scenario would require capillary transcytosis to be active at later juvenile and adult stages but not in larval and late larval animals. Thus, we believe the most plausible explanation is that aneurysm/haemorrhage-associated leakage is the primary cause of the vascular integrity defects in zebrafish pdgfrb mutants.

      We have added discussions addressing this in the revised manuscript (lines 220–230, 300–302).

      (10) In general, the manuscript would benefit from a more detailed description of the performed experiments. How long did the tracer circulate in the experiments presented in Figures 2, 3, and 4?

      We thank the reviewer for this suggestion and have now ensured that this is clearly described in figure legends and methods (lines 391–395).

      (11) How do the authors explain the poor signal of the 70 kDa dextran from the vasculature of 5-month-old zebrafish presented in Extended Data Figure 3?

      We agree that the dextran signal was reduced compared to the other experiments in that Figure. This is likely due to sample preparation and clearing causing reduced fluorescence. Upon consideration of the presented data and the additional experiments using 10 kDa tracers providing further validations for our claims, we decided to remove this data from the paper.

      (12) The study would benefit from a clear separation of the phenotypes caused by the loss of VSMC. The title suggests that capillaries also present hemorrhages, which is not the case. How do vascular mural cells differ from mural cells? Are there any other mural cells?

      We take the reviewer's point and have now updated the title to "Mural cells protect the adult brain from haemorrhage but do not control the blood-brain barrier in developing zebrafish."

      (13) I have a few comments about how the authors have interpreted the literature and why, in my opinion, they should revise their strong statements (e.g., the last sentence in the abstract).

      Scientists have their own insights and interpretations of data. However, when citing published data, it should be clearly indicated whether the statement is a direct quote from the original publication or an interpretation. In the current manuscript, the authors have not correctly cited the data presented in the two published papers (references 5 and 6). These papers do not propose a model where pericytes suppress "adsorptive transcytosis" (lines 73-76). While increased transcytosis is observed in pericyte-deficient mice, the specific type of vesicular transport that is increased or induced remains unknown.

      Similarly, lines 151-152 refer to references 5 and 6 and use the term "adsorptive transcytosis," but the authors of both papers did not use this term. Attributing this term to the original authors is inaccurate. Additionally, lines 152-153 do not accurately represent the findings of references 5 and 6. These papers do not state that there is an induction of "caveolae" in endothelial cells in pericyte-deficient mice. In the absence of pericytes, many vesicles can be observed in endothelial cells, but these vesicles are relatively large. It is more likely that there is some form of uncontrolled transcytosis, perhaps micropinocytosis. Please refer to the original papers accurately.

      We thank the reviewer for these comments. We take the point and have rewritten the manuscript carefully to improve accuracy and avoid misrepresenting any previous claims made in specific papers.

      Also, the authors have missed the fact that in mice, the extent of pericyte loss correlates with the extent of BBB leakage. To a certain extent, the remaining pericytes, can compensate for the loss by making longer processes and so ensure the full longitudinal coverage of the endothelium. This was shown in the initial work of Armulik et al (reference 5) and later in other studies.

      We certainly did not miss this important point (as we are also working with these mouse models) and we now include reference to this in our expanded discussion. Of note, we do think it would be worthwhile assessing if the extent of BBB leakage and pericyte coverage also correlates with the presence of microhaemorrhages in these hypomorphic mouse models, although this is more challenging to do in mice than in zebrafish.

      The bold assertion on lines 183 -187 that a lack of specific BBB phenotype in pdgfrb zebrafish mutant invalidates mouse model findings is unfounded. Despite the notion that zebrafish endothelium possesses a BBB, I present a few examples highlighting the differences in brain vascular development and why the authors' expectation of a straightforward extrapolation of mouse BBB phenotypes to zebrafish is untenable.

      In mice Pdgfrb knockout is lethal, but in zebrafish, this is not the case. In marked contrast to mice, however, zebrafish pdgfrb null mutants reach adulthood despite extensive cerebral vascular anomalies and hemorrhage. Following the authors' argumentation about the unlikely divergence of zebrafish and mice evolution, does it mean that the described mouse phenotype warrants a revisit and that the Pdgfrb knockout in mice perhaps is not lethal? Another example where the role of a gene product is not one-to-one, which relates to pericyte development, is Notch3. Notch3-null mice do not show significant changes in pericyte numbers or distribution, suggesting a less prominent role in pericyte development compared to zebrafish.

      Although many aspects of development are conserved between species, there are significant differences during brain vascular development between zebrafish and mice. These differences could reveal why the BBB is not impaired in zebrafish pdgfrb mutants. There is a difference in the temporal aspect when various cellular players emerge. The timing of microglia colonization in the brain differs. In mice, microglia colonization starts before the first vessel sprouts enter the brain, while in zebrafish, microglia enter after. Additionally, microglia in zebrafish and mice have a different ontogeny. In mice, astrocytes specialize postnatally and form astrocyte endfeet postnatally. In zebrafish, radial glia/astrocytes form at 48 hpf, and as early as 3 dpf, gfap+ cells have a close relationship with blood vessels. Thus, these radial glia/astrocyte-like cells could play an important role in BBB induction in zebrafish. It's worth noting that in Drosophila, the blood-brain barrier is located in glial cells. While speculative, these cells might still play a role in zebrafish, while the role of pericytes does not seem to be crucial. Pericytes enter the brain and contact with developing vasculature (endothelium) relatively late in zebrafish (60 hpf). In mice, the situation is different, as there is no such lag between endothelium and pericyte entry into the brain. I suggest that the authors approach the observed data with curiosity and ask: Why are these differences present? Are all aspects of the BBB induced by neural tissue in zebrafish? What is the contribution of microglia and astrocytes?

      Another interesting aspect to consider is the endothelial-pericyte ratio and longitudinal coverage of pericytes in the zebrafish brain, and how this relates to what is observed in mice. How similar is the zebrafish vasculature to the mouse vasculature when it comes to the average length of pericytes in the zebrafish brain? Does the longitudinal coverage of pericytes in the zebrafish brain reach nearly 100%, as it does in mice?

      Based on the preceding arguments, it is recommended that the authors present a balanced discussion that provides insightful discussion and situates their work within a broader framework.

      Overall, we agree with most of the points made by the reviewer above. As we have now extended the format of this paper to be a full article, we have space to provide an extended discussion and introduction. We now try to capture many of the points made by the reviewer and we think that this has significantly improved the paper. We thank the reviewer for this contribution.

      We do want to point out that we did not state that our findings using zebrafish pdgfrb mutants invalidate mouse model findings. We suggest that a deeper analysis to understand the nature of the hotspots in mural cell deficient mammalian models could be very interesting in light of the zebrafish observations. We hope that the revised discussion better reflects this.

      Reviewer #3 (Public review):

      This manuscript examines the role of pdgfrb-positive pericytes in the establishment and maintenance of the blood-brain barrier (BBB) in the zebrafish. Previous studies in PDGFB- or PDGFRB-deficient mice have suggested that loss of pericytes results in disruption of the BBB. The authors show that zebrafish pdgfrb mutant larvae have an intact BBB and that pdgfrb mutant adult fish show large vessel defects and hemorrhage but do not exhibit substantial leakage from brain capillaries, suggesting loss of pericytes is not sufficient to "open" the BBB. The authors use beautiful and compelling images and rigorous quantification to back up most of their conclusions. The imaging of the adult brain is particularly nice. The authors rigorously document the lack of BBB leakage in pdgfrbuq30bh mutant larvae and large vessel phenotypes (eg, enlargement and rupture) in pdgfrbuq30bh mutant adults. A few points would help the authors to further strengthen their findings contradicting the current dogma from rodent models.

      We appreciate the reviewer's comments on the manuscript overall and agree that addressing the raised points was needed to strengthen our findings. We have addressed the main points below and believe that this revision greatly improves this study.

      Major point:

      The authors document pericyte loss using a single TgBAC(pdgfrb:egfp)ncv22 transgenic line driven by the promoter of the same gene mutated in their pdgfrbuq30bh mutants. Given their findings on the consequences of pericyte loss directly contradict current dogma from rodent studies, it would be useful to further validate the absence of brain pericytes in these mutants using one of several other transgenic lines marking pericytes currently available in the zebrafish. This could be done using pdgfrb crispants, which the authors show nicely phenocopy the germline mutants, at least in larvae. This would help nail down the absence of any currently identifiable pericyte population or sub-population in the loss of pdgfrb animals and substantially strengthen the authors' conclusions.

      We thank the reviewer and agree that examination of pdgfrb<sup>uq30bh</sup> mutants using another transgenic line labelling pericytes would further validate the absence of brain pericytes. We generated a transgenic line, TgBAC(abcc9:abcc9-T2A-mCherry)<sup>uom139</sup>, to visualise pericytes and validated the absence of brain pericytes in the pdgfrb mutants (revised Extended Data Fig. 1b). The loss of brain pericytes matched our findings using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line as well as previously published data by Ando et al. (2016, 2021), in which brain pericytes were missing except for those on the metencephalic artery[2,3].

      Other issues:

      The authors should provide more information about the pdgfrbuq30bh mutant and how it was generated (including a diagram in a supplemental figure would be useful).

      We thank the reviewer for this suggestion. In addition to the explanations provided in supplementary materials, we have added a schematic and provided Sanger sequencing results showing the mutation, as well as the predicted effect of the mutation on the protein domains (Extended Data Fig. 1a).

      It would be helpful to show some data on whether mutants show morphological phenotypes or developmental delay at 7 and 14 dpf, to provide some context to better assess the reduced branching and vessel length vascular phenotypes (see Figures 1c-e).

      We thank the reviewer for this suggestion. We have provided further details on body length and survival of the pdgfrb mutants until 90 dpf. As reported by Ando et al 2021, we did not observe any distinguishing feature until about 30 dpf[1,3]. The adult anatomy of our mutant allele matches that of previously described null mutants and is now shown (Extended Data Fig. 1f).

      If available, it would be helpful to have a positive control for the tracer leakage experiments - a genetic manipulation that does cause disruption of the BBB and leakage at 2 hours post-tracer injection (see Figures 1f and g).

      We thank the reviewer for this suggestion and agree that a positive control would validate the reliability of our method. We have performed new experiments at 3 dpf, when BBB integrity is not yet established, and at 7 dpf, when the BBB is functional in zebrafish[5], testing both 10 and 70 kDa tracers (new data in Fig. 2e–f). We detected significantly higher tracer accumulation at 3 dpf, showing that our methods can detect tracer leakage in the brain.

      Quantification of the findings in Figure 4c, d would be useful, as would the use of germline fish for these experiments if these are now available. If this is not possible, it would be helpful to document that the crispants used in these experiments lack pdgfrb:egfp pericytes at adult stages (this is only shown for 5 dpf larvae, in Extended Data Figure 4b).

      We thank the reviewer for this comment. Using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line, we have imaged coronal brain sections collected from 10-week-old pdgfrb crispants and uninjected siblings (age-matched animals used in Fig. 5d–e, previously Fig. 4c–d). We have now included data showing that adult pdgfrb crispants lack brain mural cells, phenocopying pdgfrb<sup>uq30bh</sup> mutants (new data, Extended Data Fig. 6f). These particular crispants are very reliable in our hands and nicely reproduce stable mutant phenotypes, giving us confidence to use the faster F0 approach in this experiment.

      Adult mutants clearly show less dye leakage in the more superficial capillary regions than WT siblings, but dextran intensity is a bit higher, although this could well be diffusion from more central brain regions where overt hemorrhage is occurring. Along similar lines though, the authors' TEM data in Extended Data Figure 4d hints that there may be more caveolae in mutant brain capillaries, although the N number was lower here than for the measurements from TEM of larger central vessels (Figure 4g). It would be useful to carry out additional measurements to increase the N number in Figure 4d to see whether the difference between wild-type sibling and mutant capillary caveolae numbers remains as not significant.

      We thank the reviewer for raising these important points and suggestions.

      Firstly, in relation to signal in capillary regions and likely diffusion from hotspots, please see the response to Reviewer #2, point 9 above.

      Secondly, we have imaged and analysed more capillaries in both pdgfrb mutants and siblings (Extended Data Fig. 7a–b, previously Extended Data Fig. 4d). The results showed no significant difference between these groups, suggesting that capillary EC transcytosis is unchanged in our pdgfrb mutants.

      It might be helpful to include some orienting labels and/or additional descriptions in the figure legends to help readers who are not used to looking at zebrafish brain vessels have an easier time figuring out what they are looking at and where it is in the brain.

      We thank the reviewer for this suggestion and agree that adding further information about orientation in the figure legends and illustrations would make it easier for readers. In addition to the information provided in the figure legends in the submitted version, we have added an illustration and more labels to the revised figures, and extended the descriptions in the figure legends, main text and methods.

      We have added a schematic depicting the tracer leakage assay workflow, orientation of live imaging and analysed region of interest (Extended Data Fig. 1a–b).

      All figure legends have been updated with the anatomical position and microscopy view.

      Additional labels on figures have been added to understand the referenced vessel names (new data in Fig. 3c and Extended Data Fig. 4a–b′).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study uses the intensity of tracer signals within the vessels to analyze BBB permeability, potentially underestimating leakage severity. The dye intensity is measured 2 hours after injection, however, other studies have already observed leakage after 30 Minutes, by imaging directly in the brain parenchyma. The overall intensity should also decrease through leakage from the other vessels of the body, e.g. in the trunk and tail. Probably the loss of intra-vascular dye intensity from leakage in barrier-free vessels is already so high (after 2 hours) that the smaller amount of leakage across the BBB cannot be observed.

      We thank the reviewer for this comment and suggestion. We agree that small tracers leak from the vasculature, particularly through fenestrated vessels in the trunk and tail. We have based our timing on previous studies and our own experience. In zebrafish, the study by O’Brown et al. 2019 also used 2 hpi[5] to detect leakage in mutants for mfsd2aa, a gene that has also been proposed to regulate BBB integrity by controlling EC transcytosis. Therefore, we believe that performing experiments at 2 hpi is appropriate to investigate the roles of pericytes in BBB integrity. Our data suggest that this timing works.

      In response to this and other comments, we performed further experiments and analyses to test leakage of individual tracers with molecular weights ranging from 1 to 2000 kDa. We showed that these tracers can reliably be detected in brain parenchyma and vasculature when imaged at 2 hpi. In another study, we showed that medium-sized tracers such as 40 kDa dextran can be reliably detected in the vasculature at similar timepoints[10]. Considering that our experiments using 10 and 70 kDa tracers detect both parenchymal tracer accumulation and tracer still within the vessels, we believe this timepoint is appropriate for assessing BBB integrity in zebrafish.

      In addition to these experiments, see our tracer leakage experiments in 1-month-old animals at 0.5 and 6 hpi, which test the leakage pattern described above (Fig. 5 and Extended Data Fig. 6).

      Therefore, the authors will need to validate their method of choice, showing an impairment of the BBB, caused by other agents (known to affect the BBB), and at 48 hpf, when the BBB is not tightened yet. One example for BBB impairment can be found in O'Brown et al (2019), eLife 8:e47326. doi: 10.7554/eLife.47326

      We thank the reviewer for this suggestion. Following O’Brown et al. 2019, we performed experiments at 3 dpf, when BBB integrity is not yet mature, and at 7 dpf, when the BBB is functional[5], testing both 10 and 70 kDa tracers. We detected significantly higher tracer accumulation at 3 dpf, showing that our new additional method (see below) can detect tracer leakage in the brain (new data in Fig. 2e–f).

      Ideally, the authors would also supplement the method with additional approaches in the younger developmental stages to validate their findings.

      The validation of the method and the findings is particularly important for the claims of lack of BBB impairment in the absence of mural cells, as this is a "negative" finding.

      In response to this and comments from other reviewers, we performed additional tracer leakage experiments (new data in Fig. 2a–d) where we imaged 10 and 70 kDa tracers with a vascular reporter (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this reporter for normalisation. Both this approach as well as the experiments provided in the first submission (updated as Extended Data Fig. 3a–d) showed that pdgfrb mutants at 7 and 14 dpf have indistinguishable BBB integrity compared to siblings. See also Author response image 1 that further addresses this.

      I also strongly suggest rephrasing and toning down the claim that vascular mural cells do not control the blood-brain barrier in developing zebrafish.

      As a negative finding cannot be proven completely and lots of the previously shown effects on murine BBB impairment are rather weak (when caused by single agents such as Claudin5 deficiency or Sphingosine-phosphate receptor1 knockout), it might be important to only claim that in zebrafish no strong impairment (as observed in the mural cell-deficient mouse) could be observed. Or rephrase it to "no impairment as severe as/comparable to ... could be observed" and then provide an impairment control for the developmental stages.

      We thank the reviewer for this comment and agree that negative findings are very challenging to prove. However, we find no evidence of leakage of the BBB in animals lacking mural cells at 7 and 14 dpf and believe that our data is robust on this point. As such, we believe we show that a vertebrate with a largely conserved EC BBB, can have intact barrier function in the absence of mural cells.

      As suggested, we have revised our claims throughout the manuscript to provide a more nuanced discussion of this, but we do not want to water down our claims too much as we believe they are important. We hope that the reviewer will appreciate our carefully worded and expanded discussion section.

      Additional items of interest to the readers and therefore suggestions to improve the manuscript could be

      (1) To include more molecular analysis: while the study identifies caveolae induction and basement membrane thickening as potential contributors to focal leakage, the exact molecular mechanisms linking mural cell loss to these structural changes are not deeply investigated.

      (2) Also, the study primarily associates BBB disruption in the adult with aneurysms. Therefore other subtle or diffuse changes to BBB permeability that might occur even without overt vascular lesions are potentially underrepresented.

      However, following up experimentally on these might exceed the scope of the manuscript.

      We thank the reviewer for these suggestions and agree with both points. However, as stated by the reviewer, these experiments are beyond the scope of the manuscript and represent future directions for our lab and others.

      Reviewer #2 (Recommendations for the authors):

      (1) Mouse genes should be written as follows: Pdgfb, Pdgfrb and be in italics. See line 70: it should be written "Pdgfb and Pdgfrb (italics)" and not "PdgfB and Pdgfrβ".

      We have updated the text according to the reviewer’s suggestion.

      (2) Please state the age of the fish analyzed in Figure 1f and 1g.

      We have moved this data to Extended Data Fig. 3a–d (previously Fig. 1f–g) and have placed age information on the images and in the figure legends.

      (3) Is the reduced vascular complexity in pdgfb mutant due to reduced angiogenesis or due to excessive pruning?

      This is a good question, and we do not know at this stage. We have unpublished data that suggest pericytes secrete angiogenic growth factors, but this question warrants a thorough investigation that we believe is beyond the scope of this current study.

      (4) Please check that the figure legends state the correct number of fish analysed. For example, Figure 1 d, e N=8 but there seem to be 9 data points per group - 14dpf.

      We apologise for this mistake and thank the reviewer for raising this. We have updated the graphs and figure legends accordingly.

      (5) Please indicate in the figures the genotypes (wt, het) of a sibling presented alongside a pdgfb mutant.

      Wild-type and heterozygous mutants are commonly used together in zebrafish research as a collective control group termed siblings. Since we did not see any difference between wild-type and pdgfrb<sup>uq30bh/-</sup> groups in any experiments, we reported these groups together. This is now stated in the supplementary materials.

      One exception to this was examination of the growth and survival rates where we show the genotypes separately (new data in Extended Data Fig. 1b-f).

      (6) Please explain clearly what region is shown in Figure 2B. I do not understand the explanation "approximate location of dotted line". Is the image in the panel "a" top view of a brain?

      We have moved this data to Fig. 3a′ (previously Fig. 2b) and replaced the dotted line in Figure 3a (previously Fig. 2a) with a white box indicating the location of the restricted region in the whole brain image.

      We have revised the text as below:

      “Subset of z-slices from the whole brain imaging in (a) and (b) (white boxes) indicating mural cell loss and abnormal capillary network patterning. 100-μm-thick maximum intensity projections (MIP) were generated using the continuation of the left middle mesencephalic central artery (MMCtA, arrow) as an anatomical landmark.”

      In addition, we have updated all our figure legends clearly stating the view and anatomical position of the imaged sample.

      (7) Figure 2e: Note that the dotted areas do not correspond to the areas magnified. Please adjust.

      We have moved this data to Extended Data Fig. 5a (previously Fig. 2e–e′) and updated the location of the white box in 5a shown in enlarged view in 5a′.

      (8) Lines 112 and 114 - Should the indicated figure be Figure 2b-d and Figure 2c-d, respectively, and not Figure 1?

      We thank the reviewer for pointing out this mistake. All figures are now referred to correctly in the revised manuscript.

      (9) Data presented in Figure 2 and Figure 3 can be consolidated and presented as one Figure.

      We thank the reviewer for this suggestion. After addition of new data and revising the manuscript we have decided to keep these data presented separately.

      (10) Note that Figure 2a,b shows 5-month-old fish, not 2-month-old fish. Additionally, Extended Data Figure 3 shows 5-month-old fish, not 3-month-old fish.

      The stages noted by the reviewer were correctly indicated.

      (11) Figure 2d: Please clarify the definition of a "large vessel".

      We have observed normal morphology in capillaries and noted aneurysms and hotspots in large calibre vessels such as arteries, which become more severe over time. We have revised this across the manuscript accordingly.

      (12) Figure 4a, b: Please explain how the hotspots of leakage were defined based on the extravasated tracer.

      Hotspots of leakage are scored when fluorescent tracer aggregates are clearly observed outside the vessels. Vessel borders were defined using the transgenic lines (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). We have added a clear description in the methods section (lines 473–475).

      Figure 4c: Why were Pdgfrb crispants used and not the mutant line?

      They were used because pdgfrb crispants reliably phenocopy both the lack of brain mural cells (Extended Data Fig. 5e, previously Extended Data Fig. 4b) and the mutant phenotype, and for practical reasons: they allow faster experiments and reduce fish usage.

      Figure 4e: The magnification of the electron microscopy images does not make it possible to clearly identify caveolae. What was the magnification of the collected images for caveolae analysis? How did the authors ensure that they quantified only caveolae and not other types of vesicles?

      Respectfully, we disagree that the magnification is insufficient, as our images were captured and analysed consistent with previous ultrastructural descriptions[11,12]. We based our quantification of caveolae on the size of the vesicles observed: we defined caveolae as circular profiles of less than 100 nm in diameter and scored them as luminal or abluminal based on proximity to each surface membrane (within 500 nm of each surface or, in a thin-walled vessel, the caveolae closest to each surface) (lines 398–409). Importantly, comparable analyses at similar magnifications have been independently validated in multiple caveola-deficient zebrafish genetic models[4,13]. Interestingly, given the reviewer's comments above, we do see increased vesicular structures that are larger than caveolae, but we only provide quantification of the caveolae here.
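
      The scoring rule described in this response can be sketched in code. This is an illustrative reading only, not the authors' analysis pipeline: the `VesicleProfile` type, its field names, and the tie-break between the two surfaces are assumptions introduced here, and the thin-walled-vessel case is approximated by assigning each profile to its nearest surface.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the response; the constant names are hypothetical.
MAX_CAVEOLA_DIAMETER_NM = 100    # circular profiles below this diameter are counted
MAX_MEMBRANE_DISTANCE_NM = 500   # profile must lie within this distance of a surface


@dataclass
class VesicleProfile:
    """One circular profile measured on a TEM image (hypothetical record)."""
    diameter_nm: float
    dist_to_luminal_nm: float
    dist_to_abluminal_nm: float


def score_profile(p: VesicleProfile) -> Optional[str]:
    """Return 'luminal' or 'abluminal', or None if the profile is not scored."""
    if p.diameter_nm >= MAX_CAVEOLA_DIAMETER_NM:
        return None  # larger vesicles are excluded from the caveolae count
    if min(p.dist_to_luminal_nm, p.dist_to_abluminal_nm) > MAX_MEMBRANE_DISTANCE_NM:
        return None  # too far from either surface membrane
    # Assumption: ties are assigned to the luminal surface.
    if p.dist_to_luminal_nm <= p.dist_to_abluminal_nm:
        return "luminal"
    return "abluminal"


profiles = [
    VesicleProfile(70, 50, 900),    # small, near the luminal membrane
    VesicleProfile(80, 900, 120),   # small, near the abluminal membrane
    VesicleProfile(150, 50, 900),   # too large to count as a caveola
]
labels = [score_profile(p) for p in profiles]
```

      Under this reading, larger vesicular structures such as those mentioned in the response are simply left out of the count rather than assigned to a separate class.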

      Reviewer #3 (Recommendations for the authors):

      Congratulations to the authors on their really beautiful imaging and rigorous quantitative documentation of phenotypes - this is a really nicely done study, and could be very important to the field with just a few additional experiments to buttress the key conclusions.

      We thank the reviewer for their kind comments.

      In addition to the comments noted in the public review, I would only point out that there are two mislabeled call-outs in the text (Lines 112 and 114; says Figure 1, should say Figure 2).

      We thank the reviewer for this point and have now revised the text accordingly.

      (1) Ando, K., Ishii, T. & Fukuhara, S. Zebrafish Vascular Mural Cell Biology: Recent Advances, Development, and Functions. Life (Basel) 11 (2021). https://doi.org/10.3390/life11101041

      (2) Ando, K. et al. Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development 143, 1328-1339 (2016). https://doi.org/10.1242/dev.132654

      (3) Ando, K. et al. Conserved and context-dependent roles for pdgfrb signaling during zebrafish vascular mural cell development. Dev Biol 479, 11-22 (2021). https://doi.org/10.1016/j.ydbio.2021.06.010

      (4) Lim, Y. W. et al. Trans-Endothelial Trafficking in Zebrafish: Nanobio Interactions of Polyethylene Glycol-Based Nanoparticles in Live Vasculature. ACS Nano (2026). https://doi.org/10.1021/acsnano.5c21042

      (5) O'Brown, N. M., Megason, S. G. & Gu, C. Suppression of transcytosis regulates zebrafish blood-brain barrier function. Elife 8 (2019). https://doi.org/10.7554/eLife.47326

      (6) O'Brown, N. M. et al. The secreted neuronal signal Spock1 promotes blood-brain barrier development. Dev Cell 58, 1534-1547 e1536 (2023). https://doi.org/10.1016/j.devcel.2023.06.005

      (7) Armulik, A. et al. Pericytes regulate the blood-brain barrier. Nature 468, 557-561 (2010). https://doi.org/10.1038/nature09522

      (8) Daneman, R., Zhou, L., Kebede, A. A. & Barres, B. A. Pericytes are required for blood-brain barrier integrity during embryogenesis. Nature 468, 562-566 (2010). https://doi.org/10.1038/nature09513

      (9) Mae, M. A. et al. Single-Cell Analysis of Blood-Brain Barrier Response to Pericyte Loss. Circ Res 128, e46-e62 (2021). https://doi.org/10.1161/CIRCRESAHA.120.317473

      (10) Lim, Y.-W. et al. A Standardized Protocol to Investigate Trans-Endothelial Trafficking in Zebrafish: Nano-bio Interactions of PEG-based Nanoparticles in Live Vasculature. bioRxiv, 2025.07.23.666282 (2025). https://doi.org/10.1101/2025.07.23.666282

      (11) Parton, R. G. & Simons, K. The multiple faces of caveolae. Nat Rev Mol Cell Biol 8, 185-194 (2007). https://doi.org/10.1038/nrm2122

      (12) Parton, R. G. & del Pozo, M. A. Caveolae as plasma membrane sensors, protectors and organizers. Nat Rev Mol Cell Biol 14, 98-112 (2013). https://doi.org/10.1038/nrm3512

      (13) Lim, Y. W. et al. Caveolae Protect Notochord Cells against Catastrophic Mechanical Failure during Development. Curr Biol 27, 1968-1981 e1967 (2017). https://doi.org/10.1016/j.cub.2017.05.06

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the mechanisms underlying Kupffer cell death in metabolic-associated steatotic liver disease (MASLD). The authors propose that KCs undergo massive cell death in MASLD and that glycolysis drives this process. However, there appears to be a discrepancy between the reported high rates of KC death and the apparent maintenance of KC homeostasis and replacement capacity.

      Strengths:

      This is an in vivo study.

      Weaknesses:

      There are discrepancies between the authors' observations and previous reports, as well as inconsistencies among their own findings.

      Before presenting the percentage of CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cells, the authors should have first shown the number of CLEC4F<sup>+</sup> cells per unit area in Figure 1. At 16 weeks of age, the proportion of TUNEL<sup>+</sup> KCs is extremely high (~60%), yet the flow cytometry data indicate that nearly all F4/80<sup>+</sup> KCs are TIMD4<sup>+</sup>, suggesting an embryonic origin. If such extensive KC death occurred, the proportion of embryonically derived TIMD4<sup>+</sup> KCs would be expected to decrease substantially. Surprisingly, the proportion of TIMD4<sup>+</sup> KCs is comparable between chow-fed and 16-week HFHC-fed animals. Thus, the immunostaining and flow cytometry data are inconsistent, making it difficult to explain how massive KC death does not lead to their replacement by monocyte-derived cells.

      We thank the reviewer for the insightful comment and the opportunity to clarify this important point. To ensure consistency between our methodologies, we replaced Clec4f staining with TIM4 staining results as requested by the reviewer. We first showed the number of TIM4<sup>+</sup> cells per unit area in Figure 1B. The results showed a significant and progressive loss of TIM4<sup>+</sup> cells per unit area in the liver parenchyma, decreasing from approximately 60 cells/FOV at baseline (0w) to nearly 50 at 4w and further to about 30 at 16w post-HFHC diet. This finding is fully consistent with our flow cytometry data. The percentage of the embryonically derived KC population (CD11b<sup>low</sup> F4/80<sup>hi</sup> TIM4<sup>hi</sup>) among CD45<sup>+</sup> cells dropped from 30.2% (0w) to 24.3% (4w) and 17.6% (16w) (Revised Figure 1C). The absolute number per gram of liver decreased from roughly 12 × 10<sup>5</sup> (1w) to 9 × 10<sup>5</sup> (4w) and 5 × 10<sup>5</sup> (16w) (Revised Figure 1D).

      These data suggest that despite the reported high rate of cell death among CLEC4F<sup>+</sup>TIMD4<sup>+</sup> KCs, the population appears to self-maintain, with no evidence of monocyte-derived KC generation in this model, which contradicts several recent studies in the field.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the remaining embryonic KCs (EmKCs) are maintained through self-renewal, as the proportion of Ki67<sup>+</sup>TIM4<sup>+</sup> cells remains low at all time points (Revised Figure S2D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure S2B, S2C), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4<sup>-</sup> MoKCs. This timing is supported by prior studies, where TIM4<sup>-</sup> KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (Ref. 1,2). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      Moreover, there is no evidence that TIM4<sup>+</sup>CLEC4F<sup>+</sup> KCs increase their proliferation rate to compensate for such extensive cell death. If approximately 60% of KCs are dying and no monocyte-derived KCs are recruited, one would expect a much greater decrease in total KC numbers than what is reported.

      Thank you for raising this point, which allows for an important clarification. The interpretation that approximately 60% of KCs are dying is correct, but this refers to the proportion of the remaining KC population at 16 weeks that is TUNEL<sup>+</sup>, not to 60% of the original KC pool. Since our data show that over half of the EmKCs are lost by 16 weeks (Revised Figure 1B), the 60% of dying cells at this late time point corresponds roughly to only 25-30% of the total original KC population at baseline. This distinction reconciles the high rate of apoptosis observed late in disease with the overall progressive depletion of the EmKC pool.
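
The arithmetic behind this estimate can be sketched as follows. This is an illustrative calculation only, not part of the authors' analysis: it uses the approximate values quoted earlier in this response (about 60 to about 30 TIM4<sup>+</sup> cells/FOV, and roughly 12 x 10<sup>5</sup> to 5 x 10<sup>5</sup> KCs per gram of liver), and it assumes that the first value of each pair represents baseline. The function name is hypothetical.

```python
def dying_fraction_of_baseline(remaining_fraction, tunel_fraction):
    """Express TUNEL+ KCs at a late time point as a fraction of the baseline pool.

    remaining_fraction: KCs still present, relative to baseline (0..1)
    tunel_fraction:     share of those remaining KCs that are TUNEL+ (0..1)
    """
    return remaining_fraction * tunel_fraction


# Approximate values quoted in the response (assumed: first value = baseline).
remaining_by_imaging = 30 / 60      # TIM4+ cells per field of view, 16w vs baseline
remaining_by_flow = 5e5 / 12e5      # TIM4hi KCs per gram of liver, 16w vs baseline
tunel_fraction = 0.60               # ~60% of remaining KCs are TUNEL+ at 16w

by_imaging = dying_fraction_of_baseline(remaining_by_imaging, tunel_fraction)
by_flow = dying_fraction_of_baseline(remaining_by_flow, tunel_fraction)

print(f"imaging-based estimate: {by_imaging:.2f} of baseline KCs")
print(f"flow-based estimate:    {by_flow:.2f} of baseline KCs")
```

Under these assumptions the two estimates come out at roughly 0.30 and 0.25 of the baseline pool, which brackets the 25-30% range stated above.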

      It is also unexpected that the maximal rate of KC death occurs at early time points (8 weeks), when the mice have not yet gained substantial weight (Figure 1B). Previous studies have shown that longer feeding periods are typically required to observe the loss of embryo-derived KCs.

      We appreciate the reviewer’s insightful observation. We think KC death is a continuous event during MASLD. Because previous studies typically assessed the loss of EmKCs after the longer feeding periods needed to induce MASH, they may give the impression that longer feeding periods are required to observe substantial loss of embryonically derived KCs. In our HFHC model, the proportion of dying KCs was already elevated by 8 weeks, and this high rate was sustained through the 16-week endpoint. In a separate MCD dietary model characterized by rapid MASLD progression, a high rate of KC death was detectable as early as 6 weeks (Revised Figure 1F). Collectively, these data suggest that the onset of significant KC death depends on the pace of MASLD pathogenesis and is more likely an early-initiated event that continues throughout MASLD progression.

      Furthermore, it is surprising that the HFD induces as much KC death as the HFHC and MCD diets. Earlier studies suggested that HFD alone is far less effective than MASH-inducing diets at promoting the replacement of embryonic KCs by monocyte-derived macrophages.

      We appreciate the reviewer’s insightful comment. In our study, we observed significant KC death under both HFD and HFHC feeding for 20 and 16 weeks, respectively. Moreover, both HFHC and HFD induced similar stages of MASLD (characterized by significant lipid accumulation without fibrosis development) by these time points (Author response image 1). Therefore, these data support that the onset of substantial KC death may be an early MASLD event, before the progression to MASH. Additionally, this finding aligns with existing literature showing that 16 weeks of HFD feeding alone is sufficient to cause a marked reduction in the TIM4<sup>+</sup> KC population (Ref. 1).

      Author response image 1.

      Detection of liver fibrosis in MASLD mouse models. Male wild-type C57BL/6J mice were fed a high-fat, high-cholesterol (HFHC) diet for 16 weeks or a high-fat diet (HFD) for 20 weeks to induce MASLD. Mice fed a normal chow diet (NCD) served as controls. (A) Sirius Red staining of liver sections was performed to assess collagen deposition and fibrosis during MASLD progression. Scale bar, 20 μm. (B) Western blot analysis of liver tissue lysates showing α-smooth muscle actin (α-SMA) expression as a marker of hepatic stellate cell activation and liver fibrosis.

      In Figure 2D, TIMD4 staining appears extremely faint, making the results difficult to interpret. In contrast, the TUNEL signal is strikingly intense and encompasses a large proportion of liver cells (approximately 60% of KCs, 15% of hepatocytes, 20% of hepatic stellate cells, 30% of non-KC macrophages, and a proportion of endothelial cells is also likely affected). This pattern closely resembles that typically observed in mouse models of acute liver failure. Given this apparent extent of cell death, it is unexpected that ALT and AST levels remain low in MASH mice, which is highly unusual.

      Thank you for this important feedback. To address concerns about the clarity of our imaging, we have provided high-resolution split-channel raw images for Figure 2D (Revised Figure 2D), which distinctly show the localization of TIM4, TUNEL, and GS. These confirm the progressive reduction of TIM4<sup>+</sup> KCs and the increase in TUNEL<sup>+</sup>TIM4<sup>+</sup> cells over time. We agree that the high proportion of TUNEL<sup>+</sup> cells seems at odds with the modest ALT/AST elevation. This discrepancy might be explained by the distinct nature of cell death in MASLD. Unlike the acute necrosis with membrane rupture seen in acute liver failure, which causes massive, rapid enzyme release, obesity-related liver injury is a chronic process dominated by apoptosis (Ref. 4,5). Apoptosis preserves membrane integrity until late stages (Ref. 6), with dying cells packaged into apoptotic bodies for efficient phagocytic clearance by neighboring macrophages (Ref. 7,8). This controlled disposal system minimizes the leakage of intracellular enzymes. Therefore, the coexistence of widespread apoptosis (high TUNEL signal) with limited enzyme release (low ALT/AST) is a recognized feature of chronic MASLD pathogenesis.

      No statistical analysis is provided for Figure 5D, and it is unclear which metabolites show statistically significant changes in Figure 5C.

      We thank the reviewer for raising this statistical problem. We have now included statistical analysis in Revised Figure 5D.

      In addition, there is no evaluation of liver pathology in Clec4f-Cre × Chil1flox/flox mice. It remains possible that the observed effects on KC death result from aggravated liver injury in these animals. There is also no evidence that Chil1 deficiency affects glucose metabolism in KCs in vivo.

      We thank the reviewer for these important points. We previously characterized the liver pathology of Clec4f<sup>ΔChil1</sup> mice in detail (preprint: eLife 2025, DOI: 10.7554/eLife.107023.1, Fig. 2). On a normal chow diet, these mice showed no differences in body weight, hepatic lipid deposition, metabolic parameters, or glucose tolerance compared to controls. However, on an HFHC diet, Clec4f<sup>ΔChil1</sup> mice developed significantly worse metabolic and histological phenotypes. Crucially, our in vitro data demonstrate that recombinant Chi3l1 directly reduces KC death (preprint, Fig. 6E-F), indicating that the aggravated MASLD in knockout mice is a consequence of increased KC loss, not its cause.

      Regarding glucose metabolism, we have previously shown that Chi3l1 deficiency leads to increased glucose uptake by KCs in vivo using the fluorescent glucose analog 2-NBDG. This effect was reversed by supplementing knockout mice with recombinant Chi3l1 (preprint Fig. 6G-H). This provides direct evidence that Chi3l1 modulates glucose uptake in KCs in vivo.

      Finally, the authors should include a more direct experimental approach to modulate glycolysis in KCs and assess its causal role in KC death in MASH.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) in the HFHC-induced MASLD model (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for four weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity during active disease development. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, He et al. set out to investigate the mechanisms behind Kupffer Cell death in MASLD. As has been previously shown, they demonstrate a loss of resident KCs in MASLD in different mouse models. They then go on to show that this correlates with alterations in genes/metabolites associated with glucose metabolism in KCs. To investigate the role of glucose metabolism further, they subject isolated KCs in vitro to different metabolic treatments and assess cleaved caspase 3 staining, demonstrating that KCs show increased Cl. Casp 3 staining upon stimulation of glycolysis. Finally, they use a genetic mouse model (Chil1KO) where they have previously reported that loss of this gene leads to increased glycolysis and validate this finding in BMDMs (KO). They then remove this gene specifically from KCs (Clec4fCre) and show that this leads to increased macrophage death compared with controls.

      Strengths:

      As we do not yet understand why KCs die in MASLD, this manuscript provides some explanation for this finding. The metabolomics is novel and provides insight into KC biology. It could also lead to further investigation; here, it will be important that the full dataset is made available.

      Weaknesses:

      Different diets are known to induce different amounts of KC loss, yet here, all models examined appear to result in 60% KC death. One small field of view of liver tissue is shown as representative to make these claims, but this is not sufficient, as anything can be claimed based on one field of view. Rather, a full tissue slice should be included to allow readers to really assess the level of death.

      Thank you for raising this point regarding data presentation. We analyzed full tissue slices and found that including a view of the entire slice at a standard magnification makes individual KCs difficult to resolve (Author response image 2). To clearly represent the extent and distribution of KC death across the liver tissue slice, we now include lower-magnification images that provide a wider field of view, allowing readers to assess the pattern across a larger tissue area (Revised Figures 1, 2, 6F).

      Author response image 2.

      Assessment of KC death on a full liver tissue slice. (A) Immunofluorescence staining was performed to detect Kupffer cell (KC) death in liver sections from mice fed an MCD diet for 6 weeks. Cell death was assessed by TUNEL staining (green), and KCs were identified by TIM4 staining (red). Nuclei were counterstained with DAPI (blue). A representative whole-tissue view is shown. Scale bar, 1 mm.

      Additionally, there is no consistency between the markers used to define KCs and moMFs, with CLEC4F being used in microscopy, TIM4 in flow, while the authors themselves acknowledge that moKCs are CLEC4F+TIM4-. As moKCs are induced in MASLD, this limits interpretation. Additionally, Iba1 is referred to as a moMF marker but is also expressed by KCs, which again prevents an accurate interpretation of the data. Indeed, the authors show 60% of KCs are dying but only 30% of IBA1+ moMFs, as KCs are also IBA1+, this would mean that KCs die much more than moMFs, which would then limit the relevance of the BMDM studies performed if the phenotype is KC specific. Therefore, this needs to be clarified.

      We thank the reviewer for the constructive comments. For consistency, we have standardized our KC marker to TIM4 for all immunostaining data, aligning it with our flow cytometry analysis (Revised Figures 1, 2D, 6F). We have also clarified that IBA1 is expressed by hepatic macrophages (both KCs and MoMFs) (Revised Figure 2C; Revised manuscript, page 5, lines 182-183). Moreover, we have clarified that the finding that 60% of TIM4<sup>+</sup> KCs are TUNEL<sup>+</sup>, versus 30% of total IBA1<sup>+</sup> cells, further supports that KCs undergo death more readily than MoMFs (Revised manuscript, page 5, lines 186-189). We have also acknowledged the limitations of the BMDM studies (Revised manuscript, page 8, lines 332-340).

      The claim that periportal KCs die preferentially is not supported, given that the majority of KCs are peri-portal. Rather, these results would need to be normalised to KC numbers in PP vs PC regions to make meaningful conclusions.

      We thank the reviewer for this important point. We included the normalized data. At 8 weeks, the normalized death rate was significantly higher in periportal versus pericentral regions (p = 0.041), supporting increased periportal KC susceptibility during early MASLD. By 16 weeks, proportional death rates became comparable between zones (Revised Figure 2D, Revised manuscript, page 6, lines 194-201).
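
The normalization described here can be illustrated with a minimal sketch. The counts below are hypothetical placeholders chosen only to show the calculation; they are not data from the study, and the function name is an assumption made for this example. The point is that a zone containing more KCs will show more TUNEL<sup>+</sup> KCs in raw counts even when the per-cell death rate is identical.

```python
def normalized_death_rate(tunel_pos_kcs, total_kcs):
    """Fraction of KCs in a zone that are TUNEL+ (death normalized to KC number)."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    return tunel_pos_kcs / total_kcs


# Hypothetical counts per zone (NOT measured values):
# periportal (PP) zones contain more KCs than pericentral (PC) zones.
zones = {
    "PP": {"tunel_pos": 20, "total": 40},
    "PC": {"tunel_pos": 10, "total": 20},
}

rates = {
    name: normalized_death_rate(z["tunel_pos"], z["total"])
    for name, z in zones.items()
}

for name, rate in rates.items():
    raw = zones[name]["tunel_pos"]
    print(f"{name}: raw TUNEL+ KCs = {raw}, normalized rate = {rate:.2f}")
```

In this made-up example the raw count is twice as high in the PP zone, yet the normalized rate is the same in both zones, which is why a zonal preference can only be claimed from the normalized values.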

      Additionally, KCs are known to be notoriously difficult to keep alive in vitro, and for these studies, the authors only examine cl. Casp 3 staining. To fully understand that data, a full analysis of the viability of the cells and whether they retain the KC phenotype in all conditions is required.

      We appreciate the reviewer’s suggestions. To confirm the identity and health of isolated KCs in our in vitro studies, we showed that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      Finally, in the Cre-driven KO model, there does not seem to be any death of KCs in the controls (rather numbers trend towards an increase with time on diet, Figure 6E), contrary to what had been claimed in the rest of the paper, again making it difficult to interpret the overall results.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Additionally, there is no validation that the increased death observed in vivo in KCs is due to further promotion of glycolysis.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity in KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC death in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. Still, several interpretations require moderation to avoid overstatement, and certain experimental details, particularly those concerning flow cytometry and population gating, need further clarification.

      Strengths:

      (1) The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      (2) The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      (3) The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      (4) The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      (1) The novelty is questionable. The presented work has considerable overlap with a study by the same lab, which is currently under review (citation 17), and it should be considered whether the data should not be presented in one paper.

      We thank the reviewer for the opportunity to clarify the relationship between the two studies. In our previous work (citation 17), we focused on the transcriptional metabolic differences between Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs) and identified Chi3l1 as a selective protective factor that limits glucose uptake and shields KCs from metabolic stress–induced cell death, with minimal effects on MoMFs. That study directly motivated the current work. The observation that KCs are uniquely protected from metabolic stress led us to hypothesize that excessive glycolytic activation itself may be a primary driver of KC death, which forms the central question of the present study. Accordingly, the current manuscript shifts the focus from Chi3l1-mediated protection to the mechanistic role of hyperglycolysis in driving KC mortality, using distinct experimental approaches and addressing a different biological question. Because the two studies address conceptually distinct aims (one defining a protective regulator of KC survival and the other dissecting glycolysis-driven KC death mechanisms), we believe they are best presented as separate manuscripts. Combining them into a single study would dilute the mechanistic depth and clarity of each story.

      (2) The authors report that 60% of KCs are TUNEL-positive after 16 weeks of HFHC diet and confirm this by cleaved caspase-3 staining. Given that such marker positivity typically indicates imminent cell death within hours, it is unexpected that more extensive KC depletion or monocyte infiltration is not observed. Since Timd4 expression on monocyte-derived macrophages takes roughly one month to establish, the authors should consider whether these TUNEL-positive KCs persist in a pre-apoptotic state longer than anticipated. Alternatively, fate-mapping experiments could clarify the dynamics of KC death and replacement.

      We thank the reviewer for this astute observation. As shown in revised Figure 2D, the proportion of TIM4<sup>+</sup>TUNEL<sup>+</sup>KCs peaks at 8 weeks after HFHC feeding and remains elevated at 16 weeks. However, examination of the corresponding single-channel TIM4 staining during this period reveals that the overall density of TIM4<sup>+</sup> KCs does not undergo abrupt or synchronous depletion. This temporal dissociation between sustained TUNEL positivity and relatively gradual KCs loss suggests that TUNEL-positive KCs do not undergo immediate clearance. Based on these observations, we agree with the reviewer that a substantial fraction of TUNEL-positive KCs likely persists in a prolonged pre-apoptotic or stressed state rather than undergoing rapid cell death. This interpretation is consistent with the absence of extensive KCs depletion or compensatory monocyte infiltration at these time points. Importantly, previous studies (Ref. 1,2) indicate that KCs are eventually lost as MASLD progresses, supporting the notion that KC death is a gradual process that unfolds over an extended time frame rather than acutely.

      (3) The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KCs death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity of KCs. KCs apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup>KCs compared with vehicle controls, indicating protection against KCs death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KCs loss in vivo (Revised manuscript, page 7, line 267-282).

      (4) The study does not address the polarization or ontogeny of KCs during early MASLD. Given that pro-inflammatory macrophages preferentially utilize glycolysis, such data could provide valuable insight into the reason for increased KC death beyond the presented hyperreliance on glycolysis.

      We thank the reviewer for this insightful comment. Regarding KC ontogeny, flow cytometry analysis (Revised Figure 1C) shows that KCs remain uniformly TIM4<sup>hi</sup> during early MASLD, indicating that monocyte-derived KCs (TIM4<sup>low</sup>) have not yet emerged at these stages. To address KC polarization, we assessed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in KCs isolated from WT mice fed an HFHC diet for 0, 8, and 16 weeks. As shown in revised Figure S5A, M1 markers progressively increase over time, whereas M2 markers remain unchanged or slightly decrease. This polarization shift is consistent with the increased glycolytic activity observed in KCs during early MASLD. Together, these data indicate that embryonically derived KCs undergo a pro-inflammatory polarization accompanied by enhanced glycolytic metabolism during early MASLD, providing mechanistic context for their increased susceptibility to metabolic stress–induced cell death beyond hyperreliance on glycolysis alone (Revised manuscript, pages 7-8, lines 307-321).

      (5) The gating strategy for monocyte-derived macrophages (moMFs) appears suboptimal and may include monocytes. A more rigorous characterization of myeloid populations by including additional markers would strengthen the study's conclusions.

      We thank the reviewer for raising this important point. To improve the rigor of our analysis, we adopted gating strategies established in previous studies (PMID: 41131393; PMID: 32562600). Specifically, Kupffer cells were defined as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> cells, while monocyte-derived macrophages (MoMFs) were defined as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/−</sup> cells, thereby excluding contaminating neutrophils and minimizing inclusion of circulating monocytes. Using this refined gating strategy, we observed a progressive reduction of KCs accompanied by a corresponding increase in MoMFs in WT mice during HFHC feeding (Revised Figures 1C and S2B–C), (Revised manuscript, page 4, line 154-163).
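
The gate definitions stated above can be written out as explicit logic. The sketch below is only an illustration of those two definitions as worded in this response: real gating is drawn on continuous fluorescence intensities, whereas here each marker is assumed to have already been discretized into positive/negative or hi/low/neg levels. The `Event` class, field names, and `classify` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow cytometry event with pre-discretized marker levels (assumed)."""
    cd45: bool
    ly6g: bool
    cd11b: bool
    f4_80: str  # "hi" or "low"
    tim4: str   # "hi", "low", or "neg"


def classify(e: Event) -> str:
    """Assign an event to KC, MoMF, or other, following the stated gates."""
    if not e.cd45:
        return "other"
    # KCs: CD45+ CD11b+ F4/80hi TIM4hi
    if e.cd11b and e.f4_80 == "hi" and e.tim4 == "hi":
        return "KC"
    # MoMFs: CD45+ Ly6G- CD11b+ F4/80low TIM4low/-
    if (not e.ly6g) and e.cd11b and e.f4_80 == "low" and e.tim4 in ("low", "neg"):
        return "MoMF"
    return "other"


# Made-up example events, one per expected outcome.
events = [
    Event(cd45=True, ly6g=False, cd11b=True, f4_80="hi", tim4="hi"),    # KC
    Event(cd45=True, ly6g=False, cd11b=True, f4_80="low", tim4="neg"),  # MoMF
    Event(cd45=True, ly6g=True, cd11b=True, f4_80="low", tim4="neg"),   # neutrophil-like
]
labels = [classify(e) for e in events]
print(labels)
```

Requiring Ly6G negativity in the MoMF gate is what removes neutrophil-like events from that population in this sketch.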

      (6) While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself.

      We thank the reviewer for this important question. To determine whether Chi3l1 deficiency affects macrophage differentiation, we analyzed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in Kupffer cells isolated from WT and Chil1<sup>-/-</sup> mice fed an HFHC diet for 0, 8, and 16 weeks. At baseline (0 weeks), Chi3l1 deficiency was associated with elevated expression of multiple M1 markers, whereas M2 marker expression was comparable between WT and Chil1<sup>-/-</sup> KCs. During MASLD progression, the pro-inflammatory signature in Chil1<sup>-/-</sup> KCs was further enhanced, while anti-inflammatory marker expression became dysregulated (revised Figure S5C). Together, these data indicate that Chi3l1 deficiency does not impair macrophage differentiation per se but biases KCs toward a partially pro-inflammatory, M1-like phenotype, providing additional context for the enhanced glycolytic flux observed in Chi3l1-deficient macrophages (Revised manuscript, pages 7-8, lines 307-321).

      (7) The authors use the PDK activator PS48 and the ATP synthase inhibitor oligomycin to argue that increased glycolytic flux at the expense of OXPHOS promotes KC death. However, given the high energy demands of KCs and the fact that OXPHOS yields 15-16 times more ATP per glucose molecule than glycolysis, the increased apoptosis observed in Figure 4C-F could primarily reflect energy deprivation rather than a glycolysis-specific mechanism.

      We thank the reviewer for highlighting this important point. We agree that KCs are highly metabolically active and that perturbations of OXPHOS can influence overall cellular energy balance. As noted in our response to comment #3, we additionally inhibited glycolysis in vivo with 2-DG. The protection of KCs observed following 2-DG treatment (Revised Figure 4E-G) provides further evidence that increased glycolytic flux is not merely correlated with, but functionally contributes to, KC loss in MASLD.

      (8) In Figure 1C, KC numbers are significantly reduced after 4 and 16 weeks of HFHC diet in WT male mice, yet no comparable reduction is seen in Clec4Cre control mice, which should theoretically exhibit similar behavior under identical conditions.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f-Cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f-Cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f-Cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f-Cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the concerns raised in the public review, the authors should:

      (1) Reassess their conclusions using the same panels in flow and microscopy, e.g., the combination of CLEC4F, TIM4, and IBA1. This will allow resKCs (CLEC4F+TIM4+IBA1+), moKCs (CLEC4F+TIM4-IBA1+), and moMFs (CLEC4F-TIM4-IBA1+) to be accurately defined and hence their viability and numbers correctly assessed.

      We thank the reviewer for this insightful suggestion. In our flow cytometry analysis, we did not detect a CD45<sup>+</sup>CD11b<sup>low</sup>F4/80<sup>hi</sup>TIM4<sup>low</sup> population, indicating that monocyte-derived KCs (moKCs) have not emerged in our model at this stage. To more accurately quantify resident KCs (resKCs) in the current study, we replaced CLEC4F with TIM4 staining and enumerated TIM4<sup>+</sup> as well as TIM4<sup>+</sup>TUNEL<sup>+</sup> cells. These data were highly consistent with CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cell counts, confirming that moKCs are not involved in KC death during early MASLD (Revised Figure 1A,B,E,F).

      (2) Investigate why the number of KCs in controls and MASLD are so distinct between Figures 1 and 6.

      We appreciate the reviewer’s suggestions. As explained above, Cre insertion promotes KC self-renewal (Revised manuscript, Figure S8). This enhanced proliferative capacity likely accounts for the relative preservation of KC numbers in Clec4f-Cre mice during HFHC feeding, explaining the apparent discrepancy with WT mice (Revised manuscript, Figure 6D-E).

      (3) Normalise the tunel+ cells based on the number of KCs in PP vs PC regions.

      After normalizing KC death to KC numbers in periportal (PP) versus pericentral (PC) regions, we found that the proportion of dying KCs was significantly higher in PP regions (around the portal vein, PV) than in PC regions (around the central vein, CV) at 8 weeks of HFHC feeding. We have revised the text accordingly (Revised manuscript, page 5, lines 194-201).
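      The normalization requested in point (3) can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function name `dead_kc_fraction` and the per-zone counts are hypothetical and serve only to show that dividing TUNEL<sup>+</sup> KCs by the total KCs counted in the same zone makes the two zones comparable even when their KC densities differ.

```python
def dead_kc_fraction(tunel_pos_kcs: int, total_kcs: int) -> float:
    """Return the fraction of KCs in one zone that are TUNEL+."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    if not 0 <= tunel_pos_kcs <= total_kcs:
        raise ValueError("tunel_pos_kcs must lie between 0 and total_kcs")
    return tunel_pos_kcs / total_kcs


# Hypothetical counts for one field of view per zone (illustration only,
# not data from the study).
counts = {
    "PP": {"tunel_pos": 12, "total": 80},
    "PC": {"tunel_pos": 6, "total": 75},
}

# Normalized death rate per zone: dead KCs divided by all KCs in that zone.
fractions = {
    zone: dead_kc_fraction(c["tunel_pos"], c["total"])
    for zone, c in counts.items()
}
```

      Comparing `fractions["PP"]` with `fractions["PC"]` then reflects the per-cell death rate rather than the raw number of dead cells per zone.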

      (4) Demonstrate the viability of KCs in vitro across conditions.

      To confirm the identity and health of isolated KCs in our in vitro studies, we show that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      (5) Confirm previous studies demonstrating different degrees of KC loss depending on the model of MASLD.

      We thank the reviewer for highlighting this point. Previous studies have reported varying degrees of KC loss depending on the MASLD model used, reflecting the heterogeneity of hepatic macrophages, marker choice, mouse husbandry, and diet regimen. For example, in a 6-week MCD feeding model, ~10% of CLEC4F<sup>+</sup> KCs were TUNEL<sup>+</sup> (Figure 4A, Ref. 9). Another 6-week MCD study reported a drop from 66% to 26% TIM4<sup>+</sup> KCs (Figure 2A, Ref. 12). In an HFD model, TIM4<sup>+</sup> KCs decreased by ~20% after 16 weeks (Figure 1G, Ref. 1). In a Western diet model, TIM4<sup>+</sup> KCs decreased by >50% at 36 weeks (Figures 1J and 2C, Ref. 2). Together, these studies underscore the model-dependent nature of KC loss and highlight the importance of experimental context and marker selection when assessing KC dynamics in MASLD. We have included these studies in our discussion section (Revised manuscript, pages 9-10, lines 393-402).

      (6) Demonstrate in vivo that loss of CHIL1 drives further glycolysis in KCs.

      In Figure 6G-H of our previous study, we showed that Chi3l1 deficiency leads to increased glucose uptake by KCs in vivo, whereas supplementing KO mice with recombinant Chi3l1 significantly reduced glucose uptake by KCs. Uptake was measured by treating mice with the fluorescent glucose analog 2-NBDG. We include the related figure here as Author response image 3.

      Author response image 3.

      Chi3l1 limits glucose uptake by Kupffer cells in vivo. (A) Measurement of 2-NBDG (a fluorescent glucose analog) uptake by KCs in vivo. WT and Chil1<sup>-/-</sup> mice, either untreated or supplemented with rChi3l1, were injected intraperitoneally with 12 mg/kg 2-NBDG. After 45 min, KCs were isolated and glucose uptake was assessed by spectrophotometry. (B) Representative immunofluorescence images of liver sections stained for TIM4 (red) and 2-NBDG uptake (green) to visualize glucose uptake by KCs in situ. Scale bar = 10 µm (zoom). Quantification is shown as the percentage of TIM4<sup>+</sup> cells that are also 2-NBDG<sup>+</sup>. One-way ANOVA was performed in A and B. P values are as indicated.

      (7) There is no mention of the publication of the metabolomics dataset; this should be released with the manuscript.

      We have now included the raw metabolomics datasets as Tables S1 and S2.

      Reviewer #3 (Recommendations for the authors):

      (1) Methods: Reconsider which methods are described in the main text versus the Supplementary Information to improve readability and consistency.

      Thank you for your valuable suggestion. We have reevaluated and adjusted the placement of the methods section between the main text and the supplementary materials.

      (2) Line 34: Check for grammar issues.

      Line 34 has been revised as follows: Additionally, using Chi3l1-deficient mice, we further demonstrated that increased glucose utilization accelerates KCs death in vivo.

      (3) Lines 101, 110: Explicitly reference the corresponding Supplementary Methods sections.

      We have included the references for these two methods sections (Revised supplementary materials and methods, lines 30 and 65, respectively).

      (4) Figure 2: Iba1 marks all macrophages, not only monocyte-derived macrophages; both figure and text (line 205) require correction.

      We have corrected this: Iba1 now denotes hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C; manuscript page 5, line 182).

      (5) Line 218-219: Avoid overinterpretation, as only KCs, hepatocytes, and hepatic stellate cells were assessed - not all hepatic populations.

      We appreciate the reviewer’s valuable suggestion and rephrased our description accordingly (Revised manuscript, page 5, line 186-189).

      (6) Line 262: Use abbreviations consistently throughout the manuscript.

      We have gone through the whole manuscript and double checked the abbreviations.

      (7) Line 264: Include the palmitic acid (PA) concentration used.

      We have included the PA concentration (800 µM) in the revised manuscript (Revised manuscript, page 6, line 250).

      (8) Lines 316-317: Check for grammar errors.

      The grammar errors have been corrected (Revised manuscript, page 8, lines 340-341).

      (9) Line 337-338: See comment above on gating strategy.

      We have updated the gating strategy accordingly (Revised manuscript, page 9, lines 361-362).

      (10) Line 343-344: Note that Chi3l1 is not exclusively expressed by KCs.

      We have rephrased the text accordingly (Revised manuscript, page 9, lines 374-378).

      (11) Lines 355-358: The statement that "sustained glycolytic hyperactivation culminates not in sustained activation, but in apoptotic cell death" is unsupported by data or literature, as macrophage polarization was not analyzed in this study.

      We removed the statement from the revised manuscript.

      (12) Lines 375-379: Rephrase to clarify that while KCs are metabolically active and glucose-demanding, excessive glycolytic flux accelerates apoptosis.

      We have rephrased to clarify (Revised Manuscript, page 10, lines 405-407).

      (13) Lines 375-385 & 387-397: Consolidate overlapping statements for conciseness and coherence.

      We have consolidated the overlapping statements (Revised manuscript, page 10, lines 405-425).

      References

      Daemen, S. et al. Dynamic Shifts in the Composition of Resident and Recruited Macrophages Influence Tissue Remodeling in NASH. Cell Rep 34, 108626, doi:10.1016/j.celrep.2020.108626 (2021).

      Remmerie, A. et al. Osteopontin Expression Identifies a Subset of Recruited Macrophages Distinct from Kupffer Cells in the Fatty Liver. Immunity 53, 641-657.e614, doi:10.1016/j.immuni.2020.08.004 (2020).

      Ozer, J., Ratner, M., Shaw, M., Bailey, W. & Schomaker, S. The current state of serum biomarkers of hepatotoxicity. Toxicology 245, 194-205, doi:10.1016/j.tox.2007.11.021 (2008).

      Malhi, H. & Gores, G. J. Molecular mechanisms of lipotoxicity in nonalcoholic fatty liver disease. Semin Liver Dis 28, 360-369, doi:10.1055/s-0028-1091980 (2008).

      Ibrahim, S. H., Hirsova, P. & Gores, G. J. Non-alcoholic steatohepatitis pathogenesis: sublethal hepatocyte injury as a driver of liver inflammation. Gut 67, 963-972, doi:10.1136/gutjnl-2017-315691 (2018).

      Kerr, J. F., Wyllie, A. H. & Currie, A. R. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. British journal of cancer 26, 239-257, doi:10.1038/bjc.1972.33 (1972).

      Poon, I. K., Lucas, C. D., Rossi, A. G. & Ravichandran, K. S. Apoptotic cell clearance: basic biology and therapeutic potential. Nat Rev Immunol 14, 166-180, doi:10.1038/nri3607 (2014).

      Krenkel, O. & Tacke, F. Liver macrophages in tissue homeostasis and disease. Nat Rev Immunol 17, 306-321, doi:10.1038/nri.2017.11 (2017).

      Tran, S. et al. Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.e625, doi:10.1016/j.immuni.2020.06.003 (2020).

      O'Neill, L. A. & Pearce, E. J. Immunometabolism governs dendritic cell and macrophage function. J Exp Med 213, 15-23, doi:10.1084/jem.20151570 (2016).

      Vander Heiden, M. G. & DeBerardinis, R. J. Understanding the Intersections between Metabolism and Cancer Biology. Cell 168, 657-669, doi:10.1016/j.cell.2016.12.039 (2017).

      Zhang, J. et al. Reactive oxygen species regulation by NCF1 governs ferroptosis susceptibility of Kupffer cells to MASH. Cell Metab 36, 1745-1763.e6, doi:10.1016/j.cmet.2024.05.008 (2024).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Ca<sub>v</sub> nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      We thank the reviewer for acknowledging the strength of the different aspects investigated in our study.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      In an attempt to improve structural insight into the presented mutagenesis data, we used AlphaFold3 (AF3; Abramson et al., 2024) to generate models of the I308A, L312M and A316P substitutions and repeated the docking for each (Author response image 1). According to these predictive models:

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: the C<sub>α</sub> atoms of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence for all helices is lower than in the other AF3 models (70 > pLDDT > 50). In the docking, poses in the binding pocket that are comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and have the same molecule orientation show higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> atoms of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to contact only F315 and Y318, and overall, I308 or L312 contacts occurred in only 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift).
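      The C<sub>α</sub> displacements quoted above are distances between equivalent atoms in two superposed structures. The sketch below illustrates that arithmetic only. It assumes the wildtype and mutant models have already been superposed onto a common frame, and the coordinates and the helper name `ca_displacements` are hypothetical, not taken from the AF3 models.

```python
import math


def ca_displacements(wt_coords, mut_coords):
    """Per-residue C-alpha displacement (in angstroms) between two models.

    Both arguments map a residue label to an (x, y, z) tuple and are assumed
    to be in the same, already superposed coordinate frame.
    """
    shared = wt_coords.keys() & mut_coords.keys()
    return {res: math.dist(wt_coords[res], mut_coords[res]) for res in shared}


# Hypothetical coordinates for illustration only.
wt = {"L312": (0.0, 0.0, 0.0), "F315": (4.0, 1.0, 2.0)}
mut = {"L312": (1.0, 2.0, 2.0), "F315": (4.0, 1.0, 2.0)}

displacements = ca_displacements(wt, mut)
```

      In this toy input, L312 moves by 3 Å while F315 stays in place. A real comparison would additionally depend on which atoms were used for the superposition, which the response does not specify.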

      Author response image 1.

      AlphaFold3 models of BK I308A, L312M, and A316P with α-Mangostin docked to the mutant structures. The upper row shows an overview of the mutant pore helices (AF3 models) used for molecular docking. The lower row shows the binding region with the wildtype structure overlaid in gray. Only 3 helices are shown for clarity.

      Although these results provide interesting tentative explanations for the effect of the mutations and conclusions from AF3 models become increasingly robust, we think that definitive statements of their mechanistic contributions would require experimental studies of mutant channels, i.e., cryo-EM or crystallography, that are beyond our means. Therefore, we have decided not to include this data in the manuscript; however, it is accessible for the interested reader within the public review. Hopefully, as cryo-EM structures have been obtained for the wildtype channel, there will be studies on mutations of this gating-relevant S6 segment in the future.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      We are not sure if a global elevation of cellular calcium concentration would be informative. We rather expect that the relevant local Ca<sup>2+</sup> elevation would occur as sparks in the BK-Ca<sub>v</sub> nanodomains, close to the membrane. We would anticipate a change in spark duration, as the Ca<sup>2+</sup> inward current would be stopped faster by the enhanced repolarization via α-Mangostin-activated BKα/β1 channels. Capturing spark activity would require fast Ca<sup>2+</sup> imaging. We concur that this would be an informative experiment to investigate a more native situation. However, we would have to accomplish such methodologically challenging measurements in a separate project, which could fruitfully be combined with a more extensive characterization of aortic contraction as also suggested in the following remark (3).

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      We completely agree with the reviewer that there is ample room for further studies that could characterize different tissues important in blood pressure regulation (such as resistance arteries), elucidate even more physiological detail (such as modulatory effects of the endothelium), or look deeper into the pharmacology using chemically altered Mangostin derivatives. While we would very much like this to happen in future projects, in this study we focused on the functional aspects of α-Mangostin in BK channel gating. We present our tension recordings as a proof-of-concept to underline the activity of α-Mangostin in native tissues, and we clearly show the importance of the BK channel by using iberiotoxin as a specific inhibitor, which impressively abolished relaxation.

      References:

      Abramson, J. et al. (2024) “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, 630(8016), pp. 493–500. Available at: https://doi.org/10.1038/s41586-024-07487-w.

      Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of the BK channels. The authors arrive at this conclusion through the effect of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca<sup>2+</sup> (Ca<sub>V</sub>1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates the contracture promoted by noradrenaline. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      Strengths:

      The set of results combining electrophysiological measurements, mutagenesis, and molecular docking reveals α-mangostin as a potent activator of BK channels and the putative location of the α-mangostin binding site. Moreover, experiments conducted on aortic preparations from mice suggest that α-mangostin can aid in developing drugs to treat a myriad of diverse diseases involving the BK channel.

      We thank the reviewer for pointing out the significance of our study.

      Weaknesses:

      Major:

      (1) Although the results indicate that α-mangostin is modifying the closed-open equilibrium, the conclusion that this can be due to a stabilization of the voltage sensor in its active configuration may prove to be wrong. It is more probable that, as has been demonstrated for other activators, the α-mangostin is increasing the equilibrium constant that defines the closed-open reaction (L in the Horrigan, Aldrich allosteric gating model for BK). The paper will gain much if the authors determine the probability of opening in a wide range of voltages, to determine how the drug is affecting (or not), the channel voltage dependence, the coupling between the voltage sensor and the pore, and the closed-open equilibrium (L).

      We would like to take the opportunity to clarify this potential misunderstanding. In our manuscript, we have discussed three mechanistic explanations for the Mangostin activation: (1) an electrostatic effect at the selectivity filter, (2) structural and electrostatic changes of S6 that facilitate the opening of a putative lower gate, and (3) hydrophobic gating, i.e., counteracting dewetting of the pore. All possibilities would impact S6 and lower the free energy for pore opening, and we concur that therefore Mangostin most likely affects the closed-open equilibrium (L) of the BKα channel.

      The sentence at the original lines 470-471, “(…) caused by an enhanced shift of the closed-open equilibrium toward the open state, such as the stabilization of the voltage sensor in an active conformation” refers to the observation that the presence of the β1 subunit enhances this closed-open shift. The stabilization of the voltage sensor domain was mentioned as one example of how it achieves this. We recognize that this example was an unfortunate choice, as β1 rather facilitates Ca<sup>2+</sup>-dependent allosteric pore opening unrelated to the discussed mechanisms of Mangostin. We have therefore removed this statement.

      As to the suggestion to dissect the effect of Mangostin on C, D, and L, we agree with the reviewer that this would surely add to a full biophysical characterization. However, in our project, we strove towards including more experiments showing the physiological implications of Mangostin activation to emphasize the implication for vasodilation. We hope the reviewer understands that, with limited resources, this came at the expense of a full investigation of the different gating components, which could pose a separate project by itself.

      (2) Apparently, the molecular docking was performed using the truncated structure of the human BK channel. However, it is unclear which one, since the PDB ID given in the Methods (6vg3), according to what I could find, corresponds to the unliganded, inactive PTK7 kinase domain. Be as it may, the apo and Ca2+ bound structures show that there is a rotation and a displacement of the S6 transmembrane domain. Therefore, the positions of the residues I308, L312, and A316 in the closed and open configurations of the BK channel are not the same. Hence, it is expected that the strength of binding will be different whether the channel is closed or open. This point needs to be discussed.

      We apologize for the typing error and thank the reviewer for pointing out the erroneous PDB ID (“6vg3”). It should have read PDB ID 6v3g, as in the legend to Fig. 4B. The reviewer appropriately points out that there are differences in the S6 segment addressed in our study between the two available cryo-EM structures obtained in the presence (PDB ID 6v38) and absence of Ca<sup>2+</sup> (PDB ID 6v3g) (Tao and MacKinnon, 2019).

      We had actually performed the docking with both structures, but chose to show the Ca<sup>2+</sup>-free structure to better visualize the I308 position. α-Mangostin is found in the same S6 region in both, not obstructing the K<sup>+</sup> conduction pathway. The binding energies of the favored poses are very similar; the binding energy in the best-ranking conformational cluster in the Ca<sup>2+</sup>-bound structure was even slightly lower (-8.64 kcal mol<sup>-1</sup>) than in the docking with the Ca<sup>2+</sup>-free channel (-8.58 kcal mol<sup>-1</sup>; Fig. 4B), which may not be a relevant difference.

      We compared the residue interactions in both dockings (Author response table 1). S317 and Y318, which did not reduce the shift in V<sub>½</sub> upon substitution, were not predicted to contact α-Mangostin in either structure. In both structures, L312 and F315 were predicted to interact in virtually all poses analyzed. In the docking to the Ca<sup>2+</sup>-free state, I308 was also predicted to interact in 17/20 poses, while contacts to A316 occurred in 5/20 poses. In the Ca<sup>2+</sup>-bound state, predicted interactions shifted from I308 (which is expected as it is buried in the protein) to A316, and the isoprenyl moiety close to I308 rotated downwards. This could indicate that α-Mangostin adopts a more horizontal position following the upward reorientation of S6 in the Ca<sup>2+</sup>-bound state when the channel moves from one to the other conformation (Fig. S4).
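      Counts such as "17/20 poses" are per-residue contact frequencies across a set of docked poses. A minimal sketch of that tally is given below. The pose contents are invented for illustration, and the function name `contact_counts` is hypothetical; how contacts were defined in the actual docking analysis is not stated in the response.

```python
from collections import Counter


def contact_counts(poses):
    """Count in how many poses each residue is contacted.

    `poses` is an iterable of collections of residue labels, one per pose.
    A residue is counted at most once per pose.
    """
    counts = Counter()
    for contacts in poses:
        counts.update(set(contacts))
    return counts


# Four hypothetical poses (illustration only, not the docking output).
poses = [
    {"I308", "L312", "F315"},
    {"L312", "F315"},
    {"I308", "L312", "F315", "A316"},
    {"L312", "F315"},
]

counts = contact_counts(poses)
n_poses = len(poses)
```

      Reporting `counts[residue]` over `n_poses` for each structure gives the kind of side-by-side comparison summarized in Author response table 1.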

      Author response table 1.

      Number of interactions of S6 residues in 20 analyzed α-Mangostin poses in the molecular dockings to the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound structures.

      These docking results are consistent with our functional measurements. Recent structures of the BK/γ1 complex showed that the VSD and Ca<sup>2+</sup>-bowl are stabilized in an active-like conformation that corresponds to the conformation seen in the Ca<sup>2+</sup>-bound state (Kallure et al., 2023; Yamanouchi et al., 2023; Redhardt, Raunser and Raisch, 2024), indicating that the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures very likely represent open and closed conformations of the channel. We observed that α-Mangostin can bind to both of these states to activate the channel (Fig. 3C, D), showing the presence of a binding site in both conformations. Further, α-Mangostin induced a left-shift in V<sub>½</sub> also at higher Ca<sup>2+</sup> concentrations (Fig. 2D), indicating that it still binds to and activates the channel after the conformational change in S6. As we could not determine affinity for the mutants due to limited solubility, we have no information on the nature of the contribution of the substitutions, i.e., reduced binding or allosteric effect. As I308 is buried in the Ca<sup>2+</sup>-bound state, its contribution is likely mostly allosteric. We have also proposed dewetting as a possible activation mechanism, which we expect to be less sensitive to the exact pose of a molecule (as shown for NS11021, Nordquist et al., 2024). Therefore, α-Mangostin could, e.g., change solvent accessibility of the I308 sidechain, energetically favoring the buried (open) state.

      We have now included both dockings and Author response table 1 in Fig. S4, and we have added passages to the results section (starting at line 373) and discussion section (starting at lines 544, 588).

      Minor:

      (1) From Figure 3A, it is apparent that the increase in Po is at the expense of the long periods (seconds) that the channel remains closed. One might suggest that α-mangostin increases the burst periods. It would be beneficial if the authors measured both closed and open dwell times to test whether α-mangostin primarily affects the burst periods.

      We thank the reviewer for this valuable suggestion, which we have implemented. In the single-channel measurements shown in our original Fig. 3, we did not observe burst behavior of the BKα channels. This can be explained by the fact that we measured under resting conditions (100 nM free Ca<sub>i</sub><sup>2+</sup>) and with rather mild depolarization (+40 mV), where Po was very low. We have therefore analyzed measurements in 5 µM free Ca<sub>i</sub><sup>2+</sup>, where we recorded sufficient burst activity also in the basal state.

      The burst analysis showed that α-Mangostin indeed prolongs bursts and shortens the interburst closures. Within bursts, both closed times and open times were increased, and we recorded a higher number of opening events per burst. We conclude that α-Mangostin acts in both the closed and the open state, where it slows open-closed transitions resulting in less flicker, and stabilizes the open state via longer open times and a higher probability for closed-open transitions.

      We now show this data in Fig. 3D-F and Table S8, and have accordingly added passages to the results section (starting at line 285), the discussion (line 510), and the methods section (starting at line 746).
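      For readers unfamiliar with burst analysis, the sketch below shows one common way to segment an idealized single-channel record into bursts: closures longer than a critical duration are treated as gaps between bursts, and shorter closures are counted as part of a burst. This is an assumption made for illustration; the response does not state which burst criterion was used, and the threshold, dwell times, and names (`Burst`, `find_bursts`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Burst:
    duration_ms: float  # first opening to last closing, short closures included
    n_openings: int     # number of openings within the burst


def find_bursts(dwells, t_crit_ms):
    """Split alternating ("O"/"C", duration_ms) dwells into bursts.

    Closures longer than t_crit_ms terminate the current burst.
    """
    bursts = []
    duration = 0.0
    n_open = 0
    pending_closed = 0.0  # short closure, added only if another opening follows
    for state, dt in dwells:
        if state == "O":
            duration += pending_closed + dt
            pending_closed = 0.0
            n_open += 1
        elif state == "C":
            if dt > t_crit_ms:
                if n_open:
                    bursts.append(Burst(duration, n_open))
                duration, n_open, pending_closed = 0.0, 0, 0.0
            elif n_open:
                pending_closed += dt
        else:
            raise ValueError(f"unknown state: {state!r}")
    if n_open:
        bursts.append(Burst(duration, n_open))
    return bursts


# Hypothetical idealized record (durations in ms), illustration only.
dwells = [
    ("C", 50.0), ("O", 2.0), ("C", 1.0), ("O", 3.0), ("C", 40.0),
    ("O", 4.0), ("C", 0.5), ("O", 1.0), ("C", 0.5), ("O", 2.0), ("C", 100.0),
]
bursts = find_bursts(dwells, t_crit_ms=10.0)
```

      Burst duration, interburst closed time, and openings per burst can then be compared between control and drug conditions. The choice of the critical closed time strongly affects the result and would normally be derived from the closed dwell-time distribution.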

      (2) In several places, the authors make similarities in the mode of action of other BK activators and α-mangostin; however, the work of Gessner et al. PNAS 2012 indicates that NS1619 and Cym04 interact with the S6/RCK linker, and Webb et al. demonstrated that GoSlo-SR-5-6 agonist activity is abolished when residues in the S4/S5 linker and in the S6C region are mutated. These findings indicate that binding of the agonist is not near the selectivity filter, as the authors' results suggest that α-mangostin binds.

      We will gladly clarify our ideas concerning the binding sites of other activators and α-Mangostin. We first hypothesized that α-Mangostin may share characteristics and mode of action with the class of negatively charged activators (NCA) that we have described before (Schewe et al., 2019). NCA were found to occupy a common fenestration site that is located close to the selectivity filter in TREK K2P channels, and in this manuscript we have shown by THexA competition and mutagenesis experiments that α-Mangostin also binds in this fenestration region in TREK-1 channels (Fig. S3).

      The existence of this common NCA binding site was also proposed for BK channels, as a docking placed the NCA NS11021 in an equivalent binding region, and, among others, NS11021 and GoSlo-SR-5-6 competed with THexA for binding in the pore (Schewe et al., 2019). These results were indeed not fully in agreement with the proposed binding site of GoSlo-SR-5-6 in Webb et al. (2015), although the most effective (double) mutants were located at S317 and I323, at the intracellular end of the cleft between neighboring S6 segments. In this manuscript, we have shown that α-Mangostin is present in the pore of BK channels by molecular docking, a THexA competition assay, and two mutations that reduced the shift in V<sub>½</sub> induced not only by α-Mangostin but also by GoSlo-SR-5-6 (Fig. 4). While the docking was rather a starting point, both functional tests argue against a binding site in the S4/5 linker/S6C region; however, allosteric mechanisms could still reduce activation also in mutants in the S4/5 linker/S6C region far from the pore binding region proposed by us in the 2019 study and the present manuscript.

      To summarize, we did not mean to imply that all BK activators should bind to this site, especially if they are not part of the NCA class (such as NS1619, Cym04, and BC5, whose different binding site enabled us to use it as a control in our THexA competition assay). However, the cleft close to gating-relevant S6 residues may well be a region especially susceptible to modulator binding (as for BL-1249, GoSlo-SR-5-6, and α-Mangostin). To improve clarity, we have separated the initial GoSlo references from the reference to the pore binding site in the paragraph (lines 329, 358).

      (3) The sentence starting in line 452 states that there is a pronounced allosteric coupling between the voltage sensors and Ca2+ binding. If the authors are referring to the coupling factor E in the Horrigan-Aldrich gating model, the references cited, in particular, Sun and Horrigan, concluded that the coupling between those sensors is weak.

      We are grateful for the opportunity to improve this passage. We intended to express that the observed effects (in this case the shift in V<sub>½</sub>) are pronounced around 1 µM Ca<sup>2+</sup>. As the reviewer states, the coupling factor between the voltage and calcium sensors (E; 2.4) is weak compared to the coupling of Ca<sup>2+</sup> (C; 8) and voltage (D; 25) to the pore in the Horrigan-Aldrich model. However, the shape of the Ca<sup>2+</sup>-dependence of V<sub>½</sub> cannot be completely described when E is neglected, with the largest difference around 1-2 µM Ca<sup>2+</sup> (Horrigan and Aldrich, 2002). Deletion of the gating ring underlines the allosteric sensor coupling (Clay, 2017). Together with the steep Ca<sup>2+</sup>-dependence in this concentration range (meaning large Po changes upon an increase in occupancy; Cui, Cox and Aldrich, 1997), this explains the higher apparent activation, visible as the larger shift in V<sub>½</sub> observed at 1 µM Ca<sup>2+</sup>. In terms of the model of Sun and Horrigan (2022), the suppressing “molecular logic gate” is already relieved by the presence of intermediate Ca<sup>2+</sup>, and the direct “gating lever” pathway via voltage acts synergistically and achieves the observed larger V<sub>½</sub> shift upon depolarization. We have adapted the sentence and separated the citations for better understanding (lines 503-507).
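      To make the roles of L, C, D, and E concrete, the sketch below evaluates the steady-state open probability of the Horrigan-Aldrich allosteric model and locates the voltage at which Po reaches 0.5. The coupling factors C = 8, D = 25, and E = 2.4 are the values quoted above. All other parameter values (L0, zL, J0, zJ, Kd) are illustrative assumptions chosen to be of a plausible order of magnitude, not values fitted to the authors' recordings, and representing an activator as a tenfold increase in L0 is likewise only an assumption used to illustrate the reviewer's suggestion.

```python
import math

KT_MV = 25.4  # approximate kT/e in mV near room temperature


def open_probability(v_mv, ca_um, L0=9.8e-7, zL=0.3, J0=0.03, zJ=0.58,
                     Kd_um=11.0, C=8.0, D=25.0, E=2.4):
    """Steady-state Po of the Horrigan-Aldrich model (four identical subunits)."""
    L = L0 * math.exp(zL * v_mv / KT_MV)  # closed-open equilibrium constant
    J = J0 * math.exp(zJ * v_mv / KT_MV)  # voltage-sensor activation
    K = ca_um / Kd_um                     # Ca2+ binding
    open_weight = L * (1 + K * C + J * D + J * K * C * D * E) ** 4
    closed_weight = (1 + J + K + J * K * E) ** 4
    return open_weight / (open_weight + closed_weight)


def v_half(ca_um, lo=-300.0, hi=400.0, **params):
    """Voltage (mV) at which Po = 0.5, by bisection on the monotonic Po(V)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if open_probability(mid, ca_um, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Shift of the Po = 0.5 voltage at 1 µM Ca2+ when L0 is raised tenfold
# (hypothetical stand-in for an activator acting on the pore equilibrium).
shift_mv = v_half(1.0, L0=9.8e-6) - v_half(1.0)
```

      With these assumptions `shift_mv` is negative, i.e., raising L0 alone moves the activation curve to more negative voltages. Distinguishing such an effect from changes in J0, D, or C would require Po measurements over a wide voltage range, as the reviewer notes.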

      References:

      Clay, J.R. (2017) “Novel description of the large conductance Ca2+-modulated K+ channel current, BK, during an action potential from suprachiasmatic nucleus neurons,” Physiological Reports, 5(20), p. e13473. Available at: https://doi.org/10.14814/phy2.13473.

      Cui, J., Cox, D.H. and Aldrich, R.W. (1997) “Intrinsic Voltage Dependence and Ca2+ Regulation of mslo Large Conductance Ca-activated K+ Channels,” Journal of General Physiology, 109(5), pp. 647–673. Available at: https://doi.org/10.1085/jgp.109.5.647.

      Horrigan, F.T. and Aldrich, R.W. (2002) “Coupling between voltage sensor activation, Ca2+ binding and channel opening in large conductance (BK) potassium channels,” The Journal of General Physiology, 120(3), pp. 267–305. Available at: https://doi.org/10.1085/jgp.20028605.

      Kallure, G.S. et al. (2023) “High-resolution structures illuminate key principles underlying voltage and LRRC26 regulation of Slo1 channels.” bioRxiv, p. 2023.12.20.572542. Available at: https://doi.org/10.1101/2023.12.20.572542.

      Nordquist, E.B., Jia, Z., Chen, J., 2024. “Small Molecule NS11021 Promotes BK Channel Activation by Increasing Inner Pore Hydration.” J. Chem. Inf. Model. 64, 7616–7625. https://doi.org/10.1021/acs.jcim.4c01012

      Redhardt, M., Raunser, S. and Raisch, T. (2024) “Cryo-EM structure of the Slo1 potassium channel with the auxiliary γ1 subunit suggests a mechanism for depolarization-independent activation,” FEBS Letters, 598(8), pp. 875–888. Available at: https://doi.org/10.1002/1873-3468.14863.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Sun, L. and Horrigan, F.T. (2022) “A gating lever and molecular logic gate that couple voltage and calcium sensor activation to opening in BK potassium channels,” Science Advances, 8(50), p. eabq5772. Available at: https://doi.org/10.1126/sciadv.abq5772.

      Tao, X. and MacKinnon, R. (2019) “Molecular structures of the human Slo1 K+ channel in complex with β4,” eLife 8, p. e51409. Available at: https://doi.org/10.7554/eLife.51409.

      Webb, T.I. et al. (2015) “Molecular mechanisms underlying the effect of the novel BK channel opener GoSlo: Involvement of the S4/S5 linker and the S6 segment,” Proceedings of the National Academy of Sciences, 112(7), pp. 2064–2069. Available at: https://doi.org/10.1073/pnas.1400555112.

      Yamanouchi, D. et al. (2023) “Dual allosteric modulation of voltage and calcium sensitivities of the Slo1-LRRC channel complex,” Molecular Cell, 83(24), pp. 4555-4569.e4. Available at: https://doi.org/10.1016/j.molcel.2023.11.005.

      Reviewer #3 (Public review):

      Summary:

      This research shows that α-mangostin, a proposed nutraceutical with cardiovascular protective properties, could act through the activation of large conductance potassium permeable channels (BK). The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of smooth muscle in vascular tissue, this activation leads to muscle relaxation, possibly explaining cardiovascular protective effects.

      Strengths:

      The authors present evidence based on several lines of experiments that α-mangostin is a potent activator of BK channels. The quality of the experiments and the analysis is high and represents an appropriate level of analysis. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardio-protective effects.

      We sincerely thank the reviewer for appraising the achievements of our study.

      Weaknesses:

      The identification of the binding site is not the strongest point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and show that point mutations reduce the magnitude of the negative voltage shift of activation produced by α-mangostin. However, these experiments do not demonstrate binding to these sites, and could be explained by allosteric effects on gating induced by the mutations themselves.

      We are aware that our functional data are not sufficient to clearly distinguish between effects due to affinity loss and effects due to allosteric mechanisms. Our attempts to generate complete dose–response curves for the mutants to determine accurate apparent IC<sub>50</sub> values were unfortunately limited by the solubility of the compound. Consequently, we have avoided making claims about affinity loss in the mutant analysis, and have instead only reported the reduction in potency, expressed as the shift in V<sub>½</sub>. To reduce confounding effects from the mutations themselves, we selected substitutions that preserved the most wildtype-like GV-relationships, based on the extensive mutagenesis work of Chen, Yan and Aldrich (2014). We address this matter also in our answer to Recommendation (6) below, and we have replaced the word “binding” in the title of the manuscript. Nevertheless, we consider the proposed binding region to be well supported by the THexA competition experiments in combination with molecular docking, even though the specific mechanistic contributions of individual residues cannot yet be resolved.

      Reviewer #3 (Recommendations for the authors):

      (1) Natural xanthones as α-Mangostin induce vasorelaxation via binding to key gating residues in the S6 domain of BK channels.

      (2) If α-Mangostin occupies a similar binding site to quaternary ammoniums, what is the explanation for not observing a reduction in the single-channel current (fast blocking effect)? The α-Mangostin site proposed here is in a region of the channel that should occlude ion permeation. The authors should discuss possible explanations for this apparently contradictory observation.

      As the reviewer states, we indeed have not observed a reduced single channel amplitude in any measurement. The THexA competition assay showed that α-Mangostin is present in the pore cavity and interferes with THexA access to its binding site. However, we do not think that their binding sites are similar, as QA ions bind directly below the filter entrance to block permeation, while our studies suggest that α-Mangostin binds in the upper portion of the cleft between S6 helices. In this position, it would clearly overlap with the QA binding site and hinder access, but not block permeation. We would therefore not expect to see an amplitude reduction by intermittent α-Mangostin block. Consistently, all binding poses in our dockings were close to the cavity wall, without interfering with the central ion conduction pathway. To better illustrate this, we have added updated intracellular views of the dockings in the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound state (which we have also now included as suggested by another reviewer) to the supplementary information (Fig. S4A).

      (3) In Figure 2D, it is difficult to appreciate the differences between the symbols representing the G-V relationships of BKα channels at different intracellular Ca concentrations, before and after activation with 10 μM α-Mangostin. A clearer distinction between the curves would help to interpret the data more easily.

      We thank the reviewer for the suggestion to improve figure accessibility. We have changed the line appearance for better discrimination of the overlying portions.

      (4) Both THexA and TPA block BK channels through voltage and state-dependent mechanisms. Therefore, their apparent affinity could change if α-Mangostin simply increases open probability or alters dwell times rather than physically blocking access to the binding site.

      The reviewer addresses valid limitations that can affect the meaningfulness of competition experiments under certain conditions. However, we think that this does not apply to our results:

      Previous studies have shown that the voltage dependence of quaternary ammonium blockers up to C<sub>10</sub> is rather weak in BK channels, and only a slight increase in block is present in the voltage range +30 mV to +100 mV (Li and Aldrich, 2004; Thompson and Begenisich, 2012). Hence, THexA voltage dependence has already reached a plateau in the competition assay (at +40 mV), and its voltage dependence would have little effect on our results.

      Controversy exists about the nature of the state dependence of different quaternary ammonium blockers, but TBA is often recognized as an open channel blocker of BK channels, which probably also applies to THexA (Wilkens and Aldrich, 2006; Tang, Zeng and Lingle, 2009; Thompson and Begenisich, 2012; Posson, McCoy and Nimigean, 2013). Assuming such an open-channel block, apparent IC<sub>50</sub> values would be inversely proportional to Po. The THexA IC<sub>50</sub> was about 80 nM in the basal state, when Po is very low (0.024 at +40 mV as derived from the GV-relationship); an increase of open dwell times, and thus of Po, in the presence of α-Mangostin to, e.g., 0.3 would therefore lead to a ≈10-fold decrease in apparent IC<sub>50</sub>. However, the apparent THexA IC<sub>50</sub> strongly increased rather than decreased (about 20-fold, to around 1.6 µM). This cannot arise from a change in Po and must reflect the altered access of THexA to its binding site caused by α-Mangostin. Assuming instead a pure closed channel block, where apparent IC<sub>50</sub> would correlate with the closed times, an increase of about 1.4-fold is expected. However, we recorded a much stronger 20-fold increase. Therefore, we are convinced that we have conclusively shown that α-Mangostin is present in the BK pore irrespective of the state dependence of THexA block.
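      The arithmetic behind this argument can be retraced in a few lines. The sketch below assumes the simplest state-dependent schemes (apparent IC<sub>50</sub> proportional to 1/Po for pure open-channel block and to 1/(1 − Po) for pure closed-channel block); the function names are hypothetical, and Po = 0.3 is the example value used above rather than a measured quantity.

```python
def fold_change_open_block(po_before, po_after):
    """IC50_after / IC50_before if the blocker binds only open channels
    (apparent IC50 proportional to 1 / Po)."""
    return po_before / po_after


def fold_change_closed_block(po_before, po_after):
    """IC50_after / IC50_before if the blocker binds only closed channels
    (apparent IC50 proportional to 1 / (1 - Po))."""
    return (1.0 - po_before) / (1.0 - po_after)


if __name__ == "__main__":
    po_basal = 0.024       # Po at +40 mV without activator (from the text)
    po_activated = 0.3     # example Po with activator (from the text)
    ic50_basal_nm = 80.0   # THexA IC50 without activator (from the text)
    ic50_act_nm = 1600.0   # THexA IC50 with activator (from the text)

    open_fc = fold_change_open_block(po_basal, po_activated)
    closed_fc = fold_change_closed_block(po_basal, po_activated)
    observed_fc = ic50_act_nm / ic50_basal_nm

    print(f"open-channel block predicts   x{open_fc:.2f} (a {1 / open_fc:.1f}-fold decrease)")
    print(f"closed-channel block predicts x{closed_fc:.2f}")
    print(f"observed                      x{observed_fc:.0f}")
```

      Under either limiting assumption, the predicted change stays far below the observed 20-fold increase, which is the basis for attributing the shift to hindered access rather than to the change in Po.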

      (5) The pH dependence of the V1/2 shift supports the idea that α-Mangostin becomes more negatively charged at higher pH (enhancing its effect.) However, although the data are consistent with this interpretation, additional controls such as using a non-ionizable analog or assessing solubility changes with pH would be needed to confirm that the shift is caused specifically by ionization of α-Mangostin and not by indirect pH effects on channel gating.

      We agree with the reviewer that the pH experiment by itself is not sufficient to clearly tie the existence of a charge to a possible activation mechanism. We still think that this is an interesting observation and should be made known, as we have investigated the mechanism of negatively charged activators in different K<sup>+</sup> channel families before (Schewe et al., 2019). Unfortunately, we do not have access to uncharged derivatives mimicking the 3D conformation. Of the commercially available substances, the bare xanthone backbone is completely insoluble in water. We have therefore tested the derivative 3-hydroxyxanthone as an example with a minimal number of hydroxyl substituents (Author response image 2, Author response table 2). The 3-hydroxyxanthone indeed shows reduced activation compared to α-Mangostin. The shift in V<sub>½</sub> induced by 10 µM 3-hydroxyxanthone was only 14.99 ± 5.67 mV (≈50 mV for α-Mangostin). This supports the idea that the presence of several (potentially) charged substituents is important for the activation mechanism. However, we have no knowledge about the efficacy of the compound or the local pK<sub>a</sub> of the different hydroxyl groups. As the reviewer stated, systematic chemical modifications would be necessary to elucidate the importance of the number and positions of the charged substituents, which is not within our capabilities.

      Author response image 2.

      Activation of BKα by 3-hydroxyxanthone. (A) GV-relationship before and after application of 10 µM 3-hydroxyxanthone. (B) V<sub>½</sub> before and after application of 10 µM 3-hydroxyxanthone compared to α-Mangostin and the resulting difference in V<sub>½</sub> (ΔV<sub>½</sub>). Measurements were conducted as described in the main manuscript with 100 nM free Ca<sub>i</sub><sup>2+</sup>.

      Author response table 2.

      Comparison of the V<sub>½</sub> ± SEM and ΔV<sub>½</sub> ± SEM before and after activation by 10 µM α-Mangostin or 10 µM 3-hydroxyxanthone in BKα channels. Unpaired t-test, two-tailed P values (α=0.05)

      (6) The reduced V1/2 shifts observed in the I308A, L312M, and A316P mutants may result from intrinsic gating alterations rather than a true loss of α-Mangostin binding. The GoSlo-SR-5-6 control is informative, but the persistence of activation in A316P does not fully resolve this. A more convincing test would be employing double or triple mutants.

      As stated above, we acknowledge that our functional data do not allow us to definitively separate effects arising from a true loss of binding affinity from those due to potential allosteric effects. We tried to minimize intrinsic gating alterations caused by the substitutions by not conducting a pure alanine or cysteine scanning mutagenesis. Instead, substitutions were chosen to be closest to the wildtype GV-relationship in Chen, Yan and Aldrich (2014) where possible. While L312M was virtually identical to the wildtype, A316P showed a change in slope in high Ca<sup>2+</sup> concentrations, which could indicate a changed voltage sensitivity. Additionally, A316P completely abolished α-mangostin activation. We therefore also used A316G to ensure that the channel is functional and retains voltage sensitivity, even if its V<sub>½</sub> was shifted more strongly. As we have conducted paired measurements and assessed the V<sub>½</sub> before and after activation, we are confident that we can attribute a reduced shift to the reduced action of α-mangostin.

      Following the reviewer’s suggestion, we have generated and measured the double mutants I308A/L312M, I308A/A316G, and L312M/A316G (the triple mutant I308A/L312M/A316G did not produce measurable currents). The mutants I308A/L312M and I308A/A316G showed a moderate energy-additive effect and reduced the shift in V<sub>½</sub> by a further ≈7 mV compared to the single mutation with the stronger shift. The combination L312M/A316G, however, did not further reduce the shift seen in the single mutations and did not even produce the shift induced by A316G alone.

      Author response image 3.

      Double mutants I308A/L312M, I308A/A316G and L312M/A316G compared to the single mutations in the main manuscript. The V<sub>½</sub> before and after activation with 10 µM α-Mangostin, the resulting shift in V<sub>½</sub>, and the GV-relationships are shown (n=6-7). Measurements were made as in Fig. 4.

      Author response table 3.

      Summary of the V<sub>½</sub> before and after Mangostin activation and the resulting shifts in V<sub>½</sub> for the double mutants compared to the single mutants shown in the main manuscript.

      Following a suggestion by another reviewer, we have generated AlphaFold3 (AF3) models for I308A, L312M and A316P and repeated the Mangostin docking. We learned that the mutations are all predicted to substantially impact the structure of the S6 helix, therefore altering the binding region, and that A316P especially impacted the nature of residue interactions. This could explain why the double mutants do not show a clear and consistent additive effect.

      Unfortunately, this outcome is not conclusive and the double mutants do not reveal further information compared to the single mutants. We have therefore decided not to include these measurements in the manuscript.

      As we do not know if our answers will be sent to all reviewers, we repeat the relevant part about the AF3 models here:

      (…) According to these predictive models,

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P probably has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift). (…)

      (7) The subtraction approach used to isolate BK currents (difference before and after α-Mangostin) assumes that the compound affects only BK channels. However, α-Mangostin could also modulate Cav currents directly, as reported for other polyphenolic compounds. No vehicle (DMSO) control is shown.

      We agree with the reviewer that α-Mangostin could also modulate Ca<sub>v</sub> currents; however, this would not interfere with the conclusions drawn from this nanodomain experiment. We intended to show the overall current modulation by α-Mangostin in the voltage range relevant for Ca<sub>v</sub>-BK coupling, as this would be the determinant for the membrane potential mediating the vasoactive effect. In native tissue, BK and Ca<sub>v</sub> channels (among others) would likewise contribute to the net membrane conductance, with BK channels being a major contributor when activated. In fact, a concomitant inhibition of Ca<sub>v</sub> channels could act synergistically in favor of vasodilation. This could therefore be a subject for the further investigation of potential α-Mangostin targets. However, the fact that iberiotoxin prevented relaxation in aortic preparations conclusively showed that BK channels are the major player in native tissue.

      We have reformulated some sentences to prevent misunderstandings that we refer to isolated BK currents instead of α-Mangostin activated currents.

      DMSO controls were conducted and did not impact BK or Ca<sub>v</sub>1.2 currents or the aortic tissue contraction. We have added representative measurements as Fig. S6 and stated the DMSO concentration in the Methods section (line 655).

      (8) Most kinetic fits were obtained at strong depolarizations (around +100 mV), which limits how well these results can be extrapolated to physiological voltages. Although the BK-Cav experiments show facilitation between -50 and +50 mV, providing plots for activation and deactivation in that range would strengthen the physiological relevance.

      We thank the reviewer for this valuable suggestion. We now additionally show that the impact of α-Mangostin on activation is high at lower depolarisation, indeed underlining its physiological relevance. To address the activation time course in a more physiological voltage range, we have used our measurements of BKα channels in 10 µM Ca<sub>i</sub><sup>2+</sup> (where the V<sub>½</sub> shift induced by α-Mangostin is equal to that in 100 nM Ca<sub>i</sub><sup>2+</sup>; Fig. 2D). The outward currents already present in the lower voltage range under these conditions allowed us to fit a monoexponential function to the traces of 0 mV to 100 mV prepulses. The τ of activation decreased from 29.6 ± 3.1 ms at 0 mV to 2.4 ± 2 ms at +100 mV. After α-Mangostin activation, the time course was accelerated, with a τ of activation ranging from 9.5 ± 4.7 ms at 0 mV to 2 ± 0.6 ms at +100 mV. This faster activation was particularly effective in the lower voltage range far from high Po, e.g., α-Mangostin caused a decrease of more than half of the τ of activation at +20 mV (from 12.2 ± 0.6 ms to 4.98 ± 1.6 ms).

      Our data consist of families of different prepulse voltages and a fixed repolarisation step (to -50 mV for 100 nM free Ca<sub>i</sub><sup>2+</sup>, and to -100 mV for 10 µM free Ca<sub>i</sub><sup>2+</sup>). Thus, we are not able to add plots for the voltage-dependence of deactivation in the same way as for activation. However, we can present the deactivation time constants of lower prepulse voltage steps that produce outward currents in symmetrical ion conditions with 10 µM free Ca<sub>i</sub><sup>2+</sup>. For -20 mV and +20 mV prepulse voltages, which better reflect physiological depolarisation, the deactivation time constant shows a 3- to 5-fold increase after α-Mangostin activation.

      We now show the plot for the voltage dependence of activation in Fig. S2A and a bar graph for activation/deactivation time constants at +20 mV as Fig. S2B; data are summarized in Table S5. We hope this adds to illustrating the effect of α-Mangostin under physiological conditions.
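      The voltage dependence of this acceleration can be read directly from the reported mean time constants. The following is a minimal sketch that uses only the mean τ values quoted above (SEM values are ignored, so it makes no statistical statement); the dictionary and function names are hypothetical.

```python
# Mean activation time constants in ms, taken from the text:
# voltage (mV) -> (tau before activator, tau after activator)
TAU_ACTIVATION_MS = {
    0: (29.6, 9.5),
    20: (12.2, 4.98),
    100: (2.4, 2.0),
}


def acceleration_factor(tau_before_ms, tau_after_ms):
    """How many times faster activation becomes (ratio of time constants)."""
    return tau_before_ms / tau_after_ms


def fractional_decrease(tau_before_ms, tau_after_ms):
    """Fraction by which the time constant shrinks (0.5 means halved)."""
    return 1.0 - tau_after_ms / tau_before_ms


if __name__ == "__main__":
    for voltage_mv, (before, after) in sorted(TAU_ACTIVATION_MS.items()):
        print(f"{voltage_mv:+4d} mV: {acceleration_factor(before, after):.2f}-fold faster, "
              f"tau reduced by {100 * fractional_decrease(before, after):.0f} %")
```

      With these means, the acceleration is about threefold at 0 mV and shrinks to roughly 1.2-fold at +100 mV, consistent with the statement that the effect is strongest in the lower voltage range.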

      (9) Minor: In several parts of the paper, induced shifts to negative voltages are referred to as "leftward shifts". It would be useful to be consistent and employ a more specific reference to negative or positive directions.

      We thank the reviewer for the careful reading and have harmonized the terminology.

      References

      Chen, X., Yan, J. and Aldrich, R.W. (2014) “BK channel opening involves side-chain reorientation of multiple deep-pore residues,” Proceedings of the National Academy of Sciences, 111(1), pp. E79–E88. Available at: https://doi.org/10.1073/pnas.1321697111.

      Li, W. and Aldrich, R.W. (2004) “Unique Inner Pore Properties of BK Channels Revealed by Quaternary Ammonium Block,” Journal of General Physiology, 124(1), pp. 43–57. Available at: https://doi.org/10.1085/jgp.200409067.

      Posson, D.J., McCoy, J.G. and Nimigean, C.M. (2013) “The voltage-dependent gate in MthK potassium channels is located at the selectivity filter,” Nature Structural & Molecular Biology, 20(2), pp. 159–166. Available at: https://doi.org/10.1038/nsmb.2473.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Tang, Q.-Y., Zeng, X.-H. and Lingle, C.J. (2009) “Closed-channel block of BK potassium channels by bbTBA requires partial activation,” The Journal of General Physiology, 134(5), pp. 409–436. Available at: https://doi.org/10.1085/jgp.200910251.

      Thompson, J. and Begenisich, T. (2012) “Selectivity filter gating in large-conductance Ca2+-activated K+ channels,” Journal of General Physiology, 139(3), pp. 235–244. Available at: https://doi.org/10.1085/jgp.201110748.

      Wilkens, C.M. and Aldrich, R.W. (2006) “State-independent block of BK channels by an intracellular quaternary ammonium,” The Journal of General Physiology, 128(3), pp. 347–364. Available at: https://doi.org/10.1085/jgp.200609579.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful reading of our manuscript and their thoughtful comments on it. We appreciate the overall positive opinion of our manuscript and the helpful comments and suggestions from the reviewers. Overall, the main points identified by the reviewers were 1) further broadening of the system to a range of inputs as well as of the construct types that can be generated with the system and 2) further consideration of any off-target joining or off-target effects on genes/proteins and of the limits to the expandability of the kit. To address these concerns, we have added new data in Figure 6 illustrating the generation of a new construct using PCR and dsDNA fragments, added new constructs for mpeg1.1 and for CRISPR gRNA expression, and revised the text to further address concerns and limitations of the toolkit. We thank the reviewers and editors for these suggestions and feel that they have substantially improved the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Furthermore, the modular design ensures expandability, enabling researchers to customize constructs for diverse experimental designs. The validation provided in the manuscript is solid, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      We appreciate the overall positive evaluation of our toolkit and the time and effort in evaluating it.

      Weaknesses:

      While the toolkit's technical capabilities are well-demonstrated, there are several areas where additional validation and examples could enhance its impact. One limitation is the lack of data showing whether the toolkit can be directly used for rapid cloning and testing of enhancers or promoters, particularly cloning them directly from PCR using PaqCI overhangs without needing an entry vector. Similarly, the feasibility of cloning genes directly from PCR products into the system is not demonstrated, which would significantly increase the utility for researchers working with genomic elements.

      This is an excellent point. Given the increased use of gene synthesis and dsDNA fragments, we thought it would be good to demonstrate incorporation of these as well. We have added a new figure, Figure 6, which demonstrates the generation of two new transgene constructs by direct cloning of three PCR products along with a synthetic dsDNA fragment into a Tol2-flanked backbone plasmid as an alternative, rapid approach to the generation of transgenes. The resulting plasmids, encoding the mpeg1.1 promoter, a separate p2a, and a tdTomato fluorescent protein along with either wildtype or dominant negative rac2, were properly assembled. In transient transgenic zebrafish injected with these constructs, dominant negative rac2 prevented macrophage recruitment to tail wounds, indicating that this approach worked for the generation of functional transgenes. These results are discussed in new text (lines 304-391) describing this experiment and the finding that both PCR products and synthesized dsDNA could be efficiently incorporated in constructs generated with our approach, as well as in the discussion (lines 494-499).

      The authors discuss potential applications such as using the toolkit for tissue-specific knockout applications by assembling CRISPR/Cas9 gRNA constructs. However, they do not demonstrate the cloning of short fragments, such as gRNA sequences downstream of a U6 promoter, which would be an important proof-of-concept to validate these applications. Furthermore, while the manuscript focuses on macrophage-specific promoters, the widely used mpeg1.1 promoter is not included or tested, which limits the toolkit's appeal for researchers studying macrophages and microglia.

      Yes, in the new figure described above, we have now shown that this method works with shorter PCR fragments such as the p2a fragment cloned within the tdTomato-p2a-rac2 constructs described above. This fragment is ~70 bp and while this is somewhat longer than a simple gRNA targeting sequence (though smaller than a complete sgRNA), we believe that this indicates that smaller size fragments can still be incorporated within these constructs. We also agree with the general idea of increasing functionality to incorporate CRISPR/Cas9 and now include a 3E encoding the zebrafish U6 promoter. As CRISPR expression constructs frequently incorporate complex construction, for instance, expression of tagged Cas9 along with the U6 driven gRNA as in Zhou et al., 2018 or along with rescue constructs as in Wang et al., 2021, we have given these constructs the non-standard 5’ end O3c, to enable multiplexing in these complex constructs.

      We agree that it is important to include mpeg1.1, given the broad use of this promoter within the field. We’ve now included a 5E mpeg1.1 construct within the toolkit.

      Another potential limitation is the handling of sequences containing PaqCI recognition sites. Although the authors discuss domestication to remove these sites, a demonstration of cloning strategies for such cases or alternative methods to address these challenges would provide practical guidance for users.

      Absolutely, we have now included a new figure (Supplementary Figure 6) that illustrates PCR and homology-based cloning as one easy approach to domestication. In addition, we have mentioned alternative approaches for domestication in the discussion (lines 439-444).
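      Before domestication, a user first needs to know whether an insert carries internal PaqCI sites at all. The following is a minimal sketch of such a check, assuming the PaqCI recognition sequence CACCTGC and scanning both strands; the function names and the example sequence are hypothetical and are not part of the toolkit.

```python
PAQCI_SITE = "CACCTGC"  # assumed PaqCI recognition sequence (top strand)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_paqci_sites(seq):
    """List (0-based start, strand) for every PaqCI site in `seq`.

    '+' means the recognition sequence reads 5'->3' on the given strand,
    '-' means it lies on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", PAQCI_SITE), ("-", reverse_complement(PAQCI_SITE))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits)


if __name__ == "__main__":
    example_insert = "AAACACCTGCTTTGCAGGTGAAA"  # made-up sequence with two sites
    for position, strand in find_paqci_sites(example_insert):
        print(f"PaqCI site at position {position} on the {strand} strand")
```

      Any hit inside a part (outside the flanking cloning sites) marks a position that would need to be removed, for example by a silent mutation introduced through the PCR and homology-based strategy described above.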

      Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach relies on only a short 4-base pair overhang sequence in their final construct, and the insertion construct and backbone vector can be assembled in a single-tube reaction using PaqCI and ligase. This approach can also be expanded by introducing new overhang sequences while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.

      Strengths:

      The generation of several lines of transgenic zebrafish for the immunologic study demonstrates the feasibility of the ImPaqT in vivo. The lineage tracing of macrophages by LPS injection shows this approach's functionality, validating its usage in vivo.

      We appreciate the positive sentiments for our toolkit and the effort put into reviewing our manuscript.

      Weaknesses:

      (1) There is no quantitative data analysis showing the percentage of off-target based on these 4bp overhang sequences.

      While we agree that this is an important variable for the method, we feel that previous studies that broadly tested off-target ligation for all potential 4 bp overhang sequences have already given an effective overview of the interactions between these overhangs (Potapov et al., 2018; Pryor et al., 2020). The results from these studies were incorporated into the NEB ligase fidelity viewer, which we used to predict overhangs with minimal off-target ligation with each other; the tool also reports the expected off-target ligation of individual 4 bp overhangs. In all cases, we selected overhangs with minimal off-target ligation, with each overhang showing 1% or less off-target ligation with any of the other overhangs chosen. We have added new text (lines 119-124) that further clarifies our selection of these ends.

      (2) There is no statement for the upper limitation of the expandability.

      Yes, we’ve been curious as well. Our cloning of 6 distinct fragments in Figure 5, and a new 5-fragment cloning added in revision (Figure 6), suggests that 5-6 fragments can be readily assembled. In the course of revisions we also attempted to generate a larger product of 11 fragments, which ultimately failed. It is unclear whether this failure was due to the constructs chosen, the potential size of the plasmid, or a failure of the technique/enzymes themselves. Published descriptions of PaqCI Golden Gate cloning approaches have found that PaqCI can assemble at least 32 fragments and can produce large sequences (e.g. Sikkema et al., 2023, where the ~40 kbp T7 genome is assembled from 12, 24 and 32 distinct fragments using a PaqCI Golden Gate reaction). We therefore suspect that our issues with the 11-fragment assembly are due to complications with the specific group of constructs that were combined; however, we have not been able to exhaustively test a range of constructs and assemblies of varying complexity. To recognize this, we have added text (lines 490-493) to the discussion stating that we have only combined 6 constructs, that we think this likely encompasses many of the applications needed for this system, and that expansion beyond this number may be possible.

      (3) There is no data about any potential side effect on their endogenous function of promoter/protein of interest with the ImPaqT method.

      Absolutely. We have added new text (lines 457-470) to our discussion describing the potential side effects on protein function. For instance, we note the need to be aware of whether the N- or C-termini of proteins can be modified, and the potential for affecting or creating ectopic transcription factor binding sites, as pitfalls to keep in mind.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The data presented in the manuscript is robust and well-supported. However, to fully demonstrate the broad applicability of the toolkit and strengthen its impact, a few additional experiments could be beneficial. Specific suggestions for these experiments and areas of improvement are outlined in the 'Weaknesses' section of the Public Review. Additionally, Figures 2-4 illustrate the same concept - cloning three fragments from entry vectors-which comes across as repetitive. Incorporating a more diverse range of use cases would better highlight the versatility of the toolkit.

      As we described in our replies to your public points above, we have now added a new Figure 6 and a new Supplementary Figure 6 addressing the cloning of PCR fragments and short fragments, as well as a mechanism of domestication. We have also included the mpeg1.1 promoter within the toolkit. In addition, your point on the repetition of the assay is fair; in our new Figure 6, we instead used wild-type and dominant-negative Rac2 expression and failure of macrophage recruitment to the tail wound.

      Reviewer #2 (Recommendations for the authors):

      Hurst et al. developed a new Tol2-based transgenesis system, ImPaqT. It is interesting and potentially efficient, but I have a few concerns:

      (1) The author claimed that the ImPaqT system is more efficient than other existing systems. The authors should provide such data to support their claim.

      We would not argue that the ImPaqT system is, strictly speaking, more efficient. Rather, the main point of the method is the combination of minimal added sequence, the ability to expand or contract the fragments used, and, in our new Figure 6, the ability to directly use PCR products and dsDNA fragments, while retaining the ability to combinatorially build constructs from a suite of existing sequences. We now explicitly state in the text (lines 534-537) that Golden Gate cloning isn’t more efficient than existing techniques, and that the particular strength of the method is its flexibility and minimal added sequence.

      (2) The ImPaqT is theoretically less prone to have off-target effects than existing systems, the authors should provide such data to validate their claim.

      Good point. We have now searched the zebrafish genome for PaqCI sites as well as for BsaI and BsmBI sites, the 6-base cutters most commonly used for Golden Gate cloning. We found that PaqCI cuts every ~17 kb in the zebrafish genome, while BsaI and BsmBI cut every ~9 kb and ~13 kb, respectively, further supporting that PaqCI sites are rarer in the genome and should generally require domestication less often. We have added new text describing this in lines 129-132.
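
      As a rough illustration of this kind of genome search, the sketch below counts recognition sites on both strands of a sequence and reports the mean spacing between them. It is not the authors' pipeline: the function names are hypothetical, counting both strands is an assumption, and a real analysis would stream the zebrafish reference FASTA chromosome by chromosome rather than use a toy string. The recognition sequences used are CACCTGC (PaqCI), GGTCTC (BsaI) and CGTCTC (BsmBI).

```python
# Hypothetical helper names; a minimal sketch, not the authors' analysis code.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_sites(genome: str, site: str) -> int:
    """Count occurrences of a recognition site on both strands.

    Assumption: a site on either strand counts once. Using a set means a
    palindromic site would not be double counted.
    """
    genome = genome.upper()
    motifs = {site.upper(), reverse_complement(site)}
    total = 0
    for motif in motifs:
        start = genome.find(motif)
        while start != -1:
            total += 1
            start = genome.find(motif, start + 1)  # allow overlapping hits
    return total


def mean_spacing(genome: str, site: str) -> float:
    """Average number of bases per recognition site (inf if no site found)."""
    n_sites = count_sites(genome, site)
    return len(genome) / n_sites if n_sites else float("inf")


# Recognition sequences of the Type IIS enzymes compared in the response.
SITES = {"PaqCI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

if __name__ == "__main__":
    toy_genome = "AAACACCTGCAAAGCAGGTGAAA"  # toy input, not real genome data
    for enzyme, site in SITES.items():
        print(enzyme, count_sites(toy_genome, site), mean_spacing(toy_genome, site))
```

      Dividing genome length by site count gives the "cuts every ~N kb" style of summary; a 7-base site is expected to occur less often than a 6-base site in a sequence of similar composition, which is consistent with the direction of the reported numbers.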

      (3) The authors should mention any potential side effects of this system on the endogenous function of the promoter/protein of interest, at least in their discussion part.

      Yes, this should absolutely be expanded. As we said in our reply to your public comments above, we have now added new text describing the potential pitfalls of this method with respect to promoter or gene expression.

      (4) The authors are suggested to provide a balanced discussion about the expandable usage of this system beyond the immune system.

      We agree; this is also a good point that we should have emphasized more. We’ve added new text (lines 537-541) recognizing that, in principle, many of the components we’ve derived should be useful in non-immune systems. We also recognize that adapting this to new tissues will require the development of new promoters within the Golden Gate system, which can be combined with these already developed tools.

      References

      Potapov, V., Ong, J.L., Kucera, R.B., Langhorst, B.W., Bilotti, K., Pryor, J.M., Cantor, E.J., Canton, B., Knight, T.F., Evans, T.C., Jr., et al. (2018). Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA Assembly. ACS Synth Biol 7, 2665-2674.

      Pryor, J.M., Potapov, V., Kucera, R.B., Bilotti, K., Cantor, E.J., and Lohman, G.J.S. (2020). Enabling one-pot Golden Gate assemblies of unprecedented complexity using data-optimized assembly design. PLoS One 15, e0238592.

      Sikkema, A.P., Tabatabaei, S.K., Lee, Y.J., Lund, S., and Lohman, G.J.S. (2023). High-Complexity One-Pot Golden Gate Assembly. Curr Protoc 3, e882.

      Wang, Y., Hsu, A.Y., Walton, E.M., Park, S.J., Syahirah, R., Wang, T., Zhou, W., Ding, C., Lemke, A.P., Zhang, G., et al. (2021). A robust and flexible CRISPR/Cas9-based system for neutrophil-specific gene inactivation in zebrafish. J Cell Sci 134.

      Zhou, W., Cao, L., Jeffries, J., Zhu, X., Staiger, C.J., and Deng, Q. (2018). Neutrophil-specific knockout demonstrates a role for mitochondria in regulating neutrophil motility in zebrafish. Dis Model Mech 11.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of many (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBS, then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract and Introduction to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”
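
      For readers less familiar with the term, sign epistasis between two mutations can be checked from the four regulation strengths of a double-mutant cycle. The sketch below uses the standard textbook definition (a mutation's effect changes sign depending on the background); it is not taken from the authors' Supplementary Methods 7.5, and the function name and return labels are hypothetical.

```python
# A minimal sketch of the textbook definition; names and labels are hypothetical.

def classify_sign_epistasis(f_ab: float, f_Ab: float, f_aB: float, f_AB: float) -> str:
    """Classify a pair of mutations (a->A, b->B) from four phenotype values.

    f_ab: background genotype, f_Ab / f_aB: single mutants, f_AB: double mutant.
    """
    # Effect of mutation A on each background of the second site.
    effect_A_on_b = f_Ab - f_ab
    effect_A_on_B = f_AB - f_aB
    # Effect of mutation B on each background of the first site.
    effect_B_on_a = f_aB - f_ab
    effect_B_on_A = f_AB - f_Ab

    # A strictly negative product means the effect changes sign.
    a_flips = effect_A_on_b * effect_A_on_B < 0
    b_flips = effect_B_on_a * effect_B_on_A < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    return "no sign epistasis"
```

      Applying such a test to every measured pair of mutations and taking the fraction labelled "sign" or "reciprocal sign" gives a summary statistic of the kind quoted above; noise thresholds, which a real analysis would likely need, are omitted here.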

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this raises new questions for future work—such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones, and how these differences shape evolutionary dynamics—which we now highlight in the Introduction and Discussion (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are at approximately 20bps twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
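
      The weighted average described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, each bin is represented by a single fluorescence value (for example its mean), and any per-bin normalization of read counts used in the actual pipeline is omitted.

```python
# Hypothetical function; a sketch of a read-count-weighted mean across sort bins.

def regulation_strength(read_counts, bin_fluorescence):
    """Continuous regulation-strength estimate for one TFBS variant.

    read_counts[i]      -- sequencing reads of this variant in sorting bin i
    bin_fluorescence[i] -- representative fluorescence of bin i (assumption)
    """
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("one fluorescence value is needed per bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("variant has no reads in any bin")
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads
```

      Because the weights are read fractions, the estimate can take any value between the lowest and highest bin fluorescence, which is why the resulting phenotype is continuous even though sorting uses a finite number of bins.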

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addresses the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
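
      A permutation null model of this kind can be sketched in a few lines. The code below is an illustration, not the authors' implementation: function names are hypothetical, genotypes are assumed to be equal-length strings connected by single-nucleotide changes, and genotypes without any measured neighbor are excluded from peak counting (an assumption).

```python
import random

# Hypothetical helpers; a sketch of peak counting and a topology-preserving shuffle.

def measured_neighbors(genotypes, alphabet="ACGT"):
    """Map each measured genotype to its measured single-mutant neighbors."""
    measured = set(genotypes)
    neighbors = {}
    for g in measured:
        candidates = (
            g[:i] + base + g[i + 1:]
            for i in range(len(g))
            for base in alphabet
            if base != g[i]
        )
        neighbors[g] = [c for c in candidates if c in measured]
    return neighbors


def count_peaks(strength, neighbors):
    """Count genotypes whose strength exceeds that of all measured neighbors."""
    return sum(
        bool(neighbors[g]) and all(strength[g] > strength[n] for n in neighbors[g])
        for g in strength
    )


def permuted_peak_counts(strength, neighbors, n_perm=1000, seed=0):
    """Peak counts after shuffling strengths over a fixed genotype network."""
    rng = random.Random(seed)
    genotypes = list(strength)
    values = [strength[g] for g in genotypes]
    counts = []
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign phenotypes; the network is unchanged
        counts.append(count_peaks(dict(zip(genotypes, values)), neighbors))
    return counts
```

      Comparing `count_peaks` on the empirical landscape with the distribution returned by `permuted_peak_counts` shows how close the data are to an uncorrelated landscape on the same sampled network.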

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R=-0.1, -0.1, 0.01 for the CRP, Fis, and IHF landscapes, Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R=-0.05, -0.04, 0.06 for the CRP, Fis, and IHF landscapes). These results indicate that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).
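
      The connectivity measure defined above is straightforward to compute. The sketch below assumes that each genotype string contains only the varied positions, so that every position contributes three possible single-mutant neighbors; function names are hypothetical, and a plain Pearson correlation stands in for whatever statistic the authors report as R.

```python
from math import sqrt

# Hypothetical helpers; a sketch of relative connectivity and its correlation
# with regulation strength.

def relative_connectivity(genotype, measured, alphabet="ACGT"):
    """Fraction of possible single-mutant neighbors present in `measured`."""
    possible = 0
    present = 0
    for i, current in enumerate(genotype):
        for base in alphabet:
            if base == current:
                continue
            possible += 1
            if genotype[:i] + base + genotype[i + 1:] in measured:
                present += 1
    return present / possible


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

      Computing `relative_connectivity` for every measured genotype and passing the results, together with the matching regulation strengths, to `pearson_r` gives a correlation of the kind reported above.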

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).
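
      One simple way to make neighbor comparisons uncertainty-aware is to treat a step as uphill only when the difference in regulation strength exceeds the combined replicate noise of the two genotypes. The sketch below illustrates that idea; it is an assumption about how such a comparison could be done, not the procedure in the authors' Supplementary Methods, and the function names and the threshold parameter `z` are hypothetical.

```python
from math import sqrt

# Hypothetical helpers; each genotype maps to (mean strength, replicate sd).

def reliably_higher(current, candidate, z=1.0):
    """True if `candidate` exceeds `current` by more than z combined sds."""
    (mean_cur, sd_cur), (mean_cand, sd_cand) = current, candidate
    return (mean_cand - mean_cur) > z * sqrt(sd_cur ** 2 + sd_cand ** 2)


def uphill_neighbors(genotype, landscape, neighbors, z=1.0):
    """Neighbors that an adaptive walk could move to under the noise threshold."""
    return [
        n for n in neighbors[genotype]
        if reliably_higher(landscape[genotype], landscape[n], z)
    ]


def is_noise_aware_peak(genotype, landscape, neighbors, z=1.0):
    """A genotype is a peak if no neighbor is reliably higher."""
    return not uphill_neighbors(genotype, landscape, neighbors, z)
```

      Setting `z=0` recovers the noise-free comparison, so adaptive walks can be run with and without the threshold and their genotype visitation frequencies compared.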

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper, it would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph to state explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the Abstract we now state:

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30).”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the Abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the Discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.
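
      The null model described in this response (permuting regulation-strength values across genotypes while keeping the sampled genotype network fixed) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the strict-inequality definition of a peak, the toy landscape, and the random uphill walk (used here in place of the fixation-probability-based walks of the manuscript) are all assumptions made for illustration only.

```python
import random

def hamming_neighbors(genotype, alphabet="ACGT"):
    """Yield every single-site mutant of a genotype string."""
    for i, base in enumerate(genotype):
        for alt in alphabet:
            if alt != base:
                yield genotype[:i] + alt + genotype[i + 1:]

def find_peaks(landscape):
    """Return genotypes whose value is strictly above all measured neighbors.

    Unmeasured neighbors are ignored, so a genotype with no measured
    neighbor counts as a peak under this (assumed) definition.
    """
    peaks = []
    for g, value in landscape.items():
        measured = [landscape[n] for n in hamming_neighbors(g) if n in landscape]
        if all(value > v for v in measured):
            peaks.append(g)
    return peaks

def shuffle_landscape(landscape, rng):
    """Permute values across genotypes; the set of genotypes is unchanged."""
    genotypes = list(landscape)
    values = list(landscape.values())
    rng.shuffle(values)
    return dict(zip(genotypes, values))

def adaptive_walk(landscape, start, rng):
    """Simplified walk: move to a random fitter measured neighbor until stuck."""
    current = start
    while True:
        fitter = [n for n in hamming_neighbors(current)
                  if n in landscape and landscape[n] > landscape[current]]
        if not fitter:
            return current
        current = rng.choice(fitter)

def fraction_reaching_top(landscape, n_walks, top_k, rng):
    """Fraction of walks from random starts that end on one of the top_k peaks."""
    ranked = sorted(find_peaks(landscape), key=landscape.get, reverse=True)
    top = set(ranked[:top_k])
    starts = list(landscape)
    hits = sum(adaptive_walk(landscape, rng.choice(starts), rng) in top
               for _ in range(n_walks))
    return hits / n_walks

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy landscape: additive scores over two positions.
    score = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
    toy = {a + b: score[a] + score[b] for a in "ACGT" for b in "ACGT"}
    print("empirical:", fraction_reaching_top(toy, 200, 1, rng))
    print("shuffled: ", fraction_reaching_top(shuffle_landscape(toy, rng), 200, 1, rng))
```

      Comparing `fraction_reaching_top` on a measured landscape with its distribution over many shuffled copies gives the kind of empirical-versus-null contrast that the response describes.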

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2<sup>8</sup>” to “4<sup>8</sup>” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.
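
      As a rough illustration of the weighted average mentioned in this response, the sketch below computes a read-weighted mean over fluorescence bins for a single TFBS variant. The function name, the representative value assigned to each bin, and the absence of any per-bin normalization (for example by sequencing depth or sorted cell fraction) are assumptions; the exact procedure is the one given in the manuscript's Methods, “Regulation strengths”.

```python
import math

def regulation_strength(read_counts, bin_values):
    """Read-weighted mean of per-bin fluorescence values for one variant.

    read_counts: sequencing reads of the variant in each sorting bin.
    bin_values:  one representative fluorescence value per bin (assumed).
    Returns NaN when the variant has no reads in any bin.
    """
    if len(read_counts) != len(bin_values):
        raise ValueError("read_counts and bin_values must have equal length")
    total = sum(read_counts)
    if total == 0:
        return math.nan
    weighted = sum(c * v for c, v in zip(read_counts, bin_values))
    return weighted / total

if __name__ == "__main__":
    # Hypothetical variant: reads spread over four bins with values 1 to 4.
    print(regulation_strength([10, 30, 50, 10], [1, 2, 3, 4]))
```

      Because the result depends on how reads are distributed across bins, it is a derived quantity and not a rescaled copy of the raw fluorescence histogram, which is the distinction the response draws between Figure 2a and Figures S4–S6.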

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We agree with the reviewer that a limitation of our study is its focus on cell-based assays rather than in vivo experiments. We did consider evaluating the effects of statins on B cell responses in vivo; however, this approach is complicated by findings that statins can influence antigen presentation by dendritic cells, thereby impacting antibody responses (Xia et al, 2018). We have revised the discussion section to acknowledge this point.

      The reviewer also noted that our study assessed the roles of HMGCR, SQLE, and prenylation in B cell activation using pharmacological inhibitors and genetic knockdown/out approaches. Loss-of-function techniques such as RNAi, siRNA, and CRISPR can be challenging to apply to primary B cells, but we are exploring their feasibility for future revisions. While we acknowledge the limitations of using pharmacological inhibitors, we have taken several steps to mitigate these, including targeting multiple steps in the cholesterol biosynthetic pathway using structurally distinct inhibitors and conducting rescue experiments by supplementing downstream metabolites. To strengthen the results on prenylation further, we have added data using two further distinct prenylation inhibitors (revised Figure 6). To further investigate potential off-target effects of statins, we performed proteomic analysis of B cells treated with and without fluvastatin. The data suggest that fluvastatin primarily affects cholesterol metabolism and does not cause widespread off-target effects (new Supplementary Figure 9).

      Reviewer #1 (Recommendations for the authors):

      What signalling mechanisms link LPS sensing to proteomic and metabolic changes? Do these changes depend on specific signalling modules downstream of TLR4 (e.g., MyD88, TRIF, NF-kappaB, MAPKs)? Other receptors found to produce similar effects (TLR7, TLR9, CD40) may share these modules. This information could strengthen the conclusion by showing the chain of molecular events through which immune stimuli reprogram B cell metabolism.

      Signalling through most TLRs, including TLR4, TLR7 and TLR9, requires the adaptor protein MyD88. To determine if MyD88 is required for LPS-induced signalling, we carried out immunoblotting to compare signalling in B cells between WT mice and MyD88-deficient mice. We found that phosphorylation of key downstream proteins, including p38 and ERK1/2 (MAPK signalling), Akt, p70S6K and S6 (mTOR signalling) was diminished in MyD88-deficient mice (Figure S11). These results have been added to the manuscript as Supplementary Figure 11.

      We assessed the requirement of these signalling pathways for LPS-induced proliferation by treating B cells with rapamycin to block mTORC1, PD184352 for MEK1/MEK2 (the upstream activators of ERK1/2), VX745 for p38 or a combination of PD184352 and VX745. These results have been added to the manuscript as the new Figure 9. Rapamycin demonstrated the strongest inhibitory effect on proliferation, and combinatorial blocking of MAPK signalling mildly reduced proliferation (Figure 9A-B). In terms of cholesterol metabolism, treatment with all of these inhibitors reduced cholesterol levels; however, treatment with PD184352 and VX745 reduced cholesterol to the same level as naïve B cells (Figure 9F).

      Other activating stimuli appear to have similar effects. We showed originally that TLR7 and TLR9 activation had a similar effect on proliferation and cholesterol to TLR4, as did activation of CD40 and the BCR (Figure 10). We have now expanded this and shown that these other receptors can also promote protein synthesis (new Supplementary Figure 4).

      There seem to be errors in the manuscript text.

      (1) Page 6, line 232: ssRNAseq?

      We thank the reviewer for spotting these issues. This has been amended to scRNAseq.

      (2) Page 13, line 490: SC7A5?

      This has been amended to SLC7A5.

      (3) The abbreviation CF (cholesterol-free?) is not defined when it first appears.

      This has been amended to cholesterol-free (CF) on page 9, line 411.

      Reviewer #2 (Public review):

      The reviewer suggested that the study would be strengthened by determining whether the observed changes are specific to LPS + IL-4 stimulation or represent a more general B cell response to mitogenic signals. We believe that these effects are not specific to LPS and also occur with other mitogenic stimuli. We have expanded on the data in the original draft showing that other TLR agonists as well as CD40 and BCR stimulation increase both B cell proliferation and cholesterol levels and also looked at the effects of these stimuli on protein synthesis.

      Reviewer #2 (Recommendations for the authors):

      (1) One of the most highly enriched processes is 'response to interferon alpha'. This stands out as most of the other processes identified involve more general cellular processes (i.e., cell proliferation, cell metabolism, etc...). Minimally, interferon alpha should be discussed. It would also be interesting to test whether type I interferons regulate any of the metabolic changes identified.

      Response to interferon alpha has the highest fold enrichment of 6.78. To look at this further, we compiled a list of proteins upregulated by IFN-α stimulation in murine B cells, derived from Mostafavi et al (2016), and compared these with our proteome. We found that most of the IFNα-regulated genes were not significantly upregulated following LPS + IL-4 stimulation compared to naïve B cells (Figure S3A). We also measured phosphorylation of the transcription factor STAT1, which is induced by IFNα and IFNβ signalling, and found that LPS stimulation did not induce p-STAT1 (Figure S3B-C). These results have been added to the manuscript as Supplementary Figure 3. Despite this, as discussed further in the manuscript, we cannot rule out a weak interferon response in the proteomics.

      (2) The proteome of BCR-stimulated B cells has been analyzed by mass spectrometry. This dataset should be compared with the LPS + IL-4 dataset of the current study. This may reveal whether these two stimulations have similar or different effects on B-cell function. In particular, it is interesting to know whether BCR stimulation induces SLC7A5 expression and whether proteins involved in cholesterol metabolism are altered by BCR stimulation.

      A similar study using anti-IgM and anti-CD40 to activate murine B cells has found an upregulation of amino acid transporters, including SLC7A5, in their proteomic data, suggesting that this is not a stimulus-specific effect. This has been added to the text subsection “Protein synthesis in LPS + IL-4 stimulated B cells is dependent on the uptake of amino acids.” In line with this we have also shown that stimulation of the BCR upregulates protein synthesis (new Supplementary Figure 4). We have added data on HMGCR, SQLE and LDLR from the BCR proteomics experiments to the new Supplementary Figure 13. As the BCR proteome published as a preprint (James et al 2024) is about to be resubmitted as a distinct paper that does not deal with cholesterol metabolism, we have not expanded on this dataset further.

      (3) A role for mTORC1 has been shown for proteome remodelling following BCR stimulation of naïve B cells, regulating the expression of amino acid transporters. Is mTORC1 involved in any of the changes detected following LPS + IL-4 stimulation? (i.e., cell proliferation, ribosome biogenesis, amino acid transport, cholesterol biogenesis).

      To determine the importance of mTORC1 for B cell function, we treated B cells with rapamycin. We found that rapamycin treatment slightly reduced protein synthesis (Figure S12A) and amino acid uptake (Figure S12B). These results have been added to the manuscript as Supplementary Figure 12. Rapamycin reduced cholesterol to almost the levels in naïve B cells (new Figure 9F) and had a significant inhibitory effect on proliferation (new Figure 9A-B).

      (4) Analysis of Slc7a5 knockout B cells showed that SLC7A5 is required for LPS-induced proliferation (Figure 4G). Is SLC7A5 required for B cell growth following LPS + IL-4 stimulation? Is SLC7A5 required for BCR-induced B cell proliferation/growth?

      There appears to be a misunderstanding, as Figure 4G compares proliferation between WT and SLC7A5 KO B cells following LPS + IL-4 stimulation and not LPS stimulation alone.

      Unfortunately, we no longer have access to Slc7a5fl/fl/Vav-iCre+/- mice and will not be able to measure CTV staining for proliferation following BCR stimulation. However, a similar study using anti-IgM and anti-CD40 to activate murine B cells found that B cells from Slc7a5fl/fl/Vav-iCre+/- mice were significantly smaller, had reduced expression of the chaperone protein CD98 and impaired expression of the transferrin receptor CD71, which is required for iron uptake, compared to WT B cells (James et al, 2024).

      (5) The expression of several key proteins (regulating proliferation/amino acid transport/cholesterol metabolism) is shown to be significantly upregulated by LPS + IL-4 stimulation of naïve B cells. It would be interesting to determine whether these increases result from induced transcription of the relevant genes. This could initially be assessed by qRT-PCR analysis of LPS + IL-4 stimulated primary B cells, or alternatively, mining of online RNAseq datasets.

      We mined RNA-Seq data from C57BL/6 mice (Tesi et al, 2019) which compared naïve B cells and B cells after 2, 4, or 8 hours of LPS stimulation. We found that the transcription of genes that coded for the amino acid transporter SLC7A5/SLC3A2 (Figure S6A-B) and key genes involved in cholesterol metabolism followed the same pattern of upregulation as our proteomic data (Figure S6C-F). These results have been added to the manuscript as a new Supplementary Figure 6.

      (6) Cholesterol levels are shown to be increased following resiquimod, CpG, anti-IgM, and CD40L stimulation (Figure 9). What effect do these agonists have on levels of HMGCR, SQLE, and LDLR in B cells? Is B-cell growth by these agonists impaired by Fluvastatin?

      We found that stimulation of murine B cells with either IL-4, anti-IgM or anti-CD40 could increase levels of HMGCR, SQLE and LDLR, with the largest increase seen with a combination of these stimuli (Figure S13A-D) (James et al, 2024). These results have been added to the manuscript as Supplementary Figure 13.

      Figures 10C-E show that B cell growth, survival and proliferation are impaired by Fluvastatin after Resiquimod, CpG, anti-IgM, and CD40L stimulation, although we do not have proteomic data from these stimuli to confirm the levels of HMGCR, SQLE and LDLR.

      We carried out proteomics after 24 hours of LPS + IL-4 stimulation in normal/CF media, with or without Fluvastatin. We found that Fluvastatin treatment in normal media increased the expression of HMGCR, SQLE and LDLR. Fluvastatin treatment in CF media had the highest increase in the expression of these key proteins (Figure S9G-J). These results have been added to the manuscript as Supplementary Figure 9.

      (7) Do Fluvastatin or FGTI-2734 affect early activation of signaling pathways by LPS + IL-4 stimulation of B cells? (eg. MAPKs, STATs, PI3K/AKT).

      This is an interesting question, which we will pursue in our future work.

      References:

      James O, Sinclair LV, Lefter N, Salerno F, Brenes A & Howden AJM (2024) A proteomic map of B cell activation and its shaping by mTORC1, MYC and iron. bioRxiv 2024.12.19.629506 doi:10.1101/2024.12.19.629506

      Xia Y, Xie Y, Yu Z, Xiao H, Jiang G, Zhou X, Yang Y, Li X, Zhao M, Li L, et al (2018) The Mevalonate Pathway Is a Druggable Target for Vaccine Adjuvant Discovery. Cell 175: 1059-1073.e21

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Adult laboratory mice produce ultrasonic vocalizations during free social interactions, as well as lower-frequency, voiced calls (squeaks) during aversive contexts. The question of whether mice possess a more complex repertoire of vocalizations has been of great interest to scientists studying rodent vocal behavior. In the current study, the authors analyze the rates and acoustic features of vocalizations produced by pairs of mice that are allowed to interact across a barrier, which prevents direct physical interaction. In this context, they find that same-sex (but not opposite-sex) pairs of mice produce vocalizations that are lower in frequency than the typical 70 kHz ultrasonic vocalizations produced during free interactions and that are also distinct from squeaks. These lower frequency vocalizations were observed in both male-male and female-female pairs, as well as in same-sex pairs from multiple mouse strains. The authors also report that call rates and acoustic features are not affected in male-male pairs that have been treated with the anxiolytic drug buspirone, suggesting that anxiety is not a major driver of vocalization in this behavioral context.

      Strengths:

      (1) The observation that same-sex pairs of mice produce lower frequency (<70 kHz) vocalizations in this behavioral context is novel.

      (2) The consideration of multiple types of pairs (female-female, male-male, and female-male), as well as the inclusion of multiple strains of mice and barriers with different hole diameters, are all strengths of the study.

      (3) The authors include detailed analyses of vocalization acoustic features, as well as detailed tracking of mouse positions relative to the barrier.

      Weaknesses:

      The categorization applied to vocalizations based on their mean frequencies is poorly supported and ignores the distinction in laryngeal production mechanism between voiced and ultrasonic vocalizations. Specifically, the authors are likely lumping together voiced and ultrasonic vocalizations into their "low frequency" (< 30 kHz) category, while they reserve the term "ultrasonic" exclusively for the subset of ultrasonic vocalizations with the highest mean frequencies (> 50 kHz). This categorization scheme also does not align well with past work on lower frequency rodent vocalizations, which complicates the comparison of the present findings to that past work.

      We thank the reviewer for their assessment. Firstly, we did not use mean frequencies, but peak frequencies of each single call.

      The distinction between ‘voiced’ and ‘whistled’ vocalizations based on their spectral-temporal features is hardly possible. While evidence in the form of audio recordings made from both deer mouse and grasshopper mouse in helium-enriched air suggests that vocalizations with lower fundamental frequency are ‘voiced’ (Pasch et al., 2017; Riede et al., 2022), a computational model considering the laryngeal anatomy of Mus musculus estimates fundamental frequencies of vocalizations at subglottal phonation threshold pressures usual for USVs to be in the range of 1 – 5 kHz and approaching 10 kHz for higher subglottal pressures usually found in the production of ‘voiced’ vocalizations (Pasch et al., 2017). Furthermore, a recent study in the singing mouse (Scotinomys teguina) found minimal fundamental frequencies of single song notes, produced by a whistle mechanism, to be about 4 kHz (Zheng et al., 2025). Thus, the presence of low fundamental (peak) frequencies in mouse vocalizations alone appears to be insufficient for deducing the production mechanism of these vocalizations.

      We did not observe differences in acoustic features clearly separating our ‘LFV’ calls into two groups suggestive of different production mechanisms. Thus, we cannot rule out that our ‘LFV’ class contains vocalizations produced by different mechanisms. However, we did not observe any squeaks in our experiments and can therefore rule out that this prominent type of ‘voiced’ call is lumped together with other calls in the ‘LFV’ calls.

      While the questions regarding the production mechanism, the neurocircuitry involved, and the context-dependent choice of which mechanism to use are intriguing, the distinction between ‘voiced’ and ‘whistled’ vocalizations lies beyond the scope of our manuscript. Instead, the neurocircuitry involved in mouse vocalization production, particularly of USVs and squeaks, has been revealed by other laboratories. Optogenetic activation of RAm Nts neurons elicited emission of both audible vocalizations (fundamental frequencies of 10 kHz and below) and USVs in awake mice in a stimulus-dependent manner (Veerakumar et al., 2023). Furthermore, optogenetic activation of RAm-vocalization neurons led to immediate measurable adduction of vocal folds and emission of canonical USVs (Park et al., 2024). While different populations of PAG neurons are responsible for the production of squeaks and USVs (Ziobro et al., 2024), the two input streams seem to converge on RAm vocalization neurons, as silencing the output of these neurons abolished both squeak and USV emission completely (Park et al., 2024). Thus, while near-complete closing of the vocal folds is necessary for the production of canonical USVs (Mahrt et al., 2016; Park et al., 2024), it is not clear which degree of vocal fold opening would result in which fundamental frequencies.

      We will add a paragraph on this issue to the discussion in the next version of the manuscript.

      In some analyses, the authors report that different groups of mice produce different relative proportions of vocalization types (as defined by mean frequency) but then compare acoustic features of vocalizations between groups after pooling all vocalizations together. The analyses of acoustic features conducted in this way may be confounded by the different proportions of vocalization types across groups.

      We displayed the relative distribution of the different call classes, demonstrating that 80% of the call repertoire during the separation consisted of noisy calls and ‘LFV’. Thus, the acoustic features averaged per individual (e.g., peak frequency) would be predominantly shaped by the features of these two call classes. However, we agree with the reviewer’s criticism and will provide a more detailed display and analysis of the acoustic features of each call class.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine vocal communication during same-sex dyadic interactions in mice, comparing periods of physical separation (with limited sensory access) to direct social contact. They report that separation dramatically alters the vocal repertoire, shifting it away from canonical ultrasonic vocalizations (USVs) toward low-frequency vocalizations (LFVs) and broadband "noisy" calls. While LFVs and noisy calls have been described previously, largely in aversive contexts, this study provides a detailed, systematic characterization of these vocalizations during social interactions, thereby extending prior work.

      The authors explore several experimental manipulations and analyses, including divider hole size, strain and sex differences, anxiolytic drug treatment, and correlations with spatial proximity, to infer potential functions of these call types. Although the dataset is rich, the results are largely descriptive, and many conclusions remain tentative. Several experimental variables are not fully controlled, and in some cases, the interpretation exceeds what the data can clearly support. Nonetheless, with improved experimental framing, additional analyses of existing data, and a clearer discussion of limitations, this work has the potential to make a valuable contribution by broadening the field's focus beyond USVs to understand a wider vocal repertoire relevant to social context.

      Strengths:

      Much work on mouse vocal communication focuses almost exclusively on USVs. This manuscript convincingly demonstrates that non-USV vocalizations (LFVs and noisy calls) are prominent and systematically modulated by social context, highlighting an underappreciated dimension of mouse communication. Furthermore, the authors employ several experimental manipulations, including sensory access, strain, sex, and pharmacological treatment, to assess changes in vocalization repertoire. This provides a valuable resource for the field and reveals robust context dependence of vocalization. The discussion is thoughtful and integrative, particularly in its consideration of potential communicative roles of LFVs and noisy calls and their relationship to sensory constraints and signal propagation, although these ideas will require further experimental validation.

      Weaknesses:

      There are several concerns regarding experimental design and data interpretation that could be addressed to strengthen the manuscript.

      (1) The terminology used for vocalization types is confusing and needs better clarification. The authors refer to Grimsley et al. (2016) multiple times, yet they use the same names for their vocalizations while applying different definitions. This makes it very difficult to compare the two papers. Since this study and Grimsley et al. use different mouse strains (FVB vs CBA), a direct comparison of absolute frequencies may also not be appropriate. Please explicitly clarify the definitions of the call types (e.g., frequency range, voiced vs. USV) and explain how they relate to those in the previous study earlier in the manuscript.

      The existence and use of various distinct classification systems for mouse vocalizations is well known and the need to agree on a common classification system is consensus in the field. Thus, it was not our intention to complicate mouse call classification even more.

      Grimsley et al. (2016) reserve the ‘low frequency’ band solely for squeaks (or “low frequency harmonics”). Hence, it appears straightforward to name mouse calls with “mean dominant frequencies” falling between squeaks and USVs “mid-frequency tonal vocalizations (MFVs)” (Grimsley et al., 2016). We did not observe the emission of squeaks in our experiments, but instead we observed tonal vocalizations in a peak frequency spectrum encompassing both squeaks and Grimsley and colleagues’ ‘MFVs’, representing the lowest peak frequencies we observed (< 32 kHz). Furthermore, we observed vocalizations in the range of 32 – 50 kHz (which were not low frequency components of canonical USVs) and of > 50 kHz (corresponding to canonical USVs). Leaning on the terminology of Grimsley and colleagues (2016), we thought it to be straightforward to name these call classes according to their location on the frequency spectrum: low frequency vocalizations (LFVs; up to 32 kHz), encompassing squeaks, but also Grimsley and colleagues’ MFVs, middle frequency vocalizations (MFVs; 32 – 50 kHz), and finally canonical USVs (> 50 kHz). Admittedly, choosing ‘MFVs’ for mouse calls with different acoustic features than those described by Grimsley and colleagues (2016) has caused unnecessary confusion. We therefore consider adapting our classification scheme for the next version of the manuscript.
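
      As a minimal sketch of the frequency-band scheme described above (illustrative only, not the analysis code used in the study), tonal calls could be assigned to a class from their peak frequency. The function name `classify_call` and the handling of calls falling exactly on 32 kHz or 50 kHz are assumptions; broadband noisy calls are not covered, since they are not defined by a peak-frequency band.

```python
def classify_call(peak_khz: float) -> str:
    """Assign a tonal call to a frequency band by its peak frequency in kHz.

    Bands follow the text: LFV up to 32 kHz, MFV 32-50 kHz, USV above 50 kHz.
    How exact boundary values are treated is an assumption of this sketch.
    """
    if peak_khz < 0:
        raise ValueError("peak frequency must be non-negative")
    if peak_khz <= 32.0:  # assumption: exactly 32 kHz counts as LFV
        return "LFV"
    if peak_khz <= 50.0:  # assumption: exactly 50 kHz counts as MFV
        return "MFV"
    return "USV"


if __name__ == "__main__":
    # Hypothetical peak frequencies, chosen only to exercise each band.
    for khz in (12.0, 40.0, 70.0):
        print(khz, classify_call(khz))
```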

      Regarding the comparison of call classes between different mouse strains, strain differences of spectral-temporal features of call classes have been described for canonical USVs (e.g. Scattoni et al., 2008). However, the acoustic features as well as call repertoire are still quite comparable. Furthermore, we have additionally tested both CBA/J and C57BL/6J mice in our study confirming the presence of both noisy calls, ‘LFVs’, ‘MFVs’, and ‘USVs’ in the vocal repertoire of these two strains.

      We will provide a more detailed display and analysis of the acoustic features of the call classes with the next version of the manuscript.

      (2) In the initial experiment, mice always experience separation first (15 minutes), followed by unification (5 minutes), using novel same-sex dyads. Multiple factors besides physical contact could influence vocalization across this sequence, including habituation to the arena, reduced anxiety over time, or increasing familiarity with the partner despite physical separation. It is unclear whether the authors have tested the reverse order (unification first, followed by separation). If not, this limitation should be explicitly acknowledged. In addition, examining whether vocalizations or behaviors change over the course of the 15-minute separation period, for example, by comparing early vs late phases, could help disentangle effects of habituation from those of physical separation per se.

      We had not tested mice in the reverse order, beginning with 5 minutes of unification followed by 15 minutes of separation. Therefore, we acknowledge this limitation of our study and will address it explicitly in the next version of our manuscript. We appreciate the reviewer’s note regarding the inclusion of vocalizations over time and aim to provide this analysis in the next version of the manuscript.

      (3) The conclusion that separation-induced LFVs are unlikely to be anxiety-driven may overinterpret the buspirone experiment (Figure 8). Vehicle injections themselves produced large changes in call rate and call-type distribution, raising concerns about stress or arousal induced by the injection procedure. Comparisons between buspirone-treated animals and untreated animals are therefore problematic, as these groups differ in their experimental histories, including the number of exposures. The manuscript would benefit from independent measures confirming the anxiolytic efficacy of buspirone compared to vehicle injection in this paradigm, such as behavioral readouts of anxiety. In addition, the experimental design requires a clearer description. It is not always clear whether the same dyads were tested twice, or how social familiarity, contextual familiarity, and habituation to injections were handled. Male data comparing first and second exposures should also be included as supplementary figures to allow direct comparison with the excluded female dataset.

      We agree with the reviewer’s point that the injection procedure itself appeared to have an impact on vocalization behavior. In fact, we had included the ‘untreated’ cohort in Fig. 8 despite their different experimental history to appreciate the potential impact of injection on vocal behavior.

      Furthermore, we appreciate the reviewer’s point of confirming the anxiolytic effect of buspirone treatment with further behavioral readouts and aim to provide such analysis in the next version of the manuscript.

      Regarding the reviewer’s query for clearer experimental design description, the same dyads were tested twice. All mice lived in groups in their home cage, however, they had not met the individual they would face during the experiment before the first experiment. We will improve the description of the experimental design addressing the reviewer’s points in the next version of the manuscript.

      (4) The idea that noisy calls function to attract conspecific attention is intriguing. However, in Figure 5, all call types, including LFVs and USVs, are most likely to occur when mice are already in close proximity during separation, which seems inconsistent with a long-distance signaling role. Analyses of the temporal relationship between vocalizations and behavior would strengthen this claim. For example, it would be informative to test whether bouts of noisy calls precede approach behavior or a reduction in inter-animal distance. Examining whether calls occur before, during, or after orientation toward the partner could further clarify whether these vocalizations actively modulate social behavior.

      We appreciate the reviewer’s remarks regarding the apparent inconsistencies between noisy calls as conspecific attraction calls and their occurrence in close mouse-to-mouse proximity. We must concede that the size of our testing arena limited the maximum distances mice could achieve. Thus, we aim to provide a more extensive analysis including approach behavior and changes of inter-animal distances for resubmission of the manuscript as suggested by the reviewer.

      (5) The effects of divider hole size on vocal repertoire are striking but difficult to interpret. Unexpectedly, small holes and no holes yield similar call distributions, whereas large holes produce a markedly different profile dominated by LFVs, which also differs from free interactions. If large holes allow greater tactile or close-range interaction, the reduction in USVs and MFV is counterintuitive. Incorporating behavioral metrics such as distance, orientation, or specific interaction types alongside call classification would greatly aid interpretation and help link vocal output to interaction quality rather than divider type alone.

      We agree with the reviewer that the results of the divider-hole-size experiment are difficult to interpret and, following the reviewer’s input, aim to provide additional behavioral analysis of the effect of divider hole size with the next version of the manuscript.

      (6) Throughout the study, vocalizations are pooled across both animals in the dyad. Because the arena is neutral rather than a home cage, either animal could be initiating vocalization. Assigning calls to individuals, where possible, using spatial or acoustic cues, would substantially strengthen functional interpretations. Even limited analyses, e.g., identifying which animal vocalizes first or whether calls precede approach by the partner, could provide important insight into the communicative role of different call types.

      We agree with the points raised by the reviewer regarding the importance of assigning recorded calls to the respective individual for deciphering the communicative role of different call types. Unfortunately, our system was equipped with only one condenser microphone; therefore, we are not able to assign calls to individual mice.

      Literature:

      Grimsley, J. M. S., Sheth, S., Vallabh, N., Grimsley, C. A., Bhattal, J., Latsko, M., Jasnow, A., & Wenstrup, J. J. (2016). Contextual Modulation of Vocal Behavior in Mouse: Newly Identified 12 kHz “Mid-Frequency” Vocalization Emitted during Restraint. Frontiers in Behavioral Neuroscience, 10, 38. https://doi.org/10.3389/fnbeh.2016.00038

      Mahrt, E., Agarwal, A., Perkel, D., Portfors, C., & Elemans, C. P. H. (2016). Mice produce ultrasonic vocalizations by intra-laryngeal planar impinging jets. Current Biology: CB, 26(19), R880–R881. https://doi.org/10.1016/j.cub.2016.08.032

      Park, J., Choi, S., Takatoh, J., Zhao, S., Harrahill, A., Han, B.-X., & Wang, F. (2024). Brainstem control of vocalization and its coordination with respiration. Science (New York, N.Y.), 383(6687), eadi8081. https://doi.org/10.1126/science.adi8081

      Pasch, B., Tokuda, I. T., & Riede, T. (2017). Grasshopper mice employ distinct vocal production mechanisms in different social contexts. Proceedings. Biological Sciences, 284(1859), 20171158. https://doi.org/10.1098/rspb.2017.1158

      Riede, T., Kobrina, A., Bone, L., Darwaiz, T., & Pasch, B. (2022). Mechanisms of sound production in deer mice (Peromyscus spp.). The Journal of Experimental Biology, 225(9), jeb243695. https://doi.org/10.1242/jeb.243695

      Scattoni, M. L., Gandhy, S. U., Ricceri, L., & Crawley, J. N. (2008). Unusual repertoire of vocalizations in the BTBR T+tf/J mouse model of autism. PloS One, 3(8), e3067. https://doi.org/10.1371/journal.pone.0003067

      Veerakumar, A., Head, J. P., & Krasnow, M. A. (2023). A brainstem circuit for phonation and volume control in mice. Nature Neuroscience, 26(12), 2122–2130. https://doi.org/10.1038/s41593-023-01478-2

      Zheng, X. M., Harpole, C. E., Davis, M. B., & Banerjee, A. (2025). Vocal repertoire expansion in singing mice by co-opting a conserved midbrain circuit node. Current Biology: CB, 35(23), 5762-5778.e6. https://doi.org/10.1016/j.cub.2025.10.036

      Ziobro, P., Woo, Y., He, Z., & Tschida, K. (2024). Midbrain neurons important for the production of mouse ultrasonic vocalizations are not required for distress calls. Current Biology: CB, 34(5), 1107-1113.e3. https://doi.org/10.1016/j.cub.2024.01.016

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we addressed Reviewer 3’s remaining concern on the potential confound between posterior probability and time in neuroimaging results. First, as suggested by the reviewer, we provided images of activations for the effect of Pt and delta Pt after controlling for intertemporal prior in GLM-2. Second, we compared the effect of Pt and delta Pt between GLM-1 (without intertemporal prior) and GLM-2 (with intertemporal prior) and showed the results in a new figure (Figure 4).

      Regarding the issue of reliance on explicitly instructed probabilities, we wish to point out that most of the concerns, such as response mode and regression to the mean, were addressed in the original behavioral paper by Massey and Wu (2005). Please see our detailed response to this point under Weakness (2) raised by Reviewer 3.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we focused on addressing your concern on the potential confound between posterior probability and time in neuroimaging results. First, we presented whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we compared the effect of probability estimates (Pt) on vmPFC and ventral striatum activity—which we found to correlate with Pt—with and without including intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4) in the revised manuscript.

      As suggested by the reviewer, we also added slice-by-slice images of the whole-brain results on Pt and delta Pt in the supplement in addition to the Tables of Activation so that the activated brain regions can be clearly seen through these images.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
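
      The dependence of the judgement on prior probability and evidence strength can be illustrated with a minimal Bayesian sketch. It is not the model fitted in the manuscript, and it rests on several assumptions: the block starts in the red regime, a shift to the blue regime can occur with a constant probability `q` before each draw and is irreversible, and the two urns are symmetric, with `p_major` the proportion of the majority colour. The function name and argument names are hypothetical.

```python
def regime_shift_posterior(signals, q, p_major):
    """Posterior probability of the blue regime after each draw.

    signals : sequence of 'r' (red ball) or 'b' (blue ball)
    q       : assumed constant per-draw probability of a red-to-blue shift
    p_major : assumed proportion of the majority colour in each urn
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if not 0.0 < p_major < 1.0:
        raise ValueError("p_major must lie strictly between 0 and 1")

    belief = 0.0  # assumption: the block starts in the red regime
    posteriors = []
    for s in signals:
        if s not in ("r", "b"):
            raise ValueError("signals must contain only 'r' or 'b'")
        # Transition step: a shift may have occurred just before this draw.
        prior = belief + (1.0 - belief) * q
        # Likelihood of the observed colour under each regime.
        like_blue = p_major if s == "b" else 1.0 - p_major
        like_red = p_major if s == "r" else 1.0 - p_major
        # Bayes' rule.
        numerator = prior * like_blue
        belief = numerator / (numerator + (1.0 - prior) * like_red)
        posteriors.append(belief)
    return posteriors


if __name__ == "__main__":
    # Hypothetical sequence: three red balls followed by three blue balls.
    print(regime_shift_posterior("rrrbbb", q=0.05, p_major=0.7))
```

      Under these assumptions, a blue ball raises the posterior more when `p_major` is high (strong evidence) and the posterior drifts upward faster when `q` is high, which is the sense in which an ideal observer should be sensitive to both system parameters.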

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks, respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (e.g., having to enter probability estimates versus making binary predictions).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).
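
      The 'attraction to the mean' account can be sketched as a linear shrinkage of each instructed value toward the experiment-wise mean. This is only an illustration of the alternative account, not a model fitted to the data: the function name `shrink_toward_mean` and the weight `w = 0.36` are hypothetical, the latter chosen by hand so that the output lands near the subjective values quoted above.

```python
def shrink_toward_mean(values, w):
    """Pull each value toward the mean of all values.

    w = 1 leaves the values unchanged; w = 0 collapses them onto the mean.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    mean = sum(values) / len(values)
    return [w * v + (1.0 - w) * mean for v in values]


if __name__ == "__main__":
    instructed = [0.01, 0.05, 0.10]  # ground-truth shift probabilities
    # w = 0.36 is a hypothetical weight, picked for illustration only.
    print(shrink_toward_mean(instructed, w=0.36))
```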

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      We thank the reviewer for pointing out that there are alternative models that can describe the over- and underreaction seen in the dataset. Massey and Wu (2005) dealt with this possibility in their original paper. Their concern was not so much about alternative ways of modeling their results as about alternative psychological processes. For example, asymmetric noise accounts have been posited in the judgment and decision-making literature as possible accounts of phenomena like over-confidence. They addressed what might be crudely called “regression/attraction to the mean” in two ways. First, they looked at median responses as well as mean responses (because medians are less affected by the regressive effect) and found the same patterns of over- and underreactions. Second, they also generated sequences that matched particular posterior probabilities (so that over- and underreaction cannot be explained by regression to the mean) and still found under- and overreactions.

      We also wish to point out that in the judgment and decision-making literature, starting from Edwards (1968), there is a long history of using the normative Bayesian model as the starting point and subsequently developing quasi-Bayesian models (like the system-neglect model) to describe systematic deviations from it.

      Finally, we want to clarify that our primary goal is not to engage in a model-fitting exercise that examines different possible models. To us, what is more important is that system neglect is a psychologically motivated hypothesis. It is built on the idea that the lack of sensitivity to the system parameters arises because people focus primarily on the signals and only secondarily on the system parameters that generate the signals. Massey and Wu (2005) dealt with a host of other potential explanations through experimental manipulations and data analysis. In this paper, we built on Massey and Wu to examine the neurocomputational basis that gives rise to over- and underreactions.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.
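
      The time dependence at issue can be made concrete with a small sketch. Assuming a constant per-sample shift probability `q` and an irreversible shift (both assumptions of this illustration, as is the function name), the probability that a shift has occurred by sample t, before any ball colours are taken into account, is 1 - (1 - q)^t. This quantity rises with sample number for every q > 0, which is why a time-dependent control regressor is needed.

```python
def intertemporal_prior(q, n_samples=10):
    """Probability that a shift has occurred by each sample, ignoring signals.

    Assumes a constant per-sample shift probability q and an absorbing shift.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return [1.0 - (1.0 - q) ** t for t in range(1, n_samples + 1)]


if __name__ == "__main__":
    # The three instructed shift probabilities mentioned in the review.
    for q in (0.01, 0.05, 0.10):
        print(q, [round(p, 3) for p in intertemporal_prior(q)])
```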

      Thank you. In response, we added a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2 where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also showed the results on intertemporal prior on vmPFC and ventral striatum from GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specifically about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> did not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
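      The coding scheme of the action-handedness regressor quoted above can be sketched in a few lines. This is an illustration only, not the authors' code: the key-to-hand mapping (digits 1 to 5 pressed with the left hand, digits 6 to 9 and 0 with the right hand) is an assumption chosen to be consistent with the three examples given in the quoted text, and the function name is hypothetical.

```python
# Assumed key-to-hand mapping (not stated in the source): digits 1-5 are
# pressed with the left hand, digits 6-9 and 0 with the right hand.
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(estimate: str) -> int:
    """Code a two-digit entry as -1 (both left), 0 (one each), or 1 (both right)."""
    if len(estimate) != 2 or not (estimate.isascii() and estimate.isdigit()):
        raise ValueError("expected a two-digit string such as '23'")
    # Count how many of the two key presses fall on the (assumed) right-hand keys.
    right_presses = sum(digit in RIGHT_HAND_DIGITS for digit in estimate)
    # 0 right-hand presses -> -1, 1 -> 0, 2 -> 1
    return right_presses - 1


# The three examples from the quoted passage.
for entry in ("23", "75", "90"):
    print(entry, action_handedness(entry))
```

      Under this assumed mapping the regressor takes the values described in the text for "23", "75", and "90"; a different keypad layout would only require changing the set of right-hand digits.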

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows the results of Pt and delta Pt from GLM-2. We also compared the effects of Pt and delta Pt between GLM-1 and GLM-2 and found that they did not differ. GLM-1 and GLM-2 differed in whether various task-related regressors contributing to Pt, including the intertemporal prior, were included in the model. In GLM-1, those task-related regressors were not included. In GLM-2, the task-related regressors were included in addition to Pt and delta Pt.

      The reason we kept the results from GLM-1 (Figure 3) was primarily that we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were, respectively, probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      Here is the section in the main text where we discussed the new Figure 4 on pages 19-22:

      We further examined the robustness of P<sub>t</sub> and ∆P<sub>t</sub> representations in vmPFC and ventral striatum in three follow-up analyses. In the first analysis, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub> and ∆P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors. Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, where q is the transition probability and t = 1, …, 10 is the period (Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. We found that the results of P<sub>t</sub> and ∆P<sub>t</sub> in the vmPFC and ventral striatum in GLM-2 were identical to those in GLM-1 (Fig. 4). Fig. 4A was meant to depict the results in slices identical to those shown in Fig. 3B for results based on GLM-1. For slice-by-slice results, see Fig. S7 in SI for results based on GLM-1 and Fig. S9 for GLM-2. For Tables of activations, see Tables S1-S3 in SI for GLM-1 and Tables S7-S9 for GLM-2. In a separate, independent region-of-interest (ROI) analysis on vmPFC and ventral striatum (Fig. 4BC; see Independent regions-of-interest (ROIs) analysis in Methods for details), we further compared the effect of both P<sub>t</sub> and ∆P<sub>t</sub> between GLM-1 and GLM-2. 
For P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.72, p = 0.47 in vmPFC; t(58) = −0.21, p = 0.83 in ventral striatum), while the effect of P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.82, p <.01 in vmPFC; t(29) = −3.06, p <.01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.69, p =.01 in vmPFC; t(29) = −2.50, p =.02 in ventral striatum). For ∆P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.07, p =0.94 in vmPFC; t(58) = −0.14, p =0.88 in ventral striatum), while the effect of ∆P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.12, p <.01 in vmPFC; t(29) = −4.14, p <.01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.92, p <.01 in vmPFC; t(29) = −3.59, p <.01 in ventral striatum). For the intertemporal prior, activity in both vmPFC and ventral striatum did not correlate significantly with the intertemporal prior (one-sample t-test, t(29) = −0.07, p =0.95 in vmPFC; t(29) = −0.53, p =0.60 in ventral striatum). All the t-tests described above were two-tailed. Taken together, these results suggest that vmPFC and ventral striatum represented P<sub>t</sub> and ∆P<sub>t</sub> regardless of whether the intertemporal prior and other task-related regressors contributing to P<sub>t</sub> were included in the GLM. We also did not find vmPFC and ventral striatum to represent the intertemporal prior. In the second analysis, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln(P<sub>t</sub>/(1 - P<sub>t</sub>)) (Fig. S10 in SI). In the third analysis, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S11 in SI). 
Each of these analyses showed significant correlation with P<sub>t</sub> in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.
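      To make the two log-odds quantities in this passage concrete, the sketch below computes an intertemporal prior and the log odds of a probability estimate. It rests on an assumption that is not confirmed by the text shown here: that a regime shift occurs with a constant probability q in each period and cannot be reversed, so that the probability of no shift by period t is (1 - q)^t. Eq. 1 in the authors' Methods remains the authoritative definition, and the value of q used below is hypothetical.

```python
import math


def intertemporal_prior(q: float, t: int) -> float:
    """Log odds that a regime shift has occurred by period t.

    Assumes (hypothetically) a constant per-period shift probability q and an
    irreversible shift, so P(no shift by period t) = (1 - q) ** t.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if t < 1:
        raise ValueError("t must be a positive period index")
    p_no_shift = (1.0 - q) ** t
    return math.log((1.0 - p_no_shift) / p_no_shift)


def log_odds(p: float) -> float:
    """Log odds ln(p / (1 - p)) of a probability estimate p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


q = 0.05  # hypothetical transition probability, for illustration only
for t in range(1, 11):
    print(t, round(intertemporal_prior(q, t), 3))
```

      Under this assumption the prior rises monotonically with t regardless of the signals shown, which is the temporal trend that including the regressor in GLM-2 is meant to control for.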

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      As suggested by the reviewer, we added slice-by-slice images showing the effect of Pt and delta Pt (Figure S9 in SI for GLM-2 and Figure S7 for GLM-1). The clusters in blue represent the Pt effect, and the clusters in orange represent the delta Pt effect. As can be seen, both Pt and delta Pt are represented in the vmPFC and ventral striatum.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors were seeking to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, the authors sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). The authors used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. They observed that while TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple non-polar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. They confirmed this mechanism using an additional set of simulations and used it to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The authors develop their own forcefield parameters for the RY785 molecule based on extensive QM based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single channel conductance. The authors have performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The authors conclude that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits K+ current. This conclusion is plausible given that RY785 makes stable contacts with multiple hydrophobic residues in the S6 helix, which they can also validate using a recently published closed-state Kv2.1 channel cryo-EM structure. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.
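      The comparison between a simulated permeation rate and a single-channel conductance mentioned above amounts to a unit conversion, sketched below. This is a generic illustration and not the authors' analysis: it assumes an ohmic (linear) current-voltage relation, counts net K+ crossings in one direction, and uses made-up numbers; the function name is hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one K+ ion, in coulombs


def conductance_pS(n_ions: int, sim_time_us: float, voltage_mV: float) -> float:
    """Convert net ion crossings in a simulation into a conductance in pS.

    Assumes an ohmic channel, so conductance = current / voltage.
    """
    if sim_time_us <= 0 or voltage_mV == 0:
        raise ValueError("simulation time must be positive and voltage non-zero")
    current_A = n_ions * ELEMENTARY_CHARGE_C / (sim_time_us * 1e-6)
    return current_A / (voltage_mV * 1e-3) * 1e12


# Hypothetical example: 10 net permeation events in 1 microsecond at 100 mV.
print(round(conductance_pS(10, 1.0, 100.0), 2))
```

      Because simulations are often run at voltages above the physiological range, the linearity assumption is the main caveat when setting such an estimate against measured conductances.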

      Weaknesses:

      The authors, however, did not directly observe this semi-closed channel conformation and in fact acknowledge that more direct simulation evidence would require extensive enhanced-sampling simulations beyond the scope of this study. They have not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the authors quantified K+ permeation, they have not made any estimates of the ligand binding affinities or rates, which could have been potentially compared to experiment and used to validate their models.

      However, despite those relatively minor weaknesses, the conclusions of the study are convincing, and overall this is a solid study helping us to understand two distinct molecular mechanisms of the voltage-gated potassium channel Kv2.1 inhibition by TEA and RY785, respectively.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, Zhang et al. investigate the conduction and inhibition mechanisms of the Kv2.1 channel, with a particular focus on the distinct effects of TEA and RY785 on Kv2 potassium channels. Using microsecond-scale molecular dynamics simulations, the authors characterize K⁺ ion permeation and RY785-mediated inhibition within the central pore. Their results reveal an inhibition mechanism that differs from those described for other Kv channel inhibitors.

      Strengths

      The study identifies a distinctive inhibitory mode for RY785, which binds along the channel walls in the open-state structure while still permitting a reduced level of K⁺ conduction. In addition, the authors propose a long-range allosteric coupling between RY785 binding in the central pore and changes in the structural dynamics of Kv2.1. Overall, this is a well-organized and carefully executed study, employing robust simulation and analysis methodologies. The work provides novel mechanistic insights into voltage-gated potassium channel inhibition and may offer useful guidance for future structure-based drug design efforts.

      Weaknesses:

      The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSDs in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously.

      Reviewing Editor: 

      The comments of the reviewers seem thoughtful and constructive. The weaknesses noted in reviews mainly concern mismatch between expectations, created by reading the Abstract, and data in the manuscript. The mismatch could be reconciled by either new simulations examining a semi-open state of the gate and additional RY785 binding sites, or by adjusting wording of the Abstract and Discussion to make it more clear that such simulations were not done. 

      The Abstract and Discussion have been revised to make clear the computer-simulations presented in our study were designed to specifically validate or refute the hypothesis that RY785 is recognized by the pore domain, not the voltage sensors. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      The authors addressed all the major issues in the original submission identified by the reviewers. I noticed a few minor issues, listed below, which can potentially fix small errors and further improve the readability of the manuscript. 

      p.3 tetramethyl-ammonium -> tetraethylammonium 

      p.7 "Snapshot of the final snapshot" -> "Snapshot of the final simulation coordinates" 

      p. 8 "sigma value" - please spell out what it is. 

      p. 9 "one or other subunit of the tetramer" -> "one or another subunit of the tetramer" or "one or more subunits of the tetramer" 

      p. 15 "(the net charge of these constructs is thus zero)." -> "(the net charge of these constructs is zero for these systems)." Please note that using ionizable amino acid residues in their default protonation state does not guarantee net zero charge of the system since the number of cationic and anionic residues is generally not the same. 

      p. 15 "Two K+ ions were initially positioned in the selectivity filter, one coordinated by residues 373..." Please indicate at which ion binding sites (e.g., S_1, S_2) the K+ ions were located and what the residue names are. 

      SI Figs. S3-S20. Please indicate in the figure captions that all those data are for RY785 

      SI Fig. S22 and SI Table S1 captions "shown in Fig. S20" -> "shown in Fig. S21" 

      We thank the Reviewer for this thorough proofreading. We have made the necessary corrections. 

      Reviewer #2 (Recommendations for the authors): 

      The authors have addressed most of my comments satisfactorily, with the exception of the first point. Below, I provide further clarification regarding my concern. 

      First, it appears that the authors may have misunderstood what is meant by the possibility of multiple binding sites for RY785. This does not imply that the central pore is excluded as a binding site. Rather, it refers to the possibility that, in addition to a pore-domain site, the ligand may interact with additional binding sites, either simultaneously or in a state-dependent manner. Increasing structural and functional evidence supports the existence of multiple ligand-binding modes in voltage-gated ion channels. For example, polyunsaturated fatty acids have been shown to bind to KCNQ1 at both the voltage sensor domain and the pore domain (https://doi.org/10.1085/jgp.202012850). Similarly, cannabidiol has been structurally resolved in Nav1.7 at two distinct sites, one in a fenestration and another near the IFM-binding pocket (https://doi.org/10.1038/s41467-023-39307-6). These advances illustrate that ligand effects cannot always be interpreted based solely on a single binding site identified previously. Therefore, even if one assumes that there is no precedent for a small-molecule inhibitor that simultaneously acts on both the voltage sensor and pore domain, this does not exclude the possibility that a ligand may bind to both regions in different functional states.

      The Reviewer’s opinion came across clearly in the previous version. We disagree, however, that a computational investigation of the possibility that RY785 binds to the voltage sensors is well-advised at this point, given that the model we propose seemingly offers a rationale for the inhibitory effects observed experimentally. Our opinion is also that there is no compelling precedent for the mechanism of inhibition envisaged by the Reviewer, and we would argue that neither of the two studies referenced above is a compelling example. As we stated in our previous response to the Reviewer, we believe that the logical next step in this research will be to validate or refute, experimentally, the computational prediction we have put forward.

      In addition, the present computational study does not provide direct mechanistic evidence to explain the statement that RY785 accelerates voltage-sensor deactivation. Specifically, no simulations were performed to model pore-domain closure or voltage-sensor motion upon RY785 binding. Moreover, alternative binding sites were neither explored nor explicitly excluded, as the simulations only involved placing a single molecule of TEA or RY785 approximately 10 Å below the cytoplasmic gate. Under these conditions, conclusions regarding effects on voltage-sensor dynamics remain speculative.

      That is a fair characterization. 

      These concerns do not detract from the overall quality of this otherwise strong computational study. There are several straightforward ways to address this issue. For example: 

      (1) Perform molecular docking or related screening approaches to evaluate potential ligand-binding sites beyond the central pore, particularly in regions proximal to the voltage sensor. This should not impose a substantial additional computational burden for a computational chemistry group. 

      (2) Revise the abstract and discussion to clarify that the current work focuses exclusively on pore-domain binding and does not explore possible additional binding sites near the voltage sensor. Explicitly stating this limitation would help prevent potential overinterpretation by readers.

      We have opted for (2), as noted above.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using electron microscopy, the authors report discontinuities in the plasma membrane of C. elegans embryos. They associate these discontinuities with cell division and speculate that membrane rupture and subsequent resealing contribute to cytokinesis. They further discuss the proximity of these sites to vesicles and propose a role for vesicle-mediated membrane extension. 

      Weaknesses:

      (1) The possibility that the membrane discontinuity is an artifact

      Although the authors focus on discontinuities in the plasma membrane, similar discontinuities are also observed in mitochondria, the nuclear envelope, and yolk granules. This raises concerns about whether the electron micrographs presented are suitable for assessing membrane continuity.

      Electron micrographs result from a lengthy sample preparation process, including high-pressure freezing, freeze substitution in acetone containing OsO4, gradual warming, uranyl acetate staining, resin embedding, and ultrathin sectioning. In general, lipids are soluble in acetone at temperatures above −30 °C, and preservation of membrane structures relies heavily on efficient OsO4 fixation.

      Insufficient OsO4 treatment would be expected to reduce membrane contrast.

      C. elegans embryos are encapsulated by an eggshell that forms at fertilization and gradually develops during the first few cell divisions. It is unclear how efficiently OsO4 in acetone penetrates the eggshell during freeze substitution, raising further concern about plasma membrane preservation under the conditions used.

      We thank the reviewer for raising this important technical concern. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts. Below, we present multiple lines of evidence—ranging from technical reproducibility to orthogonal imaging approaches—that collectively demonstrate the biological reality of these structures.

      (1) Technical expertise and standard protocols

      Our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well-established. Thus, the observed membrane discontinuities are unlikely to stem from technical inexperience or idiosyncratic methods.

      In addition to membrane discontinuities, we would like to emphasize that a large number of single plasma membranes separating adjacent cytoplasmic domains were also detected under EM (Figures 1, 3, and 4, for instance). This observation is particularly significant because the invagination model cannot generate single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes could explain the formation of cytoplasm-enclosed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, which indicates successful EM processing and argues against inefficient fixation or other technical issues.

      (2) Reproducibility across independent preparations and techniques

      To test whether the discontinuities were preparation-specific, we examined four independent sample batches collected in the lab over the years. Membrane discontinuities, as well as cytoplasm-immersed membranes, in embryonic cells were consistently observed across all batches, indicating that the phenomenon is not dependent on a single preparation. Furthermore, we validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM). Membrane discontinuities were clearly identifiable with both techniques, further supporting their robustness.

      (3) Validation using an independent public dataset

      We examined the publicly available C. elegans embryo EM collection (WormAtlas). In several instances, particularly at the embryonic periphery where plasma membrane discontinuities are more readily visualized (https://www.wormimage.org/image.php?id=140265&page=1), we identified similar structures. The presence of these features in an independent dataset generated by different researchers confirms that they are not artifacts unique to our sample preparation.

      (4) Developmental regulation of membrane discontinuities

      We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage, a period marked by the onset of elongation and reduced cell proliferation, the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, the eggshell is present throughout the embryonic stage of C. elegans; therefore, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (5) Rigorous criteria for identifying membrane discontinuities

      To ensure unbiased analysis, we systematically collected images from early embryonic cells using the following criteria:

      (1) Random section selection: For each sample, we randomly selected one section containing the largest number of embryos or cells (Sup Figure 2) for initial analysis. We found membrane discontinuities in 159 cells distributed across 57 embryos, representing 95% of the total sampled embryos. This portion of the data is summarized in Figure 1.

      (2) Whole-membrane examination: Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation.

      (3) Neighboring section verification: Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, the same membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods, and additional images of consecutive sections will be provided if needed.

      (4) Serial section reconstruction: To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments.

      First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (6) Orthogonal validation by live imaging: In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduces spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, with single-plane live imaging, both membrane ruptures and free-ended sister membrane structures could be detected (Figure 6), providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase or metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      (2) Lack of evidence linking membrane discontinuity to cell division 

      The reported plasma membrane discontinuities are not specific to mitotic cells. If this were a physiological process playing an important role in cytokinesis, it should occur in a temporally and spatially coordinated manner with nuclear division. However, it remains unclear at what stage of the cell cycle the membrane rupture occurs and where it is located relative to chromosomes and the mitotic spindle.

      Thank you for this insightful comment. We agree that establishing a direct link between plasma membrane discontinuities and mitotic progression is critical, and we appreciate the opportunity to clarify this point.

      In C. elegans embryos, the early stages of development are characterized by rapid and extensive cell division. Within approximately 100 minutes, a two-cell embryo develops into an embryo containing nearly 30 cells. The majority of the electron microscopy analyses in our study were performed on embryos at stages with fewer than 30 cells, where most cells are actively dividing. Thus, it is reasonable to infer that the cells exhibiting membrane discontinuities are predominantly mitotic cells.

      Supporting this notion, as embryos reached the comma stage—a period marked by the onset of elongation and reduced cell proliferation—the incidence of membrane discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly suggests that membrane discontinuities are tightly linked to cell division.

      Importantly, mitotic features such as metaphase chromosomes aligned at the equatorial plane or two (or more) nuclei sharing common cytoplasm can be identified in EM images. In our single random EM section analysis, we captured membrane discontinuities in cells at metaphase, anaphase (characterized by fewer than 10 chromosomal clumps), and telophase (defined by two nuclei sharing cytoplasm). Hence, membrane discontinuities are indeed present in mitotic cells. In addition, a published study by Fu et al. (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) using light-sheet microscopy captured similar membrane discontinuities in cells displaying classical mitotic features, including anaphase or telophase.

      To further investigate the spatial relationship between membrane ruptures and chromosome organization, we performed three-dimensional reconstructions on a metaphase cell. As shown in Figure 2 and Video S1, the membrane discontinuities largely encircled the condensed chromosome disc and were spatially aligned with the future segregation zone, further revealing the relative location of membrane discontinuities to chromosomes, at least at metaphase.

      We further collected 3D information for a telophase cell containing three nuclei. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site that merged to form a single rupture. The observation that membrane ruptures are present in a tri-nucleated cell is particularly informative. The tri-nucleated feature indicates that this cell underwent two rounds of cell division and that both divisions were at telophase. The presence of a single membrane rupture suggests that membrane discontinuities may persist throughout the cell cycle, as the second cell cycle began from a mother cell that still shared cytoplasm with its sister cell and already had one membrane rupture. Therefore, in addition to the mitotic phase, membrane discontinuities—at least in this context—also exist during the DNA synthesis stage.

      (3) Lack of evidence for extension of the separated membrane 

      Although the authors speculate that resealing of the ruptured membrane occurs via extension of the separated membrane, no direct evidence supporting this mechanism is presented. Proximity to vesicles alone does not demonstrate that membrane extension occurs through vesicle fusion. More direct evidence is required to support this claim.

      Thank you for raising this important point. We appreciate the opportunity to clarify our conclusion.

      In our study, EM analysis revealed the presence of cellular vesicles in close proximity to both free membrane edges and the already separated sister plasma membranes (Figure 4). However, we acknowledge that without advanced live-cell imaging, it is not possible to conclusively determine whether the extension of these separated sister membranes occurs through vesicle fusion.

      We realize that a statement in the Discussion section, “The expansion of the plasma membrane is generally driven by vesicle fusion” (page 16), may have inadvertently led the reviewer to interpret this as our own conclusion regarding the mechanism of membrane extension in this context. In fact, that statement was intended to reflect the current general understanding of membrane expansion, not to imply that we had demonstrated such a mechanism for the free-ended sister membranes. As we subsequently noted, “However, this remains speculative and requires further experimental validation.”

      To avoid any misunderstanding, we will revise this section to clearly state that the mechanism by which the separated sister membranes extend remains unknown and that further investigation is needed to determine how existing models of membrane expansion may apply to or be adapted for this novel context.

      We thank the reviewer again for their thoughtful comment, which has helped us improve the clarity of our manuscript.

      (4) Inconsistency with published work

      Numerous studies have examined cell division in developing C. elegans embryos using the GFP::PH(PLC1δ1) marker expressed from the ltIs38 transgene [pAA1; pie-1::GFP::PH(PLC1δ1) + unc-119(+)], generated by the Oegema lab (https://wormbase.org/species/c_elegans/transgene/WBTransgene00000911#01--10). To date, no study has reported membrane ruptures of the magnitude described here. The complexity of cell surface morphology from the 8- to 12-cell stages onward has been well documented, for example, by Fu et al. (2016) using light-sheet microscopy and 3D reconstruction (doi:10.1038/ncomms11088).

      Supplementary Movies 5, 6, and 10 of this paper illustrate how single-plane images can easily produce apparent membrane discontinuities, for example, due to membrane orientations nearly parallel to the imaging plane.

      The three single-plane images from only three embryos presented in Figure 6 are insufficient to support the authors' strong conclusions. Raw 3D data should be provided.

      Thank you for this important comment. We fully agree that the GFP::PH(PLC1δ1) marker, generated by the Oegema lab, has been widely and effectively used to study various aspects of C. elegans embryonic development. In fact, we also employed this same marker in our study to assess membrane integrity.

      However, while live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using this same GFP marker have in fact revealed membrane discontinuities that went unnoticed. For example, Fu et al. (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088), using light-sheet microscopy and 3D reconstruction, captured membrane discontinuities in cells undergoing mitotic phases such as anaphase or telophase. Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Nevertheless, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively identifying such membrane discontinuities.

      We acknowledge that our findings are surprising. We did not set out to challenge the long-held view of membrane integrity during cell division. In fact, this study began when our dedicated EM technician, Jingjing Liang, first observed membrane discontinuity phenomena in control samples—wild-type embryos. Had she not come across this observation, we likely would never have pursued this line of inquiry.

      We appreciate the opportunity to clarify these points and thank the reviewer for their thoughtful engagement with our work.

      Reviewer #2 (Public review):

      Summary:

      Liang et al. explore an unusual observation of membrane discontinuities in dividing C. elegans embryonic cells. This report is the first to demonstrate that, instead of the classical invagination of membranes during cytokinesis, cells in the early embryos of C. elegans exhibit separation of sister membranes that extend independently. TEM images of high-pressure-frozen samples provide strong evidence for the presence of Membrane Openings (MOs) in cells at various stages of the cell cycle, predominantly during mitosis. High-resolution images (x 30,000) clearly show the wrinkled plasma membrane and smooth MOs.

      The electron microscopy data are supported by the live cell imaging of strains with fluorescently tagged membrane markers. This study opens up the possibility of tracking MOs at other stages of C. elegans development, and also asks if it might be a common phenomenon in other species that exhibit rapid embryonic growth and divisions. 

      Strengths:

      (1) Thorough verification of Membrane Openings (MO) by several methods: 

      (a) 4 independent sample batches.

      (b) Examined historical collections.

      (c) Analysed embryos at different stages of development. The absence of MOs in later stages (comma) serves as a negative control and gives confidence that MOs are genuine and not technical artifacts. 

      (2) Live cell imaging of a strain with fluorescently labelled membranes provides real-time dynamics of membrane rupture.

      (3) After observing the membrane rupture, the next obvious question is - what prevents the cytosol from leaking out? The EM images showing PBL and PEL - extracellular matrix serving as barriers for the cytosol are convincing.

      We thank the reviewer for the encouragement, which is highly appreciated.

      Weakness:

      (1) The association of membrane discontinuities with cell division is not convincing, as there are 159 cells out of 425 showing MOs, but it is not mentioned clearly how many of these are undergoing cell division. Also, it's not clear whether the 20 dividing cells analysed for MOs are a part of the 159 cells or a separate dataset. A graphical representation of the number of samples and observed frequencies would be helpful to understand the data collection workflow.

      We sincerely thank the reviewer for raising this important question and appreciate the opportunity to clarify these points.

      (1) Relationship between membrane discontinuities and cell division

      In C. elegans embryos, early development is characterized by rapid and extensive cell division: within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. Most of our electron microscopy (EM) analyses were performed on embryos at stages with fewer than 30 cells, in which the majority of cells are actively dividing. Therefore, it is reasonable to infer that the cells exhibiting membrane discontinuities (MOs) are predominantly mitotic. Supporting this, as embryos reached the comma stage, when cell proliferation declines and elongation begins, the incidence of MOs dropped sharply (0/13, 0/17, and 0/30 cells examined). This developmental specificity strongly links MOs to cell division.
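The contrast between early-stage and comma-stage counts can be illustrated with a simple calculation. The sketch below is not an analysis from the manuscript; it assumes, purely for illustration, that the 159 of 425 early-stage cells quoted in the reviewer's comment and the pooled comma-stage cells (0/13, 0/17, and 0/30) can each be treated as independent observations, which ignores clustering of cells within embryos and differences in sectioning plane. Under that assumption, it computes the one-sided hypergeometric (Fisher-type) probability of seeing no MOs at all in the comma-stage cells if both groups shared a single underlying rate. The function name `prob_zero_hits` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_zero_hits(hits_a: int, total_a: int, total_b: int) -> Fraction:
    """Probability that group B contains zero hits when all hits and
    misses are shuffled at random across groups A and B (margins fixed).

    Because zero is the most extreme outcome for group B, this equals
    the one-sided exact p-value for the 2x2 table
    [[hits_a, total_a - hits_a], [0, total_b]].
    """
    pooled = total_a + total_b
    misses = pooled - hits_a  # group B has no hits, so every hit is in A
    # Ways to fill group B with misses only / ways to fill group B at all.
    return Fraction(comb(misses, total_b), comb(pooled, total_b))


# Counts quoted in the text (pooling across embryos is an assumption).
early_hits, early_total = 159, 425
comma_total = 13 + 17 + 30

p = prob_zero_hits(early_hits, early_total, comma_total)
print(f"P(0 MOs in {comma_total} comma-stage cells | shared rate) = {float(p):.3e}")
```

A very small probability here would only indicate that the two pooled groups are unlikely to share one rate; it does not by itself distinguish between biological and technical explanations for the difference.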

      Moreover, in single random EM sections, we observed MOs in cells displaying clear mitotic features, such as metaphase chromosomes aligned at the equatorial plate, or anaphase/telophase configurations (fewer than 10 chromosomal clumps or two nuclei sharing common cytoplasm). Thus, MOs are indeed present in mitotic cells.

      From our 3D reconstruction (Figure 5), we identified a telophase cell containing three nuclei, each enclosed by its own plasma membrane, with each membrane harboring a single rupture that converged into a single opening. This tri-nucleated configuration indicates that the cell had undergone two rounds of division and was at telophase in both. The presence of a single membrane rupture in this context suggests that MOs can persist beyond mitosis, as the second cell cycle initiated from a mother cell that already shared cytoplasm with its sister and already contained a rupture. Thus, in this case, MOs were also present during the DNA synthesis stage.

      (2) Clarification of sample numbers and datasets

      In Figure 1, we present results from a single EM section per embryonic cell, with sections randomly selected per embryo as detailed in Sup Figure 2. This initial dataset (425 cells) forms the basis of Figure 1.

      From the same pool of 425 cells, we used additional EM sections—distinct from those shown in Sup Figure 2—to locate 20 dividing cells for analysis of membrane discontinuities. Thus, while these 20 cells originated from the same set of embryos, they were not derived from the sections used in Figure 1 or Sup Figure 2.

      A graphical summary of sample numbers from the single-section analysis is already provided in Figure 1. Notably, cells with two clearly visible nuclei are more likely to be sectioned through or near their maximal diameter. In contrast, the randomly selected sections used for Figure 1 captured cells at variable planes, reducing the likelihood of observing MOs. Consistent with this, in the three embryos where no MOs were detected (one example is Sup Figure 2N), the sections likely passed through peripheral regions of the cells. Consequently, the frequency of MOs in randomly sectioned cells (Figure 1) is not directly comparable to that observed in the 20 dividing cells, which were analyzed using sections more likely to capture cells near their maximal diameter. These 20 dividing cells should therefore be considered a separate analysis. We will add detailed explanations in the Methods section to ensure this distinction is clearly understood.

      We are grateful for the reviewer’s thoughtful feedback and believe these clarifications will improve the clarity and rigor of the manuscript.

      (2) In Figures 3A and 3B, the resolution of the images is not enough to verify 3A as classical membrane invagination and 3B as detached sister membranes.

      Thank you for your valuable comment. In the revised manuscript, we will provide additional images at higher magnification to better illustrate the classical membrane invagination in Figure 3A and the detached sister membranes in Figure 3B.

      (3) Figure 6 lacks controls. How does the classical invagination look in this strain? Also, adding nuclear dye would be informative, in order to correlate the nuclear division with membrane rupture, as claimed. 

      Thank you for this important comment. As we addressed how we correlated nuclear division with membrane rupture in response to weakness (1), below we will focus on how we may distinguish classical invagination from membrane rupture.

      While live imaging provides invaluable temporal resolution, its limitations in resolving membrane ultrastructure are substantial. In C. elegans embryos, early development is marked by rapid and extensive cell divisions. Within approximately 100 minutes, a two-cell embryo develops into one containing nearly 30 cells. During this fast-dividing stage, the rapid kinetics of membrane extension (approximately 20–30 seconds during metaphase and less than 3 minutes during cytokinesis), combined with embryo motility, introduce considerable spatiotemporal ambiguities. Furthermore, the longstanding invagination model of cytokinesis has shaped interpretations in the field, which may have led to ambiguous structures such as free-ended extensions being dismissed as potential artifacts rather than recognized as alternative morphological features. Theoretical and computational models have largely been built upon invagination-centric assumptions, which may have further constrained conceptual frameworks. Therefore, fluorescent protein-based live imaging analysis alone could not serve as a convincing approach to challenge the current dogma of cell division, nor did we intend it to.

      However, when reexamined in light of our findings, previous studies using GFP::PH or similar markers have in fact revealed membrane discontinuities that went unnoticed. For example, using light-sheet microscopy and 3D reconstruction, Fu et al. captured membrane discontinuities in dividing cells at stages such as anaphase or telophase (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088).

      Similarly, an earlier study by Harrell and Goldstein (Harrell and Goldstein. 2011. Internalization of multiple cells during C. elegans gastrulation depends on common cytoskeletal mechanisms but different cell polarity and cell fate regulators. Developmental Biology. DOI:10.1016/j.ydbio.2010.09.012) showed regions where the GFP::PH signal appeared fuzzy and discontinuous.

      Here, to capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. With single-plane live imaging, both membrane ruptures and free-ended sister membrane structures (Figure 6) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos.
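To make the temporal-sampling constraint concrete, the following minimal sketch estimates how many acquisitions are guaranteed to fall within a membrane-extension event at the stated 4-second interval. It uses only the durations quoted in this response (approximately 20–30 seconds at metaphase and, as an upper bound, 3 minutes for cytokinesis); the function name `frames_captured` is hypothetical, and the calculation assumes evenly spaced single-plane acquisitions with no dropped frames.

```python
def frames_captured(event_seconds: int, interval_seconds: int) -> int:
    """Number of acquisitions guaranteed to land inside an event of the
    given duration, regardless of when the event starts relative to the
    acquisition clock (floor of duration divided by interval)."""
    return event_seconds // interval_seconds


INTERVAL_S = 4  # imaging interval stated in the text

# Durations quoted in the text; 180 s is the stated upper bound.
metaphase_short = frames_captured(20, INTERVAL_S)  # 5 frames
metaphase_long = frames_captured(30, INTERVAL_S)   # 7 frames
cytokinesis_max = frames_captured(180, INTERVAL_S) # 45 frames

print(metaphase_short, metaphase_long, cytokinesis_max)
```

On these assumptions, a metaphase extension event would be represented by only a handful of single-plane frames, which is consistent with the spatiotemporal ambiguity described above.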

      However, given the inherent limitations of fluorescence microscopy in resolving membrane ultrastructure, high-resolution electron microscopy—supported by rigorous controls and serial section analysis—remains the gold standard for definitively distinguishing invagination from membrane discontinuities.

      While future advances in imaging technology may enable real-time visualization at near-EM resolution, our extensive, multi-year effort to test the artifact hypothesis has convinced us that these membrane discontinuities are genuine biological features of dividing C. elegans embryonic cells.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors challenge a dogma in cell biology, namely that cells are at any time point engulfed by a continuous plasma membrane. Liang et al. find that during C. elegans embryogenesis, a high number of cells are not entirely surrounded by a plasma membrane but show membrane openings (MOs). These openings are enriched at the embryo's periphery, towards the eggshell. The authors propose that plasma membrane discontinuities emerge during metaphase of mitosis and that independent extension of "sister membranes" engulfs the daughter cells.

      Strengths:

      On the positive side, the authors find plasma membrane discontinuities not only by electron microscopy but also by fluorescence microscopy and provide information about the dynamics of membrane openings and their emergence. While this is assuring, the authors conclude that MOs emerge during metaphase. From what the authors show, this particular information cannot be deduced, as there is no dynamic capture of a membrane scission event together with a chromatin marker that would indicate mitosis. The authors could, however, attempt to find such events in live movies, given the high incidence of MOs reported from their EM data.

      We thank the reviewer for the encouragement, which is highly appreciated.

      Weaknesses:

      In order to convincingly demonstrate the absence of any plasma membrane in the respective regions of the embryonic periphery or between cells of the embryo, the authors would have to show consecutive serial TEM sections where MOs are detected over more z-planes, beyond the mere 3D reconstructions. Although the authors state in the methods section that continuous ultrathin sections were cut for the metaphase sample (page 21, line 472), consecutive sections are never shown in TEM. While we do see the 3D reconstructions, better documentation of the underlying TEM data is missing. It would be necessary to show a membrane opening in consecutive z sections. Alternatively, the authors could seek the possibility to convincingly back up their claims with volume imaging by focused ion beam scanning EM (FIBSEM), where cellular volumes can be sectioned in almost isotropic resolution.

      We thank the reviewer for raising these important technical concerns. We have taken this question seriously since first observing membrane discontinuities six years ago, and we have since conducted extensive controls to rule out fixation artifacts.

      First of all, in addition to membrane discontinuities, we would like to highlight that a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      Second, we applied rigorous criteria for identifying membrane discontinuities:

      (1) To test whether the discontinuities were preparation specific, we examined four independent sample batches and validated our findings using two EM techniques: transmission electron microscopy (HPF-TEM) and dual-beam scanning electron microscopy (SEM).

      (2) We analyzed embryos across multiple developmental stages. Membrane discontinuities were observed in both intrauterine and laid embryos at early stages. However, as embryos reached the comma stage, a period marked by the onset of elongation and reduced cell proliferation, the incidence of discontinuities dropped dramatically (0/13, 0/17, and 0/30 cells examined). This developmental specificity argues strongly against a general fixation artifact, which would be expected to occur randomly across stages. Additionally, the eggshell is present throughout C. elegans embryogenesis; therefore, the dramatic reduction of membrane discontinuities in comma-stage embryos argues against the possibility that the eggshell poses a fixation problem.

      (3) Each putative membrane discontinuity was identified only after examining the entire plasma membrane of the cell on a given section. Importantly, aside from the discontinuity, the remainder of the plasma membrane remained intact. Moreover, in most cells, only a single discontinuity was present per section, arguing against random, widespread membrane tearing during preparation. Because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Again, the same membrane discontinuity was confirmed only after inspecting the entire plasma membrane on those neighboring sections as well. We will include this verification protocol in the revised Methods, and additional images of consecutive sections can be provided if needed.

      To further determine whether a dividing cell indeed contains one membrane rupture, we performed two serial reconstruction experiments using consecutive sections, as the reviewer suggested. First, we used HPF-TEM to analyze 105 consecutive sections of a metaphase cell, reconstructing the entire plasma membrane and chromosome configuration. We found that one membrane rupture largely encircled the chromosomal disc (Figure 2 and Video S1), spatially aligning with the future segregation zone. Second, we used AutoCUTS-SEM to collect approximately 600 sections covering ~95% of a telophase cell containing three nuclei sharing a common cytoplasm. This tri-nucleated cell was enclosed by three distinct plasma membranes, each harboring a single rupture site. These three ruptures converged to form a Y-shaped exposed cytoplasmic region spanning >351 sections (Figure 5). Collectively, these reconstructions demonstrate that each cell contains only one discontinuity from a 3D point of view, further supporting that the phenomenon is not due to random sample preparation damage.

      (4) In addition to EM, we performed live imaging of plasma membrane dynamics. While live imaging provides important temporal context, we recognize its limitations in resolving membrane ultrastructure. The rapid kinetics of membrane extension (approximately 20–30 seconds for metaphase and less than 3 minutes for cytokinesis), combined with embryo motility, introduce spatiotemporal ambiguities. To capture dynamic membrane events, our live imaging using the GFP::PH membrane marker was performed at 4-second intervals, approaching the practical limit for single-section scanning of the embryo. Nevertheless, with single-plane live imaging, both putative membrane ruptures (Figure 6A) and free-ended sister membrane structures (Figures 6B and 6C) could be detected, providing additional evidence that membrane rupture and independent extension of detached sister membranes underlie cytokinesis in C. elegans embryos. Notably, 3D membrane dynamics analysis using light-sheet microscopy (Fu et al. Imaging multicellular specimens with real-time optimized tiling light-sheet selective plane illumination microscopy. Nature Communications. 2016. DOI:10.1038/ncomms11088) revealed membrane ruptures in dividing early C. elegans embryonic cells, including during telophase and metaphase. Therefore, live imaging further validates the membrane rupture phenomena in dividing embryonic cells in C. elegans.

      We are confident that the cumulative evidence presented here addresses the reviewer's concerns and demonstrates that the observed membrane discontinuities, as well as cytoplasm-immersed membranes, are not procedural artifacts but rather reflect a previously underappreciated aspect of plasma membrane dynamics during embryonic cell division.

      Another critical issue concerns the detection of the membrane discontinuities in electron micrographs, which, in my opinion, is ambiguous. How do the authors reliably discriminate in their TEM images whether there is a plasma membrane or not? The absence - or weak appearance - of the stain of the electron dense material at membranes, which seems to be their criterion for MOs, is also apparent at other, intracellular membranes, like at the NE or at the ER (for example, see Figure 1C). Also, the plasma membrane itself appears unevenly stained in regions that the authors delineate as intact (for example, Figure 1C, 2B/1).

      We thank the reviewer for raising this important concern.

      First, our laboratory has extensive experience with electron microscopy across diverse biological systems, including neurons, muscle cells, and hypodermis in C. elegans, as well as tissues from Drosophila, mouse, bacteria, and cultured cells (Chen et al., 2013; Ding et al., 2018; Guan et al., 2022; Y. Li et al., 2018; Miao et al., 2024; Qin et al., 2014; Wang et al., 2026; J. Xu et al., 2022; M. Xu et al., 2021; L. Yang et al., 2020; X. Yang et al., 2019; Zhu et al., 2022). Importantly, we did not introduce any novel or unconventional steps in our EM preparation; all protocols were standard and well established. Thus, the observed membrane discontinuities are unlikely to result from technical inexperience or idiosyncratic methods.

      Second, because EM preparation yields serial sections, we verified nearly all membrane discontinuities by examining adjacent sections. Specifically, a membrane discontinuity was confirmed only after inspecting the entirety of the plasma membrane in neighboring sections. We will include this verification protocol in the revised Methods section, and additional images of consecutive sections can be provided if needed.

      Third, in addition to membrane discontinuities, a large number of single plasma membranes separating adjacent cytoplasmic domains were detected by EM (Figures 1, 3, and 4). This observation is particularly significant because the invagination model cannot account for the formation of single plasma membrane barriers between adjacent cytoplasmic domains. Instead, independent extension of detached sister membranes offers a plausible explanation for the generation of cytoplasm-immersed membranes. Furthermore, the morphology and continuity of these single cytoplasm-immersed membrane structures are well preserved, indicating successful EM processing and arguing against potential issues such as inadequate fixation or other technical limitations.

      EM-related publications by Jingjing Liang:

      Chen D, Jian Y, Liu X, Zhang Y, Liang J, Qi X, Du H, Zou W, Chen L, Chai Y, Ou G, Miao L, Wang Y, Yang C. 2013. Clathrin and AP2 Are Required for Phagocytic Receptor-Mediated Apoptotic Cell Clearance in Caenorhabditis elegans. PLoS Genetics 9:e1003517. DOI: https://doi.org/10.1371/journal.pgen.1003517

      Ding L, Yang X, Tian H, Liang J, Zhang F, Wang G, Wang Y, Ding M, Shui G, Huang X. 2018. Seipin regulates lipid homeostasis by ensuring calcium‐dependent mitochondrial metabolism. The EMBO Journal 37:e97572. DOI: https://doi.org/10.15252/embj.201797572

      Guan L, Yang Y, Liang J, Miao Y, Shang A, Wang B, Wang Y, Ding M. 2022. ERGIC2 and ERGIC3 regulate the ER‐to‐Golgi transport of gap junction proteins in metazoans. Traffic 23:140–157. DOI: https://doi.org/10.1111/tra.12830

      Li Y, Zhang Y, Gan Q, Xu M, Ding X, Tang G, Liang J, Liu K, Liu X, Wang X, Guo L, Gao Z, Hao X, Yang C. 2018. C. elegans-based screen identifies lysosome-damaging alkaloids that induce STAT3-dependent lysosomal cell death. Protein & Cell 9:1013–1026. DOI: https://doi.org/10.1007/s13238-018-0520-0

      Miao Y, Du Y, Wang B, Liang J, Liang Y, Dang S, Liu J, Li D, He K, Ding M. 2024. Spatiotemporal recruitment of the ubiquitin-specific protease USP8 directs endosome maturation. eLife 13:RP96353. DOI: https://doi.org/10.7554/eLife.96353

      Qin J, Liang J, Ding M. 2014. Perlecan Antagonizes Collagen IV and ADAMTS9/GON-1 in Restricting the Growth of Presynaptic Boutons. Journal of Neuroscience 34:10311–10324. DOI: https://doi.org/10.1523/JNEUROSCI.5128-13.2014

      Wang Z, Zhang L, Zhou B, Liang J, Tian Y, Jiang Z, Tao J, Yin C, Chen S, Zhang W, Zhang J, Wei W. 2026. A single MYB transcription factor GmMYB331 regulates seed oil accumulation and seed size/weight in soybean. Journal of Integrative Plant Biology 68:470–485. DOI: https://doi.org/10.1111/jipb.70101

      Xu J, Chen S, Wang W, Man Lam S, Xu Y, Zhang S, Pan H, Liang J, Huang Xiahe, Wang Yu, Li T, Jiang Y, Wang Yingchun, Ding M, Shui G, Yang H, Huang Xun. 2022. Hepatic CDP-diacylglycerol synthase 2 deficiency causes mitochondrial dysfunction and promotes rapid progression of NASH and fibrosis. Science Bulletin 67:299–314. DOI: https://doi.org/10.1016/j.scib.2021.10.014

      Xu M, Ding L, Liang J, Yang X, Liu Y, Wang Y, Ding M, Huang X. 2021. NAD kinase sustains lipogenesis and mitochondrial metabolism through fatty acid synthesis. Cell Reports 37:110157. DOI: https://doi.org/10.1016/j.celrep.2021.110157

      Yang L, Liang J, Lam SM, Yavuz A, Shui G, Ding M, Huang X. 2020. Neuronal lipolysis participates in PUFA‐mediated neural function and neurodegeneration. EMBO reports 21:e50214. DOI: https://doi.org/10.15252/embr.202050214

      Yang X, Liang J, Ding L, Li X, Lam S-M, Shui G, Ding M, Huang X. 2019. Phosphatidylserine synthase regulates cellular homeostasis through distinct metabolic mechanisms. PLOS Genetics 15:e1008548. DOI: https://doi.org/10.1371/journal.pgen.1008548

      Zhu J, Lam SM, Yang L, Liang J, Ding M, Shui G, Huang X. 2022. Reduced phosphatidylcholine synthesis suppresses the embryonic lethality of seipin deficiency. Life Metabolism 1:175–189. DOI: https://doi.org/10.1093/lifemeta/loac02

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.

      Weaknesses:

      The weaknesses are limited and relate primarily to framing and presentation rather than to the substance of the work. First, because contrast was titrated to maintain moderate performance (d′ between 1.2 and 1.8), the phase-linked changes in sensitivity appear modest in absolute terms, which could benefit from explicit contextualization. Second, a coding error resulted in unequal numbers of double-pass stimulus pairs across participants, which affects the interpretability of the response-consistency results. Third, several methodological details could be stated more explicitly to enhance transparency, including stimulus timing specifications, electrode selection criteria, and the purpose of phase alignment in group averaging. Finally, some mechanistic interpretations in the Discussion could be phrased more conservatively to clearly distinguish between measurement and inference, particularly regarding the relationship between reduced internal noise and sharpened tuning, and the physiological implementation of the frontal-occipital phase relationship.

      We appreciate the reviewer’s thoughtful and constructive feedback, particularly regarding clarity and framing. In response, we have made several revisions to improve transparency and contextualization throughout the manuscript.

      First, we now explicitly contextualize the relatively modest change in sensitivity by adding discussion of the contrast-titration procedure and its implications for effect size interpretation. Second, we address the coding error that led to unequal numbers of double-pass stimulus pairs across participants sooner in the manuscript by reporting the average number of pairs per participant in the Results (as well as the Methods), allowing for readers to interpret the results more appropriately. Third, we have provided additional detail, including precise stimulus timing parameters, electrode selection criteria, and a clearer explanation of the rationale for phase alignment in the Results (in addition to the Methods) section. Finally, we have revised portions of the Discussion to adopt more conservative language when interpreting our results, which more clearly distinguishes between empirical observations and mechanistic inferences, along with offering additional interpretations for the frontal-occipital phase relationship.

      We believe these revisions substantially improve the clarity, transparency, and interpretability of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study of Pilipenko et al evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Weaknesses:

      The sample size collected (N = 6) is, in my opinion, too small for the statistical approach adopted (group level). It is well known that small sample sizes result in an increased likelihood of false positives; even in the case of true positives, effect sizes are inflated (Button et al., 2013; Makin and Orban de Xivry, 2019), negatively affecting the replicability of the effect.

      Although the experimental design allows for an accurate characterization of the effects at the single-subject level, conclusions are drawn from group-level aggregated measures. With only six subjects, the estimation of between-subject variability is not reliable. The authors need to acknowledge that the sample size is too small; therefore, results should be interpreted with caution.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365-376 (2013). https://doi.org/10.1038/nrn3475

      Tamar R Makin, Jean-Jacques Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript eLife 8:e48175 https://doi.org/10.7554/eLife.48175

      We thank the reviewer for their supportive remarks on our design and analysis, and for raising this important statistical concern about our sample size (n=6). Our choice of a small sample size was driven by methodological considerations. Specifically, our reverse correlation analysis requires a large number of trials per participant, as it estimates perceptual tuning by regressing behavioral responses against fluctuations in the energy of stimulus features (orientation and spatial frequency). This approach, as well as the computation of signal detection theory (SDT) metrics such as d′ and criterion, depends on high trial counts to obtain reliable estimates, particularly given that our analysis further subdivides trials across eight phase bins. For this reason, we prioritized collecting a large number of trials per participant (∼5,000), which is consistent with established practices in psychophysical research.
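
      As a rough illustration of why per-bin trial counts matter, the sketch below computes d′ and criterion from the four Yes/No outcome counts of a single phase bin. The function name, the log-linear correction, and the example counts are assumptions made for illustration; they are not taken from the authors' analysis code.

```python
from statistics import NormalDist


def sdt_metrics(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one set of Yes/No trial counts."""
    # Log-linear correction (an assumption here) keeps both rates away
    # from 0 and 1, where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for a single phase bin, for illustration only.
d, c = sdt_metrics(hits=210, misses=90, false_alarms=60, correct_rejections=240)
print(round(d, 2), round(c, 2))
```

      Splitting a session into eight phase bins leaves each bin with roughly one eighth of the trials, so each rate is estimated from correspondingly fewer observations.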

      Importantly, our approach means that our design is reliable at the individual level, which motivated us to include a new binomial probability test in our revised paper. This binomial test helps address concerns about the generalizability of our results. Binomial testing considers each participant as an independent replication of the effect and then computes the p-value associated with the probability of having observed the given number of statistically significant participants by chance, with a false positive rate of 0.05. In our data, 3 out of 6 participants showed significant effects, which corresponds to a probability of 0.002 of having observed these effects by chance alone. We believe this converging evidence supports the replicability and generalizability of our results. To improve the transparency of the single-subject data, we have included single-participant results in the Supplemental Materials to allow readers to directly assess the consistency of effects across individuals and to better contextualize between-subject variability.

      Thank you again for your suggestions. We believe that these additions have greatly improved our manuscript by demonstrating the robustness of our findings and increasing the transparency of our single-subject results.
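
      The binomial probability quoted above can be checked with a few lines of code. This is a minimal sketch that assumes the test is the upper-tail probability of observing at least k significant participants out of n when each has a 0.05 chance of a false positive; the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_value = binomial_tail(k=3, n=6, p=0.05)
print(f"{p_value:.4f}")  # 0.0022
```

      Under this assumption the result rounds to the reported 0.002. The probability of exactly three false positives is about 0.0021, so either reading yields the same rounded value.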

      Recommendations for the authors:

      Reviewing Editor Comments:

      The issue of generalizability arose during the review process, as your results are based on a small sample of participants who undertook a very large number of trials. In the revised version, it would be useful to discuss why this approach is valid, especially in the context of linking EEG with modeling (i.e., why it is more powerful than having many participants with fewer trials), and the extent to which your results can generalize to the population.

      We sincerely appreciate all of the helpful comments provided by the reviewers and hope we can address the concerns about our experimental approach. In the Introduction, we have emphasized the importance of our small-sample design, which allows us to reliably compute our signal detection theory metrics across 8 phase bins and to perform the reverse correlation analysis. In the Methods section, we have added a description of the binomial probability statistical framework, which addresses the generalizability of our results. In this framework, each participant is viewed as an independent replication, and the p-value reflects the probability of having observed the number of individually significant subjects from the total sample size by chance. In this regard, observing a significant effect in 3 out of 6 participants (as in our study) by chance alone has a probability of 0.002, which we believe is unlikely and instead reflects a true effect present in the general population.

      Below I have copied our changes in the introduction and methods sections.

      “... in a large number of trials (6,020 per observer, n = 6) across multiple EEG sessions. This approach ensures a sufficient number of trials in order to reliably compute signal detection theory (SDT) metrics across multiple alpha phase bins while also affording enough statistical power for reverse correlation analysis (Xue et al., 2024), making it preferred over having a larger sample size with fewer trials.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024).”

      Reviewer #1 (Recommendations for the authors):

      My suggestions are intended to be light-touch and focused on strengthening the clarity and durability of the Reviewed Preprint rather than on additional experimentation or major new analyses.

      (1) Limitation statement for the double-pass coding error:

      Add a short statement in the Methods or Results acknowledging that the coding error led to markedly fewer repeated stimulus pairs for the first three participants than for the last three. For the response-consistency result in Figure 2E, a simple acknowledgement that the available evidence is stronger for some participants than others will help readers calibrate their confidence without detracting from the main story.

      Thank you for this suggestion. We have now added a statement to this effect in the Results section, in addition to the description already given in the Methods section.

      “To examine this, we implemented a double-pass stimulus presentation (~600 stimulus pairs for participants 1-3 and ~2,500 pairs for participants 4-6) and analyzed participants’ response consistency (Xue et al., 2024) to two identical stimuli.”

      (2) Contextualizing the titrated performance level:

      In the Discussion, explicitly note that contrast was titrated to keep d′ between approximately 1.2 and 1.8, which intentionally maintains moderate performance. This contextualization will help readers understand that while the phase-linked changes appear modest in absolute terms, they are mechanistically informative within this design.

      Thank you. We have added a sentence to the Discussion speaking to this point.

      “We also note that the observed modulation of d’ between optimal and suboptimal phases was relatively modest in absolute terms (0.21) in our study and could therefore require many trials per subject to detect. Two reasons for this modest effect size could be related to specific features of our task design. First, we titrated stimulus contrast to maintain consistent task performance. This titration could have reduced the magnitude of the phase effect on d’ that would otherwise be apparent if the stimulus intensity were kept constant. Additionally, the use of (relatively) high-contrast random noise likely means that trial-to-trial variability in perception is largely driven by random fluctuations in the noise properties and, to a lesser extent, internal brain state. Although both of these choices were necessary to perform SDT and reverse correlation analysis, they differ from many previous studies investigating alpha phase using only near-threshold detection in the absence of external noise and may contribute to an underestimation of the true effect size.”

      (3) Methods clarifications:

      (a) Replace placeholder text such as "{plus minus}" and "{degree sign}" with the appropriate symbols, and ensure that any equations implied in the reverse-correlation section are fully present.

      Thank you for bringing this to our attention. These placeholder texts are an artifact of the conversion process, and we will correct them.

      (b) State explicitly that the 8 ms stimulus duration corresponds to a single frame on your 120 Hz display, which will clarify the timing in Figure 1A and the pre-stimulus windows in the phase analyses.

      Thank you. We have added language to both the Methods and Results sections explicitly indicating that the 8 ms stimulus duration corresponds to a single screen refresh. Additionally, we changed the text in Figure 1A to include inter-trial interval timing (as opposed to merely saying “Start Trial”):

      “(A) Task design. Each trial contained a brief, filtered-noise stimulus (8 ms; one screen refresh) presented to the right or left of fixation with equal probability.”

      “Each participant (n = 6) completed 5-6 EEG sessions of a Yes/No detection paradigm whereby participants reported the presence or absence of a brief (8 ms; one screen refresh) vertical Gabor target (2 cycles per degree) with concurrent confidence judgments (see Figure 1A), along with an additional imagination judgement (reported in the supplemental materials).”

      (c) In the description of the post-stimulus taper, consider phrasing the rationale in terms of minimizing contamination from evoked responses rather than asserting that the taper ends before the earliest evoked response, which keeps the argument correct without committing to a precise latency boundary.

      Thank you for this suggestion. We have changed our rationale for the taper to “minimizing”, rather than avoiding, the evoked response.

      “This resulted in the post-stimulus data being flat after 70 ms, which is intended to minimize the evoked response in our data.”

      (4) Analysis transparency:

      (a) In the description of posterior electrode selection, explicitly note that channels were chosen solely on the basis of alpha power, independent of behavioural performance, and that the same electrodes were used for each participant across sessions.

      We have gladly made this clarification to the methods.

      “This was individually determined by rank-ordering 17 of the posterior channels (Pz, P3, P7, O1, Oz, O2, P4, P8, P1, P5, PO7, PO3, POz, PO4, PO8, P6, and P2) and algorithmically choosing the three with the highest power. This ensured that electrode selection was made independent of performance and instead was based upon maximizing alpha signal strength.”
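
      The selection rule described in this passage can be sketched as follows. The channel list comes from the quoted text; the function name and the power values are hypothetical, and how alpha power is estimated per channel is not shown.

```python
POSTERIOR_CHANNELS = [
    "Pz", "P3", "P7", "O1", "Oz", "O2", "P4", "P8", "P1",
    "P5", "PO7", "PO3", "POz", "PO4", "PO8", "P6", "P2",
]


def top_alpha_channels(alpha_power, n_channels=3):
    """Return the n posterior channels with the highest alpha power, highest first."""
    ranked = sorted(POSTERIOR_CHANNELS, key=lambda ch: alpha_power[ch], reverse=True)
    return ranked[:n_channels]


# Hypothetical power values (arbitrary units), for illustration only.
example_power = {ch: 1.0 for ch in POSTERIOR_CHANNELS}
example_power.update({"O1": 4.2, "Oz": 5.1, "O2": 3.9})
print(top_alpha_channels(example_power))  # ['Oz', 'O1', 'O2']
```

      Because the ranking uses only alpha power, behavioural performance never enters the selection, which is the independence the reviewer asked to have stated.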

      (b) Describe the phase-alignment step used to center each participant's optimal bin before group averaging as a device for visualization and summary, and clarify that inferential statistics are based on the underlying, non-aligned data as appropriate. This will reassure readers who are cautious about circularity.

      We agree that this should be made more explicit throughout the manuscript and have added statements clarifying this aspect in the Figure 2B caption and in the Results and Methods sections.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and is averaged across -450 ms to stimulus onset. This graph is for visualization purposes only. Error bars represent ± 1 SEM. The pattern shows a clear phasic modulation of d’ across bins.”

      “... requiring us to phase-align the performance data across participants in order to visualize the underlying phasic effects. To this end, we aligned all metrics (d’, c, HR, and FAR) by circularly shifting the data so that the bin with the highest d’ was assigned to bin 8, which was then omitted from further visualizations.”

      “Bin 8 was then omitted from further visualizations. The shifted data were then averaged across all time points from -450 ms to 0 ms, based on significant effects at the group level, and averaged across participants. No statistics were conducted on these shifted variables and instead are for visualization purposes only.”
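
      The alignment step described in these passages amounts to a circular shift of the binned data. The sketch below is one way to express it, assuming eight phase bins stored as plain lists; the function name and the example values are hypothetical.

```python
def align_to_best_bin(d_prime, *other_metrics):
    """Circularly shift binned metrics so the highest-d' bin lands in the last bin.

    Returns one list per input metric, with the last (alignment) bin dropped.
    """
    n_bins = len(d_prime)
    best = max(range(n_bins), key=lambda i: d_prime[i])
    shift = (n_bins - 1) - best  # positions to rotate forward
    aligned = []
    for metric in (d_prime, *other_metrics):
        rotated = [metric[(j - shift) % n_bins] for j in range(n_bins)]
        aligned.append(rotated[:-1])  # omit the bin used for alignment
    return aligned


# Hypothetical per-bin values for one participant.
d_bins = [1.4, 1.6, 1.5, 1.3, 1.2, 1.3, 1.4, 1.5]
c_bins = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
d_aligned, c_aligned = align_to_best_bin(d_bins, c_bins)
print(d_aligned)
```

      The same shift is applied to every metric, so criterion, hit rate, and false-alarm rate stay in register with d′. Dropping the alignment bin removes the value that is guaranteed to be the maximum by construction.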

      (c) Add a short note on the number of permutations and the cluster-forming threshold in the phase-coupling analyses, if not already stated in the Results or captions, to complete the description of your non-parametric testing procedure.

      Thank you, we agree that reiterating this information in the Results section is helpful for the reader to clarify the analysis procedure.

      “After smoothing the resultant vector length over time with a 50 ms moving average, we compared the observed vector lengths to a permuted threshold (95th percentile of 1,000 permutations) at each time point from –700 to 0 ms and performed cluster correction (95th percentile of the permuted cluster size) to account for multiple comparisons.”
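
      The core of this thresholding step can be sketched as below. The definition of the resultant vector length for binned values, the nearest-rank percentile, and all function names are assumptions for illustration. The sketch omits the 50 ms smoothing and the cluster correction, and it does not show how the permuted datasets are generated.

```python
import cmath
import math


def resultant_vector_length(binned_values):
    """Length of the mean vector of binned values placed at equally spaced phases."""
    n_bins = len(binned_values)
    total = sum(
        value * cmath.exp(2j * math.pi * k / n_bins)
        for k, value in enumerate(binned_values)
    )
    return abs(total) / n_bins


def percentile_threshold(permuted_lengths, percentile=95):
    """Nearest-rank percentile of the permuted vector lengths."""
    ordered = sorted(permuted_lengths)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def exceeds_threshold(observed, permuted_lengths, percentile=95):
    """True when the observed length is above the permutation threshold."""
    return observed > percentile_threshold(permuted_lengths, percentile)
```

      With this definition, a metric that is flat across bins has a vector length near zero, while a sinusoidal modulation of amplitude b yields b/2.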

      (5) Discussion framing:

      Make one or two small adjustments to your mechanistic phrasing so that the distinctions between measurement and interpretation are fully explicit:

      (a) State that the combination of phase-d′ coupling, counterphased hit and false-alarm rates, response consistency, and phase-dependent classification images is "consistent with" a reduction in effective internal noise and sharper estimated tuning at optimal alpha phase, within the assumptions of your SDT and reverse-correlation framework.

      Thank you for this suggestion. We have changed the language in the discussions to reflect this framing and interpretation of the results.

      “Moreover, our data are consistent with a model in which the variability of internal responses changes systematically across the alpha cycle, as reflected in the inverse relationship between hit rate and false alarm rate.”

      (b) Emphasize that reduced effective internal noise and sharpened sensory tuning are two complementary descriptions of a better match between sensory evidence and decision template rather than fully separable mechanisms.

      Thank you, we have added this language for clarity of our interpretation.

      “Together with decreases in the variance of sensory tuning during the optimal phase, our results suggest that alpha phase impacts sensitivity by shaping trial-to-trial variation in internal noise during perceptual decision making, leading to better matches between sensory evidence and decision templates as opposed to a change in the gain of internal sensory responses.”

      (c) Note that the frontal-occipital phase relationship is consistent with a coordinated, possibly top-down component to the alpha-phase effect, while remaining agnostic about the precise physiological implementation.

      Thank you for raising this additional interpretation. We have added this as a plausible alternative to the single-source account in the Discussion section.

      “Moreover, our results suggest that prior literature reporting phasic effects in the alpha-band range from both frontal and occipital regions may plausibly be reporting the same effect from a single projected dipole source; however, these results are also consistent with two synchronized alpha sources which are anti-phase.”

      Reviewer #2 (Recommendations for the authors):

      Major issues:

      Given that collecting more data may not be doable, the authors should take some actions to test the reliability of their results. For instance, simulations could be run to test the robustness of the results with such a small sample size (Zoefel, 2019). It would also be of interest to include in the report statistics and plots at the individual level, not only the aggregates. It is also important to report which electrodes were used in the analysis for each of the subjects; the Methods section clearly states that these electrodes differed between subjects.

      Thank you for these suggestions. To assess the reliability of our results at the single-subject level, we have included a new binomial probability test, which is a framework suitable for small-sample experiments with large trial numbers (Schwarzkopf & Huang, 2024). Binomial testing views each individual as an independent replication, considers the number of significant participants relative to the total number of tested participants, and outputs the probability of having observed the results by chance. We believe this framework adequately addresses the reviewer’s concern about generalizability in addition to being well-suited to the design of our study.

      To assess individual significance, we averaged the resultant vector length and permutations over the analysis window from -450 to 0 ms. If the time-averaged resultant vector length exceeded the 95th percentile of the time-averaged permutations for that participant, the participant was considered significant. In total, 3 out of 6 participants (participants 1, 4, and 5) showed significant d’ coupling. The binomial probability (equivalent to a p-value) of having observed this outcome as a result of three false positives at the individual-subject level is very small (p = 0.002), well below the significance threshold of 0.05 conventionally used in psychological studies.

      Below is the text which we have added to the Results and Methods sections.

      “To interrogate the robustness of our findings at the single-subject level, we adopted a test of binomial probability, which is a statistical framework that treats each individual as an independent replication and is ideal for small sample size studies that utilize a large number of trials per observer (Schwarzkopf & Huang, 2024). For our data, we assessed individual significance by averaging the actual and permuted resultant vector lengths across time (-450 to 0 ms) and comparing the real vector length to the 95th percentile of the permuted datasets. With this approach, 3 out of 6 participants showed significant d’-phase coupling, which corresponds to a binomial probability of p = 0.002, indicating a very low probability that we observed these results by chance alone.”

      “Additionally, we used a binomial probability testing framework that is designed for small sample sizes and treats each participant as an independent replication. As such, it computes the probability of having observed the number of statistically significant outcomes by chance given our sample size (Schwarzkopf & Huang, 2024). To assess significance at the participant level, we averaged the participant’s resultant vector length and permutations from -450 to 0 ms and obtained the 95th percentile of the time-averaged permutations. We then compared the averaged resultant vector lengths to the permutation thresholds for each subject, which revealed 3 out of 6 significant subjects. We then used the MATLAB function myBinomTest.m (Nelson, 2026) to compute the p-value associated with the probability of having observed 3 out of 6 significant subjects by chance (with a false-positive rate of 0.05).”

      To address the reviewer's second request, we now include a supplemental figure that shows each individual’s results for the main analysis (see Supplementary figure 3). These graphs, in addition to the Methods, now provide the reader with each participant’s set of analysis electrodes.

      “Each participant had a different combination of electrodes which were used in the analyses; however, the same three channels were used across sessions within a participant (participant 1: POz, PO3, O1; participant 2: P7, PO7, PO4; participant 3: P2, P1, Pz; participant 4: O1, Oz, O2; participant 5: O2, PO8, PO4; participant 6: Oz, O2, O1).”

      As an alternative approach, linear mixed models (LMM) could be used for statistics, as they are more suitable for small sample sizes (Wiley et al., 2019). LMM improve generalization by modelling subject-specific random effects. Although raw circular data is not suitable for LMM, the sine and cosine of the phases could have been used as predictors, for instance. Given that data were collected for 6 different sessions, sessions could be included as a factor in the model to improve statistical power.

      We appreciate the suggestion but feel that LMMs would be a challenge in this case not only because the main predictor variables are circular, but because the main outcome variables are not defined on the single-trial level and require many trials to be computed (e.g., classification images, SDT measures, response consistency). As such, computing these measures within a session may also lead to noisier estimates than we had designed our experiment for. We therefore prefer the more straightforward approach we have taken in the paper, which has now been supplemented by a binomial test of individual-subject level significance.

      Given that the number of subjects is quite small, I believe that individual data should be presented (either in the main text or supplementary materials) also for figures: 2A, B, C and D.

      Thank you. We have included all of these results in the individual graphs in the Supplemental Materials (see Supplementary figure 3).

      In plot 2B (HR and FAR) a p-value = 0.015 appears. However, in the text you write:

      "Indeed, this showed that the difference between the HR and FAR vector angle was significantly clustered around a mean of 180° (v = 3.78, p = 0.01), indicating that the phase angle associated with the greatest hits was counterphase to the phase angle associated with the greatest false alarms."

      Which one is correct? Or do they refer to different tests?

      We appreciate you catching this confusing discrepancy. The two values refer to the same test which has a p-value of 0.0145. In the figure, this value was rounded to the thousandths decimal place (i.e., 0.015), whereas in the text it was rounded to the hundredths value (0.01). We now consistently report p-values out to three decimal places throughout the manuscript.
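
      For readers who want to trace the rounding, the sketch below recomputes a p-value from the quoted statistic. It assumes that v is a V-test statistic computed over n = 6 participants and that the p-value comes from the normal approximation u = v·sqrt(2/n) used by common circular-statistics toolboxes; neither assumption is confirmed by the text, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def v_test_p_value(v, n):
    """One-sided p-value for a V statistic via the normal approximation."""
    u = v * sqrt(2.0 / n)
    return 1.0 - NormalDist().cdf(u)


p = v_test_p_value(v=3.78, n=6)
print(round(p, 4), round(p, 3), round(p, 2))
```

      Under these assumptions the result is approximately 0.0145, which rounds to 0.015 at three decimal places and to 0.01 at two, matching the two values that appeared in the figure and the text.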

      Did you perform any statistical test for phasic modulation of dprime and criterion? I say that because in Figure 2B, you state that the data shows a "clear phasic modulation of d' across bins", but no statistic is mentioned. On the other hand, in Figure 2D, you state, "We did not observe any significant phase-dependent relationship between phase and criterion." Is this sentence referring to both 2C and 2D panels or only to 2C?

      Figures 2B and 2D show the phase-behavior relationship across bins after aligning the phase bins to each participant's “best” d’ bin. This bin is omitted from the plots because it is used for alignment, and including it would make the analysis circular. Accordingly, these panels were intended purely for visualization and were not used for statistical inference. Additional language has been added to the figure caption highlighting this aspect.

      “The data have been aligned across participants so that each individual's highest d’ was assigned to bin 8 (omitted from the plot), with the remaining data circularly shifted, and is averaged across -450 ms to stimulus onset. This graph is for visualization purposes only.”
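
      The alignment step described in the caption can be illustrated with a minimal sketch. The function names, the number of bins, and the example values are hypothetical; only the idea of circularly shifting each participant's bins so that the highest d’ lands on a reference bin, which is then dropped, comes from the text. In 1-based numbering, bin 8 would correspond to `target=7` here.

```python
def align_to_best(values, target):
    """Circularly shift `values` so that the maximum lands at index `target`."""
    n = len(values)
    best = max(range(n), key=lambda k: values[k])
    shift = (target - best) % n
    # The element at old index k moves to new index (k + shift) % n.
    return [values[(k - shift) % n] for k in range(n)]


def drop_reference_bin(aligned, target):
    """Remove the alignment bin, since it is the maximum by construction."""
    return aligned[:target] + aligned[target + 1:]


# Hypothetical per-bin d' values for one participant (4 bins for brevity).
dprime = [0.9, 1.4, 1.1, 0.8]
aligned = align_to_best(dprime, target=3)
plotted = drop_reference_bin(aligned, target=3)
```

      Because the reference bin is the maximum by construction, any test that included it would be biased, which is why the aligned curves are suited to visualization only.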

      The primary statistical test for phase-behavior coupling was performed using permutation testing of the resultant vector length, which quantifies the magnitude of phase-dependent modulation. These results are shown in Figures 2A (for d′) and 2C (for criterion). In the original manuscript, we reported only the time points that survived cluster-based correction, but did not explicitly report the cluster p-values. We have now added these cluster p-values to the manuscript for completeness.

      “The data revealed significant cluster-corrected coupling between alpha phase and d’ in the prestimulus window from -220 ms until stimulus onset (cluster p = 0.046),...”

      Additionally, we have changed the caption of Figure 2 to be separate for C) and D).

      “(C) No evidence for the coupling of criterion to pre-stimulus alpha-band phase. Graph C reveals the time course of the resultant vector lengths for alpha phase-criterion coupling, which shows no significant phase-dependent relationship between phase and criterion.

      (D) The underlying shifted c across phase bins (shifted to participants’ optimal phase, as in graph B) did not visually demonstrate a phasic modulation pattern.”
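
      The logic of a permutation test on the resultant vector length can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it shuffles bin labels at a single time point, whereas the manuscript's analysis also involves cluster-based correction across time, and the function names, the bin count, and the example values are hypothetical.

```python
import cmath
import math
import random


def resultant_length(values, phases):
    """Length of the sum of per-bin values placed at their bin phases, divided by the bin count."""
    total = sum(v * cmath.exp(1j * th) for v, th in zip(values, phases))
    return abs(total) / len(values)


def permutation_p(values, phases, n_perm=2000, seed=0):
    """Compare the observed resultant length with lengths obtained after shuffling bin labels."""
    rng = random.Random(seed)
    observed = resultant_length(values, phases)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if resultant_length(shuffled, phases) >= observed:
            count += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical example: 8 phase bins with a sinusoidal modulation of d'.
phases = [2 * math.pi * k / 8 for k in range(8)]
modulated = [1.5 + 0.5 * math.cos(th) for th in phases]
observed, p_value = permutation_p(modulated, phases)
```

      A flat profile across bins gives a resultant length near zero, while a one-cycle modulation gives a length equal to half the modulation amplitude, so the statistic reflects the magnitude of the modulation rather than its preferred phase.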

      Minor issues:

      In general, the paper is very clear. I found a statement confusing in the Response consistency section:

      "To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. This procedure was done for each channel at each time point (from -450 to 0 ms) and then averaged."

      Which makes no sense, as response consistency is independent of channel and time point. I believe here you refer to the phase, maybe by just changing the order (start with response consistency and then proceed to phase), the paragraph would be clearer.

      We appreciate you catching this mistake. We have clarified the Methods section in the following way:

      “To quantify response consistency, we computed the proportion of trials in which participants provided the same response across the two identical trials. Since the optimal phase changes over time, the set of trials were classified as either both having occurred during the optimal phase (or otherwise) for each time point (from -450 to 0 ms) and channel. The proportion of consistent responses was then averaged across channels and time.”
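
      The computation described in the revised Methods text can be sketched for a single time point and channel. The data layout (a list of response pairs plus a parallel list of per-presentation phase flags) and the function names are assumptions made for illustration; in the described analysis, this step would be repeated for each time point and channel and the resulting proportions averaged.

```python
def response_consistency(pairs):
    """Proportion of repeated-stimulus pairs that received the same response."""
    if not pairs:
        return float("nan")
    same = sum(1 for first, second in pairs if first == second)
    return same / len(pairs)


def consistency_by_phase(pairs, in_optimal_phase):
    """Split pairs by whether BOTH presentations fell in the optimal phase."""
    both, other = [], []
    for pair, (first_opt, second_opt) in zip(pairs, in_optimal_phase):
        (both if first_opt and second_opt else other).append(pair)
    return response_consistency(both), response_consistency(other)


# Hypothetical responses to four pairs of identical trials.
pairs = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "yes")]
flags = [(True, True), (True, False), (True, True), (False, False)]
optimal, non_optimal = consistency_by_phase(pairs, flags)
```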

      Could you include a plot of the power spectrum used for IAF estimation of all the subjects?

      Thank you for the suggestion. In Supplemental Figure 3 we have included the power spectrum that was used to estimate IAF in addition to a topoplot of alpha power (IAF +/- 2 Hz) that has the analysis electrodes labelled.

      Bibliography:

      Wiley RW, Rapp B. Statistical analysis in Small-N Designs: using linear mixed-effects modeling for evaluating intervention effectiveness. Aphasiology. 2019;33(1):1-30. doi: 10.1080/02687038.2018.1454884.

      Zoefel B, Davis MH, Valente G, Riecke L. How to test for phasic modulation of neural and behavioural responses. NeuroImage. 2019;202:116175. doi: 10.1016/j.neuroimage.2019.116175.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Despite this compelling data regarding the protective role of HSF1 in the febrile response, what remains unexplained and complicates the authors' model is the observation that losing LvHSF1 at 'normal' temperatures of 25 ℃ is not detrimental to survival, even though viral loads increase and nSWD is likely still subject to LvHSF1 regulation. These observations suggest that WSSV infection may have other detrimental effects on the cell not reflected by viral load and that LvHSF1 may play additional roles in protecting the organism from these effects of WSSV infection, such as perhaps, perturbations to protein homeostasis. This is worth discussing, especially in light of the rather complicated roles of hormesis in protection from infection, the role of HSF1 in hormesis responses, and the findings from other groups that the authors discuss.

      We are grateful for the reviewer's unbiased advice. We have added a description of the role of HSF1 in hormesis responses to the Discussion (Lines 422-425 of the revised manuscript). Thank you.

      Reviewer #2 (Public review):

      Temperature is a critical factor affecting the progression of viral diseases in vertebrates and invertebrates. In the current study, the authors investigate mechanisms by which high temperatures promote anti-viral resistance in shrimp. They show that high temperatures induce HSF1 expression, which in turn upregulates AMPs. The AMPs target viral envelope proteins and inhibit viral infection/replication. The authors confirm this process in Drosophila and suggest that there may be a conserved mechanism of high-temperature mediated anti-viral response in arthropods. These findings will enhance our understanding of how high temperature improves resistance to viral infection in animals.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and extended. Further investigation on how WSSV infection is affected by AMP would have strengthened the study.

      We are grateful for the reviewer's unbiased advice. We have provided additional experimental evidence and supplementary explanations in the revised manuscript. Thank you.

      Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      We are grateful for the reviewer's positive comments and unbiased advice. We have improved the logical flow of the paper and added corresponding explanations in the revised manuscript. Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: The analysis compares Group TW to Group W (not the other way around).

      Thank you very much. To uncover the molecular mechanisms by which high temperature restricts WSSV infection, two shrimp groups, Group TW and Group W, were cultured at 25 °C. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing (Figure 1A). RNA-seq was used to identify genes responsive to high temperature, particularly those encoding potential transcriptional regulators. Thank you.

      (2) The RNA-seq data in Figure 1 focus only on the TFs. The manuscript would benefit from showing all the RNA-seq data and the differentially expressed genes. In particular, are the AMPs upregulated at the same time point? This should not be the case if LvHSF1 were responsible for the transcription of the AMPs, given the time lag between transcription and translation.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed by RNA-seq that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in the comparison between Group TW and Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024).

      Additionally, we analyzed AMP expression between Group TW and Group W. The results show that some antimicrobial peptides, such as Lysozyme and C-type lectin, are upregulated between Group TW and Group W. Notably, we did not detect upregulated expression of SWD between Group TW and Group W. We agree with the reviewer that there is a time lag between transcription and translation. Supplementary experimental evidence shows that the expression of LvHSF1 is strongly induced by WSSV stimulation, after which the expression of SWD begins to increase. We have added a description in Lines 136-138 of the revised manuscript.

      Author response image 1.

      The Figure of the heat shock proteins in Group TW and Group W

      Author response image 2.

      Transcriptional expression levels of HSF1 and SWD after WSSV stimulation

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (3) The data showing the tissue distribution of LvHSF1 and nSWD is a rigorous approach and adds to the manuscript. A similar approach to understanding the time course of expression of AMPs in relationship to LvHSF1 expression levels would strengthen the authors' conclusions that LvHSF1 induction in response to high temperatures and viral infection, in turn, upregulates SWD and other antibacterial genes.

      Thank you for your suggestion. Following it, we measured the transcriptional expression levels of HSF1 and SWD at 0, 2, 4, 6, 8, 12, 16, 20, and 24 hours after WSSV stimulation. The transcriptional expression level of SWD was set to 1.00 at 0 h. In the early stage of WSSV infection (0-12 h, except 6 h), the expression of LvHSF1 is strongly induced, and the expression of SWD then begins to increase. These results show that LvHSF1 induction in response to viral infection, in turn, upregulates SWD and other antibacterial genes. Thank you.
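
      The normalisation mentioned above, in which the expression level at 0 h is set to 1.00, can be expressed as a short sketch. The function name and the numbers are made up for illustration and are not data from the study; the upstream quantification method is not described here and is not assumed.

```python
def relative_to_baseline(levels, baseline_key=0):
    """Express each time point as a ratio to the baseline time point (baseline becomes 1.0)."""
    base = levels[baseline_key]
    return {t: value / base for t, value in levels.items()}


# Hypothetical expression values keyed by hours post-stimulation (not real data).
raw_levels = {0: 2.0, 2: 3.0, 4: 5.0}
relative = relative_to_baseline(raw_levels)
```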

      (4) The data (Figures 3 and 4) show that LvHSF1 is necessary to survive WSSV infection at high temperatures but does not affect survival at lower temperatures, even though LvHSF1 limits VP28 levels, and viral load at both temperatures is confusing. Does this suggest that LvHSF1 is not primarily important for protection against the virus but instead, for protection from the heat-induced damage caused by high temperatures, which would not be surprising? The manuscript would benefit if the authors could address this point. How do the authors envision the protection conferred by LvHSF1 only at high temperatures?

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection.

      Notably, the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. When infected with WSSV, shrimp use behavioral fever to elevate their body temperature (~32 °C), thereby inhibiting WSSV infection (Rakhshaninejad et al., 2023; Xiao et al., 2024). This temperature (~32 °C) does not cause heat-induced damage to the shrimp. Our results demonstrate that febrile temperatures induce HSF1, which in turn upregulates antimicrobial peptides (AMPs) that target viral envelope proteins and inhibit viral replication.

      Only at high temperatures did we observe that knockdown of HSF1 affected the shrimp survival rate (Figure 4A). Thank you again for your valuable feedback.

      Reference:

      Rakhshaninejad, M., Zheng, L., Nauwynck, H., 2023. Shrimp (Penaeus vannamei) survive white spot syndrome virus infection by behavioral fever. Sci Rep 13, 18034.

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (5) Related to the previous comment, the authors do not clearly distinguish between basal effects of LvHSF1 or nSWD induction and heat-induced effects and the differences related to the requirement of LvHSF1 for protection. Simply increasing LvHSF1 levels can result in increased nSWD. SWD levels increase upon WSSV infection even at 25 ℃, and the knockdown experiments suggest that this could also occur through LvHSF1. It would be useful to explicitly differentiate between basal functions of HSF1 and induced functions.

      Thank you for your suggestion. In previous responses, we have distinguished between basal effects of LvHSF1 or nSWD induction and heat-induced effects.

      Following your suggestion, we injected GST or rHSF1 protein into shrimp. The results showed that recombinant HSF1 protein significantly induced the expression of SWD (Supplementary Fig. 5C). Further, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results as Supplementary Figures 5C and 5D and added a description in Lines 253-255 and Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Two temperatures are used in the experiments of shrimp. It seems that HSF1 is also upregulated by WSSV infection at 25 ℃. However, this upregulation seems not to be able to protect the animals. The authors compare the infection at 25 and 32 ℃ but did not discuss the findings.

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection. We have added a discussion of this finding in Lines 461-464 in the revised manuscript. Thank you.

      (2) In the abstract the authors say that "These insights provide new avenues for managing viral infections in aquaculture and other settings by leveraging environmental temperature control." However, this point has not been discussed in the main text.

      We appreciate your comments. We have added a discussion of environmental temperature control in Lines 512-514 of the revised manuscript. Thank you.

      (3) Line 142: "These results suggest that LvHSF1 may play a key role in enhancing shrimp resistance to WSSV at elevated temperatures." Although this type of conclusion has been made in many studies, I think it is impossible to see a "KEY role" based mainly on change in expression.

      Thank you for your suggestion. We have revised this conclusion in the revised manuscript. Thank you.

      (4) Section 2.1 Induction of Heat Shock Factor 1 in Response to WSSV at High Temperature

      Figure 1. Identification of HSF1 as a key factor induced by high temperature.

      The two titles are confusing. Whether the upregulation of HSF1 is a response to high temperature or WSSV infection? I think it is more likely a response to high temperature. Did the authors see the difference in HSF1 expression in shrimp with and without WSSV infection at high temperatures?

      Thank you for your comment. We have modified the title of Section 2.1 in the revised manuscript. Following your suggestion, we measured the expression of LvHSF1 after WSSV challenge at high temperature (32 ℃); the results are shown in revised Figure 2F-2H (Line 122 of the revised manuscript). They demonstrate that the expression of LvHSF1 is strongly induced by WSSV stimulation at high temperature (32 ℃). Thank you.

      (5) Figure 2. Upregulation of LvHSF1 in shrimp challenged by WSSV at both low and high temperatures. Results for WSSV challenge at high temperatures are not included in this figure.

      Thank you for your suggestion. Following it, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃); the results are shown in revised Figure 2C-2H. They demonstrate that the expression of LvHSF1 is strongly induced by Poly (I: C) and WSSV stimulation at high temperature (32 ℃). We have added a description in Lines 168-179 of the revised manuscript. Thank you.

      (6) Section 2.2 Expression Profiles of LvHSF1 in Shrimp Under Varied Temperature Conditions and WSSV Challenge. Did the authors try poly IC and WSSV challenge at 32℃, and compare with the un-challenged group? Why was only low temperature analyzed?

      Thank you for your suggestion. Following it, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃); the results are shown in revised Figure 2C-2H. We have added a description of these results in Lines 168-179 of the revised manuscript. Thank you.

      (7) Figure 2: Please indicate the temperature used in C-E and F-H in the figure legend. Statistical significance: compared with which group? Please provide information in the legend or show it in the bar chart.

      Thank you for your suggestion. We have added the temperature used to the legend of revised Figures 2C-2E. The expression changes of HSF1 were compared with those of the PBS control group at the corresponding time point, and we have revised how statistical significance is indicated in revised Figures 2C-2E. Thank you.

      (8) Figure 3H: There are two groups (dsGFP+PBS; dsHSF1+PBS) showing with the same symbol (dot line).

      Thank you for your comment. The revised Figure 3H has used different symbols to distinguish the two groups. Thank you.

      (9) Line 205: qPCR

      Thank you for your careful checks. We have corrected this error in the revised manuscript. Thank you.

      (10) Figure 5d and f: Please indicate the sample in each row.

      Thank you for your suggestion. We have marked the samples in each row in the revised Figures 5d&5f.

      (11) Figure 3 and Figure 4: Why different tissues were analyzed in the two experiments? Low temperature: gill and hemocytes. High temperature: gill and muscle? It is better to use the same tissues so that they can be compared. Please indicate the tissue analyzed in D and d.

      Thank you for your suggestion. We have repeated the experiment to measure the WSSV copy number in hemocytes at high temperature (32 °C) after LvHSF1 knockdown. The results showed that LvHSF1 knockdown increased viral loads in shrimp hemocytes (Figure 4C). We have added the tissue information to Figures 4D and 4d. Thank you.

      (12) Figure 2A The time for temperature treatment? hours or days?

      Thank you for your comment. Figure 2A shows the transcriptional expression of LvHSF1 in different tissues of healthy shrimp subjected to low (25 °C) and high (32 °C) temperatures for 12 hours. We have added this information to the legend of Figure 2A in Lines 840-841 of the revised manuscript. Thank you.

      (13) Line 249: purified by SDS-PAGE gel?

      Thank you for your comment. We have modified this description in Lines 272-274 in current manuscript. Thank you.

      (14) Line 258 "Next, to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperature". I think it is confusing to use "mediated" here. It seems that HSF1 is downstream of nSWD. Actually, HSF1 controls the expression of nSWD and thus regulates the anti-WSSV effect of shrimp at high temperatures.

      We appreciate your comments. We have modified this description in Lines 282-283 of the current manuscript. Thank you.

      (15) Line 458 "The most probable anti-WSSV mechanism of nSWD is its direct interaction with WSSV envelope proteins VP24 and VP26, potentially inhibiting viral entry into target cells." I suggest the author analyze the entry of WSSV to see whether nSWD blocks this process.

      Thank you for your comment. In general, the antimicrobial mechanism of action of AMPs is thought to involve direct membrane disruption, especially for enveloped viruses (such as WSSV) (Wilson et al., 2013).

      We thank the reviewer for these valuable comments. Our manuscript mainly focuses on the febrile temperature-inducible HSF in host antiviral immunity, and on the role of HSF1 in regulating antimicrobial effectors (such as SWD). Given the length limits of the manuscript, we will investigate the specific anti-WSSV mechanism of SWD in future studies. Thank you.

      Reference:

      Wilson, S.S., Wiens, M.E., Smith, J.G., 2013. Antiviral Mechanisms of Human Defensins. Journal of Molecular Biology 425, 4965-4980.

      (16) Line 435-456 The author discusses the difference between two shrimp species. Did the two studies measure the same immune parameters? I wonder whether the different observation is due to true differences or different methods they used to evaluate the response. If no immune response was promoted in the previous study, what's the possible anti-viral mechanism?

      We appreciate your comments. First, the shrimp species in the two studies differ in their temperature adaptability: the optimal water temperature for M. japonicus growth ranges from 25 to 32 °C, whereas the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. Second, the environmental factors differ between the two studies. Ammonia is a key stress factor in aquatic environments that usually increases the risk of pathogenic diseases in aquatic animals, whereas high temperature (32 °C) has been shown to inhibit the replication of WSSV and reduce mortality in WSSV-infected shrimp. Third, the two studies tested different immune indicators. Ammonia-induced Hsf1 suppressed the production and function of MjVago-L, an arthropod interferon analog, whereas our findings reveal the molecular mechanism through which the HSF-AMPs axis mediates febrile temperature-induced host resistance to viruses. Taken together, the benefits of HSF1 can accrue to either the host or the pathogen, depending on the nature and context of the host-virus-environment interaction.

      (17) Line 472 "directly bind to WSSV envelope proteins and inhibit WSSV proliferation"

      I think it is confusing to use "proliferation" here. It seems that the binding of HSF affects the replication process. However, based on the authors' discussion, HSF may likely block viral entry.

      Thank you for your suggestion. We have modified this description in Lines 505-507 in the current manuscript. Thank you.

      Reviewer #3 (Recommendations for the authors):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved. Following are my specific concerns.

      Major comments

      (1) The study design is pretty good, but the logical flow is not. The following should be improved.

      (a) In Figure 1, the reason for selecting HSF1 as the focus of the study is not clearly explained.

      Thank you for your comment. In a previous study, we showed that heat shock proteins play a significant role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) (Xiao et al., 2024). GO functional enrichment analysis of DEGs between Group TW and Group W indicated that most DEGs were involved in biological processes such as protein refolding, chaperone-mediated protein folding, and heat response. Therefore, we paid special attention to heat shock factor 1 (HSF1), the master regulator of the heat shock response. We have added this description in Lines 136-138 of the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (b) As the authors draw models in Figure 9, the established activation mechanism of HSF1 is via trimerization by the release of HSP90, which binds to misfolded proteins under stress conditions, such as heat shock. Therefore, the increase in the HSF1 mRNA level in Figure 1 is strange. The authors need to clarify this issue by explaining this established activation mechanism of HSF1 and also must provide the basis of upregulation of HSF1 by mRNA increase via citing papers in the Introduction.

      We appreciate your comments. Under non-stress conditions, HSF monomers are retained in the cytoplasm in a complex with HSP90. During a stress response, such as to high temperature, HSF dissociates from the complex, trimerizes, and converts into a DNA-binding conformation that acts through regulatory upstream promoter elements known as heat shock elements (HSEs) (Andrasi et al., 2021). Previous studies have demonstrated that the expression of HSF1 is markedly induced by stress, such as high temperature (Ren et al., 2025), virus infection (Merkling et al., 2015), and ammonia stress (Wang et al., 2024). Our results also showed that the expression of LvHSF1 was significantly induced by WSSV infection and high temperature (Figure 2). Therefore, the increase in the HSF1 mRNA level in Figure 1 is not surprising.

      In response, we have revised the proposed model to better reflect our experimental findings and the accompanying description. This revision ensures that the schematic is consistent with our data and accurately represents the proposed mechanism. We appreciate your careful review and constructive feedback.

      Reference:

      Andrasi, N., Pettko-Szandtner, A., Szabados, L., 2021. Diversity of plant heat shock factors: regulation, interactions, and functions. J Exp Bot 72, 1558-1575.

      Ren, Q., Li, L., Liu, L., Li, J., Shi, C., Sun, Y., Yao, X., Hou, Z., Xiang, S., 2025. The molecular mechanism of temperature-dependent phase separation of heat shock factor 1. Nature Chemical Biology.

      Merkling, S.H., Overheul, G.J., van Mierlo, J.T., Arends, D., Gilissen, C., van Rij, R.P., 2015. The heat shock response restricts virus infection in Drosophila. Sci Rep 5, 12758.

      Wang, X.X., Zhang, H., Gao, J., Wang, X.W., 2024. Ammonia stress-induced heat shock factor 1 enhances white spot syndrome virus infection by targeting the interferon-like system in shrimp. mBio 15, e0313623.

      (c) For RNA seq analysis in both in Figures 1 and 5, they need to provide changes in conventional HSF1 target chaperones (many HSPs) to validate their RNA seq data.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed by RNA-seq that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in the comparison between Group TW and Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024). We have added this description in Lines 136-138 of the revised manuscript.

      For Figure 5, we have added to Supplementary table 2 the heat shock proteins among the downregulated DEGs identified by transcriptome sequencing of dsGFP + WSSV (32 ℃) vs. dsLvHSF1 + WSSV (32 ℃). The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added this description in Lines 213-216 of the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (d) In Figure 5, they did experiments by focusing on the changes by HSF1 knockdown at 32 ℃. However, the logical flow should be focusing on genes whose expression was increased by 32 ℃ compared with 25 ℃ (in figure 1), among them they need to characterize HSF1 target genes. Here as mentioned above, classical HSP genes must be included in addition to those AMP genes.

      Thank you for your suggestion. Following it, we have added to Supplementary table 2 the heat shock proteins among the downregulated DEGs identified by transcriptome sequencing of dsGFP + WSSV (32 ℃) vs. dsLvHSF1 + WSSV (32 ℃). The RNA-seq results showed that the classical heat shock proteins were downregulated, underscoring the reliability of our transcriptomic findings. We have added this description in Lines 213-216 of the revised manuscript. Thank you.

      (e) What is the logical basis of just picking nSWD? It is another example of cherry-picking similar to picking HSF1 in Figure 1.

      We appreciate your comments. To determine how temperature-induced LvHSF1 restricts WSSV infection, RNA-seq was performed to identify target genes regulated by HSF1. By analyzing the differentially expressed genes (DEGs), we screened eight candidate immune effector proteins, including SWD, CrustinⅠ, C-type lectin, Anti-lipopolysaccharide factor (ALF), and Vago. CrustinⅠ has been shown to play an important role in antiviral immunity (Li et al., 2020); C-type lectin (CTL1) can bind to VP28, VP26, VP24, VP19, and VP14, thereby inhibiting WSSV infection (Zhao et al., 2009); Anti-lipopolysaccharide factor (ALF3) exerts its anti-WSSV activity by binding to the envelope protein WSSV189 (Methatham et al., 2017); and Vago can inhibit WSSV infection by activating the Jak/Stat pathway in shrimp (Gao et al., 2021). In contrast, the detailed mechanism by which SWD acts against WSSV was unclear, so we paid particular attention to SWD. We have added this description in Lines 215-220 of the revised manuscript. Thank you for your valuable comments, which have improved the logic of the manuscript.

      Reference:

      Li, S., Lv, X., Yu, Y., Zhang, X., Li, F., 2020. Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei, Marine Drugs 18, 361.

      Zhao, Z.Y., Yin, Z.X., Xu, X.P., Weng, S.P., Rao, X.Y., Dai, Z.X., Luo, Y.W., Yang, G., Li, Z.S., Guan, H.J., Li, S.D., Chan, S.M., Yu, X.Q., He, J.G., 2009. A novel C-type lectin from the shrimp Litopenaeus vannamei possesses anti-white spot syndrome virus activity. Journal of Virology 83, 347-356.

      Methatham, T., Boonchuen, P., Jaree, P., Tassanakajon, A., Somboonwiwat, K., 2017. Antiviral action of the antimicrobial peptide ALFPm3 from Penaeus monodon against white spot syndrome virus. Dev Comp Immunol 69, 23-32.

      Gao, J., Zhao, B.R., Zhang, H., You, Y.L., Li, F., Wang, X.W., 2021. Interferon functional analog activates antiviral Jak/Stat signaling through integrin in an arthropod. Cell Rep 36, 109761.

      (f) Likewise, choosing Atta in S2 cells needs logic.

      We appreciate your comments. Our manuscript revealed that febrile temperature-inducible HSF1 confers virus resistance by regulating the expression of antimicrobial peptides (AMPs) in L. vannamei. We further wanted to know whether HSF1 regulation of antimicrobial peptides is a conserved defense mechanism induced by elevated temperature in arthropods, so experiments were performed in an invertebrate model system (Drosophila S2 cells). A previous study showed that DmAMPs (such as Attacin A, Cecropin A, Defensin, Metchnikowin, and Drosomycin) play a significant role in antiviral immunity in Drosophila (Zhu et al., 2013). Our results showed that the expression of Attacin A, Cecropin A and Defensin was markedly induced by DmHSF, with Attacin A showing the strongest induction. Therefore, DmAtta was chosen as a representative to further demonstrate that DmHSF1 exerts its anti-DCV function by regulating DmAMPs. We have added this description in Lines 328-330 and Lines 361-364 of the revised manuscript. Thank you for your valuable comments, which have improved the logic of the manuscript.

      Reference:

      Zhu, F., Ding, H., Zhu, B., 2013. Transcriptional profiling of Drosophila S2 cells in early response to Drosophila C virus. Virol J 10, 210.

      (2) From Figure 6I to 6K, the authors aimed to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperatures. However, what they showed was only that nSWD exerts its anti-WSSV function downstream of HSF1. The authors should show additional data for dsControl+rnSWD.

      Thank you for your suggestion. Following it, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group at 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results as Supplementary Figure 5C and 5D and added a description in Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      (3) For the physical interaction between nSWD and WSSV, it will be great if the authors perform Alphafold3 prediction analysis (Abramson et al PMID: 38718835).

      Thank you for your suggestion. Following it, we performed AlphaFold3 prediction analysis on SWD and the WSSV proteins VP24 and VP26. The predicted template modeling (pTM) score measures the accuracy of the entire structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. The AlphaFold3 predictions suggest a possible interaction between SWD and these WSSV proteins. Notably, our manuscript demonstrated by pulldown assays and confocal analysis that rSWD can interact with VP24 and VP26.

      Author response image 3.

      AlphaFold3 prediction analysis of the SWD and VP24 complex (pTM = 0.64).

      Author response image 4.

      AlphaFold3 prediction analysis of the SWD and VP26 complex (pTM = 0.53).

      Minor comments

      (1) In the Abstract and many other places, the authors need to specifically write "Drosophila S2 cells" instead of "Drosophila" because conventionally Drosophila implies fruit fly as an organism. We don't say cultured human cells as "human" or "Homo sapiens" in papers.

      Thank you for your suggestion. We have modified the description of Drosophila in the revised manuscript. Thank you.

      (2) Figure numbers can be reduced for better readability. I would combine Figures 1 and 2, and Figures 3 and 4. If the combined figures are too crowded, some panels can go into supplementary figures.

      Thank you for your suggestion. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. However, because we have added experimental data to Figures 1, 2, 3, and 4, we did not combine Figures 1 and 2 or Figures 3 and 4. Thank you.

      (3) One of the best-understood roles of HSF1 in physiology other than heat shock response is longevity, in particular with C. elegans. The authors need to mention this in the Discussion by citing the following recent review paper (Lee PMID: 36380728).

      Thank you for your suggestion. We have supplemented the description of HSF1 regulating longevity and aging of organisms and cited the above reference in the revised manuscript (Lee and Lee, 2022). Thank you.

      Reference:

      Lee, H., Lee, S.V., 2022. Recent Progress in Regulation of Aging by Insulin/IGF-1 Signaling in Caenorhabditis elegans. Mol Cells 45, 763-770.

      (4) Please make your own label for small letter panels or transfer small letter panels to supplementary figures.

      Thank you for your suggestion. We have adjusted the relevant letter labels. In the revised manuscript, uppercase letters label the main panels of each figure, and lowercase letters label the corresponding supplementary information. Thank you.

      (5) In the Introduction, I recommend replacing the references for HSFs and HSR with recent ones.

      Thank you for your suggestion. We have added the latest references for HSFs and HSR in the Introduction part of the revised manuscript. Thank you.

      (6) In Figure 1, it is not intuitive to understand the name groups W and TW.

      We appreciate your comments. We have added a description of Group W and Group TW to the revised Figure 1. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing. Thank you.

      (7) Please add some kinds of sequence comparisons of SWD and nSWD for readers to understand the homology.

      We appreciate your comments. We have added a multiple sequence alignment of SWD proteins from shrimp species as the revised Supplementary Figure 3. Highly conserved amino acid residues and cysteine residues are highlighted in red, indicating that LvSWD is a conserved antimicrobial peptide of the Crustin family. Thank you.

      (8) Naming nSWD with "newly identified" is strange as it will not be new anymore as time goes by. Please change the name.

      Thank you for your suggestion. We have modified the name of nSWD to SWD in the revised manuscript. Thank you.

      (9) Please write the full name for Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), ds (double-stranded) before using LvHSF1, DmHSF1, and dsLvHSF1.

      Thank you for your comments. We have added the full name of LvHSF1, DmHSF1, and dsLvHSF1 in the revised manuscript. Thank you.

      (10) In Figure 2, it will be better to transfer poly I:C data to supplementary figures.

      Thank you for your comments. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. Thank you.

      (11) The label for pGL3-nSWD-M12 is confusing. M1 and M2 are OK. Please change M12 with M1/2 or another one.

      Thank you for your suggestion. We have changed pGL3-nSWD-M12 to pGL3-nSWD-M1/2 in the revised manuscript. Thank you.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This article presents useful findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

      We thank the editor and reviewers for their consideration of our revised manuscript and for their constructive suggestions. In response to the editor’s guidance, we have ensured that: 1) the experimental design is clearly presented as physiological forcing, 2) the Solstice-as-Phenology-Switch concept is explicitly defined, limited, and framed as inferred, 3) conclusions are strictly aligned with the scope of the evidence, and limitations are acknowledged transparently.

      We hope these revisions fully address the remaining concerns and clarify both the conceptual framework and the appropriate scope of inference.

      Public Review:

      Reviewer #1 (Public review):

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      We agree that the concept of “flexibility” required a clearer definition and a more explicit link to the experimental results. In the Introduction, we now explicitly define flexibility as the capacity for the effective timing of the phenological switch to shift earlier or later depending on developmental progression, rather than occurring at a fixed calendar date. This switch occurs at the compensatory point between the antagonistic influences of early-season development [ESD effect] and late-season temperature [LST effect] (L92-98). We have extended and clarified our explanation of the summer solstice’s role in this framework (L69-90). We propose that the solstice acts as an environmental switch that initiates the LST effect, as declining daylengths signal trees to become responsive to late-season cooling (L92-94). The compensatory point then occurs where the advancing ESD effect is balanced by the delaying LST effect. This point should therefore not be fixed to a calendar date but instead vary with developmental progression each year (L75-95).

      In the Discussion, we clarify that flexibility is demonstrated experimentally by the observation that the magnitude of July cooling effects (LST effect) on autumn phenology depends on prior developmental rate (ESD effect) [3.4 times greater delay in late-leafing trees], indicating that the position of the compensatory point is development-dependent rather than fixed to June 21 (L398-410). We have made consistent edits throughout the Discussion, in particular in the ‘Support for the Solstice-as-Phenology-Switch Hypothesis’ subsection (L514-530).

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      We fully agree and have clarified this in the revised manuscript. In the Discussion, we now clearly state that the compensatory point is a conceptual node inferred from responses to cooling before the solstice (June), directly after it (July), or later in the growing season (August) rather than a directly observed phenological event (L352-358 & L405-406).

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency.

      This inconsistency reflects biological complexity. In the Discussion, we now expand our interpretation to note that terminal and lateral buds may differ in developmental status, resource allocation and hormonal context. We emphasize that bud-type effects are therefore expected to be context-dependent and to interact with whole-plant developmental state, which plausibly explains why effects differ across leafing groups and models (L390-396).

      In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

      We now discuss the explained and unexplained variance in more detail. We also make it clear that our experiment was designed to test specific mechanistic pathways rather than to fully explain all phenological variability or maximise predictive power (L417-419).

      In the Discussion, we acknowledge that a substantial fraction of variation remains unexplained (L419-421). We discuss the possibility of other physiological mechanisms, such as photosynthetic assimilation, contributing to the unexplained variation (L421-427). However, large inter-individual variability is commonplace in autumn phenology. A low intra-class correlation coefficient (ICC = 0.26; see L276-280 for methods) suggests much of the remaining variation is attributable to individual-level differences rather than missing explanatory variables (L429-431). In line with the literature, we suggest that genetic and epigenetic differences likely contributed significantly to inter-individual variation, even within a single provenance population (L431-434). In this context of high individual variability, leaf-out timing (ESD effect) and summer cooling treatment (LST effect) together explaining 23.4% of variation in bud set timing is biologically meaningful and demonstrates the mechanistic importance of these processes (L438-441). For completeness, we also briefly discuss alternate sources of within-treatment variability (L434-437).

      Reviewer #2 (Public review):

      I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them.

      We appreciate this concern and have substantially revised the manuscript to clarify the experimental logic. In the Introduction, we now state explicitly that the study uses temperature regimes that were designed as strong physiological forcing treatments, intended to deeply constrain development and isolate mechanisms rather than to simulate natural or future climatic conditions (L113-115).

      In the Methods, we have enhanced our description of the non-linear effects of temperatures below 10°C on physiological processes (L154-158).

      At the start of the Discussion, we have added a dedicated paragraph clarifying the scope of inference: the experiment tests causality and constraint (i.e. whether specific physiological processes can drive phenological shifts), not quantitative responses under realistic climate scenarios (L346-363). Throughout the Discussion, we have revised language that could be read as scenario-based interpretation, replacing it with mechanistic phrasing.

      Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

      Given the large individual variation expected in phenological experiments, we used single experimental populations of single-provenance beech saplings to minimise uncontrolled variation arising from genetic differences (L358-360). This allowed us to elucidate mechanisms despite the noisy biological heterogeneity associated with phenology.

      In the last round of revision, we toned down statements of generalisation. In the Discussion, we now go further to clarify what mechanistic understanding can be gleaned directly from our findings and then cautiously suggest how these mechanisms may play out in natural systems. We repeatedly state the intention of the study as mechanistic inference rather than predictive power, e.g. “However, extrapolations to more complex natural ecosystems should be made with caution as our experimental design prioritised mechanistic inference over generalisability and predictive power.” (L417-419). Alongside our previous calls for tests on other species, we now additionally call for tests on other provenances of beech (L511-512).

      I was also very concerned by the revisions.

      If this concern stems from the confusion regarding line numbers and the two submitted versions of the manuscript (with and without tracked changes, as required by eLife), then we hope that situation is now clarified. Otherwise, we do not understand why our previous revisions would be perceived as concerning. Regardless, we have made every attempt to address the remaining comments comprehensively.

      Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      We appreciate this important conceptual point. The Solstice-as-Phenology-Switch hypothesis is central to our conceptual model and therefore requires clear explanation. In concert with our changes in response to Reviewer 1’s comment regarding flexibility, we have substantially revised and improved our description of this hypothesis (L69-108).

      Whilst the summer solstice is fixed to a calendar date (June 21), the time at which trees change their autumn phenological responses to temperature is not (L88-90 & L515-517). This change occurs when the compensatory point of two antagonistic effects is crossed. Higher early-season development rates (which are driven by temperature) have an advancing (negative) effect on autumn phenology, which we now refer to as the ESD effect (L71-78). Warmer late-season temperatures have a delaying (positive) effect because trees become phenologically susceptible to cooling, i.e. overwintering responses are induced in response to cooling, which we now refer to as the LST effect (L78-82). The point in time when these two effects balance each other out, i.e. the net effect = 0, is the compensatory point (L95-97 & L523-525). This point occurs after the solstice because the LST effect only becomes active when days begin to shorten (L92-94 & L522-523). The solstice acts as an environmental switch, initiating trees’ susceptibility to cooling. Therefore, the solstice is referenced in the hypothesis because it forms a daylength barrier. In this framework, the compensatory point cannot occur earlier than the solstice because day lengths are still increasing (L517-519).
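
      The balance between the two effects can also be written compactly. The following is only an illustrative formalisation and is not taken from the manuscript: the symbols $t$ (day of year), $t_s$ (summer solstice), $t_c$ (compensatory point), $d$ (developmental state), $S_{\mathrm{ESD}}$ and $S_{\mathrm{LST}}$ are notation assumed here for the sketch, with negative values denoting an advance and positive values a delay of autumn phenology.

      ```latex
      % Illustrative notation only (assumed for this sketch, not from the manuscript).
      % S: net effect of temperature experienced at time t on autumn phenology timing.
      \begin{align}
        S(t, d) &= S_{\mathrm{ESD}}(t, d) + S_{\mathrm{LST}}(t, d), \\
        S_{\mathrm{ESD}}(t, d) &\le 0
          && \text{(advancing effect of early-season development)}, \\
        S_{\mathrm{LST}}(t, d) &\ge 0, \quad
        S_{\mathrm{LST}}(t, d) = 0 \ \text{for } t \le t_s
          && \text{(delaying effect, active only after the solstice)}, \\
        S\bigl(t_c(d), d\bigr) &= 0, \quad t_c(d) \ge t_s
          && \text{(compensatory point)}.
      \end{align}
      ```

      In this sketch the net effect before $t_s$ consists of the advancing term alone, so the sign change can only occur at or after the solstice, while its exact date $t_c(d)$ shifts with developmental state rather than being tied to June 21.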

      In the Introduction and Discussion, we clarify that the solstice is referenced as a biologically meaningful photoperiodic cue, not as a fixed threshold date. We now emphasise that the hypothesis concerns a seasonal reversal in responses to temperature structured around photoperiod, whose effective timing depends on developmental state, rather than a reversal occurring precisely on June 21. To avoid confusion, we have reworded phrases such as “summer solstice effect reversal” to “reversal of phenological responses to temperature after the summer solstice” (L371). In accordance, we have also changed the title to “Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice”.

      The following comments stem from the first round of review. We have previously revised the manuscript in accordance with these comments. For most of these points we do not see further cause for changes except for any overlap with comments above. We therefore predominantly copy our previous responses in quotes for clarity, the exception being the comment regarding the framing of our results in relation to natural systems.

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper.

      “We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.”

      The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe in which I have worked. For example a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We appreciate the reviewer’s concern regarding the use of relatively extreme temperature treatments and the need to ensure that our conclusions are consistent with the motivation for using them. The manuscript was also revised in this regard in the previous round, and we copy the relevant responses at the bottom of this response. Despite this, we agree that further explanation of how our experimental treatments suited the aims of our study was still required.

      The aim of these treatments was not to reproduce typical ambient conditions, but to act as a mechanistic probe. Such mechanisms are not readily identifiable from observations or mild manipulations, because the expected effects are small relative to natural variability; stronger perturbations are therefore required to generate a diagnostic contrast. By strongly constraining development in the early-season, and by providing a robust cooling signal in the late-season, we sought to reveal the causal structure underlying the observed solstice-related reversal in temperature effects on autumn phenology.

      Temperatures below 10°C strongly slow cell division and mitotic rates; these rates then rapidly and non-linearly approach 0 as temperatures drop towards 0°C (Körner, 2021). As reflected in L152-158 of the revised manuscript, we selected a spring cooling regime of 2–7 °C to strongly slow developmental processes while maintaining a clear thermal safety margin that eliminates the risk of frost damage. Although a milder cooling regime (e.g. 5–10 °C) would be less extreme, it would also be expected to produce only a comparatively small reduction in developmental rates, thereby substantially reducing our ability to generate distinct early- and late-developing individuals and to detect carry-over effects on autumn phenology. Applying strong cooling therefore increases signal-to-noise and allows us to detect the underlying mechanism, which would not be possible with temperature treatments that represent average contemporary climatic variation.

      The use of conditions outside the norm is standard practice for elucidating mechanisms in ecology, where organisms are often pushed to their physiological limits or transplanted into environments fundamentally different from those to which they are adapted (Somero, 2010; Berend et al., 2019). Experiments targeting autumn phenology have utilised a broad range of environmental conditions from moderate to extreme manipulations (Tanino et al., 2010). For example, to test the controls of growth cessation and dormancy induction in Prunus species, one study applied a range of treatments including constant 9°C temperature and 24 hour photoperiod between April and July (Heide, 2008).

      Our experimental design aimed to reduce rates of development, cell division and maturation. In the Methods, we describe this aim and clearly state that the experimental design was not intended to mimic natural climatic variation (L154-156 & L181-186). Importantly, our conclusions are framed at the level of direction, timing, and interaction of effects, rather than the magnitude expected under contemporary or future field conditions (L360-363).

      This framing intends to reflect the primary inference of this study, which concerns when and why temperature effects reverse around the solstice, and how this timing depends on developmental state and diel temperature exposure, rather than making quantitative predictions for present-day or future climates. This aligns our conclusions with the experimental design. We have further revised the Discussion to explain these aims and conclusions more clearly, including the addition of a subsection at the beginning titled “Experimental forcing and scope of inference” (L346-363). We have also set up this expectation in the Introduction (L113-115).

      Additionally, we have improved the Discussion in a number of related aspects.

      We explicitly separate mechanistic conclusions and any relation to natural systems, remaining cautious to not overgeneralise or overstate our findings (L417-419).

      We now include a dedicated paragraph explaining that, although these specific conditions are not likely to be found in beech’s range, analogous developmental constraints can arise during cold springs, late cold spells following budburst, or at high-elevation and continental sites where temperatures remain low despite increasing photoperiod (L540-545, L583-588). We further explain that because developmental progression integrates temperature cumulatively over time, even short episodes of strong cooling can exert lasting carry-over effects on seasonal timing, thereby linking the forced experimental responses to processes relevant under natural, fluctuating conditions (L545-550).

      We explicitly state that the decoupling of day and night temperatures was not intended to represent realistic meteorological states (L458-460). We explain that this design was used diagnostically to isolate inherently diel physiological processes (e.g. nocturnal growth, cell division and expansion versus daytime carbon assimilation), and that the observed responses demonstrate the importance of diel timing of temperature exposure rather than the realism of the imposed cycles (L460-468).

      Previous response:

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants. We have added text in the Methods to clarify this aim.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods and Discussion.

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions.

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that carbon assimilation is an important component of forest carbon dynamics. However, the primary aim of this study was to identify how developmental state and diel cycles mediate temperature effects on autumn phenology, rather than to quantify carbon assimilation per se. Assessing photosynthetic controls on autumn phenology would require a substantially different experimental design and is therefore beyond the scope of the present study.

      That said, we were able to include measurements of photosynthetic assimilation during pre-solstice cooling (now presented as Fig. S12 for all treatments). These data show that cooling strongly reduced assimilation across all treatments, despite their markedly different phenological outcomes. This supports our interpretation that variation in assimilation alone cannot explain the observed phenological responses, consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1, our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously and highlight the need for further research across species.

      And the referenced response to Reviewer one:

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. from “likely generalisable” to “other temperate tree species may display similarities”) and explicitly call for follow-up studies across species and forest contexts. At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      As described in responses above, we have further clarified what can be directly concluded from our study, avoiding overgeneralisation.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relate to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker. On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, budset occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R<sup>2</sup> = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech”.

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. Photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      Following the addition of an analysis of leaf senescence data, we also revised the terminology in places (including the title) from “primary growth cessation/bud set” to the broader term “autumn phenology.” This term is intended to encompass two distinct but related physiological processes—bud set and leaf senescence—both of which are commonly used as markers of autumn phenology and the end of the growing season.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9) and inferences are not altered. We also report the bud type effects for experiment 1 and experiment 2.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

      Our responses to the main comments in this new round of revision have comprehensively covered this topic.

      References

      Berend K, Haynes K, MacKenzie CM. 2019. Common garden experiments as a dynamic tool for ecological studies of alpine plants and communities in northeastern North America. Rhodora 121: 174.

      Heide OM. 2008. Interaction of photoperiod and temperature in the control of growth and dormancy of Prunus species. Scientia Horticulturae 115: 309–314.

      Körner C. 2021. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Cham: Springer International Publishing.

      Somero GN. 2010. The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. Journal of Experimental Biology 213: 912–920.

      Tanino KK, Kalcsits L, Silim S, Kendall E, Gray GR. 2010. Temperature-driven plasticity in growth cessation and dormancy development in deciduous woody plants: a working hypothesis suggesting how molecular and cellular function is affected by temperature during dormancy induction. Plant Molecular Biology 73: 49–65.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.

      We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaire scores to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.
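      The covariate analysis described above (PCA on correlated questionnaires, then a regression of a task measure on group plus the component scores) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the variable names, effect sizes, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X):
    """Return component scores and explained-variance ratios for the columns of X."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Xc @ Vt.T, explained

def ols(y, predictors):
    """Ordinary least squares with an intercept; returns coefficients and t-values."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Synthetic data (hypothetical): four highly correlated questionnaires driven by
# one latent severity factor, and a group label that is correlated with severity.
rng = np.random.default_rng(0)
n = 200
severity = rng.normal(size=n)
questionnaires = np.column_stack(
    [severity + 0.3 * rng.normal(size=n) for _ in range(4)]
)
group = (severity + rng.normal(size=n) > 0).astype(float)  # 1 = S+, 0 = S-
gambling = 0.5 + 0.1 * group + 0.02 * severity + 0.05 * rng.normal(size=n)

scores, explained = pca_scores(questionnaires)
beta, t_values = ols(gambling, np.column_stack([group, scores]))
# beta[1] is the group effect after controlling for all orthogonal components
```

      Because the component scores are mutually orthogonal, entering them together avoids the collinearity that the raw questionnaire totals would introduce.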

      Moreover, as Reviewer 3 pointed out, “absence of evidence” is not “evidence of absence”. Although we median-split patients by their general symptom scores (e.g., depression- and anxiety-related questionnaires) and found no significant differences between the resulting subgroups (Figure S11), we additionally computed Bayesian statistics for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.
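      To make the reading of BF<sub>01</sub> concrete, the sketch below computes a Bayes factor for a two-group comparison using the BIC approximation, BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> − BIC<sub>0</sub>)/2). The response does not state how the Bayes factors in Table S7 were obtained, so this approximation, the helper names, and the toy numbers are assumptions for illustration only.

```python
import math

def gaussian_bic(rss, n_obs, n_params):
    """BIC of a Gaussian model, up to an additive constant shared by both models."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

def bf01_from_bic(bic_null, bic_alt):
    """BIC approximation to the Bayes factor in favour of the null model."""
    return math.exp((bic_alt - bic_null) / 2.0)

def residual_ss(values, mean):
    """Sum of squared deviations of values from a given mean."""
    return sum((v - mean) ** 2 for v in values)

# Hypothetical gambling rates for low- and high-symptom halves of a median split
low = [0.52, 0.48, 0.55, 0.50, 0.47, 0.53]
high = [0.51, 0.49, 0.54, 0.50, 0.48, 0.52]

pooled = low + high
n = len(pooled)
# M0: one common mean; M1: a separate mean for each half
rss_null = residual_ss(pooled, sum(pooled) / n)
rss_alt = residual_ss(low, sum(low) / len(low)) + residual_ss(high, sum(high) / len(high))

bf01 = bf01_from_bic(gaussian_bic(rss_null, n, 1), gaussian_bic(rss_alt, n, 2))
# bf01 > 1 here: the extra mean in M1 is not worth its complexity penalty
```

      In this toy case the two halves have nearly identical means, so the penalty for the additional parameter dominates and the Bayes factor favours M<sub>0</sub>.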

      Beyond these specific findings, this work highlights the broader utility of computational modelling and mood to better understand behavioral effects, showing how to use both mood and choice data to better comprehend a psychiatric issue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.

      We thank the Reviewer for raising this concern. We agree that the sample size of the S<sup>-</sup> group (n=25) is modest, and that the prior studies we cited also acknowledged limited power. Our intention was to point out that our sample size is comparable to that of a prior study. In the revision, we have updated the section that justifies this sample size and now acknowledge the limited power of our study in the limitations section. Please see our clarification below:

      Page 32:

      “Third, despite replicating our main results in an independent dataset (n=747), the modest S<sup>-</sup> subgroup size (n=25) limits statistical power.”

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.

      Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, aiming to provide a mechanism-based target for suicide prevention. Therefore, the dependent variable in our mediation model was gambling behavior. We agree, however, that the clinically relevant question is whether suicidality can be predicted from task-derived behavior and parameters. We therefore used gambling behavior and the model-derived parameters to predict STB. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, predicted suicidal symptom scores among patients (former: β = 9.189, t = 2.004, p = 0.048; latter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, these predictions failed (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards: patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is now reasonable. Please see our revisions below:

      Page 17:

      “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”

      Page 19:

      “Furthermore, linear regression showed that approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”

      Page 21:

      “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”

      (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.

      Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values. Please see our clarification below:

      Supplementary Page 4:

      “Supplementary Note 8: Clarification for FDR correction.

      In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values.”
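      For readers unfamiliar with pooling all p-values into a single FDR correction, the step-up procedure can be sketched as below. The note above does not name the FDR method used; the Benjamini-Hochberg procedure is assumed here because it is the most common choice, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which tests survive Benjamini-Hochberg FDR control."""
    m = len(p_values)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[idx] = True
    return survives

# Hypothetical family of eight tests corrected together
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
survivors = benjamini_hochberg(p_values)
```

      The size of the family matters: the same raw p-value can survive in a small family and fail in a family of more than 150 tests, which is why stating exactly which tests were pooled is informative.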

      (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.

      Thank you for these important comments. This study focused on the cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. A group comparison, rather than a dimensional measure of suicidal symptoms from the Beck Scale for Suicidal Ideation, therefore answers our research question most directly.

      To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a×b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
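      The a×b indirect effect and its confidence interval reported above can be illustrated with a product-of-coefficients estimate and a percentile bootstrap. This is a sketch under stated assumptions: the response does not specify the estimation method or software, and the data, effect sizes, and function names below are hypothetical.

```python
import numpy as np

def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for the path x -> m -> y."""
    # Path a: slope of the mediator on the predictor
    design_a = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(design_a, m, rcond=None)[0][1]
    # Path b: slope of the outcome on the mediator, holding the predictor fixed
    design_b = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(design_b, y, rcond=None)[0][2]
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1.0 - level) / 2.0
    return np.quantile(estimates, [tail, 1.0 - tail])

# Synthetic data (hypothetical): group -> approach parameter -> gambling rate
rng = np.random.default_rng(1)
n = 500
group = (rng.random(n) < 0.4).astype(float)            # 1 = S+, 0 = merged controls
approach = 0.8 * group + 0.5 * rng.normal(size=n)      # true a = 0.8
gambling = 0.5 * approach + 0.1 * group + 0.5 * rng.normal(size=n)  # true b = 0.5

ab = indirect_effect(group, approach, gambling)
ci_low, ci_high = bootstrap_ci(group, approach, gambling)
```

      An interval that excludes zero is the usual criterion for a significant indirect effect, matching the way the intervals above are read.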

      We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:

      Page 15:

      “We next verified our results in an independent dataset, including the same task and BDI questionnaire in 747 participants from the general population (500 females; age: 20.90±2.41) (46). One item of the BDI measures STB. In item 9 of the BDI, participants chose the option that describes them best: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. In line with the current definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, we assigned participants choosing Option 2, 3, or 4 to the S<sup>+</sup> group and participants selecting Option 1 to the S<sup>-</sup> group.”

      Page 19:

      “Given significant correlations between group, approach parameter, and gambling rate for gain trials (ps < 0.017), we further conducted a mediation analysis, assuming that approach motivation mediates the effect of suicidality on risk behavior. Given that we aimed to test the effect of STB, with S<sup>-</sup> and HC as controls, and given that S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), confirming that suicidal thoughts and behavior increase risk behavior through stronger approach motivation.”

      Page 26:

      “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replicated result may also reflect sample-specific or unstable effects, which needs to be interpreted with caution.”

      (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?

      Thank you for pointing out the ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> vs. controls. We have clarified this point in the computational modeling section.

      Pages 12-13:

      “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. Risk attitude is indeed conceptualized in economics as the curvature of the utility function (i.e., the subjective value) of the objective outcomes, with concave curves associated with risk aversion, and convex curves associated with risk seeking (54,56). By contrast, the approach and avoidance biases apply regardless of the values at stake. A possible interpretation of the approach bias is that participants approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the highest potential losses (the lottery) in the loss frame.”
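      The distinction drawn above, between utility curvature (risk attitude) and a value-insensitive approach bias, can be made concrete with a toy gain-trial choice rule. The parameterisation below (power utility, softmax, and a bias that rescales the softmax probability) is one common form in this literature and is assumed here for illustration; the manuscript's exact equations are not reproduced in this response, and the parameter names are hypothetical.

```python
import math

def p_gamble(certain, win, alpha, mu, beta_gain):
    """Probability of choosing a 50/50 gamble (win or 0) over a certain gain."""
    u_certain = certain ** alpha       # curvature alpha encodes risk attitude
    u_gamble = 0.5 * win ** alpha      # the other gamble outcome is 0
    softmax = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    if beta_gain >= 0:
        # Value-insensitive pull toward the gamble, independent of the utilities
        return (1.0 - beta_gain) * softmax + beta_gain
    # Value-insensitive pull away from the gamble
    return (1.0 + beta_gain) * softmax

# Same risk attitude (alpha) and choice stochasticity (mu), different approach bias
neutral = p_gamble(certain=30, win=60, alpha=1.0, mu=0.1, beta_gain=0.0)
approach_biased = p_gamble(certain=30, win=60, alpha=1.0, mu=0.1, beta_gain=0.2)

# A concave utility (alpha < 1) lowers gambling through valuation instead
risk_averse = p_gamble(certain=30, win=60, alpha=0.8, mu=0.1, beta_gain=0.0)
```

      Two agents with identical `alpha` can therefore gamble at different rates purely because of `beta_gain`, which is the sense in which a higher gambling rate need not reflect a different risk attitude.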

      Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      (1) Large sample size.

      (2) Replication of their own findings.

      (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.

      Weaknesses:

      I can't really see any major weakness, but I have a few questions:

      (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show that not only the diagonal (i.e., same data as the scatter plots in S3) are high, but that the off-diagonals are low.

      Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).

      As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see our clarifications below:

      Supplementary Pages 2-3:

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”

      Page 10:

      “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies (34,35).”
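      The cross-correlation check described in the parameter-recovery note above can be sketched as follows. The example uses synthetic "true" and "fitted" parameters rather than the authors' simulations; the noise level, sample size, and function name are assumptions.

```python
import numpy as np

def recovery_matrix(true_params, fitted_params):
    """Pearson correlations between generating (rows) and recovered (columns) parameters."""
    n_params = true_params.shape[1]
    # Columns are variables; the result stacks [true, fitted] on both axes
    full = np.corrcoef(true_params, fitted_params, rowvar=False)
    # Keep only the true-by-fitted block
    return full[:n_params, n_params:]

# Synthetic recovery exercise (hypothetical): 300 simulated agents, 5 parameters,
# with fitted values equal to the generating values plus estimation noise
rng = np.random.default_rng(0)
true_params = rng.normal(size=(300, 5))
fitted_params = true_params + 0.2 * rng.normal(size=(300, 5))

R = recovery_matrix(true_params, fitted_params)
diagonal = np.diag(R)                          # same-parameter recovery
off_diagonal = R[~np.eye(5, dtype=bool)]       # trade-offs between parameters
```

      High values on the diagonal show that each parameter is recovered; low absolute values off the diagonal show that the fit does not trade one parameter off against another.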

      (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score, is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment? (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].

      Thank you for pointing out the lack of clarity. We performed the group-difference analysis and the correlation analysis with suicidal ideation separately. We first performed the group-difference analysis to test our hypothesis about the effects of STB. We then conducted the correlational analysis to further specify our findings.

      (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?

      Thank you for this insightful suggestion. As suggested, we tried to integrate mood into choice models by adding mood bias component(s) in line with previous literature (Vinckier et al., 2018). The first model (mcM1) assumes that mood biases choice, building on cM3 (the winning choice model). cmM2 further separated the mood bias parameter into two components according to participants’ choices.

      However, model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the lack of a block design in our experiment, unlike, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see our clarifications below:

      Supplementary Pages 3-4:

      “Supplementary Note 6: integration of mood into choice models

      Although we modeled choice and mood separately to examine cognitive and affective mechanisms underlying increased risk behavior in adolescent suicidal patients, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate at higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model).

      Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2).

      Model comparison using BIC supported cM3 (Table S6), that is, without consideration of mood in choice modeling. The mood bias parameters in neither cM2 nor cM3 reached significance (ps > 0.091), which may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015).”

      (4) In the large online sample, you split all participants into S+ and S-. I would have imagined that instead, you would do analyses that control for other clinical traits. Or, for example, you have in the S- group only participants who also have high depression scores, but low suicide items.

      Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. The results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect is only a trend here, the same effect, with depression-related questionnaires as covariates, is significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.

      Please see our clarifications below:

      Page 26:

      “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except a marginally significant effect of group on gambling behavior (p = 0.059). Despite a trend, this effect with covariates of depression-related questionnaires is strong in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”

      Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicidality.

      As pointed out, an absence of evidence cannot be taken as evidence of absence. Although we median-split patients by their general symptom scores (e.g., depression- and anxiety-related questionnaires) and found no significant differences between the resulting subgroups (Figure S11), we additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression did not, on the whole, influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
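      To illustrate how BF<sub>01</sub> is read, the sketch below uses the common BIC approximation to the Bayes factor. This is an assumption made for illustration only: the response does not state how the Bayes factors in Table S7 were computed, and the function name and BIC values are hypothetical.

```python
import math


def bf01_from_bic(bic_null: float, bic_alt: float) -> float:
    """Approximate BF01 (null over alternative) from two BIC values.

    Uses the approximation BF01 ~= exp((BIC_alt - BIC_null) / 2).
    A value above 1 means the data favor the null model M0.
    """
    return math.exp((bic_alt - bic_null) / 2.0)


# Hypothetical BIC values, not taken from the study.
bf01 = bf01_from_bic(bic_null=100.0, bic_alt=104.0)
favors_null = bf01 > 1.0
```

      With these made-up inputs the null model has the lower BIC, so BF<sub>01</sub> exceeds 1 and the evidence leans toward no group difference.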

      Please see our revisions below:

      Page 17:

      “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and various medications use (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given high correlations among anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) to extract main components, where each component explained 86.95%, 7.09%, 3.27%, and 2.68% variance, respectively. To further control for anxiety and depression, linear regression using these components as covariates revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”

      Pages 18-19:

      “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and various medications use (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on approach parameter remained significant (p = 0.027; Table S9).”

      Page 21:

      “Within patients, this group effect on βCR remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and various medications use (ps < 0.032), as well as general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”

      (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Thank you for this important suggestion. As suggested, we examined whether mood responses influence subsequent gambling choices and how to model this. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate in higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of a negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the absence of a blocked design in our experiment, unlike in, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Pages 3-4.

      (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      We apologise for the unclear statement. The approach bias is implemented in choice as a continuous value-independent effect, ranging from -1 to 1.

      It is true that mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, the mood parameters and the approach bias are both continuous.

      We also attempted to integrate mood into choice modelling. See Response 2 for Reviewer 3 for details.

      (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:

      Pages 3-4:

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale in the laboratory setting, represents the accumulation of the impact of multiple events at the scale of minutes(30,32,34-38). Momentary mood external validity is demonstrated e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and is more transient (e.g., from surprise to fear)(31-33,39).”

      We have corrected grammatical errors throughout the manuscript.

      (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Thank you for this comment. We agree that we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, which is outside the scope of this study, and it is indeed possible that the parameter estimates are somewhat noisy. We have therefore toned down the clinical relevance of our results. Please see our revision below:

      Page 32:

      “Next, we did not evaluate the noise in our estimate e.g., by assessing the test-retest reliability on the task parameters and it is indeed possible that parameter estimate is somehow noisy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".

      Thank you for this suggestion. We have now corrected it.

      (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".

      Thank you for this suggestion. We have now corrected it.

      (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.

      Thank you for this comment. Based on medical records and on information gathered from family and friends by the researcher and psychiatrists, patients with suicidal thoughts and behaviors were assigned to the suicidal group (S<sup>+</sup>), while patients without suicidal thoughts and behaviors were assigned to the control group (S<sup>-</sup>). Note that the medical records and information came from clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and asked both the patients and their families about suicide-related thoughts and behaviors. Therefore, the current grouping procedure is possibly comparable to the Columbia Suicide Severity Rating Scale.

      (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).

      Thank you for this suggestion. We have now corrected it.

      (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).

      Thank you for this suggestion. We have now corrected it. Please see below for our revision:

      Page 12:

      “Please note that V<sub>gain</sub> is 0 in gain trials and V<sub>loss</sub> is 0 in loss trials.”

      (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.

      Thank you for this suggestion. We have now removed plots for model prediction.

      (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.

      Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate)—but not earnings—predicted the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048; earnings, β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:

      Page 32:

      “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, given the task design of random feedback for gambling option. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”

      (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].

      Thank you for this suggestion. We have now corrected it to make it clear.

      (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1+ exp(-mu(U_gamble + beta_gain - U_certain)). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).

      Thank you for this suggestion. The winning choice model we used here was consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk-attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference.

      As suggested, we also compared the traditional bias choice model. Model comparison did not support this. Please see our revision below:

      Supplementary Page 4:

      “We also considered the traditional bias parameter (cM4), rather than approach/avoidance parameters. We limited the bias to the range of [-100, 100], which was in reward-equivalent units.

      However, model comparison did not support cM4 (Table S6).”
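      The two bias formulations discussed in point (9) can be contrasted in a short sketch. The additive form follows the reviewer's formula directly. The bounded form is an assumption based on the reviewer's description of the bias as a minimum or maximum gamble probability with β in [-1, 1]; the exact equations of cM3 are not reproduced in this response, and all function and argument names are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def p_gamble_additive(u_gamble, u_certain, mu, bias):
    """Additive bias: a constant in utility units added to the gamble."""
    return logistic(mu * (u_gamble + bias - u_certain))


def p_gamble_bounded(u_gamble, u_certain, mu, beta):
    """Bounded bias with beta in [-1, 1] (assumed form, see lead-in).

    beta > 0 sets a floor of beta on the gamble probability;
    beta < 0 sets a ceiling of 1 + beta on the gamble probability.
    """
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [-1, 1]")
    base = logistic(mu * (u_gamble - u_certain))
    if beta >= 0:
        return beta + (1.0 - beta) * base
    return (1.0 + beta) * base
```

      In the bounded form the bias acts regardless of the utility difference, which is what makes it value-insensitive; in the additive form its influence shrinks when the utility difference is large relative to the bias.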

      (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.

      Thank you for this suggestion. We have now simplified it. Please see response to Reviewer 2, point 3.

      (11) It is not clear what equations are applied to mixed trials in cM3.

      Sorry for the confusion. We have now clarified this point.

      Page 12:

      “Approach/avoidance parameters are not applied in mixed trials.”

      (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?

      Thank you for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Because likelihood ratio tests apply only to nested models, they could not be used to compare all models in our model space.
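      For the nested pair the reviewer mentions (mM3 obtained from mM1 by the single constraint beta_EV = beta_RPE), a likelihood ratio test would look roughly like the sketch below. It covers only the one-degree-of-freedom case, and the log-likelihood values are hypothetical placeholders, not values from the study.

```python
import math


def lrt_one_df(loglik_reduced: float, loglik_full: float):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic and its p-value under a chi-square
    distribution with 1 degree of freedom, whose survival function
    is erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model cannot fit worse than the reduced one")
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical summed log-likelihoods for the reduced and full models.
stat, p_value = lrt_one_df(loglik_reduced=-1200.0, loglik_full=-1195.0)
```

      Such a test would address only this one pair of models; ranking non-nested candidates still requires a criterion such as BIC.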

      (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.

      Sorry for the confusion. We have now corrected this phrase.

      (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?

      Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had already been established in the clinical dataset. Please see our clarification below:

      Page 15:

      “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”

      (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?

      Thank you for this suggestion. We have now corrected it to make it clear.

      (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.

      Thank you for flagging this. For illustration only, we binned the data proportional to group sizes: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).

      (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.

      Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.
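      The ranking argument can be checked with a small sketch: dividing every model's summed BIC by the same number of participants leaves the within-group order unchanged. The BIC values below are hypothetical, and the helper names are illustrative only.

```python
import math


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def rank_models(bic_by_model: dict) -> list:
    """Model names ordered from best (lowest BIC) to worst."""
    return sorted(bic_by_model, key=bic_by_model.get)


def per_subject(bic_by_model: dict, n_subjects: int) -> dict:
    """Rescale summed BICs by the number of participants in the group."""
    return {name: value / n_subjects for name, value in bic_by_model.items()}


# Hypothetical group-level BICs for one group of 58 participants.
summed = {"cM1": 5200.0, "cM2": 5150.0, "cM3": 5100.0}
same_order = rank_models(summed) == rank_models(per_subject(summed, 58))
```

      Per-subject scaling would matter only if the magnitudes were compared across groups of different sizes, which is the reviewer's concern.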

      (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2017, https://pubmed.ncbi.nlm.nih.gov/28678984/)

      Thank you for this comment. Indeed, Rutledge et al. (2017) showed evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, given that adolescence may provide a developmental window for early intervention in suicidality. Therefore, it is also possible that the current winning model is specific to adolescents. Please see our clarifications below:

      Page 28:

      “It is also possible that the current winning model was specific to adolescents. Given that Rutledge et al., (2017) supported the “CR-EV-RPE model” in adults with depression, our study with adolescent populations may suggest a developmental change for mood sensitivities.”

      (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?

      Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.

      For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. This approach is reasonable because, in our analyses, the winning model is chosen on the population-level dataset rather than on individual subjects.

      In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see our revisions below:

      Supplementary Pages 2-3:

      “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass correlation coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass correlation coefficients > 0.86). Moreover, we computed cross-correlations between all simulated (“generating”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
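      The population-level confusion matrix described for the model recovery analysis can be sketched as follows. Each input pair records which model generated a simulated dataset and which model won the group-level BIC comparison on it. The function name and the toy model labels are hypothetical; the fitting step itself is not shown.

```python
from collections import Counter


def confusion_matrix(pairs, models):
    """Row i, column j: share of datasets generated by models[i]
    for which models[j] was selected as the winning model."""
    counts = Counter(pairs)
    matrix = []
    for generating in models:
        total = sum(counts[(generating, winning)] for winning in models)
        row = [
            counts[(generating, winning)] / total if total else 0.0
            for winning in models
        ]
        matrix.append(row)
    return matrix


# Toy example: 4 simulated datasets per generating model.
pairs = [("A", "A")] * 3 + [("A", "B")] + [("B", "B")] * 4
matrix = confusion_matrix(pairs, ["A", "B"])
```

      Perfect recovery corresponds to an identity matrix; a subject-level variant would instead count the winning model for each simulated participant.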

      Typos:

      (1) Line 90: original → originate

      (2) Line 596-598 - the same phrase is repeated twice.

      (3) Line 616: on the other word → hand.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.

      Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:

      Pages 4-5:

      “Contemporary theories of suicide converge on the idea that STB is initially caused by low mood experience. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood(25). The motivational–volitional model(26) and the three-step theory(27,28) similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Some official organizations, e.g., National Institute of Mental Health, have also listed mood problems as warning signals(8). Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance(29), providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”

      We have also refined the wording and corrected typos throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.

      Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are indeed defined at the computational level: they refer to a decision bias in favor of the highest gain (approach) and a decision bias against the lowest loss (avoidance), above and beyond the difference in option values. We have cited these papers in the manuscript and have further clarified the approach and avoidance parameters in the abstract and introduction. Please see our revisions below:

      Page 2 (Abstract):

      “Using a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>. Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to greater propensity for gambling in the gain domain regardless of the lottery expected value.”

      Page 3 (Introduction):

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.

      Thank you for this suggestion. We have now corrected it.

      (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.

      Thank you for this suggestion. We follow previous work and operational definitions regarding mood (Rutledge et al., 2014, Eldar & Niv, 2015, Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., leading to rapid changes like surprise then fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify that point in the main text. Please see our revision below:

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale in the laboratory setting, represents the accumulation of the impact of multiple events at the scale of minutes(30,32,34-38). Momentary mood external validity is demonstrated e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g. from surprise to fear)(31-33,39).”
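
      The operational definition of mood as a reflection of the recent history of outcomes can be sketched as follows. This is a minimal illustration assuming the exponentially discounted form introduced by Rutledge et al. (2014), with separate weights for certain rewards (CR), gamble expected values (EV), and reward prediction errors (RPE). The exact terms of the mood models compared in the manuscript are not given in this response, so the function and its parameter names are hypothetical.

```python
def momentary_mood(cr, ev, rpe, w0, w_cr, w_ev, w_rpe, gamma):
    """Return predicted mood after each trial.

    cr, ev, rpe: per-trial sequences (0 where a term does not apply).
    gamma: forgetting factor in [0, 1]; smaller values mean that only the
    last few trials influence mood.
    """
    moods = []
    s_cr = s_ev = s_rpe = 0.0
    for c, e, r in zip(cr, ev, rpe):
        # Recursive form of sum_j gamma**(t - j) * x_j
        s_cr = gamma * s_cr + c
        s_ev = gamma * s_ev + e
        s_rpe = gamma * s_rpe + r
        moods.append(w0 + w_cr * s_cr + w_ev * s_ev + w_rpe * s_rpe)
    return moods


# A single certain reward on trial 1 whose influence halves on each later trial.
print(momentary_mood([1, 0, 0], [0, 0, 0], [0, 0, 0],
                     w0=0.0, w_cr=1.0, w_ev=1.0, w_rpe=1.0, gamma=0.5))
```

      Under this form, a lower CR weight in one group would mean that the same certain reward moves predicted mood less in that group.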

      (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.

      Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:

      Page 4:

      “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”

      (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.

      Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities—expected value (EV) and reward prediction error (RPE)—in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.

      (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:

      a) Abstract: regardless the lotteries -> regardless of the lotteries'.

      b) Line 78: it remains whether.

      c) Line 80: can each -> each can.

      d) Line 90: may original from.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.

      Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).

      (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.

      Thank you for pointing out this mistake. The corrected version of inclusion criteria can be seen on Page 7:

      “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”

      (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.

      Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have clarified this point below:

      Supplement Pages 4-5:

      “Supplementary Note 9: Mood model comparison using subjective values.

      To identify whether mood modeling was based on objective or subjective values, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox (Daunizeau et al., 2014) to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000).”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      (a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we have moderated our interpretation accordingly in the revision.

      (b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we have marked such cytoplasmic or multifunctional genes with red asterisks in both Fig. 5G and Fig. 6D in the revised manuscript.

      Our comparison showed that key genes for motile machinery are not detected in cochlear hair cells. For example, Dnah6 and Dnah5 are not expressed in the P2 cochlear hair cells. Dnah6 and Dnah5 encode axonemal dyneins and are part of the inner and outer dynein arms. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells. Furthermore, axonemal CCDC39 and CCDC40, the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome, were not detected in cochlear hair cells. We have revised the text to emphasize key differences.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such an autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We have revised the discussion to address this important point raised by the reviewer. Specifically, we now emphasize that our observations of ciliary beating under ex vivo conditions may not reflect its properties in the mature in vivo context, but may rather be a byproduct of the motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia in vestibular hair cells in vertebrates.

      We note that spontaneous activity exists throughout the nervous system. It allows the nervous system to maintain baseline activity and interpret signals. Retinal cells are spontaneously active even in the dark, and spiral ganglion neurons also fire spontaneously. Spontaneous hair bundle motion driven by a mechanotransduction-related mechanism has been observed in bullfrog saccular hair cells. It is therefore unlikely that spontaneous kinocilia beating would interfere with generating temporally faithful representations.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous neural activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We have revised the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations for the authors:

      (1) Figure 1 - Explain how hair cell types are recognized after dissociation. Figure 1 will not be clear in this regard for non-aficionados. Some of the dissociated cells shown appear quite distorted and even unhealthy - e.g., the bottom right crista type II hair cell; the second from left crista type I hair cell; can you address why this doesn't matter for the purposes of this study?

      HC types in Fig. 1C were identified based on their morphological features: type I HCs are flask-shaped with a narrow neck, while type II HCs are cylindrical and short. We have replaced those cells with new images. In our study, HCs were identified based on their marker genes. Although some distorted HCs, such as those shown in Fig. 3C, were impossible to avoid during preparation of single cells for library construction (most studies do not examine cell morphology), the quality of mRNA and sequencing was high, better than that of datasets published in previous studies.

      (2) Line 98 - Explain accessory cells (as opposed to supporting cells).

      We changed accessory cells to other cell types.

      (3) Line 246 - The primary cilium is...

      Changed.

      (4) Figure 6D - The scale bar is missing. Please use arrows to point to the genes you call out in the text. Also, the genes called out in the text as differently expressed (line 342) are quite faint bands in both cell types. It would be a service to the reader to point them out in the panel.

      A scale bar has been added. We also marked those genes as suggested and edited the text accordingly.

      (5) Figure 7 - mixes frog crista and mouse middle ear images with waveforms and FFTs from frog crista, mouse middle ear, and mouse crista. Related to these still images are 2 videos of frog kinocilium beating (2 hair cells). The mouse images must be underwhelming, or we would have been shown those, yet they were considered adequate to analyze.

      Yes, the spontaneous kinocilia motion of mouse crista HCs is very small. The peak motion is about 40 nm, which is very close to the resolution of our camera. That is why we used a photodiode technique to detect the motion. A photodiode is more sensitive, and this technique allows us to observe the dynamic response waveform.

      (6) I recommend labeling each figure panel with the tissue of origin to avoid confusion.

      Labeled as suggested.

      (7) I suggest dropping the mouse middle ear data, as they are not directly adequate as a positive control (or no more so than the more beautiful frog data).

      We have kept the waveforms of middle ear cilia movement in Fig. 7. The main reason is that we would like to show the difference in magnitude between airway cilia and kinocilia. The kinocilia movement was at least an order of magnitude smaller than the movement of airway cilia. This has led to our effort to generate a model to predict the 96-nm modular repeat and explain why kinocilia movement in mice is much smaller than that of airway cilia and bullfrog kinocilia.

      (8) Focus on the hair bundle motions:

      (a) Show the waveforms for the frog crista hair cells and their FFTs.

      These images were captured many years ago using a camera. The kinocilia motion is between 5 and 10 Hz. We did not present any waveforms of kinocilia motion since we no longer have access to bullfrogs. However, although we did not present response waveforms, the videos are very powerful for visualizing the kinocilia beating of bullfrog saccular HCs.

      (b) Find some way to show us how you measured the mouse hair bundle beating.

      A photodiode technique was used to measure spontaneous kinocilia motion in mice. More details are now included in the text.

      (c) Does EGTA break links between kinocilium and stereocilia? (Could that contribute to the higher beat frequency?) Just applying the same treatment and viewing from above could clarify whether kinocilia dissociate from stereocilia rows. This would likely be more straightforward with an otolith organ.

      All these links (tip links, side links) are sensitive to Ca concentration, and Ca-free medium is often used to break these links, as shown in many previous studies. Breaking the kinocilia links reduces the load on the kinocilia, which may result in larger motion of the kinocilia. The frequency is inherent to the motile machinery and depends on temperature and intracellular ATP concentration. When facing upward, the hair bundles in otolith organs do not have good contrast against the HCs in the background. This makes measurement of their motion difficult, especially when the motion is small and random and cannot be averaged to improve the signal-to-noise ratio. Besides, unlike cochlear HCs, whose hair bundles are short and can easily be oriented parallel to the light path, the long hair bundles of vestibular HCs are more difficult to orient and image. For these reasons, we chose to use crista hair bundles for our measurements, since they can be oriented perpendicular to the light path without interference from background HCs. The lateral motion of the entire bundle is also relatively easy to measure in this preparation.
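
      The relationship between a recorded displacement waveform and its beat frequency, as in the waveforms and FFTs discussed for Fig. 7, can be sketched as follows. This is a minimal illustration using a plain discrete Fourier transform on a synthetic trace. The 40 nm amplitude follows the peak motion quoted above, whereas the 7 Hz frequency, the 100 Hz sampling rate, and the function name are arbitrary assumptions chosen for the example, not recorded values or the authors' analysis code.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC spectral peak."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        # k-th DFT coefficient of the mean-subtracted trace
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate_hz / n


# Synthetic 1-second trace: 40 nm sinusoidal displacement at an assumed 7 Hz.
sample_rate_hz = 100
trace_nm = [40.0 * math.sin(2 * math.pi * 7.0 * i / sample_rate_hz)
            for i in range(100)]
print(dominant_frequency(trace_nm, sample_rate_hz))
```

      For real recordings, a windowed FFT from a numerical library would be the usual choice; the explicit sum is used here only to keep the example dependency-free.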

      (6) Is there no reason to cite McInturff et al. (2018), given that they compared type I and II VHC transcriptomes at P12 and P100? This database is also available on gEAR.

      Their studies are now cited. We also compared their datasets with ours.

      (7) Line 374 - Eatock et al., 1998 citation does not work for this purpose. Eatock & Songer (2011) would be better, or Li, Xue, Peterson (2008): mouse utricle anatomy; significant discussion of relative heights of kinocilia and tallest stereocilia.

      Changed and cited.

      (8) In Figure 3, 2 of the 18 panels in B are missing labels.

      The bar, applied to all panels, was there at the bottom of Fig. 3B. The bar is bigger and more visible in the revision.

      (9) Line 187 should "Sppl1" be Spp1?

      Corrected.

      (10) Define BBSome on line 244.

      Added.

      (11) Looking at Figure 5, it seems that all the motile genes are expressed in the vestibular hair cells and not the cochlear hair cells. It is surprising that there are any cilia-related genes expressed in these adult cochlear hair cells, given that they do not retain their cilia into adulthood. Could the authors make a comment on this finding in the discussion? Also, are there any ciliopathies that show a vestibular defect but normal hearing in mice or humans? Have you compared the cilia-related gene expression in neonatal/embryonic vestibular hair cells to your dataset?

      Many kinocilia-related genes are still expressed in adult cochlear HCs, which is not surprising. Most of these genes are related to primary cilia structure, including the basal body and transporters in cilia. The basal body is still present in cochlear HCs. Many other primary cilia-related proteins are also expressed in the soma, especially those related to signal transduction, microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, protein folding, translation, nuclear transport, ubiquitination, RNA binding, mitochondrial proteins and transcription factors. Of course, some of them are vestigial. We have added a discussion of this to the text. A comparison between neonatal cochlear and vestibular HCs is presented in Fig. 6D. We compared the genes related to the axonemal repeat (96-nm repeat complex). Owing to mRNA quality, the total number of genes, and of kinocilia-related genes, detected in previous developmental studies was much lower than in our datasets. While we detected 112 out of 128 genes related to the axonemal repeat, only 90 genes were detected in previous studies (Burns et al., 2015; McInturff et al., 2018). Therefore, we only compared neonatal cochlear and vestibular HCs using their datasets. As far as we know, no ciliopathies with vestibular defects but normal hearing have been reported in mice or humans. However, we plan to use a Ccdc39 mutant mouse model to examine how loss of function of a key motile cilia signature gene would affect kinocilia motility and vestibular function.

      (12) How is "expression level" in the violin plots being calculated? Is this a measure of read count? The normalization is cursorily explained in the methods. Is this value comparable across genes? Did the authors switch to z-score by Figure 6?

      We dissected the auditory and vestibular sensory epithelia from the same groups of mice, and prepared and sequenced the libraries at the same time. All parameters were the same. The violin plots are based on the values presented in Supplementary Table 1. Each dot in a plot reflects the aggregated number of reads across all cells for each gene. The values are normalized across the different HC types and biological repeats. The details of the normalization are now provided.

      (13) The authors comment on the 16/128 motile cilia axonemal repeat genes that are not expressed in the vestibular hair cells. Listing these somewhere may be helpful to the readers.

      We thank the reviewer for this helpful suggestion. Most of the 128 motile cilia axonemal repeat genes were listed in Figs 8C and S5, along with known loss-of-function mutations and ciliopathy associations identified in human diseases or observed in animal models. To improve clarity, we have now included Table S2, which provides the complete list of all 128 motile cilia axonemal repeat genes, including those not expressed in vestibular HCs.

      (14) Figure 5D needs some refinement. While the authors used databases, including CiliaCarta, SYSCILIA gold standard, and CilioGenics, to identify the primary cilia-related genes, they have included many genes that are not highly specific to primary cilia function (e.g., HSP90, HSPA8, DNAJA4, GNAS...). Perhaps the authors would be able to do a better job of specifically querying primary cilia function by using genes that are common to these three databases.

      We presented a comparison and analysis based on three major cilia databases, which were generated from proteomics of cilia from different tissues and organisms. In addition, we have provided a more comprehensive list of primary cilia-related genes in Fig. S2. While the majority of cilia-related genes/proteins are highly conserved, some are tissue- or organism-specific. The majority of the genes presented in Fig. 5D of our manuscript are shared among all three databases. The cilium is a complex structure, composed of proteins for the microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, signaling, and protein folding. It also contains proteins for translation, nuclear transport, ubiquitination, and RNA binding, as well as mitochondrial proteins and transcription factors (https://ciliogenics.com/?page=Home). Proteins such as HSP90 and HSPA8 are important for protein folding. HSPA8 also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. GNAS is part of a G protein complex that transmits signals. DNAJA4 is one of the high-confidence cilia proteins (mean score of 1.26; expression rank of 938). These proteins are detected in cilia according to CilioGenics (https://ciliogenics.com/?page=Home). They are not highly specific to cilia and are expressed in the soma as well. Most of the signaling proteins, such as those of the WNT pathway (Supplementary Fig. 2), are detected in both cilia and soma.

      (15) The authors state, "Furthermore, we observed robust spontaneous kinocilia motility in bullfrog crista HCs and small spontaneous bundle motion in mouse crista HCs." This statement should be moderated by acknowledging that this motility was observed in only some cells. The authors favor the hypothesis that the lack of motility in some crista HCs is due to depolarization or damage to the sample. The authors should also acknowledge the possibility that there may be cell-to-cell variability in the motility of the kinocilia.

      We address these issues in the public review section. We have modified the statement as suggested.

      (16) The first few pages of the Results section include many lists of genes. Readability may be improved if this is curtailed modestly.

      Changed as suggested. We removed comparison among different types of HCs and replotted Fig. 2B. This has reduced the number of genes mentioned in the text.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this revised manuscript, Qin and colleagues aim to delineate a neural mechanism that is engaged specifically in sated flies to suppress the intake of sugar solution (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation depends on the cell-autonomous function of Hugin-releasing neurons, which sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's motivation for sugar intake. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity downstream of NMU-expressing neurons in the rNST. These data suggest that the function of the Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      The shift of the narrative, which focuses specifically on the hugin-AstA axis as the "brake" on the satiety signal and feeding behavior, clarified the central message of the presented work. The authors have provided multiple lines of compelling evidence generated through rigorous experiments. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers.

      While I deeply appreciate the authors' efforts to substantially restructure the manuscript, I have a few suggestions for further improvements. First, there remains room for discussion whether the "brake" function of the hugin-AstA axis is truly satiety state-dependent. The fact that neural activation (Fig. Supp. 8), peptide injection (Fig. 3A, 4A), receptor knockdown (Fig. 3C,G, 4E), and receptor mutants (Fig. Supp. 10, 12) all robustly modulate PER irrespective of the feeding status suggests that the hugin-AstA axis influences feeding behaviors both in sated and hungry flies. Additionally, their new data (Fig. Supp. 13B, C) now shows that synaptic transmission from hugin-releasing neurons is necessary for completely suppressing feeding even in sated flies. If the hugin-AstA axis engages specifically in sated (high glucose) state, disruption of this neuromodulatory system is expected to have relatively little effect in starved flies (in which the "brake" is already disengaged).

      We thank the reviewer for pointing out this inconsistency. We have corrected this interpretation. Specifically:

      (1) We removed statements suggesting that the circuit is fully disengaged during starvation.

      (2) We now state that endogenous hugin activity is reduced during starvation, but the circuit retains modulatory capacity when experimentally perturbed.

      (3) The Discussion now emphasizes that the system operates as a state-modulated inhibitory tone rather than a strictly fed-state switch.

      We believe this revised framing resolves the discrepancy.

      In this context, it is intriguing that the knockdown of PK2-R2 hugin receptor modestly but consistently decreases proboscis extension reflex specifically in starved flies (Fig. 3D, H). The manuscript does not discuss this interesting phenotype at all. Given the heterogeneity of hugin-releasing neurons (Fig. Supp. 7), there remains a possibility that a subset of hugin-releasing neurons and/or downstream neurons can provide a complementary (or even opposing) effect on the feeding behavior.

      We agree that this is an important observation. Although the effect size is modest, it is reproducible and suggests that hugin signaling may not operate as a strictly linear pathway.

      To address this:

      (1) We added a paragraph in the Results acknowledging the PK2-R2-dependent phenotype.

      (2) We included a discussion noting the potential functional heterogeneity of hugin neurons.

      (3) The schematic model (now Figure Supplementary 17, previously Figure Supplementary 16) includes a dashed line indicating a possible parallel PK2-R2-dependent branch.

      Given these intriguing yet unresolved issues, it is important to acknowledge that whether this system is "selectively engaged in fed states to dampen sweet sensation (in Discussion)" requires further functional investigations. Consistent effects of manipulation of the hugin-AstA system across multiple experimental approaches underscore the importance of this molecular circuitry axis for controlling feeding behaviors. Moderation of conclusions to accommodate alternative interpretations of the data will help the field determine the precise mechanism that controls feeding behaviors in future studies.

      We fully agree with the reviewer. Our original description of the circuit as a “satiety brake” implied exclusive engagement in fed states, which is not strictly supported by the behavioral data. Although endogenous hugin activity is elevated under fed conditions (as shown by CaMPARI), experimental manipulations demonstrate that the circuit retains functional capacity to modulate feeding behavior across feeding states.

      To address this concern, we have:

      (1) Removed the term “satiety-specific brake” throughout the manuscript.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module.

      (3) Revised the Discussion to explicitly state that the hugin–AstA pathway biases sweet sensitivity according to circulating glucose levels rather than functioning as an on/off switch.

      (4) Substantially revised Supplementary Figure 17 to reflect graded modulation across metabolic states rather than binary state engagement.

      These changes better align our conclusions with the experimental observations.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism.

      We agree that the modest PER phenotype suggests that Glut1-mediated glucose uptake represents one component of glucose sensing in hugin neurons. We have clarified this in the Discussion and now explicitly state that additional glucose-sensing mechanisms may contribute to hugin activation.

      Additionally, many of the manipulations testing the "brake" circuitry throughout the study show similar effects in both fed and starved flies. This suggests that the focus of the discussion and Supplemental Figure 16 on a satiety-specific "brake" mechanism may not be fully supported by the data.

      We fully agree that the previous framing overstated state specificity.

      As described above, we have:

      (1) Removed “satiety-specific brake” terminology.

      (2) Reframed the circuit as a glucose-responsive inhibitory module.

      (3) Revised the Discussion to explicitly acknowledge modulation across feeding states.

      (4) Updated the schematic model (Figure Supplementary 17, formerly Figure Supplementary 16) accordingly.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      Both the reviewers and I agree that the conclusion about a "satiety-dependent" brake needs to be modified to discuss the phenotypes that are also observed under starved conditions. Reviewer 1 would further like to emphasize that the authors are not required to follow through with the specific recommendations suggested by them. Modifying the conclusion and Supplementary Figure 16 should suffice.

      We sincerely thank the Reviewing Editor for the clear guidance. We fully agree that our previous framing of the hugin–AstA circuit as a strictly “satiety-dependent” brake may have overstated the state specificity of the system.

      In response to this recommendation, we have:

      (1) Revised the Abstract, Results, and Discussion to moderate the conclusion and explicitly acknowledge the phenotypes observed under starved conditions.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module, rather than a satiety-exclusive brake.

      (3) Supplementary Figure 17 (formerly Figure Supplementary 16) has been substantially revised to illustrate graded modulation across metabolic states rather than binary engagement.

      We appreciate the clarification that no additional experiments were required and are grateful for the opportunity to improve the conceptual framing of our work.

      Please include full statistical reporting in the main manuscript (e.g., figure legends or results).

      We have revised all figure legends to include full statistical reporting.

      Reviewer #1 (Recommendations for the authors):

      By re-framing their finding as the "brake" mechanism on satiety-induced suppression of feeding behavior and sensitivity to sweet taste, the authors substantially improved the clarity of their findings and their significance. The additional data (Fig. Supp. 13B, C) allows "apple-to-apple" comparisons of behavioral data. I support the publication of this manuscript with no further experiments, although I have several suggestions for the text.

      As I write in the public review, I have a reservation on the authors' argument that hugin-AstA system is the "'satiety brake' - that is selectively engaged in fed states to dampen sweet sensation (lines 392-394)". Manipulation of both hugin system (Fig. 2C, Fig. 3A, C, D, G, Fig. Supp. 8A, C, Fig. Supp. 10A-C, Fig. Supp. 13B, C) and AstA system (Fig. 4A, E, Fig. Supp. 8C, D, Fig. Supp. 12A-C, Fig. Supp. 13D) all indicate that hugin-AstA system suppresses feeding regardless of the satiety state. Specifically, Fig. Supp. 13B shows that synaptic blockade does further increase PER, contradicting the authors' statements ("silencing hugin+ neurons led to enhanced sweet-driven feeding behavior (line 299-300)" and "...further silencing has little additional effect (line 402)"). The CaMPARI data (Fig. 1J) provides the link between the activity levels of hugin-releasing neurons and satiety state. However, the fact that eliminating hugin-AstA signal can promote further PER in starved flies suggests that this brake is not completely satiety-dependent. I ask authors to at least discuss this perceived discrepancy between their data and conclusions.

      Also, the authors' finding that PK2-R2 reduction actually suppresses PER specifically among starved flies (Fig. 3D, H), albeit with relatively small effect size, suggests that hugin-AstA axis is not a singular, linear pathway as authors suggest in Fig. Supp. 16. While delineating the PK2-R2-dependent pathway is beyond the scope of this study, at least a line of discussion would be helpful.

      Minor comments:

      (1) Fig. Supp. 8 (dTRPA1 activation of hugin and AstA neurons), and Fig. Supp. 13B-D (inhibition of hugin and AstA neurons) should be in the main figure given its relevance to the narrative of this manuscript.

      We agree with the reviewer regarding their importance. The key behavioral panels from these figures have now been moved to the main figures to strengthen the narrative flow.

      (2) Fig. Supp. 11 (PER and imaging using decapitated heads only), despite its creativity, leaves me wondering what PER of fly heads looks like. It is a highly artificial and invasive experiment. Supplementary movies would be helpful.

      We apologize for the lack of clarity in our description. In this experiment, flies were not decapitated. Instead, we surgically severed the connection between the brain and the ventral nerve cord (VNC), while keeping the body and proboscis musculature intact. Thus, the flies remained physically intact, and PER was measured using the same behavioral protocol as in intact animals.

      We have revised the figure legend to clarify this point and avoid confusion. Because the behavioral procedure was identical to standard PER assays and the flies retained normal proboscis motor function, we did not include supplementary videos.

      (3) Expression patterns of PK2-R1 and AstA-R2 in proboscis are mentioned in text but with no data (lines 229 and 279). I strongly encourage authors to show images.

      We have now included the relevant expression images in the revised manuscript.

      (4) A citation for the "previous study (line 486)" describing PER method is required.

      The appropriate citation has been added.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      We thank the editor and reviewers for their thoughtful and constructive feedback, which has enabled us to greatly strengthen the manuscript. We apologize for the delay in resubmitting this, as we were dealing with a large turnover in the lab due to trainee graduations. We have carefully revised the text, figures, and supplementary materials in response to these comments. Below, we summarize the key revisions made, followed by a point-by-point response to the reviewers’ critiques.

      (1) Performed CUTS analyses in human neuronal system: In the revised manuscript, we included new data demonstrating that the CUTS system can be applied to additional cellular models, specifically neuronal cells (Figure 5, Figure S4). To address whether CUTS functions effectively in neuronal contexts, we generated stable CUTS-expressing lines in differentiated BE(2)-C and ReN VM–derived differentiated neurons (Figure 5A-D, Figure S4 A-C). To ensure this was neuronal expression, we developed a new Tet-On3G system construct where the Tet-On3G transactivating protein is driven by the SYN1 promoter to ensure neuron-specific inducible expression for these experiments.

      (2) Define the relationship between CUTS and endogenous/physiological cryptic exon inclusion: To evaluate how well the CUTS system reflects physiological cryptic exon regulation, we performed RT-PCR analysis of several cryptic exons previously reported by us and evaluated CUTS activation at the RNA level in parallel (Figure S2E). CUTS is sensitive to low-to-mild reductions in TDP-43 levels, whereas the tested endogenous cryptic exons exhibit variable responses to TDP-43 knockdown.

      (3) Defining stress-induced TDP-43 loss of function: We included new data demonstrating that the CUTS system can detect TDP-43 loss of function induced by acute sodium arsenite (NaAsO₂) treatment in HEK cells (Figure 3D–I). We have also tested additional stressors as part of a separate ongoing study where this work will be expanded upon (Xie et al., 2025). We selected this paradigm since TDP-43 loss of function in response to acute NaAsO₂ treatment is also supported by work from other labs (Huang et al., 2024).

      (4) Implications of using a TDP-43 Loss-of-Function sensor for therapeutic applications: In the revised manuscript, we clarify that CUTS-TDP43 is auto-regulated and we highlight two potential therapeutic applications: i) TDP-43 Knockdown-and-replacement: CUTS-TDP43 provides a strategy for simultaneous depletion of pathological TDP-43 species while enabling autoregulated re-expression of wild-type TDP-43. This design mitigates the risk of supraphysiologic overexpression, a known liability in conventional replacement approaches, by restoring TDP-43 within a self-limiting regulatory network that maintains homeostatic control. ii) Aggregation-independent correction: Because CUTS is autoregulatory, it can be repurposed to regulate alternative downstream effectors, including splicing modifiers or TDP-43 functional interactors, without expressing TDP-43 itself. This approach provides a potential aggregation-independent strategy to compensate for TDP-43 loss-of-function (LOF) by restoring downstream splicing. We are evaluating this work in a follow-up study (Xie et al., 2025). In these ongoing studies, we show that CUTS-regulated expression of splicing proteins in response to TDP-43 loss restored subsets of cryptic exon events (24/28 events evaluated). These findings suggest CUTS as a versatile tool for both autoregulated TDP-43 replacement and trans-regulatory therapeutic correction. We expanded on this concept in the discussion section of this revised manuscript. We also note that autoregulatory TDP-43 biosensor strategies have been proposed in related systems, including TDP-Reg, underscoring broader interest in self-regulated TDP-43 systems (Wilkins et al., 2024).

      (5) Clarified mechanism of TDP-43 5FL causing strong loss of function: The TDP-43 5FL exhibits reduced RNA binding capacity, and we previously showed that the lack of RNA binding promotes aberrant homotypic phase separation of TDP-43 (Mann et al., 2019). The RNA-binding-deficient TDP-43 variant forms nuclear “anisomes” when expressed (Yu et al., 2021), and evidence suggests that these structures sequester endogenous TDP-43 protein into insoluble assemblies. We expanded on this in our results section in this revised manuscript.

      (6) Improved figure clarity and data presentation: To enhance clarity and organization, we maintained the main structure of the manuscript while reorganizing figures and improving data visualization. Some examples include:

      Figure 1: We revised the schematic layout for greater clarity and simplicity. The figure now focuses more specifically on the CUTS data, with additional data on the UNC13A-TS and CFTR-TS moved to Figure S1. To improve readability, titles were added to all schematic panels. Visual consistency was also improved by refining the color labelling for each sensor in Figures 1C and 1D and adjusting the corresponding bar graphs accordingly.

      Figure 2: We reorganized the figure to clearly distinguish between protein and mRNA analyses for greater clarity. In the revised layout, western blot quantifications of TDP-43 and CUTS (GFP) signals are shown in Figures 2D and 2E, respectively, while the corresponding qPCR analyses are presented in Figures 2H and 2I. Minor edits include removing the percentage knockdown and fold-change annotations from the graphs and incorporating these values into a mini-table in Figure S2E.

      The original Figures 2D and 2G were reincorporated as reference panels in Figure S2A–B, while new graphs showing CUTS protein-level changes as a function of TDP-43 knockdown were added (Figure S2C–D). We also incorporated new data showing the behavior of endogenous cryptic exons under low siTDP-43 treatment (Figure S2E).

      Figure 3: We added new data demonstrating the application of the CUTS system in detecting TDP-43 loss of function induced by stress conditions. Specifically, we show that sodium arsenite (NaAsO₂) treatment leads to TDP-43 functional impairment that is detectable by CUTS and supported by endogenous cryptic exon analysis via RT-PCR (Figure 3D-I).

      Figure 5 and Figure S4: We introduced a new figure that demonstrates the effective application of the CUTS system in differentiated neuronal systems, thereby extending its usability to disease-relevant cell types.

      Figures S2A and 4B were edited to include the corresponding labels on the sides of each image for clarity. Supplementary Figure 2A was moved to Supplementary Figure 3A, while Figure 4B remains in its original configuration.

      We thank the reviewers again for their insightful critiques and helpful suggestions, which have enabled us to substantially improve the manuscript. Please find our detailed response to each review below:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing: the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level, unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices.

      (1) Testing the sensor in other cell lines

      We thank the reviewer for raising this important point. In agreement with this suggestion, we generated ReN VM cell lines and used a neuroblastoma cell line model (BE(2)-C) expressing the TetOn3G CUTS system under a human synapsin I (hSYN1) promoter. In this construct the transactivator protein is under the control of a neuron-specific hSYN1 promoter, whereas the classical TetOn3G system uses a CMV-like promoter. Several studies have reported reduced activity or silencing of CMV and PGK-driven transgenes in neurons. Therefore, for our neuronal experiments, we removed this promoter to generate a new version of a doxycycline-inducible CUTS system in which the Tet-On 3G transactivator is driven by the hSYN1 promoter, so that CUTS is expressed in response to doxycycline treatment. In this improved construct, we also replaced mCherry with mScarlet to enhance the fluorescent signal.

      To test this neuronal-adapted system, we established stable CUTS expression in undifferentiated BE(2)-C cells, a subclone of the SK-N-BE(2) neuroblastoma line that has been used to study TDP-43–dependent splicing function (Brown et al., 2022). This model can be differentiated into neuron-like cells within 10 days, as shown in Supplementary Figure 4A. Using this model, we confirmed that TDP-43 knockdown leads to robust activation of the CUTS system (Figure 5B-E). We additionally tested this in stable polyclonal ReN VM cells following differentiation into cortical-like neurons (Figure 5D, Figure S4B-C).

      (2) Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP43.

      We agree with the reviewer that correlating the sensor’s readout with physiological TDP-43 splicing targets is essential to validate its biological relevance. To this end, we complemented our sensor expression profile with endogenous cryptic exons (CEs) sensitive to TDP-43 depletion. We tested a panel of five physiological cryptic exons regulated by TDP-43 (LRP8, EPB41L4A, ARHGAP32, HDGFL2, and ACBD3). To address the reviewer’s concern, we performed RT-PCR on samples from the low-dose siTDP-43 experiment shown in Figure S2E.

      The endogenous CEs used in the panel were selected based on our own and others’ preliminary observations. Among these, HDGFL2 showed a particularly robust increase in cryptic exon inclusion at very low siTDP-43 concentrations (38 pM), while untreated samples showed almost no CE inclusion. This finding strongly supports a direct mechanism linking mild TDP-43 reduction to loss of physiological splicing control.

      (3) Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation modes that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank the reviewer for this thoughtful point and agree that in the disease-relevant context where endogenous TDP-43 is intact but TDP-43 function is lost due to mislocalization and/or aggregation, a re-supply of TDP-43 risks sequestration and loss of activity. In our manuscript, the CUTS-TDP43 module was presented as a control circuit proof-of-concept rather than a stand-alone approach: it demonstrates that CUTS can (i) sense LOF with high dynamic range and proportionality, and (ii) drive a payload under negative feedback such that total TDP-43 remains near baseline while partially rescuing a splicing readout (CFTR minigene) under knockdown conditions.

      Importantly, we evaluated CUTS in aggregation/mislocalization-prone contexts: ΔNLS, 5FL, and ΔNLS+5FL variants trigger CUTS activation (ref), allowing us to quantify LOF arising from these aggregation modes. This confirms that CUTS can operate precisely in the very settings where sequestration is likely to occur.

      To directly address the reviewer’s suggestion, in the revision we (i) clarify in the Discussion that CUTS-TDP43 is a circuit demonstration and not our proposed monotherapy in aggregation-dominant disease; and (ii) expand our therapeutic framing into two approaches:

      Knockdown-and-replacement: concurrently deplete aggregation-prone/endogenous pathologic TDP-43 species (i.e., mutant TDP-43) while using CUTS to re-deliver wild-type TDP-43 under autoregulation.

      Aggregation-independent correction: use of CUTS to deliver modifiers that bypass TDP-43 sequestration (e.g., downstream effectors or splicing correctors that restore LOF consequences without expressing TDP-43 itself).

      (4) I don't think the quantity of siRNA is directly proportional to the degree of TDP-43 knockdown/extent of TDP-43 loss. Therefore, to enhance the utility of the dose-response curves, I'd suggest using TDP-43 levels as the variable on the x-axis, rather than the amount of siRNA administered or even just adding a plot alongside the current plots would enable readers to quickly evaluate LOF response levels concerning the protein. While I understand that the sensitivity of Western blots for quantification might be why the authors have not created the graphs in this manner, having this information would be useful.

      We appreciate the reviewer’s insightful comment. As noted, in the original version of the graph, we incorporated the percentage of TDP-43 knockdown corresponding to each siTDP-43 concentration (indicated in red text). However, we agree that this format was not easy to interpret, given the amount of information presented. To address this, we generated two new plots in which the x-axis represents TDP-43 levels (percentage of remaining protein or mRNA), and the y-axis shows the fold change in CUTS signal measured by (i) TDP-43 protein pixel intensity and (ii) TDP-43 mRNA levels, respectively. These new plots are now included as Supplementary Figures 2C–D, which allow a clearer visualization of CUTS readout in relation to actual TDP-43 levels rather than siRNA dose. As the reviewer anticipated, the reason we did not originally present the data in this format was that at low siTDP-43 concentrations, the fold change is minimal and more difficult to quantify by Western blot. Nevertheless, we have now incorporated the revised plots to strengthen the interpretation of the dose–response relationship. Additionally, we experienced batch effects across siRNA lots. We believe this revised format should enhance the clarity of the result.

      (5) p3 line 74: one of the pitfalls cited for using endogenous cryptic exons is that they exhibit variable responses to TDP-43 loss and may be cell type-specific. Has the sensor been used in different cell lines?

      We tested the CUTS system in two differentiated neuronal cell models, BE(2)-C and ReN VM cells. The results are presented in Figure 5 and Figure S4 of the revised manuscript.

      (6) The order of the text describing 1A and 1B is confusing. The text starts describing the TS cassettes referring to 1A using the CUTS cassettes which haven't been introduced yet as an example. I'd suggest reorganising this section. The graph, always in 1A showing readout proportional to GFP should be taken out or highlighted in the figure legend that it is theoretical.

      We agree with the reviewer’s point. In the original schematic (Figure 1A), we included the CUTS system as an example to introduce the TS cassette design, since it contains the three possible sensor configurations. However, we recognize that this could be confusing. Therefore, we have removed the CUTS cassette from Figure 1A, along with the theoretical graph showing GFP readout proportional to the degree of TDP-43 LOF. In agreement with this change, we also restructured Figure 1. As the focus is the CUTS system, we have moved the Western blot and quantification of UNC13A-TS and CFTR-TS to Supplementary Figure 1.

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays; it's cost-effective, rapid and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      (1) Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed.

      We thank the reviewer for highlighting the importance of validating the sensor in neuronal models, given the central role of TDP-43 dysfunction in ALS/FTD and related neurodegenerative disorders. While initial characterization in established cell lines provides experimental control and scalability, we agree that demonstrating functionality in neuronal systems is essential. To address this, we adapted the CUTS platform for neuronal application by incorporating the human synapsin-1 (hSYN1) promoter into the Tet-On 3G system to enable inducible, neuron-specific expression. We validated this configuration in differentiated BE(2)-C cells (Figures 5A-C, S4A-C), where CUTS retained robust responsiveness to TDP-43 perturbation. In parallel, we generated stable CUTS-expressing ReN VM neural progenitor cells and differentiated them for three weeks prior to functional assessment (Figures 5A-C, S4A-C). In both neuronal models, CUTS was functional and responsive to TDP-43 siRNA. We are currently optimizing promoter selection and expression paradigms for fully differentiated iPSC-derived neuronal models, which will be the subject of future studies.

      (2) The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      We thank the reviewer for this thoughtful suggestion. We agree that flow cytometry and sorting of GFP-positive populations would provide a higher-resolution, single-cell–level relationship between TDP-43 abundance and sensor output. Such an approach would reduce heterogeneity arising from incomplete siRNA penetrance and allow more precise quantification of how incremental changes in TDP-43 protein levels track with GFP fluorescence. In the present study, our goal was to establish proof-of-principle functionality of the CUTS circuit and to demonstrate that graded TDP-43 depletion produces a proportional sensor response at the population level. While GFP signal heterogeneity is visible in imaging panels, we hypothesize that this variability likely reflects known differences in siRNA uptake and transfection efficiency rather than instability of the circuit itself. Importantly, bulk measurements consistently demonstrated dose-dependent sensor regulation across independent experiments, supporting the robustness of the system despite cellular heterogeneity. Furthermore, we were able to quantify CUTS activation in HeLa TARDBP<sup>-/-</sup> cells. We also note that CUTS was developed as a practical tool for rapid assessment of TDP-43 LOF in standard laboratory settings. Although flow cytometry increases resolution, the ability to detect functional perturbation using bulk fluorescence measurements supports the utility of the system for routine and high-throughput applications.

      We agree that flow cytometry would provide a more refined analysis of the dynamic range and sensitivity of CUTS, particularly for defining thresholds such as the minimal TDP-43 knockdown required for measurable activation. Specifically, we have implemented FACS sorting of CUTS-expressing cells in a parallel study in which we are conducting a CRISPR knockout screen to identify modifiers of TDP-43 splicing function. For this, we combine TDP-43 knockdown with FACS to stratify cells based on CUTS activation. This strategy enables direct evaluation of the relationship between the extent of TDP-43 LOF and CUTS sensor activation. These analyses are ongoing; they will provide a more quantitative analysis linking TDP-43 depletion to CUTS activation and address the reviewer’s concern regarding heterogeneity in bulk measurements. We plan to include this work in a future study.

      (3) Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a more clear manner, possibly split into additional graphs since there are too many outputs.

      We thank the reviewer for this suggestion. In response, we have split the graphs previously shown in Figures 2D and 2G to improve clarity, as we agree that these panels contained an extensive amount of data. Specifically, we split Figure 2D into two separate graphs showing TDP-43 and GFP pixel intensity from Western blots on the Y-axis, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2D and Figure 2E of the new manuscript.

      Furthermore, we also split Figure 2G into graphs showing the fold change of mRNA for TDP-43 and the CUTS cryptic exon, plotted against low siTDP-43 treatment on the X-axis. Please see these data in Figure 2H and Figure 2I of the new manuscript. We have maintained the previous graphs in Supplementary Figure 2 to preserve the full dataset for reference.

      (4) Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      We appreciate the reviewer’s careful observation. In both figures, we are showing mCherry and GFP signals. In the revised version, we have added the corresponding labels to the side of each image for clarity. Note that Sup Figure 2A has been moved and is now Sup Figure 3A, while Figure 4B remains in its original configuration.

      (5) Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified.

      The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously demonstrated that impaired RNA binding promotes aberrant homotypic phase separation of TDP-43. Consistent with this mechanism, expression of RNA-binding–deficient TDP-43 variants induces the formation of nuclear “anisomes” which have been shown to sequester endogenous TDP-43 into insoluble fractions via dominant-negative mechanisms (Cohen et al., 2015; Keating et al., 2023; Mann et al., 2019; Yu et al., 2021). These findings support a model in which disruption of RNA engagement alters TDP-43 biophysical behavior and promotes functional depletion through self-association. We have expanded this mechanistic explanation in the Results section of the revised manuscript to better contextualize the behavior of the 5FL construct and its impact on endogenous TDP-43.

      (6) Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      We appreciate this suggestion and agree with this important point. Due to the lack of methods to directly induce endogenous TDP-43 aggregation and loss of function, the use of stressors has become a partial solution to address this issue. In line with this, our group has tested several stressors in follow-up research, including sodium arsenite (NaAsO₂), puromycin, KCl, MG132, sorbitol, and tunicamycin, using HEK cells expressing the CUTS system (Xie et al., 2025). We were able to show a dose-response relationship in relative GFP intensity under these conditions, with sodium arsenite showing the strongest effect, consistent with previous reports (Huang et al., 2024). To provide additional relevant findings in the current manuscript, we expanded this analysis by testing sodium arsenite in the CUTS system while also including endogenous cryptic exons. We therefore added a new figure showing the effect of sodium arsenite on the CUTS system, including GFP intensity measurements, qPCR using CUTS cryptic exon primers, and three endogenous cryptic exon reporters (ATG4B, GPSM2, and KCNQ2).

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      (7) Regarding the methods, they seem a bit sparse and would benefit from additional detail. For example, I do not see a section in the methods where microscopy images were quantified (%GFP positive cells for example). This information is important and is lacking in the current form.

      We thank the reviewer and have added the following information to the Methods section: For live imaging quantification, we measured the mean GFP signal intensity for each group. The values were averaged, and the fold change was calculated and plotted. For immunofluorescent imaging, we first created maximum intensity projection images. We then applied masks to the GFP, mCherry, and Hoechst signals. By overlapping the GFP and mCherry signals, we identified the number of GFP-positive cells. Similarly, by overlapping the mCherry signal with the Hoechst mask, we identified the CUTS-expressing cells. We then calculated the ratio of GFP-positive cells to CUTS-expressing cells and plotted it as a percentage of GFP-positive cells. All analyses were performed using the Nikon NIS software. This information is included in the Methods of the revised manuscript.
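
      The arithmetic behind the two readouts described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' NIS-based workflow: the per-cell boolean flags (`gfp`, `mcherry`, `hoechst`) and both function names are hypothetical, and GFP-positive cells are counted only among cells already classified as CUTS-expressing.

```python
from statistics import mean


def fold_change(treated_intensities, control_intensities):
    """Mean GFP intensity of a treated group relative to the control group mean."""
    return mean(treated_intensities) / mean(control_intensities)


def percent_gfp_positive(cells):
    """Percentage of CUTS-expressing cells that are also GFP-positive.

    Each cell is a dict of hypothetical boolean flags derived from the
    thresholded masks: 'gfp', 'mcherry', 'hoechst'.
    """
    # CUTS-expressing cells: mCherry signal overlapping a Hoechst-masked nucleus.
    cuts_cells = [c for c in cells if c["mcherry"] and c["hoechst"]]
    if not cuts_cells:
        raise ValueError("no CUTS-expressing cells in this field")
    # GFP-positive cells: GFP signal overlapping mCherry (assumed here to be
    # counted within the CUTS-expressing population).
    gfp_cells = [c for c in cuts_cells if c["gfp"]]
    return 100.0 * len(gfp_cells) / len(cuts_cells)
```

      For example, a field with four CUTS-expressing cells of which two show GFP signal yields 50% GFP-positive cells.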

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      (1) While the rationale for selecting UNC13A CE as the reporting CE species is understood given the relevance to disease, could the authors please comment on whether other CE sequences would behave similarly or as robustly? This is particularly critical given the multitude of different splicing changes that can occur as a result of TDP-43 loss of function (ie cryptic exons of differing sensitivity, skiptic exons, premature polyadenylation).

      We thank the reviewer for this question regarding generalizability beyond the UNC13A CE. While UNC13A was selected due to its strong disease relevance and well-characterized sensitivity to TDP-43 loss-of-function (LOF), our platform is not intrinsically restricted to this sequence. In the manuscript, we directly compared three architectures: UNC13A-TS, CFTR-TS, and the combined CUTS sensor incorporating additional UG motif optimization. Under matched conditions in stable HEK293 lines, CUTS demonstrated superior specificity and sensitivity, exhibiting near-zero baseline activity and a proportional, log-linear response across low-dose siTDP-43 (38–1200 pM) (Figures 1–2). Importantly, this head-to-head comparison demonstrates that sensor performance can be engineered and optimized beyond a single CE species.

      TDP-43 LOF is known to induce a spectrum of RNA processing defects, including cryptic exons with differing sensitivities and cell-type dependence, premature polyadenylation events (e.g., STMN2), and, under conditions of excess nuclear TDP-43, exon skipping (“skiptic exons”). This diversity supports the concept that alternative CE elements, or other TDP-43 regulated RNAs, can be incorporated into the same sensor backbone and tuned for specific biological scenarios (cell type, specific stress responses, etc.). Consistent with this, the recently described TDP-REG system (Wilkins et al., 2024) used AI-generated de novo CE sequences to express reporters or gene payloads, with multiple candidates screened to identify the appropriate RNA elements required for this response. These findings demonstrate that CE sequences beyond UNC13A can serve as robust TDP-43 sensing elements when optimized. Our results complement this work by demonstrating that CUTS achieves tight baseline control and a steep dynamic range (>110,000-fold induction over baseline in HEK293 cells), while maintaining compatibility across both non-neuronal and neuronal model systems, as shown in the revised manuscript.

      In the revised manuscript, we show direct comparisons indicating that CUTS outperforms single-CE sensors such as UNC13A-TS and CFTR-TS under identical conditions. This supports independent work from other groups that alternative CE sequences can be engineered into effective sensors, depending on their paradigm and model systems. We have clarified this in the revised Discussion and now note that CUTS is adaptable to alternative CE inserts.

      (3) Could the authors provide evidence of the utility of their biosensor in disease relevant systems that do not rely on TDP-43 KD? For example, does this biosensor report on TDP-43 loss of function in C9orf72 iPSNs in a time-dependent manner? Alternatively, groups have modeled TDP-43 proteinopathy in wildtype iPSNs via MG132 treatment.

      We thank the reviewer for this important suggestion. We agree that demonstrating CUTS responsiveness in disease-relevant models independent of artificial TDP-43 knockdown would further strengthen its translational relevance. In the current study, our primary objective was to establish the sensitivity, dynamic range, and autoregulatory properties of the CUTS circuit under controlled perturbation of TDP-43 levels. siRNA-mediated depletion provides a reliable approach to establish the relationship between graded TDP-43 LOF and the CUTS sensor sensitivity/specificity. That said, CUTS is designed to detect functional TDP-43 loss irrespective of the upstream cause. As the reviewer notes, disease-relevant systems, such as C9orf72 iPSC-derived neurons and proteotoxic stress paradigms (e.g., MG132-induced impairment of TDP-43 nuclear function), are important for future studies. We are currently evaluating CUTS in iPSC-derived neuronal models of TDP-43 proteinopathy, but are optimizing the induction system, promoters, and timing. It should be noted that C9orf72 iPSC neurons do not exhibit TDP-43 LOF using standard differentiation protocols. Regarding pharmacological stress, we have shown that acute sodium arsenite treatment can activate CUTS (Figure 3). In a concurrent study under revision, we show that MG132 similarly causes TDP-43 LOF and CUTS activation (Xie et al., 2025). Notably, none of these induce complete nuclear loss of TDP-43; instead, they show nuclear TDP-43 retention or modest mislocalization. This suggests that TDP-43 LOF may also result from nuclear redistribution and dysfunction under these stress conditions, rather than from complete nuclear loss. We look forward to presenting these ongoing studies in the future.

      References

      Brown A-L, Wilkins OG, Keuss MJ, Kargbo-Hill SE, Zanovello M, Lee WC, Bampton A, Lee FCY, Masino L, Qi YA, Bryce-Smith S, Gatt A, Hallegger M, Fagegaltier D, Phatnani H, NYGC ALS Consortium, Newcombe J, Gustavsson EK, Seddighi S, Reyes JF, Coon SL, Ramos D, Schiavo G, Fisher EMC, Raj T, Secrier M, Lashley T, Ule J, Buratti E, Humphrey J, Ward ME, Fratta P. 2022. TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A. Nature 603:131–137. doi:10.1038/s41586-022-04436-3

      Cohen TJ, Hwang AW, Restrepo CR, Yuan C-X, Trojanowski JQ, Lee VMY. 2015. An acetylation switch controls TDP-43 function and aggregation propensity. Nat Commun 6:5845. doi:10.1038/ncomms6845

      Huang W-P, Ellis BCS, Hodgson RE, Sanchez Avila A, Kumar V, Rayment J, Moll T, Shelkovnikova TA. 2024. Stress-induced TDP-43 nuclear condensation causes splicing loss of function and STMN2 depletion. Cell Rep 43:114421. doi:10.1016/j.celrep.2024.114421

      Keating SS, Bademosi AT, San Gil R, Walker AK. 2023. Aggregation-prone TDP-43 sequesters and drives pathological transitions of free nuclear TDP-43. Cell Mol Life Sci 80:95. doi:10.1007/s00018-023-04739-2

      Mann JR, Gleixner AM, Mauna JC, Gomes E, DeChellis-Marks MR, Needham PG, Copley KE, Hurtle B, Portz B, Pyles NJ, Guo L, Calder CB, Wills ZP, Pandey UB, Kofler JK, Brodsky JL, Thathiah A, Shorter J, Donnelly CJ. 2019. RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43. Neuron 102:321-338.e8. doi:10.1016/j.neuron.2019.01.048

      Wilkins OG, Chien MZYJ, Wlaschin JJ, Barattucci S, Harley P, Mattedi F, Mehta PR, Pisliakova M, Ryadnov E, Keuss MJ, Thompson D, Digby H, Knez L, Simkin RL, Diaz JA, Zanovello M, Brown A-L, Darbey A, Karda R, Fisher EMC, Cunningham TJ, Le Pichon CE, Ule J, Fratta P. 2024. Creation of de novo cryptic splicing for ALS and FTD precision medicine. Science 386:61–69. doi:10.1126/science.adk2539

      Xie L, Zhu Y, Hurtle BT, Wright M, Robinson JL, Mauna JC, Brown EE, Ngo M, Bergmann CA, Xu J, Merjane J, Gleixner AM, Grigorean G, Liu F, Rossoll W, Lee EB, Kiskinis E, Chikina M, Donnelly CJ. 2025. Context-dependent Interactors Regulate TDP-43 Dysfunction in ALS/FTLD. BioRxiv. doi:10.1101/2025.04.07.646890

      Yu H, Lu S, Gasior K, Singh D, Vazquez-Sanchez S, Tapia O, Toprani D, Beccari MS, Yates JR, Da Cruz S, Newby JM, Lafarga M, Gladfelter AS, Villa E, Cleveland DW. 2021. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371. doi:10.1126/science.abb4309.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      The weaknesses are in the clarity and resolution of the data that forms the basis of the model. In addition to general whole embryo morphology that is used as evidence for CE defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) For the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo. 2) The imaging would benefit from super-resolution microscopy, since in many cases the differences in protein localization are not very pronounced. 3) The IP and Western analysis data often show very subtle differences, and in some cases the differences are not apparent.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the authors showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

      We thank Reviewer 1 for the careful reading of our revised manuscript and for the additional constructive criticisms. Since the two reviewers had divergent opinions on our revised manuscript, we think that it might be more productive to request a Version of Record at this point and have our proposed model debated and tested by others in the field. We will keep the reviewer’s suggestions in mind while designing ongoing studies. We would like to address the criticisms collectively below:

      (1) The primary goal of our current manuscript is to build a mechanistic model for non-canonical Wnt signaling by elucidating the functional relationships between Dvl, Vangl, PK and Ror during CE. Each has been studied extensively in the prior literature using DMZ-injected embryos and DMZ, Keller and animal cap explants, so there is little doubt that the reduced LWR following their over-expression or knockdown in the DMZ is due to disruption of CE. In the context of the current manuscript, we primarily performed co-injections in different combinations to differentiate synergistic from antagonistic relationships, and in the majority of cases we relied on epistasis to draw conclusions (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). Nevertheless, we did follow the reviewer’s suggestion and used animal cap elongation as an additional assay to confirm that Pk and Vangl2 did synergize to disrupt CE, and that their synergy could be blocked by Dvl2 co-overexpression; the new data are added to Fig. 1 (Fig. 1h, h’). Therefore, given the prior literature, our new animal cap explant data, and the specific scope of our current study, we feel that the LWR measurement is a reasonable assay to determine the CE phenotype in this manuscript. We fully agree with the reviewer that our model will need to be tested at the cellular level through live imaging of DMZ explants; this is indeed the direction of our future studies, but it is beyond the scope of the current manuscript.

      (2) A salient feature of non-canonical Wnt signaling is that loss or over-expression of any component can often cause identical CE defects at the tissue/embryo level. We used many co-injection experiments to demonstrate that this is due, at least in part, to a counterbalance between Dvl/Ror and Vangl/PK (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). It is in this context that we planned the imaging and biochemical experiments to determine the possible molecular mechanisms underlying their functional interaction, and we feel that the moderate over-expression used is reasonable in this case for us to build the first integrated model. We do plan to test our model using lower expression in the future. To acknowledge the limitation of our study, we also added the following sentences in the Discussion:

      “We acknowledge, however, that our model explains primarily the potential molecular actions underlying the regulation of CE at the tissue level. Whether and how our model may explain the cellular behavior during CE, such as polarized remodeling of cell junction or extension of cell protrusions, will require further study.”

      (3) The Wnt11-induced reduction of Dvl2-Vangl2 co-IP (Suppl. Fig. 8, 15) may be moderate, but it is statistically significant and reproducible, and we have reported similar findings in two other publications (DOI: 10.1093/hmg/ddx095; DOI: 10.1038/s41467-025-57658-0). Given the limitations of co-IP, we had to rely on high-level over-expression to make the experiments feasible. We are building proximity-based assays such as NanoBRET, and plan to verify the result with lower-level expression in the future.

      Reviewer #2 (Public review):

      We thank the reviewer for the encouraging comments and for the suggestion to clarify the description related to Suppl. Fig. 15. We made revisions according to the reviewer’s suggestion, and added Suppl. Fig. 16 to further examine the effect of Ror2 knockdown on the steady-state interaction between Dvl2 and Vangl2 using an imaging approach.

    1. Author Response:

      Public Review:

      On behalf of all authors I would like to thank the reviewers for highly constructive and helpful comments, which, once addressed fully, will make the paper stronger and more useful as a tools and resources contribution.

      Besides addressing all minor issues that were pointed out by the reviewers, we see three main lines of changes we will need to pursue in order to address all major concerns. We plan to do all of these as fast as possible. Given that new alignments, segmentation and tracing are needed, this will take between one and three months.

      (1) Availability of code, software documentation and accessibility of pipeline. 

      Both reviewers and the editorial summary agreed that we need to improve the availability of our code, provide more instructions and examples of how to use the code, and make our methods more reusable to outsiders. To achieve this we will follow the suggestions made by the reviewers, in particular the list presented by reviewer 1 (point three of weaknesses in the public review).

      First, we would like to apologize for the faulty link to the SegToPCG (https://github.com/Heinzelab/SegToPCG) repository (the correct name and link are LSDtoPCG and https://github.com/Heinze-lab/LSDtoPCG), as well as for the missing code in the https://github.com/Heinze-lab/synful_312 repository; these issues have already been fixed, and the corrections will be included in an updated bioRxiv version.

      Second, we will generate an overarching umbrella page that will serve as a go-to site for any user who would like to implement our pipeline. To enable implementation, we will expand the documentation, provide detailed instructions, and include an example dataset with these instructions.

      (2) Quantification of analysis steps, including segmentation, alignment and manual tracing, to validate our claims of increased efficiency and transferability across species.

      As for point 1, both reviewers as well as the editorial summary highlighted the need for more comprehensive quantification of the workflow, especially with respect to segmentation quality as well as time investment into manual tracing and high resolution alignments. In particular, these data should validate the transferability of the segmentation models across species, and support the claims made about the time savings resulting from using our multiresolution workflow compared to a whole sample synaptic resolution approach.

      To this end, we will generate all analyses according to the reviewer suggestions and incorporate the resulting data in new figures and tables. To make the data fully comparable across species, we will apply the latest version of our alignment and segmentation scripts to at least one high-resolution data stack of each species, quantify manual tracing of a comparable, defined set of neurons in each species, and perform VOI analyses of each species' segmentation against manually traced neurons in identically sized testing volumes in each dataset. Additionally, we will proofread identical branches of homologous neurons in each species and quantify the number of edits required to go from raw segmentation output to completion.
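
      As a rough illustration of the planned VOI comparison, the sketch below assumes that "VOI" refers to variation of information, a metric commonly used to score segmentations against ground truth. It is not taken from the authors' pipeline: the function name and the flat per-voxel label lists are hypothetical, and real volumes would normally be handled with array libraries.

```python
from collections import Counter
from math import log2


def variation_of_information(segmentation, ground_truth):
    """Return (split, merge) VOI terms for two equal-length label sequences.

    split = H(segmentation | ground_truth): grows with oversegmentation.
    merge = H(ground_truth | segmentation): grows with false merges.
    The total VOI is the sum of the two terms.
    """
    if len(segmentation) != len(ground_truth):
        raise ValueError("label sequences must have the same length")
    n = len(segmentation)
    joint = Counter(zip(segmentation, ground_truth))
    seg_counts = Counter(segmentation)
    gt_counts = Counter(ground_truth)
    split = 0.0
    merge = 0.0
    for (seg_label, gt_label), count in joint.items():
        p_joint = count / n
        # Conditional probabilities p(seg | gt) and p(gt | seg).
        split -= p_joint * log2(count / gt_counts[gt_label])
        merge -= p_joint * log2(count / seg_counts[seg_label])
    return split, merge
```

      Reporting the two terms separately, rather than only their sum, indicates whether a model tends to split neurons or to merge them, which matters for estimating proofreading effort.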

      As the segmentation pipeline has evolved over the last years, a fair comparison between all datasets requires fresh analysis based on the latest version of our machine learning models (cannot be done with existing data) and will therefore take a few weeks of time.

      (3) Clarification of aims for multi-resolution pipeline and how projectomes and connectomes inform each other

      Reviewer 2 highlighted that there is not sufficient clarity about the aims of combining projectome and connectome. Judging from the reviewer comment, we might have inadvertently left the impression that we aimed at predicting a connectome from projectome data, by using spatial proximity of neurons as a proxy for connectivity. In fact, our data show that this is not possible, and that projection level data cannot predict connectivity. For instance, in the head direction system, the projectivity data suggests identical circuits for bees and flies (except at the edges of the ring), but connectivity data shows that the components of the ring attractor circuit are forming circuits that are distinctly different between the species (despite the same neurons with the same projection patterns being involved).

      What we aim to do is slightly different. We define global patterns of information flow using the projectome, and then define circuits in a part of this global circuit at synaptic level. Then, we extrapolate the global connectivity by assuming that the circuits identified in one or two computational units (columns) are repeated in each column. This rests on the assumption that the same neurons form the same connections in each repeated module, as long as the cellular repertoire is identical (verified by the projectome), but does not use proximity data to predict connectivity. This method thus only applies to brain regions that consist of repeated computational modules, i.e. where we can assume that knowing the connectivity in one of them allows extrapolation to the entire brain region. While this is a simplification, the Drosophila CX has in principle confirmed this assumption.
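
      The extrapolation logic described above can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' implementation: the function name, the cell-type labels, and the restriction to within-column connections are all assumptions made for the example.

```python
def extrapolate_connectome(template_edges, reference_types, columns):
    """Copy a synaptic-level circuit from a reference column to other columns.

    template_edges: set of (pre_type, post_type) pairs traced at synaptic
        resolution in the reference column.
    reference_types: cell types present in the reference column.
    columns: dict mapping column id -> set of cell types found there
        (as established by the projectome).
    Returns a dict mapping column id -> set of predicted edges, or None
    when the cellular repertoire differs and no extrapolation is made.
    """
    reference_types = frozenset(reference_types)
    predicted = {}
    for column_id, cell_types in columns.items():
        if frozenset(cell_types) == reference_types:
            # Identical repertoire: assume the same neurons form the same
            # connections as in the reference column.
            predicted[column_id] = {
                (f"{pre}@{column_id}", f"{post}@{column_id}")
                for pre, post in template_edges
            }
        else:
            # Repertoire differs (e.g. at the edges of a ring): do not guess.
            predicted[column_id] = None
    return predicted
```

      Note that spatial proximity plays no role in this sketch: connections are copied only when the projectome confirms that a column contains the same set of cell types as the reference column.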

      We will generate a new figure in which we illustrate the process of combining local connectomes and global projectomes using examples from our data, but illustrating this schematically also for other brain regions, e.g. the insect optic lobe or the cerebral cortex of mammals. We will also carefully rewrite the relevant text passages to avoid misunderstandings.

      Overall, we would like to thank the reviewers again for their thorough and detailed comments, which will help to make our connectomics workflow more accessible and reproducible.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      We appreciate that the sample size is small, but this was limited by the technical difficulty and relatively low yield with this preparation. From a total of 16 experiments, we were able to obtain successful recordings in 6 cases, and these provided the characterisation of the 11 cells reported in Figure 4. We believe that this is sufficient to “strongly suggest” that the cells with dense Trpm8 input correspond to cold-selective cells. We have toned down the statements in the abstract (line 23) and the Results section (line 246).

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      As the Reviewer says, tdTomato labelling fills the entire cell. However, examination at high magnification reveals numerous varicosities along the labelled axons, presumably corresponding to synaptic boutons. We now illustrate this in Figure 6–figure supplement 2F.

      In addition, we have provided further evidence that these varicosities correspond to (glutamatergic) synaptic boutons by immunostaining sections through the LPB for the postsynaptic density protein Homer1, and showing Homer1 puncta apposed to varicosities (Figure 6–figure supplement 2 G,H). This new information now appears in the Results section (lines 374-380).

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

      We agree that branches to different brain nuclei may originate from specific subsets of ALS3 neurons and this is now stated in the figure legend. It is true that there are projections to other brain regions (including NTS). These are not included in the diagram, because their circuitry in relation to cold-sensing is less well understood. Although the projection to VPL from lumbar cord is sparse, this is likely to be explained by the very low proportion of lamina I projection neurons with axons that reach the thalamus. Our retrograde tracing data (e.g. Figure 6-figure supplement 4) had already revealed many cells in the C7 segment that were densely coated with Trpm8 afferents and retrogradely labelled from the lateral thalamus. We have carried out additional experiments in which AAV1.Cre<sup>ON</sup>.tdTomato was injected into the cervical enlargement of Calb1<sup>Cre</sup> mice. This resulted in much denser labelling in the VPL and PoT thalamic nuclei, supporting the suggestion that cold-selective lamina I neurons in the cervical enlargement project to these nuclei. This is now described in lines 381-387 and illustrated in Figure 6–figure supplement 3.

      Reviewer #2 (Public review):

      (1) In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      We fully accept that the sample size is small (please see response to Reviewer 1 above). We also accept that the thermal stimulation was not that well controlled. Unfortunately, commercially available probes for controlling skin temperature are too large to apply to the skin in this preparation. For this reason, we have used application of hot and cold saline, as in our previous studies with this preparation.

      (2) The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?

      We now state that 6 out of 16 experiments resulted in successful recordings for this part of the study (lines 858-861).

      Reviewer #3 (Public review):

      (1) While anatomical evidence for direct synaptic connectivity between Trpm8+ afferents and lamina I projection neurons is compelling, a physiological demonstration of strict monosynaptic transmission is not shown. The conclusion that these inputs are exclusively monosynaptic should be toned down. Similarly, the statement that "Lamina I ALS neurons that are surrounded by Trpm8 afferents are cold-selective" should also be toned down as only a few neurons have been tested and it cannot be excluded that other neurons with similar characteristics may be polymodal.

      We have now carried out optogenetic experiments by expressing channelrhodopsin in Trpm8 afferents and retrogradely labelling ALS neurons with tdTomato. This has allowed us to directly demonstrate monosynaptic input. This is described in the Results section (lines 180-202) and the Methods section has been updated. As noted above, we have toned down the statement about lamina I neurons surrounded by Trpm8 afferents being cold-selective (line 246).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The patch innervation of Trpm8+ sensory neurons in lamina I of the spinal cord dorsal horn is interesting. Do they occupy specific areas within lamina I along the mediolateral axis, or are their placements random? Quantifying the distribution of these terminals in lamina I might be worthwhile.

      Although we have not studied the mediolateral distribution systematically, it appears that the locations of the patches in the mediolateral axis are random, and they could be seen in medial, central and lateral parts of lamina I (as shown in Figure 2). We have added a comment to this effect in the Results section (lines 114-116). Quantifying Trpm8 terminals would be very labour-intensive, and we do not feel that this would be of great benefit.

      (2) Quantification for the percentage of Trpm8+ boutons contacting Phox2a+ neurons that are vGlut3+

      The main purpose of this part of the study was to provide a possible explanation for the finding by Li et al (2015) that some lamina I cells were associated with Vglut3-immunoreactive boutons. We found that the percentages of Trpm8+ boutons that contained Vglut3 varied considerably from cell to cell, and this is now stated in the text (lines 133-134). However, knowing exact proportions was not an important aspect of the study, and we have therefore not carried out a detailed analysis.

      (3) Quantification for the percentage of PBN projections neurons densely innervated by Trpm8+ axons that are calb1+.

      As requested, we have carried out immunohistochemistry to determine the proportion of lamina I ALS neurons with dense Trpm8 input that are calbindin-immunoreactive. We examined 31 neurons from 3 different mice and found that all but 4 (i.e. 87%) were immunoreactive. This is now described (lines 287-293) and illustrated (Figure 5–figure supplement 1). We have now put the electrophysiological characterisation that was in this figure into a separate supplement (Figure 5–figure supplement 2).

      (4) It might be helpful to confirm the brain projection targets of Calb1+ lamina I projection neurons using AAV1-CreON-Synaptophysin-GFP (or other fluorescent proteins) injections.

      Please see our response to Public review Reviewer 1 comment 2 above. We have provided further evidence that the brain regions that received input from the Calb1+ cells contain axonal boutons (lines 374-380 and Figure 6–figure supplement 2F-H).

      (5) Figure 6 - Figure Supplements 3 and 4 are duplicated

      We apologise for this duplication, which was made in error in the version originally submitted to eLife. This has now been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned, in the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low, some recorded in current clamp, a few in voltage clamp. This prevents any solid statistical evaluation of the findings

      Please see our response to the first point made by Reviewer 1 in the Public reviews. As stated above, we have toned down the statement about the relationship between cells with dense Trpm8 input and cold-selective cells (line 246).

      (2) In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the synaptic connection between afferents and ALS projection neurons.

      Please see our response to the Public review comment made by this Reviewer.

      (3) Line 35. In the description of the anterolateral system and the effects of lesions, the species should be specified since rodents and humans have a different anatomical distribution of spinal tracts.

      We now state that while ALS axons ascend in the anterolateral quadrant in humans, they are located in the dorsolateral white matter in rodents (lines 40-42).

      (4) To describe the semi-intact preparation used for recording and stimulation from the periphery, the authors cite a study by Julien Allard (reference 25). However, that study describes an in vivo preparation. I believe there is an error in the citation.

      We thank the Reviewer for pointing this out – it has now been corrected.

      (5) Line 726. Dorsal horn recordings were performed at 25 ºC. What is the temperature of the skin? How would this low temperature affect the excitability of cold afferents and their axons? Perhaps a comment about this issue would be appropriate.

      The skin temperature in this preparation is the same as that of the spinal cord (25 °C). At this temperature, Trpm8 afferents would be active, but are likely to have adapted during the course of the experiment. Since this temperature is below 37 °C, it is likely that the conduction velocity of these afferents will be slower than in the in vivo situation. We have added a comment to this effect (lines 818-821).

      (6) Line 401. The authors could not detect Trpv1-immunoreactivity in the central terminals of Trpm8Flp;RCE:FRT mice. Could they detect Trpv1 immunoreactivity in any central terminal? Do they have positive evidence that their immunostaining worked?

      Trpv1 was readily detected in central terminals with the Trpv1 antibody. An example showing lack of detectable Trpv1-immunoreactivity in GFP-labelled (Trpm8-expressing) afferents is now shown in Figure 2–figure supplement 1K-M.

      (7) Line 437. What is the expected anterograde transport time for YFP from the lumbar cord to the brainstem? Are 2-3 weeks not sufficient based on the literature? I noticed the authors are using longer survival times after intraspinal injections

      In preliminary experiments for a previous study (“Substance P-expressing excitatory interneurons in the mouse superficial dorsal horn provide a propriospinal input to the lateral spinal nucleus”, Brain Structure and Function), we had found that a 2-week survival time after injection of AAV1.Cre<sup>ON</sup>.GFP into the lumbar spinal cord of Tac1<sup>Cre</sup> mice was not sufficient to label axons in the brain, although at 4 weeks we saw brain labelling. We have also found that extending survival times from 4 to 6 weeks gives greatly improved labelling, especially in the thalamus.

      (8) Figure 5A. Many of the labelled cells appear to have the somas in the white matter, which makes little sense. It seems the reference section to plot the cells is not optimal

      The placement of cells is accurate. Many spinal projection neurons are present outside the main region of grey matter (i.e. laminae I-X). These cells are found in 2 main regions – the lateral spinal nucleus (LSN) and the lateral reticulated part of lamina V. These two regions are intermediate between grey and white matter – i.e. they contain scattered cell bodies amongst a dense collection of axons. For this reason they appear outside the grey/white border as it is conventionally shown on diagrams of this type. This has been reported in numerous studies, e.g. see Figure 2 in “The cells of origin of the spinothalamic tract of the rat: a quantitative reexamination” (PubMed).

      (9) Recent transcriptomic studies suggest the presence of more than one subpopulation of Trpm8-expressing DRG or trigeminal neurons. It is unclear to what extent the Trpm8-Flp line is capturing this diversity.

      We are aware that there are at least 3 transcriptomic subsets of Trpm8-expressing primary sensory neurons. However, we are not aware of any suitable molecular markers that would allow us to discriminate between them, and we are therefore unable to address this point.

      (10) Could the patchy distribution of Trpm8 afferents in lamina I reflect incomplete recombination; the empty spaces could be occupied by unmarked afferents?

      In theory it could, but this seems unlikely. The Trpm8<sup>Flp</sup> line (crossed with RCE:FRT) captures ~83% of Trpm8-positive cell bodies, and it seems very unlikely that the remaining 17% of Trpm8-expressing afferents would fill the spaces between GFP bundles that we see in lamina I. This is now stated in the Results section (lines 116-120).

      Reviewer #3 (Recommendations for the authors):

      (1) It would be a nice addition to the validation of the Trpm8-Flp line to specify what ages (if multiple) have been analysed and whether there are any differences. In addition, is labelling different at different levels of the spinal cord, and is there any labeling in supraspinal regions?

      The tissue used for this part of the study was obtained from mice aged 5-9 weeks and this is now stated (lines 78-79). We did not observe any differences with age, but we did not look at this in detail. Labelling was similar at different levels of the spinal cord, and this is stated (lines 108-109). We have added a brief account of the distribution of GFP labelling in the brain (lines 140-144).

      (2) Line 169. It is not clear how ALS neurons are labeled. It is explained in the material and methods (I believe it is AAV9.mCherry into the LPB or CVLM). Although I could not find a mention of a tdTomato AAV, maybe I missed it. In any case, it would be great to have the experimental strategy briefly explained in the text. For the same reason, I would recommend moving Figure 4 Supplement 1A and 1B schematics to the main figure, very helpful for understanding the experiment.

      We thank the Reviewer for this suggestion. We now explain in the Results section how the ALS neurons were labelled (lines 209-212), and as the Reviewer recommends we have put the schematic diagrams from Figure 4–figure supplement 1 into the main Figure. As noted in the text, the tdTomato labelling resulted from injection of an AAV coding for Cre into mice that contained the Ai9 allele. We have also updated the descriptions of brain injections in the Methods section to cover the new experiments (optogenetics, and calbindin immunohistochemistry).

      (3) Line 184. "Figure 4" would be good to specify the panels; I believe it should be 4A-C. Same for line 194, 4D-F?

      We apologise that this was omitted from the original version – we have now specified the panels.

      (4) Line 179. It would be great to specify in the text and figures the temperature used for hot and warm water. In addition, would the responses be different using different temperatures? Can you test ramps? These would go a great way to compare with responses shown in vivo by Ran and colleagues.

      We now specify the hot and cold saline temperatures used to stimulate the skin in the semi-intact preparation in the legend for Figure 4 and in the Results section (lines 222-223). As noted above, it is difficult to use more accurate thermal stimuli in this preparation. Please see response to Reviewer 2 public comment 1.

      (5) Figure 4-Figure supplement 1F. It looks like these are very slow responses (1 sec?) for monosynaptic connectivity.

      In this figure (now part 1D) the action potential frequency was determined from counts of APs in 1 sec bins, and this is now stated in the legend. This might have given the impression of slow responses.
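
      The effect of 1 sec binning described above can be illustrated with a minimal sketch. The function name `binned_rate` and the spike times are hypothetical and are not taken from the study; the sketch only assumes that action potentials were counted in consecutive, non-overlapping bins and that each count was divided by the bin width.

```python
def binned_rate(spike_times_s, duration_s, bin_s=1.0):
    """Count spikes in consecutive bins and convert each count to Hz.

    spike_times_s: spike times in seconds (hypothetical example data).
    duration_s: length of the recording window in seconds.
    bin_s: bin width in seconds (1 s, as stated in the response).
    """
    n_bins = int(duration_s // bin_s)
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t // bin_s)
        if 0 <= idx < n_bins:  # ignore spikes outside the window
            counts[idx] += 1
    return [c / bin_s for c in counts]


# Hypothetical train: a short burst starting 50 ms into the second bin.
spikes = [0.2, 1.05, 1.10, 1.15, 1.9, 2.5]
rates = binned_rate(spikes, duration_s=3.0)
print(rates)
```

      With bins of this width, a burst that begins 50 ms after a stimulus appears only as a single value for the whole second, so a plot built this way cannot resolve latencies shorter than the bin and can make fast responses look slow.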

      (6) Line 203. I would tone down the statement, as only 6 cells "that were clearly associated with numerous GFP-labelled afferents" have been tested. Thus, it cannot be excluded that other cells with similar anatomical characteristics may also respond to other stimuli

      As requested, we have toned down this statement (line 246).

      (7) Line 230. Here AAV11.CreON.tdTomato is used; in previous retrograde experiments, AAV9 has been used (Figure 4). Why the switch to 11? Is the tropism the same? Is it possible that because you are using a different serotype, you are targeting different neurons?

      We have found that although AAV9 coding for fluorescent proteins is very good for retrograde labelling, AAV9 coding for Cre-dependent constructs (e.g. AAV.Cre<sup>ON</sup>.tdTomato) gives very poor recombination in spinal projection neurons, for reasons that we do not understand. We recently became aware of the AAV11 serotype, which was recommended as being suitable for retrograde transport (“AAV11 enables efficient retrograde targeting of projection neurons and enhances astrocyte-directed transduction”, Nature Communications). We have found that this works very well for labelling ALS cells throughout the spinal cord when using Cre-dependent constructs. We have added a reference to this paper at this point in the text. We are not able to say whether tropism is the same or different, but in each case many ALS neurons (including many of those in lamina I) are captured.

      (8) Line 234. Is there any positional organization for the "tdTomato-labelled cells densely innervated by Trpm8 afferents", do they preferentially cluster in some position of lamina I?

      These cells are found throughout the mediolateral extent of the dorsal horn, and this is now stated (lines 279-280).

      (9) Line 237. The actual number of cells/mm would be informative.

      This would be difficult to estimate, as the sections were cut in the horizontal plane, which means that lamina I can appear on a variable number of sections.

      (10) Line 249. From the figures, the action potentials of the Calb+ neurons seem to have a delayed onset (at the end of cold saline treatment, Figure 5, Supplement 1l) compared to lamina I ALS neurons recorded in Figure 4, Supplement 1f. If real, it is an interesting difference in the time-course of response that could indicate different coding properties e.g., response to cooling (general neurons) vs. response to absolute temperature (calb + neurons).

      As for Fig 4-figure supplement 4 (see response to point #5 above), action potential frequency was determined from APs counted in 1 sec bins, and this is now stated in the legend.

      (11) Figure 7. In the model, the disynaptic pathway should also be shown

      We have added a comment to the legend stating that there may also be indirect (“polysynaptic”) input from Trpm8 afferents to ALS3 neurons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods, including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior). There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus, this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Thank you for this constructive comment. Indeed, the current study lacks comprehensive strategies to unequivocally distinguish excitatory CCK neurons from heterogeneous CCK neuronal populations. Nevertheless, we provide multiple lines of evidence supporting the distribution of CaMKIIα/Vglut1-expressing CCK<sup>+</sup> neurons in the hippocampus (Figure 1F), using complementary approaches including transgenic mouse models as well as viral and antibody-based labeling (Figure 1A, Figure 1H-I). In addition, we demonstrate that 635 nm light reliably evokes field excitatory postsynaptic potentials (fEPSPs) at CA3-Schaffer collateral synapses expressing DIO-CaMKIIα-ChrimsonR in vitro (Figure 2A-F). Importantly, these light-evoked excitatory synaptic responses are abolished by AMPA and NMDA receptor antagonists (CNQX and APV), confirming the excitatory nature of the DIO-CaMKIIα-ChrimsonR-expressing synapses. To outline future work that could further support our findings and conclusions, we have added the following to the Discussion section of the revision:

      “Due to technical limitations at the current stage, we were unable to perform whole-cell recordings or pharmacological manipulations using CCK receptor antagonists. In future studies, the application of these approaches to directly record and selectively block EPSPs from excitatory CCK neurons in the hippocampus will further strengthen and validate our conclusions.” (Line 265 - line 269 in the revision).

      (1b) For the experiments that use a virus with the CCK-IRES-Cre mouse, there is no information or characterization on how well the virus targets excitatory CCK-expressing neurons. (Additionally, it has been reported that CaMKIIa-driven protein expression, using viruses, can be seen in both pyramidal and inhibitory cells.)

      We thank the reviewer for this insightful comment regarding the specificity of viral targeting in CCK-IRES-Cre mice.

      To address this concern, we performed additional characterization of viral expression in CA3. We found that DIO-CaMKIIα-mCherry expression showed a high degree of colocalization with CaMKIIα immunoreactivity, indicating preferential targeting of excitatory neurons (sFigure 1A-B; sFigure 2A-B; sFigure 3A-B). We show an example confirming the high specificity of the AAV for infecting excitatory CCK neurons in the CA3 area.

      In addition, we acknowledge prior reports showing that CaMKIIα-driven viral expression can, in some cases, be detected in a small subset of inhibitory neurons. However, because CA3-Schaffer collateral projections to CA1 arise exclusively from excitatory CA3 pyramidal neurons, any potential expression in inhibitory CCK<sup>+</sup> interneurons is unlikely to directly contribute to the recorded CA1 synaptic responses in our electrophysiological experiments. That said, we cannot fully exclude the possibility that a minor population of inhibitory CCK⁺ neurons could indirectly modulate CA3 pyramidal neuron activity via local circuit mechanisms, particularly in experiments involving optogenetic manipulation or shRNA expression. We now explicitly acknowledge this limitation in the revised manuscript:

      “Importantly, to further improve cell-type specificity, we propose an intersectional genetic strategy using CCK-IRES-Cre × VGlut1-Flp mice combined with a Cre-On/Flp-On (Con/Fon) AAV, which would restrict expression exclusively to excitatory CCK-expressing neurons and eliminate potential contributions from inhibitory CCK<sup>+</sup> cells. This approach will be implemented in future studies to refine circuit specificity.” (Line 269 - line 273 in the revision).

      (2) The methods and figure legends are extremely sparse, leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data. Additionally, further quantification would be useful, e.g. in some places, only % values are noted, or only images are presented.

      Thank you for these constructive comments. We have expanded the methodological descriptions in both the Methods section and the figure legends to provide sufficient detail for evaluating the experimental tools and data accuracy. In addition, we have added quantitative analyses where previously only representative images or percentage values were shown. Specifically, quantification has now been included for each AAV condition in the corresponding figures in the revised manuscript.

      (3) It is unclear whether the reduced CCK expression is correlated, or directly causing the impairments in hippocampal function. Does the CCK-shRNA have any additional detrimental effects besides affecting CCK-expression (e.g., is the CCK-shRNA also affecting some other essential (but not CCK-related) aspect of the neuron itself?)? Is there any histology comparison between the shRNA and the scrambled shRNA?

      Recent studies from our lab demonstrated that knocking out CCK gene expression significantly attenuates hippocampal-dependent spatial learning and CA3-CA1 LTP, indicating that CCK plays a critical role in modulating hippocampal function [1,2]. Additionally, neither CCK-shRNA nor CCK-scramble significantly affected excitatory synaptic transmission in the CA3-CA1 projections, suggesting that CCK-shRNA may have no obvious adverse effect on other neural components.

      Finally, we have provided the histology comparison between the shRNA and the scrambled shRNA regarding the expression level of the CCK protein (Pro-CCK) in the revision. Our result shows that CCK-shRNA (left panel) significantly reduced CCK expression in CA3<sup>CCK</sup>-positive neurons compared with the CCK-Scramble group (right panel).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      https://doi.org/10.7554/eLife.109001.1.sa2

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that the CA3-CCK CAMKIIergic neurons are involved in the spatial memory tasks; these are interesting findings, which suggest that these neurons are important targets for manipulating the memory-related diseases.

      (3) This manuscript also measured the endogenous CCK from the CA3-CCK CAMKIIergic neurons; this means that CCK can be released under certain conditions.

      Weaknesses:

      (1) The authors do not mention which receptors of the CCK modulate these processes.

      We appreciate the reviewer for raising this important question. Based on our recent work, CCK-B receptors are the primary neural components mediating CCK functions in the hippocampus at both the synaptic plasticity and behavioral levels (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). To clarify this mechanism, we have added the following content to the revised manuscript:

      “Based on our recent work, CCK signaling in the hippocampus is predominantly mediated by CCK-B receptors, which play a critical role in regulating synaptic plasticity and spatial memory-related behaviors.” (Line 105 - line 106 in the revision).

      (2) The authors do not test the CCK gene knockout mice or the CCK receptor knockout mice in these neural processes.

      Thank you for this insightful comment. We performed these experiments in an earlier study. Our results showed that high-frequency electrical stimulation failed to induce significant LTP in the CA3-CA1 pathway in both CCK gene knockout (CCK-KO) mice and CCK-B receptor knockout (CCK-BR-KO) mice in vitro (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). These findings indicate that CCK mediates its synaptic effects predominantly through CCK-B receptors in the CA3-CA1 pathway. Accordingly, we have added this description to the revised manuscript:

      “Additionally, high-frequency electrical stimulation fails to induce LTP in the CA3-CA1 pathway in both CCK-KO and CCK-BR-KO mice, indicating that CCK-dependent synaptic plasticity in this circuit is primarily mediated by CCK-B receptors.” (Line 170 - line 173 in the revision).

      (3) The author does not test the source of CCK release during the behavioral tasks.

      We thank the reviewer for raising this important point. In our previous work, we directly monitored CCK release in the hippocampus during an object-exploration task using a GPCR-based CCK-BR sensor combined with fiber photometry (Su et al., 2023). During object exploration, we observed a rapid and robust increase in CCK-BR sensor fluorescence, indicating activity-dependent CCK release in the hippocampus. Based on these findings, we infer that hippocampal CCK release plays a critical role in hippocampus-dependent behavioral tasks.

      We acknowledge that hippocampal neurons receive CCK-positive projections from multiple brain regions, making it technically challenging to isolate and monitor the precise source of CCK release in the CA1 area during behavioral tasks in vivo. One potential strategy to address this limitation is selective overexpression of CCK in CA3 neurons (e.g., AAV-CCK delivery), followed by assessment of CCK-BR sensor responses during hippocampal-dependent behaviors. We have added this discussion to the revised manuscript to clarify the source and functional relevance of CCK release during behavioral tasks.

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (3) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

      https://doi.org/10.7554/eLife.109001.1.sa1

      Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, neural sensors, hippocampus-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (1) The authors provided the distribution profiles of excitatory cholecystokinin neurons in the dorsal hippocampus using transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (2) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (3) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (4) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (1) The causal relationship between navigation learning and CCK secretion?

      Thank you for pointing out this important issue. Previous studies have shown that CCK can be rapidly secreted during exploratory behaviors, as detected by the CCK-BR sensor. In parallel, CCK-positive neurons have been demonstrated to play a critical role in the precise execution of hippocampus-dependent spatial learning. Together, these findings suggest that exploratory behavior induces CCK secretion, which in turn contributes to the accuracy of hippocampal-dependent learning and memory processes. Based on this evidence, we propose that CCK secretion serves as a functional link between behavioral exploration and spatial learning. We have added these explanations in the revised manuscript to better clarify the causal relationship between behavioral exploration and CCK secretion:

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision)

      (2) The effect of overexpression of the CCK gene on hippocampal functions?

      We thank the reviewer for this comment. In fact, an earlier study from our laboratory demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models. These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels (Zhang et al., 2024; Wang et al., 2025). Accordingly, although direct genetic overexpression of CCK in the hippocampus has not yet been extensively characterized, the observed benefits of exogenous CCK delivery support the notion that increased CCK availability positively modulates hippocampal function and spatial learning. We have cited this study in the revised manuscript to support this interpretation.

      “Interestingly, an earlier study demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models (Zhang et al., 2024). These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels.” (Line 291 - line 297 in the revision)

      (3) What are the functional differences between the excitatory and inhibitory CCK neurons in the hippocampus?

      In the hippocampus, CCK-expressing neurons consist of two major populations with distinct functions: excitatory (glutamatergic) and inhibitory (GABAergic) neurons. Excitatory CCK neurons are relatively sparse and intermingled with pyramidal cells. By releasing glutamate, they directly contribute to excitatory transmission and are thought to participate in synaptic plasticity and information processing related to learning and memory. In contrast, inhibitory CCK neurons are more abundant and include well-characterized interneuron subtypes such as CCK-positive basket cells. These neurons release GABA and primarily target the perisomatic region of pyramidal neurons, providing strong control over neuronal firing. Notably, inhibitory CCK interneurons are highly sensitive to neuromodulatory signals, particularly endocannabinoids via CB1 receptors, enabling dynamic regulation of inhibitory tone and network activity. Together, excitatory CCK neurons mainly support hippocampal excitation and plasticity, whereas inhibitory CCK neurons regulate network dynamics and spike timing. As the focus of the present study is on excitatory CCK neurons, a detailed comparison between these two populations was not included in the original manuscript.

      (4) Do CCK sources come from the local CA3 or entorhinal cortex (EC) during the high-frequency electrical stimulation?

      Thank you for this insightful comment. Our data indicate that the CCK detected during high-frequency stimulation originates from CA3 neurons rather than the entorhinal cortex (EC). As shown in Figure 2, we used an optogenetic approach combined with a GPCR-based CCK sensor to selectively examine CCK release from the CA3-CA1 pathway. ChrimsonR was specifically expressed in CA3 neurons projecting to CA1, restricting light stimulation to CA3 axon terminals. In parallel, the CCK sensor was locally expressed in CA1, allowing real-time detection of CCK release at CA3 presynaptic sites. High-frequency light stimulation robustly evoked CCK signals in CA1, demonstrating activity-dependent CCK release from CA3 terminals. Importantly, EC inputs were neither genetically targeted nor optically stimulated in this experiment, excluding the EC as a source of the detected CCK. Together, these results support the conclusion that CCK released during high-frequency stimulation is derived from local CA3 projections to CA1. Similarly, as the focus of the present study is on excitatory CCK neurons in the CA3 area, a detailed comparison between these two CCK sources was not included in the original manuscript.

      Citation:

      (4) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (5) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (6) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

    1. Author response:

      Reviewer #1 (Public review):

      Hierarchical Inference (Unit Survey)

      We agree that pooling units across preparations can overstate the strength of inference if preparation-level clustering is ignored. We will therefore reanalyze the unit-survey dataset using a hierarchical approach in which the preparation/animal is treated as the unit of inference. Our pooled dataset was derived from three chunk preparations exposed to AMPA and three baseline preparations, allowing us to report per-preparation proportions and variability as requested.

      A preliminary reanalysis of the buccal segment preparations is summarized below. In this analysis, the unit of inference is shifted from individual recorded units to the preparation level (n = 3 baseline; n = 3 at 60 nM AMPA), thereby accounting for potential within-preparation dependence.

      Author response table 1.

      The distribution of units for each of the three preparations per condition is as follows:

      Using the proportion of buccal units per preparation as the dependent variable:

      Baseline (n = 3): mean proportion of buccal units = 6.5% (SD 5.7%).

      60 nM AMPA (n = 3): mean proportion of buccal units = 53.2% (SD 6.0%).

      Absolute difference in proportions = 46.7% (95% CI 33.4% to 59.8%).

      Independent-samples t-test on per-preparation proportions: t(4) = 9.77, p = 0.0006.

      Thus, this preliminary hierarchical reanalysis indicates that the observed recruitment is consistent across preparations and is not driven by outlier data from a single animal. These results support substantial expansion of the buccal oscillator with excitation.
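      The preparation-level comparison above can be re-derived from the reported summary values alone. The following is a minimal sketch, assuming an equal-variance (Student) independent-samples t-test; the response does not state which variant was used, although with equal group sizes the Student and Welch t statistics coincide. The function name `pooled_t_from_summary` is hypothetical, and the inputs are the rounded means and SDs quoted above, so small differences from values computed on the raw per-preparation data are expected.

```python
import math

def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's independent-samples t statistic from summary statistics.

    Assumes equal variances (pooled SD). Returns the mean difference,
    its standard error, the t statistic, and the degrees of freedom.
    """
    df = n_a + n_b - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean_b - mean_a
    return diff, se, diff / se, df

# Reported per-preparation proportions of buccal units (percent):
# baseline: mean 6.5, SD 5.7, n = 3; 60 nM AMPA: mean 53.2, SD 6.0, n = 3.
diff, se, t_stat, df = pooled_t_from_summary(6.5, 5.7, 3, 53.2, 6.0, 3)
print(f"difference = {diff:.1f} points, SE = {se:.2f}, t({df}) = {t_stat:.2f}")
```

      Treating each preparation as one observation gives four degrees of freedom, which is the sense in which the unit of inference has moved from recorded units to preparations.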

      Statistical Standardization: In the revision, we will better justify our use of parametric and non-parametric versions of the one-sample tests and review usage in the Methods, Table 1, and figure legends for consistency.

      Exclusion criteria for microinjection experiments: We will extend the description of these experiments by including a flow diagram summarizing the 15 attempted microinjection experiments and documenting the technical reasons for the 9 exclusions. These exclusions reflected the technical requirements of the preparation: (a) the buccal area had to be localized before AMPA excitation so that the effects of buccal-area manipulation during excitation could be interpreted reliably, which was not always possible; and (b) preparations had to exhibit sufficiently sustained periods of consecutive buccal bursting to permit quantification of buccal burst frequency, whereas some preparations expressed motor patterns dominated by lung bursts.

      Pharmacological Potency and Necessity: We will revise the wording of this section to make the causal interpretation more precise. Our data already show that local GABA microinjections can reverse the excitatory effects of local AMPA microinjections, providing an internal control for local pharmacological efficacy of GABA when the local network is excited. Notably, the local AMPA concentration used in these experiments (5 µM) is nearly two orders of magnitude greater than the 60 nM concentration used in bath application. We therefore interpret the failure of focal GABA inhibition to abolish rhythm during global excitation as being consistent with expansion of rhythmogenic capacity beyond the spatial reach of the local injection, rather than with failure of the GABA manipulation itself.

      Finding an inhibitory site that remains sensitive during bath-applied AMPA would be an interesting experiment, but it would require identifying the anatomical substrate of a non-ventilatory brainstem circuit in Rana that is guaranteed not to undergo reconfiguration with AMPA. This is beyond the scope of the current manuscript; based on our work to identify the neuronal substrate for ventilation in Rana, this would take at least five years to complete. In addition, even if such a circuit were identified, there would be no guarantee that AMPA would not cause reconfiguration in that case too. With regard to transection boundaries and the location of injections, we agree these would be useful refinements. We used the location of nerves as reliable landmarks to guide transections, and we located the buccal area using stereotactic coordinates to guide micropipette insertion and functional criteria (AMPA and GABA sufficiency and necessity tests) to pinpoint the exact position, based on our previous work.

      Unit Classification: We will review the nomenclature we use to define units to ensure it does not cause confusion and provide more explicit criteria for unit classes. This will include clarification of the absence of “buccal-only” units as currently defined. Specifically, when both buccal and lung rhythms are present, units active during buccal bursts are also active during lung bursts in our preparation. This does not conflict with the multiple interacting oscillator model we have proposed previously. Rather, recruitment of buccal-area neurons during lung bursts is consistent with a model in which the lung oscillator excites the buccal oscillator. It is also consistent with prior evidence that lung bursts persist after buccal-area ablation. In addition, burst frequency during lung episodes exceeds buccal burst frequency during intervening buccal periods. We will revise the text to make this logic clearer.

      Reviewer #2 (Public review):

      (1) Degeneracy vs. Redundancy

      We agree that degeneracy is the more precise term for the phenomenon our data demonstrate, in which structurally and functionally distinct neurons (lung units) acquire the capacity to participate in buccal rhythm generation under excitation. The Discussion already uses this language (e.g., "necessity and sufficiency may not work in a large degenerate network where rhythm generation is distributed across many elements"), but we used the word "redundant" in the Key Points Summary and Abstract in the broader sense of distributed robustness that a wider readership could grasp. Nonetheless, we recognize the distinction drawn by Goaillard and Marder (2021) and, considering the reviewer's concerns, we will revise the Abstract and Key Points to adopt the degeneracy framework consistently.

      (2) Loss of Essential Requirement for a Discrete Oscillator

      The reviewer asks whether expansion of the rhythmically active region necessarily implies loss of the rhythmogenic kernel. We believe our necessity and sufficiency experiments (Figure 9) directly address this. Under baseline conditions, GABA microinjection into the buccal area reliably abolishes buccal bursting; under 60 nM bath AMPA, the same injection at the same location and volume has no significant effect on buccal frequency. If the kernel remained essential and the surrounding recruitment were merely supplementary, local inhibition of the kernel should still slow or abolish the rhythm. It does not. We interpret this as evidence that the essential requirement for the discrete buccal area is lost under excitation, not merely that a larger area has been recruited around a still-critical core. We acknowledge, however, that the word "lost" could be read as implying permanent elimination rather than state-dependent suspension, and we will temper this language in the revision.

      (3) Novelty Relative to Mammalian Studies

      We appreciate the reviewer drawing attention to the cited mammalian literature (Del Negro et al., 2002, 2009; Baertsch et al., 2018, 2019), which we discuss in detail in the manuscript. However, we respectfully note that our findings extend this literature in several ways that the public review does not acknowledge. First, Baertsch et al. demonstrated recruitment of tonic or silent neurons to become phasically active during inspiration; we show that neurons already assigned to one oscillator phase (lung) can be dynamically reassigned to another (buccal), which represents a qualitatively different form of reconfiguration. Second, we developed a novel approach to functionally ablate motor neuron pools using high-frequency nerve stimulation, enabling the unit survey to be interpreted at the premotor level, which was not achieved in the mammalian studies cited. Third, our data provide the first demonstration of state-dependent oscillator expansion in a non-mammalian tetrapod, offering evolutionary context that strengthens the generality of the principle. We will revise the term "promiscuous" if it overstates the claim, but we maintain that our data support the conclusion that oscillator boundaries are flexible, which goes beyond what has been shown in mammals.

      (4) Figure 6, CN5 Output Under AMPA

      The reviewer asks whether the shift in premotor unit composition is reflected in CN5 motor output. This is a reasonable question. As noted in the manuscript, 60 nM AMPA produces only minor changes in the overt motor pattern as recorded from CN5, which is precisely why we interpret the premotor changes as a reorganization of the network's internal architecture that is not readily apparent from motor output alone. This is in sharp contrast to observations of substantive network reconfiguration in mammals in which eupnea is replaced by the pathological condition of gasping. We will add quantification of CN5 burst parameters (amplitude, duration, frequency) under baseline and 60 nM AMPA to make this point explicit.

      (5) Subthreshold Recruitment vs. Network Expansion

      The reviewer suggests that neurons classified as newly rhythmic under AMPA may have been part of the rhythmic network all along, receiving subthreshold inputs at baseline. We are grateful to the reviewer for highlighting this and hope they would agree that the literature clearly demonstrates that all respiratory neurons receive subthreshold phasic inputs of one kind or another, perhaps providing a clue that reconfiguration is a common feature of respiratory networks generally. Regardless of the implications for other animals, we agree this is likely the mechanism at work in the frog, and indeed our manuscript states that "this increase in the number and proportion of premotor buccal units is due in part to recruitment of sub-threshold buccal neurons that, under low excitability, only fire during lung bursts," citing intracellular evidence from Kogo and Remmers (1994) that lung neurons in this region receive subthreshold buccal-timed input. We note that this observation does not diminish our conclusion and likely explains the mechanism by which network expansion occurs. Whether one calls these neurons "newly recruited" or "pushed above threshold," the functional consequence is the same: a larger population of neurons is now rhythmically active during buccal bursts, and the necessity of the original buccal area is lost. We will clarify this reasoning in the revision and acknowledge the limitation that additional intracellular recordings from our preparation would be needed to fully characterize the subthreshold dynamics.

      (6) Figure 8, Epoch Length and Meta-analysis

      The reviewer notes that the pre-AMPA epoch appears shorter than the post-AMPA epoch in Figure 8A, which could bias unit classification. We will address this in the revision by reporting epoch durations explicitly and addressing their implications for spike counts where appropriate. Regarding the request for meta-analysis of lung unit spiking during baseline buccal bursts: this analysis is part of the rationale for the phase-recruitment panels, and we will expand Figure 8 to include the requested cross-condition comparisons (lung unit activity during baseline buccal bursts, and during post-AMPA lung bursts) as also suggested by Reviewer 3.

      (7) Figure 9, Buccal-to-Lung Burst Ratio

      The reviewer observes that the ratio of buccal to lung bursts decreases from approximately 4-5:1 under baseline to 2-3:1 under 60 nM AMPA, and suggests this is inconsistent with conversion of lung units into buccal units. We do not believe this is inconsistent. The buccal-to-lung burst ratio reflects the overt motor pattern, which is determined by the interaction of multiple oscillators and is influenced by AMPA at both buccal and lung levels. A change in this ratio does not speak to whether individual premotor units have acquired buccal-timed activity; the unit survey and the single-unit transformation data (Figure 8) address that question directly. Regarding the alternative model involving efference copy and cross-inhibition: this is an interesting hypothesis, but it is speculative and not tested by the current dataset. We are happy to discuss lung-buccal interactions more fully in the revision, including the parallels to parafacial/preBötC interactions in mammals, but we note that our data on unit transformation are better explained by network reconfiguration than by a feedback model that remains to be tested.

      (8) "Independent" Slices

      The reviewer compares our Level 2 transection to the preBötC sandwich slice preparation and argues the two resulting slices are not independent. We take the reviewer's point that "independent" may be read as implying no shared developmental or functional origin, which is not our intent. By "independent" we mean that the two physically separated slices can each generate rhythmic output without being synaptically connected to each other. This is, in fact, our central point: rhythmogenic capacity is distributed across a region broad enough to endow two separated slices with independent rhythm-generating capability when excited. We note that the analogy to the sandwich slice is imperfect because in our Level 1 cuts, only the rostral slice containing the buccal area generates rhythm -- the caudal slice does not -- whereas Level 2 cuts that bisect the buccal area produce rhythmicity in both halves, consistent with distributed capacity specifically within the buccal region. We will revise the wording to clarify what we mean by "independent" in this context.

      Reviewer #3 (Public review):

      Physiological Parallels: We will expand the Discussion to place these findings in a broader comparative context, including the eupnea-to-gasping transition in mammals as an example of state-dependent reconfiguration of respiratory networks. This will also allow us to clarify two advances that may otherwise be missed when comparing our work to that in mammals: (a) we developed a novel approach to functionally eliminate motor neurons, allowing mapped units to be interpreted as premotor; and (b) the state-dependent reconfiguration of the buccal oscillator occurred without qualitative changes in the overt lung-buccal motor pattern.

      Unit Transformation Analysis: We will revise Figure 8 to improve clarity around the observed lung-to-buccal transformation by expanding the phase-recruitment panels as suggested and will revisit the operational definitions of lung and buccal unit identity to reduce ambiguity. The central observation is that some units active only during lung bursts under one condition become active during buccal bursts when network excitation is increased.

      Saturation vs. Network Expansion: We will directly address the possibility that 60 nM bath-applied AMPA simply pushes the network toward a frequency ceiling. Two observations strongly argue against this interpretation: (a) 60 nM global AMPA produced only mild changes in buccal frequency, whereas local AMPA injection at much higher concentrations produced larger effects; and (b) local GABA was sufficient to reverse the effects of high-concentration local AMPA microinjections but insufficient to abolish rhythm during low-concentration global AMPA application. Together, these findings are more consistent with global AMPA endowing the network with distributed rhythm-generating capacity than with simple saturation of a discrete local oscillator. Notwithstanding these arguments, we will attempt to extend the AMPA/GABA dose-response experiments as suggested, or add the lack of such experiments as a caveat to our interpretation.

      Figure 9C Correction: We will correct the statistical markings in Figure 9C to align with the text in the Results regarding the significance of frequency changes under 60 nM AMPA.

      In total, we believe these revisions will improve the rigor and clarity of the manuscript while preserving the central conclusion supported by the data: that the organization of the frog respiratory rhythmogenic network is state dependent and becomes more distributed under excitation.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

      We thank the reviewer for the positive evaluation and helpful feedback.

      Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34<sup>+</sup>/Sca-1<sup>+</sup> cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying that the PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the valuable comment regarding the potential role of the CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial cell sub-population in PLC function.

      We agree that direct functional validation would be a crucial next step to confirm the contribution of this specific sub-population to PLC formation and function. The focus of the present study remains on the spatial localization and reproducible characterization of PLC structures based on 3D imaging, as well as the relevant transcriptional features revealed by single-cell analysis.

      To avoid overinterpretation, we have revised the Discussion section accordingly, providing a more focused and cautious description of the related findings.

      Comments on revisions:

      I appreciate the authors' effort to revise the text so it more rigorously adheres to the presented evidence. Following a thorough read of the revised text, a few remaining minor issues were identified in the Discussion.

      (1) Where does the hard evidence for the PLC being a stem cell niche come from in the following two statements?

      This suggests that the PLC may not only provide structural support but also serve as a perivascular stem cell niche specific to the portal region, potentially involved in hematopoiesis and tissue regeneration.

      The PLC serves as a directional scaffold for ductal growth, a specialized stem cell niche, and a potential site of neurovascular coupling.

      We thank the reviewer for this important comment. We agree that the term “stem cell niche” may imply functional evidence for direct stem cell regulation, which was not demonstrated in this study. Our conclusions were based on the spatial enrichment and transcriptional features of CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial populations expressing hematopoiesis-related genes in the portal region.

      To avoid overinterpretation, we have revised the sentence to remove the term “stem cell niche” and instead describe the PLC as being enriched in perivascular endothelial cell populations with hematopoiesis-related gene expression features. The revised text now reads:

      “These results suggest that, beyond structural support, the PLC in the portal region is enriched with perivascular endothelial cell populations exhibiting hematopoiesis-related gene expression features.”

      We have also modified the corresponding statement later in the Discussion. It now reads:

      “The PLC serves as a directional scaffold for ductal growth, displays distinct perivascular endothelial transcriptional features in the portal region, and may represent a potential site of neurovascular coupling.”

      We believe this wording more accurately reflects the descriptive and transcriptomic nature of our data without implying functional niche activity.

      (2) In the following paragraph, I lack references to the previously published evidence of liver innervation guidance mechanisms, such as the mesenchyme-mediated guidance (CD31- population) Gannoun et al., 2023 https://doi.org/10.1242/dev.201642, an important context for your finding.

      Further analysis showed significant upregulation of genes involved in neurodevelopment and axonal guidance in the CD34<sup>+</sup>/Sca-1<sup>+</sup> cluster, along with activation of neuronal signaling pathways. Immunostaining confirmed the presence of TH<sup>+</sup> sympathetic nerve fibers wrapping around the PLC in a "beads-on-a-string" pattern (Fig. 6), consistent with a classic neurovascular unit (Adori et al., 2021). Previous studies have shown that sympathetic nerves enter the liver along collagen fibers of Glisson's capsule and interact with hepatic arteries, portal veins, and bile duct epithelium, supporting the PLC as a scaffold for intrahepatic neurovascular integration.

      We thank the reviewer for highlighting the importance of previously published evidence regarding liver innervation guidance mechanisms. We agree that these studies provide important context for interpreting the neurodevelopmental and axon guidance–related transcriptional signatures observed in our dataset. Accordingly, we have revised the Discussion section to incorporate reference to mesenchyme-mediated axon guidance mechanisms in the portal region during liver development (Gannoun et al., 2023). This addition better situates our findings within the existing literature.

      (3) Several sentences have issues with a lack of space between words.

      We have carefully re-examined the entire manuscript for spacing and formatting inconsistencies and corrected minor typographical issues to ensure uniform formatting throughout the text.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figures 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which is very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of the approach, and we now discuss it in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine does not reliably report the localization of hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse line. As suggested, we have removed the figure that showed mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX⁺ stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to state more explicitly that this reference point is the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP⁺ cells per mouse is shown in Figure 1F, which reports TRAP⁺ cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      As mentioned, we previously reported no correlation between task success and TRAP+ density. We have now performed additional analyses examining correlations with learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB), and found no significant relationships. Therefore, as we did not find any significant correlations, the original interpretation that DG activity primarily reflects task engagement rather than performance level remains the most parsimonious.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      We agree that reducing spatial separation between stimuli likely engages multiple behavioral and cognitive processes beyond a single, strictly defined operation. We have now clarified this point in the manuscript and explicitly state that our use of the term “cognitive demand” reflects a multidimensional behavioral challenge rather than a singular cognitive process (see Discussion).

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      While the primary goal of this study was to examine activity and functional recruitment of adult-born granule cells, we also quantified the survival of birth-dated neurons at the end of behavioral training. Density measurements of BrdU⁺ and EdU⁺ cells revealed no differences across experimental groups, indicating that engagement in the pattern separation task, across low to high cognitive demand conditions, did not significantly alter survival of adult-born neurons. In addition, we examined the spatial distribution of BrdU⁺ and EdU⁺ neurons between the suprapyramidal and infrapyramidal blades of the dentate gyrus. The proportion of newborn neurons was consistent across all groups, with approximately 60% located in the suprapyramidal blade and 40% in the infrapyramidal blade. These findings indicate that behavioral training did not alter the baseline distribution of adult-born neurons. We have now clarified these points in the manuscript (See Results).

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      We agree and have made changes to the manuscript to discuss these points (see Discussion and Limitations).

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      We agree that differences in learning stage could in principle confound the interpretation of DG activation patterns. However, although Cre+ DCZ-treated mice exhibited delayed learning, they ultimately reached the same performance criterion as control animals. Thus, adult-born DGC inhibition did not prevent learning but increased the time required to reach criterion, indicating that these neurons are beneficial for learning efficiency rather than strictly necessary for task acquisition. Importantly, all animals were sacrificed only after reaching the predefined success criterion. Therefore, the immunohistochemical analyses were performed at the same behavioral endpoint for Cre+ DCZ and control groups, even though the number of training days differed. Consequently, the observed differences in DG activation reflect circuit recruitment at equivalent task mastery rather than differences in learning stage.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

      We agree that our findings highlight that successful performance is not determined solely by the overall level of dentate gyrus activity, but rather by the composition and spatial organization of the active neuronal ensemble. In our study, inhibition of abDGCs increased overall mGC activity while disrupting the spatially organized, blade-biased activation pattern and impaired performance. In contrast, direct inhibition of mGCs reduced global excitability but preserved the relative spatial organization of active neurons in animals that continued to perform the task. These findings suggest that different perturbations alter task performance by shifting the identity and coordination of the active neuronal ensemble, rather than simply increasing or decreasing total activity levels. We have now expanded the Discussion to more explicitly address how dentate gyrus computations may depend on the structured recruitment of granule cell ensembles and how distinct manipulations differentially disrupt this organization.

      Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

      Please refer to the updated statistical analyses in response to the recommendations below.

      Recommendations for the authors:

      Reviewing Editor Comments

      Please note that reviewers agreed that appropriate revisions are needed to increase the strength of evidence for the paper's claims. Concerns were raised about a lack of statistical rigor in the statistical analyses used. Results of statistical tests were not consistently provided (i.e., statistic applied, value of statistic, degrees of freedom, p-value), and seemingly inappropriate statistical tests were used in some instances. Also, some comparisons had lower statistical power than others. When clarifying the statistical approaches used in the manuscript, we also encourage you to consider reading this article that outlines common statistical mistakes (Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019 Oct 9;8:e48175. doi: 10.7554/eLife.48175.), such as the importance of not basing conclusions on a significant p-value for one pair-wise comparison vs a non-significant p-value for another pairwise comparison (i.e., groups that are being compared should be included in the same statistical analysis, and interaction effects should be reported when appropriate). We hope that you find this information to be helpful should you decide to submit a revised manuscript to eLife.

      Reviewer #1 (Recommendations for the authors):

      (1) Standardize TRAP+ quantification across Figure 1.

      Please report TRAP+ cell numbers using consistent metrics (e.g., density or percentage) to enable comparison across cell types. In addition, extend the TRAP+ reactivation analysis in Figure 1H to include abDGCs so that reactivation dynamics can be compared directly between mGCs and abDGCs.

      Reply in Public Review

      (2) Clarify whether dorsal or ventral DG was analyzed in Figure 2.

      The differing anatomical distributions of TRAP+ cells under low- and high-demand conditions raise important questions about DG axis specificity. Please indicate whether analyses were performed in dorsal DG, ventral DG, or both, and provide data or justification accordingly.

      Reply in Public Review

      (3) Acknowledge limitations of the tamoxifen-chow labeling strategy in AsclCreER; hM4 experiments.

      Since tamoxifen chow administered over 4-7 weeks labels a heterogeneous abDGC population spanning a broad age range, this approach does not generate birth-dated cohorts. This limitation should be clearly addressed in the text and interpretations, particularly related to cell age-dependent effects, should be tempered.

      Reply in Public Review

      (4) Revise DREADD quantification using HA rather than mCitrine.

      The hM4 mouse line requires HA immunostaining to accurately identify Ascl-lineage cells expressing the DREADD receptor. Because mCitrine is not specific to adult-born neurons and does not reliably reflect hM4 expression, quantification based on mCitrine should be revised.

      Reply in Public Review

      (5) Include markers to assess abDGC maturation state.

      Adding quantification of DCX and NeuN would help define the developmental stage of abDGCs in key experiments and improve the interpretation of cell-age-dependent effects.

      Reply in Public Review

      (6) Clarify DG layer boundaries and terminology in Figure 2.

      If the metric labeled "Distance from the hilus" corresponds to the subgranular zone (SGZ), using SGZ terminology would prevent confusion. Additionally, please provide clearer delineation of DG and hilus borders in sample images.

      Reply in Public Review

      (7) Provide missing cell number data for Figures 2B and 2C.

      Reply in Public Review

      (8) Clarify the tamoxifen administration protocol in Figure 6.

      Please describe how the protocol selectively targets 6-7-week-old abDGCs and how it differs from the chow-based approach. This will help readers understand the intended specificity of the manipulation.

      Reply in Public Review

      Reviewer #2 (Recommendations for the authors):

      (1) EdU birth-dating timeline

      The manuscript would benefit from a clearer description of the EdU birth-dating timeline, ideally with a schematic similar to that provided for BrdU in Supplementary Figure 1.

      We appreciate the suggestion. However, we did not include a separate schematic for EdU because its use and birth-dating logic are identical to BrdU (both are thymidine analogs administered systemically and incorporated during S-phase). Therefore, the timeline shown in Supplementary Figure 1 applies equally to both markers. We have clarified this point in the Methods section to avoid confusion.

      (2) Clarity of TUNL task description.

      The description of the TUNL task, particularly for readers unfamiliar with touchscreen-based paradigms, is difficult to follow without consulting prior literature. A simplified schematic or a clearer step-by-step explanation in the main text or supplementary material would improve accessibility.

      We note that the main steps of the TUNL protocol are illustrated in Figure 1A and Supplementary Figures 2A and 2B. Nevertheless, we agree that the description in the text can be made clearer for readers less familiar with touchscreen-based tasks. Thus, we have now revised the Methods section to provide a clearer step-by-step description of the TUNL task.

      (3) Influence of outliers in Figure 1G.

      In Figure 1G, the reported trend that ~1% of 25-39-day-old abDGCs are TRAP+ during LS trials appears to be driven by a small number of outliers. This should be acknowledged, and the wording of the conclusion moderated to reflect the variability in the data.

      We agree with the reviewer that the apparent outliers reflect the inherent sparsity of TRAP labeling in this population. In absolute terms, this corresponds to between 0 and 2 TRAP⁺ 25–39-day-old abDGCs per mouse, such that the presence or absence of a small number of labeled cells can appear as outliers when expressed as a percentage. We have revised the text to acknowledge this (see Results).
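
      The sparsity argument above can be illustrated with simple arithmetic. The sketch below is not taken from the manuscript: the per-mouse total of birth-dated cells (`hypothetical_total`) is an assumed value chosen only for illustration, and only the range of 0 to 2 TRAP⁺ cells per mouse comes from the response.

```python
def percent_double_labeled(trap_positive: int, birth_dated_total: int) -> float:
    """Percentage of birth-dated cells that are also TRAP+ in one mouse."""
    if birth_dated_total <= 0:
        raise ValueError("birth_dated_total must be positive")
    return 100.0 * trap_positive / birth_dated_total


# Hypothetical number of birth-dated abDGCs counted per mouse (assumption).
hypothetical_total = 80

# The response reports between 0 and 2 TRAP+ birth-dated cells per mouse.
percentages = [percent_double_labeled(n, hypothetical_total) for n in (0, 1, 2)]
print(percentages)
```

      With a denominator of this size, each additional labeled cell shifts the percentage by more than one point, which is why individual mice can look like outliers.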

      (4) Presentation of learning curves.

      Rather than focusing primarily on "days before criterion" (DBC), it would be helpful to show full learning curves across the entire training period. This would provide a clearer picture of acquisition dynamics and inter-animal variability.

      We agree that learning curves can be informative in many behavioral paradigms. However, in our protocol, mice do not undergo the same number of training days because training stops individually once each animal reaches criterion. As a result, plotting full learning curves would produce trajectories of different lengths, making group comparisons difficult and visually cluttered. For this reason, we aligned animals based on days before criterion (DBC), which allows direct comparison of learning dynamics relative to task acquisition. We also consider the cumulative probability representation to be the most appropriate way to summarize learning progression across animals in this context, and these plots are also included in the figures.

      (5) Clarification of Figure 3B labeling

      In Figure 3B, the identity of the orange-labeled group above the LS condition is unclear. Clarification in the figure legend would improve interpretability.

      Figure 3B includes two experimental groups. One group performed both the large- and small-separation conditions; this group is shown in orange and labeled LS. Within this group, the upper orange trace corresponds to performance in the large-separation condition, while the lower orange trace corresponds to performance in the small-separation condition. The second group is a control group that performed only the large-separation configuration, and therefore only a single green trace is shown. We agree that this distinction was not sufficiently clear and have revised the figure legend and text to clarify the identity of each trace.

      Reviewer #3 (Recommendations for the authors):

      (1) Please label figures and, even better, put the legends on the same page.

      (2) Just to confirm, in establishing the task, mice performed above 70% for the small separation trials in one of the sessions on 2 consecutive days, for each criterion? Performance seems to be below 70%.

      Yes. To meet the criterion, each mouse had to reach ≥70% correct performance in at least one of the two daily sessions on two consecutive days. We then averaged the performance across both sessions for each of those days. As a result, if one session was ≥70% but the other was lower, the daily average could fall below 70%. The values shown in the figure correspond to these daily averages, further averaged across mice.
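
      The criterion rule described above can be sketched as follows. This is an illustrative reading of the rule, not the authors' analysis code: the function names and the example session scores are hypothetical, and only the 70% threshold, the two daily sessions, and the two-consecutive-day requirement come from the response.

```python
def day_meets_criterion(sessions, threshold=70.0):
    """A day counts if at least one of its sessions reaches the threshold."""
    return any(score >= threshold for score in sessions)


def reached_criterion(days, threshold=70.0):
    """True if two consecutive days each contain a session at or above threshold."""
    return any(
        day_meets_criterion(prev, threshold) and day_meets_criterion(curr, threshold)
        for prev, curr in zip(days, days[1:])
    )


def daily_average(sessions):
    """Mean percent correct across the sessions of one day."""
    return sum(sessions) / len(sessions)


# Hypothetical percent-correct scores: two sessions per day, two consecutive days.
example_days = [(72.0, 60.0), (70.0, 64.0)]

criterion_met = reached_criterion(example_days)
averages = [daily_average(day) for day in example_days]
print(criterion_met, averages)
```

      In this example the criterion is met on both days, yet both daily averages fall below 70%, which matches the pattern the reviewer noticed in the plotted values.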

      (3) mGC needs to be explicitly defined. Am I assuming any non-birthdated GC is an mGC according to the authors? (which means it is unknown whether they are in fact mature, though likely most of them are).

      In this study, “mature granule cells” (mGCs) refer operationally to granule cells that are not birth-dated with BrdU or EdU and therefore are not classified as adult-born neurons within the defined labeling window. We agree that this population is not directly age-defined, and that while the majority are expected to be mature based on their birth timing relative to the labeling period, we cannot exclude the possibility that a small fraction may include younger, unlabeled neurons. We have now explicitly defined this usage of mGCs in the Methods and clarified this point in the text to avoid ambiguity.

      (4) Methods state that Kruskal-Wallis tests were used when more than 3 groups were compared, but I don't see these stats presented (e.g., for trap data in Figure 1, blade x task TRAP expt in Figure 3 (should be 2-way RM anova here and elsewhere), etc) or any corrections for multiple comparisons. I appreciate that the mean rates of TRAPed abGCs are higher in the S and LS groups than in the shaping group, but most mice do not have any BrdU+ cells that are also TRAPed, and there are no statistics here to support the claim. I don't think there is enough sampling to accurately quantify activation of abGCs. Also, no stats to support the claim that TRAPing increases at the "tip of the SB after the more demanding LS task".

      We agree with this comment. We have now systematically tested all datasets for normality (by group) and applied parametric tests when the data met normality assumptions, and non-parametric tests otherwise. The statistical analyses have been revised accordingly. We added the appropriate tests (including two-way ANOVA where relevant, such as for blade × group comparisons) and now report full statistics in the figure legends and results sections. For the TRAP analyses in adult-born DGCs, we explicitly acknowledge the very low number of BrdU⁺/TRAP⁺ cells, which limits statistical power and, in some cases, precludes robust statistical testing. These limitations are now clearly stated in the Results and Discussion, and the corresponding interpretations have been tempered. For all Kruskal–Wallis tests, post hoc pairwise comparisons were performed using Dunn’s test, with Bonferroni correction for multiple comparisons, as now specified in the Methods section. We also expanded the Methods to describe the statistical workflow in detail. In addition, we have added the previously missing statistical analysis for Figure 2C. Comparisons were performed between the 0–50% and 50–100% portions of the blade, where 0% corresponds to the apex and 100% corresponds to the distal tip of the blade.
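
      As a small illustration of the multiple-comparison step mentioned above, the sketch below applies a Bonferroni correction to a set of pairwise p-values. It does not implement Dunn's test itself, and the raw p-values and group labels are hypothetical placeholders rather than results from the study.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three post hoc pairwise comparisons.
raw = {
    "shaping vs S": 0.04,
    "shaping vs LS": 0.004,
    "S vs LS": 0.5,
}

adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))

# Comparisons that remain below alpha = 0.05 after correction.
significant = [pair for pair, p in adjusted.items() if p < 0.05]
print(adjusted, significant)
```

      Here a comparison that looks significant in isolation (raw p = 0.04) no longer passes the 0.05 threshold once the three comparisons are accounted for.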

      (5) Figure 3I: I can't figure out which effect is statistically significant here (what does the asterisk signify?). Why no individual data points in this graph?

      We agree that the absence of individual data points reduced interpretability, and we have now updated the figure to include individual data points to better illustrate data distribution and variability.

      (6) The gradient of activity (shap < S < LS) could be due to how long they've been trained on a given stage (e.g. less activity during shaping because they have habituated, and neurons encoding that task phase have already been selected)

      We agree that task duration and habituation could, in principle, influence activity levels. Under this interpretation, higher activity would primarily reflect task novelty rather than cognitive demand. However, our data do not support this explanation. Specifically, we found no correlation between the number of training days required to reach criterion and c-Fos–positive or TRAP-positive cell density within a given stage. Thus, animals that reached criterion rapidly did not show higher activity levels than animals that required more days of training and were presumably more habituated to the task demands. This suggests that the observed activity gradient (shaping < S < LS) is not driven by exposure duration or habituation, but rather reflects differences in cognitive demand across task stages.

      (7) The TRAP+ EDU+ cell in Figure 3 looks odd because the BrdU signal is (a lot) larger than the TRAP signal, but BrdU is in the nucleus and should be smaller.

      We agree that the example in Figure 3 is not optimal. In dividing cells, BrdU/EdU signals can sometimes appear broader or closely apposed, which may affect their apparent size.

      (8) For the Ascl-HM4Di experiment, HM4Di appears to be expressed in all of the areas of the granule cell layer where abGCs are NOT located (i.e. no expression in the deep cell layer, near the sgz). This is problematic because it suggests perhaps abGCs are not inhibited as expected.

      As noted in our response to Reviewer #1, we did not use mCitrine to localize the DREADD receptor, as it has been demonstrated that mCitrine is expressed in a Cre-independent manner and is not correlated with hM4Di expression. In the revised manuscript, we include a representative image in which we performed immunostaining with an HA antibody to directly visualize hM4Di and confirm its expression in adult-born granule cells (Figure 5).

      (9) Line 267: "6-7 week old neurons by themselves do not influence either the performance of mice in the task". I don't think this is fair because this experiment wasn't designed with as much power to detect an effect. The group trends are in the same direction, but there are many fewer mice in this experiment (n=6/group) than in the =<7w experiment (n=11/group), where the effect just reached statistical significance.

      We are sorry for this confusion which came from an incorrect version. The experiment shown in Figure 6 does not target 6–7-week-old neurons specifically. It uses the same tamoxifen chow–based protocol as Figure 5, but with a shorter exposure (4 weeks vs. 7 weeks), thereby labeling a younger and more restricted cohort of adult-born DGCs. By contrast, Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells (Dock10+).

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments. A central concern raised is the comparison of performance with existing motion-correction methods. In response, we performed motion correction using several widely used approaches and compared results using the number of particles detected by 2DTM and their associated SNR. To minimize potential bias, we selected parameters to give each method a comparable level of model flexibility so that the results are as directly comparable as possible. Overall, Unbend performs the best. We note that extensive, method-specific parameter optimization could further affect absolute performance, and a comprehensive benchmarking study is therefore beyond the scope of this work.

      Public Reviews:

      Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states: "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." I find the method appropriate, logical, and well-explained. Additionally, the work suggests using 2DTM-related measurements to quantify the improvement of the new method compared to the old one in cisTEM, Unblur. I find this part engaging; it is straightforward, accurate, and, of course, the group has a strong command of 2DTM, presenting a thorough study.

      However, everything in the paper (except some correct general references) refers to comparisons with the full-frame approach, Unblur. Still, we have known for more than a decade that local correction approaches perform better than global ones, so I do not find anything truly novel in their proposal of using local methods (the method itself- Unbend- is new, but many others have been described previously). In fact, the use of 2DTM is perhaps a more interesting novelty of the work, and here, a more systematic study comparing different methods with these proposed well-defined metrics would be very valuable. As currently presented, there is no doubt that it is better than an older, well-established approach, and the way to measure "better" is very interesting, but there is no indication of how the situation stands regarding newer methods.

      Regarding practical aspects, it seems that the current implementation of the method is significantly slower than other patch-based approaches. If its results are shown to exceed those of existing local methods, then exploring the use of Unbend, possibly optimizing its code first, could be a valuable task. However, without more recent comparisons, the impact of Unbend remains unclear.

      We thank the reviewer for this important point. We agree that comparing against modern local motion-correction approaches is a valuable task. To address this, we added a new benchmarking section (pp. 17–18, lines 444–492, Fig. 8, Fig. 8—figure supplement 1) that compares Unbend against widely used patch-based local correction methods, including MotionCor2, MotionCor3, Warp, and CryoSPARC. Using the same 2DTM-based metrics described in the manuscript (detections per micrograph and SNR distributions for commonly detected particles), we find that Unbend provides the most stable performance across the tested datasets and, in most cases, yields higher detection counts and higher SNR than the alternative methods.

      Regarding runtime, the current implementation is CPU-based and is therefore slower than some optimized GPU-accelerated packages. We now clarify this limitation in the manuscript (line 498–499). Our primary goal in this study is to improve motion-correction accuracy and quantify its impact using 2DTM-based measures. Importantly, higher-quality motion-corrected micrographs can reduce downstream processing cost (e.g., by increasing particle detection efficiency and reducing ambiguous candidates), so modest additional compute times at the motion-correction stage can be offset later in the workflow. We also note that GPU acceleration and additional code-level optimizations are planned for future releases (line 501-503); however, they are not required to evaluate the methodological contribution and the benchmarking results presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamella and whole cells (that can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D cubic spline model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone. Several different in situ samples are used for evaluation, covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present convincing data that Unbend generally improves SNR of aligned micrographs as well as increases detection of particles matching the 60S ribosome template when compared to using full-frame correction alone. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution, and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Thank you for this positive assessment.

      Weaknesses:

      While the improvements with Unbend vs. Unblur appear clear, it is less obvious whether Unbend provides substantial gains over patch motion correction alone (the current norm in the field). It might be helpful for readers if this comparison were investigated for the in situ datasets. Additionally, the authors are open that in cases where full motion correction already does a good job, the extra degrees of freedom in Unbend can perhaps overfit the motions, making the corrections ultimately worse. I wonder if an adaptive approach could be explored, for example, using the readout from full-frame or patch correction to decide whether a movie should proceed to the full Unbend pipeline, or whether correction should stop at the patch estimation stage.

      We thank the reviewer for suggesting an adaptive criterion to decide whether or not to proceed to patch alignment. We agree that such an approach could be valuable for efficiency and for avoiding unnecessary model flexibility. However, our results indicate that a simple criterion based on the magnitude of estimated local patch motion is unlikely to be sufficient. For example, in the BS-C-1 cell lysate dataset (see lines 412–417 on page 16), we observe minimal local motion (Figure 4b), with mean patch shifts of only 0.7 Å, and full-frame alignment already yields comparable detection counts, yet local correction still produces a measurable SNR gain (13.84 ± 0.04 to 14.25 ± 0.04, 3%) and improves SNR for ~70% of the commonly detected targets (Figure 6c). This suggests that residual local distortion can remain even when overall local motion appears small. Establishing a robust, dataset-agnostic stopping rule would therefore require a dedicated, systematic benchmarking study across many samples and acquisition conditions.

      Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam-induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method has been proven to be successful by showing improvements in 2D template matching (2DTM) results on the corrected micrographs using five in situ samples.

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher quality particles early in the pipeline, and is applicable to both old and new datasets, therefore being relevant to all cryo-EM users.

      Strengths

      (1) One key idea of the paper is that local beam induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements. This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes.

      Thank you for this positive assessment.

      Weaknesses

      (1) While very interesting, it is not clear how the proposed method using 3D splines for estimating local deformations compares with other existing methods that also aim to correct local beam-induced motion by approximating the deformations throughout the frames using other types of approximation, such as polynomials, as done, for example MotionCor2.

      We thank the reviewer for this suggestion. We agree that positioning Unbend relative to existing local motion-correction methods is important. In the revised manuscript, we added a dedicated benchmarking section comparing Unbend with widely used local correction approaches, including MotionCor2, MotionCor3, Warp, and CryoSPARC, using the same 2DTM-based metrics (Fig. 8, Fig. 8—figure supplement 1). This section is included on pp. 17–18, lines 444–492. To make the comparison as fair as possible, we matched nominal model flexibility across methods and otherwise used default parameters to reduce method-specific tuning. This expanded comparison provides a direct baseline against current patch-/spline-based approaches and shows that Unbend performs consistently across the in situ datasets evaluated here, with improvements in detection counts and/or SNR in multiple cases.

      (2) The use of 2DTM is appropriate, and the results of the analysis are enlightening, but one shortcoming is that some relevant technical details are missing. For example, the 2DTM SNR is not defined in the article, and it is not clear how the authors ensured that no false positives were included in the particles counted before and after deformation correction. The Jupyter notebooks where this analysis was performed have not been made publicly available.

      We agree that these technical details improve clarity and reproducibility. We have therefore made three changes.

      (1) Definition of 2DTM SNR. We added an explicit definition of the 2DTM SNR in the section “2DTM provides a one-step verification for motion correction” (pp. 11, lines 277–287). Briefly, at each image location we compute cross-correlation values over the searched orientation space and define the 2DTM SNR as the maximum per-location z-score across orientations.

      (2) False-positive control / detection threshold. We clarified how detection thresholds were set to control false positives (pp. 11, lines 285–287). Specifically, we used the standard 2DTM statistical framework in which the threshold is chosen using the one-false-positive (1-FP) criterion (or equivalently, a specified expected false-positive rate). We applied the same thresholding procedure consistently across all motion-corrected micrographs. This ensures that particle counts before/after correction reflect changes in signal recovery.

      (3) Reproducibility of the analysis. We have made the script used for the benchmarking and figure generation publicly available (pp. 24 line 622-623), and we provide a link in the Data Availability statement (pp. 25 line 650). The repository includes sample .star files and a python package that computes detections per micrograph, commonly detected particles, and SNR comparisons.
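
      To make points (1) and (2) above concrete, here is a minimal sketch of the two quantities as they are described in this response. It is an illustration under stated assumptions, not the cisTEM implementation: the z-score follows the verbal definition given above, the threshold assumes Gaussian noise, and `n_tests` (taken here as the number of searched locations times orientations) as well as both function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def snr_at_location(cc_values):
    """Z-score of the best-matching orientation at one image location.

    cc_values holds one cross-correlation value per searched orientation
    and is assumed not to be constant (non-zero standard deviation).
    """
    mu = mean(cc_values)
    sigma = pstdev(cc_values)
    return (max(cc_values) - mu) / sigma


def one_fp_threshold(n_tests, n_false_positives=1.0):
    """Threshold that Gaussian noise alone is expected to exceed
    n_false_positives times over n_tests independent tests."""
    tail_probability = n_false_positives / n_tests
    # Use the lower tail and flip the sign to avoid computing 1 - tiny number.
    return -NormalDist().inv_cdf(tail_probability)


# Hypothetical cross-correlation values at a single location.
example_snr = snr_at_location([0.0, 0.0, 0.0, 0.0, 10.0])

# Hypothetical search size: locations x orientations.
example_threshold = one_fp_threshold(1e12)
```

      A location would be counted as a detection when its SNR exceeds the threshold; applying the same threshold to every motion-corrected micrograph keeps the expected false-positive count constant across methods.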

      (3) It is also not clear how the proposed deformation correction method is affected by CTF defocus in the different samples (are the defocus values used in the different datasets similar or significantly different?) or if there is any effect at all.

      We thank the reviewer for raising this point. In the revised manuscript, we now report the defocus ranges used for each dataset (Table 1) and clarify that all motion-correction comparisons were performed within each dataset using the same CTF estimation and 2DTM settings (pp. 23, lines 615–618). Across the five datasets, four were collected at similar defocus ranges (1.0 µm to 1.5 µm), whereas one dataset includes near-focus (0.4 µm) micrographs (Table 1). Because Unbend operates on frame alignment/warping rather than CTF modeling, we do not expect a defocus-specific effect beyond indirect influences through image SNR and the reliability of cross-correlation-based alignment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The obvious recommendation would be to use their 2DTM approach for a comparison of their new method with other currently used ones

      We agree and added a new comparison section (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #1 Public Review.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 29, typo. 3 ~ 8% > 3 - 8%.

      Corrected.

      (2) Lines 220 and 226. Should this be e-/Angstrom squared for the exposure?

      Corrected to e<sup>-</sup>/Å<sup>2</sup> (Now pp. 9 lines 230, 236).

      (3) Figure 2 c-d. These are good for instinctively seeing the movement, but I found the legend confusing, as a 10 x 10 pixel array is mentioned, yet the schematics show a higher sampling (30 x 30 pixels? in c-e).

      Thank you for pointing this out. The “10×10” annotation refers to the physical scale, whereas the grid represents pixel sampling. We removed the “10×10” label and now show only the pixel grid to avoid confusion. The caption has been updated to state that the grid corresponds to a 30×30 pixel sampling. (Fig. 2c, d; pp. 31, line 766)

      (4) Figure 4. It would be good if the n of movies analyzed was given in the figure legend.

      Thank you for noticing this. We report the number of movies per dataset in the corresponding summary table (Table 1).

      (5) Figure 5. X/Y axes labels missing (assume pixels). Also, suggest changing the strain scale to % to match the main text description of this figure.

      We added X/Y axis labels, changed the strain scale to % (Figure 5), and specified that the strains are per pixel on pp. 14, line 367. Correspondingly, the X/Y labels and strain scale in the strain plots in Figure 4—figure supplements 1 to 5 have also been changed.

      (6) Unify labelling of Figure 4 and 6 (i.e., Bacteria vs. M. pneumoniae, etc.).

      Corrected. Sample labels are now consistent across figures. (Figures 4 and 6)

      Reviewer #3 (Recommendations for the authors):

      Some recommendations related to the points mentioned in the 'Weaknesses' section in the public review:

      (1) If feasible, it would be useful to see a comparison with other existing methods that estimate local deformations (e.g., MotionCor2), at least on some of the datasets. For example, does the proposed method lead to better 2DTM SNR in the detected particles compared to other methods, or higher detection numbers? Alternatively, if such a comparison would require too much additional work and the authors have good reasons to believe that the results are evident, it would be helpful to include a discussion about why the proposed method is expected to perform better, both in terms of the general approach and specific implementation details.

      We agree that this comparison is important. (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #3 Public Review (1).

      (2) It would be useful to define the 2DTM SNR in the main text of the paper, as well as to address the point about false positives in the picked particles.

      We added an explicit definition of 2DTM SNR and clarified the detection thresholding/false-positive control used in our analysis (pp. 11, lines 277–287). Addressed above in Response to Reviewer #3 Public Review (2.1 and 2.2).

      (3) Regarding the results shown in Figures 4 and 6: do the authors have any insight about how the CTF defocus affects the deformation estimation and correction across the different sample types?

      We now report the defocus ranges used for each dataset (Table 1). We have addressed this problem in Response to Reviewer #3 Public Review (3).

      (4) Will the Jupyter notebooks used for the 2DTM analysis be made publicly available?

      Yes. We have deposited a python script used for the 2DTM benchmarking and figure generation in a public repository and added the link in Data Availability statement. (pp. 23 line 622, pp. 25 line 650). Addressed above in Response to Reviewer #3 Public Review (2.3).

      (5) I would also appreciate a few words about the implementation details of the 3D spline model (e.g., what libraries have been used, if any, or if the authors have implemented their own code for this).

      The 3D spline model and warping code were implemented by us (no external spline library was used) and the relevant implementation details are described in the “Sample distortion modeling and correction” section (pp. 7–10, lines 174–246). For optimization, we used the L-BFGS implementation provided by the dlib library, which is now explicitly cited (pp. 10, line 264).

      Some comments regarding the presentation of the work:

      (1) I found the mathematical background on splines on pages 7-9 a little distracting from the main ideas of the paper, and I believe it could be moved to the methods section. A short description of this in the main text of the paper would suffice, and it would be useful to state clearly when this is background material and when it is the authors' contribution.

      We appreciate the suggestion. Because Unbend includes an in-house spline implementation (no external spline library) and it is the central part of this work, we retained the spline description to support reproducibility. (pp. 7–10, lines 174–246).

      (2) More generally, I found the whole method very interesting, but understanding exactly what all the steps involved were was a bit cumbersome, as they are spread across different sections of the main text. I think it would be useful to have a dedicated section giving the exact steps taken in the algorithm, possibly pointing to the relevant section in the text for more details about each step. This could be, for example, in the form of an 'Algorithm' box or a flowchart.

      We added an Algorithm box as Figure 2 supplement summarizing the end-to-end workflow and pointing to the relevant sections for details (Figure 2—figure supplement 1 Algorithm, pp. 4, line 96–103, pp. 32 line 799). This is intended to make the sequence of steps easier to follow.

      (3) In Figure 3, panels (b) and (c), the difference between the two micrographs, before and after correction, is not very noticeable, particularly the Thon rings in the spectra. I don't know if this is due to the image quality in the paper or if a better example could be shown. For example, the differences are clear in some of the supplementary figures.

      Thank you for the suggestion. We revised the figure by adding annotations to show the recovered Thon rings. This figure shows a vertex motion and is intended not only to show improvement but also to illustrate complex, spatially varying deformation patterns that motivate the 3D spline model (pp. 12, lines 304–308). The supplementary figures display those with highest motions in each sample type, thus the Thon rings for the motion corrected micrograph in higher frequency space look more obvious. We also refer readers to the supplementary examples where the differences are more pronounced (pp. 12, lines 310–312).

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and Rya modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y- maze olfactometer that to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle to identify a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then through investigations of gene knockdown of candidates they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up rich lines of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF, starting from first-principles behavioral experiments, is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

      We thank the reviewer for the suggestion and have now incorporated a schematic in the supplementary figure S9B, explaining our methodology for achieving tissue-specific knockdowns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, bloodfed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the author's labelling of the figure stats in panel 4D.

      We agree with the reviewer and thank them for pointing it out. We have now revised the figure legend and the text to reflect these results (see lines 351-354).

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

      We are truly thankful to this reviewer for insisting on this point. It has made us revisit what we thought we understood and now realise we were doing wrong (though many in the literature do it this way!). We were, incorrectly, setting each control to 1 and calculating relative fold changes for each replicate independently. While this is often seen in the literature, we now realise that it is incorrect. We have revisited all our analyses and normalized all samples to the mean ΔCt of the control group, which captures biological variation in both control and experimental groups. All data are now re-plotted to show individual data points for both control and experimental groups, and the error bars on controls represent the biological variation across replicates (Figure 4D, 4F, 4G, S8, S9). Statistical analyses were also revised accordingly, and, importantly, they do not change any conclusions. Please note that the abdominal expression of sNPF and RYa is so low that the controls show very variable baseline expression values.
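
      The corrected normalization described in this response can be illustrated with a minimal sketch. It assumes the standard 2^-ΔΔCt form with roughly 100% amplification efficiency; the function name `fold_changes` and the ΔCt values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean

def fold_changes(control_dct, treated_dct):
    """Relative expression with the control-group mean dCt as calibrator.

    Each input is a list of dCt values (Ct_target - Ct_reference), one per
    biological replicate. Every sample, control or treated, is compared to
    the same calibrator, so control replicates keep their own spread
    instead of each being forced to exactly 1.
    Assumption: amplification efficiency close to 100% (base 2).
    """
    calibrator = mean(control_dct)
    control_fc = [2.0 ** -(d - calibrator) for d in control_dct]
    treated_fc = [2.0 ** -(d - calibrator) for d in treated_dct]
    return control_fc, treated_fc

# Hypothetical dCt values for three control and two knockdown replicates.
control_fc, treated_fc = fold_changes([5.0, 6.0, 7.0], [7.0, 8.0])
print(control_fc, treated_fc)
```

      With this calibrator the control replicates scatter around 1 rather than all equalling 1, which is what allows error bars and statistical tests to reflect variation in the control group.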

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (2) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

      These comments were addressed in the previous round.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Awesome paper everyone. A delight to read and review.

      Thank you very much! We appreciated your comments too!

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization).

      Thank you for this insightful comment. We acknowledge that our claim of “rapid cross-modal plasticity” is based on correlational evidence and does not directly address synaptic or circuit-level reorganization, which would require more invasive methods. Our study instead focuses on changes in the representational structure of tactile stimuli when visual input is temporarily removed, highlighting the adaptability of sensory coding to environmental context. We agree that this distinction is important and have revised the manuscript to clarify that the observed changes reflect functional reorganization rather than structural plasticity, as indicated by the enhanced separability of texture representations in S1 during darkness.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might contribute to the observed neural differences. These factors are acknowledged but not directly measured (e.g., via pupillometry or cortical state indicators).

      Thank you for your insightful comment. We agree that arousal and exploratory behavior could influence neural differences and have considered these factors in our study. While gait was controlled, we did not directly measure arousal (e.g., via pupillometry or cortical indicators).

      To partially address this, we reviewed locomotor-speed traces (Supplementary Figure 1), which showed no significant differences between light and dark conditions, suggesting movement speed did not drive the neural differences. We also reversed the order of light and dark conditions. Although the difference in texture separability was not statistically significant in these sessions, the result is consistent with motivation not confounding our results.

      However, we acknowledge that arousal may still affect cortical dynamics, especially in the dark condition, where the lack of visual input might alter exploratory behavior. Due to technical limitations, we could not directly measure arousal states, and this is now discussed in the revised manuscript. While we cannot rule out the influence of arousal, the enhanced separability of texture representations suggests that sensory reorganization due to visual deprivation likely played a substantial role.

      (3) Moreover, the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized - only that population-level signals become more discriminable. As such, the term "plasticity" may overstate the conclusions and should be interpreted with caution unless validated by additional causal or longitudinal data.

      Thank you for your important comment. We agree that the term "plasticity" may overstate our conclusions, as our study focuses on population-level signal changes rather than direct evidence of circuit-level reorganization.

      To address this, we have revised the manuscript to clarify that while the observed changes in neural separability suggest functional reorganization of sensory representations, they do not confirm structural plasticity. We have updated the wording throughout the manuscript to emphasize that these findings reflect functional reorganization in response to short-term visual input loss, rather than structural or long-term plasticity.

      We also updated the discussion to highlight the need for future research with more invasive approaches to validate the causal mechanisms behind these rapid changes in neural dynamics.

      (4) The study highlights the forelimb region of S1 and a post-contact temporal window as particularly important for decoding texture, based on occlusion and integrated gradient analyses. However, this finding may be somewhat circular: The LFPs were aligned to forelimb contact, and the floor textures were sensed primarily via the forelimbs, making it unsurprising that forelimb electrodes were most informative. The observed temporal window corresponds directly to the event-aligned epoch, and while it may shift slightly in duration in the dark, this could reflect general differences in sensory gain or arousal, rather than changes in stimulus-specific encoding. Thus, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      Thank you for your insightful comment. We understand your concern that the finding of forelimb electrodes being most informative might seem circular, given that the LFPs were aligned to forelimb contact, and the floor textures were primarily sensed by the forelimbs. This design choice was intentional, as the task focused on texture perception through the forelimb, and the forelimb subregion of S1 is naturally expected to play a dominant role in this process. While this somatotopic specificity may make the results predictable, our aim was to emphasize the changes in temporal dynamics of neural processing under visual deprivation.

      We observed a shift in the temporal window's duration in the dark condition, which we interpret as a change in how texture information is processed without visual input. While this could reflect sensory gain or arousal differences, the lack of significant differences in locomotor speed or other behavioral measures (Supplementary Figure 1) suggests that these changes are more likely due to functional reorganization of sensory processing.

      We have clarified in the discussion that the shift in the temporal window is consistent with previous research on sensory reorganization involving both spatial and temporal cortical adjustments. While we do not claim novel spatial or temporal organization, we emphasize that the shift in temporal dynamics suggests adaptation in encoding strategy for texture perception in the absence of visual input. Future studies measuring arousal states (e.g., pupil diameter or cortical state markers) would help distinguish the contributions of arousal versus sensory reorganization to these dynamics.

      (5) While the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Without a behavioral readout (e.g., discrimination accuracy), claims about perceptual enhancement remain speculative.

      Thank you for raising this important point. We agree that while the neural data suggest enhanced separability of tactile representations in the dark condition, we do not directly assess whether these changes translate into improved tactile perception behaviorally.

      However, the primary aim of our study is not to claim perceptual enhancement, but to demonstrate that neural representations in the somatosensory cortex can rapidly reorganize in response to visual deprivation. To clarify this distinction, we have revised the manuscript to emphasize that the observed neural changes in S1 are consistent with functional reorganization of tactile representations, rather than a direct indication of perceptual improvement.

      Future studies will be crucial to directly test whether the enhanced separability of tactile representations in S1 correlates with improved tactile perception in a behavioral task. We have highlighted this as an avenue for future research to better understand the link between neural changes and perceptual outcomes.

      (6) In addition to point 4, the authors discuss implications for sensory rehabilitation, including Braille training and haptic feedback enhancement. However, the lack of actual chronic or even more acute pathological sensory deprivation, behavioral data, or subsequent intervention in this study limits the ability to draw translational conclusions. It remains unknown whether the more distinct neural representations observed actually translate into better tactile performance, discriminability, or perception. Additionally, extrapolating from rats walking on sandpaper in the dark to human rehabilitative contexts is speculative without a clearer behavioral or mechanistic bridge. The potential is certainly there, but the claim is currently aspirational rather than empirically grounded.

      Thank you for raising this important point. Upon careful consideration, we have decided to remove the discussion of sensory rehabilitation implications from the revised manuscript. We have refocused the manuscript to concentrate solely on the neural findings related to tactile encoding reorganization in response to short-term sensory deprivation, avoiding speculative extrapolation to human rehabilitative contexts. This revised approach ensures that the manuscript emphasizes the empirical findings without overstating the translational potential.

      (7) While the CNN showed good performance, details on generalization robustness and validation (e.g., cross-validation folds, variance across animals) are not deeply discussed. Also, while explainability tools were used, interpretability of CNNs remains limited, and more transparent models (e.g., linear classifiers or dimensionality reduction) could offer complementary insights.

      We appreciate the reviewer’s valuable feedback. In response to the concern about generalization robustness and validation, we have now conducted 5-fold cross-validation to assess the model's performance within animals (Figure 6C). We also have added supplementary information on the average silhouette scores across the different folds and animals (Supplementary Table 1, 2). These details are provided in the methods section and discussed in the results to offer a clearer picture of the model's robustness and consistency across rats.

      Regarding the interpretability of CNNs, we acknowledge that deep learning models can lack transparency. We also attempted classification using more transparent models such as PCA and SVM, but their performance did not exceed chance level (Supplementary Figure 2). This indicates that while these simpler models are more interpretable, they cannot capture the complex representations in the LFPs, making deep learning models like CNNs necessary for extracting these insights.
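
      The within-animal 5-fold cross-validation mentioned in this response can be sketched as follows. This is an illustrative splitter only: the function name `k_fold_indices`, the shuffling seed, and the trial count are assumptions, and the response does not specify how the authors partitioned their trials.

```python
import random

def k_fold_indices(n_trials, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Trials from one animal are shuffled once and dealt into k folds of
    near-equal size; each fold serves as the held-out test set exactly once.
    """
    indices = list(range(n_trials))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f_i, fold in enumerate(folds) if f_i != held_out for j in fold]
        yield train, test

# Hypothetical session with 23 trials from a single rat.
for train, test in k_fold_indices(23, k=5, seed=1):
    print(len(train), len(test))
```

      A decoder would be fitted on `train` and scored on `test` in each iteration, with the per-fold scores then averaged within the animal.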

      Reviewer #2 (Public review):

      (1) Despite applying explainability techniques to the CNN-based decoder, the study does not clearly demonstrate the precise "subtle, high-dimensional patterns" exploited by the CNN for surface roughness decoding, limiting the physiological interpretability of the results. Additional analyses (e.g., detailed waveform morphology analysis on grand averages, time-frequency decompositions, or further use of explainability methods) are necessary to clarify the exact nature of the discriminative activity features enabling the CNN to decode surface roughness and how these change with the sensory context (i.e., in light or darkness).

      Thank you for your insightful comment. We recognize the importance of clarifying the exact nature of the high-dimensional neural patterns that the CNN exploits for surface roughness decoding. In response, we have performed additional analyses to provide a more detailed explanation of the CNN's decision-making process and the discriminative features it learned:

      Grand-Average LFP Waveforms Analysis: We calculated the grand-average LFP waveforms for each texture × lighting condition (Figure 4A). While visual inspection did not reveal distinct features in the averaged waveforms, we explored the channel-wise correlations between textures under both light and dark conditions (Figure 4B). We found that the correlation between textures was lower in the dark condition, suggesting that LFPs become more distinct between textures when visual input is absent, which aligns with the CNN’s output.

      Time-Frequency Decomposition (Wavelet Analysis): We also performed time-frequency decomposition of the LFPs using wavelet transforms (Figure 4D). No prominent differences emerged across texture × lighting conditions in the spectral domain. However, upon computing differences in wavelet features between light and dark conditions and analyzing the relationship with the CNN's attribution scores (Supplementary Figures 5A-C), we observed a negative correlation in the 50-60 Hz range and a positive correlation in the 80-90 Hz range. This suggests frequency-specific modulation in LFP activity that may contribute to texture representations, providing further support for the CNN’s learned features.

      (2) The claim regarding cross-modal representation reorganization heavily relies on a silhouette analysis (Figure 5C), which shows a modest effect size and borderline statistical significance (p≈0.05 with n=9+2). More rigorous statistical quantification, such as permutation tests and reporting underlying cluster distances for all animals, would strengthen confidence in this finding.

      Thank you for your thoughtful comment. We appreciate your suggestion to strengthen the statistical rigor of our analysis regarding the cross-modal representation reorganization. In response, we have implemented several additional analyses to more rigorously quantify the separability of neural representations between light and dark conditions:

      (1) Permutation Test for Cluster Separability: We performed a permutation test to assess whether the observed differences in cluster separability between light and dark conditions were statistically significant or could have arisen by chance. The results showed that the silhouette scores for the dark condition consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 4). This permutation test strengthens the validity of our findings, indicating that the enhanced separability in darkness is a systematic reorganization of neural representations, not due to random fluctuations.

      (2) Reporting Cluster Distances: To address concerns about the modest effect size and borderline significance, we have explicitly reported the underlying cluster distances in the form of silhouette scores for each individual animal (Supplementary Table 1, 2). These values reflect the Euclidean distance between clusters within each rat, providing a clearer understanding of the separability observed.

      (3) Additional Statistical Analysis on Silhouette Scores: To further enhance the rigor of our statistical analysis, we recalculated the silhouette scores using 5-fold cross-validation within each animal, ensuring that our results are robust across multiple data splits (Figure 6C).

      By incorporating these additional analyses and reporting detailed cluster distances, we believe we have significantly strengthened the confidence in our claim of cross-modal reorganization.

      (3) While the authors recorded in the somatosensory cortex, primarily known for its tactile responsivity, I would be cautious not to rule out a priori the presence of cross-modal (visual) responses in the area. In this case, the stronger texture separation in darkness might be explained by the absence of some visually-evoked potentials (VEPs) rather than genuine cross-modal reorganization. Clarification is needed to rule out visual interference and this would strengthen the claim.

      Thank you for raising this important point. In response to your concern, we carefully examined whether visually-evoked potentials (VEPs) could be present in the S1 recordings, particularly under the light condition. However, this experiment did not involve any cue-guided visual stimulation, such as flashing lights or visual cues aligned with the LFP recordings. Without such external visual stimuli, it is unlikely that VEPs would be reliably evoked in S1. Therefore, we believe the stronger texture separation observed in the dark condition is not due to visual interference, but rather reflects a genuine sensory reorganization in response to the absence of visual input.

      (4) Behavioural controls are limited to gross gait parameters; more detailed analyses of locomotor behavior and additional metrics (e.g., pupil size or locomotor variance) would robustly rule out potential arousal or motor confounds.

      Thank you for your insightful comment regarding behavioral controls. In response, we have added locomotor speed traces aligned with corresponding LFPs (Supplementary Figure 1) to demonstrate that locomotion remained consistent across trials, irrespective of environmental condition (light vs. dark). Additionally, we report locomotor speed variance over 10-minute blocks to confirm no significant motor changes affecting neural recordings. These analyses indicate that LFP differences are unlikely due to locomotor confounds.

      While measuring pupil size could be useful for assessing arousal, the camera resolution in our study was insufficient for reliable measurements. We have noted this limitation in the Discussion and recommend that future studies with high-resolution eye-tracking explore arousal's role in sensory processing in S1.

      (5) The consistent ordering of trials (10 minutes of light then 10 minutes of dark) could introduce confounds such as fatigue or satiation (and also related arousal state), which should be controlled by analyzing sessions with reversed condition ordering.

      Thank you for highlighting the potential confounds due to trial ordering. To address this, we reversed the condition order (dark before light) in a subset of sessions from six rats and reanalyzed the data (Supplementary Figure 3). The results showed a non-significant increase in separability in the dark condition, suggesting that the enhanced separability in the dark condition is not due to trial order effects like fatigue or satiation. While order effects may contribute to trial-to-trial variability, the consistent pattern of enhanced separability in the dark further supports the interpretation that visual deprivation directly influences the reorganization of tactile representations in S1.

      (6) The focus on forelimb-aligned LFP analyses raises the possibility that hindlimb-aligned data might yield different conclusions, suggesting alignment effects might bias the results.

      Thank you for your insightful comment on the potential bias of forelimb-aligned LFP analyses. We acknowledge that the choice of alignment event can influence the results and appreciate the suggestion to consider hindlimb-aligned data. However, our experimental design specifically focused on forelimb S1. The forelimb region of S1 was oversampled in our array, and as expected, we observed larger responses there, consistent with the known somatotopic organization of S1.

      While hindlimb-aligned data could provide additional insights, it is not directly relevant to the primary question of how forelimb S1 codes tactile information under visual deprivation. We do not believe the forelimb alignment introduces a bias, as it aligns with the sensory task being investigated. However, we recognize the value of exploring alternative alignments and have now included a discussion in the Methods section regarding the rationale for our design choices.

      (7) The authors' dismissal of amplitude-based metrics as ineffective is inadequately substantiated. A clearer demonstration (e.g., event-related waveforms averaged by conditions, presented both spatially and temporally) would support this claim.

      Thank you for your constructive comment. In response, we have added a more detailed analysis of event-related waveforms, averaged across conditions (light vs. dark, smooth vs. rough textures), and presented them spatially and temporally aligned to forelimb contact (Figure 4A). These waveforms did not show clear, distinct features that could differentiate conditions, which highlights the limitations of traditional amplitude-based metrics in detecting subtle neural activity changes related to visual deprivation.

      We further performed channel-wise correlation analyses (Figure 4B), revealing stronger texture correlations in the light condition, indicating that averaged waveforms do not capture the nuanced differences in neural dynamics. Additionally, time-frequency spectrograms and channel–channel correlation matrices (Figures 4C and 4D) did not show distinct condition differences, reinforcing the limitations of amplitude-based metrics.

      These findings, along with the superior performance of machine learning-based decoding methods (e.g., CNN), support our claim that amplitude-based approaches are insufficient for fully capturing the complexity of the neural data.

      (8) Wording ambiguity regarding "attribution score" versus "activation amplitude" (Figure 5) complicates the interpretation of key findings. This distinction must be clarified for proper assessment of the results.

      Thank you for pointing out the ambiguity between "attribution score" and "activation amplitude." To address this, we have revised the manuscript to use "attribution score" only.

      (9) Generalization across animals remains unaddressed. The current within-subject decoding setup limits conclusions regarding shared neural representations across individuals. Adopting cross-validation strategies and exploring between-animal analyses would add significant value to the manuscript.

      Thank you for highlighting the importance of generalization across animals. While our study focused on within-subject decoding, we acknowledge that this limits conclusions about shared neural representations across individuals. We expect that inter-animal generalization would be challenging, as models trained on data from a single rat may not perform well on data from others due to differences in electrode placement, brain anatomy, and neural representations. We recognize the value of cross-validation strategies and between-animal analyses and will consider them in future work to address this limitation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would strongly recommend that the authors refine their introduction to be more concise. Many concepts and study aims are repeated many times and, therefore, present as highly redundant text. The introduction may be half the length and still contain the important concepts to set up the justification for the study. I would also suggest refining to be less about sensory deprivation (e.g., with blindness) and more in relation to context, as the acute nature of the study allows one to conclude more about the latter than the former.

      Thank you for your feedback on the introduction. We have revised the section to reduce redundancy and present the key concepts more concisely. We also streamlined the study aims and focused more on the context of the acute nature of the study, as you suggested, rather than emphasizing sensory deprivation. This revision better aligns with the main focus of the research and improves clarity. We believe the updated introduction provides a more direct justification for the study.

      (2) I am not sure if Figures 1-3 are meant to be in grey-scale for some reason (perhaps to represent light and dark), but I would encourage the authors to examine if this is necessary, as the use of color generally helps one more easily follow Figures.

      Thank you for this suggestion. Upon review, we agree that the use of color would enhance the clarity and readability of our figures. We have revised the figures including the newly added supplementary figures to incorporate color.

      (3) Figure 5, Figure legend title - check wording.

      Thank you for pointing this out. The title has been adjusted for consistency with the other figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Analyses that would strengthen the main claims (major):

      (a) Identify the features exploited by the CNN.

      (i) Provide grand-average LFP waveforms for each texture × lighting condition (fore- and hind-limb channels shown separately, spatially arranged as in Figure 3C) and try to relate them to the decoding strategy learned by the CNN.

      Thank you for your helpful suggestion. We have calculated the grand-average LFP waveforms for each texture × lighting condition and included them in Figure 4A, with fore- and hind-limb channels shown separately and spatially arranged as in Figure 3C. Upon visual inspection, the mean waveforms did not reveal clear, distinct features. To further investigate, we computed the channel-wise correlation between different textures under both dark and light conditions. By subtracting the correlation coefficients for the dark environment from those in the light, we observed that the correlation between textures was lower in the dark environment (Figure 4B). This suggests that LFPs are more distinct between textures in the dark, supporting the CNN model's output. However, this also indicates that the CNN has captured more complex, nuanced information, as it is able to discriminate between LFPs on a single-trial basis, rather than relying on mean traces.

      To assess how the correlation between average LFP waveforms varied across channels, we also calculated the channel-channel correlation matrix for all 32 channels in each condition. While we found stronger correlations within each S1 subregion, we did not observe clear differences in the correlation matrix between light and dark conditions, nor between different textures (Figure 4C).
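
      The channel-wise comparison described in this response (correlating the averaged waveforms of two textures per channel, then subtracting the dark coefficients from the light ones) can be sketched as below. Pearson correlation is assumed here, since the response does not name the coefficient; all function names and the toy waveforms are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant waveform")
    return cov / (sx * sy)

def texture_correlation_per_channel(avg_a, avg_b):
    """Correlate the average waveforms of two textures, channel by channel."""
    return [pearson(wa, wb) for wa, wb in zip(avg_a, avg_b)]

def light_minus_dark(light_a, light_b, dark_a, dark_b):
    """Per-channel difference r_light - r_dark.

    Positive values mean the two textures evoke more similar average
    waveforms in light than in dark on that channel.
    """
    r_light = texture_correlation_per_channel(light_a, light_b)
    r_dark = texture_correlation_per_channel(dark_a, dark_b)
    return [rl - rd for rl, rd in zip(r_light, r_dark)]

# Toy data: two channels, four samples each (not real LFPs).
light_smooth = [[0, 1, 2, 3], [1, 2, 3, 5]]
light_rough = [[0, 2, 4, 6], [2, 4, 6, 10]]
dark_smooth = [[0, 1, 0, -1], [0, 1, 0, -1]]
dark_rough = [[1, 0, -1, 0], [1, 0, -1, 0]]
print(light_minus_dark(light_smooth, light_rough, dark_smooth, dark_rough))
```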

      (ii) Add channel-wise and time-frequency maps (e.g., wavelet or spectrograms) for each texture × lighting condition and try to relate them to the decoding strategy learned by the CNN.

      Thank you for the valuable suggestion. We calculated wavelet features for each LFP segment and averaged them across trials to assess differences in LFP between light and dark conditions, as well as across textures (Figure 4D). However, no distinct differences were observed in the spectral map. To investigate further, we computed the differences in spectral maps for LFPs in light and dark trials. We then calculated the difference in attribution scores derived from the integrated gradient map (Supplementary Figure 4A). Subsequently, we calculated the correlation coefficients between the differences in integrated gradients and the differences in power across each frequency band in the spectral map (Supplementary Figures 4B and 4C). A negative correlation was found in the 50-60 Hz range, while a positive correlation was observed in the 80-90 Hz range. These findings suggest that frequency-specific patterns of LFP activity in different conditions may be linked to the texture representations captured by the CNN model. We have included a discussion of these findings in [lines 463-468].

      (b) Quantify the "enhanced separability in darkness" more rigorously.

      (i) Report cluster-distances (e.g. Euclidean) for each individual animal.

      We thank the reviewer for this helpful comment. When calculating the silhouette score, we used Euclidean distance as the distance metric. The silhouette score is defined for each data point as the difference between the average distance to points within its assigned cluster and the average distance to points in the nearest other cluster, normalized by the larger of the two values. Thus, the silhouette score inherently reflects the relative cluster distances both within and across conditions for each individual animal. Because we report and statistically analyze silhouette scores (Figure 6C), these values already quantify and compare the Euclidean cluster distances across conditions at the animal level. For clarity, we have now added a definition of the silhouette score in the Methods section of the main text [lines 269-278]. We also included the calculated silhouette scores in Supplementary Table 1.
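
      The silhouette definition given in this response can be written out as a short sketch using Euclidean distance. For a point with mean within-cluster distance a and mean distance b to the nearest other cluster, the score is (b - a) / max(a, b). The function name and the toy points are hypothetical, and points that are alone in their cluster are scored 0 by convention.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette score with Euclidean distance.

    a = mean distance to the other points in the same cluster
    b = mean distance to the points of the nearest other cluster
    score = (b - a) / max(a, b), ranging from -1 to 1
    """
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(label, []).append(math.dist(p, q))
        own = by_cluster.pop(own_label, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Toy embedding: two tight, well-separated groups (not real data).
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_scores(pts, ["smooth", "smooth", "rough", "rough"]))
```

      Averaging these per-point scores within an animal and condition gives a single separability value of the kind that can be compared between light and dark.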

      (ii) Run a permutation or bootstrap test (shuffling darkness/light labels within animals) to obtain an empirical null distribution for cluster separability in the network embedding space.

      We thank the reviewer for this important suggestion. In response, we implemented a permutation test to assess the robustness of our cluster separability results. Specifically, we shuffled the darkness/light labels within each animal and recalculated silhouette scores across 1000 resamples to generate an empirical null distribution. The observed separability between light and dark conditions consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 3). This confirms that the enhanced cluster separability in darkness was not attributable to random fluctuations in labeling but instead reflected a systematic reorganization of neural representations.
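
      A within-animal label-shuffling test of this kind could be sketched as follows. This is a hedged illustration rather than the authors' implementation: the test statistic (mean silhouette in dark trials minus mean silhouette in light trials), the function names, and the synthetic data are all assumptions made for the example. Only the light/dark labels are shuffled; the texture labels and embeddings stay fixed.

```python
import random
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette value with Euclidean distance (singletons score 0)."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        others = [m for g, m in groups.items() if g != lab]
        if not own or not others:
            continue  # this point contributes 0
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in m) / len(m) for m in others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def dark_minus_light(points, textures, conditions):
    """Assumed statistic: separability in dark trials minus light trials."""
    def score(cond):
        idx = [i for i, c in enumerate(conditions) if c == cond]
        return mean_silhouette([points[i] for i in idx], [textures[i] for i in idx])
    return score("dark") - score("light")


def permutation_test(points, textures, conditions, n_resamples=1000, seed=0):
    """Shuffle condition labels within one animal to build a null distribution."""
    rng = random.Random(seed)
    observed = dark_minus_light(points, textures, conditions)
    shuffled = list(conditions)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        null.append(dark_minus_light(points, textures, shuffled))
    null.sort()
    threshold = null[int(0.95 * n_resamples)]  # approximate 95th percentile
    # +1 correction: the observed labelling counts as one permutation.
    p_value = (1 + sum(v >= observed for v in null)) / (n_resamples + 1)
    return observed, threshold, p_value


# Synthetic single-animal data (not real recordings): the two textures are
# placed further apart in "dark" trials than in "light" trials.
rng = random.Random(1)
points, textures, conditions = [], [], []
for cond, gap in (("light", 0.5), ("dark", 3.0)):
    for tex, centre in (("A", 0.0), ("B", gap)):
        for _ in range(15):
            points.append((rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)))
            textures.append(tex)
            conditions.append(cond)

observed, threshold, p_value = permutation_test(points, textures, conditions)
print(observed, threshold, p_value)
```

      Comparing the observed statistic with the 95th percentile of the shuffled values corresponds to a one-sided test at the 0.05 level; running the function separately for each animal keeps the shuffling within animals.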

      (c) Control for possible visually-evoked potentials (VEPs).

      (i) Search the LFPs recorded in light for stereotyped VEP components and/or comment on this possible confound (i.e., VEPs in S1?).

      Thank you for raising this point. Although it would be interesting to observe if a VEP is present in the S1 of rats, this experiment did not involve cue-guided visual stimulation. Additionally, there was no environmental visual cue that could serve as an external trigger to align the LFPs for VEP analysis in S1. Furthermore, since even the somatosensory evoked potential was not clearly visible in the S1 LFP without averaging the aligned LFPs, it is unlikely that we would be able to observe VEPs in single trials.

      (d) Address behavioral and arousal confounds.

      (i) Provide example locomotor-speed traces (aligned with corresponding LFPs) and report locomotor-speed variance across the 10-min blocks.

      Thank you for your comment. A speedometer was installed for the recordings of the last two rats. We have now provided example speed traces and the speed variance across blocks in Supplementary Figure 1. The traces show that locomotor speed was stable in each trial.

      (ii) If available from the camera recordings, include pupil diameter as a proxy for arousal; otherwise, discuss explicitly how arousal changes might affect S1 LFPs.

      Thank you for this suggestion. We strongly agree that measuring pupil diameters should be incorporated into future studies. However, because our camera did not have sufficient resolution to capture pupil diameters, we have addressed this limitation in the discussion section [lines 525-537].

      (e) Address order effects (and motivation/satiety confounds)

      (i) Present at least a subset of sessions in which the dark block precedes the light block; re-analyze the silhouette score/discriminability with block order as a factor.

      Thank you for this helpful suggestion. We conducted additional analyses using sessions from 6 rats in which the dark block preceded the light block (Supplementary Figure 5A). Using the same model architecture, we calculated the silhouette score for each rat (Supplementary Figure 5B). When the order was reversed (dark preceding light), the discriminability effect disappeared: although we observed a trend toward higher scores in the dark condition, the difference in texture discriminability was not statistically significant.

      If trial order alone accounted for the increase in discriminability, reversing the order would be expected to yield higher silhouette scores in the light condition. Our findings suggest that factors related to order (e.g., thirst or motivation, as you proposed) are not the sole contributors. Furthermore, previous studies in human participants have shown that brief blindfolding can produce lingering increases in tactile sensitivity, indicating a lasting effect of visual deprivation. Thus, the absence of significant differences in texture representation when the dark condition preceded the light condition may reflect such lasting effects. We have included a discussion in [lines 441-452].

      (ii) Discuss explicitly the potential confounding effect of motivational state/thirst.

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we now explicitly address the potential confounding role of motivational state and thirst in shaping our results. Because animals were water-restricted to maintain task engagement, it is possible that increasing thirst or fluctuating motivation over the course of a session could alter arousal or attentional state, thereby influencing neural separability. However, when the trial order was reversed (dark condition preceding light), silhouette scores did not show a significant increase in the second (light) trial. Thus, while we acknowledge that motivational state may contribute to trial-to-trial variability, the systematic increase in separability during darkness cannot be fully explained by thirst or motivational confounds. This addition has been incorporated into the discussion section [lines 441-452].

      (f) Alignment control and the role of forelimb S1.

      (i) Repeat the decoding analysis with LFPs aligned to hind-limb strike; report whether the fore-limb dominance persists.

      Thank you for your thoughtful suggestion. We appreciate the opportunity to clarify. Our study was designed to ask a different question: how the absence of visual input reorganizes tactile encoding for the body part that actually initiates texture contact in our paradigm (the forepaw). Accordingly, all analyses were aligned to forelimb strike and our array intentionally oversampled S1-forelimb relative to S1-hindlimb (18 vs. 14 electrodes; Fig. 1F–G), yielding clear topographic forelimb-locked event-related responses (Fig. 3B–D) and forelimb-channel dominance in the decoding explainability analyses (Fig. 5D–E). Repeating the full decoding locked to hind-limb strike would test a different hypothesis and would be difficult to interpret for three reasons:

      Design/measurement alignment. Our kinematic detection was built to identify forelimb foot strikes. Extending the detector to hindlimb would require new model training/validation and introduces uncertainty in the exact contact timing relative to the LFP segments we analyze.

      Sampling asymmetry. The array and cortical magnification are not balanced across subregions (18 forelimb vs. 14 hindlimb electrodes; Fig. 1G), so a hind-limb–aligned comparison would be confounded by unequal coverage and signal-to-noise across S1 subdivisions rather than reflecting true “dominance.”

      Scope of the claim. We do not claim that the forelimb is globally more informative about texture; we show the intuitive and topographically specific result that “forelimb S1 codes textures touching the forelimb,” and that these representations become more separable in darkness (silhouette increase; Fig. 5C). A hind-limb–locked re-analysis would likely reveal hindlimb contributions when the hindpaw is the alignment event — but that would not change the central conclusion about darkness enhancing tactile representational separability.

      To address the underlying concern about generality without introducing the above confounds, we have clarified these design choices and limitations in the revised Methods [lines 194-197].

      (g) Amplitude-based baseline.

      (i) Show that a simple linear discriminant or logistic-regression model on peak amplitudes (and/or other simple features like trough width/slope) cannot reach the CNN's accuracy. This kind of "baseline" analysis could also be useful to pinpoint the discriminative features learned by the CNN.

      Thank you for your insightful suggestion. We agree that performing a baseline comparison with a simpler model could help highlight the advantage of using a CNN. However, in our dataset, individual LFP traces do not exhibit clear peaks or well-defined features such as peak amplitude, width, or energy, which makes feature extraction using traditional methods like linear discriminants or logistic regression challenging.

      To address this, we performed principal component analysis (PCA) on the raw LFP traces to reduce the dimensionality and applied a support vector machine (SVM) classifier on the reduced features, in line with the approach used for the CNN models (Supplementary Figure 2A). The results of this analysis demonstrate that the SVM model struggles to discriminate between conditions, further reinforcing the necessity of the CNN model. The CNN’s ability to automatically learn complex features from the raw LFP data appears to be a crucial factor in achieving superior classification performance (Supplementary Figure 2B).

      (h) Cross-validation and inter-animal generalization.

      (i) Consider replacing the single 80/20 split with k-fold cross-validation within animals.

      Thank you for this suggestion. Instead of using an 80/20 split, we performed 5-fold cross-validation on all rats. The silhouette scores were averaged within each animal across the five folds, and Figure 6C was updated accordingly. After performing a paired t-test, we still observed a significant difference in silhouette scores between the light and dark conditions.
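
      The procedure described above (five folds per animal, fold scores averaged within each animal, then a paired comparison across animals) can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline: the helper names are hypothetical, the per-fold silhouette scores are made-up placeholder numbers, and model training on each fold is omitted.

```python
import random
from math import sqrt
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x, y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Each trial of one animal appears in exactly one test fold.
for train_idx, test_idx in k_fold_indices(n_samples=20, k=5):
    pass  # train on train_idx, compute the silhouette score on test_idx

# Placeholder per-fold silhouette scores (illustrative numbers only).
fold_scores = {
    "rat_1": {"light": [0.10, 0.12, 0.08, 0.11, 0.09],
              "dark": [0.20, 0.22, 0.18, 0.21, 0.19]},
    "rat_2": {"light": [0.15, 0.16, 0.14, 0.17, 0.13],
              "dark": [0.30, 0.31, 0.29, 0.32, 0.28]},
    "rat_3": {"light": [0.05, 0.06, 0.04, 0.07, 0.03],
              "dark": [0.25, 0.26, 0.24, 0.27, 0.23]},
}

# Average across folds within each animal, then compare conditions pairwise.
light = [mean(s["light"]) for s in fold_scores.values()]
dark = [mean(s["dark"]) for s in fold_scores.values()]
t_stat, dof = paired_t(dark, light)
print(t_stat, dof)
```

      Averaging within each animal before the test keeps the animal, rather than the fold, as the unit of analysis, so the degrees of freedom reflect the number of animals.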

      (ii) Comment on inter-animal generalization.

      Thank you for this valuable feedback. Although we did not explicitly test inter-animal generalization, it is unlikely that a model trained on data from one rat would perform equally well when classifying data recorded from another animal. This limitation arises from two main factors. First, despite careful efforts to implant electrodes in the same brain region and cortical layer across experiments, it is impossible to align all 32 electrodes to identical coordinates. Consequently, the recorded LFPs are obtained from slightly different locations, which may reflect distinct neural processing. Second, even within the same species, individual animals differ in brain size and neural circuit organization. Thus, even if electrodes could be placed at identical anatomical locations, inter-individual variability in brain structure would still lead to differences in the recorded signals. Because deep learning models are often sensitive to small perturbations in their input data, we believe that robust inter-animal generalization is unlikely without fine-tuning the model using data from the target animal. This comment has been inserted in the Discussion [lines 494-507].

      (2) Writing, figure and terminology improvements (minor):

      (a) Figure 5F-G axis label. Decide on either "attribution score" or "activation amplitude" and use that term consistently in panels, legend, and text (currently, I believe it could be confused with raw signal amplitude).

      We have unified the terminology to "attribution score" and applied this consistently across the panels, legend, and text.

      (b) Throughout the manuscript, use "population-level activity" or "average population dynamics" when discussing LFPs (I believe it is more correct to reserve "population code" for multiple single-unit datasets).

      We agree with the reviewer’s point and have adopted the term "population dynamics" to describe LFP information consistently throughout the manuscript.

      (c) Lines 219-221, state down-sampling to 2 kHz, whereas line 289 mentions 10 kHz. Reconcile these numbers.

      We apologize for the confusion and thank the reviewer for thoroughly reading the manuscript. Our original sampling rate was 30 kHz, and all analyses were performed on data resampled to 10 kHz. The reference to 2 kHz was an error, and we have corrected it.

      (d) Specify the tail of each statistical test mentioned in the manuscript and any multiple-comparison correction used.

      We have specified the tail of each statistical test and any multiple-comparison corrections used in the "Data Analysis" section of the Methods.

      (e) Line 244: "variables (He et al., 2015)" → "variables (He et al., 2015)".

      We have corrected this formatting issue and revised it to "variables (He et al., 2015)".

      (f) Line 253: "one-dimentional" → "one-dimensional".

      We have corrected the spelling error and revised it to "one-dimensional".

      (3) Data and code sharing:

      (a) Consider depositing data and code for the analysis in public open repositories.

      Thank you for your suggestion. We have set up a public GitHub repository to share the code. Since the full dataset is quite large (~400GB), we have uploaded a smaller example dataset for the analysis.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Weaknesses:

      Two minor comments

      (1) Fig 4 (hormone treatment): In this experiment, testosterone is given to males, yet in Sup Fig 6 it is argued that Esr1 is more influential in driving transcriptional changes compared to AR. Does DHT treatment have the same outcome as testosterone? Or, does estrogen treatment in males have the same outcome as testosterone?

      We agree that the inability to distinguish AR activation by testosterone from Esr1 activation by converted estrogen is a limitation of our study. We added a discussion of this point in the “limitation of the study” section.

      Although HM-HCR experiments showed the bidirectional control of transcriptional progression during adolescence, it is unclear if the facilitation in male by testosterone supplement is via activation of AR or Esr1 or both because testosterone will likely be converted to estrogen in the brain. Future studies using dihydrotestosterone (DHT) and estrogen to males may address this issue.

      (2) Fig 3i: There appears to be an age-dependent transcriptional change in male Vgat HR-low cells. Can the authors comment on age-dependent (hormone-independent) transcriptional changes in males versus females.

      We agree that it is important to distinguish hormone-dependent changes from age-dependent changes. We added pair-wise DE results for the Vgat HR-low population to the main text. Consistent with the trajectory analysis, there were fewer age-dependent genes than hormonally associated genes.

      “Pair-wise DEG analysis consistently showed that larger number of DEGs between P35 and P23 in Vgat+Esr1+ (male: 146 genes; female: 162 genes) than Vgat+ hormone R<sup>Low</sup> (male: 26 genes; female: 1 gene).”

      Reviewer #2 (Public review):

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      We agree that it would be ideal to analyze perinatal and pubertal dynamics within the same study to distinguish these two processes. We added a discussion of this point in the “limitation section”.

      “2. Although we have identified hormone/Esr1 dependent transcriptional trajectories during adolescence, the relations and interplay with genetically determined perinatal event, which is earlier and robust, are unclear. Some sex differences during adolescence might be an extension of perinatally established sex differences while others might be unique adolescent changes.”

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      We agree that, while the MPOA is defined consistently across samples based on a brain atlas, its boundary is somewhat less obvious than that of other regions (e.g., hippocampus, VMH). To minimize contamination from adjacent areas, we restricted quantitative analysis mostly to the Vgat+ Esr1+ population, which is densely located within the MPOA but not in immediately adjacent areas, except the posterior BNST, which is readily distinguishable. We added a clarification to the Methods and a technical limitation to the Discussion, as shown below.

      Method

      “To disambiguate the MPOA and adjacent brain regions, quantitative analysis is restricted to Vgat+ Esr1+ neurons and is devoid of posterior BNST.”

      Discussion

      “3. While we have observed robust effect of Esr1-KO in scRNAseq experiment which was further validated with FISH experiment, it is possible that there are further heterogeneous Vgat-Esr1 populations in the MPOA which might be differentially targeted in each virally injected sample. To mitigate this, 3-4 mice were pooled for each sample in scRNAseq experiment and in HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included samples with robust Esr1 deletion in the MPOA. Interestingly, due to the technical challenge, Esr1 deletion tends to be more robust than weakly detected recombinase RNA expression (data not shown).”

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      We agree that, as with point (2), this is an important matter. For the HCR experiment, we included only animals with recombinase RNA (Cre or Flp) expression within the MPOA. Although recombinase expression was sufficient to determine qualitatively whether an injection hit or missed, detection was weak and it was challenging to determine the extent of viral spread. Thus, we also used successful Esr1 deletion as an additional inclusion criterion for the AAV-Cre-YFP group. We have added the inclusion criteria to the Methods and a technical consideration to the Discussion.

      Method

      “For HCR2, AAV was injected unilaterally so that successful targeting of the MPOA with AAVCre-YFP (detection of recombinase RNA within the MPOA) and the deletion of Esr1 were confirmed for inclusion of samples.”

      Discussion

      “3. While we have observed robust effect of Esr1-KO in scRNAseq experiment which was further validated with FISH experiment, it is possible that there are further heterogeneous Vgat-Esr1 populations in the MPOA which might be differentially targeted in each virally injected sample. To mitigate this, 3-4 mice were pooled for each sample in scRNAseq experiment and in HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included samples with robust Esr1 deletion in the MPOA. Interestingly, due to the technical challenge, Esr1 deletion tends to be more robust than weakly detected recombinase RNA expression (data not shown).”

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion do a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      We agree that contextualizing our study within the existing literature will clarify its novelty and impact for the community. We have updated the Introduction by adding a review highlighting puberty-associated genomic studies in the brain, all of which are bulk (brain-region level), as well as the first puberty scRNAseq study in the human testis.

      “Despite the well-established role of these hormones in shaping behavior, the molecular mechanisms underlying their influence on brain development during adolescence are still limited to brain-region level (bulk)[8]in humans and model organisms and adolescent transcriptional dynamics at single cell resolution in the brain remain poorly understood (but see a pioneering study in the human testis[9]).”

      (5) Throughout the manuscript the authors utilize obscure abbreviations, which often makes reading their text overly cumbersome. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try and simplify their use of non-standard abbreviations.

      We agree that it is helpful for readers to have a reference list of abbreviations in a single location. We added an “abbreviation” section as a reference for readers.

      Medial preoptic area (MPOA)

      Single-cell RNA sequencing (scRNAseq)

      Estrogen receptor 1 (Esr1)

      GABAergic neurons (Vgat+)

      Glutamatergic neurons (Vglut2+)

      Hybridization chain reaction fluorescent in situ hybridization (HCR-FISH)

      Gonadectomized (GDX)

      Partition-based graph abstraction (PAGA)

      Hormone-associated differentially expressed genes (HA-DEGs)

      Multiplexed error-robust fluorescence in situ hybridization (MERFISH)

      Differential gene expression (DE)

      Differentially expressed genes (DEGs)

      Support vector machine (SVM)

      Manifold Enhancement Latent Dimension (MELD)

      Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE)

      Androgen receptor (AR)

      Single-cell regulatory network inference (SCENIC)

      Reviewer #3 (Public review):

      We thank the reviewer for the constructive comments, which helped improve our manuscript.

      Weaknesses:

      We already know that Esr1 is important within GABAergic but not glutamatergic neurons for mating behavior. However, there is not enough data to support the claim that disrupting Esr1 in glutamatergic MPOA neurons "had no observable effect." The MPOA is involved in many behaviors and physiologies that were not investigated. More assays would be required to report "no observable effect."

      The small number of cells included in the transcriptional studies is a general concern, as noted by the authors. This is a particular concern for conclusions related to the role of adolescence in glutamatergic MPOA neurons. The paper reports 24,627 neurons across all treatment groups, which include 3 time points, 2 sexes, and GDX conditions. It seems likely that not much was detected in the glutamatergic neurons because of insufficient power.

      Esr1 knockout is initiated in adolescence, not restricted to adolescence. Do we know that the effects on mating behavior are due to what is happening in adolescence vs. the function of Esr1 in adults? Are the effects different if Esr1 is knocked out in mature adults? This comparison would be important to demonstrate that adolescence is a critical time window for Esr1 function.

      We agree that (1) the relatively mild effects observed in glutamatergic neurons may be partly due to the scale of the study, and (2) Esr1 deletion is permanent once induced, so it is challenging to distinguish adolescent from adult transcriptional dynamics using existing viral strategies.

      We added discussion in the “limitation” section.

      “4. While we have observed robust transcriptional progression in Vgat<sup>+</sup> Esr1<sup>+</sup> neurons during adolescence, we observed milder alterations in VgluT2<sup>+</sup> neurons. Although the scale of our study is comparable to or exceeds that of prior scRNAseq studies in the MPOA[22,29], future larger studies may have more sensitivity to detect adolescent transcriptional dynamics in VgluT2<sup>+</sup> neurons.”

      “5. Although we demonstrated adolescent transcriptional changes were observed as early as P35, and either hormonal deprivation or Esr1 KO in prior to adolescence prevented the transcriptional progression (arrested transcriptional state even at adult), given the viral incubation time and permanent deletion of Esr1 after viral injection, it is challenging to disambiguate the role of Esr1 during adolescence and adult. Future studies injecting the virus at adult may provide additional insights on the similarity and difference between transcriptional changes during puberty and maintained transcriptional states at adult.”

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Thank you for the positive feedback!

      Here, we investigate phenotypic plasticity associated with the adoption of social roles in the clown anemonefish, with strategic growth being just one aspect of that plasticity. Strategic growth, also known as social control of growth, is a fascinating form of adaptive phenotypic plasticity, whereby individuals modify their growth and size in response to fine-scale changes in social conditions (Buston & Clutton-Brock, 2022). In cooperative breeding systems with high reproductive skew, particularly fishes and mammals (possibly including humans), individuals have been shown to i) increase growth/size on the acquisition of dominant status (Dengler-Crish & Catania, 2007; Johnston et al., 2021; Thorley et al., 2018; Van Schaik & Van Hooff, 1996; Walker & McCormick, 2009), ii) increase growth/size when paired with size matched reproductive rivals (Huchard et al., 2016; Reed et al., 2019; this study), and iii) decrease growth/size to avoid conflict (Buston, 2003; Heg et al., 2004; Wong et al., 2007). While strategic growth is fascinating and clearly occurring in this study, we show coordinated changes of multiple aspects of the phenotype as fish adopt social roles. Therefore, we deliberately framed the Introduction broadly to avoid biasing the reader toward viewing growth as the sole or main driver.

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      We also expected to see the HPA/stress axis activated in subordinates, which is why we carried out a targeted exploration of genes known to play a role in this axis. We did not find any genes that were significantly differentially expressed. We believe that there could be two explanations for this. First, from a methodological perspective, it could be due to our use of whole-body RNA-seq, which may have masked this signal. Alternatively, the stress axis might play a more complex role than just acting as a simple on/off switch for reduced growth. Its activation may peak when competition over size is at its highest (during week one) or, conversely, it may peak later and help maintain reduced growth once hierarchies are firmly established (particularly after the dominant individual reaches its maximum size). To understand the role of the stress axis, future studies should observe how its activation varies over time. We acknowledge that the absence of a stress-axis signal and its potential explanations were not clearly discussed in the original manuscript; we will address this issue in the revised version.

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      We had a similar thought. Specifically, we were interested in testing the hypothesis that the final size ratio of pairs, which is indicative of the amount of conflict remaining, would predict gene expression. We examined gene expression within pairs to test for coordinated changes and repeated the analysis, accounting for the pair size ratio. In both cases, we found no clear or consistent pattern within pairs. We will consider including these figures in the Supplementary Materials document.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      We decided to use whole-body samples for this initial transcriptomic analysis to capture a broad view of gene-expression differences while keeping sequencing costs and sample requirements manageable. We agree with the reviewer that future work should explore specific tissues sampled from individuals at multiple time points to disentangle transcriptomic differences across tissue types.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      We thank the reviewer for this important point and agree that an arbitrary fold-change cutoff is inappropriate and unnecessary. It should be noted that this fold-change cutoff was used only in this single figure; all other analyses used p-values from the entire dataset. We will remove the fold-change threshold, correct Supplementary Figure 3, and revise any corresponding text.

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      We agree that color can be an important social signal, so we included color measurements in our experimental design. However, after careful consideration of the color results, we decided that our experimental timing and husbandry changes introduced multiple confounding factors, preventing us from drawing confident conclusions. Specifically, our fish were ≈1 month old at the transfer from larval to experimental tanks and had already begun to deepen their orange hue before our experiment (in the wild, they would settle at two weeks of age, prior to the deepening of the orange hue). Once individuals attain a certain hue, it seems that color development can be halted, but not reversed. The transfer also involved changes in lighting, tank background, and diet, factors known to strongly affect coloration. Our results show a uniform shift in orange hue and saturation across social groups, suggesting that these confounding factors might have dominated changes in hue.

      For transparency, we report the color data in the Supplementary Materials, but we caution against drawing any strong conclusions. In the revised manuscript, we will recommend that future work include a targeted experiment to robustly test for the effect of the adoption of social roles on coloration or the effect of coloration on the adoption of social roles.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.

      Thank you for flagging that parts of the Discussion are a bit difficult to follow. In the revised manuscript, we will work to improve readability of the Discussion. We also appreciate the suggestion of including a conceptual schematic. We will consider whether adding such a graphic will add value to this manuscript or future manuscripts.

      Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      Thank you for flagging that parts of the manuscript could be condensed. We will work on this as we revise the manuscript.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      We understand the concern raised by the reviewer about the overlap among points in the PCA. We have explored PC1-PC3 and found that PC2 and PC3 showed the clearest, statistically significant clustering by social position, while PC1 did not capture any variation due to social position. We have explored whether other factors might be masking differences, such as genetic relatedness, tank effects, total read count per sample, and found that none of these factors explained sample clustering. Regarding the ellipses shown around the points, they were not intended to capture all points, but rather they show the estimated 95% multivariate t-distribution for that given social group. We will make sure this is clearly explained in the figure legend, and Methods section. In addition, in the revised version, we will show PC1 and PC2, and PC1 and PC3, in the Supplements for transparency.
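
      Why a 95% ellipse is not expected to enclose every point can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a normal-theory ellipse (a chi-square threshold on the squared Mahalanobis distance) rather than the multivariate t-distribution described above, the scores are simulated, and the function names are hypothetical.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def inside_ellipse(point, center, cov, level=0.95):
    """True if the point lies inside the normal-theory `level` ellipse."""
    (mx, my), (sxx, sxy, syy) = center, cov
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    # Squared Mahalanobis distance via the inverse of the 2x2 covariance.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    threshold = -2.0 * math.log(1.0 - level)
    return d2 <= threshold


rng = random.Random(0)
# Simulated stand-in for PC2/PC3 scores of one social group.
scores = [(rng.gauss(0, 2), rng.gauss(0, 1)) for _ in range(500)]
center, cov = mean_and_cov(scores)
fraction_inside = sum(inside_ellipse(p, center, cov) for p in scores) / len(scores)
print(f"fraction of points inside the 95% ellipse: {fraction_inside:.3f}")
```

      By construction, roughly one point in twenty is expected to fall outside such an ellipse, so points lying beyond the boundary do not by themselves indicate a misdrawn ellipse.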

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      Yes, “15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling” refers to pairs of P1 and P2. We will make sure this is clearly stated in the revised Methods. We have also explored the gene expression data in light of the size difference within pairs, and found no clear differences in gene expression patterns (see earlier response to Reviewer #1). We will consider including these figures in the Supplementary Materials document, as well as adding a version of Figure 3A that clearly shows information on pairs, as suggested by the reviewer.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolytic genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      We have explored genes across a broad set of metabolic pathways (glycolysis, TCA cycle, lactic fermentation, PDH complex, cholesterol biosynthesis, fatty-acid synthesis, and beta-oxidation) and show all metabolic genes that showed significant differential expression between P1, P2, and S in Figure 6. Overall, very few metabolism-associated genes were significantly differentially expressed, which is why we decided to combine appetite-regulation and metabolism-associated genes into a single figure (Figure 6). In the revised version, we will ensure that Figure 6 clearly shows the gene sets associated with appetite and metabolism.

      We also examined hormonal pathways (glucocorticoid and thyroid signaling), but did not find genes in these pathways that were significantly differentially expressed. Finally, we would like to clarify that our samples consist of two-month-old juvenile individuals that are sexually immature —under ideal conditions, clown anemonefish can mature in one to two years, but they can also remain sexually immature for a decade or more (Buston & García, 2007) — which is why we did not observe distinct molecular signatures of sexual maturation. We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that these are clearly stated in the revised version of the manuscript.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      To clarify, GO enrichment analysis does not establish correlations with traits; it only describes which functions or pathways are over-represented in a given gene set. That is why we began by using WGCNA to define gene sets (modules) that are correlated with phenotypes. Our primary rationale for WGCNA was to identify modules of co-expressed genes that show a statistically significant correlation with the phenotypes of interest (social role: P1, P2, S; growth; and food intake). Pairwise differential expression analysis (Figure 3B) identified a few hundred significantly differentially expressed genes, but those tests treat genes independently and cannot link coordinated changes of co-expressed genes to phenotypes of interest. Because WGCNA is blind to traits, it first identifies groups of co-expressed genes, which can help resolve gene expression patterns.

      We therefore ran WGCNA on the rlog-transformed dataset to identify modules of co-expressed genes that show significant correlation with phenotypes of interest. For every module that showed such a correlation, we performed GO enrichment and carefully evaluated the resulting GO enrichment trees (see Supplementary Figs. 4–5). The brown module was highlighted in the main text because it was one of the modules with a significant correlation to growth, and its associated GO enrichment showed clear growth-related signals that were not identified in the pairwise differential expression analysis results.
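
      The logic of relating a module to a trait can be sketched in a few lines. This is a deliberately simplified illustration, not WGCNA itself: WGCNA summarizes each module by its eigengene (the first principal component of the module's expression), whereas the sketch below uses the per-sample mean of z-scored expression as a stand-in. The expression values, the trait values, and the function names are all hypothetical.

```python
import math


def zscore(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_summary(expression_by_gene):
    """Per-sample mean of z-scored expression across the module's genes.

    Stand-in for a module eigengene (assumption made for brevity).
    """
    z = [zscore(gene) for gene in expression_by_gene]
    n_samples = len(z[0])
    return [sum(gene[s] for gene in z) / len(z) for s in range(n_samples)]


# Hypothetical module of three co-expressed genes across six samples.
module_genes = [
    [5.0, 5.5, 6.1, 6.8, 7.2, 7.9],
    [2.1, 2.0, 2.6, 3.0, 3.1, 3.7],
    [9.0, 9.4, 9.3, 10.1, 10.6, 10.9],
]
growth = [0.10, 0.15, 0.22, 0.30, 0.33, 0.41]  # hypothetical trait values

r = pearson(module_summary(module_genes), growth)
print(f"module-trait correlation r = {r:.2f}")
```

      The key design point is that the module is defined from co-expression alone and only afterwards compared with the trait, which is what distinguishes this approach from testing each gene independently.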

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic difference occurred during the weeks during which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, and ones that would be considered more causative. The authors acknowledge this in 434-435, but it could be emphasized further.

      We appreciate the reviewer raising this point. In the updated version of the manuscript, we will revise wording to convey that food intake, agonistic behavior, size and growth, and gene expression are all changing continuously, in response to each other and in response to social feedback. An underappreciated aspect of this system (and likely many other systems) is that phenotype (including transcriptome) influences the outcome of social interactions, and the outcome of social interactions influences the phenotype (including the transcriptome). Earlier capture of the transcriptome would reveal different levels of gene expression, reflecting the state of the system at that moment in time.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?)

      We agree with the reviewer that solitary individuals are the most intuitive baseline. Indeed, the experimental design included solitary fish because we expected they would serve as a useful control. Without social restraint, we anticipated they would show unrestricted growth, feeding, behavior, and associated gene‑expression patterns, similar to dominants.

      We initially ran analyses using solitaries as the baseline, but after examining the results, which showed subordinate‑like characteristics for the solitary individuals, we concluded that solitary individuals are not an ecologically appropriate control for this context. Removing juveniles from a social context and housing them in isolation may be stressful and can affect physiology and behavior in ways that do not reflect a natural baseline. From a life‑history standpoint, solitary living is not the typical state for A. percula.

      For these reasons, we reanalysed the dataset using the dominant (P1) as the reference to enable more ecologically meaningful comparisons (this choice was somewhat arbitrary; subordinates could also have been used as the reference). Given that gene expression is relative, we interpret results from both the dominant (P1) and subordinate (P2) perspectives in the Discussion to provide a complete view. We will clarify wording throughout the manuscript to make it clear that everything is relative (e.g., revising Line 474).
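
      The point that the direction of a change depends on the chosen reference can be made concrete with a small sketch. The counts are hypothetical and the function name is an assumption introduced for illustration.

```python
import math


def log2_fold_change(mean_group, mean_reference):
    """log2 ratio of a group's mean expression to the reference group's."""
    return math.log2(mean_group / mean_reference)


# Hypothetical mean normalized counts for a single gene.
mean_p1, mean_p2 = 240.0, 60.0

lfc_p2_vs_p1 = log2_fold_change(mean_p2, mean_p1)  # P1 as reference
lfc_p1_vs_p2 = log2_fold_change(mean_p1, mean_p2)  # P2 as reference

print(f"P2 relative to P1: {lfc_p2_vs_p1:+.1f}")
print(f"P1 relative to P2: {lfc_p1_vs_p2:+.1f}")
```

      The same data read as "downregulated in P2" or "upregulated in P1" depending only on the reference, which is why each statement of up- or downregulation needs its comparison group spelled out.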

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.

      We respectfully disagree with the idea that a single baseline/reference growth trajectory exists for any individual of this species. Growth of individuals is entirely social context-dependent: neither fast nor slow growth represents an inherent baseline. When two size‑matched juveniles meet and compete to establish dominance, accelerated growth is the expected trajectory. By contrast, juveniles joining an existing hierarchy are expected to exhibit reduced growth, which minimizes conflict and facilitates their social integration. Unlike species whose growth trajectories are not socially mediated, clown anemonefish do not have a context‑independent growth rate; rather, individuals constantly readjust their growth according to their immediate social environment.

      Therefore, growth trajectories must be considered from the perspective of all group members, because they emerge from interactions among individuals rather than reflecting an intrinsic baseline. In this study, we were interested in the establishment of dominance hierarchy and how individuals adjust their phenotypes during this process. By experimentally pairing size‑matched rivals, both individuals are initially expected to pursue the dominant trajectory, and thus neither individual represents a default state. Instead, the outcome reflects a social decision, after which both individuals reinforce their emerging social roles through coordinated changes.

      Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Thank you for the positive feedback!

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. Various gene expression is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      We understand the reviewer’s concern about whole‑body transcriptomes and agree that tissue‑specific sampling would provide greater resolution of the mechanisms linking gene expression to growth, agonistic behaviors, and food intake. For this initial study, however, we deliberately chose whole‑body samples to capture a broad, unbiased view of gene expression differences while keeping sequencing costs and sample requirements manageable. We explicitly acknowledge the resulting interpretational limits in the Discussion (lines 464; 529–533), and suggest in the last paragraph that the patterns reported here should be used to build on in future studies exploring targeted, tissue‑specific hypotheses.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. Numerous gene expressions are affected by sex change. It is unclear how this issue was addressed.

      We thank the reviewer for raising this point. Sex change and sexual maturation can indeed drive major transcriptional shifts in clown anemonefish, but our experiment did not encompass such a life‑history transition. All individuals in this experiment were juveniles (≈1 month old at the start, ≈2 months old at the end) and were sexually immature at these ages. Clown anemonefish reach sexual maturation around one to two years under ideal conditions, can delay sexual maturation for years under normal conditions (Buston & García, 2007), and sex change in the genus Amphiprion is known to take over ~5 months (Moyer & Nakazono, 1978). Accordingly, individuals in this study were not sexually mature, and sex change was not biologically plausible over the five-week experimental period of our study. We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that it is clearly stated that the fish in this study were sexually immature in the revised version.

      References:

      Buston, P. (2003). Forcible eviction and prevention of recruitment in the clown anemonefish. Behavioral Ecology, 14(4), 576–582. https://doi.org/10.1093/beheco/arg036

      Buston, P. M., & García, M. B. (2007). An extraordinary life span estimate for the clown anemonefish Amphiprion percula. Journal of Fish Biology, 70(6), 1710–1719. https://doi.org/10.1111/j.1095-8649.2007.01445.x

      Buston, P., & Clutton-Brock, T. (2022). Strategic growth in social vertebrates. Trends in Ecology & Evolution, 37(8), 694–705. https://doi.org/10.1016/j.tree.2022.03.010

      Dengler-Crish, C. M., & Catania, K. C. (2007). Phenotypic plasticity in female naked mole-rats after removal from reproductive suppression. The Journal of Experimental Biology.

      Heg, D, Bender, N, & Hamilton, I. (2004). Strategic growth decisions in helper cichlids. Proceedings of the Royal Society of London. Series B: Biological Sciences, 271(suppl_6). https://doi.org/10.1098/rsbl.2004.0232

      Huchard, E, English, S, Bell, M B. V., Thavarajah, N, & Clutton-Brock, T. (2016). Competitive growth in a cooperative mammal. Nature, 533(7604), 532–534. https://doi.org/10.1038/nature17986

      Johnston, R A., Vullioud, P, Thorley, J, Kirveslahti, H., Shen, L., Mukherjee, S., Karner, C. M., Clutton-Brock, T, & Tung, J (2021). Morphological and genomic shifts in mole-rat ‘queens’ increase fecundity but reduce skeletal integrity. eLife, 10, e65760. https://doi.org/10.7554/eLife.65760

      Moyer, J. T., & Nakazono, A. (1978). Protandrous Hermaphroditism in Six Species of the Anemonefish Genus Amphiprion in Japan (No. 2). The Ichthyological Society of Japan. https://doi.org/10.11369/jji1950.25.101

      Reed, C., Branconi, R., Majoris, J., Johnson, C., & Buston, P. (2019). Competitive growth in a social fish. Biology Letters, 15(2), 20180737. https://doi.org/10.1098/rsbl.2018.0737

      Thorley, J, Katlein, N, Goddard, K, Zöttl, M, & Clutton-Brock, T. (2018). Reproduction triggers adaptive increases in body size in female mole-rats. Proceedings of the Royal Society B: Biological Sciences, 285(1880), 20180897. https://doi.org/10.1098/rspb.2018.0897

      Van Schaik, C P., & Van Hooff, J A. R. A. M. (1996). Toward an understanding of the orangutan’s social system. In Linda F. Marchant, Toshisada Nishida, & William C. McGrew (Eds.), Great Ape Societies (pp. 3–15). Cambridge University Press. https://doi.org/10.1017/CBO9780511752414.003

      Walker, S P. W., & McCormick, M I. (2009). Sexual selection explains sex-specific growth plasticity and positive allometry for sexual size dimorphism in a reef fish. Proceedings of the Royal Society B: Biological Sciences, 276(1671), 3335–3343. https://doi.org/10.1098/rspb.2009.0767

      Wong, M. Y. L., Buston, P. M., Munday, Philip L., & Jones, Geoffrey P. (2007). The threat of punishment enforces peaceful cooperation and stabilizes queues in a coral-reef fish. Proceedings of the Royal Society B: Biological Sciences, 274(1613), 1093–1099. https://doi.org/10.1098/rspb.2006.0284

    1. Author Response:

      eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

      We thank the expert reviewers and eLife editors Drs. Eade and White for complimenting our work and deeming it an “important study” of “broad appeal to developmental neuroscience”. We also acknowledge some of the limitations of our work, including the lack of homozygous deletion of the enhancer element. As we detail below, we tried tirelessly to identify human embryonic stem cell (hESC) clones with homozygous deletions but were unable to do so. As we speculate in the discussion, this failure may represent a biological property of the enhancer element (possibly an essentiality manifested even in hESCs), or a technical limitation related to the large size (2.7 kb) of the genomic element targeted for deletion. We also clarify that every scRNAseq assay included cells from multiple teratomas.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      We thank the reviewer for highlighting the significance of our work in the field of developmental biology.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

      We appreciate the reviewer’s disappointment with the lack of data from a homozygous SOX2 enhancer deletion. We too felt disappointed when we started genotyping our hESC clones. In fact, we spent a year screening multiple hESC clones for a homozygous deletion but were unable to find one. We performed several assays to better characterize the heterozygous clones, including Sanger sequencing, whole-genome sequencing (WGS) and fluorescent in situ hybridization (FISH). All assays pointed in the direction of hemizygous deletion. We do not understand the reasons for the absence of homozygous deletion clones. One possibility is that homozygous deletion of the enhancer is selected against in hESCs, thus preventing growth of colonies. Another possibility is the technical challenge of achieving a large deletion (2.7 kb) in hESCs. We also entertained the possibility of the enhancer being excised from the genome but retained as extrachromosomal (ec) DNA, thus producing the hemizygous genotype. However, several assays, such as FISH and PCR diagnostics, argued against this possibility.

      The teratoma assay was chosen as an in vivo metric of spontaneous differentiation of hESCs into the three germ layers, because our overarching hypothesis was that perturbing the enhancer element and 3D chromatin loop regulating SOX2 transcription would impair specification of neuroectodermal precursors. We believe that teratomas offer an opportunity to allow pluripotent cells to declare any predilections toward germ layers in unbiased fashion. Importantly, we did not rely solely on teratomas to assess effects of our genomic perturbations on specification of neuroectoderm, but also pursued cerebral organoids as an orthogonal approach focused on the tissue of interest, the central nervous system.

      Our work does not only describe an important mechanism for regulation of SOX2 transcription in the transition from pluripotency to neuroectodermal specification, but also provides mechanistic insight into the question of whether the developmentally co-regulated activation of the enhancer and formation of the 3D chromatin loop are dependent on each other. Our findings indicate that the two processes occur independently of each other, as evidenced by the fact that the enhancer is uncoupled from chromatin folding, as occurs when the adjacent CTCF motif is deleted. This finding raises the possibility that enhancer activation occurs through yet to be determined transcriptional events, and that establishment of the local 3D chromatin architecture helps fine-tune its influences in the Topologically Associating Domain (TAD) of interest.

      We are further pursuing mechanisms that regulate activation of the enhancer within neuroectodermal lineages and may explain its actions on genomic elements other than the SOX2 locus within the relevant TAD. We are also investigating why hemizygous enhancer deletion produces stronger phenotypes than deletion of the CTCF motif that helps stabilize the 3D chromatin loop.

      Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSC. They investigate both the deletion of these elements in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      We thank the reviewer for appreciating the importance of this regulatory mechanism in the establishment and maintenance of SOX2 expression in the human neural lineage.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

      We appreciate the recommendation of the reviewer to better acknowledge prior work in mouse neural development. We will ensure full acknowledgment of these studies in the revised manuscript.

      We also appreciate the suggestion for biological replicates in our scRNA-seq assays. We clarify that each scRNA-seq arose from combining multiple teratomas from each experimental group, thus ensuring that findings reflect reproducible biology rather than isolated findings from single teratomas. This clarification will be emphasized in the revised manuscript.

      Finally, we absolutely agree with the reviewer that more CRISPR-deleted clones would have strengthened the study. Unfortunately, we realized that characterization of each clone takes multiple years and addition of more clones would have made the study too lengthy.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modelling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modelling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca<sup>2+</sup>-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca<sup>2+</sup> signals does not exclude more localized or subtle Ca<sup>2+</sup>-dependent mechanisms, and conclusions regarding Ca<sup>2+</sup> independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca<sup>2+</sup>, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Concerning Ca<sup>2+</sup> signals, the Reviewer is right. While we found no change in Ca<sup>2+</sup> signalling apart from a slow Ca<sup>2+</sup> accumulation during long trains at 1 Hz, the possibility of an undetected change cannot be excluded. We have added a word of caution in this direction on p. 11. Concerning the 1.5 mM Ca<sup>2+</sup> experiments, the Reviewer presumably alludes to the first recovery train (yellow) point in Supplementary Fig. 2C. This is also the last point (s11) of the slow train at 0.5 Hz because no delay at all was interposed between the slow train and the recovery train. We have now included one more experiment (bringing the total to n = 6), and we have corrected Fig. S2C accordingly. In the new version the depression measured for s4-s10 vs s1 during the 0.5 Hz trains is 0.69 +/- 0.05 (p = 0.00058, paired one-tailed t-test). The ratio of the s1 value of the recovery train compared to control s1 is 0.83 +/- 0.08 (p = 0.028, paired one-tailed t-test).
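
      For readers unfamiliar with the test named above, the statistic behind a paired t-test can be sketched as follows. The response values are hypothetical placeholders (n = 6) and do not reproduce the reported data; the function name is an assumption. Converting the statistic to a p-value requires the t-distribution with n - 1 degrees of freedom, which is left out here because the Python standard library does not provide it.

```python
import math


def paired_t_statistic(control, test):
    """t statistic and degrees of freedom for paired samples (test - control)."""
    diffs = [t - c for c, t in zip(control, test)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    sem = sd / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / sem, n - 1


# Hypothetical per-experiment responses (arbitrary units), n = 6.
control_s1 = [1.00, 1.20, 0.90, 1.10, 1.00, 1.30]
depressed = [0.70, 0.80, 0.70, 0.70, 0.75, 0.90]

t_stat, dof = paired_t_statistic(control_s1, depressed)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

      A one-tailed test of depression then asks how likely a t value at least this negative would be if there were no true decrease, using only the lower tail of the distribution.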

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modelling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, overgeneralizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      As suggested, we clarify the distinction in the revised version between experimental data and modelling, and we refrain from making definitive statements on underlying cellular mechanisms.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      We thank the Reviewer for drawing our attention to this important point. Below 10 ms, rate constants are largely determined by the large amplitude, fast decaying Ca<sup>2+</sup> signal occurring near voltage-dependent Ca<sup>2+</sup> channels (‘Ca<sup>2+</sup> nanodomain’). After 10 ms, the rate constants depend on the low amplitude, slowly decaying Ca<sup>2+</sup> signals averaged over the entire varicosity (‘volume-averaged Ca<sup>2+</sup>’). We explain this better in the revised version (Materials and Methods, p. 21).

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca<sup>2+</sup>), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid overinterpreting these data.

      This has been discussed above (‘weaknesses’).

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

      We have attended to this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid and a slow linearly increasing time course. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. The Supplementary Figure 5A addresses run-down, but the figure is not easy to understand, and, as far as I understood, it does not address the question of whether the release-independent depression could be caused by a rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the 1st and 2nd half of the recordings separately.

      The Reviewer makes a very important point that had escaped our attention. If the responses were declining over the course of an experiment, near the end of the recordings, a high proportion of failures would be associated with a weak response to the second AP. This could distort the relation between initial failures and amount of LFD, perhaps to the point of indicating LFD after failures when there were none. As suggested by the Reviewer, we tested this possibility by examining the stability of the synaptic responses during experiments. We found a mean s<sub>1</sub> value of 0.87 ± 0.13 for the first half of the experiments used in Fig. 5, and of 1.10 ± 0.17 for the second half (p > 0.05, n = 10). This analysis shows that there was no rundown during these experiments. We show in Author response image 1 a plot of s1 as a function of train number. These plots do not suggest any artefactual correlation between failures, mean s1, and rundown.

      Author response image 1.

      Plot of s1 as a function of train number for the experiments of Fig. 5. In response to a request of Reviewer 2, this figure illustrates the evolution of s1 values as a function of train number for the experiments used to produce Figure 5. In each experiment, about 20 s1 values were obtained at two ISIs (either 10 ms and 500 ms, or 800 ms and 1600 ms). The figure shows two examples of s1 values as a function of train number (these values fluctuate widely between 0 and 3), and the average across cells and ISI values. There is no indication of a rundown of s1 values as a function of train number.
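
      As an illustration only, the split-half stability check described in the response to Reviewer 2 can be sketched as follows. The sketch works on a single recording, whereas the response reports means across n = 10 experiments; the function names and the s1 values are hypothetical and are not the data of Fig. 5.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(values) / sqrt(len(values))


def split_half_summary(s1_by_train):
    """Return ((mean, SEM) of first half, (mean, SEM) of second half).

    s1_by_train must be ordered by train number, so that the two halves
    correspond to the early and late parts of the recording.
    """
    mid = len(s1_by_train) // 2
    first, second = s1_by_train[:mid], s1_by_train[mid:]
    return (mean(first), sem(first)), (mean(second), sem(second))


# Hypothetical s1 release counts for 20 consecutive trains in one recording.
s1_by_train = [1, 0, 2, 1, 0, 3, 1, 1, 0, 2, 1, 0, 2, 1, 1, 0, 3, 1, 0, 2]

(first_mean, first_sem), (second_mean, second_sem) = split_half_summary(s1_by_train)
print(f"first half:  {first_mean:.2f} +/- {first_sem:.2f}")
print(f"second half: {second_mean:.2f} +/- {second_sem:.2f}")
```

      A rundown would appear as a systematically lower second-half mean across experiments; the per-experiment half means could then be compared with a paired test.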

      Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca<sup>2+</sup>). When experiments were repeated in physiological Ca<sup>2+</sup>, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer amount of terms and nomenclature makes the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      We respectfully disagree with these sweeping criticisms, as described in more detail below.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca<sup>2+</sup>. In a small number of experiments performed in more physiological Ca<sup>2+</sup> (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      - Abrahamsson et al., (2007) recorded from Schaffer collaterals in 4 mM Ca, 3-4X physiological Ca<sup>2+</sup>.

      - Doussau et al., (2010) recorded from Aplysia synapses in 3X Ca compared to seawater.

      - Rudolph et al., (2011) is cited as an example of LFD. However, this study performed experiments at high release probability cerebellar climbing fibers, and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al., (2022) also largely describe monotonic depression at the calyx.

      The Reviewer suggests that LFD may only occur under non-physiological conditions, if the release probability has been increased by artificially elevating the extracellular Ca<sup>2+</sup>. The implication is that LFD is at best a curiosity with little or no significance for brain signalling. We disagree with this point of view for several reasons.

      Concerning the statement ‘In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal’: This is the purpose of the analysis shown in Fig. 5.

      The statement ‘the authors only report significant LFD in elevated extracellular Ca<sup>2+</sup>’ is inaccurate. Fig. S2C shows a clear LFD in 1.5 mM Ca<sup>2+</sup>, as acknowledged by Reviewer 1 (‘low-frequency depression is still observed at reduced extracellular Ca<sup>2+</sup>’). However, we failed to provide a p-value for the depression in the initial version of the paper (p = 0.004, n = 5, with this data set; paired t-test, one-tailed). In the revised version, we document the 1.5 mM results more extensively, including the incorporation of the results of an additional experiment, and an explicit statistical analysis of the data (p = 0.00058, n = 6; paired t-test, one-tailed).

      Concerning the statement ‘there is no depression after a single stimulus’: We find that the onset kinetics of LFD is slower in 1.5 Ca<sup>2+</sup> than in 3 Ca<sup>2+</sup> (respectively 1.8 ISI and 0.51 ISI, Fig. 2C and Fig. S2C). This explains why the PPR is not significantly <1 in 1.5 Ca<sup>2+</sup>, without implying any weakening of the extent of LFD at steady state.

      As explained in the manuscript (p. 5), in a previous work, we developed a method to ascribe changes in SV pools, within the RS/DS model, with specific modifications of s1, s2 and s5-s8 during test 100 Hz trains (Tran et al., 2022). This method was developed in 3 mM Ca<sup>2+</sup> conditions, and for this reason, we performed most experiments for the present work in 3 mM Ca<sup>2+</sup>.

      Chiu and Carter (2024) demonstrated LFD in neocortical synapses; they performed their study in 1.2 mM Ca<sup>2+</sup>, not in elevated Ca<sup>2+</sup>.

      Rudolph et al. (2011) showed low frequency depression not only in elevated external Ca<sup>2+</sup>, but also in 0.5 mM Ca<sup>2+</sup>. While Rudolph et al. (2011) did not make an explicit link between their observations and LFD, there is no reason to doubt that these observations are an example of LFD. They showed a biphasic depression when switching the stimulation frequency from 0.05 Hz to 2 Hz. In one of the founding papers of LFD, Doussau et al. (2010) describe a biphasic depression when switching the stimulation frequency from 0.025 Hz to 1 Hz; the Fig. 1 of the two papers (Rudolph 2011 and Doussau 2010) are strikingly similar.

      Lin et al. (2022) would probably not agree with the statement that the depression at the calyx is ‘largely monotonic’, as they stress the finding of quasi-constant depression between 5 and 50 Hz.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      In fact, we clearly listed the difference in external Ca<sup>2+</sup> as a likely source of the discrepancy by saying ‘This discrepancy presumably stems from differences in experimental conditions (room temperature, stimulation of multiple presynaptic PFs and 2 mM external Ca<sup>2+</sup> concentration in the previous work, vs. near-physiological temperature, single presynaptic stimulation and 3 mM external Ca<sup>2+</sup> here)’.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      See our comments above: the revised version includes the requested statistics. On p. 6 of the manuscript, we do provide an explanation for the apparent lack of LFD at 1.5 Ca<sup>2+</sup> and 2 Hz, namely a superimposition of LFD with facilitation. At 1.5 Ca<sup>2+</sup> and 0.5 Hz, our LFD numbers are not weaker than at 3 mM Ca<sup>2+</sup> and 0.5 Hz or 1 Hz.

      Altogether, it is correct that many LFD experiments have been carried out in high release probability synapses, and/or under conditions of elevated Ca<sup>2+</sup>. However, the reasons underlying these choices are diverse (in our case, to build on the previous SV pool analysis developed in Tran et al. 2022 in 3 Ca<sup>2+</sup> conditions) and do not imply a limitation to the phenomenon. LFD is present in physiological conditions for low-to-moderate release probability synapses (as shown in our work), and altogether, there is no reason to dismiss LFD as nonphysiological.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al. were careful to note that these vesicles may have been lost to fusion.

      The Watanabe lab showed an SV deficit at docking sites at times ranging from about 100 ms to several seconds (Kusick et al., 2020, their Fig. 5E). This corresponds to the ISI values where we see paired-pulse depression. In their Summary, Kusick et al. raise the possibility of SV fusion as an alternative to undocking at the 100 ms time point. But, the same issue had previously been considered in Miki et al., 2018 with other techniques (their Fig. 2d), where it was shown that the SV deficit seen in paired-pulse experiments could not be explained by fusion. This leaves undocking as the most likely explanation, at least in our preparation. We have added a new paragraph on p. 14 to clarify this point.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and it was not shown to be Ca<sup>2+</sup> sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 ms - 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles, and subsequently, there is a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      This is not an accurate description of the Kusick results or of our results. In the Kusick paper, the SV deficit seen at <5 ms after stimulation is attributed to exocytosis, not to undocking. Clearly, it is Ca<sup>2+</sup> dependent. Our manuscript describes potential calcium-dependent undocking not during the time 10 ms - 150 ms, during which our undocking rate is assumed to be calcium-independent, but starting at 150 ms, and lasting a few hundred ms thereafter.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies from frog and mouse neuromuscular junctions that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle-docking-units and to cite such studies.

      It is important to remember that our sequential two-step model was not based on EM data, but on a series of functional data including variance-mean analysis of summed SV release numbers; covariance analysis among subsequent SV release numbers; analysis of release latencies as a function of stimulus number during an AP train; analysis of SV release numbers under conditions of very high release probability. We note that the phenomenon of Ca<sup>2+</sup>-dependent docking that we proposed based on these observations has been consistent with flash-and-freeze or zap-and-freeze results from several laboratories. Concerning potential filamentous connections between SVs and the AZ plasma membrane at a distance of several tens of nm, this has been seen not only in frog or mouse neuromuscular junctions, but also at brain synapses (e.g., Siksou et al., Journal of Neuroscience 2007; Cole et al., Journal of Neuroscience 2016; Fernandez-Busnadiego, Journal of Cell Biology 2010; 2013).

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism. "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state their data can be "explained by a model where..".

      Please see our response above to a similar point by Reviewer 1.

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high-release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" should be expanded to discuss these issues.

      We could not find statements in the Chiu and Carter paper or in the Doussau et al. paper explaining LFD ‘by high-release probability, paired with an activity-dependent increase in either docking or release probability’. As far as we can see, Chiu and Carter do not propose any specific mechanism for LFD, beyond saying that depression and facilitation must be separate. Doussau et al. (their Fig. 6) clearly frame their interpretation in a sequential two-step model. As in the preceding Miki et al. paper (which they cite extensively), they assume a rapid (a few ms), Ca-dependent transition between their ‘reluctant pool’ and their ‘fully-releasable pool’, respectively homologous to RS and DS. Thus, the Doussau et al. interpretation is close to that presented in our present work, even though significant differences exist. An important difference is that Doussau et al. did not use simple synapses, so that they did not have access to key synaptic parameters such as the number of docking sites or the release probability per docking site. Consequently, the model in Doussau et al. does not have the same level of detail as ours. The revised version explains more clearly the differences and similarities between the model of Doussau et al. and the one presented in our work (new paragraph on p. 14).

    1. Author Response:

      Summary of Planned Revisions:

      We will clarify the qPCR methodology and interpretation to address potential misunderstandings.

      We will assess hearing in the generated HA-tagged mouse lines and, where appropriate, include a properly powered ABR analysis in the revised manuscript.

      We will address concerns regarding the z-stack in Figure 1f.

      We will include additional quantification for Figure 7B to strengthen the analysis.

      We will revise the relevant statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      While we appreciate the suggestion to examine TMEM30B localization on the ATP8B1 KO background, this is not feasible within a reasonable timeframe; we will clarify this limitation in the manuscript.

      We will incorporate relevant prior work (e.g., George and Ricci, 2026) demonstrating minimal Annexin V labeling prior to P6 and lack of PS externalization in TMC1/2 double knockout models.

      We will clarify that hearing thresholds for TMEM30B-HA and ATP8B1-HA lines will be addressed in this study, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      We will soften statements regarding HA-tag insertion and clarify that, to our knowledge, localization and function are not disrupted, while acknowledging this as a potential limitation.

      We will revise the Methods section to clarify differences in fluorescence measurements across experiments.

      In addition to the experiments performed in response to the reviewers’ suggestions, we will add the following data that we generated while the paper was in review:

      Distortion product otoacoustic emissions (DPOAEs) of the Atp8b1 KO and Tmem30b KO mice. Consistent with impaired OHC function, their DPOAE thresholds were elevated.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1D.

      The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      We thank the reviewer for this comment. qPCR data were normalized to GAPDH as the reference (housekeeping) gene. We will clarify this in the Methods section to ensure transparency and reproducibility.
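
      As an illustration only, normalization to a single reference gene is often computed with the 2^-ΔCt method. The response does not state which calculation was used, so the method, the function name and the Ct values below are assumptions for this sketch, which also assumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene (2^-dCt).

    dCt = Ct(target) - Ct(reference); each additional cycle corresponds
    to a two-fold lower starting amount under the efficiency assumption.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values, NOT taken from Figure 1D.
ct_atp8b1 = 26.0
ct_gapdh = 18.0

rel = relative_expression(ct_atp8b1, ct_gapdh)
print(f"ATP8B1 relative to GAPDH: {rel:.5f}")
```

      Reporting the reference gene and the calculation together lets readers judge whether differences between time points could arise from changes in the reference gene itself.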

      (2) Figure 1F.

      The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity. The authors conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128). Including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      We appreciate this important point. The image shown represents a single z-slice from a larger stack, and the hair cell body lies outside the plane of this section. To clarify this, we will revise the figure presentation. Specifically, we can provide the full z-stack (already available via OSF) and/or replace the image with a resliced whole-mount view to better visualize the full cellular context.

      Regarding the possibility that the lack of staining in the hair cell’s plasma membrane might be due to insufficient antibody penetrance, we routinely perform Prestin (located in the OHC plasma membrane) staining after saponin-mediated permeabilization and have never experienced antibody accessibility issues. Nevertheless, we will perform co-labeling for Prestin and include it in the revised submission.

      (3) Figure 7B.

      Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      We thank the reviewer for this suggestion. To better capture variability, we will include an additional quantification measuring the fraction of hair cell bundles with detectable ATP8B1-HA and TMEM30B-HA signal per field of view. This analysis will complement the existing intensity-based quantification.
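
      As an illustration only, a per-field quantification of the fraction of bundles with detectable signal could be sketched as below. The detection threshold, the field names and the intensity values are hypothetical; the response does not specify how "detectable" will be defined.

```python
def fraction_detectable(intensities, threshold):
    """Fraction of bundles whose mean intensity exceeds the threshold."""
    if not intensities:
        raise ValueError("field of view contains no bundles")
    detected = sum(1 for value in intensities if value > threshold)
    return detected / len(intensities)


# Hypothetical mean HA intensities per bundle, grouped by field of view.
fields_of_view = {
    "fov_1": [120.0, 15.0, 98.0, 8.0],
    "fov_2": [110.0, 105.0, 12.0, 90.0, 95.0],
}
# Hypothetical cutoff, e.g. derived from background or a no-tag control.
threshold = 30.0

fractions = {
    name: fraction_detectable(values, threshold)
    for name, values in fields_of_view.items()
}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f} of bundles above threshold")
```

      Unlike a mean intensity, this measure separates a uniform reduction in signal from a mixture of labeled and unlabeled bundles.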

      (4) Lines 346-349

      The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      We agree with the reviewer and will revise this statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      We appreciate this insightful suggestion. However, performing this experiment would require generating a compound mouse line (crossing TMEM30B-HA into the ATP8B1 knockout background), which is not feasible within the revision timeframe. Additionally, the lack of a robust commercial antibody for TMEM30B further complicates this approach. We will note this as a future direction in the revised manuscript.

      (6) Lines 359-374.

      The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      We thank the reviewer for this suggestion and will incorporate relevant prior work, including George and Ricci (2026), which demonstrates minimal Annexin V labeling prior to P6, and further supports our interpretation.

      (7) Lines 392-399.

      The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

      We agree and will expand the discussion to include that TMC1/2 double knockout hair cells do not exhibit phosphatidylserine externalization, supporting the idea that flippase activity becomes critical in the context of scrambling.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      We thank the reviewer for raising this important point. In this study, we will focus on TMEM30B-HA and ATP8B1-HA mouse lines, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      Both TMEM30B-HA and ATP8B1-HA mice are viable and exhibit normal breeding and aging. Preliminary (pilot) ABR measurements indicate wild-type–like hearing thresholds. We agree that this is important and will attempt to raise sufficient mouse numbers (in the time given) for a properly powered ABR analysis in the revised manuscript.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      We appreciate this concern. To our knowledge, the HA tag does not appear to disrupt localization or function of the tagged proteins. However, we agree that this cannot be fully excluded. We will therefore soften our conclusions about IHC flippases and clarify that additional flippases (ATP8A1, ATP8A2, ATP11A) are under investigation and will be described in a separate study.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      We thank the reviewer for this observation. We interpret the elevated signal at P0 as reflecting transcription preceding detectable protein expression. While expression in other cochlear cell types is possible, we have not observed detectable ATP8B1 localization outside hair cells using the HA-tagged model. We will clarify this point in the manuscript.

      (4) Fluorescence scales in Figure 6 B and D and Figure 7 B and D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

      We appreciate the need for clarification. Identical acquisition parameters were maintained within each experiment used for direct comparison (e.g., within a given panel). However, different panels (e.g., Figures 6B vs. 6D) were acquired on different days using different imaging settings.

      We will revise the Methods section to explicitly state this and clarify that comparisons are intended only within panels, not across experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates the work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher visual areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al., but I believe, given anatomical variability (and the very large n in this study) and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      Strengths:

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work. I have a number of methodological comments but I hope they will be considered as constructive engagement - this work is highly technical with a large number of factors to consider.

      Weaknesses:

      (1) Effects of eye-movements

      I have some concerns with how the effects of eye-movements are being examined. There are two main reasons the authors give for excluding eye-movements as a factor in their results. Both explanations have limitations.

      (a) The first is that R2 values are similar across groups in the foveal confluence. This is fine as far as it goes, but R2 values are going to be low in that region. So this shows that eye-movements don't affect coverage (the number of voxels that generate a reliable pRF), but doesn't show that eye-movements aren't impacting their other measures.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: (1) as expected, gaze was less stable in achromats than in controls; (2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; (3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first pre-calibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then (a) removed blinks, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: After applying the transformation of the post-hoc calibration, data were filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
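
      The calibration and instability steps quoted above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the least-squares fit, the use of non-overlapping windows, and the averaging of per-window standard deviations are all assumptions made for the example.

```python
import numpy as np

def fit_affine(recorded_xy, target_xy):
    """Least-squares affine map from recorded gaze (N, 2) to screen targets (N, 2).

    Returns a (3, 2) coefficient matrix: two rows of linear terms plus an offset row.
    """
    recorded_xy = np.asarray(recorded_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    design = np.column_stack([recorded_xy, np.ones(len(recorded_xy))])
    coeffs, *_ = np.linalg.lstsq(design, target_xy, rcond=None)
    return coeffs

def apply_affine(coeffs, gaze_xy):
    """Remap recorded gaze samples (N, 2) into screen space."""
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    return np.column_stack([gaze_xy, np.ones(len(gaze_xy))]) @ coeffs

def fixation_instability(x_deg, sampling_rate_hz, window_s=1.0):
    """Mean standard deviation of horizontal gaze within non-overlapping windows.

    Assumption: the per-window SDs are averaged over the run; NaNs (e.g. removed
    blinks) are ignored.
    """
    x_deg = np.asarray(x_deg, dtype=float)
    n = int(round(window_s * sampling_rate_hz))
    n_windows = len(x_deg) // n
    if n_windows == 0:
        raise ValueError("recording is shorter than one window")
    windows = x_deg[: n_windows * n].reshape(n_windows, n)
    return float(np.nanmean(np.nanstd(windows, axis=1)))
```

      In this sketch the five median fixation locations would be passed as `recorded_xy` and the five known target positions as `target_xy`; five non-collinear points are enough to constrain the six affine parameters.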

      We report the resulting new fixation data results as follows:

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees-of-freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
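
      For readers who want to reproduce this style of statistic, the following is a minimal sketch of an unequal-variance (Welch) t-test with its corrected degrees of freedom, and of a Spearman correlation, using SciPy. The helper names are hypothetical and no study data are included; it is not the authors' analysis code.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch's t-test; returns (t, Welch-Satterthwaite df, two-sided p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t, p = stats.ttest_ind(a, b, equal_var=False)
    # Variance of each group mean
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return float(t), float(df), float(p)

def spearman(x, y):
    """Spearman rank correlation; returns (rho, two-sided p)."""
    rho, p = stats.spearmanr(x, y)
    return float(rho), float(p)
```

      The fractional degrees of freedom reported above (e.g. 9.08) are what the Welch-Satterthwaite correction typically produces when group variances differ.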

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub>=0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      (b) The authors don't see a clear relationship between coverage and fixation stability. This seems to rest on a few ad hoc examples. (What happens if one plots mean fixation deviation vs. coverage, and sets the individuals who could not be calibrated as the highest value of calibrated fixation deviation? Does a relationship then emerge?)

      In any case, I wouldn't expect coverage to be particularly susceptible to eye-movements. If a voxel in the cortex entirely projects to the scotoma then it should be robustly silent. The effects of eye-movements will be to distort the size and eccentricity estimates of voxels that are not entirely silent.

      There are many places in the paper where eye-movements might be playing an important role. 

      Examples include the larger pRF sizes observed in achromats. Are those related to fixation instability?

      We thank the reviewer for their comment. As detailed in our previous response, we have now extracted fixation instability data from additional patients and have expanded our discussion of its potential effects throughout the manuscript.

      Given that fixation instability is expected to increase pRF size by a fixed amount, that would explain why ratios are close to 1 in V3 (Figure 4).

      We agree with the reviewer’s point that the ratio change on its own is not strong evidence of compensation; this analysis was meant to complement the CF result. The plot in Figure 4 is intended to reconcile the connective field (CF) and pRF results. Its purpose is to illustrate that even though larger pRFs in achromats might seem counterintuitive alongside their smaller V3 CF sizes, the pRF data do not contradict the CF findings and are in fact consistent with them. We also agree that there are alternative explanations for the differences in pRF size, such as fixation stability, and we have now added this point to the text.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”
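
      The "fixed size offset" alternative mentioned in this paragraph can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that eye movements add a constant amount to every pRF size; the sizes and offset below are hypothetical and are not taken from the study.

```python
def size_ratio(control_size_deg, offset_deg):
    """Patient/control pRF size ratio if patient pRFs are inflated by a fixed additive offset."""
    if control_size_deg <= 0:
        raise ValueError("control pRF size must be positive")
    return (control_size_deg + offset_deg) / control_size_deg

# Hypothetical baseline sizes: a small V1 pRF and a larger V3 pRF, same offset.
for area, size in [("V1", 1.0), ("V3", 3.0)]:
    print(area, round(size_ratio(size, 0.5), 2))
```

      Under this assumption the ratio equals 1 + offset/size, so it moves towards 1 wherever baseline pRFs are larger. This is why the ratio pattern alone cannot separate compensation from a fixation-driven offset, and why the CF measure is needed as independent evidence.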

      (2) Topography

      The claim of no change in topography is a little confusing given that you do see a change in eccentricity mapping in achromats. 

      Either this result is real, in which case there *is* a change in topography, albeit subtle, or it's an artifact. 

      Perhaps these results need a little bit of additional scrutiny. 

      One reason for concern is that you see different functions relating eccentricity to V1 segments depending on the stimulus. That almost certainly reflects biases in the modelling, not reorganization - the curves of Figure 2D are exactly what Binda et al. predict. 

      Another reason for concern is that I'm very surprised that you see so little effect of including/not including the scotoma - the differences seem more like what I'd expect from simply repeating the same code twice. (The quickest sanity check is just to increase the size of the estimated scotoma to be even bigger?).

      We thank the reviewer for their comment. We have double-checked our scotoma modelling, confirming its correct implementation. The results of the scotoma modelling are not identical to the full one, just similar (see below).

      Previous studies on “artificial scotomas” (such as the one reported by Binda et al.) have shown mixed results. While Binda and colleagues found that modelling artificial scotomas normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas. Moreover, it is unclear whether scotoma modelling is beneficial in clinical populations, as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors. A recent achromatopsia study (Anderson et al. 2024) also found no change in pRF estimates with scotoma modelling.

      In our scotoma analyses, we found meaningful differences only in the non-selective condition in controls, where cones in the rod-free zone are stimulated - which would be the main expected effect of this modelling exercise (see below). In all other conditions (rod-selective in controls, both conditions in achromats), only rods are stimulated. Here we found no difference in coverage, eccentricity or pRF size when modelling the scotoma, likely because the foveal signal is weak or absent and did not contribute much to pRF estimates in the unmasked analyses.

      This means we cannot account for the eccentricity shift as an edge effect with this scotoma model – but we remain cautious about interpreting it as real. This is because first, as we mention in the paper, in the non-selective condition, which has a higher signal-to-noise ratio, the eccentricity estimates in achromats match those of the control group's rod system. Second, it is still possible that the observed shift is an artefact of modelling that was not accounted for by the approach of scotoma modelling.

      Our claim of "no change in topography" specifically referred to the absence of "filling-in" as measured by cortical coverage - the percentage of activated tissue regardless of fitted parameters. However, to avoid confusion given the eccentricity and pRF size results, we have now rephrased our claim.

      Abstract:

      “Cortical input stages (V1) exhibited high stability, with input-deprived cortex showing no retinotopic remapping and exhibiting structural hallmarks of deprivation.”

      Results (pRF eccentricity):

      “It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      To better illustrate the effect of scotoma modelling text has been added to Supplement 3:

      “Studies on artificial scotomas, where part of the visual field is masked, suggest that pRF estimates of eccentricity and size can be biased by fitting scotoma-edge artefacts, and that these can be mitigated by modelling the scotoma in the pRF fitting procedure (e.g., Binda et al. 2013).

      We therefore repeated the pRF modelling procedure with the rod-scotoma being modelled as a black oval mask (1.25°x0.9°) over the stimulus aperture model. As expected, a visible difference between the two models is only apparent in the non-selective condition in controls, where the cones in the rod-free zone are being stimulated. In all the other conditions (rod-selective in controls, and both stimulation conditions in achromats) only the rods are stimulated, therefore the masked stimulus still matches the retinal activation, and no major differences can be observed. Performing the same statistical tests applied to the full model in the main text yields equivalent results: coverage is equivalent across groups in the rod-selective condition (t(47) = 0.78, p=0.43, BF10=0.31), and controls show higher coverage than achromats in the non-selective stimulation condition (Mann U(52)=141, p<0.01; unequal variance, reverted to non-parametric).

      This consistency in pRF properties when modelling the rod scotoma is in line with previous results from scotoma modelling: while Binda and colleagues found that this normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas, and as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors, it is unclear how artificial scotoma findings generalise to clinical populations. Our results are in line with a recent achromatopsia study (Anderson et al. 2024) which also found no change in pRF estimates with scotoma modelling.”
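
      The masking step described in Supplement 3 could look roughly like the sketch below. It is an illustration under stated assumptions: the 1.25° and 0.9° values are treated as the full width and height of the oval, the aperture is assumed to be a square field of view centred on fixation, and the function name and array layout are hypothetical.

```python
import numpy as np

def mask_scotoma(aperture, extent_deg, width_deg=1.25, height_deg=0.9):
    """Zero out a central ellipse in a stimulus aperture.

    aperture   : array of shape (ny, nx) or (ny, nx, n_frames)
    extent_deg : half-width of the (assumed square) field of view, in degrees
    """
    ny, nx = aperture.shape[:2]
    ys = np.linspace(-extent_deg, extent_deg, ny)
    xs = np.linspace(-extent_deg, extent_deg, nx)
    xx, yy = np.meshgrid(xs, ys)
    # Points inside the ellipse with semi-axes width/2 and height/2
    inside = (xx / (width_deg / 2)) ** 2 + (yy / (height_deg / 2)) ** 2 <= 1.0
    masked = aperture.copy()
    masked[inside] = 0  # applies to every frame when a time axis is present
    return masked
```

      The masked aperture would then replace the original one when generating pRF model predictions, so that no model is credited for stimulation that falls on the rod-free zone.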

      I'd also look at voxels that pass an R2>0.2 threshold for both the non-selective and selective stimulus. Are the pRF sizes the same for both stimuli? Are the eccentricity estimates? If not, that's another clear warning sign.

      Comparable results were obtained when using higher R2 thresholds. These results are now included in Supplement 6.

      (3) Connective field modelling

      Let's imagine a voxel on the edge of the scotoma. It will tend to have a connective field that borders the scotoma, and will be reduced in size (since it will likely exclude the cortical region of V1 that is solely driven by resting state activity). This predicts your rod monochromat data. The interesting question is why this doesn't happen for controls. One possibility is that there is top-down 'predictive' activity that smooths out the border of the scotoma (there's some hint of that in the data), e.g., Masuda and Wandell.

      One thing that concerns me is that the smaller connective fields don't make sense intuitively. When there is a visual stimulus, connective fields are predominantly driven by the visual signal. In achromats, there is a large swath of cortex (between 1-2.5 degrees) which shows relatively flat tuning as regards eccentricity. The curves for controls are much steeper, See Figure 2b. This predicts that visually driven connective fields should be larger for achromats. So, what's going on?

      The reviewer raises interesting points about the interpretation of our connective field results. The possibility of differential top-down modulation between controls and achromats is intriguing; however, it is not supported by the data. If top-down modulation were activating foveal V1 in controls, we should not see a drop in the number of significant vertices sampling from the fovea in the rod-selective condition compared to the non-selective condition. In fact, we do see quite a large drop in the number of significant vertices in that area in the rod-selective condition. Therefore, at the moment we do not think there is a strong basis to assume our data could be explained by achromats lacking top-down predictive activity in the scotoma area that is present in controls.

      Regarding the concern about smaller CFs seeming counterintuitive given the flat eccentricity tuning in achromats' V1: we believe there is not a straightforward prediction from pRF properties to CF sizes. The relationship between V1 pRF characteristics and V3 CF sampling is complex and not well-established in the literature, and the two can be decoupled to some degree. For instance, in our data, controls show flat V1 pRF sizes in the rod-selective condition (similar to achromats), yet their V3 CF sizes maintain the typical eccentricity-dependent increase seen in the non-selective condition. This suggests that CF size patterns don't simply mirror V1 pRF properties or visual stimuli responses.

      Importantly, CF modelling fundamentally differs from pRF analysis in how it might be affected by scotomas. Unlike pRF analysis where a scotoma creates a "silent" region in visual space, in CF modelling the deprived cortex remains physically present and continues generating neural signals (albeit not visually-driven ones). If V3-V1 connectivity were anatomically fixed, V3 would continue sampling from deprived V1 regions even if they do not produce visual-driven signals. A change in this sampling pattern, as we see in our data, is therefore evidence for plasticity.

      Our data support this interpretation. First, in achromats, the CF size pattern observed cannot be easily explained by scotoma-edge artefacts. V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The effect is only significant further away from the scotoma (4°-6°).

      Second, to assess how the presence of a scotoma affects CF measures, we can compare the two conditions in the controls, since the rod-selective condition has a scotoma present and the non-selective condition does not. For this purpose, we performed an additional analysis, quantifying on a vertex-by-vertex level the differences in CF fitted parameters between the two stimulation conditions across V1. See results below. In achromats there are no systematic shifts between the stimulation conditions, as expected since both are rod-driven. In controls, this analysis reveals only subtle shifts (~0.45° in the rod-selective condition). CF size also changed slightly, although this change was not significantly different from that observed in achromats. These shifts are much smaller than the CF size and eccentricity differences between controls and achromats, so we consider it unlikely that our findings are driven by scotoma artefacts.

      Author response image 1.

      Results (CF size):

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.

      To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      The beta parameter is not described (and I believe it can alter connective field sizes).

      In Author response image 2, we plot the beta parameter of the pRF modelling in V1 with no R<sup>2</sup> filtering, error bars are 95% CIs:

      Author response image 2.

      The reviewer did not specify how beta might alter connective field sizes. We assume they meant that, as in pRF mapping, the slope of activity from deprived to non-deprived cortex will artefactually create a CF model fit with smaller CF sizes. To test this, we calculated the slope of beta values between 0° and 3° in each participant in the rod-selective condition, as this range includes the scotoma and the area at the edge of the scotoma. We then used the slope as a covariate in an ANCOVA when comparing the CF sizes across groups in each sampled V1 segment. Accounting for the beta slope of V1 did not change the reported results. This analysis still shows smaller CF sizes in V3 in the rod-selective condition between 4° and 6° eccentricity, and these differences remain significant (p<0.001 for 4°-5° and p<0.05 for 5°-6° when comparing achromats vs controls).
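
      A covariate analysis of this kind can be sketched as an ordinary least-squares model with a group indicator and the beta slope as covariate. This is a minimal illustration, not the authors' code: the function names are hypothetical, the per-participant slope is assumed to come from a straight-line fit of beta against eccentricity, and the ANCOVA is written as the equivalent equal-slopes regression.

```python
import numpy as np
from scipy import stats

def beta_slope(eccentricity_deg, beta, lo=0.0, hi=3.0):
    """Slope of a straight-line fit of beta against eccentricity within [lo, hi] degrees."""
    ecc = np.asarray(eccentricity_deg, dtype=float)
    beta = np.asarray(beta, dtype=float)
    keep = (ecc >= lo) & (ecc <= hi)
    slope, _intercept = np.polyfit(ecc[keep], beta[keep], deg=1)
    return float(slope)

def ancova_group_effect(cf_size, is_patient, covariate):
    """Group effect on CF size adjusted for one covariate.

    Fits cf_size ~ intercept + group + covariate by OLS and returns
    (adjusted group difference, t statistic, two-sided p).
    """
    y = np.asarray(cf_size, dtype=float)
    X = np.column_stack([
        np.ones_like(y),
        np.asarray(is_patient, dtype=float),
        np.asarray(covariate, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = coef[1] / np.sqrt(cov[1, 1])
    p = 2 * stats.t.sf(abs(t), dof)
    return float(coef[1]), float(t), float(p)
```

      In this formulation, a group effect that stays significant after adjustment indicates that the covariate does not explain the group difference, which is the logic of the test described above.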

      Similarly, it's possible to get very small connective fields, but there wasn't a minimum size described in the thresholding.

      CF sizes were fit with a grid fit. Possible values were [0.5,1,2,3,4,5,7,10]. Therefore, the minimum size is 0.5. Filtering out the smallest connective field sizes does not change the results:

      Author response image 3.
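
      To make the grid fit concrete, the sketch below scores each candidate CF size for a single target vertex. It is a simplified illustration under stated assumptions: the CF centre is held fixed (a full CF fit would also search over centres), the CF is modelled as a Gaussian of cortical distance, and fit quality is the squared correlation between predicted and observed time series, with negative correlations scored as zero. Only the list of candidate sizes comes from the response above; all names are hypothetical.

```python
import numpy as np

CANDIDATE_SIZES = [0.5, 1, 2, 3, 4, 5, 7, 10]  # grid of CF sizes quoted in the response

def fit_cf_size(target_ts, source_ts, dist_from_center, sizes=CANDIDATE_SIZES):
    """Pick the CF size whose Gaussian-weighted source signal best matches the target.

    target_ts        : (n_time,) time series of one target (e.g. V3) vertex
    source_ts        : (n_vertices, n_time) time series of source (e.g. V1) vertices
    dist_from_center : (n_vertices,) cortical distance of each source vertex to the CF centre
    """
    target_ts = np.asarray(target_ts, dtype=float)
    source_ts = np.asarray(source_ts, dtype=float)
    dist = np.asarray(dist_from_center, dtype=float)
    best_size, best_r2 = None, -np.inf
    for size in sizes:
        weights = np.exp(-dist ** 2 / (2 * size ** 2))
        prediction = weights @ source_ts
        r = np.corrcoef(prediction, target_ts)[0, 1]
        r2 = r ** 2 if r > 0 else 0.0
        if r2 > best_r2:
            best_size, best_r2 = size, r2
    return best_size, float(best_r2)
```

      Because the search is over a discrete grid, the smallest value on the grid acts as the effective minimum CF size, which is the point made in the response.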

      I might be missing something obvious, but I'm just deeply confused as to how the visual maps and the connectome maps can provide contradictory results given that the connectome maps are predominantly determined by the visual signal. Some intuition would be helpful.

      We agree that this appears counterintuitive, and have now added further clarification. The two models (pRF and CF) fundamentally differ in what they measure and how they relate to visual processing. V1 pRF sizes reflect the relationship between neural activity and visual stimuli - essentially how much of a visual stimulus drives a voxel's response - while V3 CF sizes reflect how V3 samples from the V1 cortical surface, indicating how many V1 voxels contribute to a V3 voxel's activity.

      The measures constrain each other, as a V3 voxel's pRF size is expected to match the pooling of its connected V1 inputs. But they can be decoupled: A V3 voxel could sample from a small area of V1 cortex (a small CF in mm) that happens to represent a large area of visual space if those V1 voxels have large pRFs. The aim of Figure 4B is to clarify that the measures are consistent with one another even though they diverge in direction. In achromats, where V1 voxels have larger pRFs (coarser spatial resolution), V3 appears to compensate by sampling more selectively from V1 via smaller CF sizes. Theoretically, this should reduce the pRF size difference between controls and patients in V3, a prediction that our data supports.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Some analyses might also help provide the reader with insight. For example, doing analyses separately on V3 voxels that project entirely to scotoma regions, project entirely to stimulus-driven regions, and V3 voxels that project to 'mixed' regions.

      We agree that it is important to plot the connective field dynamics across the scotoma region.

      In Figure 4A we split the V3 vertices based on the V1 area they sample from. Therefore, the 0°-1° segment would be considered as mainly sampling from the “scotoma” region, and the higher the eccentricity, the less “scotoma” it includes. The V3 vertices that have a significantly smaller CF size compared to controls are those sampling from mostly, if not entirely, stimulus-driven regions (4°-5° and 5°-6°). We are not sure how further binning the data by within, across and outside scotoma would be more informative.

      However, in Author response image 4, we plot in more detail the distribution of CF sizes sampling from a V1 segment clearly inside and clearly outside the scotoma. The top figure shows the CF size distribution of V3 vertices that sample from a V1 0°-1° segment, where V1 is deprived of input due to the rod scotoma. In achromats, there is a clear drop in vertices with a very small (0.5) CF size. The bottom figure shows the distribution of V3 vertices that sample from the V1 4°-5° segment, which falls outside the scotoma and shows a significant difference in CF size across the groups. Here in achromats you can see a drop in larger V3 CF sizes sampling from the V1 region, and an increase in smaller ones (note that this further addresses a previous concern that connective field differences across groups are solely driven by very small CFs).

      Author response image 4.

      Following the reviewer’s comment we have added the following statement in the results section discussing CF size:

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.”

      The finding that pRF sizes are larger in achromats by a constant factor as a function of eccentricity is what differences in eye-movements would predict. It would be worth examining the relationship between pRF sizes and fixation stability.

      We found no relationship between fixation stability and pRF size in V1, although, as we explain in response to an earlier point, this does not fully exclude the reviewer's alternative explanation, which we now add to the discussion.

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Reviewer #2 (Public review):

      Summary:

      The authors inspect the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. They report an increased cortical thickness in central (eccentricities 0-2 deg) in V1 and the expansion of this effect to V2 (trend) and V3 in a cohort with an average age of adolescents.

      In analyzing the receptive fields, they show that V1 had increased receptive field sizes in achromats, but there were no clear signs of reorganization filling in the rod-free area. In contrast, V3 showed an altered readout of V1 receptive fields. V3 of achromats oversampled the receptive fields bordering the rod-free zone, presumably to compensate and arrive at similar receptive fields as in the controls.

      These findings support a retention of peripheral-V1 connectivity, but a reorganization of later hierarchical stages of the visual system to compensate for the loss, highlighting a balance between stability and compensation in different stages of the visual hierarchy.

      Strengths:

      The experiment is carefully analyzed, and the data convey a clear and interesting message about the capacities of plasticity. 

      Weaknesses:

      The existence of unstable fixation and nystagmus in the patient group is alluded to, but not quantified or modeled out in the analyses. The authors may want to address this possible confound with a quantitative approach.

      We have responded to this in the “Recommendations for the authors” section of this reviewer, as they included a more detailed description of these points there.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think the term rod monochromats should be included early in the paper since it's a more intuitive term to describe this population.

      We agree with the reviewer that the term “rod monochromats” is more intuitive as it clarifies the retinal source of the disease but have chosen the term achromats for consistency with a wide literature of published work in this group, including our own and our close collaborators’. To clarify, in the first mention of the group as achromats in the introduction we have now added this term:

      “Achromatopsia (also known as rod monochromacy) causes cone photoreceptors in the retina to be inactive from birth (Aboshiha et al., 2014).”

      (2) The paper essentially contains two definitions of 'eccentricity'. One (atlas/segments) comes from the Benson atlas and the other (functional) comes from pRF mapping. It would be good to make this distinction terminology clearer earlier in the paper. It would also be good to use more consistent terminology. I assume 'sampled atlas V1 eccentricity' in 3A is the same as 'V1 segment' in 1A?

      For consistency we have now referred to these as V1 segment and sampled V1 segment in the figures when describing the atlas-based definition, and eccentricity for the measured pRF-based eccentricity.

      (3) The 'stability vs. plasticity' framing in the introduction could be tightened slightly.

      We have made the following changes following the reviewer’s comment:

      “In the visual domain, the focal point of the debate on plasticity and stability has hinged on the extent to which retinal input deprivation can drive local reorganisation in early visual cortex, for example, for deprived tissue to take on inputs from spared retinal locations (Adams et al., 2007; Baker et al., 2005, 2008; Baseler et al., 2002, 2011; Calford et al., 2005; Dilks et al., 2009; Dumoulin & Knapen, 2018; Ferreira et al., 2016; Goesaert et al., 2014; Haak et al., 2015; Molz et al., 2023; Ritter et al., 2019; Schumacher et al., 2008). In reality visual impairment is a more global phenomenon, affecting all levels of visual processing, with complex dynamics beyond constricted local retinocortical projection zones (Carvalho et al., 2019).”

      (4) Figure 1A, define the x axis as degrees.

      We have now added the ° sign to all the tick labels indicating Benson map eccentricity.

      (5) Figure 2B, is there room for pictures of the silent substitution/standard stimulus

      We have now added images in Supplement 5 to avoid cluttering the main Figure 2B.

      (6) Figure 2

      Panel A has a slightly weird organization. The reader is supposed to compare the square symbols to each other, and the circles to each other, why not organize the figure so they are adjacent in the graph (i.e. non selective control, non-selective achromat, selective control, selective achromat)? That also helps the reader orient that in the non-selective conditions you have almost complete pRF coverage. 

      We have taken on the reviewer’s suggestion and changed the order.

      In the inset, maybe use empty symbols? That's the traditional way to say that the square/circle applies to both red and black.

      We prefer the current format.

      Figure 2C - the symbols change to circles? Why not keep the symbols of A?

      We have now changed the symbols of 2C&D.

      I'd put the non-selective maps above the selective maps?

      We appreciate the feedback but prefer to keep it as it is, as we feel the critical point is conveyed by the rod maps.

      (7) 'We propose a new hierarchical model of neural adaptation'. These ideas are hardly new. There are also other models, that would explain your data (cumulative plasticity) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953572/

      We thank the reviewer for the reference. We have now cited it in our discussion and removed the word “new” from the mentioned sentence.

      “Therefore, there is theoretically broader scope for experience-dependent reweighting of inputs (Beyeler et al., 2017; Makin & Krakauer, 2023) and to optimise use of inputs that are still available, more reliable, or more relevant in the impaired system. Conversely, higher-order visual areas may appear more plastic simply because they integrate the cumulative effects of learning from multiple lower stages (Beyeler et al., 2017).”

      “We propose a hierarchical model of neural adaptation…” [deleted the word new]

      (8) Line 508. No image of the stimulus is contained in the paper

      Corrected

      (9) Line 620. I believe the Figure is 1B, not 1C.

      Corrected

      (10) Figure 4A. CF Size - add mm2 to the axes.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I am not an expert on pRF mapping, and as such, I am unsure how to relate to pRF mapping performed in patients with unstable fixation (not quantified, but referred to) and nystagmus, such as the achromatic population here. Since the majority of the results hinge on this analysis, I would appreciate more data about the differences between the groups. Supplement 2, which is meant to speak to this, shows only the data from 3 typical participants, and in itself is not evidence for "no correlation between stable fixation and enhanced foveal". Additionally, I'd appreciate a clear methods explanation of how the authors address these confounds; this is too important a concern to be left for the discussion section.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected, gaze was less stable in achromats than in controls; 2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; 3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). This data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
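
      The calibration and stability steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the analysis code used in the study. It assumes that blink and velocity filtering has already been applied, that the affine map is fitted by ordinary least squares, and that instability is the population SD of horizontal gaze within consecutive non-overlapping 1-second windows, averaged over windows. All function names are hypothetical.

```python
from statistics import mean, median, pstdev


def median_fixation(samples):
    """Median (x, y) of the filtered gaze samples for one fixation target."""
    xs, ys = zip(*samples)
    return median(xs), median(ys)


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze to screen coordinates.

    Returns [[a, b, c], [d, e, f]] with x' = a*x + b*y + c, y' = d*x + e*y + f.
    """
    rows = [(x, y, 1.0) for x, y in recorded]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # solve the normal equations for x', then y'
        atb = [sum(r[i] * t[k] for r, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def apply_affine(coeffs, point):
    """Map one recorded gaze position into screen space."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in coeffs)


def fixation_instability(x_positions, sampling_rate_hz, window_s=1.0):
    """Mean SD of horizontal gaze over non-overlapping windows."""
    n = int(round(sampling_rate_hz * window_s))
    windows = [x_positions[i:i + n]
               for i in range(0, len(x_positions) - n + 1, n)]
    return mean(pstdev(w) for w in windows)


# Five-point layout: fixation plus above, below, left and right of it.
targets = [(0, 0), (0, 5), (0, -5), (-5, 0), (5, 0)]
# Hypothetical median recorded positions (scaled and shifted raw gaze).
recorded = [(0.5 * tx + 1.0, 0.5 * ty - 2.0) for tx, ty in targets]
coeffs = fit_affine(recorded, targets)
print(apply_affine(coeffs, (3.5, -2.0)))  # approximately (5.0, 0.0)
```

      Five fixation targets give ten equations for the six affine parameters, so the fit is overdetermined and tolerates some noise in the median fixation estimates.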

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
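
      For readers less familiar with the statistics quoted above, the sketch below shows how an unequal-variance t statistic with non-integer degrees of freedom and a rank correlation can be computed. It assumes that the degrees-of-freedom correction is the standard Welch-Satterthwaite approximation and that the correlation is a standard Spearman coefficient with average ranks for ties. The function names and input values are hypothetical, and p-values are omitted because they require a t-distribution that the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = shared
        i = j + 1
    return out


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    den = math.sqrt(sum((p - mx) ** 2 for p in rx)
                    * sum((q - my) ** 2 for q in ry))
    return num / den


# Hypothetical instability values (deg); not data from the study.
controls = [0.20, 0.25, 0.22, 0.30, 0.18]
achromats = [0.60, 1.10, 0.85, 1.40, 0.70]
t, df = welch_t(controls, achromats)
print(f"t({df:.2f}) = {t:.2f}")

# Hypothetical coverage values paired with the achromat instability values.
coverage = [0.40, 0.35, 0.50, 0.30, 0.45]
print(f"spearman r = {spearman_r(achromats, coverage):.2f}")
```

      When the two groups differ in variance, the Welch degrees of freedom fall below the pooled value of n1 + n2 - 2, which is why the reported t-tests have fractional degrees of freedom.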

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub>=0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      The field connectivity analysis similarly seems to be used only on task data from the same design; if it was replicated from resting-state data, that would be a good way to show consistency which is independent of measures requiring fixation. 

      We agree that resting-state data would be valuable; however, we did not collect such data in these individuals due to time limitations. Instead, we demonstrate the consistency and reliability of our results by replicating our findings across two different stimulation conditions (rod-selective and non-selective), which differ in luminance, contrast and signal amplitude in both groups and for controls also in the photoreceptors involved. The convergence of results across these distinct visual conditions strengthens our confidence in the reliability of the observed effects. Also, notably, CF estimates have been shown to be robust to large eye movements, and therefore also to differences in fixation stability across groups (Tangtartharakul et al., 2023).

      The authors may want to contextualize their findings in relation to what reorganization exists in cases of late-onset loss of part of the visual field on one hand (stroke recovery), and in the case of complete blindness from early life on the other, as both speak to different levels of plasticity the visual system is capable of.

      We thank the reviewer for their comment and have added a new paragraph discussing this topic.

      Discussion:

      “Our findings on hierarchical adaptation have broader implications for other visual disorders, depending on their timing and nature. For instance, a central scotoma acquired in adulthood, as in macular degeneration, may not trigger the same V3 sampling shifts (Haak et al., 2016), suggesting a sensitive window for this form of plasticity, after which connective fields remain more stable. This also raises questions about congenital blindness, where the absence of any driving input could lead to weakening or repurposing of hierarchical connections (Saccone et al., 2024). Moreover, principles may differ between a deprived but structurally intact cortex, as in retinal dystrophies, and a physically damaged cortex, as in stroke. In the latter, more extensive reorganisation may be required to sample effectively from surviving, and potentially disparate, regions of V1. Perceptual training effects in stroke rehabilitation may reflect such dynamics (Cavanaugh et al., 2025; Elshout et al., 2021).”

      A more minor point: Can the authors clarify what the dark adaptation is used for, and provide the supplementary analysis showing that the duration difference for some of the participants didn't impact the results (stated but not shown).

      The dark adaptation period before the rod-selective condition allowed rod photoreceptors to recover from bleaching caused by prior mesopic light exposure, ensuring optimal rod sensitivity under scotopic conditions. To verify that our 15-minute adaptation period was sufficient, we tested 10 control participants with an extended 45-minute adaptation period. As we found no differences in the resulting rod maps between standard and extended adaptation protocols, these participants were combined with the main control group for all analyses. Author response image 5 shows the plots for the two dark adaptation periods.

      Author response image 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.
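
      To make the idea of exchangeable modules concrete, the sketch below shows one way a layered agent could be organised so that the motor layer can be swapped without touching the layer above it. It is a deliberately simplified illustration and is not the larvaworld API: every class, method and parameter name here is hypothetical, and the oscillator and chemotaxis rules are placeholders rather than the calibrated models of the paper.

```python
import math
from dataclasses import dataclass
from typing import Protocol


class Locomotor(Protocol):
    """Interface any motor-layer module is assumed to provide."""

    def step(self, turn_bias: float, dt: float) -> tuple[float, float]:
        """Return (forward speed, angular velocity) for one time step."""
        ...


@dataclass
class ConstantCrawler:
    """Simplest motor layer: constant forward speed."""
    speed: float = 1.0

    def step(self, turn_bias, dt):
        return self.speed, turn_bias


@dataclass
class OscillatorCrawler:
    """Motor layer whose speed oscillates once per stride."""
    freq_hz: float = 1.5
    mean_speed: float = 1.0
    phase: float = 0.0

    def step(self, turn_bias, dt):
        self.phase = (self.phase + 2 * math.pi * self.freq_hz * dt) % (2 * math.pi)
        speed = self.mean_speed * (1 + 0.5 * math.cos(self.phase))
        return speed, turn_bias


@dataclass
class Chemotaxis:
    """Higher layer: turns a drop in odor concentration into a turn bias."""
    gain: float = 1.0
    previous: float = 0.0

    def turn_bias(self, odor):
        change = odor - self.previous
        self.previous = odor
        return -self.gain * change if change < 0 else 0.0


@dataclass
class Agent:
    """Agent whose higher layer modulates whichever motor layer is plugged in."""
    locomotor: Locomotor
    navigator: Chemotaxis
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0

    def update(self, odor, dt=0.1):
        bias = self.navigator.turn_bias(odor)                     # top-down signal
        speed, angular_velocity = self.locomotor.step(bias, dt)   # motor layer
        self.heading += angular_velocity * dt
        self.x += speed * math.cos(self.heading) * dt
        self.y += speed * math.sin(self.heading) * dt


# The same navigation layer drives two different motor layers.
simple = Agent(ConstantCrawler(), Chemotaxis())
detailed = Agent(OscillatorCrawler(), Chemotaxis())
for odor in (1.0, 0.8, 0.9):
    simple.update(odor)
    detailed.update(odor)
print(simple.x, detailed.x)
```

      Because the higher layer only depends on the `step` interface, a more detailed neuromechanical module could in principle replace either crawler as long as it returns the same two quantities.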

      Reviewer #3 (Public review):

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading of the appendix. For the revised manuscript we thus aim at improving our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      See public review for main points. To summarize, I find the conceptual framework of the paper very valuable and an important advance. However, in this age of data, I would have expected that the authors would make an effort to build more realistic models that could relate directly to neural data (including connectome and activity) and muscular dynamics at the segmental level.

      This point is addressed in detail in our public review response. In brief, we agree that a segmental neuromechanical model informed by connectome data would provide richer mechanistic insight. However, such an approach would greatly increase complexity and reduce accessibility. Our aim here is to present a coarse-grained, kinematic-level framework that is modular, extensible, and designed to accommodate models at different levels of abstraction. Importantly, extensions that incorporate realistic neuromechanics or connectome-derived circuits can be readily integrated, provided they conform to the modular principles of the proposed behavioral architecture.

      The authors do not cite figures in order of appearance, which makes it hard to read.

      This has been corrected. Figures are now cited in the correct order throughout the revised manuscript.

      I would explain the model in more detail in the main text. Currently, the model is introduced through Figure 1 in an abstract way. It is really hard to make the connection between this figure to the nuts-and-bolts of neuromechanics. And, I believe, for this paper, the details of the modeling matter and are not just technical points to be hidden in the appendix. The video (video 1) is not helpful.

      We have restructured the Model section to provide more detail directly in the main text, moving explanations that were previously confined to the Appendix. This includes explicit description of the locomotory oscillator model, the intermittency module, and their empirical calibration. At the same time, we retained mathematical and implementation details in Materials & Methods to keep the reading flow accessible. Additionally, we expanded the caption of Video 1 and clarified in the text what it illustrates, making the video more informative.

      Modeling choices lead to further weaknesses. While the model can replicate observed locomotory patterns, it does not fully explain the underlying neurobiological mechanisms that govern behavioral intermittency. For example, the crawl-bend interference mechanism, while capturing observed phase-dependent attenuation of turning, is implemented in a simplified, statistical manner rather than being derived from detailed neuromuscular dynamics. The intermittent locomotion model, which generates alternating runs and pauses, relies on log-normal distributed stridechains but does not explicitly model neural mechanisms responsible for switching between movement states.

      We agree with this point. A fully mechanistic implementation of crawl-bend interference would require a detailed segmental neuromechanical model, which we deliberately refrained from integrating in order to keep the current study tractable and focused on a coarse-grained, kinematic-level description. Likewise, the intermittency module is currently based on data-fitted distributions of stridechains and pause durations, without explicit modeling of the neural mechanisms responsible for switching between these states. To our knowledge, these mechanisms remain unresolved, though alternative approaches have been suggested, for example, an artificial neural network model of intermittency (Sakagiannis et al., 2020). To ensure this limitation is transparent to the reader, we now explicitly state it in a newly added “Limitations of the study” subsection in the Discussion.

      We also highlight that the behavioral architecture is designed to be extensible, so that future work may incorporate such mechanistic models when available, while preserving the modular framework.
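
      The intermittency module described above can be illustrated with a minimal sketch that alternates runs (stridechains) and pauses. Only the log-normal form of the stridechain distribution comes from the text. The log-normal form for pauses, every numerical parameter, and the names `sample_lognormal` and `simulate_intermittency` are hypothetical choices made for illustration; this is not the larvaworld implementation.

```python
import math
import random


def sample_lognormal(rng, median, sigma, lo, hi):
    """Draw from a log-normal with the given median; redraw until within [lo, hi]."""
    while True:
        value = rng.lognormvariate(math.log(median), sigma)
        if lo <= value <= hi:
            return value


def simulate_intermittency(duration_s, stride_freq_hz=1.4, seed=0):
    """Return a list of (state, start_s, end_s) bouts alternating runs and pauses.

    All distribution parameters below are placeholders, not fitted values.
    """
    rng = random.Random(seed)
    bouts, t, state = [], 0.0, "run"
    while t < duration_s:
        if state == "run":
            # A run is a chain of whole strides; its duration follows from the stride frequency.
            n_strides = round(sample_lognormal(rng, 5.0, 0.8, 1.0, 100.0))
            length = n_strides / stride_freq_hz
        else:
            # Pause duration in seconds (log-normal form is an assumption here).
            length = sample_lognormal(rng, 0.8, 0.6, 0.1, 15.0)
        end = min(t + length, duration_s)  # truncate the last bout at the end of the recording
        bouts.append((state, t, end))
        t = end
        state = "pause" if state == "run" else "run"
    return bouts


if __name__ == "__main__":
    for bout in simulate_intermittency(20.0)[:6]:
        print(bout)
```

      The sketch switches state by sampling alone, which mirrors the stated limitation: no neural mechanism for the switch is modeled.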

      I am curious about why the authors chose to model the mushroom body with much more realism than other modules.

      We clarified that this choice was not due to a bias in modeling depth, but to demonstrate the modularity and flexibility of the architecture. The mushroom body (MB) model we integrated was developed in our previous work as a biologically realistic spiking neural network. By incorporating it into the current framework, we show that models of very different abstraction levels – from simple statistical oscillators to detailed spiking networks – can coexist and interact under the same architecture. This rationale is now explicitly stated in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      The manuscript from Sakagiannis et al. proposes a novel model for locomotion and foraging in Drosophila. Their ambition is to make a unified model that will incorporate distinct layers of complexity to describe and predict the locomotor behaviour of a larva, during exploration, chemotaxis and even learning. The paper fails in doing so, starting with a rather interesting exploratory model and becoming less and less convincing as it progresses, with thinner (chemotaxis) and thinner (learning) experimental and theoretical support. The model for chemotaxis is extremely simplified compared to the work of other laboratories. The associative learning paradigm is taken from another paper from the same research group and is not sufficiently explained. In its current form, the paper is of very limited theoretical and practical value. The analysis is insufficient to judge the overall quality and scalability of the model. It is hard to know if the model could be adopted by others in the larval community more widely in other animals. Would it be flexible and robust enough to be used to model other behavioural conditions?

      We appreciate this critical perspective. Our aim is not to present a final, fully parameterized model of all larval behaviors, but to introduce a flexible, modular behavioral architecture that integrates models at different levels of abstraction and can be expanded by the community. To support adoption, we have revised the manuscript to highlight the availability of the framework as a Python package (larvaworld), supplemented with documentation, tutorials, and code examples. This makes it easier for other researchers to reuse, extend, and test the architecture under additional behavioral conditions. We also explicitly refer to modeling studies that have adopted the proposed framework and the locomotory model itself.

      Below, we address the reviewer’s points layer by layer.

      (1) Exploratory behaviour. The strongest part of the paper. The authors propose a new method to analyse locomotion. They take into consideration the instantaneous linear and angular velocity. They assume the existence of two oscillators, which is really interesting. They incorporate the distribution of pause durations and the number of strides. The incorporation of the strides is very exciting. They do not include handedness, which has already been studied and incorporated in a model for exploration they seem to have missed (Wosniack et al. 2022). Figure 4 shows the dispersion. At first glance, it is very obvious that the model larvae do not behave like the animal. The distance they move from the centre is wider (Figure 4A). What is measured in dispersion (Figure 4B)? Just the distance travelled during 40s? A better measure of the similarities or differences between the model and real larvae would be interesting, such as analysing the Mean Square Displacement. Would the model be good if compared to the long-term exploratory behaviour from Sims et al. 2020, that the author previously used?

      The authors should convince the readers that their model is better than, or at least as good as, the ones already available.

      We thank the reviewer for these constructive suggestions. In the revised manuscript we now reference and discuss handedness, citing Wosniack et al. (2022, eLife), and highlight its potential role as an additional axis of individual variability. We also clarified the distance metrics used in Figure 4: dispersal denotes the Euclidean distance from the origin at the end of the trajectory, while pathlength denotes the cumulative distance travelled. Since larvae typically encounter the arena boundary within the first 40 seconds of exploration, dispersal is shown only over this interval.

      With respect to the reviewer’s suggestion of using mean-squared displacement (MSD), we now explicitly describe the relation between dispersal and MSD. Dispersal is an individual-level displacement measure from which population-level metrics such as MSD can be directly derived.
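
      The relation between these metrics can be made concrete with a short sketch. It follows the definitions given above (dispersal as the Euclidean distance between the start and end of a track, pathlength as the cumulative distance travelled) and derives a population-level MSD at the final time point by averaging squared dispersal over individuals. The function names and the toy tracks are illustrative and do not come from the larvaworld package.

```python
import math


def pathlength(track):
    """Cumulative distance travelled along a track of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))


def dispersal(track):
    """Euclidean distance between the first and last point of a track."""
    return math.dist(track[0], track[-1])


def mean_squared_displacement(tracks):
    """Population-level MSD at the final time point: mean of squared dispersal."""
    return sum(dispersal(track) ** 2 for track in tracks) / len(tracks)


if __name__ == "__main__":
    tracks = [
        [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)],  # pathlength 7, dispersal 5
        [(0.0, 0.0), (1.0, 0.0)],              # pathlength 1, dispersal 1
    ]
    print([pathlength(t) for t in tracks])
    print([dispersal(t) for t in tracks])
    print(mean_squared_displacement(tracks))
```

      Evaluating dispersal on tracks truncated at successive time points would give the MSD as a function of time, under the assumption that all tracks share a common sampling rate.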

      Regarding long-term exploration, we agree that extended trajectories—as reported by Sims et al. (2020) over timescales of up to one hour—constitute a valuable complementary regime. Our experimental dataset is limited to 3-minute recordings in a bounded Petri dish, which constrains the accessible timescales of dispersal analysis. We now explicitly note in the Results that comparison to long-horizon datasets such as Sims et al. (2020) represents an important future direction that will require larger or unbounded arenas.

      Together, these revisions strengthen the presentation of the exploration results and clarify how our model relates to established statistical measures of larval foraging behaviour.

      (2) Chemotaxis. The chemotaxis model is so briefly explained in the result section that it is hard to understand. A modulation of the frequency and amplitude of lateral oscillator as a function of the concentration? The authors cannot differentiate between weathervaning and turning in this model (at least I can't understand how). What happened with the distribution of pauses and the directions of turns in Figure 5? The authors do not use real behavioural data to contract their model. How do we know that the parameters they have used reflect the larval behaviour? For example: what is the success rate for larvae to reach the area of high concentration? How close do they get? What is the length of the tracks from start to a target area of high concentration? Where are the calibration data for chemotaxis? This information is critical to understand the model, it needs to be shown in the result section. The authors mention an 8.9uM peak concentration. Of what?

      The model is oversimplified in comparison with Davies et al. 2015 and it is not clear at all how it reflects the real chemotaxis, which is a rather complex behaviour.

      We thank the reviewer for these detailed comments. In the revised manuscript we substantially expanded the description of the chemotaxis model. We now provide an explicit mathematical formulation of how odor concentration modulates the lateral oscillator through the quantity A<sub>0</sub>, which perturbs both the frequency and amplitude of bending according to the mechanism proposed by Wystrach et al. (2016). We additionally clarify that the motor layer - including the intermittency module and all parameters governing crawling, pausing, and turning - remains fully identical to the configuration calibrated on the exploration dataset; no refitting was performed for the chemotaxis condition.

      To address the reviewer’s question regarding the distinction between weathervaning and head casting, we now explain that both behaviours emerge naturally from the same coupled-oscillator structure via stride-phase–dependent crawl–bend interference. High-amplitude headcasts occur during pauses when crawl-induced attenuation is lifted, whereas low-amplitude weathervaning arises during runs when the interference is active.

      This unified mechanism eliminates the need for separate modules.
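
      A minimal sketch of how a single lateral oscillator could yield both behaviours is given here. It assumes, purely for illustration, that crawl-bend interference scales the oscillator output by a constant factor within a fixed window of the stride cycle and that the factor is 1 during pauses. The window, the attenuation strength, the oscillator frequency, and the names `bend_gain` and `angular_drive` are hypothetical and are not the calibrated model.

```python
import math


def bend_gain(stride_phase, crawling, attenuation=0.8, window=(0.0, math.pi)):
    """Multiplicative gain applied to the lateral oscillator output.

    During pauses the attenuation is lifted (gain 1). During crawling the gain
    drops to 1 - attenuation inside the assumed phase window of the stride cycle.
    """
    if not crawling:
        return 1.0
    phase = stride_phase % (2.0 * math.pi)
    lo, hi = window
    return 1.0 - attenuation if lo <= phase < hi else 1.0


def angular_drive(t, stride_phase, crawling, amplitude=1.0, freq_hz=0.6):
    """Lateral oscillator output after stride-phase-dependent attenuation."""
    return amplitude * math.sin(2.0 * math.pi * freq_hz * t) * bend_gain(stride_phase, crawling)


if __name__ == "__main__":
    t = 0.4  # arbitrary time point at which the oscillator output is nonzero
    print("pause:", angular_drive(t, stride_phase=1.0, crawling=False))
    print("run:  ", angular_drive(t, stride_phase=1.0, crawling=True))
```

      With the same oscillator state, the drive during a pause is larger than during the attenuated part of a stride, which is the qualitative distinction between head casts and weathervaning described above.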

      The chemotaxis experiments were implemented to qualitatively replicate the behavioural patterns described in Gómez-Marín et al. (2011, Fig. 1A–1F), and we now include explicit figure references in the captions. Because the present implementation is a proof of concept rather than a quantitatively calibrated chemotaxis model, we do not report success rates, approach distances, or track-length statistics, as these depend strongly on odorscape geometry and calibration against quantitative single-animal datasets that were not available for the current work. This clarification has been added to the text and is stated explicitly again in the Limitations section.

      Finally, we now specify that the reported odor concentrations (e.g. 8.9 µM) follow the values used in Gómez-Marín et al. (2011), and we added the precise Gaussian function used to generate the odorscape in the Materials & Methods. Together, these revisions provide a clear and transparent account of the chemotaxis model and its scope.
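
      For orientation, a Gaussian odorscape of this kind can be sketched as follows. The 8.9 µM peak is the value quoted above; the two-dimensional axis-aligned Gaussian form, the source position, the spread values, and the name `odor_concentration` are assumptions for illustration, and the exact function is the one given in the Materials & Methods.

```python
import math


def odor_concentration(x, y, source=(0.0, 0.0), peak_um=8.9, spread=(0.02, 0.02)):
    """Odor concentration (µM) at (x, y) for an axis-aligned Gaussian odorscape.

    `source` and `spread` are in the same length unit as x and y; both are
    placeholder values, not the published odorscape parameters.
    """
    dx = (x - source[0]) / spread[0]
    dy = (y - source[1]) / spread[1]
    return peak_um * math.exp(-0.5 * (dx * dx + dy * dy))


if __name__ == "__main__":
    for x in (0.0, 0.01, 0.02, 0.04):
        print(x, odor_concentration(x, 0.0))
```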

      (3) Associative learning paradigm. I assume that the authors intended to incorporate a bias in chemotaxis behaviour towards a particular odorant (CS) that would have been associated with a reward food (US). However the model works slightly differently, it is represented by an aversive and an appetitive gradient.

      Theoretically, this is already an assumption (unless there is evidence for it, that should be referenced). It would be more conservative to have one neutral side and one appetitive (attractive) side. Second, the use of a mushroom body model, (even though it has already been published) to decide on the valence adds a layer of complexity that seems unnecessary. The learning process is different from the output process. Finally, the model intends to show us a "realist simulation of Drosophila locomotion" and we do not know how the larvae reach the right side during the test. It would be useful to have some comparison of the larval and model behaviour towards the preferred side.

      In this last section, the objective of the research unweaves and falls short of its ambition.

      We thank the reviewer for these helpful comments. In the revised manuscript we clarified that our implementation follows the standard larval conditioning protocol in which a rewarded odor (CS+) is tested against a neutral odor, not against an aversive one. The previously contradictory phrasing has been corrected, and the text now consistently reflects the established experimental procedure.

      We further explain that the mushroom body (MB) model is included not in order to increase biological complexity in this section, but to demonstrate the flexibility of the proposed behavioral architecture: detailed circuit models and more abstract motor modules can coexist under the same framework. The MB model implements associative plasticity independently of any behavioral simulation, and its output - a scalar odor valence - is transformed linearly into an odor-gain parameter that modulates turning during the test phase. This separation between learning and behavioral output mirrors the logic of the biological system while keeping the overall architecture modular.

      Regarding the reviewer’s request for insight into “how larvae reach the right side,” we note that standard group assays used in larval olfactory learning provide only population-level preference indices rather than detailed individual trajectories. Our comparison to empirical data therefore relies on these established preference indices, which the model successfully reproduces across training trials, including the characteristic saturation reported in Jürgensen et al. (2024). We now state explicitly that although the behavioral simulation does generate full trajectories for each virtual larva, the lack of corresponding experimental single-animal tracks precludes a direct trajectory-level comparison. This clarification has been added to the revised text.
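
      The two quantities involved in this comparison can be sketched in a few lines. The preference index below uses a definition common in larval group assays (difference in animal counts between the two sides, divided by the total count); the exact definition used for the comparison is the one in Jürgensen et al. (2024). The linear valence-to-gain mapping reflects the description above, but the slope, the baseline, and the function names are hypothetical.

```python
def preference_index(n_odor_side, n_other_side, n_middle=0):
    """Population-level preference index in [-1, 1] from animal counts per zone."""
    total = n_odor_side + n_other_side + n_middle
    if total == 0:
        raise ValueError("at least one animal must be counted")
    return (n_odor_side - n_other_side) / total


def odor_gain(valence, slope=100.0, baseline=0.0):
    """Linear map from the scalar odor valence to an odor-gain parameter.

    `slope` and `baseline` are placeholder values, not fitted parameters.
    """
    return baseline + slope * valence


if __name__ == "__main__":
    print(preference_index(15, 5))      # 15 larvae on the odor side, 5 on the other
    print(preference_index(15, 5, 10))  # same counts with 10 larvae in a middle zone
    print(odor_gain(0.5))
```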

      Together, we believe that these revisions improve clarity and better situate the learning simulations within both the behavioral architecture framework and the constraints of available experimental data.

      Reviewer #3 (Recommendations for the authors):

      Figure 1a is very dense and I am struggling with the terms "reactive" and "basic" due to a general lack of clarity about the details of the model organization. For example, why do all of the sensory inputs point to turning proprioception? Why is proprioception two different things for turning and crawling? Why are some senses in light green while olfaction is in dark green? Why is feedback only from feeding, when crawling, head casting, and turning will change the sensory environment as well? Why is head casting not a behavioral module here? Why focus on following/being constrained by the "subsumption architecture paradigm" over a focus on the known literature and neuroanatomy?

      We thank the reviewer for this careful inspection of Figure 1. In the revised version we improved both the figure and its caption, as well as the corresponding description in the text.

      Specifically:

      - The “basic” layer has been renamed the “motor” layer for clarity, and the caption has been expanded to better describe each component.

      - The sensory inputs are now shown to target the motor layer as a whole, rather than just the proprioceptive component of turning.

      - Each motor module is conceptualized as a sensorimotor loop (green-red), which explains why proprioception appears in both crawling and turning.

      - The color coding has also been clarified: modules used in the current simulations are shown in darker shades, while others are faded.

      - Sensory perturbations caused by body locomotion – as in the case of crawling and turning – are not depicted in the figure as feedback between modules. We make this more explicit in the caption. The signal from feeding to the above layers is neuromodulatory – as indicated by the purple arrowhead.

      Finally, we explain that head casting and weathervaning are not modeled as separate modules, since both behaviors emerge from the coupled oscillator mechanism through crawl-bend interference. Our adherence to the subsumption architecture paradigm is motivated by its success in robotics and its conceptual alignment with hierarchical sensorimotor loops, but we have now made clearer that this is a simplifying framework rather than a rigid constraint.

      "Stimulus free conditions" (line 102) don't really exist. Substrate and temperature will always be present, light will have some intensity, etc. Does this really refer to fictive behaviors?

      We thank the reviewer for raising this point. In the revised manuscript we have removed the term “stimulus-free conditions” entirely to avoid the misleading implication that larvae experience no sensory input. We now explicitly describe these experiments as free exploration in the absence of navigation-guiding gradients, which accurately reflects the laboratory assay while avoiding any suggestion of fictive behavior. This terminology has been updated consistently throughout the text.

      The first results section is closer to an introduction than the intro itself is, owing to its focus on the context of the work the paper actually does rather than a broad review of larval behaviors that are not considered within this work.

      We believe the reviewer is referring to the “Model” section rather than the “Results.” The Model section is deliberately separated to outline the theoretical background of the behavioral architecture and to make explicit the general modeling assumptions, which explains why it cites previous work in detail. By contrast, the Introduction is intended as a brief overview of the broader larval behavioral repertoire, since the larva serves here as the case study for our framework. Presenting this repertoire is important because it defines the behaviors that populate the different layers of the architecture, even if only a subset of them is implemented in the simulations presented in this study.

      While the model components are described in the modeling section, no question is actually discussed. What is the goal of this model?

      This broader question is addressed in the public review section.

      "Crawler" and "turner" are inconsistently described. They are described as "modules" in Figure 1, but they seem more like behavioral primitives.

      The specific terms "crawler" and "turner" refer to the computational modules, but the reviewer correctly points out that these generate the respective “crawling” and “turning” behavioral primitives. This has been made explicit in the Materials & Methods.

      Do larva-larva interactions matter here?

      In the revised manuscript we now state explicitly that larva–larva interactions are not included in the present simulations, as each virtual larva is modeled independently in accordance with the single-animal datasets used for calibration. We also point the reader to the Limitations section, where we note that although social interactions lie outside the scope of this study, the Larvaworld software package already supports tactile sensing and collision handling, enabling such interactions to be incorporated in future work.

      The description of the locomotor system, with coupled oscillators between crawling frequency and bending is very empirical. Is this because of the 2-segment model effectively limiting peristalsis to a single segment? What are the limits of this approach?

      The stride-phase–dependent modulation of bending amplitude was identified through kinematic analysis of full 12-segment larval datasets and is therefore independent of our later decision to implement a two-segment simplification. This means that the empirical relationship we describe should hold for any multisegment model, regardless of the reduced representation used in the present implementation. More generally, we performed our detailed empirical analyses with the goal of uncovering statistical relations, which in turn were used for our data-driven coupled-oscillator model in combination with the stochastic elements of stridechain length and pause duration.

      Line 190: The paper starts discussing experimental larva tracks. These experiments need to be described.

      The reviewer probably refers to the dataset analysed in this study. This is a public dataset as described in the Dataset Description section in Materials & Methods, along with a description of the experiment per se.

      The purpose of Figure 2 is not entirely clear. Several panels are not referenced in the text (F,G,H) and all panels are referenced extremely out of order. Figure 3 is similarly hard to follow for the same reasons of being referenced out of order. In fact, this section is largely duplicated by the "Model calibration" appendix, which I find to be much more clearly written and with more directly relevant figure panels.

      In the revised manuscript, all panels of Figures 2 and 3 are now cited in the correct order, and their roles in the narrative have been clarified. Figure 2 is explicitly presented as a summary of the empirical kinematic analyses that motivate the structure of the locomotory model, while Figure 3 illustrates the corresponding model components. To avoid redundancy with the “Model calibration” appendix, we streamlined the main text and replaced duplicated descriptions with cross-references to the appendix, which contains the full methodological detail.

      The data describe larvae behaving with a range of parameters, presumably both as individuals and across time. However, the models described seem to employ a population of larvae that shares a common best-fit parameter and the equations presented in the methods are all ordinary differential equations without noise or stochasticity. Where is the inter-individual variation coming from?

      The reviewer is correct to point out the importance of variability. Our approach is agent-based, and we model populations of non-identical individuals rather than replicates of a single average larva. The simulated larvae retain variability across several parameters, capturing the combined range observed in the data. This was described in the original manuscript, and to avoid possible misunderstandings, we have now expanded the “Inter-individual variability” section in the Materials & Methods and, where appropriate, clarified this point elsewhere in the text.
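
      How a population of non-identical agents might be generated can be sketched as follows. The parameter names, the means, spreads, and bounds, and the choice of independent clipped Gaussian draws are all hypothetical; the actual sampling procedure is the one described in the “Inter-individual variability” section of the Materials & Methods.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlParams:
    """Hypothetical per-larva crawling parameters."""
    stride_freq_hz: float
    stride_length_bl: float  # displacement per stride, in body lengths
    body_length_mm: float


def clipped_gauss(rng, mean, sd, lo, hi):
    """Gaussian draw clipped to [lo, hi]."""
    return min(max(rng.gauss(mean, sd), lo), hi)


def sample_population(n, seed=0):
    """Draw n non-identical virtual larvae (placeholder distributions)."""
    rng = random.Random(seed)
    return [
        CrawlParams(
            stride_freq_hz=clipped_gauss(rng, 1.4, 0.2, 0.8, 2.0),
            stride_length_bl=clipped_gauss(rng, 0.22, 0.03, 0.10, 0.35),
            body_length_mm=clipped_gauss(rng, 4.0, 0.5, 2.5, 5.5),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    for larva in sample_population(3):
        print(larva)
```

      Drawing the parameters independently ignores any correlations between them in the data; sampling from a joint distribution would be the natural refinement.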

      The absolute orientation of trajectories in Figure 4A is not meaningful in your model. I suspect it would be more informative to show aligned trajectories in order to better visually assess the behavioral similarity. Also, the biological experiment needs to be described here. Time crawling seems to not be a great fit, although the peaks are fairly well aligned. Do you have thoughts on why this is?

      In Figure 4A, which is intended as a visual comparison between experimental and simulated trajectories, the experimental tracks were transposed so that all starting points coincide at the center of the arena. As the reviewer notes, they were not rotated to a common axis, since our subsequent analysis focuses on spatial dispersal rather than directional alignment. The description of the experimental dataset has been clarified in the revised text.

      The reviewer is also correct that the distribution of time spent crawling is narrower in the simulations than in the experimental data. This reflects the fact that in the present study only three crawling-related parameters were sampled to generate inter-individual variability, and time spent crawling was not among them. We deliberately chose to assess how well the model reproduces distributions for behavioral metrics that were not explicitly fitted or parameterized. This point has now been made explicit in the revised manuscript.

      How did you assess the agreement of chemotaxis results with Gómez-Marín et al.? It would be useful for the comparison to be made explicit within this paper, as well. How were the chemotaxis parameters fit?

      The agreement between experimental and simulated chemotaxis was assessed only qualitatively, as we did not perform quantitative locomotor analyses on chemotaxis datasets. For these simulations we used the same motor layer, including all its modules, as calibrated in the free-exploration condition (Fig. 4). The only additional adjustment was a single weighting parameter that translates the appetitive or aversive valence of odor sources into modulatory input for the bending module. This parameter was tuned manually using a visual criterion of performance, to ensure that both attractive and aversive chemotaxis were observable. We now make explicit in the text that for more complex simulations we retain the calibration obtained in simpler conditions and build upon it, rather than re-optimizing the model. Moreover, we now refer to the exact figure numbers in Gómez-Marín et al. (2011) to allow direct visual comparison, including of the perceived concentration metrics in our Figure 5E and F, where experimental and simulated data show a very good correspondence.

      Similarly, what are the key parameters for the mushroom body model and how did you fit their relationship to behavior? Was there actually feedback between the behavior of the larva and the training or was the SNN only used to generate the odor gain constant?

      The reviewer is correct to highlight this point. In the present study the mushroom body model was simulated independently to generate the odor-specific behavioral bias. This output was then translated into an odor gain constant, which served as input for the subsequent behavioral simulations of odor preference. There was no closed-loop interaction between the larval behavior and the training of the spiking network in this version. Establishing such a closed-loop connection is part of our future goals.

      It is unclear where feeding (as introduced in Figure 1) entered into the work presented, if at all.

      The reviewer is correct that the feeding module does not play a role in the present study. It was included in the behavioral architecture for completeness and because it is already implemented in the larvaworld package (see Sakagiannis et al., 2024). We have clarified this in the revised text.

      "During pauses, the input to the crawler module I_c = 0 and therefore forward..." The equations presented for the crawler module do not contain I_c.

      The inconsistency regarding the crawler module input has also been corrected. The equations now explicitly include the tonic input parameter, making them consistent with the descriptive text and our model implementation.

      Larvae do more than crawl forward: they can also hunch up, head cast with their head in the air, dig, crawl backward, roll, and perform other behaviors. Because the individual modules in this framework have been defined as coupled oscillators, how would you decide to implement such aspects? At what point does the oscillator approach break down? In this model, how does the larva decide whether to bend left or right, and how is that affected by the environment or internal state? Can a larva bend in the same direction twice in a row?

      The intermittent coupled-oscillator model presented here does not attempt to cover the full larval repertoire, such as hunching, digging, backward crawling, or rolling. Nor does it explicitly implement handedness as a directional bias. Nevertheless, the framework already allows for sequences of repeated turns: from a stationary position a larva can execute successive bends of varying amplitude, which may occur in the same direction, mimicking repeated head casts to one side.

      Extending the model to include additional locomotor primitives would require the development of new modules, which could expand the basic locomotor layer either alongside or in place of the current lateral oscillator module. As noted in the manuscript, the modules implemented here are not intended as definitive but as placeholders that demonstrate how the architecture can integrate more elaborate models in the future. In this context, future directions include introducing handedness as part of inter-individual variability and enriching the behavioral repertoire with additional modules to capture the broader range of larval actions.

      I was not able to install `larvaworld` either through pip in a fresh environment on OS X 15 and various Python versions between 3.8 and 3.12. I ran into a range of issues, from `tables` (which is understandable) to issues installing the old NumPy in Python 3.12 where `setuptools` is no longer included. The packaging should be made more robust, or the working environment could be better defined. For example, the version pinning of dependencies seems much more strict than I would expect for a user-focused Python library, particularly with out-of-date versions of core tools like NumPy.

      We thank the reviewer for going to such lengths to test the implementation and for pointing out these issues. We have recently updated the package (version 2.0.1, November 2025) to improve installation robustness, relaxed unnecessary dependency pinning, and provided an environment specification to facilitate reproducibility. The revised manuscript directs users to the updated installation instructions.

      Automated testing for Python versions 3.10 and 3.11 on macOS, Windows, and Ubuntu is already implemented. Unfortunately, we have not yet tested the package on OS X 15. Please post any issues on the larvaworld GitHub page: https://github.com/nawrotlab/larvaworld.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that enables identity to be abstracted away from variations in face appearance. However, it did not involve the notion that such an ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins and Schweinberger 2011, Jenkins, White et al. 2011, Mike Burton 2013, Burton, Kramer et al. 2016, Menon, Kemp and White 2018).

      We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
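      For readers who want to see the arithmetic behind the quoted noise step, the following is a minimal sketch and not the authors' code. It assumes Gaussian pixel noise, images flattened to lists of grey levels, and a zero-lag Pearson correlation as the match score; the function names and the random stand-in "face" are hypothetical. Only the RMS contrasts (.01 and .08), the resulting SNR of .125, and the ten noise iterations come from the text above.

```python
import math
import random


def rms_contrast(pixels):
    """RMS contrast, taken here as the standard deviation of pixel values."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def set_rms_contrast(pixels, target_rms):
    """Rescale deviations from the mean so that RMS contrast equals target_rms."""
    mean = sum(pixels) / len(pixels)
    scale = target_rms / rms_contrast(pixels)
    return [mean + (p - mean) * scale for p in pixels]


def add_noise(face, face_rms=0.01, noise_rms=0.08, rng=random):
    """Combine a face with a fresh noise pattern; SNR = face_rms / noise_rms."""
    face = set_rms_contrast(face, face_rms)
    # Assumption: Gaussian pixel noise; the source does not state the noise type.
    noise = set_rms_contrast([rng.gauss(0.0, 1.0) for _ in face], noise_rms)
    noise_mean = sum(noise) / len(noise)
    return [f + (n - noise_mean) for f, n in zip(face, noise)]


def correlation(a, b):
    """Pearson correlation between two images, used here as the match score."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm


rng = random.Random(0)
face = [rng.random() for _ in range(4096)]  # hypothetical stand-in for a face image
snr = 0.01 / 0.08                           # the .125 quoted in the Methods text
# Each target-probe pairing is repeated ten times with distinct noise patterns.
scores = [correlation(add_noise(face, rng=rng), add_noise(face, rng=rng))
          for _ in range(10)]
```

      Without noise, a target correlated with an identical probe gives exactly 1, which is the ceiling the authors describe. With independent noise on both images the score falls well below 1, so model performance becomes graded.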

      We also added a supplementary file with a graphical illustration of the d’ distributions of each model and of the human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than view-selective model and human sensitivity (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’
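      As a reminder of how the sensitivity index in an old/new task is usually obtained, here is a minimal sketch of the standard signal-detection formula d’ = z(hit rate) − z(false-alarm rate). The log-linear correction and the function name are assumptions made for illustration; the response does not state which correction, if any, the authors applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Assumption: log-linear correction, which keeps both rates away from 0 and 1
    # so that the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


chance = d_prime(25, 25, 25, 25)   # equal hit and false-alarm rates
good = d_prime(40, 10, 10, 40)     # more hits than false alarms
ceiling = d_prime(50, 0, 0, 50)    # perfect performance stays finite
```

      Here a "hit" is an "old" response to a learned face and a "false alarm" is an "old" response to a novel face.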

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

      Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under a fixed illumination. Ears and a small portion of the neck were visible while the hair region was removed. All face images had a normalized skin color and we further converted them to grayscale.

      While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, hair style is tuned to the horizontal range of the face stimulus (Dakin and Watt 2009, Dumont, Roux-Sibilon and Goffaux 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with the use of a more ecological stimulus set.

      Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell, Sinha et al. 2006) and hue (color) is only one among others. The grayscaled 3D laser scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the role of shape and surface cues to identity recognition (e.g., Troje and Bulthoff 1996, Vuong, Peissig et al. 2005, Russell, Sinha et al. 2006, Russell, Biederman et al. 2007, Jiang, Dricot et al. 2009). Moreover, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon and Goffaux 2024).

      For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.

      Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’

      We also added a supplementary file with a graphical illustration of the d’ distributions of each model and of the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations as natural image energy is disproportionately distributed across orientations (e.g., Hansen, Essock et al. 2003). Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Keil 2008, Keil 2009, Goffaux and Greenwood 2016). If not normalized after orientation filtering, such uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was performed across our experimental conditions merely to avoid energy from explaining the influence of viewpoint on the orientation tuning profile.

      We were not aware of any systematic natural variations of energy across face views. To address this, we measured face average energy (i.e., RMS contrast) in the original stimulus set, i.e., before the application of any image processing or manipulation. Background pixels were excluded from these image analyses. Across yaws, we found energy to range between .11 and .14 on a 0 to 1 grayscale. This is moderate compared to the range of energy variations we measured across identities (from .08 to .18). This suggests that variations in energy across viewpoints are moderate compared to variations related to identity. It is unclear whether these observations are specific to our stimulus set or whether they are generalizable to faces we encounter in everyday life. They, however, indicate that RMS contrast did not substantially vary across views in the present study and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.

      In the revised methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’.
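      The normalization quoted above can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' pipeline: images are flat lists of grey levels on a 0 to 1 scale, the face mask and the toy image are hypothetical, and the clipped fraction is counted over all image pixels. The target values (luminance 0.55, RMS contrast 0.15, background 0.55) and the 3% clipping bound are taken from the quoted Methods text.

```python
import math


def normalize_face(image, face_mask, mean_lum=0.55, rms=0.15, background=0.55):
    """Fix luminance and RMS contrast of face pixels and flatten the background.

    image and face_mask are flat lists of equal length; face_mask[i] is True
    for face pixels. Returns the normalized image and the fraction of image
    pixels whose value had to be clipped to the 0-1 range.
    """
    face = [p for p, m in zip(image, face_mask) if m]
    mu = sum(face) / len(face)
    sd = math.sqrt(sum((p - mu) ** 2 for p in face) / len(face))
    out, clipped = [], 0
    for p, m in zip(image, face_mask):
        if not m:
            out.append(background)  # background set to a uniform grey
            continue
        value = mean_lum + (p - mu) / sd * rms
        if value < 0.0 or value > 1.0:
            clipped += 1
        out.append(min(1.0, max(0.0, value)))
    return out, clipped / len(image)


# Toy image: 100 distinct grey levels; the first 20 pixels play the background.
image = [(i * 37 % 100) / 100 for i in range(100)]
mask = [i >= 20 for i in range(100)]
normalized, clipped_fraction = normalize_face(image, mask)
face_out = [p for p, m in zip(normalized, mask) if m]
face_mean = sum(face_out) / len(face_out)
face_rms = math.sqrt(sum((p - face_mean) ** 2 for p in face_out) / len(face_out))
```

      Because every image ends up with the same face luminance and RMS contrast, differences in raw energy across orientations or viewpoints cannot by themselves drive differences in recognition performance.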

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Indeed, human performance data indicate that while identity recognition remains tuned to horizontal information, the horizontal tuning peak shows some variation across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but also discussed the second aspect: with yaw rotation, certain non-horizontal morphological features such as the jaw line or nose bridge may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil 2008, Keil 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in face feature appearance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.

      Reviewer #1 (Recommendations for the authors):

      I generally found the paper to be very well-written, so I have only a few minor comments here.

      (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.

      Employing a simpler linear framework, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance but its interpretation would be limited. We would either have to rely on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or to use a polynomial contrast approach (testing the linear, quadratic, … up to the 7th order trends), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but without offering meaningful insight.

      In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.
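      The four parameters described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' fitting procedure: the orientation values, the synthetic data and both function names are hypothetical, orientation is treated as a linear rather than circular variable, and a coarse grid search stands in for whatever nonlinear optimizer was actually used.

```python
import math


def gaussian_tuning(theta, peak_loc, peak_amp, sd, base_amp):
    """Sensitivity predicted at orientation theta (degrees)."""
    return base_amp + peak_amp * math.exp(-((theta - peak_loc) ** 2) / (2 * sd ** 2))


def fit_gaussian(thetas, dprimes):
    """Least-squares fit by grid search over peak_loc and sd.

    For fixed peak_loc and sd the model is linear in base_amp and peak_amp,
    so those two are solved in closed form (simple linear regression).
    """
    n = len(thetas)
    d_mean = sum(dprimes) / n
    best = None
    for peak_loc in range(0, 181):                  # 1-degree steps
        for sd in [s / 2 for s in range(10, 121)]:  # 5 to 60 degrees
            g = [math.exp(-((t - peak_loc) ** 2) / (2 * sd ** 2)) for t in thetas]
            g_mean = sum(g) / n
            var_g = sum((x - g_mean) ** 2 for x in g)
            if var_g == 0:
                continue
            peak_amp = sum((x - g_mean) * (y - d_mean)
                           for x, y in zip(g, dprimes)) / var_g
            base_amp = d_mean - peak_amp * g_mean
            sse = sum((base_amp + peak_amp * x - y) ** 2
                      for x, y in zip(g, dprimes))
            if best is None or sse < best[0]:
                best = (sse, peak_loc, peak_amp, sd, base_amp)
    _, peak_loc, peak_amp, sd, base_amp = best
    return {"peak_loc": peak_loc, "peak_amp": peak_amp, "sd": sd, "base_amp": base_amp}


# Hypothetical filter orientations (90 = horizontal) and made-up parameters.
thetas = [0, 22.5, 45, 67.5, 90, 112.5, 135, 157.5]
synthetic = [gaussian_tuning(t, 90, 1.2, 25, 0.4) for t in thetas]
fit = fit_gaussian(thetas, synthetic)
```

      Each fitted value maps onto one of the interpretations given above: `peak_loc` is the preferred orientation, `sd` the selectivity, `base_amp` the minimum sensitivity, and `peak_amp` the advantage of the preferred orientation over that baseline.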

      Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin and Watt 2009, Goffaux and Greenwood 2016). Orientation selectivity at primary stages of visual processing has also been modelled using Gaussian (or Difference of Gaussians; Ringach, Hawken and Shapley 2003).

      We revised the data analysis section to include a justification for our use of a Gaussian model: ‘Therefore, fitting the human sensitivity data using a simple Gaussian model seemed most appropriate, as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze and interpret.’

      (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.

      We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.

      (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!

      The procedure section now starts with the description of the familiarization task.

      (4) p. 3 - "Culminates" doesn't seem like the right word here.

      We agree and rephrased this way: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.

      (5) p. 5 - I think "with the multiple" shouldn't have "the".

      Indeed, we removed the “the”.

      Reviewer #2 (Recommendations for the authors):

      I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity. I think this might have been more relevant if this had been a manipulation in the design. So, I wonder if this might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.

      While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction. We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.

      References.

      Burton, A. M., R. Jenkins and S. R. Schweinberger (2011). "Mental representations of familiar faces." Br J Psychol 102(4): 943-958.

      Burton, A. M., R. S. Kramer, K. L. Ritchie and R. Jenkins (2016). "Identity From Variation: Representations of Faces Derived From Multiple Instances." Cogn Sci 40(1): 202-223.

      Dakin, S. C. and R. J. Watt (2009). "Biological "bar codes" in human faces." J Vis 9(4): 2, 1-10.

      Dumont, H., A. Roux-Sibilon and V. Goffaux (2024). "Horizontal face information is the main gateway to the shape and surface cues to familiar face identity." PLoS One 19(10): e0311225.

      Goffaux, V. and J. A. Greenwood (2016). "The orientation selectivity of face identification." Sci Rep 6: 34204.

      Hansen, B. C., E. A. Essock, Y. Zheng and J. K. DeFord (2003). "Perceptual anisotropies in visual processing and their relation to natural image statistics." Network 14(3): 501-526.

      Jenkins, R., D. White, X. Van Montfort and A. Mike Burton (2011). "Variability in photos of the same face." Cognition 121(3): 313-323.

      Jiang, F., L. Dricot, V. Blanz, R. Goebel and B. Rossion (2009). "Neural correlates of shape and surface reflectance information in individual faces." Neuroscience 163(4): 1078-1091.

      Keil, M. S. (2008). "Does face image statistics predict a preferred spatial frequency for human face processing?" Proc Biol Sci 275(1647): 2095-2100.

      Keil, M. S. (2009). ""I look in your eyes, honey": internal face features induce spatial frequency preference for human face processing." PLoS Comput Biol 5(3): e1000329.

      Menon, N., R. I. Kemp and D. White (2018). "More than a sum of parts: robust face recognition by integrating variation." R Soc Open Sci 5(5): 172381.

      Mike Burton, A. (2013). "Why has research in face recognition progressed so slowly? The importance of variability." Q J Exp Psychol (Hove) 66(8): 1467-1485.

      Ringach, D. L., M. J. Hawken and R. Shapley (2003). "Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression." J Neurophysiol 90(1): 342-352.

      Russell, R., I. Biederman, M. Nederhouser and P. Sinha (2007). "The utility of surface reflectance for the recognition of upright and inverted faces." Vision Res 47(2): 157-165.

      Russell, R., P. Sinha, I. Biederman and M. Nederhouser (2006). "Is pigmentation important for face recognition? Evidence from contrast negation." Perception 35(6): 749-759.

      Troje, N. F. and H. H. Bulthoff (1996). "Face recognition under varying poses: the role of texture and shape." Vision Res 36(12): 1761-1771.

      Vuong, Q. C., J. J. Peissig, M. C. Harrison and M. J. Tarr (2005). "The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles." Vision Res 45(10): 1213-1223.

    1. Author response:

      Thank you for the valuable feedback. We will be updating the manuscript to incorporate the reviewers' terrific suggestions. We specifically have:

      • Reduced redundancy and streamlined overlapping sections (especially around research alignment, protected time, and clinical demands)

      • Made the core decision-making framework more explicit and easier to extract (in a new Table 1, with clearer synthesis in the text)

      • Strengthened the emphasis on institutional/program context as a key determinant of success—arguably as important as specialty choice

      • Added more actionable guidance for trainees on how to evaluate departments (e.g., NIH Reporter, T32 presence, R01 density, K→R track record)

      • Included a slightly more explicit statement acknowledging that while all specialties can support physician-scientist careers, the structural ease varies and may require different levels of negotiation/support

      We did not address the broader workforce/job market question, since it feels outside the scope.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review): 

      Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for reviewing our manuscript and for the important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. Various studies have found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Fig. 2D), and to the reviewer’s point, we identified enrichment of inflammatory signaling responses following the hypoxic exposure. Since hSO at the time of analyses do contain some astrocytes, we think these contribute to the observed pro-inflammatory changes and emphasize the feasibility of capturing this response in organoids in vitro. This is also important because ADM is known to have anti-inflammatory properties and should be investigated as such in future studies focused on hypoxia-induced inflammation.

      In the manuscript, we included a few sentences in the discussion to address the lack of in-depth analyses of inflammation as a limitation of our study.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, and increased expression of RAMP2 receptors on neurons, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the current experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about possible different mechanisms in ventral versus dorsal areas, in the revision we plotted the cell-type expression of ADM and its receptors in hCOs and included these data in the figures (Fig. S3).

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. The data from our study provide the first information about the cell-type-specific expression of ADM in baseline and hypoxic conditions, which is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest that ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might always be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We revised the manuscript to discuss previous studies showing that failure of interneurons to migrate and reach their designated targets within the appropriate developmental window leads to their elimination through apoptosis. Decreased numbers (or abnormal development) of interneurons are associated with neurodevelopmental impairments and abnormal functional connectivity in the brain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the manuscript we used the single-cell RNA sequencing data and immunostainings to provide this information. As expected based on our previous reports, most cortical interneurons present in organoids express calretinin (CALB2), somatostatin (SST) or calbindin (CALB1). These data are now presented in Fig. S3.

      Separately, we used available scRNA-seq data from developing human brain and showed that at ~20 PCW, the developing human brain has similar types of cortical interneurons. These data are now included in Fig. S5.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we validated several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Fig. S1).

      We agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hSOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and the opportunity to clarify. Because we manually separated the assembloids before the analyses, the hSO and hCO samples were processed separately, which is why they are presented separately. In the revision, we added data about the expression of ADM and its receptors in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we agree that it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g., cellular complexity, vasculature, immune response, etc.), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We expanded our discussion to include more details and the need to validate these findings using in vivo models.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point; it would require analysis at later stages of in vitro assembloid maturation.

      Our speculation about a possible maturation defect is based on data from previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window will impair their integration within the neuronal network, and thus lead to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we speculate these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time, and thus this hypothesis remains speculative and only included in the discussion.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other interneuron developmental processes during cortical development. In the manuscript, we included text in the discussion about the likely effects of hypoxia on interneuron proliferation, maturation and circuit integration.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we corrected it in our revision.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness. 

      In the revised version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation. The key additional experiment is the phenotype of reduced bacterial uptake in low serum, but not in high serum conditions. The authors suggest this could be due to the SM from serum itself affecting the entry. While this explanation is plausible, prolonged exposure of cells to low serum is well documented to alter several cellular functions, particularly in the context of this manuscript, lysosomal positioning, exocytosis and Ca2+ signaling. A better control here could be WT cells grown in low serum.

      As the reviewer suggested, we did culture both WT control cells and ASM knock-outs under low serum conditions before conducting the invasion assays. Hence, the detected effects on S. aureus invasion must be caused by the lack of functional ASM in the mutant.

      We apologize that this did not become evident from the manuscript’s text. We thus included a change in line 259 which now reads:

      “To test whether FBS confounded our invasion experiments, we cultivated WT as well as ASM K.O. cells in medium with reduced FBS concentration (1%) and determined the S. aureus invasion efficiency (Figure 2I).”

      If SM in serum can interfere, why do they see such pronounced phenotype on bacterial entry in WT cells upon chemical inhibition?

      We explain the differences between inhibitor-treated WT cells and ASM K.O.s by the severe accumulation of SM upon genetic ablation of ASM. We demonstrated this by HPLC-MS/MS measurements in Figure 2L. When cells were cultured in 10% FBS, an ASM K.O. resulted in approximately 4-fold higher levels of cellular SM C18:0 compared to WT cells, while amitriptyline treatment of WT cells had no effect, and ARC39 treatment increased SM C18:0 levels only 2-fold. This likely reflects the different durations of SM accumulation in the cell pools: SM accumulates permanently in the ASM K.O., where ASM is completely absent, but only over a range of hours upon treatment with the inhibitors.

      Under low serum conditions, the severe SM C18:0 accumulation in the ASM K.O. was decreased (from 4-fold to 2-fold when compared to WT cells; Figure 2M). Here, the WT cells used as reference were cultured in the same manner as the ASM K.O. A similar pattern was observed for other SM species (Supp. Figure 3). This correlates with the S. aureus invasion phenotype in the ASM K.O.: under high serum conditions (resulting in severe SM accumulation) we did not detect an invasion defect, while under low serum conditions (resulting in only moderate SM accumulation) S. aureus invasion was reduced in the knock-outs compared to WT cells cultured in the same conditions.

      While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      Since the comments starting with the line above are identical to the reviewer’s previous comments, we assume that these points of criticism still stand for the reviewer. We had previously agreed with the reviewer that we do not show formation of ceramide-enriched platforms, and we had already changed the manuscript accordingly in the previous revision round (see also our comment below).

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We continue to share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but the simultaneous occurrence of multiple pathways of bacterial uptake is currently the limiting factor and technological challenge in our laboratory, since these events happen rapidly. We do hope that we or others will be able to address these limitations in the future, for instance with the technologies suggested by the reviewer.

      I acknowledge the author's argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The author's argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We here want to elaborate again, since our revision experiments demonstrate the ASM-dependency of the rapid uptake under low serum conditions (see also above). We were convinced that the genetic evidence of an S. aureus invasion phenotype in ASM K.O.s under these conditions would eliminate the reviewer’s concern about the role of ASM during the bacterial invasion. Our lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2, M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype that we observed.

      We agree with the reviewer, however, that it remains elusive why changes in the sphingolipidome increase ASM-independent S. aureus internalization by host cells. One explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus in certain cell types (3, 4). In other cell types, such as A549 cells, S. aureus invades in an α-toxin- and caveolin-1-dependent fashion (5). It will be interesting to study to what extent such processes as described by Goldmann and colleagues depend on ASM. However, a characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      As to the centrality of the pathway: we cannot and do not make any assumptions on the centrality of the pathway and its importance in vivo. As scientists we were intrigued by our finding of an ASM-dependent uptake pathway for S. aureus, especially its speed. In other, as yet unidentified, host cell types or cell lines, such a pathway may pose a major entry point for pathogens. Alternatively, we may have identified an ASM-dependent mode of receptor uptake, with which the bacteria “piggyback” into the cells.

      The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We again want to add that we measured phagosomal escape of S. aureus in WT and ASM K.O. cells cultured in 1% FBS (low serum conditions) and compared it to escape rates obtained with host cells cultured in 10% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (see Author response image 1). This was addressed already during the manuscript’s first revision. We found that escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium.

      Author response image 1.

      We therefore think that prolonged absence of ASM has additional side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      As it is currently unclear to what extent the prolonged absence of ASM activity affects cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus, predominantly via the “rapid” ASM-dependent pathway) possess lower phagosomal escape rates than bacteria that entered host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, while sphingomyelin levels in the phagosomal membrane did not change phagosomal escape (Figure 5A and B). This is summarized in Figure 5F.

      Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment)?

      Knock-downs in our laboratory are based on the vector pLVTHM (6). Inducible knock-downs in the cells would require the introduction of an inducible Tet<sup>on</sup> system, which the cells currently do not harbor.

      However, it needs to be stated that for optimal gene knock-down, this system has to be induced by doxycycline supplementation of the medium for 7 days. During these several days of growth, the cells can adapt their lipid metabolism, which would reflect the situation that we encounter for the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (7). However, the course of infection in macrophages differs from that in non-professional phagocytes (8). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM therefore may influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms (see also above). We thus already had changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also had added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) Reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) Increased invasion in TPC1 and Syt7 K.O. (Figure 2, P) in presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we had added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (9).”

      The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensor such as GCaMP3-ML1 might provide better ways to directly visualize this during inhibitor treatment, or S. aureus infection.

      We again thank the reviewer for this suggestion. We already had included the following section in our discussion (then: line 593): “Since fluorescent calcium reporters allow to monitor this process microscopically, future experiments may visualize this process in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References for the purpose of this response letter:

      (1) Rappaport, J., C. Garnacho, and S. Muro, Clathrin-mediated endocytosis is impaired in type AB Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm, 2014. 11(8): p. 2887-95.

      (2) Rappaport, J., et al., Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm, 2015. 12(5): p. 1366-76.

      (3) Hoffmann, C., et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci, 2010. 123(Pt 24): p. 4280-91.

      (4) Tricou, L.-P., et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports, 2024. 14(1): p. 28643.

      (5) Goldmann, O., et al., Alpha-hemolysin promotes internalization of Staphylococcus aureus into human lung epithelial cells via caveolin-1- and cholesterol-rich lipid rafts. Cell Mol Life Sci, 2024. 81(1): p. 435.

      (6) Wiznerowicz, M. and D. Trono, Conditional suppression of cellular genes: lentivirus vector-mediated drug-inducible RNA interference. J Virol, 2003. 77(16): p. 8957-61.

      (7) Li, C., et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal, 2018. 28(10): p. 916-934.

      (8) Moldovan, A. and M.J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol, 2019. 21(3): p. e12997.

      (9) Rühling, M., et al., Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio, 2025. 0(0): p. e03654-24.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The study does not explore or discuss how oral ingestion of Nora virus leads to the colonization of stem cells, which are located basally in the gut. This mechanism should be discussed.

      We have added an additional paragraph (4th) in the Discussion dealing with this issue and are further discussing the consequences of RNAi potentially not being functional in progenitor cells in the paragraph on antiviral responses.

      (2) The authors fail to detect Dicer-GFP fusion protein expression in stem cells, a finding that could explain why the virus persists in these cells. Further investigation is needed to determine whether RNAi functions are effective in stem cells compared to enterocytes. For clarification, the authors could cross esg-Gal4 UAS-GFP and Myo-Gal4 UAS-GFP with UAS GFP-RNAi and/or express a Dicer-GFP construct under a stem cell-specific driver.

      Actually, it is well known in the Drosophila literature on the intestinal epithelium that RNAi functions well in progenitor cells, as the technique has been widely used in dozens of articles to understand the control of stem cell division and differentiation. We provide here just a few examples: Jiang et al., Nat Commun (2025) https://doi.org/10.1038/s41467-024-55255-1; Zhai et al., PLoS Genetics (2017) https://doi.org/10.1371/journal.pgen.1006854; Wu et al., https://doi.org/10.1371/journal.pgen.1009649.

      (3) The presentation of experimental parameters (e.g., pathogen type, temperature, time points) should be improved in the results section and at the top of the figures to enhance clarity. Additionally, details regarding the mode of oral infection (continuous exposure vs. single feeding on a filter) should be specified. Given that fly stock flipping frequency influences microbiota load (as noted in Broderick et al.), this should be reported, especially for lifespan studies.

      P. aeruginosa oral infection was always by continuous exposure, as detailed in the Materials and Methods section. Nora infection was done by exposure to the viral solution for 24 h, as also detailed in Materials and Methods. The flipping frequency had also been reported in that section.

      (4) To confirm that enterocyte colonization requires stem cell proliferation and differentiation, the authors should analyze Nora virus localization in JAK-STAT-deficient flies infected with bacteria or toxicants. This would help determine whether the virus can infect enterocytes in the absence of enterocyte differentiation, but stimulation of stem cells.

      We now provide these data (pictures and quantification) in Fig.7 G-H and discuss them in the main text.

      (5) The study does not discuss the spatial distribution of Nora virus infection along the gut. Specifically, it remains unclear whether viral colonization is higher in gut regions R2 and R3, which contain proliferative stem cells. Addressing this could provide valuable insights into the virus's infection dynamics.

      We have now specified that Nora virus was detected only in the posterior midgut; we are now also providing a schematic illustration in Fig. S5J.

      Recommendations for the authors:

      Major Suggestion

      See weaknesses section for key areas requiring improvement.

      Minor Suggestions

      (1) Line 79: Mention Nox in the text. Key references on Nox include Jones (2013), Iatsenko (2018), and Patel (2016).

      Done.

      (2) Line 92: The long list of publications is unnecessary and can be shortened.

      We are not sure that many investigators are aware of the scope of our studies on host-pathogen relationships and this is the adequate place for a reminder.

      (3) Line 196: Cite Choi et al. (Aging Cell, 2008; 7:318-334. doi: 10.1111/j.1474-9726.2008.00380.x) for the initial work on gut dysplasia during aging. However, note that dysbiosis in aging is demonstrated in Buchon et al. (2009, Genes and Development) and other studies.

      Done.

      (4) Line 265: It would be interesting to clarify whether the shortened lifespan of Nora-infected flies after a clean injury is dependent on the microbiota.

      The shortened life span of Nora-infected flies is not due to the injury as demonstrated in Fig. S4F. Hence, the shortened lifespan is differentially affected by the microbiota according to nutrition conditions as documented in Fig. 3D-E.

      (5) Line 285: Clarify what is meant by "polyubiquitin promoter": do the authors mean a ubiquitous Gal4 driver? Specify the Gal4 lines used in the result section.

      Done. The construct is a direct fusion of the ubiquitin p63E promoter to the Dicer-fluorescent protein sequences as described in Girardi et al., Sci Rep, 2015.

      (6) Line 347: Indicate the references aligning with the most recent studies on this topic.

      Done.

      (7) Line 373 and elsewhere: Mention studies that have shown the microbiota influence on lifespan, in relation to dietary richness.

      Done.

      (8) Line 588: Provide details on the method used for hemolymph collection.

      Done.

      (9) Line 964: Clarify the phrase "as previously shown": where in this paper was it demonstrated?

      The legends have been rewritten and the phrase has been deleted.

      (10) Line 987: In "survival of non-infested with PA14," explicitly mention Nora to distinguish between different infections.

      Done.

      Figures & Experimental Details

      (11) Figures: Improve figure legends or add information at the top of figures, specifying:

      Number of flies used to monitor Nora virus titer.

      Temperature conditions.

      Age of flies used in experiments.

      Done.

      (12) Figure 2E: The lifespan of Nora-negative flies appears very short. Was this lifespan assay conducted at 29°C? What was the fly stock flipping rate?

      Correct, it was 29°C. As described in the Material and Methods section, the flies were flipped every two (29°C) to four days (25°C).

      (13) Figure 4C: Improve labeling on the plate for better clarity.

      Done.

      (14) Figure 6C: The figure legend on the right is difficult to interpret. Clarify what "+" indicates and explicitly write out the genotype. Is NP identical to NPG4G80?

      Done. NP is the NP1 driver. We usually use it in a version that also includes a Gal80<sup>ts</sup> transgene to express the gene of interest only at the adult stage.

      (15) Dissection Details: Clearly state which part of the gut was dissected: midgut, entire gut, ± Malpighian tubules. This should be specified in the results section.

      Done: neither Malpighian tubules nor crop were included for RT-qPCR analyses.

      (16) Clean Injury: Provide more details in the results section regarding the injury site and needle size.

      Done.

      (17) Use "Abx" instead of "AntiB," as the former is more commonly recognized.

      Done.

      Reviewer #2 (Public review):

      The title does not seem to be fully supported by the data. While the authors convincingly show the increased sensitivity to Pseudomonas infection, effects on another tested bacterium, Serratia marcescens, were not significantly different between Nora-virus-infected and noninfected flies. Thus, effects of 'intestinal infection' seem to be too broad a claim.

      We agree with the reviewer and have accordingly modified the title, which now explicitly refers to P. aeruginosa.

      Also, whether the Nora virus increases sensitivity to oxidative stress is not so clear to me: the figure that supports this claim is the survival assay of Figure 5F. However, the difference in survival between control and paraquat-treated Nora (-) flies seems to be in the same order as between control and paraquat-treated Nora (+) flies. Rather, cause and effect seem to be the reverse: paraquat increases ISC proliferation, higher viral loads, and consequently shorter survival. I suggest rephrasing the title and conclusions accordingly.

      While we usually just directly compare Nora (+) vs. Nora (-) flies under the same conditions, we note that the difference in survival between control and paraquat-treated Nora (-) flies is about 9 days based on LT50 values, whereas it is 8 days for Nora (+) flies. The difference is about two days when comparing Nora (+) to Nora (-) flies exposed to paraquat. Thus, Nora does contribute to an increased sensitivity to oxidative stress, likely through the process highlighted by the reviewer and also through its own detrimental action on the homeostasis of the intestinal epithelium and the associated disruption of its barrier function.
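      The LT50 comparison above can be illustrated with a minimal sketch. It assumes LT50 is estimated by linear interpolation between the two time points that bracket 50% survival; the function name `lt50` and the survival values are hypothetical and are not taken from the manuscript.

```python
def lt50(days, survival):
    """Estimate the time at which survival first drops to 50%.

    `days` must be increasing; `survival` is the fraction of flies alive
    at each time point. Linear interpolation is an assumption made for
    this sketch. Returns None if survival never reaches 50%.
    """
    if survival[0] <= 0.5:
        return days[0]
    for i in range(1, len(days)):
        s0, s1 = survival[i - 1], survival[i]
        if s0 > 0.5 >= s1:
            # Fraction of the interval elapsed when the curve crosses 50%.
            frac = (s0 - 0.5) / (s0 - s1)
            return days[i - 1] + frac * (days[i] - days[i - 1])
    return None


# Made-up survival curve, for illustration only.
days = [0, 2, 4, 6, 8]
survival = [1.0, 0.9, 0.6, 0.4, 0.1]
estimate = lt50(days, survival)
```

      Subtracting two such estimates (for example, control versus paraquat-treated flies of the same Nora status) gives the kind of difference in days discussed in the response.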

      Quantification of immunofluorescence microscopy is missing, rendering the images somewhat anecdotal. Quantification should be provided. It will then also be of interest to quantify the number of Nora (+) cells, and the Nora virus levels per infected cell (e.g. Figure 5H). Also, the claim that the Nora virus initially infects ISC and later (upon stress) infects enterocytes requires quantification.

      Missing quantifications of pictures have been added: Figs. S5E and 7H. We are not sure we understand the reviewer's comment on “Nora virus levels per infected cell”: the signal we are seeing may correspond to aggregates of the virus and would be impossible to quantify reliably, e.g., in the right-most panel of Fig. 5H. Fig. 5I clearly shows that no Nora is detected in enterocytes of young 5-day-old flies in the absence of infectious or xenobiotic challenge.

      Genetic support for the role of the JAK-STAT pathway in driving ISC proliferation and supporting Nora virus replication is convincing. It would also be of interest to analyze other pathways implicated in ISC proliferation (e.g. JNK, EGFR), especially given the observations of Nigg et al, showing an involvement of STING/NF-kB and EGFR pathway in driving intestinal phenotypes of Drosophila A virus-infected flies (doi: 10.1016/j.cub.2024.05.009).

      We agree with the reviewer that these would be interesting experiments to perform, especially in the light of one hypothesis that antiviral defenses may prevent the initial infection of enterocytes as discussed at length in our updated discussion on host antiviral defenses. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions. In this work, we used the interference with the JAK-STAT pathway as a second tool to block the division of ISCs.

      Figure 5E: An intriguing observation is that GFP:Dicer2 seems to be unstable in Nora virus-infected cells. Here, a GFP control driven by the same driver line would be required to confidently conclude that this is due to an effect on Dicer-2 specifically.

      Actually, this experiment was not performed using the Gal4-UAS system but a direct fusion. We do know that GFP is stable when expressed in enterocytes, e.g., Lee et al., Cell Host & Microbe (2016) DOI: 10.1016/j.chom.2016.10.010.

      Legends are mostly conclusive, and essential information about the experimental setup is missing in the captions of multiple figures, making the interpretation of the data difficult. See my private recommendations for suggestions to improve the data presentation.

      Done.

      Recommendations for the authors:

      Suggestions for the presentation of the data:

      (1) I found the names Ore-R(SC) and Ore-R(SM) for noninfected vs infected Ore-R flies not very intuitive. I suggest renaming them into something that makes the infection status clear.

      These notations refer to two distinct sub-strains that may reflect different origins, with some likely genetic drift accounting for the distinct properties of the two sub-strains. As ORE-R(SM) flies can have different infection statuses (infested, cleaned, re-infected), we fear that renaming them would not clarify the matter. Of note, ORE-R(SC) flies are refractory to Nora virus infection (Fig. S1I).

      (2) Please define the number of flies analyzed for survival assays in the legends.

      Done.

      (3) The authors provide conclusions in most of the figure legends, without providing an explanation of the experiment that was done. Conclusions should be used sparingly, if at all, in legends. Also, relevant information is often missing in the legends (time points after infection, Figure 2E food source, etc.). I suggest the authors carefully double-check their legends and rephrase the conclusive legends with descriptive ones.

      Done. The figure legends have been rewritten.

      (4) Several of the legends indicate that 'data represent the mean of biological triplicates' however some panels do not represent triplicates (e.g. Figure 1C-E). Please correct.

      Done.

      (5) Legends: which multiple comparison test was used for ANOVA?

      Done. Tukey’s post-hoc test was used for direct comparisons.

      (6) Line 888: black arrows are not shown in the figure.

      Corrected.

      (7) Figure 1F: legend on the figure seems incorrect (all are labeled Nora (+)); likewise for Figure 2C (all labeled Nora (-)).

      Corrected.

      (8) Materials and methods: please describe how the Nora virus antibody was raised (and specify on line 271 what viral protein is recognized).

      Done. As the whole virus was used for immunization, we cannot state which specific viral proteins are detected by the antibody.

      (9) Please define what is presented in the box plots (mean, range, whiskers, individual data points).

      Done.

      (10) Figure 4 and associated text (line 221): a brief explanation of the Smurf assay would be useful.

      Done.

      (11) Figure 4C: I did not find the picture of the agar plate informative, as similar information is conveyed in Figure 4D. Also, the labelling cannot be clearly read.

      Figure 4D provides a quantification of panel C. The readability has been improved.

      (12) Figure 4C: It is suggested that Nora-positive, smurf-negative flies were analyzed, but from Figure 4B it seems that these do not exist. Please explain.

      The data in Fig. 4B do not represent absolute numbers but percentages. Thus, at most 50% of the flies were Smurf-positive at the time of the assay, the rest being Smurf-negative yet Nora-positive.

      (13) The abbreviations PA14 and Db11 are used in several figures. I would suggest defining the abbreviation in the legend to facilitate interpretation.

      Done.

      (14) Figure 5A/5G: the Nora virus RNA levels in this figure are dramatically lower than the levels in other figure panels. Please check/correct.

      Done. The reviewer is indeed correct: we had forgotten to state that for these two panels, the loads are relative and not absolute as is the case in other panels. 5A: the load in whole flies was taken to be 1; 5G: the load in untreated Nora-positive flies was taken to be 1.

      (15) Figure 6A: the total number of ApopTag positive cells is reported. Were the same number of total cells analyzed? Please define.

      We have not counted all of the cells in each midgut but provide the number of ApopTag positive cells per midgut. We thus make the assumption that the overall number of midgut cells is not varying much from one midgut to the other. Visual inspection of DAPI-stained nuclei did not reveal any obvious change in the density of enterocyte nuclei as illustrated in Fig. S6 (we guess that everyone in the field is making the same assumption when counting mitotic ISCs with PHH3 staining).

      (16) Figure 6C: I find the shades of blue difficult to distinguish and suggest using other colors.

      Done.

      (17) There seems to be a large mismatch between the percentage of Nora virus-positive cells in Figures 5C, 6H and the images of Figures 5G and 5H. Why?

      We think there might be a mistake with the Figure numbers cited by the referee. We guess the point the referee was trying to raise is the difference of perceived Nora virus burden between Fig. 5H and Fig. 6G, a quite valid point. For Fig. 5H, we had measured the Nora-virus load by RTqPCR (Fig. 5G, relative burden) but had not quantified the images. This is now done and shown in Fig. 5I. In Fig. 5H, young flies were used and hence there was no Nora virus detected in ECs, as now quantified in Fig. 5I. For Fig. 6G, we had to use 30-day old intestines to be able to observe Nora virus in the enterocytes of the controls. We have now included this important point in the main text and in the Figure legends.

      (18) The Title of the legend in Figure 7 is not supported by the data as 'spread through the intestine' has not been analyzed. Please adjust.

      Done.

      (19) All figures in which ANOVA is used: I assume that anything not labeled with an asterisk was found to be non-significant? If so, this should be indicated in the manuscript.

      Actually, we have not highlighted obvious differences to maintain clarity (e.g., Fig. 1E between uncured Ore-R(SM) and cured Ore-R(SC)). We thus have underlined the biologically relevant differences in the panels. The interested reader can refer to the primary data that are accessible on a data repository.

      (20) Figure 7C: the authors may want to contrast their finding that Upd3 was not upregulated in Nora virus-infected flies (in the absence of PA14) with the findings of Kuyateh et al, who did report upregulation of Upd3 (https://doi.org/10.3390/v15091849).

      We thank the reviewer for pointing out this study we were unaware of. We would like to point out that this article is difficult to follow as it is not 100% clear in which of the analyzed studies the induction of upd3 was observed and which exact experimental conditions were followed, e.g., young or old flies, whole flies or gut… We have looked in more detail at ref. 133 of this article, which refers to an unpublished study from the Hultmark laboratory that is however available online: (https://www.diva-portal.org/smash/record.jsf?aq2=%5B%5B%5D%5D&c=15&af=%5B%5D&searchType=SIMPLE&sortOrder2=title_sort_asc&query=Nora+virus&language=en&pid=diva2%3A1045375&aq=%5B%5B%5D%5D&sf=all&aqe=%5B%5D&sortOrder=author_sort_asc&onlyFullText=false&noOfRows=50&dswid=4587).

      In that study, flies were “infected” with Nora virus by expressing a cDNA clone injected into embryos. The problem is that, for some unknown reason, the authors used Relish mutant flies. It is thus difficult to draw conclusions, as these flies are defective for the IMD and Sting pathways whereas our flies are wild-type. We were also interested to read that genes involved in midgut stem cell differentiation were expressed in flies harboring Nora virus, which is in keeping with the data of the present study. However, it is difficult to discuss this when we know little about the background of the studies analyzed by Kuyateh et al, inasmuch as our Discussion is already rather long.

      (21) Figure 7E: are the differences between control and Dome/Stat knockdown flies significantly different for Nora (+) flies (in the absence of Pseudomonas)? This is not clear from the data presentation.

      The answer to the question is positive: the JAK-STAT pathway also contributes to the maintenance of intestinal epithelium homeostasis in the absence of bacterial infection, that is, presumably under basal conditions. We have modified Fig. 7E to include more comparisons.

      Textual suggestions:

      (22) Line 25 strives > thrives

      Done.

      (23) Lines 150- 152, etc are not very informative. Also, some of the viruses analyzed are not "known contaminating viruses", but viruses used experimentally (VSV, IIV6, CrPV). I suggest adjusting the phrasing.

      Done.

      (24) Line 862: weaker fitness > lower fitness.

      Done.

      (25) Virology terms:

      (a) I suggest not using the term titer for qPCR readouts (which do not involve titration). Viral RNA level or viral RNA load would be more appropriate.

      Done.

      (b) I would propose rephrasing the Y-axis label of Figure 1C, E to Nora RNA load (same for other figures showing viral RNA).

      Done.

      (c) Infested: rather use the more accurate term infected.

      Done.

      (d) Contamination: rather use the term infection.

      We have modified some but not all occurrences of this word. We believe that it is important to use the word contamination when referring to enterocytes: the enterocytes are not infected by Nora; rather, differentiated infected ISCs become contaminated enterocytes. Infection refers to an active process whereas contamination refers to a state.

      (e) Proliferation: rather use the term replication.

      According to our US-English dictionary, proliferation refers to the “rapid reproduction of a cell, part, or organism”, which is the meaning we intend. Replication does not have this notion of speed of reproduction.

      (f) Drosophila should not be italicized in Drosophila A virus, following the ICTV convention that a "virus name should never be italicized, even when it includes the name of a host species or genus" https://ictv.global/faq/names.

      Done.

      (26) Line 873-975: please rephrase the legend of Figure 1F as the current one is not informative.

      Done.

      (27) Line 934: I suggest moving the justification of the time point chosen "= LT50 on the survival test in 935 Fig. 2E" to the main text.

      Done.

      (28) Line 936: with drop > with a drop.

      No longer relevant.

      (29) Line 940-941: the grammar of the sentence does not seem to be correct as it suggests that SDS induces Diptericin expression.

      No longer relevant.

      (30) Line 952-953; line 980: please correct mismatch singular/plural (antibody have, inhibition do).

      Done.

      (31) Line 422: "It will be interesting to determine whether the absence of a Dcr2 fluorescent proteins fusions in progenitor cells that we report in this study rules out a role for the RNAi pathway in intestinal host defense against the Nora virus". It would be of interest to discuss this finding in the context that virus-derived Nora virus siRNAs can be easily detected and that the viruses encode an RNAi antagonist (doi: 10.1371/journal.ppat.1002872).

      Done. We have updated the Discussion and propose a model whereby RNAi would prevent primary infection of enterocytes and then virus replication in proliferating progenitor cells would allow the virus to effectively inhibit the RNAi machinery when the infected progenitor cells become enterocytes.

      (32) Line 159: Nora virus phenotypes differ between laboratories. I would be interested to read the authors' speculations on why this would be the case.

      Our work shows that the effects of Nora virus depend significantly on several parameters we have identified: nutrition quality, age, exposure to abiotic or biotic stresses, and fly genotypes with the existence of Nora-refractory strains. These parameters as well as potential differences between laboratories are actually discussed in the second paragraph of the Discussion.

      (32) Line 175: capitalization of ORE-R vs Ore-R at other places in the manuscript.

      Done.

      (33) Line 185-194: PA14 and Pseudomonas are used interchangeably. Perhaps it is clearer to stick to a single term for consistency.

      PA14 is one clinical strain used to study P. aeruginosa. There are many others such as PAO1, which is also widely used. We have decided to write P. aeruginosa PA14 the first time we are using it in each figure legend, and use only PA14 afterwards.

      Reviewer #3 (Public review):

      The claim that Dcr2 is not abundant in ISCs because the protein is not stable is logically consistent and reasonable. Perhaps I missed this, but the authors could additionally knock down or use somatic CRISPR to delete Dcr2 in ISCs to test whether a lack of Dcr2 underlies sensitivity. In this experiment, the expectation would be that depleting Dcr2 in ISCs genetically would make little difference to susceptibility overall compared to controls. This is not an essential experiment request.

      We agree with the reviewer that these would be interesting experiments to perform. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions dealing with the specific steps of RNA interference that may be missing in progenitor cells.

      Recommendations for the authors:

      (1) Line 206-207 and 214-216: the order of ideas presented here is unintuitive. In Lines 206-207, it is said that ABX treatment had no effect, which is counterintuitive to the nature of infection susceptibility. But this is resolved in Lines 214-216 when the reader realizes that S3G is fed on a sucrose solution, and so likely microbiota-depleted. Perhaps more could be said to clarify this in the main text, and/or swap the order of these observations so a casual reader is not confused about the nature and extent of the microbiota contributing to the sensitivity of Nora-infected flies.

      As suggested by the reviewer, we have clarified the text with respect to the food source and microbiota load; we emphasize that the microbiota plays a protective role in Nora-negative flies fed on sucrose solution even though the microbiota load is very low under these conditions. Of note, the microbiota is not depleted in sucrose-fed Nora-positive flies: we suspect that delaminating enterocytes may actually provide directly or more likely indirectly (peritrophic matrix) nutrients for the microbiota.

      (2) Line 262-265: the text may be a bit exaggerated given only 3 pathogens tested, one of which was a fungal natural infection breaching the cuticle and largely bypassing the gut. This could be re-phrased.

      The important point is that Nora-positive flies die with an LT50 of about 10 days even when they are not infected with a pathogen; it has nothing to do with the number of pathogens tested. Thus, any infection that causes death with kinetics in this range may be misinterpreted in the absence of a relevant uninjured or clean injury control.

      (3) Line 379-382: I don't know if citing Schissel et al. is needed here. This paper's methods and data are highly problematic, as mentioned by the authors. This is not a highly cited paper, nor does it add value to the present discussion to cite it only to discredit it. Perhaps this can be left out and the field can move on quietly - naturally, this choice is the present authors', and this is just my view.

      We have actually cited this article at two other places and thus had not cited it “only to discredit it”. We have nevertheless removed the lines as suggested by the reviewer.

      (4) Line 404: perhaps clarify "Interestingly, mammalian stem cells..."

      Done.

      (5) Line 455: my understanding of digital PCR is that it is highly useful for detecting rare variants but not particularly better than qPCR for estimating loads/titres? This is not to say dPCR is worse, just that dPCR and primer-specific RT + qPCR are comparable if load/titre is desired. For instance, Qiagen actually recommends qPCR over dPCR specifically (and pretty much exclusively) for gene expression: https://www.qiagen.com/us/applications/digitalpcr/beginners/dpcr-vs-qpcr.

      (6) Perhaps Line 455 could drop the advocacy for digital PCR? I agree using dissected guts, or seemingly aged individuals per Figure 3B(?), is a valuable thing to point out. Maybe the aged individuals point could be added here? I guess the idea behind dissected guts is to have samples enriched in Nora virus.

      Cleaning Nora-positive strains is really difficult and we suspect that as long as there is one viral particle left, it may be sufficient to re-ignite the contamination of the strain. Our own experience with digital PCR on the expression of AMP-like molecules in the head of flies is that we found the approach to be more sensitive than classical RTqPCR (Xu et al., EMBO Rep, 2023).

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      We thank the editors and the reviewers for their constructive feedback in helping us strengthen this manuscript.

      During the revision process, new information was shared with us by the 10x Genomics team regarding the Xenium probe sequences evaluated in our original paper. Briefly, the Xenium probe sequences we evaluated represented an earlier iteration of the probes used to generate the data in Janesick et al. Further, we were made aware that the probe sequences used in Janesick et al. represented an earlier iteration of the commercially available Xenium v1 Human Breast Gene Expression Panel. We now elaborate further in a new Supplementary Note. We have therefore updated the paper throughout to reflect this new understanding, though we emphasize that our conclusions do not change. Rather, this newfound understanding provides stronger evidence of off-target probe binding with imperfect sequence matching, which we support with new supplementary figures.

      (1) Limited evaluation of tissues and gene panels

      “The results were only tested with one tissue (human breast). However, this is not a major weakness, as one can easily extrapolate that this should be the case for any other tissue.”

      “Does not apply the OPT method to the most widely used Xenium gene panels (e.g., pan-Human, pan-Mouse panels with ~5,000 genes each).”

      “The authors claim that OPT is a generalizable method for identifying off-target probes. To support this claim, they should provide similar predictions for the Xenium Pan-Human or Pan-Mouse gene panels, which are more widely used than the breast cancer panel.”

      “While I understand that conducting new experimental studies is likely beyond the authors' intended scope of the manuscript, the narrow reliance on Janesick et al. for all of the validation makes it difficult to assess the broad usability of OPT. In the absence of designing and then validating novel padlock probe designs with OPT, are there other publicly available datasets that authors could perform secondary analysis on using OPT?”

      Our primary focus on breast cancer was driven by data availability rather than tissue specificity. For this probe panel, matched Xenium, Visium, and scRNA-seq datasets are publicly available, enabling direct cross-platform comparisons of gene expression and allowing us to evaluate the impact of off-target probe binding in Xenium.

      OPT is tissue-agnostic and can be applied to any probe panel regardless of tissue type. To demonstrate this generalizability, we have now applied OPT on all publicly available 10x Genomics probe sets beyond the breast panel, including the Xenium pan-Human and pan-Mouse gene panels. The complete results of these analyses have been generated and are provided as a compressed zip file accompanying the revised manuscript.

      Beyond pre-designed panels, in this revision, we have now also applied OPT to custom Xenium gene panels from the Human BioMolecular Atlas Program (HUBMAP) and further demonstrate integration of HUBMAP RNA-seq data to evaluate the impact of potential predicted off-targets in a new section “Bulk RNA-seq reference atlases suggest off-target binding can variably impact results in Xenium custom probe panels.”

      Overall, in these newly evaluated panels, we identify many cases of off-target probe binding with non-negligible expression of off-target genes in the target tissue, underscoring that our findings are not specific to human breast tissue. Therefore, in the revision, we have broadened the title to “Evidence of off-target probe binding affecting 10x Genomics Xenium Gene Panels compromise accuracy of spatial transcriptomic profiling”

      (2) Limited quantifications

      “Lacks clarity on how the confidence level of off-target predictions is calculated.”

      “How can the confidence level of these off-target predictions be quantitatively assessed? Please provide benchmarks or validation metrics if available.”

      We thank the reviewer for raising this important point. To strengthen our claim that predicted off-targets can contribute to observed Xenium expression patterns, we incorporated a quantitative assessment in addition to the qualitative comparisons presented previously. Specifically, we leveraged Visium and scRNA-seq data to compare spot- and cluster-level expression of target genes alone versus expression aggregated with their predicted off-target genes. Across all examples shown, inclusion of predicted off-targets consistently resulted in stronger agreement with the Xenium results, as reflected by decreased RMSE and increased Pearson correlation relative to using the target gene alone.
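      The quantitative comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the per-spot values are made up, and the sketch assumes that Xenium and reference (Visium or scRNA-seq) expression values have already been normalized to comparable scales and that at least one off-target gene is supplied.

```python
import math


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_with_off_targets(xenium, target, off_targets):
    """Return (RMSE, Pearson r) for the target alone and for target + off-targets.

    `off_targets` is a non-empty list of per-spot expression vectors, one
    per predicted off-target gene.
    """
    aggregated = [t + sum(o) for t, o in zip(target, zip(*off_targets))]
    return {
        "target_only": (rmse(xenium, target), pearson(xenium, target)),
        "target_plus_off": (rmse(xenium, aggregated), pearson(xenium, aggregated)),
    }


# Made-up per-spot values, for illustration only.
xenium = [5.0, 9.0, 2.0, 7.0]
target = [4.0, 3.0, 2.0, 3.0]
off_targets = [[1.0, 6.0, 0.0, 4.0]]  # one predicted off-target gene
result = compare_with_off_targets(xenium, target, off_targets)
```

      In this toy case the aggregated profile reproduces the Xenium values exactly, so RMSE falls and the correlation rises relative to the target gene alone, which is the direction of change reported in the response.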

      We emphasize, however, that OPT does not assign a formal confidence score to off-target predictions based on sequencing data alone. Importantly, identification of a potential off-target by OPT does not imply that it will necessarily affect Xenium results. As we’ve noted, if the off-target gene is not expressed, then it will not affect the observed gene expression magnitudes of the target gene. To help users assess whether predicted off-target genes will affect observed gene expression magnitudes of the target gene for a tissue of interest, we now provide a complementary analysis, including heat-map visualizations comparing the expression of target genes and their predicted off-targets in matched bulk RNA-seq or scRNA-seq datasets from the same tissue (Supplementary Figures 9, 10, 11). We hope this evaluation pipeline will clarify for researchers how they can evaluate whether predicted off-targets will appreciably affect results in their tissue of interest.

      (3) Under-developed and non-essential software

      “The manuscript section on the software tool feels underdeveloped.”

      “Once the 10X Genomics corrects their gene panels according to this finding, the tool (OPT) will not be useful for most people. Still, it can be used by those who want to design de novo probes from scratch.”

      “Since the authors claim that OPT is intended for community use, the paper should provide a clear, step-by-step user guide, such as Jupyter tutorial, ideally as supplementary material.”

      We agree with the reviewers that the description of the software tool itself is relatively concise. This is intentional, as the primary goal of this manuscript is not to introduce a standalone software framework, but rather to use the tool as a means to characterize and quantify off-target probe binding and its potential downstream impact on spatial gene expression analyses. Accordingly, our emphasis is placed on the biological and analytical insights enabled by this approach, rather than on extensive software tool details. To support potential users, we have now included additional software documentation, with an example Python notebook demonstrating how the tool can be applied to any probe panel, in the GitHub repository: https://github.com/JEFworks-Lab/off-target-probe-tracker/blob/main/example.ipynb

      Likewise, the primary goal of this manuscript is not to suggest that a specific vendor’s probe panels are flawed, but rather to demonstrate that off-target probe binding is a general and underappreciated phenomenon that can occur in some probe-based spatial transcriptomics platforms to meaningfully impact downstream analyses and biological interpretation.

      OPT was developed as a framework to identify potential off-target probe interactions based on sequence homology. In practice, OPT can serve as a post hoc tool that allows researchers to assess whether predicted off-target interactions may exist in a given panel and to account for these possibilities when interpreting spatial expression patterns, even when panels have been developed by the many probe designing methods now highlighted in the revised manuscript. Given the complexity of probe design and hybridization behavior, we believe that explicitly identifying and reporting potential off-targets remains valuable for downstream data interpretation, cross platform comparisons, and reproducibility. Thus, OPT is intended to complement existing probe design strategies and vendor efforts, rather than replace them, by providing researchers with additional context to interpret their data more accurately.

      In our revision, we have therefore elaborated on this in the discussion, reiterated here for convenience: “Although we focus here on the 10x Genomics Xenium technology, we do not exclude the possibility that off-target binding may similarly affect other probe-based gene detection approaches from other commercial vendors. Any technology that relies on hybridization-based detection is inherently susceptible to off-target probe binding when sequence similarity exists. Further, hybridization-based detection often inherently involves a trade-off between sensitivity and specificity. Given these inherent technological limitations, we therefore emphasize the importance of transparency through sharing probe sequences. However, many companies do not release the probe sequences used in their assays, limiting the consumer’s ability to fully interpret their results as well as the community’s ability to effectively characterize and benchmark performance variation across platforms. Therefore, we strongly recommend that companies publish probe sequences for pre-designed panels and likewise that researchers using these technologies should obtain and publish probe sequences used in their studies to support transparent and reproducible science.”

      Recommendations for the authors:

      “The paper only describes evidence of the off-target effect based on perfect sequence homology, although the tool (OPT) provides an option to find additional "potential" off-targets that allow mismatches. It would be very nice if the authors could additionally provide at least one example of off-target binding with at least one mismatch.”

      We thank the reviewer for the opportunity to clarify this point. In addition to analyses based on perfect sequence homology, we examined predicted off-target binding when allowing mismatches at the terminal ends of probe sequences. This analysis is presented in the Results section titled “OPT results when allowing mismatches at the terminal ends of the probe sequences identifies additional off-target candidates.”

      In this revision, we now allow a 10bp padding on either end of the 40bp probe sequence, permitting imperfect sequence matching at the terminal regions. Under these conditions, OPT identified additional off-target candidates, including TUBB2B and ACTG2, which we highlight as representative examples (Supplementary Figures 7, 8). We further demonstrate how these predicted off-target interactions impact gene expression concordance by comparing Xenium measurements with both Visium and scRNA-seq data, showing measurable changes in cross-platform agreement. Together, these results illustrate that allowing mismatches reveals biologically relevant off-target effects beyond those captured by perfect sequence homology alone.
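      One possible reading of terminal-mismatch tolerance can be sketched as below. This is an assumption made for illustration and is not OPT's implementation: the sketch requires the central part of the probe to match exactly and lets the first and last `pad` bases mismatch. The function name `find_hits` and the toy sequences are hypothetical, and the toy probe is much shorter than a 40bp probe.

```python
def find_hits(probe, transcript, pad):
    """Return (start, terminal_mismatches) for each hit of `probe` in `transcript`.

    A hit requires an exact match of the probe core, defined here as the
    probe minus its first and last `pad` bases. Mismatches are only
    tolerated in those terminal bases and are counted for each hit.
    """
    if 2 * pad >= len(probe):
        raise ValueError("pad leaves no core to match")
    core = probe[pad:len(probe) - pad]
    hits = []
    for start in range(len(transcript) - len(probe) + 1):
        window = transcript[start:start + len(probe)]
        if window[pad:len(probe) - pad] != core:
            continue
        # The core is identical, so any mismatch counted here is terminal.
        mismatches = sum(a != b for a, b in zip(probe, window))
        hits.append((start, mismatches))
    return hits


# Toy sequences, for illustration only.
probe = "ACGTTGCA"
transcript = "GGTCGTTGCTAA"
hits = find_hits(probe, transcript, pad=2)            # tolerant search
exact_hits = find_hits(probe, transcript, pad=0)      # perfect homology only
```

      In this toy case the tolerant search reports a candidate that the exact search misses, which mirrors how relaxing the terminal regions can surface additional off-target candidates.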

      “Clarifications and updates for Figure 2A-B

      Xenium offers a resolution of up to 200 nanometers with continuous readout, without pixel gaps. However, the figures shown in Figure 2A-B appear pixelated - why is this the case? Could the authors clarify this discrepancy and, if possible, provide the raw feature intensity data for Xenium in the supplementary materials?

      Additionally, there appear to be no visible gaps in the Visium graphs. Could the authors update the figure panels to represent the true spot locations for Visium, to more accurately reflect the underlying data structure?”

      We thank the reviewer for the opportunity to clarify these points. The goal of Figure 2A-B is to facilitate a direct visual comparison of gene expression patterns between the Visium and Xenium platforms. To enable this comparison, we aggregated the single-cell Xenium data into spatial patches matching the effective resolution of Visium spots (55x55µm). Similarly, Visium spots were rendered as patches to produce a more continuous visual representation. As a result of this aggregation and visualization choice, the Xenium expression plots appear pixelated despite Xenium’s native subcellular resolution (up to ~200 nm with continuous readout). We have clarified this processing and visualization step in the Methods to avoid confusion.
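      The aggregation step described above can be sketched as follows. This is an illustrative example rather than the code used for Figure 2: it assumes cell centroids in micrometres and a regular square grid anchored at the origin, and the function name and toy coordinates are hypothetical.

```python
from collections import defaultdict


def aggregate_to_patches(cells, patch_size_um=55.0):
    """Sum per-cell counts of one gene into square spatial patches.

    `cells` is an iterable of (x_um, y_um, count) tuples. Returns a dict
    mapping (column, row) patch indices to the summed count.
    """
    patches = defaultdict(float)
    for x, y, count in cells:
        # Integer grid index of the patch containing this cell.
        key = (int(x // patch_size_um), int(y // patch_size_um))
        patches[key] += count
    return dict(patches)


# Toy single-cell measurements: (x in um, y in um, transcript count).
cells = [
    (10.0, 12.0, 3),
    (40.0, 50.0, 1),
    (60.0, 20.0, 5),
    (120.0, 115.0, 2),
]
patches = aggregate_to_patches(cells)
```

      Because every cell inside a 55 µm square contributes to a single value, any rendering of `patches` is blocky by construction, whatever the resolution of the underlying measurements.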

      With respect to the Visium expression plots, the lack of gaps is also a consequence of rendering each spot as a filled patch rather than plotting traditional Visium spots. This was done intentionally to maintain visual consistency with the aggregated Xenium data and to emphasize spatial concordance rather than the underlying sampling geometry. We have now explicitly stated this design choice to improve clarity.

      “I found the format of the manuscript to be at times confusing and perhaps a bit of an odd fit for a general interest journal. A significant portion of the manuscript is spent critiquing a specific publication, "High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis" published by Janesick et al. (of 10x Genomics, Inc.) in Nature Communications in 2023. This content would seem more appropriate as a Comment submitted to Nature Communications, potentially to be accompanied by a response from the authors of Janesick et al. at 10x.”

      I would like to address this important point as the corresponding author who takes primary responsibility for the unconventional decision to submit this manuscript to eLife rather than as the commentary suggested by the reviewer.

      Consistent with the reviewer, I did initially consider submitting this as a Matters Arising to Nature Communications. However, after consultation with other senior colleagues and co-authors, I decided to forgo this route on the basis that the information provided in a Matters Arising must be kept confidential. I was concerned that this would lead to long, drawn-out private exchanges. As we note in the manuscript, the Xenium platform's widespread use and high cost imposed a certain urgency that I believed warranted open and rapid dissemination.

      Therefore, we submitted to eLife with the hope that eLife’s unique continuous post-publication public peer review process will enable the rapid dissemination of these important, financially sensitive insights while permitting constructive criticisms from both industry and academic expert reviewers to be openly considered by all readers.

    1. Author Response:

      (1) Clarification of the distinction between resting-state trait measures and ongoing neural dynamics

      All the Reviewers commented that this study provides a useful characterization of the relationship between trait-based resting-state neural dynamics and behavioral measures. At the same time, we agree that including ongoing EEG dynamics during task performance would have added important complementary information. In particular, task-related EEG would allow a more direct characterization of the relationship between ongoing neural activity and behavioral indices at the single trial level, thereby helping to clarify the role of ongoing neural dynamics in evidence accumulation and perceptual decision-making. It would also enable testing how pre-stimulus alpha oscillations and aperiodic activity dynamically influence temporal integration, serial dependence, and confidence on a trial-by-trial basis.

      However, we would like to emphasize that the primary aim of the present study was to investigate trait-level resting-state neural dynamics, which are known to be relatively stable and consistent within individuals, such as individual alpha frequency (e.g., Grandy et al., 2013; Wiesman & Wilson, 2019; Gray & Emmanouil, 2020) and aperiodic neural dynamics (Demuru and Fraschini, 2020; Pathania et al., 2022; Euler et al., 2024), and to examine whether these stable neural characteristics predict behavioral measures indexing temporal perception. Accordingly, the present study was designed to address how stable individual differences in resting-state neural dynamics shape temporal performance, rather than within-task neural fluctuations during the temporal task. We agree that combining resting-state and task-related EEG would be a valuable direction for future work, but this lies beyond the scope of the current dataset, as EEG was not recorded during task performance. Furthermore, we agree with the Reviewers that some of the wording in the Discussion can be clarified to emphasize the trait-level, rather than trial-level, nature of the task and potential interpretations.

      Additionally, we agree that the relationship between eyes-open (EO) and eyes-closed (EC) resting-state EEG, and their differential associations with behavior, warrants further discussion. In our data, EO resting-state activity emerged as a stronger predictor of behavioral performance than EC. Conceptually, resting-state EO and EC should not be considered interchangeable measures of the same underlying neural activity, but rather as related yet distinct brain states, with overlapping neural generators expressed under different state constraints. EC is typically associated with stronger posterior alpha activity and a more internally oriented mode, whereas EO reflects a more visually engaged and vigilant state, closer to the conditions under which perceptual judgments are formed. This may explain why, in our findings, brain–behavior associations are more evident in EO, consistent with the greater similarity between the EO condition and the task context. In this sense, EO may emphasize exteroceptive processing and visual readiness, whereas EC reflects a more internally oriented configuration. This difference in functional weighting could account for the stronger behavioral correlations observed in EO in the present study. The distinction between these resting states has been emphasized in previous EEG and neuroimaging work showing differences in power, topography, and large-scale network organization (e.g., Marx et al., 2004). Additionally, these state-related differences may reflect physiological changes related to sensory processing (El Boustani et al., 2009) and arousal (Lendner et al., 2020). Accordingly, the present dissociation may arise because EO provides a resting-state measure that is more proximal to the sensory and excitability conditions engaged during task performance (for similar findings, see also Deodato and Melcher, 2024). However, we agree with the reviewers that further clarification of these state-related differences is warranted. 

      In the revised manuscript, we will (i) expand the Discussion to more clearly articulate the conceptual distinction between EO and EC and their expected links to perceptual and confidence measures, (ii) systematically describe EO–EC differences across all EEG measures analyzed, and (iii) quantify the relationship between EO and EC indices to directly assess the extent to which they share trait-like variance across individuals.

      In the revised manuscript, we will clarify these points by adjusting the text, strengthening the conceptual framing, and expanding the Discussion, including a more detailed outline of future research directions.

      (2) Functional interpretation of psychometric measures

      The Reviewers raised an important point regarding the interpretation of the psychometric parameters investigated in our study. In particular, we agree that the slope of a binary psychometric function does not provide a direct measure of sensory temporal resolution or perceptual sensitivity, and that our original wording may have overstated this interpretation. Rather, the slope reflects the steepness of the transition between response categories and indexes overall behavioural variability, which can arise from multiple sources, including variability in sensory encoding, decision criteria, and occasional response errors (e.g., Wichmann and Hill 2001; Prins 2012).

      We therefore agree that interpreting steeper slopes as necessarily reflecting “temporal precision” may be overly specific, and that there are other possible interpretations. In the revised manuscript, we will adopt more cautious terminology and describe the slope more generally as indexing behavioral variability in the transition between perceptual reports, which may reflect a combination of sensory and decisional factors. Importantly, our results demonstrate robust relationships between neural measures and the consistency or sharpness of perceptual categorization, rather than uniquely isolating sensory temporal resolution. While, in standard psychophysical frameworks, the slope is related to internal variability in the sensory representation, this relationship depends on model assumptions and does not uniquely isolate sensory precision (e.g., Prins, 2016). Following the reviewers’ suggestion, we will also refine our psychometric modeling by incorporating a lapse parameter. We agree with the Reviewer that accounting for occasional stimulus-independent errors (e.g., lapses) can improve parameter estimation and prevent biases in slope and threshold estimates when lapse rates are implicitly fixed to zero (Wichmann & Hill, 2001). In the revised manuscript, we will therefore (i) clarify the terminology used to describe psychometric parameters and (ii) report additional analyses including lapse rates.
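      A psychometric function with a lapse parameter can be sketched as below. The general form ψ(x) = γ + (1 − γ − λ)·F(x) follows Wichmann and Hill (2001); the logistic choice for F, the parameter names, and the response counts are illustrative assumptions and not the model fitted in the manuscript.

```python
import math


def psychometric(x, threshold, slope, guess=0.0, lapse=0.0):
    """psi(x) = guess + (1 - guess - lapse) * F(x), with a logistic F."""
    core = 1.0 / (1.0 + math.exp(-slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * core


def neg_log_likelihood(params, x_levels, n_yes, n_total):
    """Binomial negative log-likelihood of the response counts.

    The binomial coefficient is omitted because it does not depend on
    the parameters.
    """
    threshold, slope, guess, lapse = params
    nll = 0.0
    for x, k, n in zip(x_levels, n_yes, n_total):
        p = psychometric(x, threshold, slope, guess, lapse)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


# Hypothetical data: stimulus levels (ms) and counts of one response type.
x_levels = [10, 30, 50, 70, 90]
n_total = [20, 20, 20, 20, 20]
n_yes = [1, 3, 10, 17, 19]

# With a 5% lapse rate the upper asymptote is 0.95 rather than 1.0.
upper = psychometric(1000, threshold=50, slope=0.1, lapse=0.05)
nll_with_lapse = neg_log_likelihood((50, 0.1, 0.0, 0.05), x_levels, n_yes, n_total)
```

      Minimizing `neg_log_likelihood` over all four parameters, rather than fixing `lapse` at zero, lets occasional stimulus-independent errors be absorbed by the asymptote instead of flattening the estimated slope.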

      In addition, we agree that complementary modeling approaches could help disentangle perceptual and decisional contributions to the observed effects by providing access to latent parameters of perceptual decision-making. For example, within a signal detection framework, one could test whether EEG measures relate to perceptual sensitivity versus decision criterion, while sequential sampling models such as the diffusion model (e.g., Ratcliff and McKoon, 2008) could assess whether neural measures are associated with parameters such as drift rate, decision boundary, starting bias, or trial-to-trial variability. However, several characteristics of the present paradigm limit the direct applicability of these approaches. First, the task relies on a continuous manipulation of sensory evidence across stimulus durations (ISIs), and behavioral responses are summarized through psychometric functions rather than modeled at the single-trial level. As a result, the current framework does not provide direct access to trial-by-trial latent decision variables required by these models. Second, reaction times were not collected, which constrains the application of sequential sampling models that rely on joint modeling of accuracy and response times. Finally, while the task involves categorical judgments (integration vs. segregation), it does not include explicit signal-absent or catch trials, which can help constrain sensitivity and criterion estimates within classical signal detection formulations. Despite these limitations, we agree that these approaches could still provide useful insights. In the revised manuscript, we will explore whether alternative modeling approaches (e.g., signal detection-based metrics or Bayesian psychometric modeling) can help further characterize the contributions of perceptual sensitivity, decision criterion, and response variability to our behavioral measures. 

      While these analyses will necessarily remain exploratory given the structure of the current dataset, they may provide initial insights into whether the observed effects reflect perceptual or decisional dynamics. A more definitive dissociation, however, is beyond the scope of the present study and will be an important direction for future work.

      (3) Control analyses and robustness of EEG–behavior relationships

      The Reviewers raised interesting points regarding the interpretation of our control analyses and the potential influence of stimulus structure on the observed EEG–behavior relationships. We agree that these aspects require clarification and additional analyses to strengthen the robustness of our findings.

      First, regarding the control analyses across frequency bands, we acknowledge that while our main analyses appropriately dissociate oscillatory and aperiodic components using spectral parameterization, the control analyses were based on conventional band-power measures. As correctly noted by the reviewers, band-limited power estimates can be influenced by the aperiodic background, which complicates the interpretation of null effects in the other frequency bands. In the revised manuscript, we will address this issue by extending our spectral parameterization approach to these control analyses. Specifically, we will recompute band-specific measures after removing the aperiodic component, allowing a clearer comparison across frequency bands and a more robust assessment of the specificity of alpha-related effects. Preliminary analyses suggest that these updated results are likely to be consistent with our initial findings, thereby reinforcing the robustness of the reported effects.
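      The logic of removing the aperiodic component before computing band-specific measures can be illustrated with a deliberately simplified sketch. It fits a straight line to the spectrum in log-log space and reports the mean residual within a band; dedicated spectral parameterization tools instead model peaks and the aperiodic component jointly, so this is a stand-in for the idea, not the planned analysis. All names and the synthetic spectrum are hypothetical.

```python
import math


def fit_aperiodic(freqs, power):
    """Least-squares line in log10-log10 space: log P = offset - exponent * log f."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = my - slope * mx
    return offset, -slope


def band_residual(freqs, power, band, offset, exponent):
    """Mean log10 power within `band` after subtracting the aperiodic fit."""
    lo, hi = band
    residuals = [
        math.log10(p) - (offset - exponent * math.log10(f))
        for f, p in zip(freqs, power)
        if lo <= f <= hi
    ]
    return sum(residuals) / len(residuals)


# Synthetic spectrum: 1/f^1.5 background, plus a threefold bump at 8-12 Hz.
freqs = list(range(1, 41))
background = [100.0 / f ** 1.5 for f in freqs]
with_alpha = [p * (3.0 if 8 <= f <= 12 else 1.0) for f, p in zip(freqs, background)]

# For clarity the fit uses the peak-free background; real data need a
# method that separates peaks from the aperiodic component.
offset, exponent = fit_aperiodic(freqs, background)
alpha_resid = band_residual(freqs, with_alpha, (8, 12), offset, exponent)
beta_resid = band_residual(freqs, with_alpha, (13, 30), offset, exponent)
```

      In this toy case the alpha residual is positive while the beta residual is essentially zero, even though raw beta power is non-zero purely because of the aperiodic background.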

      Another important point raised by the reviewers concerns the temporal structure of the stimulus stream. We agree that the continuous alternation of Gabor stimuli at varying durations introduces quasi-periodic stimulation rates that may induce entrainment of neural oscillations. Notably, some inter-stimulus intervals correspond to frequencies within the alpha range, which raises the possibility that the observed relationship between resting alpha frequency and integration thresholds may not solely reflect intrinsic sampling speed, but could also be influenced by the degree of alignment between an individual’s alpha rhythm and the temporal structure of the stimulus. As highlighted in prior work (e.g., Gulbinaite et al., 2017; Keitel et al., 2019; Gallina et al., 2023; Duecker et al., 2024), rhythmic stimulation in the alpha range can interact with intrinsic alpha oscillations and modulate both neural and perceptual processing. Although our study does not include EEG recordings during task performance and therefore cannot directly assess stimulus-locked responses or neural entrainment, we agree that this factor should be explicitly considered in the interpretation of our findings. To address this point, in the revised manuscript we will perform additional control analyses to assess the robustness of the observed relationships while accounting for potential rhythmic stimulation confounds. Specifically, we will explore whether the strength of behavioral effects and their relationship with EEG measures depends on the alignment between each participant’s individual alpha frequency and the effective stimulation rate induced by the stimulus presentation. In addition, we will test whether the association between resting-state alpha frequency and behavioral measures is disproportionately driven by stimulus durations corresponding to alpha-range temporal frequencies. 

      These analyses will help determine whether the observed effects primarily reflect intrinsic sampling properties or are modulated by resonance-like interactions between endogenous rhythms and stimulus timing. We will also address all additional recommendations raised by the reviewers in the revised manuscript.
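      One way to operationalize the alignment between an individual's alpha frequency and the stimulation rate is sketched below. It assumes, purely for illustration, that the effective stimulation rate equals the reciprocal of the stimulus duration and that the alpha range spans 8 to 13 Hz; the function names and values are hypothetical.

```python
def stimulation_rate_hz(duration_ms):
    """Assumed effective stimulation rate: one stimulus per `duration_ms`."""
    return 1000.0 / duration_ms


def detuning_hz(iaf_hz, duration_ms):
    """Absolute distance between individual alpha frequency and stimulation rate."""
    return abs(iaf_hz - stimulation_rate_hz(duration_ms))


def in_alpha_range(duration_ms, low_hz=8.0, high_hz=13.0):
    """True if the assumed stimulation rate falls inside the alpha band."""
    return low_hz <= stimulation_rate_hz(duration_ms) <= high_hz


# Hypothetical participant with a 10.5 Hz alpha peak, two stimulus durations.
example_detuning = detuning_hz(10.5, 100)   # 100 ms -> 10 Hz stimulation
alpha_flags = [in_alpha_range(d) for d in (50, 100)]
```

      Per-participant detuning values of this kind could then be entered as a covariate, or used to exclude alpha-range durations, when re-estimating the EEG–behavior relationships.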

      References

      Demuru, M., & Fraschini, M. (2020). EEG fingerprinting: Subject-specific signature based on the aperiodic component of power spectrum. Computers in Biology and Medicine, 120, 103748.

      Deodato, M., & Melcher, D. (2024). Correlations between visual temporal resolution and individual alpha peak frequency: Evidence that internal and measurement noise drive null findings. Journal of Cognitive Neuroscience, 36(4), 590-601.

      Duecker, K., Doelling, K. B., Breska, A., Coffey, E. B., Sivarao, D. V., & Zoefel, B. (2024). Challenges and Approaches in the Study of Neural Entrainment. Journal of Neuroscience, 44(40).

      El Boustani, S., Marre, O., Béhuret, S., Baudot, P., Yger, P., Bal, T., ... & Frégnac, Y. (2009). Network-state modulation of power-law frequency-scaling in visual cortical neurons. PLoS Computational Biology, 5(9), e1000519.

      Euler, M. J., Vehar, J. V., Guevara, J. E., Geiger, A. R., Deboeck, P. R., & Lohse, K. R. (2024). Associations between the resting EEG aperiodic slope and broad domains of cognitive ability. Psychophysiology, 61(6), e14543.

      Gallina, J., Marsicano, G., Romei, V., & Bertini, C. (2023). Electrophysiological and Behavioral Effects of Alpha-Band Sensory Entrainment: Neural Mechanisms and Clinical Applications. Biomedicines, 11(5), 1399.

      Grandy, T. H., Werkle‐Bergner, M., Chicherio, C., Schmiedek, F., Lövdén, M., & Lindenberger, U. (2013). Peak individual alpha frequency qualifies as a stable neurophysiological trait marker in healthy younger and older adults. Psychophysiology, 50(6), 570-582.

      Gray, M. J., & Emmanouil, T. A. (2020). Individual alpha frequency increases during a task but is unchanged by alpha‐band flicker. Psychophysiology, 57(2), e13480.

      Gulbinaite, R., Van Viegen, T., Wieling, M., Cohen, M. X., & VanRullen, R. (2017). Individual alpha peak frequency predicts 10 Hz flicker effects on selective attention. Journal of Neuroscience, 37(42), 10173-10184.

      Keitel, C., Keitel, A., Benwell, C. S., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum. Journal of Neuroscience, 39(16), 3119-3129.

      Lendner, J. D., Helfrich, R. F., Mander, B. A., Romundstad, L., Lin, J. J., Walker, M. P., ... & Knight, R. T. (2020). An electrophysiological marker of arousal level in humans. eLife, 9, e55092.

      Marx, E., Deutschländer, A., Stephan, T., Dieterich, M., Wiesmann, M., & Brandt, T. (2004). Eyes open and eyes closed as rest conditions: impact on brain activation patterns. NeuroImage, 21(4), 1818-1824.

      Pathania, A., Euler, M. J., Clark, M., Cowan, R. L., Duff, K., & Lohse, K. R. (2022). Resting EEG spectral slopes are associated with age-related differences in information processing speed. Biological Psychology, 168, 108261.

      Prins, N. (2012). The psychometric function: The lapse rate revisited. Journal of Vision, 12(6), 25-25.

      Prins, N. (2016). Psychophysics: a practical introduction. Academic Press.

      Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20(4), 873-922.

      Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293-1313.

      Wiesman, A. I., & Wilson, T. W. (2019). Alpha frequency entrainment reduces the effect of visual distractors. Journal of Cognitive Neuroscience, 31(9), 1392-1403.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study presents convincing evidence that uncovers a novel signaling axis impacting the post-mating response in females of the brown planthopper. The findings open several avenues for testing the molecular and neurobiological mechanisms of mating behavior in insects, although broad concerns remain about the relevance of some claims.

      Thank you very much for your letter and the insightful, valuable comments from the reviewers on our manuscript. These suggestions have been instrumental in strengthening the quality and clarity of our work. We have carefully addressed each concern, performed additional experiments, revised the relevant sections thoroughly, and made extensive refinements to the Discussion to clarify future research directions. Below is our detailed point-by-point response.

      Public Reviews:

      Reviewer #1 (Public review):

      In this work, Zhang et al., through a series of well-designed experiments, present a comprehensive study exploring the roles of the neuropeptide Corazonin (CRZ) and its receptor in controlling the female post-mating response (PMR) in the brown planthopper (BPH) Nilaparvata lugens and Drosophila melanogaster. Through a series of behavioural assays, micro-injections, gene knockdowns, CRISPR/Cas gene editing, and immunostaining, the authors show that both CRZ and CrzR play a vital role in the female post-mating response, with impaired expression of either leading to quicker female remating and reduced ovulation in BPH. Notably, the authors find that this signaling is entirely endogenous in BPH females, with immunostaining of male accessory glands (MAGs) showing no evidence of CRZ expression. Further, the authors demonstrate that while CRZ is not expressed in the MAGs, BPH males with Crz knocked out show transcriptional dysregulation of several seminal fluid proteins and functionally link this dysregulation to an impaired PMR in BPH. In relation, the authors also find that in CrzR mutants, the injection of neither MAG extracts nor maccessin peptide triggered the PMR in BPH females. Finally, the authors extend this study to D. melanogaster, albeit on a more limited scale, and show that CRZ plays a vital role in maintaining PMR in D. melanogaster females with impaired CRZ signaling, once again leading to quicker female remating and reduced ovulation. The authors must be commended for their expansive set of complementary experiments. The manuscript is also generally well written. Given the seemingly conserved nature of CRZ, this work is a significant addition to the literature, opening several avenues for testing the molecular and neurobiological mechanisms in which CRZ triggers the PMR.

      However, there are some broad concerns/comments I had with this manuscript. The authors provide clear evidence that CRZ signaling plays a major role in the PMR of D. melanogaster, however, they provide no evidence that CRZ signaling is endogenous, as they did not check for expression in the MAGs of D. melanogaster males. Additionally, while the authors show that manipulating Crz in males leads to dysregulated seminal fluid expression and impaired PMR in BPH, the authors also find that CRZ injection in males in and of itself impairs PMR in BPH. The authors do not really address what this seemingly contradictory result could mean. While a lot of the figures have replicate numbers, the authors do not factor in replicate as an effect into their models, which they ideally should do. Finally, while the discussion is generally well-written, it lacks a broader conclusion about the wider implications of this study and what future work building on this could look like.

      Thank you very much for your insightful and valuable comments on our manuscript. We have carefully addressed each of your concerns, revised the relevant sections thoroughly, and conducted additional experiments to further strengthen our conclusions. To better focus on the core finding of this study, the critical role of Crz/CrzR signaling in regulating the post-mating response (PMR) of female brown planthoppers (BPH), and to eliminate potential confusion associated with the male-related data, we have removed the experiments investigating CRZ function in males from the current version of the manuscript. These observations on male CRZ signaling will be explored in greater depth and presented as a standalone study in a separate manuscript in the future.

      Reviewer #2 (Public review):

      Summary:

      The work presented by Zhang and coauthors in this manuscript presents the study of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. To obtain their results, the authors used several different techniques that orthogonally demonstrate the involvement of corazonin signalling in regulating the female post-mating response in these species.

      They first injected synthetic corazonin peptide into female brown planthoppers, showing altered mating receptivity in virgin females and a higher number of eggs laid after mating. The role of corazonin in controlling these post-mating traits has been further validated by knocking down the expression of the corazonin gene by RNA interference and through CRISPR-Cas9 mutagenesis of the gene. Further proof of the importance of corazonin signalling in regulating the female post-mating response has been achieved by knocking down the expression or mutagenizing the gene coding for the corazonin receptor.

      Similar results have been obtained in the fruit fly Drosophila melanogaster, suggesting that corazonin signalling is involved in controlling the female post-mating response in multiple insect species.

      Notably, the authors also show that corazonin controls gene expression in the male accessory glands and that disruption of this pathway in males compromises their ability to elicit normal post-mating responses in their mates.

      Strengths:

      The study of the signalling pathways controlling the female post-mating response in insects other than Drosophila is scarce, and this limits the ability of biologists to draw conclusions about the evolution of the post-mating response in female insects. This is particularly relevant in the context of understanding how sexual conflict might work at the molecular and genetic levels, and how, ultimately, speciation might occur at this level. Furthermore, the study of the post-mating response could have practical implications, as it can lead to the development of control techniques, such as sterilization agents.

      The study, therefore, expands the knowledge of one of the signalling pathways that control the female post-mating response, the corazonin neuropeptide. This pathway is involved in controlling the post-mating response in both Nilaparvata lugens (the brown planthopper) and Drosophila melanogaster, suggesting its involvement in multiple insect species.

      The study uses multiple molecular approaches to convincingly demonstrate that corazonin controls the female post-mating response.

      Thank you very much for your valuable and insightful comments on our manuscript. We highly appreciate your recognition of the study’s value, including its focus on non-model insects, the evolutionary implications of corazonin signaling, and the rigorous use of multiple molecular techniques. We have carefully addressed your suggestions and revised the manuscript accordingly to enhance its clarity, accuracy, and depth. Below is our detailed response to your comments.

      Weaknesses:

      The data supporting the main claims of the manuscript are solid and convincing. The statistical analysis of some of the data might be improved, particularly by tailoring the analysis to the type of data that has been collected.

      Thank you for your valuable suggestion regarding statistical analysis. We fully agree that tailoring statistical methods to the specific type of data enhances the rigor and reliability of our findings.

      In response, we have comprehensively re-evaluated and revised the statistical analyses for all datasets in the manuscript:

      (1) For proportion-based data (e.g., female mating receptivity, re-mating rate), we replaced inappropriate tests (e.g., ANOVA) with chi-square tests for contingency tables, which are more suitable for comparing categorical variables.

      (2) For time-series data (e.g., receptivity at different time points post-injection), we adopted generalized linear models (GLM) with logit links followed by pairwise contrasts to address concerns of multiple testing, instead of hour-by-hour Mann-Whitney tests.

      (3) For continuous data (e.g., number of eggs laid, gene expression levels), we retained Student’s t-tests or one-way ANOVA after verifying normality, and used non-parametric tests (Mann-Whitney, Kruskal-Wallis) for non-normally distributed data.

      All revisions have been clearly described in the figure legends and Methods section, ensuring transparency and reproducibility. We believe these adjustments significantly improve the statistical robustness of our conclusions.
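      For the proportion-based comparisons in point (1), the chi-square test on a 2×2 contingency table can be sketched as follows. This is an illustrative, dependency-free version without continuity correction, not the software used for the manuscript; the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are treated / control females,
# columns are mated / not mated.
stat, p_value = chi_square_2x2(5, 15, 15, 5)  # statistic is 10.0 for these counts
```

      When expected cell counts are small, an exact test such as Fisher's would usually be preferred over this approximation.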

      In the case of the corazonin effect in females, all the data are coherent; in the case of CRISPR-Cas9-induced mutagenesis, the analysis of the behavioural trait in heterozygotes might have helped in understanding the haplosufficiency of the gene and would have further proved the authors' point.

      Thank you for this insightful suggestion. We fully agree that analyzing the behavioral traits of heterozygous mutants is crucial for understanding the haplosufficiency of the Crz and CrzR genes, and we regret overlooking this aspect in the initial submission.

      To address this gap, we have conducted additional behavioral assays using heterozygous Crz (+/ΔCrz) and CrzR (+/CrzR<sup>M</sup>) mutant females.

      (1) For re-mating receptivity: We found no significant differences in either re-mating rate or egg-laying output between +/ΔCrz females and wild-type females. By contrast, +/CrzR<sup>M</sup> females exhibited re-mating and oviposition phenotypes comparable to those of homozygous CrzR mutants, with no significant differences detected between these two genotypes.

      (2) These results indicate that the Crz loss-of-function phenotype is recessive: a single functional copy of Crz is sufficient to sustain a normal post-mating response (PMR). By contrast, the CrzR loss-of-function phenotype is dominant: a single functional copy of CrzR is insufficient to maintain a normal PMR.

      This supports our core conclusion that CRZ signaling is critical for mediating the female PMR, as even a partial reduction of CrzR gene dosage impairs the response.

      The heterozygote data have been integrated into the revised manuscript, including updated figures (e.g., Figure 1J-K for Crz heterozygotes and Figure 3I-J for CrzR heterozygotes) and corresponding legends. We believe this addition strengthens the rigor of our genetic evidence and provides valuable insights into the gene dosage requirements for CRZ-mediated PMR regulation.

      Less consistency was achieved in males (Figure 5): the authors show that injection of CRZ and RNAi of crz, or mutant crz, has the same effect on male fitness. However, the CRZ injection should activate the pathway, and crz RNAi and mutant crz should inhibit the pathway, yet they have the same effect. A comment about this discrepancy would have improved the clarity of the manuscript, pointing to new points that need to be clarified and opening new scientific discussion.

      Thank you for highlighting this important discrepancy in the male-related CRZ signaling data. We fully acknowledge the inconsistency: CRZ injection (which was intended to activate the pathway) and Crz RNAi/mutagenesis (which was intended to inhibit the pathway) yielded similar effects on male fitness, and we regret not addressing this ambiguity in the initial submission.

      To resolve this confusion and refocus the current manuscript on its core objective, elucidating the role of endogenous CRZ/CrzR signaling in the female post-mating response (PMR), we have removed all experiments, analyses, and discussions related to male CRZ function. This decision ensures that the manuscript maintains a clear, cohesive narrative centered on female reproductive physiology, as recommended by both reviewers and the editorial team.

      Regarding the observed discrepancy in males, we recognize its scientific significance and plan to investigate it thoroughly in a standalone follow-up study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The manuscript would be significantly strengthened by an explanation of the seemingly contradictory results obtained in males, where both CRZ injections and Crz silencing afford the same results. Additionally, Crz expression data in the MAGs of D. melanogaster males is necessary to support your conclusions of endogenous signaling in this species. Besides correcting several imprecisions and inconsistencies in the text and figures, to improve quality and accuracy, the abstract should be restructured and the discussion modified as recommended by reviewers.

      Thank you for your comprehensive letter and valuable guidance. We have carefully addressed all the points raised by the editorial team and reviewers, and the revised manuscript now incorporates substantial improvements to clarity, accuracy, and scientific rigor. Below is our detailed response to your specific requests:

      Contradictory Male-Related Results

      We fully acknowledge the importance of addressing the contradictory findings in male CRZ signaling, where both CRZ injection and Crz silencing/mutagenesis yielded similar effects on male fitness. To resolve this ambiguity and maintain the manuscript’s focus on its core objective, elucidating endogenous CRZ/CrzR signaling in the female post-mating response (PMR), we have removed all male-related experiments, analyses, and discussions from the revised manuscript. This decision ensures that the current work remains cohesive and centered on female reproductive physiology, as recommended by the reviewers.

      We recognize the scientific significance of the male-specific discrepancy and plan to investigate it in a standalone follow-up study in the near future.

      Crz expression data in D. melanogaster Male Accessory Glands (MAGs)

      To support our conclusion of endogenous CRZ signaling in D. melanogaster females, we have supplemented the manuscript with additional experiments verifying the absence of CRZ in male MAGs:

      (1) RT-PCR Analysis: We detected no Crz mRNA in dissected male MAGs, whereas Crz expression was confirmed in the male head (positive control).

      (2) Immunohistochemistry and GAL4 system: Using the GAL4–UAS system (Crz-Gal4/UAS-mCD8-GFP) to label CRZ-producing neurons, combined with anti-CRZ antibody staining, we observed no CRZ-specific signal in male MAGs.

      These results demonstrate that D. melanogaster male MAGs neither synthesize nor contain CRZ peptide, confirming that CRZ acts as an endogenous female signaling factor (rather than a male-transferred seminal fluid component) in this species. The new data are included in Figure 5H-I and described in the Results and Methods sections.

      Correction of Imprecisions and Inconsistencies

      We have systematically revised the manuscript to address text and figure inaccuracies:

      Text Revisions: Corrected typos (e.g., Line 854), standardized species names (replacing “Drosophila” with “D. melanogaster” throughout), removed redundant or inappropriate sentences, and refined terminology (e.g., replacing “expression” with “localization” for protein detection).

      Figure Corrections: Fixed inconsistent Y-axis labels and numerical ranges (e.g., aligning percentages/probabilities with appropriate scales), resolved color scheme confusion, standardized oviposition-related labels to “Per female egg numbers within 3 days,” and added details on sample sizes and replicates to all figure legends.

      Statistical Improvements: Re-evaluated statistical analyses for proportion-based datasets (applying chi-square tests for contingency tables) and time-series data (using generalized linear models to address multiple testing), with revised methods clearly described in the text and figure legends.

      Abstract Restructuring and Discussion Modification

      Abstract: We have restructured the abstract to group results thematically (rather than sequentially) for improved readability. The revised abstract emphasizes the core findings: CRZ/CrzR signaling is critical for female PMR in both N. lugens and D. melanogaster, acts endogenously in females, and is required for male seminal fluid factors to induce PMR. Male-related content has been removed because the corresponding experimental data have been deleted from the rest of the paper.

      Discussion: We have modified the discussion to include the evolutionary conservation of CRZ-mediated female PMR, the molecular and neurobiological implications of CRZ/CrzR signaling, and future research directions (e.g., dissecting downstream pathways in the female reproductive tract and brain). We have also reduced tangential content and clarified how our findings advance understanding of female endogenous signaling in PMR regulation. A new section was added at the end, which discusses outstanding questions related to CRZ and the PMR in both insect species.

      To both the above-mentioned sections and the Introduction we also added new text to emphasize that CRZ is a paralog of the vertebrate peptide gonadotropin-releasing hormone (GnRH), a hormone known to regulate reproduction in vertebrates (including humans), thus suggesting conservation of an ancient role in reproduction.

      All revisions in the manuscript are highlighted in red for easy reference. We believe these changes significantly strengthen the study’s focus, clarity, and scientific impact. Thank you again for your time and consideration.

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract could benefit from some restructuring. Right now, it reads like a sequential reporting of the results, but clumping together results thematically would make it easier to read, in my opinion. Also, see above re: my concerns about no evidence for the signal being endogenous in D. melanogaster.

      Thank you for your constructive suggestions regarding the abstract and the evidence for endogenous CRZ signaling in D. melanogaster. We fully agree with your feedback and have addressed both points thoroughly in the revised manuscript:

      (1) Abstract Restructuring

      We have restructured the abstract to group results thematically, rather than sequentially, to enhance readability and highlight the core findings. The revised abstract now organizes key information into three cohesive sections:

      The context and significance of female post-mating response (PMR) regulation, emphasizing the gap in understanding endogenous female signaling pathways.

      The core findings across both study species (Nilaparvata lugens and D. melanogaster), including the critical role of CRZ/CrzR signaling in suppressing re-mating and promoting oviposition, and its requirement for male seminal fluid factors to induce a PMR.

      The conclusion regarding the evolutionary conservation of endogenous CRZ signaling in female PMR, reinforcing the study’s broader implications.

      We also added new text to emphasize that CRZ is a paralog of the vertebrate peptide gonadotropin-releasing hormone (GnRH), a hormone known to regulate reproduction in vertebrates (including humans), thus suggesting conservation of an ancient role in reproduction.

      This thematic structure eliminates the linear “result-by-result” narrative, making the abstract more concise and impactful while clearly communicating the study’s key contributions.

      (2) Evidence for Endogenous CRZ Signaling in female D. melanogaster

      To address your concern about the lack of evidence for endogenous signaling in female D. melanogaster, we have supplemented the manuscript with two sets of critical experiments confirming that CRZ is not derived from male accessory glands (MAGs) but acts endogenously in females:

      RT-PCR Analysis: We performed RT-PCR on dissected male MAGs, male heads (positive control), and female tissues. Results showed no detectable Crz mRNA in MAGs, confirming that males do not synthesize CRZ in this tissue.

      Immunohistochemical and Genetic Labeling: Using the GAL4–UAS system (Crz-Gal4/UAS-mCD8-GFP) to label Crz-expressing neurons, combined with anti-CRZ antibody labeling, we observed no Crz/CRZ signal in male MAGs. This confirms that MAGs neither produce nor sequester mature CRZ peptide.

      These findings demonstrate that CRZ signaling in D. melanogaster females is endogenous, as the peptide cannot be transferred from males during copulation. The new data are presented in Figure 5H-I and described in the Results section, with corresponding methods detailed in the Methods section.

      The revised abstract integrates this new evidence to explicitly state the endogenous nature of CRZ signaling in both BPH and D. melanogaster females, aligning with the thematic structure and addressing your concerns comprehensively. We believe these changes significantly improve the clarity and rigor of the abstract and the manuscript overall.

      (2) The authors use Drosophila as a broad placeholder throughout the manuscript, while they are specifically referring to D. melanogaster in several places. I would go through the manuscript and switch with the appropriate Drosophila species/species'.

      Thank you for pointing out this important detail regarding species-specific terminology. We fully agree with your suggestion to ensure accuracy and consistency in referencing the Drosophila species studied.

      We have systematically reviewed the entire manuscript, including the abstract, introduction, results, discussion, methods, and figure legends, and revised all instances where the general term “Drosophila” was used. All references now explicitly specify “D. melanogaster” to accurately reflect the species utilized in our experiments.

      (3) For the figures, I think the number of replicates is a distracting addition to the plot. This is still useful information, but could instead be added in as a line/table, in my opinion.

      Thank you very much for your suggestion. We have added the information on the number of replicates and sample sizes to the corresponding figure legends, which we hope improves clarity and readability.

      (4) There are typos in the y-axis label of all of the oviposition figures. A better re-wording would be "Per female egg numbers within 3 days".

      Thank you very much for your suggestion. Following your recommendation, we have now standardized the Y-axis label for all oviposition-related figures to “Number of eggs per female within 3 days.”

      (5) In Figure 1B and Figure 1 - Supplement 3a, since the comparisons are solely between control vs treatment, I would not join means across treatments that I am not comparing.

      To address this, we have revised Figure 1B and Figure 1—Supplement 3a by removing the connecting lines between group means. The updated figures now display independent mean ± SEM values for each dose (Figure 1B) and time point (Figure 1—Supplement 3a), with significance markers only applied to the control vs. treatment comparisons we actually tested. This revision eliminates any implied relationships between non-comparative groups and ensures the data visualization aligns with our statistical approach. We appreciate the reviewer’s suggestion, which has improved the clarity of the data presentation.

      (6) The authors mention courtship rate in lines 511, but from a look at the methods, this is not the courtship rate! This is a measure of the number of males engaging in any form of courtship. Also, in Figure 5 Supplement 2A, it appears that under 1% of males are courting. This seems extremely low. Do the authors mean percentages? In that case, I would reformat from 0 to 100/relabel the y-axis.

      Thank you for your observation and valuable feedback on this terminology and figure presentation issue. We fully acknowledge the inaccuracies and have addressed them comprehensively:

      (1) Correction of "Courtship Rate" Terminology

      We agree that the term “courtship rate” in Line 511 was incorrect, as our measurement reflects the proportion of males engaging in any form of courtship (not a rate per unit time). However, since we have removed all male-related data (including this section and associated figures) from the revised manuscript to focus on the core finding of female post-mating response (PMR), this terminology error has been eliminated entirely.

      (2) Revision of Figure 5 Supplement 2A

      Consistent with the removal of all male-related experiments, Figure 5 and its supplementary materials (including Supplement 2A) have been excluded from the revised manuscript. This ensures the current work remains cohesive and centered on female PMR, while also resolving the Y-axis labeling ambiguity you identified.

      We appreciate your careful attention to these details, which has helped improve the accuracy and clarity of the manuscript.

      (7) It appears Figure 5A, 5D, and 5G are mislabeled? Aren't all rematings with wild-type males?

      Thank you for identifying this labeling inconsistency. You are correct: all re-mating assays in the original figures involved wild-type males, and the mislabeling was an oversight.

      However, we have removed Figure 5 (and its associated subpanels A, D, G) entirely from the revised manuscript, as part of our decision to exclude all male-related data.

      (8) I am not sure I understand why a 30-minute post-injection threshold was chosen and what this table means. Could the authors elaborate on the methodology here on how they quantified premature ejaculation?

      Thank you for your question regarding the 30-minute post-injection observation window and the methodology for quantifying premature ejaculation.

      Because we have removed all male-related data (including the corresponding table and the premature ejaculation analyses) from the revised manuscript to focus on our core finding, this methodology is no longer part of the manuscript.

      (9) Line 29 - "distensible" seems an odd choice of word here.

      We have revised Line 29 and removed “distensible”. The sentence now reads: “Peptide injection and knockdown of CRZ expression by RNAi or CRISPR/Cas9-mediated mutagenesis demonstrate that CRZ signaling suppresses mating receptivity.”

      (10) Line 57 - delete "a" from "a post-mating response" and "A PMR" because the authors are referring to a very specific suite of post-mating behaviours.

      We have revised Line 57 (and other relevant instances throughout the manuscript) to delete the article "a" from these phrases.

      (11) Line 352, delete a from "and in a significantly".

      We have revised Line 356 to remove the extraneous "a", correcting the phrase to "and in significantly".

      Reviewer #2 (Recommendations for the authors):

      The work presented in this manuscript presents the study of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. To obtain their results, the authors used several different techniques, including dsRNA injection to induce RNA interference and CRISPR-CAS9-mediated site-specific mutagenesis. The experimental design is appropriate; the results are solid and support the conclusion of the manuscript. Overall, the merit of the manuscript is to present compelling evidence that the female post-mating response is mediated by corazonin, at least in the analysed species. There are multiple reports in multiple insect species, indeed, that male factors, particularly those secreted by male accessory glands, induce post-mating response in females, but the female pathways underlying this phenomenon are poorly understood.

      There are points the authors can consider to improve the manuscript quality.

      Thank you for your generous and insightful assessment of our manuscript. We deeply appreciate your recognition of the study’s strengths, including the appropriate experimental design, solid results, and meaningful contribution to understanding female endogenous pathways in post-mating response (PMR) regulation.

      We have carefully incorporated all your constructive suggestions (e.g., statistical analysis revisions, figure label standardization, text refinements) to further strengthen the manuscript’s rigor and clarity. By focusing on corazonin (CRZ)/corazonin receptor (CrzR) signaling in female brown planthoppers (Nilaparvata lugens) and validating these findings in Drosophila melanogaster, we aim to provide a conserved model for female endogenous PMR regulation across insect species.

      Thank you again for your thoughtful and supportive feedback, which has been instrumental in refining our work. We believe the revised manuscript now more effectively communicates the significance of CRZ-mediated female signaling in bridging the gap between male-derived cues and PMR execution.

      (1) Line 20: "optimal offspring". This is not a zoological parameter. One can use "optimal fitness".

      We have revised Line 20 to replace "optimal offspring" with "optimal fitness" as recommended.

      (2) Line 36-40: I think that the main message of the manuscript is the involvement of the corazonin pathway in controlling the female post-mating response. The involvement of corazonin in the male reproduction is also of note, but out of topic (in my opinion). The male corazonin is not transferred during mating from males to females, and the involvement of corazonin in controlling the gene expression in the MAGs is of note, but it is poorly related to the effect of corazonin in the female. I am not suggesting removing these data from the paper; they are important. But I do not find them that important to include them in the abstract, also because it confounds the reader at first. A similar statement can be made for the discussion (lines 728-745): making this the first piece of data commented on takes the stage, but this is not the main take-home message of the paper.

      Thank you for this suggestion. We fully agree that including male-related CRZ data in the abstract and leading the discussion with these results distracted from the primary focus and risked confounding readers. In fact, we also removed the entire section on the role of CRZ in males. We have addressed this issue comprehensively in the revised manuscript as follows:

      (1) Abstract Revision

      We have completely removed all content related to male CRZ function from the revised abstract. The updated abstract now exclusively emphasizes the core findings:

      The requirement of CRZ/CrzR signaling for mediating key female PMR traits (suppression of remating, promotion of oviposition) in both Nilaparvata lugens and Drosophila melanogaster;

      Experimental evidence confirming that CRZ acts as an endogenous female signaling factor (not a male-transferred molecule);

      The evolutionary conservation of CRZ-mediated female PMR regulation across the two insect species.

      We also added a comment on the evolutionary conservation of CRZ and GnRH signaling in reproduction.

      (2) Discussion Section Restructuring

      We have restructured the Discussion to prioritize the core message of female PMR regulation:

      Lead paragraph adjustment: Lines 728–745 (originally focusing on male CRZ and MAG gene expression) have been deleted.

      Revised opening focus: The Discussion now contains only a synthesis of our key findings on female CRZ signaling, including its molecular mechanisms, cross-species conservation, and implications for understanding endogenous female pathways downstream of male seminal fluid cues.

      We appreciate your suggestions for the narrative focus of the manuscript.

      (3) Line 49: "Reproductive behavior is critical for population sustenance and survival of the species": I find this intro a little teleological evolutionary speaking, and I am not totally sure that this has ever been demonstrated as a concept. I would skip it, simply saying "Reproductive behavior in insects is influenced...".

      Following your suggestion, we have revised Line 49 to streamline the introduction and avoid “teleological language”. The updated sentence now reads: "Reproductive behavior in insects is influenced by a complex interplay of neural, hormonal, and environmental factors."

      (4) Line 58: "A PMR has been documented across diverse insect taxa, including Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, and the brown planthopper (BPH), Nilaparvata lugens". There are many other insect species for which PMR has been shown: crickets, fruit flies, grasshoppers, etc. Therefore, I would say "for example" to underline that it is not a complete list. Being an incomplete list, I suggest that the authors pay attention to the cited literature: the literature cited in the case of Anopheles gambiae demonstrates the synthesis of hormones in the MAGs, but it has nothing to do with PMR; there is nothing cited for Aedes aegypti, even if the authors named the species.

      Thank you for this constructive feedback on the framing of PMR studies across insect taxa and the accuracy of our cited literature. We fully agree with your suggestions and have addressed these issues comprehensively in the revised manuscript:

      (1) Revision of the Sentence Structure

      We have modified Line 58 to explicitly indicate that the listed species are examples rather than a complete inventory of insects with documented PMR. The revised sentence reads:

      "The PMR has been documented across diverse insect taxa, for example, Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, crickets (Gryllodes sigillatus), grasshoppers (Dichromorpha viridis), and the brown planthopper (BPH), Nilaparvata lugens"

      (2) Correction of Literature Citations

      We have thoroughly reviewed the citations associated with the listed species to ensure they directly support the role of PMR:

      For Anopheles gambiae: We have replaced the previously cited study (focused on MAG hormone synthesis) with two relevant references that explicitly characterize PMR traits—including mating-induced oviposition stimulation and remating suppression—in this mosquito species.

      For Aedes aegypti: We have added two newly published studies that document key PMR phenotypes (e.g., post-mating refractoriness and altered feeding behavior) and their underlying molecular mechanisms in this species.

      For crickets (Gryllodes sigillatus): We added a newly published study that documents PMR phenotypes in Gryllodes sigillatus.

      We have also verified that the citations for D. melanogaster and N. lugens remain directly relevant to PMR regulation, with no adjustments needed.

      All revised citations are properly formatted and integrated into the text, with corresponding updates to the reference list.

      (5) Line 111-132: I find this redundant: it is a long summary of the methods and the results. I do not think it is needed here, but I think the authors should point to the main message of their data.

      Thank you for pointing out the redundancy of Lines 111–132. We fully agree that this section disrupted the flow of the Introduction.

      To address this, we have completely removed Lines 111–132 from the revised manuscript. In place of this redundant content, we have added a concise, focused paragraph that emphasizes the central hypothesis and key objective of our work: specifically, to identify the endogenous female signaling pathways that mediate the post-mating response (PMR) downstream of male-derived cues, and to validate the conserved role of corazonin (CRZ) signaling in this process across Nilaparvata lugens and Drosophila melanogaster.

      (6) Line 156: This sentence is not needed here.

      We have deleted the sentence in Line 156 from the revised manuscript.

      (7) Figure 1E, J supplementary 3A: The label of the Y axis is the percentage of the mating females (expected 0-100%), but the numbers show the fraction (0-1). On the contrary, in Figure 1 Supplement 4, the label says "probability of survival" and the probability goes from 0 to 1, while the number of the axis goes from 0 to 100 (percentage).

      Thank you very much for pointing out these inconsistencies. We have carefully reviewed all Y-axis labels and corresponding numerical ranges throughout the manuscript and corrected the mismatched axes.

      (8) Figure1B, C, F, K supp 2, 3A: I found this use of colours confounding. Why did the authors use the light blue for sCRZ, but the mean and SE are shown in pink, which is the colour for CRZ? Furthermore, it is not reported anywhere how many individuals have been used per replicate. There is the total number of insects, the number of replicates, but there is no indication about the minimum number of insects per replicate in this and many other subsequent experiments.

      Thank you for identifying these critical inconsistencies in figure color coding and missing details on sample allocation per replicate, and we greatly appreciate your meticulous review of our data presentation.

      We have addressed these issues in the revised manuscript as follows:

      (1) Standardization of Color Coding

      We apologize for the confusing color mismatch between group labels and data points in Figure 1B, C, F, K, and Supplements 2 and 3A. We have unified the color scheme across some figures to ensure consistency:

      The sCRZ (control) group is now consistently represented by light blue for both labels and mean ± SE data points.

      The CRZ (treatment) group is now consistently represented by pink for both labels and mean ± SE data points.

      For Figures 1C, F, K and Supplementary Figure 2, we were concerned that the mean and s.e.m. bars might be visually obscured by the data points. To improve their visibility, we therefore used the opposite color to display the mean and s.e.m.

      All figure legends have been cross-checked and updated to reflect this standardized color coding.

      (2) Addition of Sample Size per Replicate

      We acknowledge that the lack of information on the minimum number of insects per replicate was a key gap in our experimental reporting. We have added this detail as follows:

      Figure Legends: For Figure 1B, C, F, K, and Supplements 2 and 3A (as well as all subsequent experiments), we have added explicit statements specifying the minimum number of insects per replicate, alongside the total sample size and number of replicates (e.g., “n = 3 replicates, with a minimum of 10 females per replicate; total N = 35 females”). All revised figures and their corresponding legends have been integrated into the updated manuscript, and we have cross-checked all other figures to avoid similar issues.

      (9) Figure 1C, F, K, Supplementary Figure 3B: Y axis labels - "Eggs numbers of per female...". I suggest changing it to "Number of eggs per female...".

      We have revised the Y-axis labels for Figure 1C, F, K and Supplementary Figure 3B to “Number of eggs per female...” as recommended. Additionally, we cross-checked all other oviposition-related figures in the manuscript to ensure uniform use of this standardized label, eliminating any inconsistent phrasing across the dataset.

      (10) Legend Figure 1B: Mann Whitney test. How did the authors perform the test? Hour by hour? I am not sure this is the best way to analyse the data, because it is a case of multiple testing. Probably a linear model or a glm might be a better fit.

      Thank you very much for pointing out this issue. In Figure 1B, each concentration group was analyzed using data from independent individuals, and therefore the comparisons do not involve repeated measures across time; for this reason, we consider the Mann–Whitney test appropriate for this dataset. For Figure 1—Supplement 3A, however, our original analysis compared treatment and control groups hour by hour, which indeed raises concerns regarding multiple testing. Following your suggestion, we have removed the potentially misleading connecting lines and reanalyzed the dataset using a generalized linear model (GLM). The updated figure and revised legend have been included in the revised manuscript.
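
      To make the multiple-testing concern concrete, the following is a minimal sketch (not the analysis used in the manuscript) of how the familywise error rate grows when treatment and control are compared separately at each time point. It assumes independent tests at a per-test alpha of 0.05; the numbers of hourly comparisons and both function names are hypothetical and chosen only for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


alpha = 0.05
# Hypothetical numbers of hour-by-hour comparisons (not taken from the manuscript).
for n_tests in (1, 6, 12, 24):
    fwer = familywise_error_rate(alpha, n_tests)
    threshold = bonferroni_threshold(alpha, n_tests)
    print(f"{n_tests:2d} tests: familywise error = {fwer:.3f}, "
          f"Bonferroni per-test threshold = {threshold:.4f}")
```

      A single model fitted to all time points, such as the GLM described above, avoids this inflation by testing the treatment effect once rather than once per hour.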

      (11) Legend Figure 1E: ANOVA test. These are proportions, not continuous variables of the samples. Tests for proportions might be a better fit (chi-square, etc.).

      To address this issue, we have re-analyzed the proportional data in Figure 1E using Pearson’s chi-square test of independence, which directly evaluates the association between treatment group (sCRZ vs. CRZ) and the binary mating status (mated vs. unmated) of females. This test is statistically robust for proportional data and avoids the assumptions of normality and homogeneity of variances required for ANOVA.
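
      For illustration, a minimal sketch of Pearson’s chi-square test of independence on a 2 × 2 table (treatment group × mating status) is given below. The counts are hypothetical and are not data from the manuscript; the function name is our own, no continuity correction is applied, and the p-value uses the closed form for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square test of independence for a 2x2 table of counts.

    table: [[a, b], [c, d]], rows = groups, columns = outcomes.
    Returns (statistic, p_value) for 1 degree of freedom, without
    continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = sCRZ-injected, CRZ-injected; columns = mated, unmated.
counts = [[24, 6], [9, 21]]
statistic, p_value = chi_square_2x2(counts)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3g}")
```

      When any expected count is small (commonly taken as below 5), Fisher’s exact test is the usual alternative.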

      (12) Knockout experiments: I agree with the authors that the data are strong enough to sustain the conclusions. However, is the corazonin knockout haplosufficient or is it recessive? What is the behaviour of the heterozygotes?

      Thank you for this insightful question regarding the genetic basis of the corazonin (CRZ) knockout phenotype.

      To address your query, we have supplemented experiments with additional phenotypic analyses of heterozygous CRZ knockout females (+/ΔCrz), and we clarify the genetic nature of the knockout as follows:

      (1) Genetic basis of the CRZ knockout:

      The CRZ knockout line was generated via CRISPR-Cas9-mediated deletion of the Crz coding region, resulting in a recessive loss-of-function mutation. Homozygous knockout females (ΔCrz) exhibited the full phenotypic suite reported in the manuscript (impaired post-mating suppression of remating, reduced oviposition rate, and disrupted CRZ signaling in the reproductive tract).

      (2) Phenotype of heterozygous females:

      Behavioral and physiological assays of +/ΔCrz heterozygotes revealed no significant differences compared to wild-type (+/+) females across all measured post-mating traits. Specifically:

      Remating rates of +/ΔCrz females were indistinguishable from wild-type controls at 48 h post-mating.

      Oviposition output of +/ΔCrz females matched wild-type levels over a 3-day assay period.

      (3) Updates to the manuscript:

      We have added these heterozygote data as Figure 1J and K in the revised manuscript, with corresponding descriptions in the Results and Methods sections. We have also explicitly noted the recessive nature of the Crz mutation in the Genetic Manipulation subsection, ensuring clarity for readers.

      These results confirm that the Crz knockout phenotype is fully recessive and that one functional copy of the Crz gene is sufficient to maintain normal post-mating responses—supporting our conclusion that CRZ signaling is required for mediating female PMR.

      We thank you again for raising this important point, which has strengthened the genetic rigor of our study.

      (13) Figure 1, Supplementary 1: I do not understand why the authors point out the fact that these are Protostomia. These are all Arthropoda, there is not a single species outside this Phylum. Caerostris darvini should be Caerostris darwini.

      Thank you for this feedback regarding Figure 1 and Supplementary Figure 1. We fully agree and have addressed these issues in the revised manuscript:

      (1) Removal of the "Protostomia" designation

      We have deleted all references to Protostomia from the figure legends and associated text.

      (2) Spelling correction of Caerostris darwini

      We apologize for the typographical error in the species epithet. We have corrected the misspelling Caerostris darvini to the taxonomically accurate Caerostris darwini (Darwin's bark spider) across all instances in Figure 1, Supplementary Figure 1, and their corresponding legends. We have also cross-checked all other species names in the manuscript to eliminate similar typographical errors.

      (14) Line 299: CRZ expression: I found this confounding, given that the authors were talking about the expression of the gene. I would use the term localization, referring to the protein/peptide (is it what the authors were pointing at?).

      To resolve this ambiguity, we have revised Line 299 to replace CRZ expression with CRZ peptide localization, which accurately describes the experimental focus (immunofluorescence staining and confocal imaging of the CRZ protein). We have also cross-checked the entire manuscript to standardize this terminology:

      We use Crz gene expression exclusively when referring to transcriptional analyses (e.g., qRT-PCR results).

      We use CRZ peptide localization when describing the spatial distribution of the protein (e.g., immunostaining assays).

      (15) Figure 2C: The expression is relative to...? I would make it explicit on the axis.

      Thank you for this helpful comment. We apologize that the normalization reference was not sufficiently clear in the original version. In the revised manuscript, we now explicitly state that RT–qPCR data were first normalized to the reference genes Actin and 18S rRNA, and then expressed relative to the mean expression level of the tissue showing the highest Crz expression, which was set to 1. We have clarified this information in the figure legend and the Methods section.

      We have revised Figure 2C as follows:

      Updated the Y-axis label to explicitly state the reference: “Relative Crz gene expression”.

      Added a supplementary note in the figure legend to confirm that relative expression values were calculated using the 2<sup>−ΔΔCt</sup> method, with the reference genes serving as internal controls for normalization.

      Additionally, we have cross-checked all other qRT-PCR-related figures in the manuscript to ensure that the reference for relative expression is clearly indicated on the corresponding axes, standardizing this key detail across all gene expression datasets.
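
      For readers unfamiliar with the calculation, the normalization described above can be sketched as follows. This is an illustration only, not the analysis code used for the manuscript. It assumes one Ct value per gene per tissue (the manuscript uses the mean across replicates), averages the Ct values of the reference genes within each tissue, and uses hypothetical tissue names and Ct values.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct):
    """Return 2^-ddCt per tissue, scaled so the highest-expressing tissue equals 1.

    target_ct:    {tissue: Ct of the target gene}
    reference_ct: {tissue: [Ct of each reference gene]}
    """
    # dCt: target Ct minus the mean Ct of the reference genes in the same tissue
    delta_ct = {t: target_ct[t] - mean(reference_ct[t]) for t in target_ct}
    # The tissue with the highest expression has the lowest dCt; use it as calibrator
    calibrator = min(delta_ct.values())
    # ddCt = dCt - calibrator, relative expression = 2^-ddCt
    return {t: 2.0 ** -(d - calibrator) for t, d in delta_ct.items()}


# Hypothetical Ct values, for illustration only
example = relative_expression(
    {"brain": 22.0, "ovary": 25.0, "gut": 24.0},
    {"brain": [18.0, 10.0], "ovary": [18.0, 10.0], "gut": [19.0, 11.0]},
)
print(example)
```

      With this scaling, the calibrator tissue is exactly 1 and every other tissue falls between 0 and 1, which is what a Y-axis labelled as relative expression would display.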

      (16) Figures 3B, E, I, L, M, N: Percentage and proportions, as in Figure 1; furthermore, please provide the minimum number of individuals per replicate. Furthermore, as in Figure 1, the data are proportions, and I would use statistical tests that are studied for this kind of data.

      Thank you for this helpful suggestion. We have reviewed and corrected the Y-axis labels and corresponding numerical ranges in these figures, and we have added the number of replicates and the minimum number of individuals per replicate to the figure legends. In addition, following your recommendation, we have reanalyzed these proportion data using chi-square tests for contingency tables.
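
      As an illustration of the kind of test mentioned above, a chi-square test on a 2×2 contingency table (for example, remated versus not remated in two genotypes) can be sketched as below. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript: it applies no continuity correction, the function name is hypothetical, and the counts are invented for the example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom and no Yates correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for the Pearson statistic of a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 of 40 females remated in one group, 10 of 40 in the other
chi2, p = chi_square_2x2(30, 10, 10, 30)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

      When expected counts are small, an exact test is usually preferred over the chi-square approximation.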

      (17) Figure 3: As in Figure 1, it would be interesting to know which is the behaviour of the heterozygotes.

      Thank you for suggesting to complement the data in Figure 3 with heterozygote phenotypic analyses.

      To address this, we have conducted additional behavioral and physiological assays of heterozygous CrzR knockout females (+/CrzR<sup>M</sup>) and integrated these data into the revised Figure 3 and its legend:

      Phenotypic characterization of heterozygotes: Across all traits measured in Figure 3 (e.g., remating rate and oviposition efficiency), +/CrzR<sup>M</sup> females exhibited no significant differences compared to wild-type females.

      This confirms that the CrzR knockout phenotype is recessive and that one functional copy of the CrzR gene is sufficient to maintain a normal post-mating response (PMR).

      Manuscript updates:

      We added heterozygote data in Figure 3I and J. Accordingly, we updated the Results text to reflect the revised panel labeling.

      We supplemented the figure legend with statistical comparisons between heterozygotes and wild-type groups (using chi-square tests for proportional data).

      We included a brief description of heterozygote phenotypes in the Results section to contextualize the genetic basis of the CrzR-mediated PMR regulation.

      (18) Figure 3 Supplement 1: Can the authors indicate which model for maximum likelihood they chose? Did they perform a pre-test to assess which substitution model was the best for their data?

      Thank you for this critical question regarding the model selection for maximum likelihood (ML) phylogenetic analysis in Figure 3 Supplement 1. We fully agree that specifying the substitution model and validation process is essential for ensuring the reproducibility and rigor of phylogenetic inferences.

      To address this, we have supplemented the manuscript with detailed information on the model selection and validation steps, as follows:

      (1) Substitution model selection

      Prior to constructing the ML tree, we performed a model selection pre-test using the ModelFinder tool integrated in IQ-TREE 2, which evaluates the fit of candidate amino-acid substitution models to the CrzR amino-acid sequence alignment via the Bayesian Information Criterion (BIC). The model selection procedure identified the LG+G model as the best-fit substitution model for our dataset. This model uses the Le and Gascuel (LG) amino-acid substitution matrix and incorporates gamma-distributed rate variation among sites (G) to account for among-site rate heterogeneity.

      (2) Manuscript updates

      We have added this detailed model selection process and the final LG + G model specification to the legend of Figure 3 Supplement 1.

      We have also included information on bootstrap validation (10000 ultrafast bootstrap replicates) to support the node support values reported in the phylogenetic tree.
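
      The BIC-based comparison described under (1) can be illustrated with a short sketch. It is not a reimplementation of ModelFinder: the log-likelihoods, parameter counts, and alignment length below are hypothetical, and treating the number of alignment sites as the sample size is an assumption of this example.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return the name of the candidate with the lowest BIC.

    candidates: {model name: (log-likelihood, number of free parameters)}
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Hypothetical fits of three substitution models to a 400-site alignment
fits = {
    "LG": (-5200.0, 20),
    "LG+G": (-5100.0, 21),
    "WAG+G": (-5150.0, 21),
}
print(best_model(fits, n_sites=400))
```

      The extra gamma shape parameter in LG+G costs one additional ln(n) in the penalty term, so the model is preferred only when its likelihood gain outweighs that cost.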

      (19) Figure 4 Supplement 1: I would be explicit about what it is relative to (which gene).

      Thank you for this helpful comment. In the revised manuscript, we now explicitly state that RT–qPCR data were first normalized to the reference gene Actin, and then expressed relative to the mean expression level of the tissue showing the highest CrzR expression, which was set to 1. This normalization strategy provides a robust and biologically representative reference. We have clarified this information in the figure legend and the Methods section.

      (20) Line 518 and Line 525 and Figure 5: The authors show that injection of CRZ and RNAi of crz or mutant crz has the same effect on male fitness. How do the authors explain this contradiction? The CRZ injection should activate the pathway, and crz RNAi and mutant crz should inhibit the pathway, but nevertheless, they have the same effect. I would probably test the expression of some of the genes whose expression is altered in crz mutant males (next paragraph) to see if an altered CRZ signalling pathway (both ways) might affect gene expression in the MAGs in the same way.

      Thank you for raising this important point. As explained above, we have removed all data related to CRZ function in male BPHs from the current version.

      (21) Figure 5, Figure 7: As in Figures 1 and 3, please pay attention to the percentages and proportions and the statistical tests.

      Thank you for pointing out these issues. We have carefully reviewed and corrected the percentage/proportion labeling in the relevant figures, including the Y-axis descriptions and numerical ranges, as well as revised the corresponding figure legends. In addition, we have reanalyzed the data using statistical tests appropriate for proportion data. All corresponding revisions have been incorporated into the updated manuscript.

      (22) Line 728-745: As already stated for the abstract, the male effect of crz is, to me, a side product, and I am not sure the male crz signalling has something to do with the female crz signalling. It is interesting, nobody showed that CRZ affects expression in the MAGs, but this is not the main message of the paper, and it confuses the reader. I would reduce the discussion about this aspect and move it to the end, but this is my own take.

      We have removed all data related to CRZ function in males for the reasons outlined above.

      (23) Material and methods/results: as a general suggestion, I would be explicit about the timing of receptivity inhibition in the species. I've seen the authors have established this in precedent work, and I would refer to that work and make the reader aware of how the receptivity works in the species (i.e., that it is not permanent and lasts for a few days after first mating). This allows a better understanding of the experimental design.

      Thank you for this valuable and constructive suggestion. We fully agree that explicitly describing the timing of receptivity inhibition in Nilaparvata lugens, and linking it to our earlier work, will strengthen the rigor and clarity of the manuscript.

      To address this, we have revised the Materials and Methods and Results sections as follows:

      (1) Materials and Methods (Experimental Design subsection)

      We have added a dedicated paragraph that explicitly defines the temporal dynamics of post-mating receptivity inhibition in N. lugens, with direct reference to our prior work[1]. The text clarifies:

      “In N. lugens, mating induces a transient suppression of female receptivity that is not permanent. Females typically start to regain remating willingness 72 h after the first mating, as documented in our previous study[1]. This temporal window guided the design of our remating assays, in which females were paired with naive males at 48 h post-initial mating to capture both the suppressed and recovered phases of receptivity.”

      (2) Results (Post-mating Receptivity section)

      We have incorporated a brief contextual sentence at the start of the section to reinforce this key species-specific trait, ensuring that readers connect our assay timings to the temporal dynamics of receptivity in N. lugens.

      These revisions ensure that the rationale behind our experimental timing is transparent and well-supported, allowing readers to fully grasp how our assays were tailored to the biological characteristics of N. lugens.

      (24) Line 854: There is a typo "CRZ peptide. virgin female", the dot should be a comma.

      We have revised Line 854 to correct the punctuation: the dot has been replaced with a comma, resulting in the phrasing "CRZ peptide, virgin female". In addition, we have changed the wording in this sentence to ensure scientific rigor and to avoid colloquial expressions.

      (1) Zhang, Y.J., Zhang, N., Bu, R.T., Nässel, D.R., Gao, C.F., and Wu, S.F. (2025). A novel male accessory gland peptide reduces female post-mating receptivity in the brown planthopper. PLoS Genet. 21, e1011699. doi:10.1371/journal.pgen.1011699.

  2. Mar 2026
    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

      We thank the editors and reviewers for their expert assessment of our findings and their interest in our conceptual framework. Below we respond to the specific reviewer and editor comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Thank you for your generous description of our study and the manuscript.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees.

      The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      We thank the reviewer for pointing out that we could improve our explanation of the different responses to July and August cooling in experiment 1. Whilst we incorporated this in the conceptual model and the figure caption (Fig. 1b), we now also address this topic in more depth in the discussion section, focussing on daylength and photosynthetic assimilation as the possible mediators of this change in responses (L350-371).

      For the early-season development effect vs the late-season temperature effect we can use the leaf-out day-of-year (as a proxy for development), and the summer cooling treatments (direct temperature effect) to assess the relative importance of these two components of our model. We have now included a variance partitioning analysis following this logic, see L246-252 for methods, L278-281 for results.
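
      The logic of partitioning explained variance between two predictors can be sketched as follows. This is an illustration under stated assumptions and not the analysis reported in the manuscript: it uses the standard two-predictor formula for R² from pairwise correlations, codes the cooling treatment as a 0/1 variable, and the variable names and data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partition_variance(y, x1, x2):
    """Split the R^2 of y ~ x1 + x2 into unique and shared fractions."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    # R^2 of the two-predictor linear model, expressed through correlations
    r2_full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    unique_x1 = r2_full - r2 ** 2  # what x1 explains beyond x2
    unique_x2 = r2_full - r1 ** 2  # what x2 explains beyond x1
    shared = r2_full - unique_x1 - unique_x2
    return {"full": r2_full, "unique_x1": unique_x1,
            "unique_x2": unique_x2, "shared": shared}


# Hypothetical data: late leaf-out (1) and summer cooling (1) both shift bud set
leaf_out_late = [0, 0, 1, 1]
cooling = [0, 1, 0, 1]
bud_set_shift = [0.0, 1.0, 2.0, 3.0]  # equals 2 * leaf_out_late + cooling
print(partition_variance(bud_set_shift, leaf_out_late, cooling))
```

      In a balanced design such as this toy example the two predictors are uncorrelated, so the shared fraction is zero and the unique fractions sum to the full R².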

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      This question may reflect a misunderstanding regarding the light availability that we hope to address with improved clarification. The duration and intensity of the lighting in these experiments were always set to reflect the average conditions experienced in Zurich for those respective times of the year. Day length in spring is shorter than it is in summer, so the durations were simply adjusted to reflect this reality. The 13-hour, 4,300 lux conditions in experiment 1 were only for the April-May period, when we reduced developmental rates for the late-leafing trees (L125-129). In July, the photoperiod was set to 16 hours and light intensity was approximately 7,300 lux (L150-154). This is comparable to experiment 2 (when treatments were applied in June and July), where photoperiod was 16 hours and light intensity approximately 6,900 lux (L206-207). These conditions reflect the average daylengths in Zurich, and the maximum light intensity output by the chambers.

      As mentioned in our initial author response, we do not think small differences in soil moisture levels should influence our conclusions. All pots were watered sufficiently to avoid water deficit, and all efforts were made to minimise differences in water availability. A Tukey honest significant difference test showed that only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate, difference = 6%, p < 0.05) had significantly different soil water content, a pair whose responses are not compared. We have added words to this effect in the figure legend of Fig. S1.

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      We agree that photoperiod likely plays a central role. Our conceptual model (Fig. 1) explicitly incorporates photoperiod as the framework within which temperature responses are regulated (L72-75, L627-629 & L638-641). The Solstice-as-Phenology-Switch hypothesis assumes that the annual progression of daylength sets the physiological “window” for trees’ responsiveness to temperature. Our experiments therefore focused on how temperature responses differ before versus after the solstice, while recognising that this reversal is likely enabled by the photoperiod signal. In other words, photoperiod provides the regulatory backdrop, and our results identify how diel and seasonal temperature cues are interpreted within that photoperiodic framework.

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”; L409-411) and explicitly call for follow-up studies across species and forest contexts (L413–414). At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Thank you for the kind comments. We appreciate your concerns regarding the severity of our treatments and the generalisability of our results, and you can find our detailed responses below.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants (L129-L133). We have added text in the Methods to clarify this aim (L129-131 & L156-161).

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods (L146-148) and Discussion (L345-346).

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions (L141-145).

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that including more data on photosynthetic assimilation would be valuable for interpreting phenological responses. Indeed, it was our intention to collect this information. However, unfortunately, we experienced technical challenges with the equipment available to us during the experimental period, which prevented us from collecting a full dataset. Nevertheless, we were able to obtain measurements during pre-solstice cooling (now presented as Fig. S12, including data for all treatments), which show that cooling treatments strongly reduced assimilation rates compared to controls. Importantly, these strong reductions occurred across all cooling treatments, yet their phenological outcomes differed markedly, demonstrating that assimilation alone cannot explain the observed responses. As we discuss, our findings are consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1 (comment 4), our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously (L409-411) and highlight the need for further research across species (L413–414).

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season - budset, end of radial expansion, leaf coloring, etc - relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker (L179-195 for methods, L296-311 for results). On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, bud set occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R<sup>2</sup> = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech” (L1-2).

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. As mentioned above (see Reviewer 1 comment 3), photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

      We interpret this concern as relating to the flexibility in reversal timing that we observed. Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21. Rather, the hypothesis implies that reversal occurs around the solstice, when photoperiod cues cause individual trees to shift from accelerating to decelerating their seasonal development. Our conceptual model (Fig. 1) explicitly incorporates this flexibility by showing how the timing of the reversal depends on developmental speed: individuals that develop more slowly (or leaf out later) cross the compensatory point later in the summer, whereas fast-developing individuals reach it earlier.

      Our experiments support this framework: pre-solstice full-day cooling delayed bud set, whereas post-solstice full-day cooling advanced it, with differences between early- and late-developing individuals consistent with the model. Moreover, the contrasting impacts of daytime vs. night-time cooling demonstrate how diel conditions can further shape when the reversal is expressed. Thus, rather than contradicting the Solstice-as-Phenology-Switch hypothesis, our findings reinforce it and extend it by showing how flexibility arises from interactions between developmental progression, diel temperature responses, and photoperiod.

      We have added an additional section in the Discussion that elaborates on how our results support the Solstice-as-Phenology-Switch hypothesis (L416-432).

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      (1) The current strength of evidence is incomplete. Extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses could make the conclusions more solid.

      We agree with the vast majority of the reviewer comments and have made the relevant edits. We believe that these have dramatically improved the clarity of the manuscript. The revised analyses have not changed our conclusions, though we have toned down generalisations.

      (2) The Solstice as Switch hypothesis is about the effect of temperature warming. However, the two experiments did not simulate warming but rather cooling. Although a temperature difference can be obtained compared to the control in both cases, the impacts on plant physiology and phenology should still be different between the two scenarios.

      Thank you for raising this point, which requires clearer communication in our manuscript. The Solstice-as-Phenology-Switch hypothesis posits that changes in temperature before and after the summer solstice have opposite effects on the autumn phenology of northern forest trees. While the hypothesis has most often been framed in terms of warming, the underlying mechanism concerns whether development is accelerated or slowed relative to ambient conditions. In essence, we are exploring the effect of changes in temperature – not warming per se. In warmer springs, development begins earlier and/or proceeds faster, while in colder springs the opposite occurs; the same logic applies to post-solstice conditions. We have extended our explanation in the Introduction (L69-71).

      In our experiments, we applied cooling to create strong contrasts in developmental rates without damaging the trees. These treatments allow us to test the direction of phenological responses relative to ambient conditions. Thus, although we used cooling rather than warming, the results are directly informative for the Solstice-as-Switch framework, which concerns the relative effect of temperature changes rather than the absolute direction of manipulation.

      (3) The number of groups for bud type and summer temperature treatment is too small to be used as a random effect; it would be more appropriate to treat them as fixed-effect terms.

      We have revised the analysis to include bud type as a fixed effect. This produced only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9, see L271), and the inferences are not altered. We also report the bud type effects for experiment 1 (L262-266) and experiment 2 (L292-293).

      (4) Please add more clarifications for Figure 4 about what this figure is for and how you derived this figure, whether the data were from your experiments or others.

      We have rewritten the caption for Figure 6 (Fig. 4 in the previous manuscript) to clarify where the data came from and how the figure was generated (L687-693). This figure serves as a visual guide to aid the understanding of the processes that may govern the patterns we have observed. Figure 6a uses data from previous studies on diel patterns in F. sylvatica, specifically growth (Zweifel et al., 2021) and photosynthetic assimilation rates (Urban et al., 2014). To aid visualisation, we linearly interpolated between measurement points, converted the values to a relative percentage (compared to the observed maximum), and then smoothed the resulting curves. Based on the evidence from experiment 2, we suggest there may be a temperature threshold below which overwintering responses (e.g. bud set) are induced in F. sylvatica. Figure 6b depicts a theoretical diel pattern of this potential threshold. In simple terms, the threshold must be lower at night because nights are typically colder than days.
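
      A minimal sketch of the processing steps described above for Figure 6a (linear interpolation, conversion to a percentage of the observed maximum, smoothing). It is illustrative only: the sample values are hypothetical, the function names are placeholders, and the centred moving average is an assumed smoother, since the response does not state which smoothing method was used.

```python
from bisect import bisect_right


def interpolate_linear(times, values, query_times):
    """Linearly interpolate (times, values) at each query time.

    Assumes `times` is sorted ascending and every query lies within its range.
    """
    out = []
    for q in query_times:
        # Index of the right-hand neighbour, clamped so q == times[-1] works.
        hi = min(max(bisect_right(times, q), 1), len(times) - 1)
        lo = hi - 1
        frac = (q - times[lo]) / (times[hi] - times[lo])
        out.append(values[lo] + frac * (values[hi] - values[lo]))
    return out


def to_percent_of_max(values):
    """Express each value as a percentage of the observed maximum."""
    peak = max(values)
    return [100 * v / peak for v in values]


def moving_average(values, window):
    """Centred moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical diel measurements (hour of day, arbitrary units).
hours = [0, 6, 12, 18]
rates = [1, 4, 2, 1]

hourly = interpolate_linear(hours, rates, list(range(0, 19)))
curve = moving_average(to_percent_of_max(hourly), window=3)
```

      The order of operations follows the description in the response; applying the smoother before the percentage conversion would give a slightly different curve because the observed maximum changes.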

      Reviewer #2 (Recommendations for the authors):

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect, so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      See point (3) in reviewing editor’s recommendations for the authors.

      (2) Could the authors move the methods earlier and remind readers of them in the results?

      We have addressed this issue; please see the detailed response under reviewer 2’s concerns.

      Urban O, Klem K, Holišová P, Šigut L, Šprtová M, Teslová-Navrátilová P, Zitová M, Špunda V, Marek MV, Grace J. 2014. Impact of elevated CO2 concentration on dynamics of leaf photosynthesis in Fagus sylvatica is modulated by sky conditions. Environmental Pollution 185: 271–280.

      Zweifel R, Sterck F, Braun S, Buchmann N, Eugster W, Gessler A, Häni M, Peters RL, Walthert L, Wilhelm M, et al. 2021. Why trees grow at night. New Phytologist 231: 2174–2185.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Naim et al. use genetically engineered mouse models and tissue culture cell lines to investigate the role of the SLAP adaptor protein in colonic epithelium and colon tumour formation. The SLAP adaptor protein is known to be a negative regulator of tyrosine kinase signaling in hematopoietic cells, but its role outside the immune system is less well defined. Here, the authors use genetically engineered SLAP-deficient mice, tissue-specific SLAP KO, and colonic organoids to demonstrate that SLAP is expressed in cells of the colonic epithelium, where it acts as a cell-autonomous regulator of proliferation and differentiation. In addition, they provide biochemical evidence that loss of SLAP expression in cultured colonic organoids results in increased Src family kinase activity and global tyrosine phosphorylation, consistent with its known role as a suppressor of tyrosine kinase activity in immune cells. Consistently, treatment with an SRC kinase inhibitor inhibited the growth of SLAP-deficient organoids. These data provide solid evidence of a cell-autonomous role of SLAP in the colonic epithelium.

      This work would be improved by further description and interpretation of the SLAP expression pattern shown in the constitutive and tissue-specific KO to further support the conclusions made. In Supplementary Figure 1, magnification of the colon epithelium areas with SLAP expression shown by b-gal and anti-SLAP staining, highlighting regions of interest, would better support the conclusions regarding SLAP expression in specific regions of the colon epithelium. In Supplementary Figure 1B, the authors should indicate that the SLAP staining referred to is epithelial and in resident immune cells, as is mentioned in the text. Also, magnification of the boxed area of LGR5 staining in Figure 1 would improve this figure.

      We thank the reviewer for their positive and constructive evaluation of our work.

      We agree that a more detailed description and visualization of SLAP expression in the colonic epithelium would strengthen our conclusions. In response, we will revise Fig 1 and S1 to better highlight SLAP expression patterns. Specifically, we will include higher-magnification images of the colonic epithelial regions in Suppl Fig 1, with clearly indicated regions of interest. We will also clarify in the legend of Suppl Figure 1B that SLAP staining is observed in both epithelial and resident immune cells, as described in the text. Additionally, we will provide a magnified view of the boxed area showing LGR5 staining in Figure 1 to improve clarity.

      Using a chemically induced model of colitis-associated cancer, the authors demonstrate that inactivation of SLAP shows a trend toward increased tumor formation (though this did not reach significance) as well as increased Src family kinase activity within tumors. Tumor spheres from SLAP-deficient animals showed enhanced growth that was suppressed by treatment with a Src family kinase inhibitor. Of note, the latter effect was specific to SLAP-deficient tumor spheres. These observations are convincing and support the authors' conclusion that SLAP has a tumor suppressor role in CRC through inhibition of SFK signaling.

      Mechanistically, elevated expression of the RTK, EphB2, was detected in immunoblots of SLAP KO colonic crypts, while overexpression of SLAP in CRC cell lines downregulated EphB2 protein levels. Using an EPHB2 inhibitor, the role of EPHB2 in the growth of SLAP-deficient colonic organoids was demonstrated. While these data generally support the authors' conclusion that SLAP limits colonic organoid growth by downregulating RTKs such as EphB2 and downstream Src family kinase activity, they do not show which cell types/regions in the colonic epithelium have increased EPHB2 protein and how this relates to SLAP and phospho-SRC expression, as shown in Figure 1 and Figure S1 immunocytochemistry. The expression of EphB2 and its role in colonic tumorsphere growth were not investigated.

      Overall, this work provides evidence of SLAP adaptor function in restricting tyrosine kinase signaling in the colonic epithelium, and suggests that loss of SLAP expression could promote tumorigenesis in this context.

      We also thank the reviewer for their positive comments regarding our tumor studies and the role of SLAP in regulating SFK signaling.

      Regarding the mechanistic insights involving EphB2, we appreciate the reviewer’s suggestion to further define its spatial expression and relationship with SLAP and phospho-SRC. To address this, we plan to extend our analysis to assess the effect of Slap depletion on EphB2 protein levels throughout the intestinal epithelium.

      We recognize that directly testing EphB2’s role in murine colonic tumorsphere formation would require a new cohort of SLAP knockout mice treated with AOM/DSS for 90 days, which is not feasible in the short term. To address this, we will instead use human colorectal cancer models to assess how SLAP modulation affects the response of tumoroids derived from cell lines to EphB2 inhibition, providing complementary mechanistic insights.

      Overall, we believe these additions will strengthen the manuscript and more fully address the reviewer’s concerns.

      Reviewer #2 (Public review):

      Summary:

      Protein tyrosine kinases are subject to diverse regulatory mechanisms controlling their activity in normal situations. The authors previously identified SLAP (Src-like adaptor protein), a negative regulator of receptor tyrosine kinase (RTK) signaling, as a key suppressor of the cytoplasmic tyrosine kinase SRC in the normal colon and demonstrated that SLAP is downregulated in a majority of colorectal cancers (CRCs).

      In this study, the authors further explored SLAP functions in mouse models using constitutive and inducible epithelial-specific Slap deletion (villin-CreERT2 model). They found that loss of SLAP augments colonic epithelial cell proliferation and that induction of tumorigenesis by the AOM/DSS protocol mimicking CRC leads to more aggressive tumors in the absence of SLAP. This effect is apparently cell-autonomous as growth of normal and tumoral colonic organoids is SLAP-dependent in in vitro settings. Finally, the authors define that, in colon, SLAP represses EphB2, an RTK lying upstream of SRC, and show that inhibitors of EphB2 can partially limit tumorigenic development in vitro.

      Strengths:

      The manuscript is clearly and concisely written, making it easy to follow. The data obtained in the mouse models are very convincing.

      Weaknesses:

      Direct evidence that EphB2 is activated/phosphorylated in the absence of SLAP is lacking, as conclusions are only based on results obtained with inhibitors. Some other issues have to be addressed before acceptance, in particular, the relevance of the findings in CRC patients.

      We thank the reviewer for their positive and constructive evaluation of our work.

      We agree that our conclusions regarding the SLAP–EphB2–SRC signaling axis rely in part on pharmacological inhibition. As outlined in the manuscript, EphB2 was selected primarily as a proof-of-concept receptor to illustrate how SLAP may indirectly regulate SRC activity through modulation of upstream receptor tyrosine kinases. We note that the use of two distinct classes of EphB inhibitors supports the robustness of our observations.

      To further strengthen this aspect of the study, we will assess EphB2 phosphorylation status in SLAP-deficient conditions, which will provide more direct evidence of its activation state and its contribution to SRC signaling.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analysis to address some of the concerns. We have revised our conclusions and viewpoints accordingly to reflect the reviewer’s concerns.

      (1) Analysis of signaling traces

      Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a more simple analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      We think the benefit of the modeled signaling level is its conceptual accuracy, to the extent possible with the data. It is true that the assumptions it brings in may cause certain biases. We therefore report both this analysis and the simplest one (raw data averaging, Fig.2). Intermediate results (such as the first derivative in Fig.3C) may correlate more or less well, but cannot be interpreted biologically.

      Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence." Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      Yes, the segmentations measure intensity in a fixed volume inside a cell; it is therefore a spatial average (concentration) and is susceptible to cell volume changes. This has been noted in the revision. The raw measurement does fluctuate and can decrease; we think the short-time-scale fluctuations are likely measurement variations/errors rather than large underlying changes in concentration.

      Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited on the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That they are might suggest that the whole analysis is unnecessary in the first place. However, not all are. Thus, it would be important to make this estimate more quantitative.

      Yes, we agree. Unfortunately, we do not have the quantitative data required to better estimate Kaede mRNA stability. The time from Cyc inhibition to the cessation of ptch mRNA production is roughly estimated and is not necessarily precise in this context.
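
      To see why the half-life estimate can matter, consider a generic two-stage reporter sketch. It assumes a stable protein whose accumulation rate is proportional to the mRNA level, and an mRNA that is produced at a signaling-dependent rate and decays with a fixed half-life. This is an assumption made for illustration and is not necessarily equation (1) of the manuscript; the function name, the unit production constant, and the example trace are hypothetical.

```python
import math


def infer_signaling(times, reporter, mrna_half_life):
    """Infer a relative signaling input from a stable-reporter trace.

    Assumed model (illustrative only, arbitrary units):
        dK/dt = m(t)                  stable protein made from mRNA
        dm/dt = s(t) - decay * m(t)   mRNA made at rate s, first-order decay
    which gives s(t) = K''(t) + decay * K'(t).

    Derivatives use central differences, so `times` must be uniformly
    spaced and only interior time points are returned.
    """
    decay = math.log(2) / mrna_half_life
    dt = times[1] - times[0]
    inferred = []
    for i in range(1, len(times) - 1):
        first = (reporter[i + 1] - reporter[i - 1]) / (2 * dt)
        second = (reporter[i + 1] - 2 * reporter[i] + reporter[i - 1]) / dt ** 2
        inferred.append((times[i], second + decay * first))
    return inferred


# Hypothetical monotonically increasing reporter trace, K(t) = t^2.
times = [0, 1, 2, 3, 4]
trace = [t ** 2 for t in times]

# Compare the inferred input under several assumed half-lives (same time units).
for half_life in (0.5, 1.0, 2.0):
    print(half_life, infer_signaling(times, trace, half_life))
```

      In this sketch the decay term scales the first derivative, so a shorter assumed half-life weights the slope of the trace more heavily relative to its curvature. That is one way an imprecise half-life could change an inferred signaling profile while leaving coarse features largely intact.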

      (2) Assignment of fates and correlations

      Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      This is a very insightful point. We did examine the posterior data again (cross-checked by 2 co-authors) to make sure the mixed situation has correct cell fate assignment. As established by others’ and our previous studies (See also Fig.1A), the identification of MFPs and LFPs in zebrafish spinal cord is very robust. The MFPs are the apical constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to MFP on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig.2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig.2D’) that we checked carefully. The heterogeneity argument is based on the verified tracking and final positioning of these cells.

      Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      We agree. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to enrich the tracks for this revision. We are aware of upcoming, independent studies with many more systematic tracks and analysis which will address these concerns. We have added the caveats the reviewer raised.

      Signaling vector and hand-picked metrics: As an alternative approach, that might be better suited for their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al, 2014, PMID: 25504722). Have the authors tried approaches that make use of the whole trace, such as simple classifiers (Granados et al, 2018, PMID: 29784812), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound to how accurately cells can make cell-fate decisions based on signaling dynamics.

      Thanks for these suggestions. Measurement noise, the coverage window of the traces, and the number of tracks limit our ability to use the full dynamics in a more informative manner.

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here. We added this point to make our presentation more balanced.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      We’d refer readers to our earlier study Xiong et al., 2013 where ptch2:kaede, nkx2:gfp and olig2:gfp were plotted against position over time in single cell tracks. It was found that position was not a good predictor of signaling levels or cell fates at early stages when the cell fates were specified.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seems consistent with differences in timing of cell-fate decisions between anterior and posterior cells. The authors show that fate initially does not correlate well with position in the posterior neural tube. So, signaling dynamics should likely also not, as they should rather be a function of position, given they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression segregates the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, it can thus not be expected that signaling dynamics correlate better with cell fate than the reporter "83%". Can the authors please discuss this possibility in the paper?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between these. However, signaling dynamics are not necessarily a good function of position or time either; there is no evidence for that in our results here. The 83% correlation is thus striking for the posterior progenitors, indicating a certain robust logic in the GRN to capture a strong (even short-lived) response to Shh, regardless of position or time. This is an interesting possibility (we do not claim it as a mechanism, as we have not tested it with perturbations) that challenges the prevailing view in the field that these progenitors integrate Shh exposure over time, or that they acquire positional information by reading a gradient.

      The discussion has been modified to be more nuanced about these points.

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

      We quite agree. Together with the reviewer, we look forward to the publication of recent, independent progress by other colleagues that overcomes the challenges in our work.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single cell resolution. This additional analysis is in line with previous observations from these authors, namely in Xiong, 2013. Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is a convincing work, however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analysis to address some of the concerns. We have revised our conclusions and viewpoints accordingly to reflect the reviewer’s concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Minor comments:

      y-axis label suddenly changes to Ptch2-reporter level in Figure 5. Is what is plotted different from what is seen as examples in Figure 3?

      Thanks! The Figure 5 tracks are the same as in Figure 3B; this has been annotated in the figure legends.

      There are random bounding boxes in some of the figures.

      Sometimes the m in "More dorsal" is stylized with a capital M and sometimes not. It is somewhat confusing as a name for cell types but it is fine if no alternative can be found.

      This study unfortunately does not include markers that distinguish the interneurons dorsal to pMNs. We categorized them collectively as “more dorsal”.

      Response-time is defined as "the amount of time with an above-basal Shh response". This seems to me to be the definition of response duration. I would assume that response-time means the time it takes until a response is first observed. Please consider changing this.

      We did not use “duration” because a response time course recorded in these tracks may include multiple durations (on and off). The duration of exposure/response has been specifically used in the field as a single period of response. So it’s a sum of active responding time here. Clarified in the text.
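
      The distinction between total responding time and a single response duration can be made concrete with a short sketch. The trace values, the basal threshold, and the function name are hypothetical, and the rule that a sampling interval counts as responding when the level at its start exceeds basal is an assumption for illustration.

```python
def summarize_response(times, levels, basal):
    """Return (total time above basal, longest single above-basal episode).

    A sampling interval [t_i, t_{i+1}) is counted as responding when the
    level measured at t_i is above `basal` (an illustrative convention).
    """
    total = 0.0
    longest = 0.0
    current = 0.0  # length of the episode currently in progress
    for i in range(len(times) - 1):
        interval = times[i + 1] - times[i]
        if levels[i] > basal:
            current += interval
            total += interval
            longest = max(longest, current)
        else:
            current = 0.0  # the response switched off; the episode ends
    return total, longest


# Hypothetical track with two separate response episodes (on, off, on).
times = [0, 1, 2, 3, 4, 5, 6]
levels = [0, 2, 2, 0, 2, 0, 0]

total, longest = summarize_response(times, levels, basal=1)
print(total, longest)  # total = 3.0, longest = 2.0
```

      For a track that switches on and off, the summed quantity (the response time as defined in the response above) exceeds the longest single episode, which is why the two terms are not interchangeable.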

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors address several possible setbacks of transforming the measured fluorescence intensity of the patched reporter into a readout of the Shh signaling activity over time, however, one aspect that isn't directly addressed is the potential effect of differences in the z position of analyzed cells. These could, at least in principle, be sufficient to introduce significant noise in the fluorescence measurements. Can the authors subset their datasets by initial, as well as average, z position and then re-examine the measured trends for both Shh activity and the intensity of the cell fate reporters used in the study?

      The zebrafish early neural plate/tube has a small thickness in z in dorsal-ventral imaging and the tissue is transparent. Depth-associated scattering contributes very little, if at all, to the fluorescent signals in the imaged time window. This can be seen in the nuclear/membrane signal of the movies, which is largely uniform across z in the neural tissue. It can also be seen that the notochord cells, further ventral, appear to be dimmer.

      (2) It is critical for the validity of this study that the intensity of the patched reporter introduced by the authors in 2012, and used again in this study, faithfully represents the signaling activity of Shh. In this study, the authors provide measurements of the transcriptional rate of Kaede and additional modeling for this purpose. However, an important point is to determine how sensitive the reporter is to changes in Shh signaling of different magnitudes.

      We consider this BAC reporter line a good one (probably still the best live reporter), as it resolves the endogenous gradient up to the dorsal interneuron domains (Huang et al., 2012, Xiong et al., 2013) and responds well to perturbations (Notch, Cyclopamine, etc). It is true, however, that we do not have information on how sensitively it responds to changes of different magnitudes. As far as we know, there is no in vivo, single-cell information on how Shh targets respond to signaling of different magnitudes.

      (3) To strengthen the previous point, it would be nice to extend the analysis in Figure 2, at least partially, using other readouts for Shh activity (e.g. GBS-GFP)?

      We have used a GBS-RFP line previously and found it to be lower resolution in terms of showing the DV gradient, compared to ptch2:kaede.

      (4) It is unclear to me what is the relevant time window during which cells respond to Shh in the anterior versus posterior domains to determine progenitor specification. This is a concern to me, since: i) the average heterogeneity of Shh activity seems to increase strongly in time (Figure 2A/C); and ii) it is important to exclude that the finding of a heterogeneous relationship between Shh activity and fate choices is largely driven by later timepoints, where potentially its activity is no longer relevant for cell fate specification. Can this point be clarified when this data is introduced in the manuscript and further discussed?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between these.

      (i) The ptch2:kaede reporter variability is higher in magnitude at later times (the signal gets brighter), but the heterogeneity (overlap between different cell fate groups) is lower at later times.

      (ii) Similarly, the heterogeneous relationship is more pronounced at early time points. Since we do not know exactly when the activity becomes no longer relevant (from our earlier studies we do think that the cells become specified early, when Shh signaling is noisy), we modelled the response profile and searched for a good predictor. The maximum response stands out, particularly as a good indicator for the posterior cells, suggesting an early window/time of specification.

      Discussion has been modified to clarify these points.

      (5) Is the response of the patched reporter, as well as cell fate reporters, to defined concentrations of exogenously provided Shh heterogeneous, for instance, in in vitro experiments?

      Well-controlled (e.g., microfluidics and labeled Shh molecules) in vitro experiments would be fantastic future directions. Existing tissue explant + Shh dose approaches do not resolve the heterogeneity of exposure at the single-cell level but may be helpful in testing the limits and variabilities at different magnitudes.

      (6) The source of noise in this system is not entirely clear to me. The authors seem to attribute the heterogeneity they observe to the way cells respond to Shh, but can it be excluded that the morphogen profile is itself noisy to start with? It is currently difficult to distinguish between these two possibilities, given that the Shh activity reporter used in this study is itself a transcriptional output of the pathway. Can the distribution of Shh itself be analyzed (even if in immunostainings) during neural tube formation?

      Yes, we fully agree. More quantitative analysis may help dissect the sources of noise. Measuring the morphogen profile (particularly through time) would be very valuable, but currently no reagent is available to achieve that. Studies using an engineered or tagged morphogen suggest that the pattern through the tissue is reasonably consistent with simple diffusion dynamics. However, at the single-cell level considerable randomness may still remain, and it is difficult to compare quantitatively with static staining.

      (7) It is unclear to me how the authors define the ultimate cell fate of cells in their analysis in Figure 6. The brief description in the methods and in the manuscript seems to suggest that, in combination with marker expression, the cell position is used as a criteria to assign the fate to the progenitors - if this is the case, I guess the observed relationship in Figure 6 with LMDV distance is almost a control? This could be clarified for the readers.

      Yes, indeed, Figure 6 is a control, as LMDV distances lead to final positions, which form part of our determination of cell fates.

      As established by others’ and our previous studies (see also Fig. 1A), the identification of MFPs and LFPs in the zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to the MFPs on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig. 2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig. 2D’) that we checked carefully.

      The methods of fate determination are described in detail in the Methods.

      (8) The graphs in Figures 6 and 7 are difficult to interpret. What proportion, and absolute number, of cells are "mis specified" when the authors show the distinct colored lines in the pMN, LFP or more dorsal domains? How do the authors determine where each cell fate domain begins and ends to access for "mis-specified" cells? Can the authors also provide the corresponding experimental images in the figure?

      We apologize for the difficulty in interpreting these figures. The graphs are a ranked list of all cells using the specified metric. The visual is meant to help build an intuition of how mixed vs clear-cut the pattern is given the tested metric. They are not to be interpreted as the actual pattern in the tissue, and there are no data images that show these patterns.
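      As a rough illustration of what such a ranked list conveys, the sketch below sorts cells by a metric and counts how often the fate label changes between neighbours in the ranking. The cell records, metric values, and the `count_fate_switches` helper are hypothetical and are not taken from the manuscript's analysis; a perfectly clear-cut ranking of three fates would give exactly two switches, and a more mixed one gives more.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    cell_id: str   # hypothetical identifier
    fate: str      # e.g. "MFP", "LFP", "pMN"
    metric: float  # hypothetical value, e.g. a maximum reporter response


def rank_cells(cells: List[Cell]) -> List[Cell]:
    """Return cells ordered from the highest to the lowest metric value."""
    return sorted(cells, key=lambda c: c.metric, reverse=True)


def count_fate_switches(ranked: List[Cell]) -> int:
    """Count neighbouring pairs in the ranking whose fates differ."""
    return sum(1 for prev, cur in zip(ranked, ranked[1:]) if prev.fate != cur.fate)


# Made-up example values, for illustration only.
cells = [
    Cell("a", "LFP", 0.6),
    Cell("b", "MFP", 0.9),
    Cell("c", "pMN", 0.2),
    Cell("d", "LFP", 0.3),
    Cell("e", "pMN", 0.5),
]

ranked = rank_cells(cells)
print([c.fate for c in ranked])
print(count_fate_switches(ranked))
```

      In this made-up example one pMN cell ranks above an LFP cell, so the fate sequence is mixed rather than forming three clean blocks.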

      (9) Given the experimental limitations/technical challenges discussed by the authors during the paper, the score of around 90% of predictability of cell fate choices is rather high in the anterior domain, suggesting a minor functional role for heterogeneity in this region. Even for the posterior domain, the score of 83% predictability based on the maximum response to Shh is still relatively high. In my view, this author's conclusions should be adjusted to make this difference clearer in the abstract and discussion, highlighting that the heterogeneity between Shh response and cell fate choices, particularly in the pMN fate, are stronger in the posterior domain affecting the precision of cell fate decisions particularly in this region. Can the authors further comment on potential mechanisms driving this difference?

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here.

      We have added to the Discussion the point that the Shh response is still the main determinant of the pattern despite the heterogeneity. We also further discussed possible explanations for the anterior-posterior differences.

      (10) Following up from the previous point, the data in Figure 7 suggests that there might be different underlying mechanisms in how anterior and posterior cells interpret the Shh profile, with anterior cells potentially responding to the integrated concentration of Shh (since response time, average response, or maximum response to Shh all provide similar predictability scores for cell fate choices). In contrast, only the maximum response to Shh can provide a good prediction of posterior cell fate, consistent with a more instantaneous response to morphogen concentration (and thus potentially more error-prone measurement of the Shh profile?). This is a very interesting observation in my view. Could this be further tested?

      Thank you. Yes, we found this very interesting too. We discussed the possibilities, including the reviewer’s suggestion that these cells may have different contexts or strategies for interpreting the signal. It is also possible that the anterior cells use the same strategy (maximum response at an early time) and that the subsequent response/duration does not matter to their fate commitment. A precise approach to shut down Shh response dynamics in single cells (e.g., optogenetics) will enable these ideas to be tested. We hope follow-up studies will take such approaches.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Conceptual framing and interpretation:

      The central conclusion may require more precise framing to avoid potential overreach. The authors' interpretation equating "physical distance between TAD boundaries" with overall "TAD boundary architecture," and "transcriptional bursting events" with broader "gene activity," could benefit from clarification. This framing may not fully capture the temporal dynamics of transcription or the regulatory complexity within TADs. Furthermore, the broad conclusion of an uncoupled relationship appears to challenge extensive prior evidence from perturbation studies showing that disrupting TAD boundaries can alter gene expression. The authors' own observation of reduced gene activity upon RAD21 degradation suggests that global TAD disruption can affect transcription. A more precise and limited conclusion, acknowledging that their data demonstrate a lack of detectable correlation between boundary distance and bursting activity in their system, would be more accurate and help reconcile these findings with the existing literature.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions and avoid overreach. We have also added, on p. 16 of our Discussion, a separate section on the limitations of the study, noting that our conclusions are limited to TAD boundary distances and do not reflect the structure of TAD boundaries or of TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      (2) Technical methods and data presentation:

      (2.1) Accuracy and dimensionality of distance measurements: The manuscript does not clearly state whether distances are measured in 2D or 3D, nor does it sufficiently address precision limits. The stated Z-step size (1 µm) may be inadequate for accurately measuring sub-micron chromatin distances in 3D.

      We state in both the Results and Methods that our data represent 2D distances derived from maximal-intensity projections of 3D image stacks. We previously published a detailed analysis of the precision of this measurement approach applied to chromatin interactions and documented the effect of 2D vs 3D analysis on these types of measurements. This study by Finn et al., 2022 is cited in the text. We also show in Figure S3 and mention on p. 6 and 10 that we observe similar results using either 2D or 3D analysis.
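      To make the 2D vs 3D distinction concrete, the sketch below compares the distance between two spot centroids measured in 3D with the distance measured after collapsing the z axis, as a maximal-intensity projection does. The coordinates are made-up values in µm and the function names are hypothetical; this is not the analysis pipeline used in the study. By construction the projected distance can never exceed the 3D distance.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in µm


def distance_3d(p: Point3D, q: Point3D) -> float:
    """Euclidean distance using all three coordinates."""
    return math.dist(p, q)


def distance_2d_projected(p: Point3D, q: Point3D) -> float:
    """Euclidean distance after dropping z, as in a projection along z."""
    return math.dist(p[:2], q[:2])


# Hypothetical spot centroids, for illustration only.
spot_a: Point3D = (0.0, 0.0, 0.0)
spot_b: Point3D = (0.3, 0.0, 0.4)

d3 = distance_3d(spot_a, spot_b)
d2 = distance_2d_projected(spot_a, spot_b)
print(f"3D distance: {d3:.2f} µm, projected 2D distance: {d2:.2f} µm")
```

      How far the two measures diverge in practice depends on the axial separation of the loci and on the z-sampling of the image stack.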

      (2.2) Probe design and systematic error: The genomic coverage size of the BAC probes used for DNA FISH is not explicitly stated. Large probe coverage could inherently blur the precise spatial location of adjacent DNA loci. The reported average distance (~300 nm) may be influenced by the physical size of the probes, as well as systematic expansion or distortion introduced by sample fixation and FISH processing. Although such technical limitations are currently unavoidable, the authors should clarify how these factors might affect their ability to detect subtle distance changes.

      The genomic location and size of all probes are provided in Supplementary Table 1. We deliberately use relatively large BAC probes both to generate robust, highly reproducible signals and to eliminate effects arising from local chromatin behavior. In line with earlier characterization of BAC probes (Finn et al., Cell, 2019; Finn et al., Methods, 2022), we find a strong correlation between micro-C/Hi-C interaction frequency and distance measurements. Systematic errors such as those from sample fixation and FISH processing have previously been evaluated by comparison to live-cell data (see Finn et al., 2019) and found to be negligible, especially as all our analyses involve pairwise comparisons, in which both measurements would be similarly affected by systematic errors. We discuss resolution limits due to probe size in our new section on study limitations on p. 16.

      (2.3) Data Visualization: The manuscript would benefit from including representative, zoomed-in regions of interest from the raw imaging data. This would allow readers to visually assess measured distance differences against background noise.

      Raw images for inspection at any magnification are available at https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      (2.4) Potential impact of resolution limits: In Figure 5, the micro-C data reveal a clear difference in interaction patterns inside versus outside the VARS2 locus TAD, yet the imaging data show no corresponding distance difference. This strongly suggests that the current imaging system, limited by optical resolution, probe size, and localisation accuracy, may be unable to resolve finer-scale spatial reorganizations associated with specific chromatin conformations (e.g., enhancer-promoter loops). The authors should explicitly discuss that their conclusion of "no coupling observed" may be constrained by the resolution and sensitivity of their method and does not preclude the possibility of detecting such associations with higher-precision measurements or in live-cell dynamics.

      We generally see good agreement between micro-C/Hi-C data and distance measurements. Specifically, we consistently find closer proximity of boundaries than non-boundaries and larger boundary distances for larger TADs than for smaller ones, as presented throughout the study. Contrary to the reviewer’s statement, this is also true for the VARS2 TAD, where we find statistically significant shorter boundary distances for boundary probes (350 nm) vs the outside control region (390 nm), which correlates with the difference in micro-C interaction score of 5847 vs 2308. These data are shown in Figure 3. Regardless, we mention the issue of resolution due to probe size in the study limitation section on p. 16.

      Reviewer #2 (Public review):

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences). In addition, the correlation analysis is asymmetric, primarily stratifying by transcriptional status and then comparing boundary distances. Given the central claim that boundary architecture does not influence gene activity, the analysis should be done from the opposite perspective (stratifying by boundary distance).

      We mention the limitations on the resolution of our approach in our discussion of study limitations on p. 16. An example analysis stratified by boundary distance is presented in Figure S3C. The conclusion is the same as when stratifying by activity status.

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting but is lacking from the discussion.

      We mention the stronger effect of cohesin depletion compared to CTCF loss on gene expression in multiple locations in the Results and Discussion.

      On a related note, this approach primarily tests the role of boundary interactions rather than domain organization as a whole, and it should be acknowledged that internal domain structures are not directly assessed.

      We have modified statements throughout the manuscript to clearly indicate that our conclusions relate to boundary interactions rather than domain organization as a whole. We also discuss this in our section on study limitations.

      The comparison to work in other organisms (particularly the comparisons made to Drosophila) should be handled with care. The mechanisms underlying domain formation differ substantially across these systems, particularly regarding the differences in CTCF's role.

      We have modified our discussion of the data on Drosophila TADs, particularly as it relates to CTCF.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I couldn't locate the image data from figshare with the information provided (DOI: 10.6084/m9.figshare.30728354)

      The link has been updated

      https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      Reviewer #2 (Recommendations for the authors):

      Some of the conclusions overreach. I recommend revising the claims and discussion to focus solely on the proximity of boundaries, instead of TADs themselves. This would match better with your experiments.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions and avoid overreach. We have also added, on p. 16, a separate section on limitations of our study, noting that our conclusions are limited to TAD boundary distances and do not reflect on the structure of the TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      I do disagree with the interpretation of the data in some parts, particularly at the end, where you state that disruption of TADs does not impact gene activity. For example, "Altogether, these results demonstrate that disruption of TAD boundary architecture is insufficient to alter gene expression" doesn't seem to match the results. Sure, depletion of CTCF minimally impacted gene expression, but it also minimally impacted the boundary distances. I think it is interesting that depletion of RAD21 had a bigger impact on both gene expression and boundary distances, and this should be discussed.

      We have deleted this statement and now mention on p. 13 that RAD21 depletion affected gene expression, whereas loss of CTCF did not, and on p. 15 that loss of RAD21 had a greater impact on boundary distances than loss of CTCF. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      Related to this, I also recommend expanding the discussion of prior live-cell imaging work (ref 32) that showed that the fully formed CTCF loop is a rare event.

      We have expanded the discussion of prior live-cell imaging work in several locations.

      All the analysis is done from the perspective of the gene expression (e.g. group by expression and then measure distances). It would help to show that the inverse analysis is consistent (e.g. group by distances and measure gene expression).

      Analysis of data stratified by distance measurements is shown in Figure S3C.

      The discussion of the Drosophila work is strange, given that CTCF in Drosophila has a very different N-terminus, explaining why it doesn't really form loops. Sure, maybe it contributes to domains in some way, but probably no more than the dozens of other architectural proteins that have been found in that system. This work clearly focuses on CTCF-loop domains, so I would be specific about that. In the introduction, you do a good job of saying "in human cells, TADs are.... marked by binding sites for the CTCF protein". However, then you overgeneralize and state that TADs form via a process of loop extrusion. I think a simple statement before this to say that TADs in human cells have become somewhat synonymous with CTCF loop domains, and that is how you will use the term here. However, other organisms have TADs despite the lack of conservation of the CTCF protein.

      We have modified the text accordingly.

      On a related note, in the discussion, you cite two papers in Drosophila to state that "TADs form prior to the establishment of cell-type-specific gene expression programs", but that's not entirely accurate for those papers. They actually show that TADs occur coincident with ZGA, but loops form before that (ref 23: Espinola et al), or that there are indeed a few boundaries that show up before ZGA, but these correspond to RNA Polymerase (ref 24: Ing-Simmons et al.).

      We have corrected this statement.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Although the reviewers agree on the potential importance of this study, they have brought out multiple pertinent queries with respect to the interpretation of some of the results presented in the manuscript, that the authors should consider addressing. The reviewers have also suggested modifications that would increase the clarity of the manuscript.

      We appreciate the thoughtful evaluation of our manuscript by the reviewers and the editor. We are encouraged by their recognition of the importance of our study and have carefully considered all the points raised. In response, we have added new data and revised the text to address the concerns and improve the clarity of the manuscript. Our detailed responses to the reviewers’ comments are provided below.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Rosero and Bai examined how the well-known thermosensory neuron in C. elegans, AFD, regulates context-dependent locomotory behavior based on the tactile experience. Here they show that AFD uses discrete cGMP signaling molecules and independent of its dendritic sensory endings regulates this locomotory behavior. The authors also show here that AFD's connection to one of the hub interneurons, AIB, through gap junction/electrical synapses, is necessary and sufficient for the regulation of this context-dependent locomotion modulation.

      Strengths:

      This is an interesting paper showcasing how a sensory neuron in C. elegans can employ a distinct set of molecular strategies and different physical parts to regulate a completely distinct set of behaviors, which had not been shown to be regulated by AFD before. The experiments were well performed and the results are clear. However, there are some questions about the mechanism of this regulation. This reviewer thinks that the authors should address these concerns before the final published version of this manuscript.

      Weaknesses:

      (1) The authors argued about the role of prior exposure to different physical contexts which might be responsible for the difference in their locomotory behavior. However, the worms in the binary chamber (with both non-uniformly sized and spaced pillars) experienced both sets of pillars for one hour prior to the assay and they were also free to move between two sets of environments during the assay. So, this is not completely a switch between two different types of tactile barriers (or not completely restricted to prior experience), but rather a difference between experiencing a more complex environment vs a simple uniform environment. They should rephrase their findings. To strictly argue about the prior experience, the authors need to somehow restrict the worms from entering the uniform assay zone during the 1hr training period.

      We agree that, in the original design, worms in the binary chamber experience a more complex physical environment while retaining access to both exploration and assay zones. We have therefore revised the manuscript to more clearly distinguish between behavioral differences due to exposure to a complex environment and modulation driven by prior experience.

      To directly test whether locomotion modulation can be sustained by prior physical experience in the absence of continued access to the exploration zone, we introduced a barrier-based assay that prevents worms from re-entering the exploration zone before locomotion is measured. The results section has been revised accordingly to explicitly address this point.

      Revisions to the manuscript:

      Lines 122-139: Added two paragraphs describing the new assay and summarizing the corresponding results.

      “Because worms in the binary chamber are exposed to both pillar types and remain free to move between exploration and assay zones, the behavioral differences described above could reflect exposure to a more complex physical environment rather than prior experience alone. To directly test whether locomotion is modulated by prior physical experience independently of continued access to the exploration zone, we designed microfluidic chambers in which the assay zone could be separated from the exploration zone by a removable barrier (Fig. 1–Supplement 1A). In these chambers, worms were initially allowed to explore the entire device, including exploration zones that either matched or differed from the assay zone. A barrier was then inserted to prevent worms in the assay zone from re-entering the exploration zones.

      Under these conditions, locomotion immediately after barrier insertion was higher in worms that had previously explored physical settings matching the assay zone (205 ± 8 µm/s) than in worms that had explored non-matching settings (151 ± 7 µm/s; p = 0.006; Fig. 1–Supplement 1B). This difference persisted when worms were recorded 40 minutes after barrier insertion, with animals in matching chambers retaining their higher locomotion rates (218 ± 11 µm/s) compared to those in non-matching chambers (185 ± 8 µm/s; p = 0.02; Fig. 1–Supplement 1B). These findings demonstrate that prior exploration of distinct physical environments can modulate locomotion even when worms are prevented from returning to those environments, supporting a role for prior physical experience independent of ongoing sensory input.”

      Figure 1–Supplement 1: New figure showing the experimental design and behavioral results.

      (2) The authors here argued that the sensory endings of AFD are not required for this novel role of AFD in context-dependent locomotion modulation. However, gcy-18 has been shown to be exclusively localized to the ciliated sensory endings of AFD and even misexpression of GCY-18 in other sensory neurons also leads to localizations in sensory endings (Nguyen et al., 2014 and Takeishi et al., 2016). They should check whether gcy-18 or tax-2 gets mislocalized in kcc-3 or tax-1 mutants.

      As the reviewer suggested, we examined GCY-18 localization in wild type animals and in mutants with defective sensory microvilli using a split-GFP strategy (He et al., 2019). We generated a gcy-18::gfp11×7 knock-in strain using CRISPR–Cas9 to visualize endogenous GCY-18 localization. Consistent with prior studies, GCY-18 localized strongly to the AFD dendritic ending in wild-type animals (Figure 4–Supplement 1A, A′, A′′), with an additional weaker signal detectable near the soma and axon (Figure 4–Supplement 1A′′′).

      In kcc-3 mutants, GCY-18 remained localized to the distal dendrite despite disruption of sensory microvillar morphology (Figure 4–Supplement 1B–B′′). Similarly, in ttx-1 mutants, which completely lack AFD sensory microvilli, GCY-18 still localized to the distal dendrite (Figure 4–Supplement 1C–C′′) and remained detectable near the soma and axon (Figure 4–Supplement 1C′′′).

      In the revised manuscript, we clarify both the implications and the limitations of these imaging experiments, noting that “although these experiments do not identify the precise subcellular site at which GCY-18 acts, they show that disruption of sensory microvilli does not substantially alter GCY-18 localization within AFD.” The exact site at which GCY-18 functions to support locomotion modulation therefore remains an important open question for future investigation.

      Revisions to the manuscript:

      Figure 4-Supplement 1: Added a new figure reporting GCY-18 localization in wild type and mutant worms.

      Lines 268-280: Added a new paragraph reporting GCY-18 localization in wild type, kcc-3, and ttx-1 mutants and clarifying its relevance to the reviewer’s concern.

      “Given that gcy-18 is required for context-dependent locomotion modulation and that GCY-18 localizes to the distal dendrite of AFD, we next examined how disruption of sensory microvilli affects its localization in AFD. We used a split-GFP strategy to visualize endogenous GCY-18 [73]. A tandem array of seven GFP11 β-strands (GFP11x7) was inserted at the C-terminus of GCY-18 using CRISPR-Cas9. When complemented with GFP1-10, GCY-18::GFP11x7 fluorescence was strongly enriched at the AFD sensory microvilli near the nose (Fig. 4–Supplement 1A-A′′), consistent with previous reports [42,74,75]. In addition, weaker but reproducible GCY-18 signal was detected near the AFD soma and axon (Fig. 4–Supplement 1A′′′). Importantly, in kcc-3 mutants, which exhibit disrupted sensory microvilli, and ttx-1 mutants, which lack sensory microvilli, GCY-18 remained localized to the distal dendrite and was still detectable near the soma and axon (Fig. 4–Supplement 1B-B′′′ and 1C-C′′′). Although these experiments do not identify the precise subcellular site at which GCY-18 acts, they show that disruption or loss of sensory microvilli does not substantially alter GCY-18 localization within AFD.”

      (3) MEC-10 was shown to be required for physical space preference through its action in FLP and not the TRNs (PMID: 28349862). Since FLP is involved in harsh touch sensation while TRNs are involved in gentle touch sensation, which are the neuron types responsible for tactile sensation in the assay arena? Does mec-10 rescue in TRNs rescue the phenotype in the current paper?

      We performed cell-specific rescue experiments of mec-10. Single-copy expression of mec-10 cDNA in either FLP neurons alone (egl-44p) or TRNs alone (mec-18p) did not restore context-dependent locomotion modulation (Fig. 5A). In contrast, co-expression in both FLP and TRNs (egl-44p::mec-10 + mec-18p::mec-10), as well as expression from the mec-10 promoter, rescued the phenotype.

      These results indicate that input from multiple mec-10-expressing neurons, including both FLP and TRNs, is required for context-dependent locomotion adjustment. This requirement differs from spatial preference behavior, where mec-10 acts specifically in FLP (Han et al., 2017), suggesting distinct mechanosensory circuits are engaged by different tactile-driven behaviors.

      Revisions to the manuscript:

      Fig. 5A: Updated to include the cell-specific rescue data.

      Lines 317-331: Added a new paragraph describing these findings.

      “The mec-10 gene is expressed in several mechanosensory neurons, including the six touch receptor neurons (TRNs) and the polymodal nociceptors FLP and PVD [77,79]. To determine which neurons are required for tactile-dependent locomotion modulation, we expressed mec-10 cDNA under cell-specific promoters: mec-18p (TRNs) [80], egl-44p (FLP) [81], or mec-10p (TRNs, FLP, and PVD) [79]. Expression in either FLP or TRNs alone did not restore modulation, as worms carrying egl-44p::mec-10 (Δspeed: -11 ± 4%) or mec-18p::mec-10 (Δspeed: -13 ± 4%) transgenes showed significantly reduced Δspeed compared to wild type (Δspeed: N2: 33 ± 6%; p < 0.0001 for both; Fig. 5A). By contrast, mec-10 co-expression in both FLP and TRNs (Δspeed: 16 ± 4%), or expression from the mec-10 promoter (Δspeed: 23 ± 4%), restored Δspeed to wild type levels (p = 0.20 and p = 0.57, respectively; Fig. 5A). These findings indicate that mec-10 expression across multiple mechanosensory neuron types is required for context-dependent locomotion modulation. It is also worth noting that, while both tactile-dependent locomotion modulation and previously reported spatial preference require FLP, only the former depends on TRNs. Together, these findings suggest that distinct subsets of mechanosensory neurons differentially contribute to behaviors shaped by tactile experience.”

      (4) The authors mention that the most direct link between TRNs and AFD is through AIB, but as far as I understand, there are no reports to suggest synapses between TRNs and AIB. However, FLP and AIB are connected through both chemical and electrical synapses, which would make more sense as per their mec10 data. (the authors mentioned about the FLP-AIB-AFD circuit in their discussion but talked about TRNs as the sensory modality). mec-10 rescue experiment in TRNs would clarify this ambiguity.

      We agree with the reviewer that there are no reported synapses between TRNs and AIB, and we have revised Fig. 5 and the corresponding text to clarify this point. In the revised manuscript, we removed any implication of a direct TRN-AIB connection and instead focus on the established FLP-AIB-AFD pathway, while considering potential indirect contributions from TRNs.

      As the reviewer suggested, we performed cell-specific mec-10 rescue experiments. Expression of mec-10 in either FLP alone or TRNs alone was insufficient to restore tactile-dependent locomotion modulation, whereas co-expression in both cell types rescued the phenotype (revised Fig. 5A). These results indicate that FLP is essential for this behavior, consistent with the known FLP-AIB-AFD connectivity, and that TRNs are also required.

      Given that TRNs lack direct synapses with AIB, the TRN requirement suggests the involvement of indirect communication, likely mediated through modulatory mechanisms such as neuropeptide signaling. Accordingly, we have revised the model (revised Fig. 5C) and the corresponding text to clarify that tactile-dependent locomotion modulation integrates inputs from multiple mec-10-expressing neurons and does not rely on a direct TRN-AIB synaptic connection.

      Revisions to the manuscript:

      Lines 334–345: Revised paragraph to clarify circuit logic and remove implication of direct TRN-AIB synapses.

      “Touch-sensitive neurons that express mec-10, including TRNs, FLP, and PVD, do not form direct synapses with AFD, suggesting that tactile information is relayed through intermediary neurons. Because the interneuron AIB receives synaptic input from FLP and forms electrical synapses with AFD, we hypothesized that AIB could serve as a conduit for mechanosensory signals to reach AFD. To test whether AIB is required for tactile-dependent modulation, we examined locomotion in worms with genetically ablated AIB neurons using npr-9p::caspase expression [82]. AIB-ablated worms failed to adjust locomotion speed, showing a near-complete loss of modulation (∆speed: -1 ± 5%) compared to wild type (30 ± 8%, p = 0.001, Fig. 5B). These results demonstrate that AIB is required for AFD-mediated tactile-dependent locomotion modulation. However, because mec-10-expressing TRNs are also required, additional pathways beyond AIB likely contribute to transmitting tactile information to AFD, potentially involving indirect synaptic connections through other interneurons or long-distance signaling via neuropeptides or other modulators (Fig. 5C).”

      Fig. 5: Updated to include new cell-specific mec-10 rescue data and revised model.

      (5) Do inx-7 or inx-10 rescue in AFD and AIB using cell-specific promoters rescue the behavior?

      Yes. We tested this during revision. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA selectively in AFD neurons of inx-10 mutant worms. This manipulation significantly restored tactile-dependent locomotion modulation compared to non-transgenic inx-10 mutants (Fig. 6D), demonstrating that inx-10 expression in AFD alone is sufficient to rescue the behavioral defect.

      Revisions to the manuscript:

      Lines 366-370: Added a description of the AFD-specific inx-10 rescue results.

      “We next tested whether restoring inx-10 specifically in AFD would be sufficient to rescue the behavioral defect. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA in inx-10 mutant worms. These transgenic animals displayed significantly improved locomotion modulation (∆speed: 42 ± 5%) compared to non-transgenic inx-10 mutants (15 ± 4%; p = 0.018; Fig. 6D), indicating that inx-10 expression in AFD alone is sufficient to restore function.”

      Fig. 6D: Updated to include new cell-specific inx-10 rescue data.

      (6) How is the function of the guanylyl cyclase gcy-18 related to electrical synapse activity between AFD and AIB? Is AFD downstream or upstream of AIB in this context?

      At present, the precise relationship between GCY-18 signaling and the AFD-AIB electrical synapse is not fully resolved. Given that AIB receives mechanosensory input from FLP, it is likely that AIB acts upstream of AFD during tactile-dependent locomotion modulation. However, because the AIB-AFD connection is mediated by gap junctions, communication could also be bi-directional, especially since small signaling molecules such as cGMP and Ca<sup>2+</sup> are known to diffuse through electrical synapses.

      We have therefore revised the manuscript to state explicitly that the directionality of information flow between AFD and AIB remains open, and that this will be an important question for future investigation (Lines 455-458).

      “Together, these findings support a model in which AIB functions as a hub neuron that relays mechanosensory input from FLP to AFD to modulate locomotion (Fig. 5C). However, because electrical synapses are often bidirectional, information flow may also occur in the opposite direction, from AFD to AIB.”

      Reviewer #2 (Public review):

      Summary:

      The goal of the study was to uncover the mechanisms mediating tactile-context-dependent locomotion modulation in C. elegans, which represents an interesting model of behavioral plasticity. Starting from a candidate genetic screen focusing on guanylate cyclase (GCY) mutants, the authors identified the AFD-specific gcy-18 gene as essential for tactile-context-dependent locomotion modulation. AFD is primarily characterized as a thermo-sensory neuron. However, key thermosensory transduction genes and the sensory ending structure of AFD were shown here to be dispensable for tactile-context locomotion modulation. AFD actuates tactile-context locomotion modulation via the cell-autonomous actions of GCY-18 and the CNG-3 cyclic nucleotide-gated channel, and via AFD's connection with AIB interneurons through electrical synapses. This represents a potentially relevant synaptic connection linking AFD to the mechanosensory-behavior circuit.

      Strengths:

      (1) The fact that AFD mediates tactile-context locomotion modulation is new, rather surprising, and interesting.

      (2) The authors have combined a very clever microfluidic-based behavioral assay with a large set of genetic manipulations to dissect the molecular and cellular pathways involved. Rescue experiments with single-copy transgenes are very convincing.

      (3) The study is very clearly written, and figures are nicely illustrated with diagrams that effectively convey the authors' interpretation.

      Weaknesses:

      (1) Whereas GCY-18 in AFD and the AFD-AIB synaptic connection clearly play a role in tactile-context locomotion modulation, whether and how they actually modulate the mechanosensory circuit and/or locomotion circuit remains unclear. The possibility of non-synaptic communication linking mechanosensory neurons and AFD (in either direction) was not explored. Thus, in the end, we have not learned much about what GCY-18 and the AFD-AIB module are doing to actuate tactile context-dependent locomotion modulation.

      We agree with the reviewer that although GCY-18 in AFD and the AFD-AIB connection are clearly required for tactile context-dependent locomotion modulation, the precise mechanisms by which they influence mechanosensory and locomotor circuits remain unresolved. In particular, the possibility of nonsynaptic communication or bidirectional signaling between mechanosensory neurons and AFD cannot be addressed by the current experiments and warrants future investigation.

      At the same time, we believe this study reveals several previously unrecognized aspects of tactile-dependent locomotion modulation that provide a foundation for future mechanistic investigation.

      Specifically, we show that (i) GCY-18 functions in AFD to support tactile-dependent locomotion modulation; (ii) the cGMP-gated channel TAX-4, required for thermosensation, is dispensable for this process, whereas CNG-3 is required, revealing functional specialization within AFD; (iii) the interneuron AIB is necessary for this modulation; and (iv) restoring a single electrical connection between AFD and AIB using mammalian Cx36 is sufficient to rescue tactile-dependent modulation in innexin mutants.

      Accordingly, we now explicitly state in the revised Discussion that “a limitation of this study is that the directionality and mode of information flow between AFD and AIB remain unresolved, and defining this relationship will be an important goal for future investigation” (Lines 472-475).

      (2) The authors only focused on speed readout, and we don't know if the many behavioral parameters that are modulated by tactile context are also under the control of AFD-mediated modulation.

      We used locomotion speed as the primary behavioral readout because it provides a robust measure for detecting whether behavior is modified by prior tactile experience, rather than to capture the full spectrum of motor outputs. This strategy is often used to assess experience-dependent behavioral plasticity across sensory modalities and enabled us to uncover the unexpected role of AFD in tactile-dependent plasticity.

      In the revised manuscript, we expanded our analysis to include additional behavioral parameters. As described in the Results, AFD-ablated worms showed a complete loss of context-dependent modulation not only in speed, but also in idle time and turning frequency, with no detectable differences between uniform and binary chambers (Fig. 4E). These data strengthen the conclusion that AFD broadly supports tactile-dependent behavioral modulation rather than selectively affecting a single locomotor parameter.

      Revisions to the manuscript:

      Fig. 4E: Revised panel to include additional locomotion parameters, including idle time and turning frequency, in wild type and AFD-ablated worms.

      Lines 283–285: Expanded the results to describe changes in locomotion speed, idle time, or turning frequency of AFD-ablated mutant worms. “These animals showed no detectable differences between uniform and binary chambers in locomotion speed, idle time, or turning frequency (Fig. 4E).”

      (3) The AFD-AIB gap junction reconstruction experiment was conducted in an innexin double mutant background, in which the whole nervous system's functioning might be severely impaired, and its results should be interpreted with this limitation in mind.

      We appreciate the reviewer’s concern that the innexin double-mutant background may broadly affect nervous system function, and we agree that loss of innexins is not restricted to the AFD-AIB synapse and could introduce global circuit perturbations.

      Importantly, however, the specificity of the rescue is informative. In an innexin double-mutant background, where electrical coupling is broadly disrupted, re-establishing a single electrical synapse between AFD and AIB using Cx36 was sufficient to restore tactile-dependent locomotion modulation (Fig. 6D). The ability of a targeted AFD-AIB connection to rescue behavior despite the absence of many other electrical synapses argues against a purely global network defect and instead identifies the AFD-AIB electrical synapse as a critical locus for this modulation.

      To further address this concern, we performed an additional rescue experiment in a less perturbed genetic background. In the revised manuscript, we show that AFD-specific expression of inx-10 rescues locomotion modulation in inx-10 single mutants (Fig. 6D). Together, these complementary rescue approaches, one restoring endogenous innexin function in AFD and the other reconstituting an electrical synapse using Cx36, support the conclusion that AFD-AIB electrical coupling is sufficient to enable tactile-dependent locomotion modulation, rather than reflecting nonspecific recovery of global circuit function.

      Revision to the manuscript:

      Fig. 6D and Lines 366-370: Added new data and revised text showing that AFD-specific inx-10 expression restores tactile-dependent locomotion modulation.

      “We next tested whether restoring inx-10 specifically in AFD would be sufficient to rescue the behavioral defect. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA in inx-10 mutant worms. These transgenic animals displayed significantly improved locomotion modulation (∆speed: 42 ± 5%) compared to non-transgenic inx-10 mutants (15 ± 4%; p = 0.018; Fig. 6D), indicating that inx-10 expression in AFD alone is sufficient to restore function.”

      Reviewer #3 (Public review):

      Summary:

      Rosero and Bai report an unconventional role of AFD neurons in mediating tactile-dependent locomotion modulation, independent of their well-established thermosensory function. They partially elucidate the signaling mechanisms underlying this AFD-dependent behavioral modulation. The regulation does not require the sensory dendritic endings of AFD but rather the AFD neurons themselves. This process involves a distinct set of cGMP signaling proteins and CNG channel subunits separate from those involved in thermosensation or thermotaxis. Furthermore, the authors demonstrate that AIB interneurons connect AFD to mechanosensory circuits through electrical synapses. They conclude that, beyond its primary function in thermosensation, AFD contributes to context-dependent neuroplasticity and behavioral modulation via broader circuit connectivity.

      While the discovery of multifunctionality in AFD is not entirely unexpected, given the limited number of neurons in C. elegans (302 in total), the molecular and cellular mechanisms underlying this AFD-dependent behavioral modulation, as revealed in this study, provide valuable insights into the field.

      Strengths:

      (1) The authors uncover a novel role of AFD neurons in mediating tactile-dependent locomotion modulation, distinct from their well-established thermosensory function.

      (2) They provide partial insights into the signaling mechanisms underlying this AFD-dependent behavioral modulation.

      (3) The neural behavior assays utilizing two types of microfluidic chambers (uniform and binary chambers) are innovative and well-designed.

      (4) By comparing AFD's role in locomotion modulation to its thermosensory function throughout the study, the authors present strong evidence supporting these as two independent functions of AFD.

      (5) The finding that AFD contributes to context-dependent behavioral modulation is significant, further reinforcing the growing evidence that individual neurons can serve multiple functions through broader circuit connectivity.

      Weaknesses:

      (1) Limited Behavioral Assays: The study relies solely on neural behavior assays conducted using two types of microfluidic chambers (uniform and binary chambers) to assess context-dependent locomotion modulation. No additional behavioral assays were performed. To strengthen the conclusions, the authors should validate their findings using an independent method, at the very least by testing AFD-ablated animals and gcy-18 mutants with a second behavioral approach.

      The reviewer points out that the original study relied on locomotion assays in two microfluidic environments (uniform and binary chambers) and suggests validation using an independent behavioral approach, particularly for AFD-ablated animals and gcy-18 mutants.

      To address this concern, we developed an independent behavioral assay in which the exploration and assay environments are physically separated by a removable barrier (Figure 1–Supplement 1A). In this design, worms first explored distinct physical settings, after which a barrier was inserted to confine them to an identical assay zone. This approach allowed us to directly test whether context-dependent locomotion modulation can be maintained when worms are prevented from re-entering the exploration environment and must rely solely on prior experience.

      Using this assay, we found that wild-type worms that had previously explored environments matching the assay zone moved significantly faster than those that had explored non-matching environments (Figure 1–Supplement 1B-C). These results demonstrate that context-dependent locomotion modulation is retained even when ongoing sensory input from the exploration zone is eliminated, independently validating our original findings using a distinct behavioral paradigm.

      Further, using this same assay, we found that locomotion modulation was significantly impaired in both gcy-18 mutants and AFD-ablated worms (Figure 4–Supplement 2A). Together, these results provide independent behavioral evidence supporting the conclusion that AFD and gcy-18 are required for context-dependent locomotion modulation.

      Revision to the manuscript:

      Figure 1–Supplement 1A: Added schematic and results from the removable-barrier assay in wild type animals.

      Lines 120-137: Added corresponding Results text describing the new assay and wild-type behavior.

      “Because worms in the binary chamber are exposed to both pillar types and remain free to move between exploration and assay zones, the behavioral differences described above could reflect exposure to a more complex physical environment rather than prior experience alone. To directly test whether locomotion is modulated by prior physical experience independently of continued access to the exploration zone, we designed microfluidic chambers in which the assay zone could be separated from the exploration zone by a removable barrier (Fig. 1–Supplement 1A). In these chambers, worms were initially allowed to explore the entire device, including exploration zones that either matched or differed from the assay zone. A barrier was then inserted to prevent worms in the assay zone from re-entering the exploration zones.

      Under these conditions, locomotion immediately after barrier insertion was higher in worms that had previously explored physical settings matching the assay zone (205 ± 8 µm/s) than in worms that had explored non-matching settings (151 ± 7 µm/s; p = 0.006; Fig. 1–Supplement 1B). This difference persisted when worms were recorded 40 minutes after barrier insertion, with animals in matching chambers retaining their higher locomotion rates (218 ± 11 µm/s) compared to those in non-matching chambers (185 ± 8 µm/s; p = 0.02; Fig. 1–Supplement 1B). These findings demonstrate that prior exploration of distinct physical environments can modulate locomotion even when worms are prevented from returning to those environments, supporting a role for prior physical experience independent of ongoing sensory input.”

      Figure 4–Supplement 2A: Added data for gcy-18 mutants and AFD-ablated worms in the removable barrier assay.

      Lines 288-296: Added text describing behavioral defects in gcy-18 mutants and AFD-ablated worms using the new assay.

      “Building on our finding that locomotion modulation can be driven by prior physical experience even after worms are prevented from re-entering the exploration zones, we next tested whether AFD is required for this modulation using chambers in which the exploration and assay zones were separated by a removable barrier (Fig. 1–Supplement 1A). Under these conditions, locomotion modulation was significantly reduced in AFD-ablated worms (∆speed: -AFD = 1 ± 6% vs. N2 = 23 ± 7%; p = 0.036; Fig. 4–Supplement 2A). Similarly, gcy-18 mutants showed defective locomotion modulation (∆speed: gcy-18 = -1 ± 8% vs. N2 = 23 ± 7%; p = 0.034; Fig. 4–Supplement 2A). These results indicate that AFD and gcy-18 are required to generate locomotion modulation in response to recent physical experience, even when continued access to surrounding environments is restricted.”

      (2) Clarity in Behavioral Assay Methodology: The methodology for conducting the behavioral assays is unclear. It appears that worms were free to move between the exploration and assay zones, with no control over the duration each worm spent in either zone. This lack of regulation may introduce variability in tactile experience across individuals, potentially affecting the reproducibility and quantitativeness of the method. The authors should clarify whether and how they accounted for this variability.

      In the primary assay, worms were allowed to move freely between the exploration and assay zones for one hour, and each animal’s tactile experience depended on its exploratory trajectory. To address the resulting variability, we performed an a priori power analysis, which determined that approximately 160 worms distributed across more than 20 chambers per condition were sufficient to obtain reliable population-level measurements. This sampling strategy was applied consistently across all experiments. Accordingly, analyses emphasize well-powered population means rather than individual trajectories, ensuring robust and reproducible comparisons despite variability in individual experience.
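
      For readers unfamiliar with a priori power analysis, the general calculation can be sketched as follows. This is only an illustration under stated assumptions: the response does not report the test, effect size, significance level, or target power that were used, so the function name `n_per_group`, the normal approximation for a two-group comparison, and every numeric input below are hypothetical and do not reproduce the authors' calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Uses the standard normal approximation:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (difference in means / SD).
    All defaults are illustrative assumptions, not values from the study.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect sizes: smaller effects need more animals.
for d in (0.3, 0.5, 0.8):
    print(d, n_per_group(d))
```

      Under this approximation the required number of animals grows with the inverse square of the effect size, which is why a population-level design with many worms per condition is needed to detect modest differences in mean speed.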

      In addition, as described above, we developed a removable-barrier assay that eliminates variability from ongoing exploration by confining worms to the assay zone after a defined exploration period. The consistency of behavioral effects across both assays further supports the robustness and reproducibility of the approach.

      (3) Potential Developmental and Behavioral Confounds in Mutant Analysis: Several neuronal mutant strains were used in this study, yet the effects of these mutations on development and general behavior (e.g., movement ability) were not discussed. Although young adult worms were used for behavioral assays, were they at similar biological ages? To rule out confounding factors, locomotion assays assessing movement ability should be conducted (see reference PMID 25561524).

      To address the possibility that behavioral phenotypes in mutant strains arise from developmental defects or impaired general locomotion, we directly measured locomotion speed on agar plates and body length in gcy-18 mutant and AFD-ablated worms. Neither genotype showed defects in basal locomotion speed or body length compared to wild type animals (Figure 4–Supplement 2B-C), indicating that the observed modulation defects are not explained by impaired development or gross motor ability.

      To further control for developmental variability, all behavioral assays were performed using age-synchronized populations. Animals were selected at a defined gravid adult stage, identified by the presence of 5-10 eggs arranged in a single row within the gonad. All mutant strains reached this developmental stage approximately three days after egg laying, comparable to wild type animals.

      Revision to the manuscript:

      Figure 4–Supplement 2B-C: Added quantification of locomotion speed on agar plates and body length for gcy-18 mutants and AFD-ablated worms.

      Lines 297-304: Added text describing the data presented in Figure 4–Supplement 2B-C.

      “Finally, to determine whether the modulation defects observed in gcy-18 mutants and AFD-ablated worms could be attributed to developmental abnormalities or gross motor impairments, we measured locomotion speed and body length on standard NGM plates. Both day-1 adult AFD-ablated worms (speed: 281 ± 10 µm/s; p = 0.33; body length: 1.12 ± 0.01 mm; p = 0.76) and gcy-18 mutants (speed: 291 ± 13 µm/s; p = 0.22; body length: 1.15 ± 0.02 mm; p = 0.86) showed locomotion speeds and body lengths comparable to wild type controls (speed: 252 ± 30 µm/s; body length: 1.14 ± 0.02 mm; Fig. 4–Supplement 2B, C). These results indicate that the loss of context-dependent locomotion modulation is not due to developmental defects or gross impairments in locomotion.”

      (4) Definition and Baseline Measurements for Locomotion Categories: The finding that tax-4 and kcc-3 contribute to basal locomotion but not to context-dependent locomotion modulation is intriguing. The authors argue that distinct mechanisms regulate these two processes; however, the study does not clearly define the concepts of "basal locomotion" and "context-dependent locomotion," nor does it provide baseline measurements. A clear definition and baseline data are needed to support this conclusion.

      We define basal locomotion as the locomotion speed of worms measured in the binary chamber, where wild-type animals consistently exhibit lower locomotion rates. Measurements from the binary chamber therefore serve as the baseline reference for locomotion speed in our microfluidic assays. Context-dependent locomotion modulation is defined as the quantified difference in locomotion speed between worms in uniform chambers and those in binary chambers. These definitions are now stated in:

      Lines 199-201: “We examined the locomotion speed of mutant worms in the binary chambers, which we refer to as the basal speed because wild type worms consistently move slowest in this environment.”

      Lines 645-646: “Asterisks above horizontal black lines indicate statistically significant differences in basal speed, defined as speed of worms in the binary chamber”
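
      The two definitions above can be made concrete with a short sketch. It assumes that ∆speed is expressed as a percentage of the basal (binary-chamber) mean, which is consistent with the percent values reported in this response but is not spelled out here; the function names and the example speeds are hypothetical.

```python
from statistics import mean


def basal_speed(binary_speeds):
    """Basal speed: mean locomotion speed of worms in the binary chamber."""
    return mean(binary_speeds)


def delta_speed_percent(uniform_speeds, binary_speeds):
    """Context-dependent modulation as a percent change relative to basal speed.

    Assumption: the difference between uniform- and binary-chamber means is
    normalized to the binary-chamber mean. The source only states that the
    modulation is the quantified difference between the two chambers.
    """
    basal = basal_speed(binary_speeds)
    return 100.0 * (mean(uniform_speeds) - basal) / basal


# Hypothetical per-worm speeds in µm/s (not data from the study).
uniform = [130.0, 140.0, 150.0]
binary = [95.0, 100.0, 105.0]

print(basal_speed(binary))                   # mean binary-chamber speed
print(delta_speed_percent(uniform, binary))  # percent modulation
```

      Separating the two quantities in this way makes it explicit that a strain can differ from wild type in basal speed while still showing normal modulation, and vice versa.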

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The availability of strains has not been mentioned. This should be addressed.

      The revised Methods section now includes a complete list of strains used in this study, and we have added a statement indicating that all strains are available upon request.

      Minor comment:

      Figure 1C - it should be Idle, not Idel.

      We have corrected the y-axis label in Figure 1C to ‘Idle.’

      Reviewer #2 (Recommendations for the authors):

      This is an interesting and well-written article, which I greatly appreciated reading. There are a few concerns that the authors should address, in my opinion, to provide a more complete and convincing story.

      Major points:

      (1) Maybe the material transmitted to me was incomplete, but I did not find the gcy gene screen results. It seems important to present the screen results in full, together with the description of the alleles tested for the 24 gcy genes.

      The revised manuscript now includes the complete results of the gcy mutant screen in Figure 2–Supplement 1, with the alleles tested for all 24 gcy genes listed in Table S1.

      (2) I did not find the actual p-values, sample sizes for each condition, or raw data; nor a data availability statement indicating where to retrieve these.

      Statistical significance is indicated by asterisks in all figures, with definitions provided in each figure legend (n.s., p > 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001). Sample sizes are shown as individual data points in the plots, and we have now added explicit n values to each figure legend for clarity. A Data Availability Statement has also been added to indicate where the raw data can be accessed. Where possible, we have included exact p-values. For analyses using Tukey-Kramer post hoc tests, p-values are reported to four decimal places, reflecting the output limits of the statistical software used.
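
      The asterisk convention described above amounts to a simple threshold mapping, sketched here for clarity. The function name is hypothetical, and treating a p-value of exactly 0.05 as not significant is an assumption, since the legend definition leaves that boundary unspecified.

```python
def significance_label(p):
    """Map a p-value to the asterisk notation used in the figure legends.

    Thresholds follow the response text: n.s. (p > 0.05), * (p < 0.05),
    ** (p < 0.01), *** (p < 0.001). Assumption: p == 0.05 is labeled n.s.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


# p-values quoted elsewhere in this response, used only as example inputs.
for p in (0.20, 0.018, 0.006, 0.00005):
    print(p, significance_label(p))
```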

      (3) It is not clear why the authors only quantified animal speed for most of the study. What about idle time, turns, and reversals? This choice limits the reach of the study, as we only partly understand what AFD is doing, notably to explain the phenotype in the preference assay.

      Data on idle time, turning frequency, and reversal frequency for wild-type worms are now included in Figure 1F. In addition, we present new data showing that AFD ablation disrupts context-dependent modulation of locomotion speed, idle time, and turning frequency (Figure 4E).

      (4) Figure 2D and related text: these conclusions are based on a single mutant analysis. Were the million-mutation project lines outcrossed? It would be much more convincing if more gcy alleles were tested (this should be relatively easy since classical alleles are available at the CGC for gcy-8 and gcy-18).

      The million-mutation project lines used in this study were outcrossed prior to analysis. In addition, we confirmed that the observed defects were specifically due to loss of gcy-18 function by rescuing the phenotype through expression of gcy-18 cDNA under AFD-specific promoters. This cell-specific rescue shows that the behavioral defects arise from disruption of gcy-18 rather than from background mutations.

      (5) It is hard to interpret the speed phenotype when the authors switch between Delta speed and absolute speed display from one figure to another, or even from one panel to another. If only tax-4 and kcc-3 display a constitutive speed phenotype, then there should be no problem showing the absolute speed data in every panel. This is important to convince the reader that major speed changes in mutants are not biasing the interpretation based on Deltas. Indeed, if some mutants move very fast, there might be a ceiling effect. Conversely, if they move very slowly, there might be a 'sickness' effect. Both effects could prevent seeing a tactile-context-dependent modulation, and the results would need to be interpreted much more carefully. Providing the full view on absolute speed levels would also really help support the whole discussion paragraph about the differential regulation of constitutive versus context-dependent locomotion (from L339 onward).

      We focus on ∆speed because it directly quantifies experience-dependent locomotion modulation relative to each strain’s own baseline, making it an appropriate metric for comparing tactile plasticity across genotypes. This approach avoids confounding effects from strain-specific differences in overall locomotion levels.

      At the same time, we agree that absolute locomotion speed is important to consider when interpreting behavioral phenotypes. To address this, we added plate-based locomotion speed and body length measurements for two key genotypes that lack modulation, gcy-18 mutants and AFD-ablated worms (Figure 4–Supplement 2B–C). Both exhibit normal locomotion on agar plates, indicating that their defects in tactile-dependent modulation are not due to impaired motor ability or general sickness.

      In addition, among the mutants tested in microfluidic chambers, tax-4 mutants display elevated basal speed yet retain robust context-dependent modulation, indicating that ceiling effects do not limit detection of modulation.

      (6) The gap junction expression is a nice experiment. But there is a major limitation that should be stated: the electrical synapse re-construction is made in a double mutant background in which the whole animal circuitry might be severely affected. It might well be that the restoration of behavioral plasticity represents something totally irrelevant to wild-type nervous system functioning. A cell-specific innexin knockout is needed to fully support the relevance of the AFD-AIB connection.

      We agree that reconstruction of an electrical synapse in an innexin double-mutant background carries the limitation that global circuit function may be broadly affected. To address this concern, we performed an additional rescue experiment in a less perturbed genetic background.

      As described above, we show that AFD-specific expression of inx-10 is sufficient to restore tactile-dependent locomotion modulation in inx-10 single mutants (Fig. 6D). This cell-specific rescue does not rely on a double-mutant background and converges on the same outcome as the Cx36-based electrical synapse reconstruction. Together, these complementary approaches support the conclusion that restoring AFD-AIB coupling is sufficient to enable tactile-dependent locomotion modulation, rather than reflecting nonspecific recovery from global circuit disruption.

      (7) How was developmental age controlled? It seems that all genotypes were grown for a fixed duration (72h). Some mutants, like gcy-8, might grow slower. It would be useful to at least provide control data in wild-type animals showing that behavioral performance is similar even in slightly younger animals (covering the developmental age of the youngest mutant).

      Developmental age was controlled by strict age synchronization and staging criteria rather than growth duration alone. Worms were synchronized by allowing 40-50 young adults to lay eggs on OP50-seeded NGM plates for two hours, after which adults were removed. Developmental stage was further assessed by gonadal morphology, and only young adult animals with 5-10 eggs arranged in a single row were selected for behavioral assays. Using these criteria, all strains, including mutants, consistently reached the assayed stage approximately three days after egg laying, comparable to wild type animals.

      To further address the possibility that subtle developmental differences could influence behavior, we measured locomotion speed on agar plates and body length for genotypes that show defects in context-dependent modulation. gcy-18 mutants and AFD-ablated worms exhibited normal locomotion rates and body size, indicating that their behavioral phenotypes are unlikely to arise from developmental delay or impaired general motor ability. These control data are now included in the revised manuscript (Figure 4–Supplement 2B–C).

      (8) Plasmid construction description is entirely lacking.

      Description of plasmid construction has been added to the revised Methods.

      Minor points:

      (1) 'Context-dependent locomotion' should be replaced by 'tactile context-dependent locomotion' or something similar throughout the manuscript when referring to the impact of the pillar environment.

      Presently, this phrasing shortcut makes the communication too vague throughout, and even confusing when presenting the result of supplementary Figure 2 (where both thermal and tactile contexts are manipulated).

      We appreciate this suggestion and have revised the terminology for clarity where appropriate. Prior to introducing the mechanosensory origin of the modulation (that is, before presenting the mec-10 data), we retain the broader term “context-dependent modulation” to avoid presupposing a tactile mechanism before it is experimentally established.

      (2) L97: Suggested change along the same lines: "prior experience" -> "prior tactile experience".

      We have made this change as suggested.

      (3) Figure 1A: Would the author consider swapping the order of conditions displayed in this diagram? It would make more sense to have the same left-to-right order in the whole figure with the binary chamber on the left, particularly since the author describes the results considering the binary chamber as the 'reference point'.

      The order of chambers in Figure 1A has been revised as suggested, with the binary chamber now shown on the left.

      (4) Figure 1C: 'idel' typo in the axis label.

      The y-axis label has been updated from “idel” to “idle.”

      (5) Without AFD-specific manipulations, the data with tax-4 and tax-2 mutants provide limited information regarding TAX-4 and TAX-2 role in AFD. It should be explicitly mentioned in the Results section that they might work in other neurons.

      The revised manuscript now explicitly states that the tax-2(p694) allele affects multiple neurons, including BAG, ASE, ADE, and AFD (Lines 421–422).

      (6) L220-222: The strict meaning of this sentence implies that one attributes a role to AFD in controlling constitutive locomotion, but none of the presented data directly shows this (both kcc-3 and tax-4 mutant phenotypes could arise from additional neurons, regardless of any perturbation in AFD). This should be corrected.

      To avoid implying that AFD directly controls constitutive locomotion, we have removed the sentence in question, “Together, these findings suggest that the role of AFD neurons in modulating context-dependent locomotion is distinct from their thermosensory functions and differs from the mechanisms controlling basal locomotion”, from the revised manuscript.

      (7) L328-329: Overstatement. Without AFD-specific manipulation of TAX-2 and TAX-4, the different mutant phenotypes could be due to different cell types, rather than different protein pairs in the channel heteromers. I would recommend addressing this alternative possibility directly in the discussion, rather than focusing only on one (I agree, very cool) possibility.

      We have clarified this point in the revised text. We now explicitly note that the tax-2(p694) mutation affects tax-2 expression in multiple neurons (AFD, BAG, ASE, and ADE) (Lines 421–422).

      Reviewer #3 (Recommendations for the authors):

      (1) Clarification of inx Gene Expression Analysis (Lines 270-271): The authors should specify how the expression of inx genes in individual neurons was identified.

      The revised manuscript now specifies that innexin expression patterns were identified using the CeNGEN single-cell transcriptomic database (Lines 352–354).

      (2) Cx36 Expression in AFD and AIB (Lines 287-288): Further clarification is needed on how Cx36 expression was achieved in AFD and AIB.

      We have clarified that Cx36 was expressed specifically in AFD using the srtx-1b promoter and in AIB using the inx-1 promoter, as stated in the main text (Lines 372–373) and the Fig. 6 legend.

    1. Author Response:

      Public Review:

      We thank you and the reviewers for the thoughtful and constructive comments. The feedback helps us strengthen the manuscript substantially, and we plan to address the key points in the revised version as follows.

      Reviewer #1 (Public review):

      First, in response to the request for a clearer biological interpretation of the pathway enrichment results, we will expand the Discussion to more directly integrate these findings with the observed life-history divergence between strains.

      Second, we agree with the concern regarding the phylogenetic inference of PxSODC. We will therefore re-infer the phylogeny using a model-based Maximum Likelihood approach implemented in IQ-TREE, and, in the absence of an appropriate outgroup, the revised tree will be presented as unrooted.

      Third, to address the suggestion for a structural explanation of the mutational effects, we will add new structural analyses using AlphaFold modeling and 100 ns molecular dynamics simulations of the wild-type and mutant PxSODC proteins across three physiologically relevant temperatures.

      Reviewer #2 (Public review):

      First, we will restructure the Results and streamline the presentation to better emphasize the central narrative. Extensive descriptive datasets will be moved to the Supplementary Materials, and the rationale linking different analytical layers will be stated more explicitly.

      Second, we will also revise the manuscript to better frame the ecological relevance and limitations of the experimental design. Specifically, we will clarify that the thermal selection regimes were chosen to reflect ecologically relevant extremes for the source population from subtropical Fuzhou, where summer and winter temperatures can approach the ranges used in the experiment. We will further explain that the cycling temperature treatments were designed to approximate severe but naturally occurring diurnal fluctuations.

      Third, in response to concerns about statistical rigor and reproducibility, we will substantially expand the statistical methods throughout the manuscript. The revised version will provide a clearer description of the analyses used for each dataset, including sample sizes, comparison structure, and statistical thresholds. We will also clarify the application of multiple-testing correction for both transcriptomic and metabolomic analyses, specify the criteria used in network analyses, and more clearly distinguish the statistical approaches used for pairwise versus multi-group comparisons.

    1. Author Response:

      We thank the reviewers and editors for their thoughtful and constructive assessment. We are encouraged that the reviewers viewed the combination of retinal bouton imaging, collicular neuron imaging, and depth-resolved electrophysiology, together with the comparison to retinal geometric models, as a strength of the study. As the reviewers note, our findings are consistent with previous in vitro studies showing topographic organization of tuning in the retina and with recent work demonstrating the precision of retinotopic mapping from retina to superior colliculus (SC). In revision, we will refine our definition of tuning, sharpen our claims about the spatial organization across SC and its correspondence to retinal topography, and make clearer our aim of reconciling seemingly opposing findings in the literature. In addition, we will provide a detailed response to all other points raised by the reviewers.

      A central point raised in the reviews concerns our definition of direction- and orientation-selective cells. We agree that relying only on statistical significance is not sufficient for our purposes, because the resulting classification can depend on recording duration and statistical power. In the revised manuscript, we will therefore introduce thresholding criteria for direction and orientation selectivity indices (DSI and OSI) in addition to significance-based testing. We will also make clearer that our primary question is which stimulus directions and orientations are represented at a given receptive field location, rather than the distribution of preferences among neurons classified as purely direction- or orientation-selective.
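
      As an illustration of how an index threshold can be combined with significance testing, the following is a minimal sketch only. It assumes the common vector-sum definitions of DSI and OSI, a hypothetical threshold of 0.2, and a hypothetical significance level of 0.05; none of these values or function names are taken from the manuscript.

```python
import cmath
import math


def selectivity_index(directions_deg, responses, harmonic):
    """Vector-sum index |sum r_k * exp(i*h*theta_k)| / sum r_k.

    harmonic=1 gives a direction index (DSI); harmonic=2 gives an
    orientation index (OSI). This definition is an assumption here.
    """
    total = sum(responses)
    if total <= 0:
        return 0.0
    vec = sum(
        r * cmath.exp(1j * harmonic * math.radians(d))
        for d, r in zip(directions_deg, responses)
    )
    return abs(vec) / total


def is_selective(index, p_value, index_min=0.2, alpha=0.05):
    """Require both statistical significance and a minimum index value.

    index_min and alpha are hypothetical illustration values.
    """
    return p_value < alpha and index >= index_min
```

      Requiring both conditions means that a long recording with a statistically significant but very weak tuning bias is no longer classified as selective.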

      We will also revise the text to define more precisely what our data do and do not establish about the large-scale organization across SC. Our intended conclusion is not that we identify a novel global organization, which would require sampling a larger portion of visual space, but rather that the regions we sampled are not well explained by previously proposed global maps in which each visual field location is dominated by a single tuning preference and the same organization is conserved across individuals. Instead, our data are more consistent with a retinal organization of biases toward specific directions and orientations that vary systematically across visual space.

      We will further clarify how we quantified the correspondence between our data and the previously established retinal model of direction and orientation tuning. In the current manuscript, we report the errors between model predictions and measured tuning preferences at the corresponding visual field locations. We then assess model performance by comparing the distribution of these errors with the errors obtained from two surrogate datasets: one in which the correspondence between visual field location and tuning preference is destroyed, and one in which the prior distribution of tuning preferences is assumed to be uniform. In the revised manuscript, we will make the interpretation of this comparison more explicit, so that the reported errors are clearly presented as the relevant effect-size measure alongside significance.
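
      The comparison against surrogate datasets described above can be sketched as follows. This is a minimal illustration under stated assumptions: the location-preference correspondence is destroyed by permuting measured preferences across locations, the uniform surrogate draws preferences uniformly over one period, and all function names and the number of surrogates are hypothetical rather than taken from the manuscript.

```python
import random


def angular_error(pred_deg, meas_deg, period=360.0):
    """Smallest absolute circular difference (use period=180 for orientation)."""
    d = abs(pred_deg - meas_deg) % period
    return min(d, period - d)


def mean_error(pred, meas, period=360.0):
    """Mean circular error between model predictions and measurements."""
    return sum(angular_error(p, m, period) for p, m in zip(pred, meas)) / len(pred)


def surrogate_errors(pred, meas, n_surrogates=1000, period=360.0, seed=0):
    """Mean errors for two surrogate families.

    shuffled: measured preferences permuted across locations.
    uniform:  preferences drawn uniformly from [0, period).
    """
    rng = random.Random(seed)
    shuffled, uniform = [], []
    for _ in range(n_surrogates):
        perm = list(meas)
        rng.shuffle(perm)
        shuffled.append(mean_error(pred, perm, period))
        draws = [rng.uniform(0.0, period) for _ in meas]
        uniform.append(mean_error(pred, draws, period))
    return shuffled, uniform
```

      The observed mean error then serves as the effect size, and its position within each surrogate distribution indicates how much of the agreement depends on location-specific structure.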

      Finally, we appreciate the reviewers’ concern that the manuscript may currently emphasize disagreement with previous studies too strongly. We will revise the Discussion to better acknowledge where our data support some degree of local bias or weak clustering, while clarifying that we do not find evidence for a robust, stereotyped global map that is consistent across animals. Our goal is to sharpen the manuscript so that it better reconciles seemingly divergent findings in the literature rather than setting them in opposition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      It is important to make a few key points about our work. First, our paper is largely a computational biophysics paper, augmented by experimental results. Generally speaking, computational biophysics work intends to achieve one of two things (or both). One is to provide more molecular level insight into various behaviors of biomolecular systems that have not been (or cannot be) provided by qualitative experimental results alone. The second general goal of computational biophysics is to formulate new hypotheses to be tested subsequently by experiment. In our paper, we have achieved both of these goals and then confirmed the key computational results by experiment.

      eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. While the modeling is technically sophisticated and the results are promising, some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete.

      Given our response below, regarding the rigor and “completeness” of our work, we do not feel that an editorial judgement of “leaving parts of the evidence incomplete” is justified.

      We also note that another recent experimental paper has validated essentially every prediction made in our eLife paper: https://www.biorxiv.org/content/10.64898/2026.01.05.697065v1

      We thus disagree that the evidence we have presented in our paper is incomplete.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement. 

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      We note (also in the manuscript) that it is difficult to compare the timescales obtained from coarse-grained MD simulations and experiments (“real time”) given that, by design, the CG simulations are accelerated to greatly enhance sampling. However, we can qualitatively compare the timescales of different CG simulations (without directly comparing the corresponding experimental timescales).

      We agree with the reviewer that the starting point of NPC-capsid and capsid-only simulations is different, as is the biological environment in which the rupture occurs. When analyzing the NPC-only and capsid-only simulations, what was striking to us was that at the NPC the capsid-LEN complex ruptures in a multicomponent environment, where several FG-NUPs compete to displace the LENs. It is well established in experiments that LEN has a detrimental effect on capsid integrity.

      In Figure 2, we plot the number of LEN molecules as a function of CG simulation time. The initial capsid-LEN complex was equilibrated without NPC and then placed at the cytoplasmic end of the NPC for docking. The number of LEN molecules for the capsid-only simulations and the NPC-docked simulations is nearly identical, and an insignificant number of LEN molecules unbind at the NPC. Hence, we added the following clarification:

      Page 10, paragraph 11

      “Note that the number of LEN molecules bound to the capsid for the free capsid and NPC-docked capsids are nearly identical. Hence, the disparity in timescale of lattice rupture is not only because of the effect of LEN on capsid lattice properties.”

      Is the time really comparable, given that the simulations have different starting points?

      Yes, the CG timescales of both the NPC and freely diffusing capsid unbiased simulations are comparable, since they were done using identical simulation settings.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?

      Our implicit solvent CG MD simulations are designed to access timescales far beyond the capabilities of the fully atomistic simulations. We reiterate here that it is difficult to directly compare the timescales obtained from CG MD simulations and experiments.

      As described in the text, there are 12 pentamers in the capsid (7 in the wide end and 5 in the narrow end). For the narrow end to rupture, all 5 pentamers should progressively dissociate. In our unbiased simulations (Fig. S5), in 25 us of CG time, we observe (partial) dissociation of one or two pentamers. Hence, our unbiased CG simulation timescales were not long enough to observe rupturing of the narrow end.

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      We modified (see below) the main text to include the details.

      Page 4, paragraph 1

      “We model LEN and CA interactions such that LEN molecules can only bind to CA hexamers, and all interactions to CA pentamers are turned off, as in experiments, CA selectively associates with hexamers (25, 36).”

      Reviewer #2 (Public review):

      Here, Hudait et al. use CG modeling to investigate the mechanism by which Lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling and thus mislead future readers. 

      We strongly disagree with these statements, and they do not reflect the facts. We provide a rebuttal to these statements in the “Author Response” statements below.

      (1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:

      (a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24). 

      (b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008). 

      (c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459). 

      (d) LEN creates capsid defects (Figure 3 and 5, see e.g. 10.1073/pnas.2420497122). 

      (e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776). 

      (f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537). 

      The goal of our simulations (in combination with experiments from the Pathak group) is to provide molecular-level insight into the sequence of events of NPC docking of capsid and the effect of LEN binding leading to sequential dissociation of pentamers and leading to rupturing of the narrow end of the cone-shaped capsid. We also compare the events leading to capsid rupture at the NPC with the same for a freely diffusing capsid, akin to that in cytoplasm. The reviewer should carefully read the abstract of our paper. In fact, the above are all papers that present qualitative experimental results that help validate our model, but they do not provide details on the molecule-scale events. For example, the paper (10.1073/pnas.2420497122 written by our coauthors in the Pathak group) is extensively used to compare the behavior of LEN-bound capsid in the cytoplasm.

      (2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models. 

      We strongly disagree with this assessment. Our CG NPC model is largely a “bottom-up” model derived from molecular scale interactions sampled in atomistic simulations (see our previous paper in PNAS https://doi.org/10.1073/pnas.2313737121). The reviewer appears to be ignorant of the “bottom-up” approach based on rigorous statistical mechanics to derive molecule-scale models (please refer to a detailed review on bottom-up coarse-graining: J. Chem. Theory. Comput., 2022, 18. 5759-5791).

      Using the “bottom-up” CG model of the NPC, we predicted several molecular-level details of capsid import and docking to the NPC. Our key predictions were that there is an intrinsic capsid lattice elasticity and that the pleomorphic nature of the NPC channel is key for successful capsid docking (https://doi.org/10.1073/pnas.2313737121). Our computational predictions have been validated, for example, in a recently published paper by an experimental group: Hou, Z., Shen, Y., Fronik, S. et al. HIV-1 nuclear import is selective and depends on both capsid elasticity and nuclear pore adaptability. Nat Microbiol 10, 1868–1885 (2025). https://doi.org/10.1038/s41564-025-02054-z. Our work is an excellent example of how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biological processes.

      We have now added the following statement:

      Page 3, Paragraph 1

      “Importantly, the computational predictions of capsid docking to the NPC central channel have been recently validated in a HIV-1 core import at the NPC using cryo-ET (33), demonstrating how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biomolecular processes.”

      (a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.

      The reviewer fails to recognize that the statement is based on the experimental results of LEN-bound capsid that remains bound to the NPC after rupture and fails to translocate to the nuclear side (from the Pathak group in the section “Ruptured LEN-viral complexes remain bound to the NPC”). The reviewer’s comment is incorrect.

      (b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.

      The concept of elasticity can extend down to the mesoscopic scale. Many examples can be found in the large number of elastic network models (ENMs) of proteins published by many authors. The reviewer also fails to comprehend the meaning of the effective spring constants in the HeteroENM model and how they relate to the response of the capsid to stress (e.g., in the NPC). Note that in the NPC central channel, the capsid encounters several nucleoporins (including disordered FG nucleoporins that do not have specific interactions with the rest of the proteins) as well as a confined environment. This environment can exert inward stress on the capsid, which is also reflected in stress on the capsid lattice. Furthermore, the cited computational AFM studies are very far from a realistic in vivo or even in vitro set of conditions. In contrast, our study presents a realistic environment that the capsid will encounter in the NPC, and these predictions are then validated by experimental results.

      (c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations. 

      We presented the data on the formation of striated patterns of lattice stress in the capsid, which run from the capsid narrow end to the wide end, in a coarse-grained model (https://doi.org/10.1073/pnas.2313737121) and in an atomistic model (https://doi.org/10.1073/pnas.2117781119). Both of our papers are extensively cited in the current manuscript. Also, when the capsid is ruptured, one cannot visualize the striated patterns.

      (d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.

      Reviewer comment: “This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim.” This comment by the reviewer is deeply flawed, and we strongly disagree. In our CG model there is no restriction on the number of LEN molecules that can bind to a CA hexamer. We restate that the experimental results on LEN binding to CA hexamers and the inability of LEN to bind to pentamers were used because no all-atom (AA) forcefield yet exists.

      The steric effect of the lack of third LEN binding to a hexamer is a likely hypothesis (which one is allowed to make). More importantly, an investigation of the steric effect of LEN binding to the CA hexamer is not the main goal of the manuscript.

      (e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait & Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. In that case, no data is shown that connects LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.

      We disagree with this spurious statement, and we see no real contradiction. We have now added a minor clarification that LEN can inhibit efficient capsid binding at significantly high concentration.

      Page 6, Paragraph 1

      “Therefore, at significantly high concentration LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm.”

      (f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.

      This is both a misrepresentation of the simulations and a failure to understand them (as well as the supporting experiments) on the part of the reviewer. In the presence of LEN, the hexameric lattice is hyperstabilized. In contrast, the pentamers are not. As a consequence, the pentamers are dissociated. The pentamers at the narrow end are dissociated first, due to high curvature. The reviewer, from a point of being uninformed, simply speculates on what they think should happen. Moreover, as emphasized earlier, and as the reviewer fails to comprehend, ours is a “bottom-up CG model”, so it predicts, not builds in, these effects.

      (g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.

      Unfortunately, the reviewer appears to be quite uninformed on the WTMetaD method and what it does. The chosen collective variable (CV) in our case is the coordination variable and the MetaD samples along that variable (the conditional free energy) as it is designed to do. The reviewer may wish to educate themself by reading Dama et al (https://doi.org/10.1103/PhysRevLett.112.240602). We also note that “emergent properties” are not along some other, uncoupled coordinate.
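
      For readers unfamiliar with coordination-number collective variables, the quantity under discussion can be sketched as below. This is an illustrative sketch only: it assumes the widely used rational switching function, and the cutoff r0, the exponents n and m, and the function names are hypothetical. The exact definition used in the manuscript (its Figure S6) is not reproduced here.

```python
import math


def switching(r, r0, n=6, m=12):
    """Rational switching function: ~1 for r << r0, ~0 for r >> r0."""
    x = r / r0
    if abs(x - 1.0) < 1e-12:
        return n / m  # analytic limit of (1 - x**n) / (1 - x**m) at x = 1
    return (1.0 - x**n) / (1.0 - x**m)


def mean_coordination(group_a, group_b, r0, n=6, m=12):
    """Average, over sites in group_a, of the smooth contact count with group_b.

    group_a and group_b are lists of (x, y, z) coordinates, e.g. CG sites
    of pentamer and hexamer proteins (an assumption for illustration).
    """
    total = 0.0
    for a in group_a:
        for b in group_b:
            total += switching(math.dist(a, b), r0, n, m)
    return total / len(group_a)
```

      Because the switching function is smooth, the resulting variable is differentiable with respect to particle positions, which is what allows a bias potential to be applied along it.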

      (3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation.

      We disagree with this statement and point out where we have cited other work, including the ones mentioned above. However, our CG model is a largely bottom-up CG model which differs from other more ad hoc CG approaches (and some well-known CG models). We do not wish to emphasize the obvious flaws in those other CG approaches and models, since that is not the focus of our manuscript.

      (4) Other critiques, questions, concerns:

      (a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading.

      The movie in question (capsid docking to NPC without LEN) is essential for comparison of LEN-binding dynamics. Unlike in our previous paper, we simulated significantly longer timescales of capsid docking and performed several additional analyses that are relevant to this paper. Moreover, the first section of the Results is titled “Coarse-grained modeling and simulation”; hence, we only present a summary of the CG models and key validation steps in this section.

      (b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1, 2, and Supplementary Figure 5)?

      These plots show the average of two replicas: “For all the plots, the solid lines are the mean values calculated from the time series of two independent replicas, and the shaded region is the standard deviation at each timestep.” This was stated in the original figure caption.
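
      The averaging described in the caption can be sketched as follows. This is a minimal illustration only, assuming each replica's observable is stored as an equal-length array sampled at the same timesteps. The function name `replica_mean_and_std`, the toy numbers, and the choice of the population standard deviation (`ddof=0`) are assumptions, not details taken from the manuscript.

```python
import numpy as np

def replica_mean_and_std(replicas):
    """Return the per-timestep mean and standard deviation across replicas.

    `replicas` is a sequence of equal-length 1D time series (hypothetical input).
    """
    data = np.asarray(replicas, dtype=float)  # shape: (n_replicas, n_timesteps)
    mean = data.mean(axis=0)                  # would be drawn as the solid line
    std = data.std(axis=0)                    # would be drawn as the shaded band (ddof=0 assumed)
    return mean, std

# Toy data for two replicas (made-up values, for illustration only).
mean, std = replica_mean_and_std([[1.0, 2.0, 3.0], [3.0, 2.0, 5.0]])
```

      With only two replicas, the shaded band at each timestep is simply half the absolute difference between the two trajectories under this convention, so it should be read as a spread indicator rather than a well-converged error estimate.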

      (c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.

      It is difficult to discern a high curvature region of the capsid due to object overlap of different regions of the capsid. This is likely a case of “perspective distortion” in image processing.

      (d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions.

      This is due to the rate at which trajectory frames are saved for movie generation, which was chosen for faster processing of the movies. We added the following to the movie caption:

      “The movie frames correspond to snapshots every 250000 𝜏<sub>CG</sub>.” 

      (e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?

      We prefer to continue with the current normalization, since typically in the HIV-1 literature capsids are represented as a collection of hexamers and pentamers (rather than total number of CA monomers).
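
      The two normalizations under discussion differ only by a constant factor, which a short calculation makes explicit. The sketch below uses the ring counts quoted by the reviewer (12 pentamers, 209 hexamers); the defect counts passed in and the helper name `normalized_defects` are hypothetical and chosen purely for illustration.

```python
N_PENTAMERS = 12   # pentamer rings in the capsid (from the reviewer's comment)
N_HEXAMERS = 209   # hexamer rings in the capsid (from the reviewer's comment)

def normalized_defects(count, n_rings, monomers_per_ring=1):
    """Normalize a defect count per ring (default) or per monomer."""
    return count / (n_rings * monomers_per_ring)

# Hypothetical defect counts, for illustration only.
pen_per_ring = normalized_defects(6, N_PENTAMERS)        # authors' convention
pen_per_monomer = normalized_defects(6, N_PENTAMERS, 5)  # reviewer's suggestion
hex_per_ring = normalized_defects(418, N_HEXAMERS)
hex_per_monomer = normalized_defects(418, N_HEXAMERS, 6)
```

      Under these assumptions the per-ring values are exactly 5 (pentamer) or 6 (hexamer) times the per-monomer values, so the shape of the time series is unchanged and only the scale of the y-axis differs.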

      (f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid. 

      We corrected the statement as follows:

      “Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate from the high curvature to low curvature end of the capsid.”

      (g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.

      We only show LEN in Figure 1; in the rest of the figures, the bound LEN molecules are not shown for clarity. The defects are shown in a darker shade of orange (amber).

      (h) SI Figure S3 captions says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres. " While in contradiction, the main-text Fig 2 says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres. " One of these must be a typo.

      We have corrected the erroneous caption in Fig. S3. The color schemes in Fig. 2 and Fig. S3 are now consistent.

      (i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.

      We modified the statement as follows:

      “CG MD simulations complemented by the outcome of live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated with the NPC after rupture.”

      (j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait &Voth, 2024). " In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?

      In our previous paper, the differences arising from the condensation state of the RNP complex appeared only in the pill-shaped capsid, and not in the cone-shaped capsid. In this manuscript, we only investigated the cone-shaped capsid.

      (k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event. " What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.

      “What multitude of unfavorable interactions are the authors referring to?” We have now added the following sentence to clarify:

      “Here we denote unfavorable CA-LEN interactions as all interactions other than the electrostatic and van der Waals interactions that lead to CA-LEN binding (17).”

      The expected increase of void volume in the distorted lattice is based on standard solid-state physics understanding. We added the word “likely” to the statement: “This can likely be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed (41).”

      Moreover, in one of our previous manuscripts, we established that compressive or expansive strain induces more closely packed or expanded lattice (A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022)).

      (l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing. " What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?

      The ideal lattice packing definition is provided in our previous manuscripts: 1. A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022), 2. A. Hudait, G. A. Voth, HIV-1 capsid shape, orientation, and entropic elasticity regulate translocation into the nuclear pore complex. Proceedings of the National Academy of Sciences 121, e2313737121 (2024).

      These manuscripts are cited in the previous statement. The ideal lattice packing is defined based on lattice separations in each core (in cryo-ET and atomistic simulations) using a local order parameter, which measures the near-neighbor contacts of a particle. Moreover, the ideal packing reference is calculated from all available capsid shapes (cone, ellipsoid, and tubular), and takes into account different curvatures.
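
      As a rough illustration of what a contact-based local order parameter measures, the sketch below counts, for each particle, the neighbors lying within a cutoff distance. This is not the order parameter used in the cited manuscripts, whose exact definition is given there; the hard cutoff, the absence of periodic boundary conditions, the function name `neighbor_counts`, and the toy coordinates are all assumptions made for this example.

```python
import numpy as np

def neighbor_counts(positions, cutoff):
    """Count, for each particle, how many other particles lie within `cutoff`.

    Hard-cutoff contact count without periodic boundaries (illustrative only).
    """
    pos = np.asarray(positions, dtype=float)      # shape: (n_particles, 3)
    diff = pos[:, None, :] - pos[None, :, :]      # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)          # pairwise distance matrix
    within = dist < cutoff
    np.fill_diagonal(within, False)               # a particle is not its own neighbor
    return within.sum(axis=1)

# Toy coordinates (made-up): three particles in a row plus one isolated particle.
counts = neighbor_counts([[0, 0, 0], [1, 0, 0], [2, 0, 0], [10, 0, 0]], cutoff=1.5)
```

      In a scheme of this kind, a particle whose contact count falls below a reference value for ideal packing would be flagged as under-coordinated; the reference value itself would have to come from the structural data described above.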

      (m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?

      We have added a statement as follows:

      “Based on our analysis, we hypothesize that LEN binding hyperstabilizes the CA hexamer-hexamer interactions relative to CA hexamer-pentamer interactions.”

      (n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating. 

      We have now changed “uncoating” to “rupture” throughout the manuscript.

      (o) The authors state: " LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.

      We have modified the statement as follows:

      “We hypothesize that LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC.”

      (p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      We have fixed the run-on sentence.

      The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      We disagree with the reviewer. The statement was intended to provide a comparison between free capsid and NPC-bound capsid.

      (q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).

      We disagree with the interpretation of the reviewer. Our interpretation, based on our results, is that LEN binding accelerates capsid rupture (from the pentamer-rich high curvature ends), and the rest of the broken hexameric lattice is hyperstabilized. Ultimately, lattice rupture will lead to release of the RNP, and hence the intended goal of the drug is achieved.

      (r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.

      The simulations were performed in LAMMPS, which is open source. This software is already stated in the Methods. Input data is provided upon request.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: In part B, it appears the middle panel was screenshotted from a ppt, given the red line underneath Lenacapavir. You can export it to an image instead.

      The figure is fixed.

      (2) Figure 6: In part A, the LEN_d in the graph is illegible. Also, in the panel next to it, it also appears to have been screenshotted from a ppt.

      The figure is fixed.

      (3) Page 6: There's an errant quotation mark at the end of a paragraph.

      We removed the errant quotation mark.

      Reviewer #2 (Recommendations for the authors):

      The code used to perform bottom-up solvent-free CG modeling simulations is not made available.

      This is not true. LAMMPS was used as stated in Methods.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The presentation and especially main-text illustrative material seem to focus disproportionately on MacAB-TolC-YbjP complex, and the AcrABZ-TolC-YbjP is relegated to supplementary data which is somewhat confusing. There is no high-resolution side view of the AcrABZ-TolC-YbjP side-by-side to MacAB-TolC-YbjP which may be helpful to spot parallels and differences in the organisation of the two systems.

      This was previously presented in Supplementary Figure S2. However, because the models were shown at a small scale, we have now included the comparison in the main manuscript (Figure 4). This figure presents AcrABZ-TolC-YbjP and MacAB-TolC-YbjP side-by-side, a structural alignment of TolC-YbjP in the two pumps, and close-up views of the interaction interface.

      Supplementary Figure 2 may also be better presented in the main text, as it shows specific displacements of residues upon binding of the YbjP relative to the apo-complexes, although this can be left at the authors' discretion.

      We added more text to describe the displacements of residues upon YbjP binding: ‘Nonetheless, the side chains of a few residues in TolC, which mainly correspond to positively charged amino acids (R18, R24, K214, R227, R234), reorient to interact with the YbjP lipoprotein partner (Figure 2B).’

      Reviewer #1 (Recommendations for the authors):

      The work is of high quality and requires minimal modifications, which are mentioned as suggestions above and are mostly connected to the illustrative material.

      One additional suggestion, which is connected to the earlier BioRxiv preprint, the data seen in Fig 6 of the preprint seems to have been edited out from the current version, and perhaps can be included in a revised version, as it seems to support the "rapid adaptation under stress" role for YbjP, which currently is only speculatively mentioned in p.11, line 365 of the manuscript.

      We acknowledge that Figure 6 of the BioRxiv preprint could support the rapid adaptation under stress role for YbjP. However, upon sequencing the ΔybjP strain from the Keio collection used in the preprint, we identified a large deletion in the yecT-flhD region. We therefore generated a new ΔybjP strain without the yecT-flhD deletion and repeated the experiment. The results with the corrected strain did not support the previous conclusion, and these data were consequently removed from the current manuscript.

      Reviewer #2 (Public review):

      In Figure 3C, the experiment performed with AcrA is clear and the extra band appears at the proper size. On the right panel, it is clear that the crosslink doesn't work when pBPA is placed on residues too far from TolC. Only when introduced on N113 or T110 does a band appear.

      This is in accordance with an interaction in vivo. Nevertheless, 17 + 54 = 71kDa, which is more than the two bands appearing on the gel. This difference in size migration can occur, but it is not clear when looking at Figure S3. In Figure S3a, the purified proteins are highlighted at approximately the expected size (≈20kDa instead of 17 for YbjP and between 56 and 60kDa in two bands for TolC instead of 54kDa). On the right panel, it seems that the bands are present exactly at the same position, instead of an upper band as expected for the crosslinked YbjP-TolC (at 71kDa). It would be clearer if having the control of the same sample without illumination, revealed by anti-TolC, to see the difference.

      We thank the reviewer for pointing out this discrepancy. We identified an error in the molecular weight ladder, as one band was missing. This has now been corrected: YbjP migrates just below 17 kDa, consistent with Figure 3C. In addition, we previously reported a size of 54 kDa for TolC, whereas mature TolC, after signal peptide cleavage, is actually 52 kDa.

      We believe that the differences in the apparent molecular weight observed in Figures 3A, 3C and S3 (now S2) mainly result from tagging and post-translational modifications.

      In Figure 3A, we used the soluble construct His-YbjP<sub>28-1711</sub> (theoretical M<sub>w</sub> ~18 kDa), as also done for the controls in Figures 3C and S3 (now S2). However, for the crosslinking samples, we used full-length His-tagged YbjP, which carries a post-translational lipid modification (theoretical M<sub>w</sub> ~19 kDa, considering the protein lipidation). The presence of the lipid chains alters the migration, as this species migrates at ~15 kDa (Fig 3A). Increased hydrophobicity, due here to YbjP lipidation, could accelerate the migration (Emmanuel et al. 2025 FEBS Open Bio).

      In Figure 3A, we used the TolC-FLAG whose apparent M<sub>w</sub> is ~52 kDa, as previously reported (Fig S3, Fitzpatrick et al. 2017). In Figure S3 (now S2), we used His-tagged TolC (theoretical M<sub>w</sub> 55 kDa) for the control, which migrates above 56 kDa. In the crosslinking samples, however, we detect tag-free, endogenous TolC, with a theoretical M<sub>w</sub> of ~51 kDa.

      In conclusion, the crosslinked complex composed of lipidated FL YbjP (~15 kDa) and endogenous TolC (~51 kDa) would be expected to migrate at ~66 kDa, which is consistent with what is observed in Figures 3C and S3 (now S2).
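
      The size estimates compared in this exchange follow from a simple additive assumption, namely that the apparent size of a crosslinked adduct is the sum of the apparent sizes of its components. The sketch below reproduces both estimates from the values quoted in the text; the helper name `expected_adduct_kda` is hypothetical, and real gel migration need not be strictly additive.

```python
def expected_adduct_kda(*component_kda):
    """Sum apparent component sizes (kDa) under a simple additive assumption."""
    return sum(component_kda)

# Reviewer's estimate: theoretical YbjP (17 kDa) plus TolC as first reported (54 kDa).
reviewer_estimate = expected_adduct_kda(17, 54)

# Authors' estimate: lipidated full-length YbjP (~15 kDa apparent) plus
# tag-free endogenous TolC (~51 kDa theoretical).
authors_estimate = expected_adduct_kda(15, 51)

difference = reviewer_estimate - authors_estimate
```

      The 5 kDa gap between the two estimates comes entirely from using apparent rather than theoretical component sizes, which is the point of the explanation above.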

      A second point that could be discussed further is the comparison of the structure of the pump in the presence of the peptidoglycan with the images previously obtained by tomography. It is not totally clear to me if YbjP could have been positioned in these maps.

      There is density corresponding to YbjP in the map obtained in the presence of peptidoglycan. To improve clarity, we have specified the location of the peptidoglycan relative to the pumps in the revised Figure 4, and Supplementary Figure S4, together with the position of YbjP. In both figures, the lipoprotein appears distant from the peptidoglycan density.

      Reviewer #2 (Recommendations for the authors):

      In addition, please add explanations in the legend of Figure 3C concerning the structures.

      We added the following description of the structures: ‘As shown underneath, AcrA residues Q136 and Y137, proximal to TolC in the structure of the AcrABZ-TolC pump (PDB 5NG5), were replaced by pBPA. For YbjP, the two residues N113 and T110 proximal to TolC in the MacAB-TolC-YbjP complex (PDB 9QGY) and the three residues N43, N90 and H104 distal to TolC were mutated.’

      It would be clearer if having the control of the same sample without illumination, revealed by anti-TolC, to see the difference.

      As the amount of crosslinked material is low, samples were enriched via His-tag purification of YbjP prior to Western blotting. In the absence of illumination (see sample N113, UV-), no crosslink would be formed, and therefore TolC would not be co-purified.

      In addition, some typo errors have been noted.

      Table S1 minus is missing for the defocus range for AcrABZ-TolC-YbjP.

      Thank you for noting the typo. We have added the minus sign.

      Table S3, please specify what is N in the legend.

      N is the stoichiometry parameter, which is now specified in the table legend.

      Line 237, I suppose it has to refer to Figure S6, not S5.

      Thank you for noting the error. We have verified the text matches the figures here and in the entire manuscript.

      Several errors are present in the legend of Figure 6.

      No letters are indicated for the different panels; line 841 must be C, F and I; the indicated colors for the differentially expressed proteins do not correspond to the volcano plots.

      Thank you for suggesting the improvements for the labels. We have modified the plot accordingly.

      Reference Glavier 2020 has been cited as Glacier on line 72.

      We have modified the writing accordingly and checked the reference.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      We thank the reviewer for their time and effort in improving the quality of our manuscript.

      We would like to point out that the results presented in the previous Fig. S12 (now Fig. S13) are from mice of a different age and are restricted to the prefrontal cortex, whereas the previous report (Florence Authier, et al., Dis Model Mech. 2023) showed OGT and OGA mRNA/protein expression in total brain homogenates. In that study, we observed a significant reduction in OGT protein levels, while OGT mRNA levels were significantly increased, in the brains of 3-month-old C921Y mutant mice compared to WT controls. In our current study (Figure S12, now S13), however, OGA and OGT mRNA/protein expression a) was measured only in the prefrontal cortex and b) is from 4-month-old male mice. Therefore, a direct comparison of findings from total brain vs. prefrontal cortex would be speculative. In our present work, OGT protein levels are not changed in the prefrontal cortex, while OGT mRNA levels are increased (similarly to the total brain data), albeit not significantly.

      It is plausible that the different levels of OGT protein expression in total brain (previous study) and prefrontal cortex (current study) potentially reflect regional differences in the regulation of OGT protein levels/stability, since OGT mRNA levels are increased in both cases. This notion is also supported by additional analyses in three other brain regions (hippocampus, striatum and cerebellum) and these data are now included in Figures S13 and S14.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a follow-up manuscript.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      Following the reviewer’s suggestion, we have performed additional analyses to identify the cellular composition of the observed nodular dysplasia using neuronal and glial markers. These new analyses indicate that the nodular collections in layers II/III were predominantly neurons; for example, see the cresyl violet staining (Fig. 6E). Moreover, we have also performed immunofluorescence imaging using NeuN and GFAP (Fig. 6G-H), which shows that the dystrophic collections are predominantly neurons. To further corroborate these findings, we have also performed multiplex IHC analyses, presented in Fig. S12, which indicate that the nodular cortical malformations i) were populated by neurons and oligodendrocytes and ii) predominantly affected layers II-V, as reflected by the distribution of the neuronal markers Reelin and POU class 3 homeobox 2 (POU3F2). Collectively, these data (Fig. 6 and Fig. S12) reflect neuronal disorganisation due to migration defects rather than differentiation defects. We appreciate the reviewer’s suggestion to perform spatiotemporal analyses of these cellular features; however, tissue from defined stages of development is not available.

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      We agree with the referee that these experiments would further strengthen the work. However, we respectfully point out that the inference that altered proteins must themselves be O-GlcNAc modified is not necessarily correct. For instance, O-GlcNAcylation of an unknown protein kinase X, E3 ligase/DUB Y, or transcription factor Z could indirectly affect these pathways/proteins. Nevertheless, we have performed further experiments to explore whether Wnt/β-catenin and mTOR signalling are functionally affected, as pointed out by the referee. We did not observe significant changes in the expression of Wnt target genes (Cdkn1a, Ccnd1, Myc, Ramp3, Tfrc) by qPCR, nor in the levels of key proteins involved in Wnt/β-catenin (non-phosphorylated β-catenin) and mTOR (phosphorylated rpS6) signalling by western blot (data not shown). These results suggest that neither pathway is functionally deregulated to a significant extent in the prefrontal cortex of adult OGT<sup>C921Y</sup> mice.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      We appreciate the reviewer’s suggestion to perform spatiotemporal analyses of these cellular dynamics; however, tissue from defined stages of development is not available. As stated above, we want to share our current findings with the scientific and patient communities in a timely manner, and the suggested experiments could form the foundation of a follow-up study.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

      Thank you for the comment. To avoid misleading the readers, we have removed panel A from the previous version of Figure 8 and updated the version of record.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would considerably delay sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a future follow-up manuscript.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Statistics including exact p-values have been included in the main text for all key questions where appropriate.

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1F, the y-axis labels and scale values are partially obscured by graphical elements, compromising accurate interpretation of the data range.

      Panel 1F has been adjusted to make the y-axis label visible.

      (2) Regarding the histological analyses in Figure 6, the current H&E staining and Luxol Fast Blue myelin staining results lack age-matched wild-type control samples processed in parallel, which undermines experimental comparability. To enhance methodological rigor, control group staining results should be displayed adjacent to each experimental group image.

      The original Figure 6 already contained comparison between WT and OGT<sup>C921Y</sup> tissues. The Figure has been updated with additional data from the WT and C921Y mutant groups shown side by side.

      Reviewer #2 (Recommendations for the authors):

      (1) I believe that Figures S1 and S2 were switched during the submission. The legends are correct, so the authors should just be careful with the order when they upload the final versions.

      Figures S1 and S2 have been re-ordered.

      (2) On page 18, the authors state, "Although no significant changes in the expression of OGT were observed in OGTC921Y cortex (Figure S12A, C), there was a significant increase in OGT/OGA protein ratio in OGTC921Y mice (Fig. S12D). As a functional consequence, global O-GlcNAcylation of proteins in the brain was drastically impaired in the OGTC921Y brain compared to WT (Figure S12E, F)."

      To me, this statement suggests that the incorrect ratio of OGT to OGA is responsible for the altered O-GlcNAc levels. I think this is missing important information. The authors are, I'm sure, aware that OGT and OGA expression is linked to O-GlcNAc levels. I think it would be better to describe the situation here as the tissue attempting to respond to lower OGT activity by lowering OGA levels. However, the tissue is not fully successful, resulting in lower overall O-GlcNAc levels as seen by RL2. If the difference were only driven by the OGT/OGA ratio, one would expect increased O-GlcNAc levels due to decreased OGA. I think it is important to point out more details here for non-expert readers.

      Thank you for the insightful comment. We have included these aspects in the revised text; please see page 20.

      (3) I am a little surprised that the authors did not explore differences in O-GlcNAc-modified proteins through a more targeted enrichment of these proteins for analysis of potential modification differences, in addition to just changes in protein abundance.

      We agree that these experiments would further strengthen the work. However, it is not yet known whether OGT-CDG is caused by loss of O-GlcNAc modification on specific proteins or by as-yet-undeciphered mechanisms (e.g. OGT interactome, HCF1 processing, feedback on OGA levels), which we are not able to confirm in the current manuscript. Therefore, as a starting point, we have performed whole-proteome analysis to establish candidate hypotheses that could lead to the discovery of cellular and molecular mechanisms underlying OGT-CDG. Lastly, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a future follow-up manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Besson et al. investigate how environmental nutrient signals regulate chromosome biology through the TORC1 signaling pathway in Schizosaccharomyces pombe. Specifically, the authors explore the impact of TORC1 on cohesin function - a protein complex essential for chromosome segregation and transcriptional regulation. Through a combination of genetic screens, biochemical analysis, phospho-proteomics, and transcriptional profiling, they uncover a functional and physical interaction between TORC1 and cohesin. The data suggest that reduced TORC1 activity enhances cohesin binding to chromosomes and improves chromosome segregation, with implications for stress-responsive gene expression, especially in subtelomeric regions.

      Strengths:

      This work presents a compelling link between nutrient sensing and chromosome regulation. The major strength of the study lies in its comprehensive and multi-disciplinary approach. The authors integrate genetic suppression screens, live-cell imaging, chromatin immunoprecipitation, co-immunoprecipitation, and mass spectrometry to uncover the functional connection between TORC1 signaling and cohesin. The use of phospho-mutant alleles of cohesin subunits and their loader provides mechanistic insight into the regulatory role of phosphorylation. The addition of transcriptomic analysis further strengthens the biological relevance of the findings and places them in a broader physiological context. Altogether, the dataset convincingly supports the authors' main conclusions and opens up new avenues of investigation.

      Weaknesses:

      While the study is strong overall, a few limitations are worth noting. The consistency of cohesin phosphorylation changes under different TORC1-inhibiting conditions (e.g., genetic mutants vs. rapamycin treatment) is unclear and could benefit from further clarification. The phosphorylation sites identified on cohesin subunits do not match known AGC kinase consensus motifs, raising the possibility that the modifications are indirect. The study relies heavily on one TORC1 mutant allele (mip1-R401G), and additional alleles could strengthen the generality of the findings. Furthermore, while the results suggest that nutrient availability influences cohesin function, this is not directly tested by comparing growth or cohesin dynamics under defined nutrient conditions.

      We thank the reviewer for his overall positive assessment and constructive criticism. We broadly agree with the few limitations he pointed out, which we will comment on below.

      (1) The consistency of cohesin phosphorylation changes under different TORC1-inhibiting conditions (e.g., genetic mutants vs. rapamycin treatment) is unclear and could benefit from further clarification.

      The basis of our study was to search for suppressor mutants, a situation in which an unviable strain becomes viable. It turns out that the suppressor mutants affect TORC1, necessarily in a partial manner given that TORC1 kinase activity is essential for proliferation. Likewise, rapamycin partially inhibits TORC1 and does not prevent proliferation of wild-type S. pombe cells. TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. In addition, it is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al., 2013, PMID: 23888043). Therefore, both hypomorphic TORC1 genetic mutants and rapamycin treatment result in partial inhibition of TORC1 kinase activity. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. Nevertheless, both conditions suppress the heat-sensitive phenotype of the mis4 mutant, although the suppressor effect of rapamycin is weaker. Consequently, some phosphorylation sites involved in mis4-ts suppression may behave similarly in rapamycin and TORC1 mutants (e.g. Psm1-S1022), while others (e.g. Mis4-S183) may behave differently.

      It is clear that there are phenotypic differences between the suppression of mis4-ts by rapamycin treatment or by genetic alteration of TORC1. This can be seen also in our ChIP analysis of Rad21 distribution at CARs. The trend is upward, but the pattern is not identical. We have added the following text to summarize the above considerations:

      “It is important to note at this stage that, although rapamycin and TORC1 mutants both decrease TORC1 kinase activity, the two are not equivalent. The mechanisms by which TORC1 kinase activity is reduced are different, and TORC1 mutants suppress the mis4-G1487D phenotype more effectively than rapamycin. It is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al, 2013). TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. It is therefore remarkable that negative regulation of TORC1 by rapamycin or a genetic mutation both alleviate mis4-G1487D phenotypes and have a fairly similar effect on cohesin dynamics.”

      (2) The phosphorylation sites identified on cohesin subunits do not match known AGC kinase consensus motifs, raising the possibility that the modifications are indirect.

      The genetic and biochemical analyses provided in this study show that the AGC kinases Sck1 and Sck2 influence cohesin phosphorylation and function. Whether Sck1, Sck2 or TORC1 directly phosphorylates cohesin components is the next question to address. The fact that the phosphorylation of Psm1-S1022 and Mis4-S183 was never abolished in the sck1-2 mutants may suggest they are indirectly involved. This should be taken with caution because we have been using deletion mutants. In this situation, cells adapt and other kinases may substitute, at least partially (Plank et al, 2020, PMID: 32102971). Asking whether cohesin components display consensus sites for AGC kinases is a complementary approach. The consensus site for Sck1 and Sck2 is unknown. If we assume some conservation with budding yeast SCH9, the consensus sequence would be RRxS/T. Psm1-S1022 (DQMSP) and Mis4-S183 (QLCSP) do not fit the consensus. However, this kind of information should be taken with care as many SCH9-dependent phosphorylation sites did not fall within the consensus in a study using analogue-sensitive AGC kinases and phosphoproteomics (Plank et al, 2020, PMID: 32102971). Alternatively, Sck1-2 may regulate other kinases. Indeed, Psm1-S1022 and Mis4-S183 lie within CDK consensus sites and Psm1-S1022 phosphorylation is Pef1-dependent. In summary, yes, the changes may be indirect; that remains to be seen, but in any case they are influenced by TORC1 signalling. The following paragraph was added:

      “The consensus site for Sck1 and Sck2 is unknown. If we assume some conservation with budding yeast SCH9, the consensus sequence would be RRxS/T. Psm1-S1022 (DQMSP) and Mis4-S183 (QLCSP) do not fit the consensus. However, this should be taken with care as many SCH9-dependent phosphorylation sites did not fall within the consensus in a study using analogue-sensitive AGC kinases and phosphoproteomics (Plank et al, 2020). Alternatively, Sck1-2 may regulate other kinases. Indeed Psm1-S1022 and Mis4-S183 lie within CDK consensus sites and Psm1-S1022 phosphorylation is Pef1-dependent.”
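
      The motif comparison described above can be illustrated with a minimal sketch. It assumes the RRxS/T pattern quoted in the response and, as a stand-in for the CDK consensus, the minimal proline-directed S/T-P motif; the regular expressions and the helper name `classify` are illustrative assumptions and not part of the authors' analysis.

```python
import re

# Assumed encodings (illustrative only):
# - RRxS/T: the SCH9-like consensus quoted in the response
# - S/T-P: minimal proline-directed motif, used here as a proxy for a CDK site
AGC_LIKE = re.compile(r"RR.[ST]")
CDK_MINIMAL = re.compile(r"[ST]P")

# Local sequences around the two phosphosites, as given in the response
SITES = {"Psm1-S1022": "DQMSP", "Mis4-S183": "QLCSP"}


def classify(peptide: str) -> dict:
    """Report whether a short peptide contains each assumed motif."""
    return {
        "RRxS/T": bool(AGC_LIKE.search(peptide)),
        "S/T-P": bool(CDK_MINIMAL.search(peptide)),
    }


if __name__ == "__main__":
    for site, peptide in SITES.items():
        print(site, peptide, classify(peptide))
```

      Under these assumptions, neither peptide contains RRxS/T while both contain S/T-P, which matches the statement that the sites do not fit the SCH9-like consensus but lie within CDK-type sites. A sequence match alone does not establish which kinase acts on a site.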

      (3) The study relies heavily on one TORC1 mutant allele (mip1-R401G), and additional alleles could strengthen the generality of the findings.

      It is true that we focused our attention on mip1-R401G, which is present in all the experiments presented. That said, other alleles were used in one or more figures. Five mip1 alleles and one tor2 allele were identified as mis4-ts suppressors (Fig. 1). We have also shown that another mip1 allele, mip1-Y533A, created by another group (Morozumi et al, 2021), is also a suppressor of mis4-ts and affects the phosphorylation of Mis4-S183 and Psm1-S1022 (Fig. 1, Figure 5—figure supplement 1). To this we can add the effect of mutants that render TORC1 hyperactive (Fig. 1E, Fig. 2H) as well as AGC kinase mutants (Figure 5—figure supplement 3) and, finally, the effect of rapamycin. So yes, mip1-R401G has been used extensively, but we have still broadly covered the TORC1 signalling pathway.

      (4) Furthermore, while the results suggest that nutrient availability influences cohesin function, this is not directly tested by comparing growth or cohesin dynamics under defined nutrient conditions.

      We agree that studying the dynamics of cohesin, genome folding and gene expression in relation to nutrient availability is a very exciting topic, and we hope to address these issues in detail in the future.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors follow up on a previous suppressor screen of a temperature-sensitive allele of mis4 (mis4-G1487D), the cohesin loading factor in S. pombe, and identify additional suppressor alleles tied to the S. pombe TORC1 complex. Their analysis suggests that these suppressor mutations attenuate TORC1 activity, while enhanced TORC1 activity is deleterious in this context. Suppression of TORC1 activity also ameliorates chromosome segregation and spindle defects observed in the mis4-G1487D strain, although some more subtle effects are not reconstituted. The authors provide evidence that this genetic suppression is also tied to the reconstitution of cohesin loading. Moreover, disrupting TORC1 also enhances Mis4/cohesin association with chromatin (likely reflecting enhanced loading) in WT cells, while rapamycin treatment can enhance the robustness of chromosome transmission. These effects likely arise directly through TORC1 or its downstream effector kinases, as TORC1 co-purifies with Mis4 and Rad21; these factors are also phosphorylated in a TORC1-dependent fashion. Disrupting Sck2, a kinase downstream of TORC1, also suppresses the mis4-G1487D allele while simultaneous disruption of Sck1 and Sck2 enhances cohesin association with chromatin, albeit with differing effects on phosphorylation of Mis4 and Psm1/Smc1. Phosphomutants of Mis4 and Psm1 that mimic observed phosphorylation states identified by mass spectrometry that are TORC1-dependent also suppressed phenotypes observed in the mis4-G1487D background. Last, the authors provide evidence that the mis4-G1487D background and TORC1 mutant backgrounds display an overlap in the dysregulation of genes that respond to environmental conditions, particularly in genes tied to meiosis or other "stress".

      Overall, the authors provide compelling evidence from genetics, biochemistry, and cell biology to support a previously unknown mechanism by which nutrient sensing regulates cohesin loading with implications for the stress response. The technical approaches are generally sound, well-controlled, and comprehensive.

      Specific Points:

      (1) While the authors favor the model that the enhanced cohesin loading upon diminished TORC1 activity helps cells to survive harsh environmental conditions, as starvation of S. pombe also drives commitment to meiosis, it seems as plausible that enhanced cohesin loading is related to preparing the chromosomes to mate.

      (2) Related to Point 1, the lab of Sophie Martin previously published that phosphorylation of Mis4 characterizes a cluster of phosphotargets during starvation/meiotic induction (PMID: 39705284). This work should be cited, and the authors should interrogate how their observations do or do not relate to these prior observations (are these the same phosphosites?).

      We agree this is a possibility and the following paragraph was added in the discussion section:

      “TORC1-based regulation of cohesin may be relevant to preparing cells for meiosis. Since nitrogen deprivation stimulates meiosis initiation, subsequent TORC1 down-regulation may regulate the cohesin complex, preparing the chromosomes for fusion and meiosis. A recent phosphoproteomic study conducted by Sophie Martin's laboratory showed that Mis4-S107 phosphorylation increases during cellular fusion (Bérard et al, 2024). It is unknown whether the phosphorylation of S107 is controlled by TORC1 signalling. As the phosphorylation of Mis4-S183 and Psm1-S1022 was not detected in these experiments, the potential involvement of the TORC1-cohesin axis in the sexual programme remains to be investigated.”

      (3) It would be useful for the authors to combine their experimental data sets to interrogate whether there is a relationship between the regions where gene expression is altered in the mis4-G1487D strain and changes in the loading of cohesin in their ChIP experiments.

      (4) Given that the genes that are affected are predominantly sub-telomeric while most genes are not affected in the mis4-G1487D strain, one possibility that the authors may wish to consider is that the regions that become dysregulated are tied to heterochromatic regions where Swi6/HP1 has been implicated in cohesin loading.

      We agree that it would be interesting to see if there are correlations between cohesin positioning, heterochromatin and gene expression. That said, this would need to be done at the whole-genome level and include many other parameters (genome folding, histone modifications, Pol2 occupancy). These issues require substantial investment and may be addressed in a follow-up project.

      (5) It would be helpful to show individual data points from replicates in the bar graphs - it is not always clear what comprises the data sets, and superplots would be of great help.

      We verified that the figure captions clearly indicate the data sets considered, their mean, standard deviation, and statistical analysis method. As for the type of plot, we used the tools at our disposal.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Besson et al. investigate how the nutrient-responsive TORC1 signaling pathway modulates cohesin function in S. pombe. Using a genetic screen, the authors identify TORC1 mutants that suppress the thermosensitive growth defects of a cohesin loader mutant (mis4-G1487D). They show that reducing TORC1 activity-either genetically or pharmacologically-enhances cohesin binding to chromosomal sites (CARs), improves chromosome segregation, and alters the phosphorylation state of cohesin and its loader. They also show, through coimmunoprecipitation, that TORC1 and cohesin physically associate, and that this functional interaction extends to the transcriptional regulation of stress-responsive, subtelomeric genes. Together, the data suggest that environmental cues influence chromosome stability and gene expression via a TORC1-cohesin axis.

      Overall, the study is well-supported by thoughtful genetic epistasis analyses and a combination of genetic, biochemical, cell biological, and transcriptomic approaches. While not all data are equally strong, the cumulative evidence convincingly supports the authors' conclusions.

      Specific Concerns and Suggestions

      (1) Figure 2A - Division rates of wild-type and mip1-R401G cells are missing and should be provided for proper comparison.

      This is now done in revised Figure 2A. We also made a change in the manuscript, replacing “The mip1-R401G mutation efficiently suppressed the proliferation and viability defects (Figure 2A)” by “The mip1-R401G mutation efficiently attenuated the proliferation and viability defects (Figure 2A)”, to acknowledge the fact that the proliferation rate did not return to wild-type levels.

      (2) Figure 3 - Figure Supplement 1 - The authors claim that "Rapamycin treatment during a single cell cycle provoked a similar effect although less pronounced." However, for most CARs, the effect appears insignificant. This should be acknowledged in the text.

      The text has been changed accordingly:

      “Rapamycin treatment during a single cell cycle provoked a similar stimulation of Rad21 binding at CARs (Figure 3—figure supplement 1), albeit with noticeable differences. In mis4+ cells, both mip1-R401G and rapamycin induced a significant increase in Rad21 binding at several CARs (tRNA-left, cc2, 3323, NTS, Tel1-R). However, some CARs that exhibited increased Rad21 binding in the mip1 mutant did not respond significantly to rapamycin (dg2-R, tRNA-R). Conversely, rapamycin (but not mip1-R401G) induced a significant increase in Rad21 binding at imr2-L and CAR1806 (Figure 3D and Figure 3—figure supplement 1). In the mis4-G1487D mutant background, mip1-R401G induced a significant increase in Rad21 binding at all examined sites (Figure 3B). Similarly, rapamycin did increase Rad21 binding at all sites but only at the Tel1-R site did this reach statistical significance (Figure 3—figure supplement 1).”

      (3) Figure 4 - The analysis of interactions between TORC1 and the cohesin complex is somewhat limited. The authors may wish to test interactions between Mip1 and cohesin subunits (e.g., Rad21). More interestingly, it would be valuable to explore whether MIP1 mutations that suppress cohesin mutants affect the interaction between Tor2 and Rad21.

      We have added some additional data that answer this question (Figure 4—figure supplement 1) and a paragraph in the manuscript:

      “Tor2, the kinase subunit of TORC1, is particularly well detected in Rad21 and Mis4 coimmunoprecipitation experiments (Figure 4 and Figure 4—figure supplement 1). To determine whether the R401G mutation in Mip1 affects these interactions, coimmunoprecipitation experiments were repeated in both the mip1-R401G and mip1+ contexts. The data obtained indicate that Tor2 co-immunoprecipitation with Mis4 and Rad21 is largely unaffected by the mip1-R401G mutation (Figure 4—figure supplement 1). If mip1-R401G affects the regulation of cohesin by TORC1, this does not appear to stem from a gross defect in their interaction, at least at this level of resolution.”

      (4) Figure 5 - There appears to be a lack of correlation between cohesin subunit phosphorylation in TORC1-reducing mutants and in response to rapamycin. The reason for this discrepancy is unclear.

      This point was addressed in the previous section (Public review, reviewer 1, point 1). The response is pasted below:

      The basis of our study was to search for suppressor mutants, a situation in which an unviable strain becomes viable. It turns out that the suppressor mutants affect TORC1, necessarily in a partial manner given that TORC1 kinase activity is essential for proliferation. Likewise, rapamycin partially inhibits TORC1 and does not prevent proliferation of wild-type S. pombe cells. TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. In addition, it is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al., 2013, PMID: 23888043). Therefore, both hypomorphic TORC1 genetic mutants and rapamycin treatment result in partial inhibition of TORC1 kinase activity. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. Nevertheless, both conditions suppress the heat-sensitive phenotype of the mis4 mutant, although the suppressor effect of rapamycin is weaker. Consequently, some phosphorylation sites involved in mis4-ts suppression may behave similarly in rapamycin and TORC1 mutants (e.g. Psm1-S1022), while others (e.g. Mis4-S183) may behave differently.

      It is clear that there are phenotypic differences between the suppression of mis4-ts by rapamycin treatment or by genetic alteration of TORC1. This can be seen also in our ChIP analysis of Rad21 distribution at CARs. The trend is upward, but the pattern is not identical. We have added the following text to summarize the above considerations:

      “It is important to note at this stage that, although rapamycin and TORC1 mutants both decrease TORC1 kinase activity, the two are not equivalent. The mechanisms by which TORC1 kinase activity is reduced are different, and TORC1 mutants suppress the mis4-G1487D phenotype more effectively than rapamycin. It is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al, 2013). TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. It is therefore remarkable that negative regulation of TORC1 by rapamycin or a genetic mutation both alleviate mis4-G1487D phenotypes and have a fairly similar effect on cohesin dynamics.”

      (5) The phosphorylation sites examined on cohesin subunits are not canonical AGC kinase consensus motifs, suggesting they are unlikely to be direct targets of Sck1 or Sck2. I suggest that this point should be mentioned in the manuscript.

      This is now done:

      “The consensus site for Sck1 and Sck2 is unknown. If we assume some conservation with budding yeast SCH9, the consensus sequence would be RRxS/T. Psm1-S1022 (DQMSP) and Mis4-S183 (QLCSP) do not fit the consensus. However, this should be taken with care as many SCH9-dependent phosphorylation sites did not fall within the consensus in a study using analogue-sensitive AGC kinases and phosphoproteomics (Plank et al, 2020). Alternatively, Sck1-2 may regulate other kinases. Indeed Psm1-S1022 and Mis4-S183 lie within CDK consensus sites and Psm1-S1022 phosphorylation is Pef1-dependent.”

      (6) Figure 5 - Figure Supplement 3 - The reduction in Psm1 phosphorylation in the sck1Δ sck2Δ double mutant is not convincing without replicates and statistical analysis.

      This is now done and the data are presented in Figure 5—figure supplement 3. Panel D shows the data for Psm1-S1022p and Panel E for Mis4-S183p. Each graph shows the mean ratios +/- SD from 3 experiments.

      (7) Figure 5C - It would be helpful if the authors validated the effect of pef1 deletion on Mis4 phosphorylation by Western blotting, rather than relying solely on mass spectrometry data.

      This is now done. The data appear in Figure 5—figure supplement 2, panel B.

      (8) The statement: "The frequency of chromosome segregation defects of mis4‐G1487D was markedly reduced in a sck2‐deleted background and further decreased by the additional deletion of sck1 (Figure 5-figure supplement 3)" is not supported by the data. According to the figure, the difference between sck2Δ and sck1Δ sck2Δ is not statistically significant.

      The sentence was changed to:

      “The frequency of chromosome segregation defects in the mis4-G1487D strain remained unchanged in a sck1-deleted background, but was significantly reduced when either the sck2 or both the sck1 and sck2 genes were deleted (Figure 5—figure supplement 3).”

      (9) Figure 6A - The data shown are not convincing. The double mutants carrying the phosphomimetic and phospho-null psm1 alleles should be shown on the same plate for direct comparison.

      This is now done. The new data are shown in Figure 6A.

      (10) Figure 6E - The wild-type control is missing. Including it would provide an essential reference point to assess whether the mutants rescue cohesin binding to wild-type levels.

      It is true that the effects were small when compared to wild-type, but they were still significant when compared to mis4-G1487D. The comparison with wild-type is now available in Figure 6—figure supplement 1 and the paragraph was modified accordingly:

      “Cohesin binding to CARs as assayed by ChIP tends to increase for the mutants mimicking the non-phosphorylated state and to decrease with the phospho-mimicking forms (Figure 6E). The rescue of mis4-G1487D by the non-phosphorylatable form was modest but significant, notably within centromeric regions (imr2-L, dg2-R) and at the telomere (Tel1-R) site (Figure 6E and see Figure 6—figure supplement 1 for comparison with wild-type levels). Conversely, the mutant mimicking the phosphorylated state displayed a significant reduction of Rad21 binding at those sites as well as at several other sites at the centromere (cc2, tRNA-R), CAR2898, and at the ribosomal non-transcribed spacer site (NTS).”

      Limitations of the Study (not requiring additional experiments for publication, but worth noting).

      (11) The authors suggest that nutrient status affects cohesin, but this is not directly demonstrated-e.g., by comparing growth or cohesin dynamics or phosphorylation under defined nutrient conditions. That said, the paper is sufficiently detailed to allow this question to be addressed in follow-up work.

      We agree that studying the dynamics of cohesin, genome folding and gene expression in relation to nutrient availability is a very exciting topic, and we hope to address these issues in detail in the future.

      (12) The upstream signaling cascade remains unresolved. The identity of kinases downstream of TORC1 (e.g., whether Sck1/Sck2 or other factors are responsible) and whether TORC1 directly phosphorylates Mis4 or Psm1 are not established.

      This is something we can all agree on, and it might be something we look at in a future project.

      (13) The conclusions rely heavily on one TORC1 mutant allele (mip1-R401G). While this allele is informative, additional alleles or orthogonal methods could further support the generality of the findings.

      It is true that we focused our attention on mip1-R401G, which is present in all the experiments presented. That said, other alleles were used in one or more figures. Five mip1 alleles and one tor2 allele were identified as mis4-ts suppressors (Fig. 1). We have also shown that another mip1 allele, mip1-Y533A, created by another group (Morozumi et al, 2021), is also a suppressor of mis4-ts and affects the phosphorylation of Mis4-S183 and Psm1-S1022 (Fig. 1, Figure 5—figure supplement 1). To this we can add the effect of mutants that render TORC1 hyperactive (Fig. 1E, Fig. 2H) as well as AGC kinase mutants (Figure 5—figure supplement 3) and, finally, the effect of a transient treatment with rapamycin. So yes, mip1-R401G has been used extensively, but we have still broadly covered the TORC1 signalling pathway.

      Reviewer #2 (Recommendations for the authors):

      (1) Given the lack of CTCF in fission yeast, it is worth noting that cohesin ChIP data nonetheless can predict topological domains, which reinforces its important role in dictating chromatin folding (PMID: 39543681).

      We thank the reviewer for this suggestion. We now refer to this study in the discussion section.

      (2) Providing context for the S. pombe nomenclature for the conserved cohesin subunits would help the reader navigate the manuscript, possibly using a cartoon as for the TORC complexes. For example, Psm1 (aka Smc1) is not introduced and therefore its phosphorylation comes into the manuscript without explanation.

      Cohesin subunits and their names are given in the introduction section.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents high-resolution cryoEM structures of VPS34-complex II bound to Rab5A at 3.2 Å resolution. The Williams group previously reported the structure of VPS34 complex II bound to Rab5A on liposomes using tomography, and therefore, the previous structure, although very informative, was at lower resolution.

      The first new structure they present is of the 'REIE>AAAA' mutant complex bound to RAB5A. The structure resembles the previously determined one, except that an additional molecule of RAB5A was observed bound to the complex in a new position, interacting with the solenoid of VPS15.

      Although this second binding site exhibited reduced occupancy of RAB5A in the structure, the authors determined an additional structure in which the primary binding site was mutated to prevent RAB5A binding ('REIE>ERIR'). In this structure, there is no RAB5A bound to the primary binding site on VPS34, but the RAB5A bound to VPS15 now has strong density. The authors note that the way in which RAB5A interacts with each site is distinct, though both interfaces involve the switch regions. The authors confirm the location of this additional binding site using HDX-MS.

      The authors then determine multiple structures of the wild-type complex bound to RAB5A from a single sample, as they use 3D classifications to separate out versions of the complex bound to 0, 1, or 2 copies of RAB5A. Overall, the structure of VPS34-Complex II does not change between the different states, and the data indicate that both RAB5A binding sites can be occupied at the same time.

      The authors then design a new mutant form of the complex (SHMIT>DDMIE) that is expected to disrupt the interaction at the secondary site between VPS15 and RAB5A. This mutation had a minor impact on the Kd for RAB5A binding, but when combined with the REIE>ERIR mutation of the primary binding site, RAB5A binding to the complex was abolished.

      Comparison of sequences across species indicated that the RAB5A binding site on VPS15 was conserved in yeast, while the RAB5A binding site on VPS34 is not.

      The authors tested the impact of a corresponding yeast Vps15 mutation (SHLITY>DDLIEY) predicted to disrupt interaction with yeast Rab5/Vps21, and found that this mutant Vps15 protein was mislocalized and caused defective CPY processing.

      The authors then compare these structures of the RAB5A-class II complex to recently published structures from the Hurley group of the RAB1A-class I complex, and find that in both complexes the Rab protein is bound to the VPS34 binding site in a somewhat similar manner. However, a key difference is that the position of VPS34 is slightly different in the two complexes because of the unique ATG14L and UVRAG subunits in the class I and class II complexes, respectively. This difference creates a different RAB binding pocket that explains the difference in RAB specificity between the two complexes.

      Finally, the higher resolution structures enable the authors to now model portions of BECLIN1 and UVRAG that were not previously modeled in the cryoET structure.

      Strengths:

      Overall, I found this to be an interesting and comprehensive study of the structural basis for the interaction of RAB5A with VPS34-complex II. The authors have performed experiments to validate their structural interpretations, and they present a clear and thorough comparative analysis of the Rab binding sites in the two different VPS34 complexes. The result is a much better understanding of how two different Rab GTPases specifically recruit two different, but highly similar complexes to the membrane surface.

      Weaknesses:

      No significant weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The work by Spokaite et al describes the discovery of a novel Rab5 binding site present in complex II of class III PI3K using a combination of HDX and Cryo EM. Extensive mutational and sequence analysis define this as the primordial Rab5 interface. The data presented are convincing that this is indeed a biologically relevant interface, and is important in defining mechanistically how VPS34 complexes are regulated.

      This paper is a very nice expansion of their previous cryo-ET work from 2021, and is an excellent companion piece on high-resolution cryo-EM of the complex I class III complex bound to Rab1 from the Hurley lab in 2025. Overall, this work is of excellent technical quality and answers important unexplained observations on some unexpected mutational analysis from the previous work.

      They used their increased-affinity VPS34 mutant to determine the 3.2 Å structure of Rab5 bound to VPS34-CII. Clear density was seen for the original Rab5 interface, but an additional site was observed. Based on this structure, they mutated out the VPS34 interface, allowing for a high-resolution structure of the Rab5 bound at the VPS15 interface.

      They extensively validated the VPS15 interface in the yeast variant of VPS34, showing that the Vps15-Rab5 (Vps21) interface identified is critical in controlling complex II VPS34 recruitment.

      The major strengths of this paper are that the experiments appear to be done carefully and rigorously, and I have very few experimental suggestions.

      Here is what I recommend based on some very minor weaknesses I observed:

      (1) My main concern has to do a little bit with presentation. My main issue is how the authors use mutant description. They clearly indicate the mutant sequence in the human isoform (for example, see Figure 2A, VPS15 described as 579-SHMIT-583>DDMIE); however, when they shift to the yeast version, they shift to saying VPS15 mutant, but don't define the mutant (Figure 2G). I would recommend they just include the same sequence numbering and WT to mutant replacement every time a new mutant (or species) is described. It is always easier to interpret what is being shown when the authors are jumping between species, when the exact mutant is included. This is particularly important in this paper, where we are jumping between different subunits and different species, so a clear description in the figure/figure legends makes it much easier to read for non-specialists.

      The reviewer has made an excellent point here. To clarify the yeast mutation, we have revised the manuscript main text to refer to the yeast mutant as SHLITY>DDLIEY, and we have added this to the legend for Figs. 2F,G.

      (2) The HDX data very clearly shows that Rab5 is likely able to bind at both sites, which backs up the cryo EM data nicely. I am slightly confused by some of the HDX statements described in the methods.

      (3) The authors state, "Only statistically significant peptides showing a difference greater than 0.25 Da and greater than 5% for at least two timepoints were kept." This seems to be confusing as to why they required multiple timepoints, and before they also describe that they required a p-value of less than 0.05. It might be clearer to state that significant differences required a 0.25 Da, 5%, and p-value of <0.05 (n=3). Also, what do they mean by kept? Does this mean that they only fully processed the peptides with differences?

      (4) They show peptide traces for a selection in the supplement, but it would be ideal to include the full set of HDX data as an Excel file, including peptides with no differences, as there is a lot of additional information (deuteration levels for everything) that would be useful to share, as recommended from the Masson et al 2019 recommendations paper. This may be attached, but this reviewer could not see an example of it in the shared data dropbox folder.

      We have revised the HDX method description to clarify. All peptides were kept and fully processed. However, for the results displayed, we have illustrated only peptides meeting the criteria described.
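
      The display criteria quoted by the reviewer can be written out as a small filter. The following is only an illustrative sketch, not the authors' analysis pipeline: the names (TimepointDiff, peptide_is_displayed) are hypothetical, and it assumes that the 0.25 Da, 5% and p < 0.05 thresholds are all evaluated per timepoint and must hold for at least two timepoints of the same peptide.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the quoted methods sentence; how they combine is an assumption.
DA_CUTOFF = 0.25       # minimum absolute difference in deuterium uptake, in Da
PCT_CUTOFF = 5.0       # minimum absolute difference, in percent deuteration
P_CUTOFF = 0.05        # significance threshold for the replicate comparison
MIN_TIMEPOINTS = 2     # number of timepoints that must satisfy all three criteria


@dataclass
class TimepointDiff:
    """Difference between two states for one peptide at one timepoint (hypothetical type)."""
    delta_da: float    # uptake difference in Da
    delta_pct: float   # uptake difference in percent
    p_value: float     # p-value from replicate measurements


def timepoint_passes(tp: TimepointDiff) -> bool:
    """True if this timepoint exceeds both magnitude cutoffs and is significant."""
    return (
        abs(tp.delta_da) > DA_CUTOFF
        and abs(tp.delta_pct) > PCT_CUTOFF
        and tp.p_value < P_CUTOFF
    )


def peptide_is_displayed(timepoints: Sequence[TimepointDiff]) -> bool:
    """True if enough timepoints pass; every peptide is still processed either way."""
    return sum(timepoint_passes(tp) for tp in timepoints) >= MIN_TIMEPOINTS
```

      Under these assumptions, a peptide with a significant difference at only one timepoint is still fully processed and reported in the deposited table, but is not shown in the figures.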

      The Excel file for all peptides (as recommended by Masson et al) was deposited with PRIDE under the dataset identifier PXD061277. In addition, we have included this Excel file in our supplementary material.

      Reviewer #3 (Public review):

      Summary:

      The manuscript of Spokaite et al. focuses on the Vps34 complex involved in PI3P production. This complex exists in two variants, one (class I) specific for autophagy, and a second one (class II) specific for the endocytic system. Both differ only in one subunit. The authors previously showed that the Vps34 complexes interact with Rab GTPases, Rab1 or Rab5 (for class II), and the identified site was found at Vps34. Now, the authors identify a conserved and overlooked Rab5 binding site in Vps15, which is required for the function of the Class II complex. In support of this, they show cryo-EM data with a second Rab5 bound to Vps15, identify the corresponding residues, and show by mutant analysis that impaired Rab5 binding also results in defects using yeast as a model system.

      Overall, this is a most complete study with little to criticize. The paper shows convincingly that the two Rab5 binding sites are required for Vps34 complex II function, with the Vps15 binding site being critical for endosomal localization. The structural data is very much complete.

      Weaknesses:

      What I am missing are a few controls that show that the mutations in Vps15 do not affect autophagy. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following PhoΔ60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex.

      One of the first noted features of the VPS34 complexes was that the ATG14-containing complex (VPS34-CI) is important for autophagy, while the VPS38 (yeast orthologue of UVRAG) subunit characteristic of VPS34-CII is important for endocytic sorting (PMID 11157979). However, the VPS34, VPS15 and BECLIN1 subunits are present in both complexes; as such, mutations in them may affect both processes.

      We agree with the reviewer that it is an important undertaking to examine the effect of the SHLITY>DDLIEY mutation in yeast Vps15 on autophagy. However, the focus of the current manuscript is VPS34-complex II and RAB5 interaction/activation. An autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      The reviewer has raised an excellent question, which was addressed briefly in the introduction to the manuscript. We have now somewhat expanded on these issues near the end of the discussion in the revised manuscript. In our previously published study, we found that soluble RAB5-GTP did not stimulate the complex II activity (supplementary figure 2b of PMID: 33692360). This is consistent with our finding in this manuscript showing that RAB5 did not cause large conformational changes in solution. However, our previous single-molecule study showed that once complex II is recruited to the membrane by RAB5, RAB5 increases the turnover rate on membranes, indicating an additional allosteric activation (Figure 7 of PMID: 33137306). This study indicated that the primary role of RAB5 is to anchor complex II on the membrane. Once the complex is anchored on the membrane by RAB5, the kinase domain is in the vicinity of its substrate, PI, leading to higher turnover.

      The Echelon Class III PI3K ELISA Kit (Echelon, K-3000) comes with a soluble PI, diC8, to measure the VPS34 activity, and the complex is certainly active with this soluble substrate. However, if the substrate is in membranes, the VPS34 activity is greatly dependent on the character of the membrane.

      I also found the last paragraph of the results section a bit out of place, even though this is a nice observation that the N-terminal part of BECLIN has these domains. However, what does it add to the story?

      The reviewer is correct that the high-resolution features of BECLIN1 at the base of the V-shaped complex that we observed are not related to RAB5 binding, but they are characteristic of VPS34-CII and likely to be important for the specific role of VPS34-CII. This is the first high-resolution structure of the VPS34-CII that has been reported, and we believe it would be irresponsible not to briefly describe them, since they are unique to VPS34-CII. For this reason, we have placed this section at the end of the results, and we now clarify that we do not see a relevance to RAB5 function, but we describe the arrangement of a region (the BH3) that has been functionally noted in many previous studies, in the absence of a structure.

      Reviewing Editor Comments:

      Please address the following suggestions for minor changes to the manuscript. Use your best scientific judgment in addressing the comments and describe the modifications together with your reasoning in a cover letter. We look forward to seeing the revised version of this very nice study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I found a portion of the description of the cryoEM complexes on the top of page 9 to be redundant with similar descriptions near the top of page 7, and it was not clear to me at first that these were describing the same structures. Part of my confusion was due to the redundancy, including the statement near the bottom of page 7: 'Models were built and refined for all RAB5-associated VPS34-CII assemblies', and then the similar statement on page 9: 'We fit and refined atomic models into both densities'. I believe these are describing the same models? To clarify for the reader, perhaps on page 9, the authors could begin this part with a statement such as "as described above", and eliminate the redundant descriptions.

      The reviewer is correct. Both sections describe the same set of cryo-EM classes from the same sample. The only difference is what we analysed in the two sections: number of RAB5s bound in the first section and the effect of RAB5 binding in the second section. We have revised the text to make this clear, and to make the second section more succinct.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors show nicely that a mutation in Vps15 disrupts binding to Vps21 in vivo, with defects in the endocytic pathway as analyzed by CPY sorting. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following Pho∆60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex. If the authors were to find evidence that this Vps15 mutant also affects autophagy, it would indicate that there is possibly also another Rab1 binding site in Vps15.

      As we stated above, an autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      (2) It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      As in our response to reviewer #3 above, this point was addressed in previous publications and was described in the introduction to our manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study provides compelling evidence that fever-like temperatures enhance the export of Plasmodium falciparum transmembrane proteins, including the cytoadherence protein PfEMP1 and the nutrient channel PSAC, to the red blood cell surface, thereby increasing cytoadhesion. Using rigorous and well-controlled experiments, the authors convincingly demonstrate that this effect results from accelerated protein trafficking rather than changes in protein production or parasite development. These findings significantly advance our understanding of parasite virulence mechanisms and offer insights into how febrile episodes may exacerbate malaria severity.

      We thank all reviewers for their constructive feedback on our manuscript.

      We believe we have addressed all the questions in the rebuttal below in writing, including planned experiments we will perform to strengthen the conclusions of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whereas they differ at 24 hours. This is true for the DMSO-treated (wild-type-like control) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12 respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated at these earlier timepoints but is comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point, however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non–heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature.

      Planned experiment: However, to increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on >20 hours post infection iRBCs following heat stress in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also possibly be due to altered PSAC activity during heat stress which is maintained at lower temperatures, as outlined in the discussion.

      New Results:

      We performed sorbitol sensitivity assays on >20 hours post-infection iRBCs following heat stress in the presence and absence of the PSAC inhibitor furosemide. These additional experiments were added to the supplementary figures (Supplementary Figure 3). Importantly, sorbitol-mediated lysis of iRBCs, with or without prior heat stress, was reduced when furosemide was present, demonstrating that the observed effect is likely PSAC-dependent. We also observed that uninfected RBCs did not lyse with sorbitol, regardless of heat stress, confirming that the effect is specific to infected cells.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These images include co-staining with the P. falciparum proteins KAHRP and SPB1 to assess possible co-localisations. Furthermore, following the reviewer’s suggestion, we have softened the statement regarding PF3D7_1039000-HA to better reflect the data, changing “...does not colocalise” to “...does not strongly colocalise”.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we have identified a novel trafficking pathway, but rather that we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, in small vesicles and possibly in Maurer’s clefts. These would still be membrane embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of PfEMP1 and other exported TMD protein trafficking to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect those most commonly found in the context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, here we focused on enhanced protein export using multiple complementary approaches, and have chosen to address rigidity questions in a different study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, a second time point in many of the assays (for example, 36 hrs or later) would be useful to determine if heat stress simply accelerates trafficking of proteins to the RBC or if instead it results in an overall increase in trafficking.

      As mentioned earlier, we did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs. This is true for the DMSO-treated (wild-type-like control) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12 respectively). The end level of VAR2CSA is the same in both conditions, but at 24 hours post infection it is higher following heat stress, indicating that trafficking is accelerated.

      In the text, the authors frequently mention changes in the parasites' phenotype in response to heat stress; however, the way it is described is a bit ambiguous and can be confusing. For example, on page 3, they state that "Following heat stress, significantly more iRBCs (57.6% +/-19.4%) cytoadhered.....". From this sentence, it is not initially clear if the end result is cytoadherence of 57.6% of iRBCs or if this refers to an increase of 57.6%. This could be stated explicitly (e.g., "an increase of 57.6% +/- 19.4%") to avoid confusion. Similar descriptions of the results are found throughout the paper.

      We agree this is confusing and have altered the text accordingly.

      The authors might consider citing and discussing the paper from Andrade et al (Nat Med, 2020, 26:1929-1940), which describes longer circulation times (less cytoadherence) by parasites in the dry season (asymptomatic patients) than in febrile patients in the wet season (stronger cytoadhesion of younger stages). This would seem to be consistent with the data presented here.

      We are aware of the Andrade study, but chose not to cite it in this context since the reported differences in cytoadhesion appear more consistent with PfEMP1 expression levels, as hypothesized by the authors, than with altered trafficking.

      Reviewer #2 (Recommendations for the authors):

      General comments on the text:

      (1) "Approximately 10% of the proteins encoded by P. falciparum are predicted to be exported beyond the parasite plasma membrane (PPM) into the parasitophorous vacuole lumen (PVL) and subsequently across the parasitophorous vacuole membrane (PVM) into the RBC cytosol."

      To my knowledge, it has not been really demonstrated that all exported proteins take this route (transfer step in the PVL), and how transmembrane proteins transfer from the parasite to the erythrocyte is still poorly understood. I recommend that the authors rephrase this for precision.

      We agree with this reviewer and will change the statement.

      Changes:

      We have clarified these statements to accurately reflect the current understanding of protein export: “Approximately 10% of P. falciparum encoded proteins are predicted to be exported beyond the parasite plasma membrane, with many thought to pass through the parasitophorous vacuole lumen (PVL) and parasitophorous vacuole membrane (PVM) into the RBC cytosol, although the exact routes for transmembrane proteins are not fully understood.”

      (2) "Charnaud et al. 25, but not Cobb et al. 26, found HSP70x to be essential for normal PfEMP1 trafficking, although both studies concluded that HSP70x is dispensable for intraerythrocytic parasite growth at 37 °C."

      The trafficking block in Charnaud is likely due to a delay in parasite development and cannot thus really be directly related to PfEMP1 trafficking.

      Charnaud et al. report: “Microscopy of Giemsa stained IE indicated that ΔHsp70-x appeared similar to CS2 with no obvious abnormalities (Fig 2c). To more accurately quantify changes in maturation through the cell cycle, the DNA content of parasites stained with ethidium bromide was measured by flow cytometry (Fig 2d). This indicated that most parasites had the same DNA content at each timepoint and were maturing at the same rate.”

      Thus, we cannot conclude that the trafficking phenotype reported in the Charnaud study can be attributed to a growth delay. This is also supported by only minor changes in the transcriptome, which would likely be more widely perturbed if there were a significant growth delay. However, we will change the statement “Charnaud et al. found HSP70x to be essential for normal PfEMP1 trafficking” to “…important for PfEMP1 trafficking” to more precisely reflect the data.

      (3) "NanoLuciferase (NanoLuc) fusion proteins and compartment-specific isolation confirmed a greater abundance of PfEMP1 in the RBC cytosol following heat stress."

      Please see my comments about the differentiation between soluble and TM-containing proteins. One would expect that PfEMP1 is membrane-integrated, and thus should not be found in the cytosol (implying a soluble form).

      See our response above.

      (4) "Importantly, heat stress did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1)."

      The authors should constrain this statement to the time frame in which the heat-shock was given. Previous publications have shown a speeded-up development only in younger-stage parasites, which the authors did not study.

      We will re-phrase.

      Changes:

      We have rephrased the sentence to clarify the time window of heat stress: “Importantly, heat stress between 16-24 hours post-invasion did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1).” The supplementary figure title has also been updated to match.

      (5) I recommend that the authors include line numbers. This makes the reviewers' lives much easier.

      We agree and apologize for this oversight.

      We have now added line numbers.

      Reviewer #3 (Recommendations for the authors):

      (1) All the experiments have been performed to a very high standard, and I have no major questions about the results. However, the paper would go up to the next level if the effect of fever temperatures on the stiffness of the iRBCs had been investigated by measuring the passage of iRBCs through an artificial spleen where a bed of metal spheres mimics interendothelial splenic slits.

      See our comment above.

      (2) With respect to Figures 5E, 6C, and 6E, why was there not a decrease in bioluminescence levels at 39 °C for Sap and NP40 to match the increase in EqtII?

      The assay is not performed as a sequence of permeabilisation steps. Instead, samples are split into three parallel treatments: one with EqtII, one with Saponin, and one with NP40. The protein measured in each case reflects the total released under that specific condition rather than being cumulative. Therefore, the NP40 fraction includes proteins from the Saponin-accessible compartment, the EqtII-accessible compartment, and the parasite cytosol.
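      The arithmetic behind this answer can be sketched as follows. This is an illustrative toy model only, not the analysis used in the study: the compartment amounts are hypothetical numbers in arbitrary units, and the sketch assumes that EqtII releases only the RBC cytosol, that saponin releases the RBC cytosol plus the parasitophorous vacuole contents, and that NP40 releases everything.

```python
# Toy model of three PARALLEL permeabilisation treatments (hypothetical values,
# arbitrary luminescence units; none of these numbers come from the study).

def parallel_fractions(rbc_cytosol, pv, parasite):
    """Return the signal released by each treatment on its own aliquot.

    Assumptions (not stated in this form in the response):
      - EqtII releases only the RBC cytosol
      - saponin releases RBC cytosol + parasitophorous vacuole (PV) contents
      - NP40 releases RBC cytosol + PV + parasite contents (the total)
    """
    return {
        "EqtII": rbc_cytosol,
        "Saponin": rbc_cytosol + pv,
        "NP40": rbc_cytosol + pv + parasite,
    }

# Control temperature: most reporter protein is still inside the parasite.
control = parallel_fractions(rbc_cytosol=10, pv=20, parasite=70)

# Heat stress modelled as 15 units moving from the parasite to the RBC cytosol.
heat = parallel_fractions(rbc_cytosol=25, pv=20, parasite=55)

for treatment in ("EqtII", "Saponin", "NP40"):
    print(treatment, control[treatment], heat[treatment])
```

      In this toy model, more export raises the EqtII signal while the NP40 signal stays at the same total, so no compensating decrease is expected in the saponin or NP40 readouts.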

      (3) In the Supplementary gene maps, I could not read the white text on the black gene boxes.

      We apologize; these did not convert well and will be corrected in the revised version.

      Changes:

      We have significantly increased the size of all fonts within the gene maps and improved the resolution of the figures to improve readability.

      (4) In Figure S6, why does HSP70-x look different between parts C and D IFAs, with the latter showing much more export?

      We agree these IFAs are not optimal and we will provide better images.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These figures now include multiple images of HA-tagged staining to more accurately represent the observed localisation and export patterns.

      (5) Would the authors care to comment on what kinase might be additionally phosphorylating at 39 °C?

      We presume these are Maurer’s clefts FIKK kinases as most of the hyperphosphorylated proteins are MC residents. However, without directly testing for this using conditional KO parasite lines, we cannot exclude that host kinases are also playing a role.

      (6) Could the additional assembly of PSAC at the iRBC membrane be important for survival at 39 °C?

      We have tested to see if nutrient uptake helps parasite survival during heat stress in the presence of furosemide and lower nutrient concentrations, but did not see a difference in growth following heat stress compared to control temperature conditions.

      New Results:

      We have added a new supplementary figure (Supplementary Figure 4) detailing experiments testing parasite growth under altered nutrient availability using two approaches (sub-lethal furosemide concentrations or reduced-nutrient RPMI) and with or without a 40°C heat stress applied between 16-24 hpi.

      The main text now references this data: “Culturing parasites in sub-lethal furosemide concentrations or in reduced nutrient media lead to reduced parasitaemia (Supplementary Figure 4). However, the parasitaemia is not further reduced following heat stress. This shows that increased PSAC levels/activity do not enhance parasite survival under conditions of limited nutrient availability either from furosemide-induced nutrient deprivation or a reduced nutrient media composition.”

      These experiments show that nutrient uptake does not improve parasite survival during heat stress compared to control temperature conditions.

      (7) Would the authors like to speculate on how higher temperatures increase the transport of exported proteins with TMDs?

      There are many possible explanations, one of which is that unfolding of the hydrophobic TMDs is favoured at elevated temperatures. However, we have no data to support this hypothesis and therefore refrained from explicitly stating this possibility.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Weaknesses:

      The authors focused primarily on female mice without commenting on the effect that sex differences would have on their results.

      We agree that sex is an important biological variable. Our experiments were performed primarily in female mice to align with the higher prevalence of affective disorders in females and to maintain consistency across experiments. We now explicitly acknowledge this as a limitation in the Discussion and note that future studies will be needed to determine whether the projection-specific coding principles identified here generalize to male animals. Relevant literature on sex-specific mPFC→BLA/NAc function has also been incorporated.

      While the authors have identified relevant behavioral states across the various behavioral tasks, there is still a missing link between them and "emotional states" - the phrase used by them emphatically throughout the manuscript. The authors have neither provided adequate references to satisfy this gap nor shared any data pertaining to relevant readouts such as cortisol levels.

      We appreciate the reviewer’s concern regarding the use of the term “emotional states.” In the revised manuscript, we have clarified our terminology and now use “behavioral states associated with affective valence” where appropriate. We have also added references supporting the use of open field center vs. corner occupancy, elevated plus maze performance, and social interaction assays as established proxies for anxiety-like and affect-related behaviors.

      Importantly, to provide physiological support for these interpretations, we now include data showing that repeated win/loss outcomes in the tube test are associated with increased corticosterone levels in loser mice. These results indicate that the behavioral manipulations used in this study are accompanied by measurable physiological changes linked to stress-related processes.

      Both the projection-specific recordings and patch-clamp experiments, including histology reports in the manuscript, would provide essential information for anyone trying to replicate the results, especially since it's known that sub-populations in the BLA and NAc can have vastly different functions.

      We agree that detailed reporting of projection targeting is important for reproducibility. We have expanded the Methods and Results to more clearly describe viral targeting, recording locations, and histological verification of mPFC projections to the lateral BLA and NAc shell. We also now explicitly acknowledge the anatomical and cellular heterogeneity within these regions as a limitation and discuss this as an important direction for future work.

      The population-level analysis in the manuscript requires more rigor to reduce bias and statistical controls for establishing the significance of their results.

      We have strengthened the statistical analyses throughout the manuscript. Specifically, we have incorporated permutation-based controls for key analyses, clarified how behavioral and neural features were defined, and provided additional details on dimensionality reduction and clustering approaches. Exact p values, sample sizes, and statistical tests are now reported throughout the manuscript and figure legends.
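      For readers unfamiliar with permutation-based controls, a minimal generic sketch is given here. It is not the authors' analysis code: the function name, the choice of test statistic (absolute difference in group means), and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Group labels are shuffled repeatedly; the p value is the share of
    shuffles whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical example values (not data from the manuscript).
p = permutation_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
print(f"permutation p value: {p:.4f}")
```

      Because the null distribution is built from the data themselves, a control of this kind makes no assumption about normality, which is one common reason for choosing it with small groups.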

      Lastly, the tube test is used as a manipulation of the "emotional state" in several of the experiments. While the tube test can cause a temporary spike in anxiety of the participating mice, it is not known to produce a sustained effect - unless there are additional interventions such as forced social defeat. Thus, additional controls for these experiments are essential to support claims based on changes in the emotional state of mice.

      We agree that the tube test is not a classical chronic stress paradigm such as social defeat. In our study, the tube test was used to establish social hierarchy rather than to model sustained stress. We have revised the manuscript to clarify this point and have tempered our language accordingly. At the same time, our corticosterone measurements indicate that repeated social competition induces measurable physiological changes, suggesting that the paradigm captures aspects of social hierarchy–related stress. We now frame these effects conservatively and acknowledge the need for future studies using additional stress paradigms.

      Apart from the methodology, the manuscript could also be improved with the addition of clear scatter points in all the plots along with detailed measures of the statistical tests such as exact p values and size of groups being compared.

      We have revised all figures to include individual data points (scatter overlays) wherever appropriate and have improved reporting of statistical details, including exact p values and group sizes, to enhance transparency and reproducibility.

      Taken together, these revisions clarify our interpretations, improve methodological transparency, and strengthen the rigor of the analyses while preserving the main conclusions of the study.

      Reviewer #2 (Public Review):

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures and so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within each region.

      We agree that both the basolateral amygdala (BLA) and nucleus accumbens (NAc) are highly heterogeneous. Our study was designed to focus on projection-defined mPFC outputs (presynaptic activity) rather than resolving postsynaptic subregional or cell-type diversity. We have now:

      - Clarified targeting strategies (PL→NAc shell and PL→BLA basal region)

      - Added histological descriptions of injection and recording sites

      - Expanded the Discussion to acknowledge how subregional and cellular heterogeneity may contribute to the observed variability

      We also highlight this as an important direction for future work.

      The electrophysiological data have significant experimental confounds and more methodological information is required to support other conclusions related to these data.

      We have significantly strengthened the electrophysiological component by:

      - Providing detailed recording conditions (access resistance, membrane properties, inclusion criteria)

      - Clarifying stimulus protocols and normalization procedures

      - Including representative traces and quantification of exclusion rates

      - Addressing potential confounds such as viral expression variability and stimulation parameters

      These revisions improve both interpretability and reproducibility of the electrophysiological findings.

      Reviewer #3 (Public Review):

      Major Weaknesses:

      (1) The manuscript does not clearly and consistently specify the sex of the mice used for behavioral and imaging experiments. Given the known influence of sex on emotional behaviors and neural activity, this omission raises concerns about the generalizability of the findings. The authors should make clear throughout the manuscript whether male, female, or mixed-sex cohorts were used and provide a rationale for their choice. If only one sex was used, the potential limitations of this approach should be explicitly discussed.

      We agree that sex is an important biological variable. We have now clearly specified throughout the manuscript that experiments were performed primarily in female mice and have added a rationale for this choice in the Methods. Briefly, we focused on females to align with the higher prevalence of affective disorders in females and to maintain consistency across experiments. We now explicitly acknowledge this as a limitation in the Discussion and note that future studies will be needed to determine whether these findings generalize to male animals.

      (2) Mice lacking "center-ON" neurons were excluded from analysis, yet the manuscript draws broad conclusions about the encoding of emotional states by mPFC pathways. It is critical to justify this exclusion and discuss how it may limit the generalizability of the findings. The inclusion of data or contextualization for animals without center-ON neurons would strengthen the interpretation.

      We thank the reviewer for raising this important point. Mice lacking identifiable center-ON neurons were excluded from analyses that specifically relied on this functional classification, as inclusion of such datasets would preclude meaningful comparison of this neuronal population. We have now clarified this criterion in the Methods and Results. Importantly, this exclusion does not affect analyses performed at the population level or those not dependent on center-ON classification. We now explicitly discuss this limitation and note that variability in the presence of center-ON neurons may reflect biological heterogeneity across animals.

      (3) The manuscript lacks baseline activity comparisons for mPFC→BLA and mPFC→NAc pathways across subjects. Providing baseline data would contextualize the observed activity changes during behavior testing and help rule out inter-individual variability as a confounding factor.

      We have added baseline comparisons of mPFC→BLA and mPFC→NAc activity across subjects to control for inter-individual variability and better contextualize behavior-related changes.

      (4) Extensive behavioral testing across multiple paradigms may introduce stress and fatigue in the animals, which could confound the induction of emotional states. The authors should describe the measures taken to minimize these effects (e.g., recovery periods, randomized testing order) and discuss their potential impact on the results.

      We now provide detailed descriptions of experimental design, including habituation, randomized testing order, and recovery periods between assays. We also discuss potential cumulative stress effects as a limitation.

      (5) Grooming is described as a "non-anxiety" behavior, which conflicts with its established role as a stress-relieving behavior that may indicate anxiety. This discrepancy requires clarification, as the distinction is central to the conclusions about the mPFC→BLA pathway's role in differentiating anxiety-related and non-anxiety behaviors.

      We thank the reviewer for this important clarification. We agree that grooming can be associated with both stress-related and self-soothing behaviors. In the revised manuscript, we have clarified that grooming is not strictly a “non-anxiety” behavior but instead represents a distinct behavioral state that may reflect stress regulation or internal state transitions. We have revised the text accordingly to avoid oversimplification and to better align with the literature.

      (6) While the study highlights pathway-specific neural activity, it lacks a cohesive integration of these findings with the behavioral data. Quantifying the overlap or decorrelation of neuronal activity patterns across tasks would solidify claims about the specialization of mPFC→NAc and mPFC→BLA pathways. Likewise, the discussion should be expanded to place these findings in light of prior studies that have probed the roles of these pathways in social/emotion/valence-related behaviors.

      We agree that stronger integration between neural and behavioral findings would strengthen the manuscript. In the revised version, we have added quantitative analyses examining the similarity and divergence of activity patterns across behavioral contexts (e.g., cross-context comparisons and correlation-based analyses). We have also expanded the Discussion to better integrate our findings with prior studies on mPFC→NAc and mPFC→BLA pathways in reward, aversion, and social behavior, thereby providing a more cohesive interpretation of pathway-specific functions.
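      A correlation-based comparison across contexts could take the following form. This is only a sketch under stated assumptions: the response does not specify the exact analysis, so the use of Pearson correlation on per-neuron mean activity, the function name, and the example vectors are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length activity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-neuron mean activity in two behavioral contexts
# (one entry per neuron; values are illustrative, not from the study).
open_field = [0.2, 0.8, 0.5, 0.1, 0.9]
social_test = [0.3, 0.7, 0.4, 0.2, 0.8]

r = pearson_r(open_field, social_test)
print(f"cross-context similarity r = {r:.3f}")
```

      Under this convention, values of r near 1 would indicate that the same neurons are active in both contexts, whereas values near 0 would indicate decorrelated, context-specific activity patterns.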

      Minor Weaknesses:

      (1) The manuscript does not explicitly state whether the same mice were used across all behavioral assays. This information is critical for evaluating the validity of group comparisons. Additionally, more detail on sample sizes per assay would improve the manuscript's transparency.

      (2) In Figure 2G, the difference between BLA and NAc activity during exploratory behaviors (sniffing) is difficult to discern. Adjusting the scale or reformatting the figure would better illustrate the findings.

      (3) While the characteristics of the first social stimulus (M1) are specified, there is no information about the second social stimulus (M2). This omission makes it difficult to fully interpret the findings from the three-chamber test.

      (4) The methods section lacks detailed information about statistical approaches and animal selection criteria. Explicitly outlining these procedures would improve reproducibility and clarity.

      We have addressed all these minor concerns, including:

      - Clarifying whether the same mice were used across assays

      - Reporting sample sizes for each experiment

      - Improving figure clarity (e.g., scaling, labeling, scatter points)

      - Providing details for social stimuli (M1 vs. M2)

      - Expanding statistical methods and animal selection criteria

      Summary

      In summary, we have made substantial revisions to:

      - Improve conceptual precision (behavior vs. emotional state)

      - Increase methodological transparency and statistical rigor

      - Strengthen physiological validation

      - Clarify experimental design and limitations

      - Enhance integration with existing literature

      We believe these revisions significantly improve the clarity, rigor, and interpretability of the manuscript, and we are grateful for the reviewers’ guidance in strengthening this work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, with the latter showing an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge of basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which significantly contributed to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer’s comments and describe the revisions that were incorporated.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection significantly differ from whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors expect the induction of immunosuppressive mechanisms during such prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We addressed this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We included a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of the age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10<sup>2</sup> TCID<sub>50</sub>/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522), which is notably lower than the dose used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack-of-protection phenomenon. We incorporated these important aspects into the revised discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections was revised accordingly, and the discussion was extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we clarified our conclusions where needed and ensured that interpretations were better aligned with the data shown.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Introduction, more details on the experimental model would be appreciated. A short summary of findings obtained with this model in previous works from the authors would help to better understand the context of the study.

      Basic information on the model was added in the Introduction section of the revised manuscript.

      (2) In Figure 1, the addition of more time points on the x-axes would help the interpretation of the figures.

      We agree and have added extra time points to the x-axes.

      (3) To better understand the results in Figure 2A, a figure showing cytokine levels post-Estonia infection of only challenged pigs would help, indicating protected and non-protected animals as in Figure 2C. This figure would be better linked to the corresponding dot plot (Figure 2B).

      Our statistical analyses in Figure 2A are based on using both challenged and non-challenged pigs to assess differences between SPF and farm pigs. We prefer not to remove the non-challenged pigs in order to avoid losing statistical power. Moreover, even when non-challenged and challenged pigs are displayed in the plots, upregulation of IFN-α and IL-8 can be visualized and remains consistent with the positive and negative correlates of protection shown in Figure 2C.

      (4) Dark red colour associated with SPF non-protected is difficult to differentiate from light red in some figures.

      We thank the reviewer for this remark. To preserve the color scheme across the paper, we changed the circle data points to squares for the non-protected SPF pig in the most crowded figures: Figures 1–3 and Supplementary Figures 2 and 8.

      (5) In Supplementary figures 12-16, grouping of the animal numbers (SPF vs farm) would facilitate the interpretation of the results.

      Information on the animal numbers for each group (SPF vs. farm) has been added to the figure captions.

      (6) Are the results shown in Figure 8 based on absolute scores as mentioned? Results from 0 dpc are not shown. Is that correct?

      That is correct. BTM expression values are absolute and could not be normalized, as RNA was not isolated either immediately before the challenge or on day 0 post-challenge. This information is now clarified in the figure captions.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors use the words "predicted" and "predicts" although they haven't used any methods to show that this is true, such as a multivariate analysis. I don't think correlation coefficients are sufficient to indicate prediction. This needs to be fixed.

      We agree with this and have made changes in the text to avoid this impression.

      (2) "Lower baseline immune activation was linked to increased protective immunity." Presumably, the authors mean prior to challenge, not prior to "vaccination"?

      In this sentence written in the Abstract, we refer to baseline immune activation in the steady state, i.e., prior to any infection, as demonstrated in a previous study by Radulovic et al. (2022). The sentence was adapted accordingly. This concept is further explored in the Discussion section.

      (3) The abstract mentioned the comparison between farm and SPF pigs, but didn't provide any context for those findings. It could be added here.

      In the new version, we have added information on this model in the Introduction section.

      (4) Figure legends need N to be indicated. For example, the viral load figures don't appear to be representative of all 9 or 5 animals. Is there a reason why not all were challenged, and how were those 5 challenged selected?

      Numbers of animals in each group were added to the figure captions. We have also provided details regarding the animals sacrificed at different time points of the experiment in the ‘Animal experiment’ section of the Methods.

      (5) 1A doesn't have a legend to indicate whether dark or light color indicates sampling.

      Fair point. We have added the information to the figure.

      (6) For Figure 3C, it's not clear how the correlation is presented. The legend indicates in writing that the color indicates the outcome it correlates with, but the legend suggests that it is r.

      The method of presenting correlation data is consistent across all figures, including Figure 3C. The color reflects the direction and strength of the correlation, corresponding to the r coefficient obtained from correlating immunological parameters with clinical scores. We have clarified this description in the figure caption to improve readability.

      (7) For some of the correlation data in 2D and 3C, it would be nice to provide the plots in the supplemental. Also, are there enough data points for a robust interpretation of correlation curves?

      We agree that providing the plots will improve clarity and have included them in the supplementary material. While we acknowledge that the number of data points is modest, we believe it is sufficient to support a robust interpretation of the correlation curves. Corresponding p-value cutoffs are noted in the figure captions.

      (8) The figure 2C method of indicating significance is confusing. There must be a clearer way to present this figure.

      Analyzing statistical significance for the dataset shown in Figure 2C is challenging due to the small number of animals. We carefully considered alternative ways of presenting statistical significance; however, given the limited group sizes, we believe that the current approach provides the most transparent and informative representation of the data.

      For clarity, we divided the animals into SPF and farm groups, as well as into protected (4 SPF, 2 farm pigs) and non-protected (1 SPF, 3 farm pigs) categories, and performed both group-based (unpaired t-test) and time-based (mixed-effects analysis) comparisons. All significant differences were added to the plots so that readers could directly visualize the observed trends and compare them with the correlation analysis presented in Figure 2D.

      (9) Please note that "viremia" means the presence of a virus specifically in the blood. Other descriptions of viral load should be used if this was not measured.

      We have clarified this in the text. When referring to organs, we use the term “viral loads.”

      (10) The way of putting a square around boxes that are significant can be misleading when a box is surrounded by other significant comparisons. Like for Figure 6B - probably all of these are really significant, but I can't tell for sure.

      Good point. We changed rectangles to circles for better readability of the figures.

      (11) There is a potential argument that these correlates of protection might only be valid for this specific vaccine. It should be noted that comparisons of multiple vaccines would be needed before assuming the correlates are broadly relevant.

      We agree with this statement and address it in the Discussion section.

      (12) For the circled pathways in Figure 9, it is not clear from the diagram if there is a directionality to the involvement of those pathways. Modulated or induced?

      When discussing pathways identified by transcriptome analysis, we are always referring to their induction, as this is based on the normalized enrichment score (NES). We have now specified this in the figure caption.

      (13) The authors speculate about NK cells, but this is based on transcriptional pathways identified and the literature. Is there any indication from the flow cytometry data whether activated NK cells versus NKT cells are associated with protection? Also, the memory phenotype of those cells?

      Regarding NK cells, the BTM analysis was corroborated by the flow cytometry data shown in Supplementary Figure 8. NK cells were defined as CD3<sup>-</sup>CD8α<sup>+</sup>. Specific markers to distinguish NKT cells or to assess memory phenotypes were not included in our panel.

      (14) In the discussion, "Our study demonstrates that T cell activation represents a robust correlate of protection against ASFV" doesn't indicate whether they mean after vaccination or after challenge. Re-using the same time points throughout the manuscript compounds this confusion.

      In this case, we mean that T cell activation upon immunization/vaccination and challenge correlates with protection. This information has been added to the sentence. Although some time points overlap between the immunization and challenge phases, we consistently use “dpi” and “dpc” to clearly distinguish them.

      (15) Flow cytometry gating strategies should be provided in the supplemental, particularly since this species is less frequently studied using flow cytometry; it would be helpful to understand gating and expression levels of key markers.

      We have provided the gating strategy in Supplementary Figure 7, which is also referenced in the “Flow cytometry and hematology analysis” section of the Methods.

      (16) Some of the discussion is a bit long and repetitive - e.g. the parts on antibodies and the last paragraph with multiple other parts of the discussion and manuscript.

      While we agree that some sections are extensive, we think that this level of detail is necessary to integrate the different datasets and to place our findings in the context of previous literature.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Thank you for your comments.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

      Thank you for your help in improving our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that whiteflies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein. Unlike previously reported receptor-like proteins with large ligand-binding domains, the NtRLP4 here has a malectin LRR domain. Interestingly, it also associates with the adaptor SOBIR1. While the function of this protein remains to be further explored, the authors provide strong evidence showing it's the target of salivary proteins as the insects' survival strategy.

      Thank you for your comments.

      Major points:

      The authors mixed the concepts of LRR-RLPs with malectin LRR-RLPs. These are two different types of receptors. While LRR-RLPs are well studied, little is known about malectin LRR-RLPs. The authors should not simply apply the mode of function of LRR-RLPs to RLP4, which is a malectin LRR-RLP. In addition, LRR-RLPs that function as ligand-binding receptors typically possess >20 LRRs, whereas RLP4 in this work has a rather small ectodomain. It remains unclear whether it will function as a PRR. I can't agree with the authors' logic of testing uninfested plants for proving a PRR's function. The function of a pattern recognition receptor depends on perceiving the corresponding ligand. As shown by the data provided, RLP4-OE plants have an altered transcriptional profile indicating activated defense, suggesting it's unlikely a PRR. An alternative explanation is needed. More work on BAK1 will also help to clarify the ideas proposed by the authors.

      We sincerely thank the reviewer for the insightful and constructive comments, which have helped us critically re-evaluate our interpretation of RLP4 function. In the revised manuscript, we have addressed this important point by adding a detailed discussion of an alternative explanation for RLP4’s role in plant defense. Specifically, we now explicitly distinguish between classical LRR-RLPs and malectin-domain-containing RLPs, and we acknowledge that RLP4 may not function as a canonical PRR. We also discuss the structural features of RLP4, including its malectin-like domain and relatively small LRR region, and the observation that NtRLP4 overexpression lines exhibit altered transcriptional profiles even in the absence of insect infestation. Based on these lines of evidence, we propose that RLP4 may instead act as a regulatory component within plant immune signaling networks, modulating defense outputs rather than functioning as a direct ligand receptor. The revised discussion now reads as follows: “Together, this study reveals that suppressing PRR-mediated plant immunity may be a conserved strategy employed by herbivorous insects for successful feeding. We demonstrate that whiteflies and planthoppers have independently evolved salivary effectors that facilitate the ubiquitin-dependent degradation of defensive RLP4 in host plants, thereby dampen RLP4-mediated plant immunity (Fig. 6). Nevertheless, the precise mechanism by which RLP4 contributes to plant defense warrants further consideration. While it may function as a canonical PRR that perceives insect-derived molecular patterns, several lines of evidence point to an alternative interpretation. Structurally, RLP4 differs from classical LRR-RLP: it contains a malectin-like domain and a relatively small LRR domain, contrasting with typical LRR-RLPs that often possess large LRRs dedicated to ligand binding. Functionally, NtRLP4 overexpression lines exhibit significantly altered transcriptional profiles and dysregulated SA/JA pathways even in the absence of insect infestation, a phenotype inconsistent with canonical PRRs, which typically remain quiescent until ligand perception. These findings point to an alternative explanation: rather than functioning as a classical PRR that recognizes insect-derived molecules, RLP4 may act as a regulatory component within plant immune signaling networks. Elucidating the precise mechanism of RLP4 in conferring plant defense against herbivorous insects will therefore be an important focus of future research” in Lines 392-407.

      Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Thank you for your comments.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Two minor comments:

      In line 140, yeast two-hybrid (Y2H) was used to screen for interacting proteins in plants. However, it is generally difficult to identify membrane receptors using Y2H. Please provide more methodological details to justify this approach, or alternatively, include a discussion explaining this.

      Thank you for pointing this out. It is true that membrane receptors are generally difficult to identify by Y2H. To address this limitation, we used truncated versions of RLP4s lacking the signal peptide and transmembrane domains in point-to-point Y2H assays. In addition, the interactions between BtRDP and RLP4s were further validated by Co-IP and BiFC experiments. In the revised manuscript, we have clarified this methodological detail as follows: “Given that Y2H is generally difficult to identify membrane receptors, the truncated versions of NtRLP4/SlRLP4/OsRLP4 lacking the signal peptide and transmembrane domains were used” in Lines 636-638.

      In Figure S12C, the interaction between the two proteins appears to be present in the nucleus as well. Please provide a possible explanation for this observation.

      Thank you for pointing this out. During revision, we further examined the subcellular localization of NtRLP4 and found that NtRLP4-GFP could also be detected in the nucleus when expressed alone (Fig. S18), suggesting that NtRLP4 may have additional functions beyond serving as a cell surface pattern recognition receptor. In the revised manuscript, we discuss in the Discussion section that NtRLP4 might play roles beyond that of a PRR, as follows: “Together, this study reveals that suppressing PRR-mediated plant immunity may be a conserved strategy employed by herbivorous insects for successful feeding. We demonstrate that whiteflies and planthoppers have independently evolved salivary effectors that facilitate the ubiquitin-dependent degradation of defensive RLP4 in host plants, thereby dampen RLP4-mediated plant immunity (Fig. 6). Nevertheless, the precise mechanism by which RLP4 contributes to plant defense warrants further consideration. While it may function as a canonical PRR that perceives insect-derived molecular patterns, several lines of evidence point to an alternative interpretation. Structurally, RLP4 differs from classical LRR-RLP: it contains a malectin-like domain and a relatively small LRR domain, contrasting with typical LRR-RLPs that often possess large LRRs dedicated to ligand binding. Functionally, NtRLP4 overexpression lines exhibit significantly altered transcriptional profiles and dysregulated SA/JA pathways even in the absence of insect infestation, a phenotype inconsistent with canonical PRRs, which typically remain quiescent until ligand perception. These findings point to an alternative explanation: rather than functioning as a classical PRR that recognizes insect-derived molecules, RLP4 may act as a regulatory component within plant immune signaling networks. Elucidating the precise mechanism of RLP4 in conferring plant defense against herbivorous insects will therefore be an important focus of future research” in Lines 392-407.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed all my concerns.

      Thank you for your help in improving our manuscript.

      Reviewer #2 (Recommendations for the authors):

      This work is quite interesting. It's not necessary to prove RLP4 as a PRR to show the merit of this discovery. The current logic is forced and thus the conclusion not convincing. Finding an alternative explanation will be more helpful.

      Thank you for your valuable suggestions. In the revised version, we discuss the alternative explanation as follows: “Together, this study reveals that suppressing PRR-mediated plant immunity may be a conserved strategy employed by herbivorous insects for successful feeding. We demonstrate that whiteflies and planthoppers have independently evolved salivary effectors that facilitate the ubiquitin-dependent degradation of defensive RLP4 in host plants, thereby dampen RLP4-mediated plant immunity (Fig. 6). Nevertheless, the precise mechanism by which RLP4 contributes to plant defense warrants further consideration. While it may function as a canonical PRR that perceives insect-derived molecular patterns, several lines of evidence point to an alternative interpretation. Structurally, RLP4 differs from classical LRR-RLP: it contains a malectin-like domain and a relatively small LRR domain, contrasting with typical LRR-RLPs that often possess large LRRs dedicated to ligand binding. Functionally, NtRLP4 overexpression lines exhibit significantly altered transcriptional profiles and dysregulated SA/JA pathways even in the absence of insect infestation, a phenotype inconsistent with canonical PRRs, which typically remain quiescent until ligand perception. These findings point to an alternative explanation: rather than functioning as a classical PRR that recognizes insect-derived molecules, RLP4 may act as a regulatory component within plant immune signaling networks. Elucidating the precise mechanism of RLP4 in conferring plant defense against herbivorous insects will therefore be an important focus of future research” in Lines 392-407.

      Inappropriate descriptions still exist at multiple places across the manuscript and damage the merit of this work. I highly recommend that the authors consult an expert in plant PRR research for proofreading. The language editing service the authors used provided only limited help in this case. Here are a few examples:

      We sincerely thank the reviewer for the critical and constructive comments. We agree that precise language is essential for conveying scientific findings. In the revised version, we have refined the text with the help of colleagues who have expertise in plant immunity, aiming to ensure the descriptions are as precise and professional as possible.

      Line 16: Using "depend" ignores the fact that many biotic invaders are recognized by NLRs. The authors can simply use the word "use" or "utilize".

      Thank you for your suggestion. We corrected it in the revised version.

      Line 20: "target defensive RLP4, therefor minimizing the plant immunity" is a strange saying. "dampen RLP4-mediated plant immunity" will be better.

      Thank you for your suggestion. We corrected it in the revised version.

      Line 49: As far as I know, only LRR-RLPs use SOBIR1 as an adaptor. The authors should introduce this specific point. The mode of action of other types of LRR-RLPs is less clear.

      Thank you for your suggestion. In the revised version, we reintroduced this as follows: “As RLPs lack the intracellular signaling domains, they are anticipated to associate with adaptor kinases to form the bimolecular receptor kinases. For example, suppressor of BAK1-interacting receptor-like kinase 1 (SOBIR1) is reported to act as a common adaptor for most, if not all, of the leucine-rich repeat RLP (LRR-RLP)” in Lines 48-52, and “The receptor-like kinase SOBIR1, which contained a kinase domain, has been widely reported to be required for the function of LRR-RLPs in the innate immunity. However, whether SOBIR1 interacted with malectin-LRR RLP remains largely unknown” in Lines 170-173.

      Line 67: There are quite a few publications showing that insect salivary proteins dampen plant immunity.

      We apologize for the inaccurate description. We agree that a growing body of literature describes the suppression of plant immunity by insect salivary proteins. However, the specific molecular mechanism by which these proteins target plant PRRs is still poorly understood. In the revised version, we specified that “it remains largely unknown how insects cope with plant PRRs” in Lines 68-69.

      Line 149: I don't understand what "point-to-point Y2H" is.

      Thank you for your comment. We agree that the term "pairwise Y2H" is more commonly used in the literature than "point-to-point Y2H." To avoid any confusion and to align with standard terminology, we have replaced "point-to-point Y2H" with "pairwise Y2H" throughout the revised manuscript.

      Line 179: Replace with "NtRLP4 and NtSOBIR1 confer resistance to B. tabaci". You don't say a protein is resistant to an insect infestation. The same applies to Lines 209-210.

      Thank you for your suggestion. We corrected it in the revised version.

      Minor points:

      Line 91-92: Lengthy text for simple results.

      Line 98: "which was significantly different from the actin or ribosomal 18S rRNA" can be deleted. It's self-evident that actin and 18S rRNA are controls. The same applies to Line 101.

      Line 130: unnecessary sentence, delete.

      The use of verb forms needs further correction.

      Thank you for your valuable suggestion. In the revised manuscript, we have revised the text accordingly. We truly appreciate your help in improving our manuscript.

    1. Author response:

      eLife Assessment

      This study uses a Bayesian framework to characterize latent brain state dynamics associated with memory encoding and performance in children, as measured with functional magnetic resonance imaging. The novelty of the approach offers valuable insights into memory-related brain activity, but the consideration of developmental changes in memory and brain dynamics, and the evidence to support the proposed mapping between specific states and distinct aspects of memory, are incomplete. This work will be of interest to researchers interested in cognitive neuroscience and the development of memory.

      We are grateful to the editor and reviewers for their positive feedback and constructive evaluation. Their comments have identified important areas where the manuscript can be strengthened. Below, we outline our planned revisions.

      Reviewer #1 (Public review):

      Zeng et al. characterized the dynamic brain states that emerged during episodic encoding and the reactivation of these states during the offline rest period in children aged 8-13. In the study, participants encoded scene images during fMRI and later performed a memory recognition test. The authors adopted the BSDS approach and identified four states during encoding, including an "active-encoding" state. The occupancy rate of, and the state transition rates towards, this active-encoding state positively predicted memory accuracy across participants. The authors then decoded the brain states during pre- and post-encoding rests with the model trained on the encoding data to examine state reactivation. They found that the state temporal profile and transition structure shifted from encoding to post-encoding rest. They also showed that the mean lifetime and stability (measured with self-transition probability) of the "default-mode" state during post-encoding rest predict memory performance. How brain dynamics during encoding and offline rest support long-term memory remains understudied, particularly in children. Thus, this study addresses an important question in the field. The authors implemented an advanced computational framework to identify latent brain states during encoding and carefully characterized their spatiotemporal features. The study also showed evidence for the behavioral relevance of these states, providing valuable insights into the link between state dynamics and successful encoding and consolidation.

      We thank Reviewer #1 for the positive and constructive feedback on our study. We plan to incorporate detailed methodological justifications and a thorough analysis of limitations. We also plan to improve the overall logical coherence of the manuscript, ensuring a more robust and scientifically sound presentation.

      Weaknesses:

      (1) If applicable, please provide information on the decoding performance of states during pre- and post-encoding rests. The Methods noted that the authors applied a threshold of 0.1 z-scored likelihood, and based on Figure S2, it seems like most TRs were assigned a reinstated state during post-encoding rest. It would be useful to know, for the decodable TRs, how strong the evidence was in favor of one state over others. Further, was decoding performance better during post- vs. pre-encoding rest? This is critical for establishing that these states were indeed "reinstated" during rest. The authors showed individual-specific correlations between encoding and post-encoding state distribution, which is an important validation of the method, but this result alone is not sufficient to suggest that the states during encoding were the ones that occurred during rest. The authors found that the state dynamics vary substantially between encoding and rest, and it would be helpful to clarify whether these differences might be related to decoding performance. I am also curious whether, if the authors apply the BSDS approach to independently identify brain states during rest periods (instead of using the trained model from encoding), they find similar states during rest as those that emerged during encoding?

      We plan three additional analyses to strengthen the evidence for state reinstatement during rest. First, we will report quantitative decoding confidence metrics for each decoded time point, including the log-likelihood difference between the winning state and the next-best state. We will compare these distributions between pre- and post-encoding rest to test whether decoding quality differs between conditions, as the reviewer suggests. Second, we will provide a more detailed characterization of the decoding process, including the proportion of TRs that survive the log-likelihood threshold of 0.1 during pre- vs. post-encoding rest and whether this proportion relates to memory performance. Third, we will train an independent BSDS model directly on the rest data (rather than using the encoding-trained model) and assess the degree of correspondence between the independently discovered rest states and the encoding states in terms of amplitude profiles and covariance structures. Convergence between the two approaches would provide strong validation that the encoding-defined states genuinely re-emerge at rest. Together with the evidence from our previous analyses, these additional analyses will strengthen our claims.
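
      As an illustration only, the first two planned metrics could be computed along the following lines. This is a minimal sketch, not the BSDS implementation: the function names, the toy log-likelihood values, and the assumption that the 0.1 threshold is applied to the winning state's (z-scored) value are all hypothetical.

```python
def decoding_margins(log_likelihoods):
    """For each time point (row), return (winning_state_index, margin).

    The margin is the winning state's value minus the next-best state's value,
    so larger margins mean stronger evidence for one state over the others.
    """
    results = []
    for row in log_likelihoods:
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        results.append((best, row[best] - row[runner_up]))
    return results


def fraction_surviving(log_likelihoods, threshold):
    """Fraction of time points whose winning value exceeds `threshold`.

    Assumption: the threshold is applied to the winning state's value.
    """
    kept = sum(1 for row in log_likelihoods if max(row) > threshold)
    return kept / len(log_likelihoods)


# Hypothetical values for 4 time points (rows) and 4 states (columns).
toy_rest = [
    [1.5, 0.5, -0.5, -1.0],
    [0.0625, 0.03125, -0.25, -0.5],
    [-0.5, 0.75, 0.5, -1.0],
    [0.0, -0.25, 0.5, -0.5],
]

margins = decoding_margins(toy_rest)
surviving = fraction_surviving(toy_rest, threshold=0.1)
print(margins, surviving)
```

      Comparing the distribution of margins between pre- and post-encoding rest would then address whether decoding quality differs between the two conditions.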

      (2) During post-encoding rest, the intermediate activation state (S1) became the dominant state. Overall, the paper did not focus much on this state. For example, when examining the relationship between state transitions and memory performance, the authors did not include this state in the analyses presented in the paper (lines 203-211). Could the authors report more information about this state and/or discuss how this state might be relevant to memory formation and consolidation?

      We thank the reviewer for this suggestion. During encoding, S1 had the lowest occupancy (~10%) and showed no significant relationship with memory performance, which led us to interpret it as a non-essential transient configuration. In the revision, we will provide a more thorough characterization of S1, and conduct correlation analyses to probe whether its dynamic properties during post-encoding rest correlate with individual memory performance.

      (3) Two outcome measures from the BSDS model were the occupancy rate and the mean lifetime. The authors found a significant association with behavior and occupancy rate in some analyses, and mean lifetime in others. The paper would benefit from a stronger theoretical framing explaining how and why these two different measures provide distinct information about the brain dynamics, which will help clarify the interpretation of results when association with behavior was specific to one measure.

      We thank the reviewer for this suggestion. Occupancy rate and mean lifetime, while related, capture fundamentally different aspects of brain state dynamics. Occupancy rate reflects the total proportion of time the brain spends in a given state, capturing the overall prevalence of that configuration across the scanning session. Mean lifetime, by contrast, measures the average uninterrupted duration of each state visit, indexing the temporal stability or persistence of a given network configuration once it is entered. Critically, two states could have identical occupancy rates but very different mean lifetimes, a state visited frequently but briefly versus one visited rarely but sustained, implying distinct underlying neural dynamics. In the context of memory, high occupancy of the active-encoding state may reflect repeated engagement of encoding-optimal circuits, while long mean lifetime of the default-mode state during rest may reflect sustained consolidation-related processing. We will expand the theoretical framework in the revised manuscript to articulate these distinctions and connect them to extant findings suggesting that temporal stability versus frequency of state visits may have dissociable behavioral correlates in working memory and episodic memory (He et al., 2023; Stevner et al., 2019).
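
      The distinction between the two measures can be illustrated with a minimal sketch. It assumes only that each TR carries one state label; the function names and the two toy sequences are hypothetical and are not taken from the BSDS implementation.

```python
from itertools import groupby


def occupancy_rate(states, state):
    """Fraction of time points (TRs) assigned to `state`."""
    return sum(1 for s in states if s == state) / len(states)


def mean_lifetime(states, state):
    """Average length, in TRs, of uninterrupted runs of `state`."""
    runs = [len(list(group)) for label, group in groupby(states) if label == state]
    return sum(runs) / len(runs) if runs else 0.0


# Two hypothetical 12-TR sequences; state "A" fills half of each.
frequent_brief = ["A", "B"] * 6          # six visits of 1 TR each
rare_sustained = ["A"] * 6 + ["B"] * 6   # one visit of 6 TRs

for name, seq in [("frequent_brief", frequent_brief),
                  ("rare_sustained", rare_sustained)]:
    print(name, occupancy_rate(seq, "A"), mean_lifetime(seq, "A"))
```

      Both toy sequences give state A an occupancy rate of 0.5, yet its mean lifetime is 1 TR in the first and 6 TRs in the second, which is the dissociation described above.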

      (4) For performance on a memory recognition test, d' is a more common metric in the literature as it isolates the memory signal for the old items from response bias. According to Methods (line 451), the authors have computed a different metric as their primary behavioral measure (hits + correct rejections - misses - false alarms). Please provide a rationale for choosing this measure instead. Have the authors considered computing d' as well and examining brain-behavior relationships using d'?

      Our primary memory recognition metric, computed as (hits + correct rejections − misses − false alarms) / total trials, provides an unbiased linear estimate of discrimination ability that is mathematically consistent with d' in directional effects. We selected this measure because it is particularly robust with limited trial counts per condition (Verde et al., 2006; Wickens, 2001). Nonetheless, we agree that reporting d' is important for comparability with the broader literature. In the revision, we will compute d' for each participant and conduct parallel brain–behavior correlation analyses to demonstrate that our findings are robust across both metrics.
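
      For concreteness, the two metrics can be computed side by side as in the sketch below. The trial counts are hypothetical, and the log-linear correction applied before the z-transform is an assumption: the response does not state which correction, if any, will be used for d'.

```python
from statistics import NormalDist


def net_accuracy(hits, misses, correct_rejections, false_alarms):
    """(hits + correct rejections - misses - false alarms) / total trials."""
    total = hits + misses + correct_rejections + false_alarms
    return (hits + correct_rejections - misses - false_alarms) / total


def d_prime(hits, misses, correct_rejections, false_alarms):
    """z(hit rate) - z(false-alarm rate).

    The log-linear correction (add 0.5 to counts, 1 to totals) is an
    assumption made here to keep rates of 0 or 1 finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 100 old and 100 new test items.
counts = dict(hits=80, misses=20, correct_rejections=70, false_alarms=30)
print(net_accuracy(**counts))
print(d_prime(**counts))
```

      When the numbers of old and new items are equal, the linear metric reduces to the hit rate minus the false-alarm rate (0.8 − 0.3 = 0.5 for the counts above).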

      (5) While this study examined brain state dynamics in children, there was no adult sample to compare with. Therefore, it is hard to conclude whether the findings are specific to children (or developing brains). It would be helpful to discuss this point in the paper.

      We thank the reviewer for raising this point. While several studies have documented memory-related replay and reinstatement in adults at both the regional and systems levels (Tambini et al., 2017; Wimmer et al., 2020), few have examined whether analogous state-level reinstatement occurs in children. Our study was motivated by this gap: we sought to test whether children show dynamic brain state reinstatement mechanisms similar to those described in adults. However, we acknowledge that without a direct adult comparison, we cannot determine whether the observed patterns are unique to children or reflect general principles of episodic memory organization. In the revised manuscript, we will: (a) frame the study more carefully as examining whether established state-level consolidation mechanisms also operate during childhood, (b) discuss findings in relation to adult studies, and (c) include exploratory analyses of age-related variability in both memory performance and BSDS dynamics within our sample, while acknowledging that the narrow age range (8–13) and small sample size limit the power of such developmental analyses. We will clearly identify the absence of an adult comparison as a limitation.

      Reviewer #2 (Public review):

      This paper investigates the latent dynamic brain states that emerge during memory encoding and predict later memory performance in children (N = 24, ages: 8-13 years). A novel computational approach (Bayesian Switching Dynamic Systems, BSDS) discovers latent brain states from fMRI data in an unsupervised and parameter-free manner that is agnostic to external stimuli, resulting in 4 states: an active-encoding state, a default-mode state, an inactive state, and an intermediate state. The key finding is that the percentage of time occupied in the active-encoding state (characterized by greater activity in hippocampal, visual, and frontoparietal regions), as well as greater transitions to this state, predicts memory accuracy. Memory accuracy was also predicted by the mean lifetime and transitions to the default-mode state (characterized by greater activity in medial prefrontal cortex and posterior cingulate cortex) during post-encoding rest. Together, the results provide insights into dynamic interactions between brain regions that may be optimal for encoding novel information and consolidating memories for long-term retention.

      We thank Reviewer #2 for recognizing the novelty and broader utility of our methodology and for noting that the manuscript is well-written and concise.

      Weaknesses:

      (1) The study focuses on middle childhood, but there is a lack of engagement in the Introduction or Discussion about what is known about memory development and the brain during this period. Many of the brain regions examined in this study, particularly frontoparietal regions, undergo developmental changes that could influence their involvement in memory encoding and consolidation. The paper would be strengthened by more directly linking the findings to what is already known about episodic memory development and the brain.

      We thank the reviewer for this suggestion. In response, we will substantially expand the Introduction and Discussion to situate our findings within the developmental cognitive neuroscience literature on episodic memory. In particular, we will address the protracted developmental trajectory of frontoparietal regions, the well-documented maturation of hippocampal–cortical connectivity during middle childhood, and how these developmental changes may influence the brain state configurations we observed (He et al., 2023; Ryali et al., 2016). This will provide the necessary developmental context for interpreting our state dynamics results.

      (2) A more thorough overview of the BSDS algorithm is needed, since this is likely a novel method for most readers. Although many of the nitty-gritty details can be referenced in prior work, it was unclear from the main text if the BSDS algorithm discovered latent states based on activation patterns, functional connectivity, or both. Figure 1F is not very informative (and is missing labels).

      We thank the reviewer for this suggestion. We agree that a more accessible overview of the BSDS algorithm (Lee et al., 2025; Taghia et al., 2018) is needed. In the revision, we will expand the Methods and provide a concise algorithmic overview in the main text that clarifies the following key points: (a) BSDS operates on multivariate time series from the ROIs and infers latent brain states defined jointly by their mean activation patterns (amplitude vectors) and inter-regional covariance matrices (functional connectivity); (b) it employs a hidden Markov model framework with Bayesian inference and automatic relevance determination to identify the number of states without manual specification; and (c) state assignments are made at each TR, yielding a temporal sequence that enables computation of occupancy rates, mean lifetimes, and transition probabilities. We will also revise Figure 1F to include appropriate labels and a clearer schematic of the model's inputs, latent structure, and outputs.
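
      As an illustration of point (c), the sketch below estimates empirical transition probabilities by counting consecutive pairs in a per-TR state sequence. It is a simplified stand-in, not the BSDS estimator, whose internal computation of the transition matrix is not described here; the function name and the example sequence are hypothetical.

```python
from collections import Counter


def transition_probabilities(states):
    """Empirical P(next = dst | current = src) from consecutive TR labels."""
    counts = Counter(zip(states, states[1:]))
    labels = sorted(set(states))
    probs = {}
    for src in labels:
        row_total = sum(counts[(src, dst)] for dst in labels)
        for dst in labels:
            # Rows with no outgoing transitions are left at zero.
            probs[(src, dst)] = counts[(src, dst)] / row_total if row_total else 0.0
    return probs


# Hypothetical decoded sequence over eight TRs.
sequence = ["S1", "S2", "S3", "S3", "S2", "S3", "S4", "S3"]
probs = transition_probabilities(sequence)
print(probs[("S2", "S3")], probs[("S3", "S4")])
```

      Self-transitions (for example S3 to S3) are counted here, so each row with at least one outgoing transition sums to 1.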

      (3) A further confusion about the BSDS algorithm was whether it necessarily had to work on the rest data. Figure 4A suggests that each TR was assigned one of the four states based on the maximum win from the log-likelihood estimation. Without more details about how this algorithm was applied to the rest data, it is difficult to evaluate the claim on page 14 about the spontaneous emergence of the states at rest.

      The key methodological point is that the BSDS model, once trained on encoding data, can be applied to new (rest) time series via log-likelihood estimation: for each TR during rest, the model computes the log-likelihood of each state given the observed multivariate signal, and the state with the maximum log-likelihood is assigned to that TR. This "decoding" approach tests whether the spatial configurations learned during encoding are present during rest, rather than fitting new states de novo. We applied a threshold to the log-likelihood values to exclude TRs where the evidence for any single state was weak, thus controlling for potential misassignment. We will substantially clarify this process in the revised Methods and main text, and as described in our response to Reviewer #1 point 1, we will also conduct additional analyses to address the concerns raised.
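
      The decoding step described above can be sketched as follows, taking as input a table of per-TR log-likelihoods, one value per encoding-defined state. The response does not specify which quantity the threshold is applied to; this sketch assumes it is the gap between the best and second-best state, and `decode_states`, `min_margin`, and the numbers are hypothetical.

```python
def decode_states(log_likelihoods, min_margin=0.1):
    """Assign each TR to its most likely state, or None if evidence is weak.

    log_likelihoods: one row per TR, one log-likelihood per state
                     (at least two states per row).
    min_margin: assumed threshold on the gap between the best and the
                second-best state; not confirmed by the source.
    """
    assignments = []
    for row in log_likelihoods:
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        margin = row[best] - row[runner_up]
        assignments.append(best if margin >= min_margin else None)
    return assignments


# Hypothetical log-likelihoods for three rest TRs and four states.
rest_ll = [
    [-4.0, -2.0, -9.0, -7.5],    # clear winner: state index 1
    [-3.00, -3.05, -8.0, -6.0],  # ambiguous: gap below the threshold
    [-9.0, -8.0, -1.5, -6.0],    # clear winner: state index 2
]
labels = decode_states(rest_ll)
surviving = sum(label is not None for label in labels) / len(labels)
print(labels, surviving)
```

      The fraction of TRs that survive thresholding, computed in the last lines, corresponds to the proportion that could be compared between pre- and post-encoding rest.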

      (4) Although the BSDS algorithm was validated in prior simulations and task-based fMRI using sustained block designs in adults, it is unclear whether it is appropriate for the kind of event-related design used in the current study. Figure 1G shows very rapid state changes, which is quantified in the low mean lifetime of the states (between 1-3 TRs on average) in Figure 4C. On the one hand, it is a strength of the algorithm that it is not necessarily tied to external stimuli. On the other hand, it would be helpful to see simulations validating that rapid transitions between states in fMRI data are meaningful and not due to noise.

      This is an important methodological question. The rapid state changes observed in our event-related design (mean lifetimes of 1–3 TRs) differ from the longer state durations typically observed with block designs (He et al., 2023; Zeng et al., 2024), where sustained cognitive demands stabilize brain configurations. We believe these rapid transitions are consistent with the inherent dynamics of event-related encoding, where each trial involves rapid shifts between sensory processing, memory binding, and attentional engagement. Several considerations support the meaningfulness of these transitions: (a) the identified states have interpretable amplitude profiles consistent with well-established memory-related brain systems; (b) state dynamics show statistically significant, directionally consistent correlations with subsequent memory performance; and (c) the transition structure during encoding is distinct from that observed during rest, indicating sensitivity to task demands. Nonetheless, we acknowledge the concern about noise and will conduct additional analyses in the revision to address the concerns raised.

      (5) The Methods section mentions that participants actively imagined themselves within the encoded scenes and were instructed to memorize the images for a later test during the post-encoding rest scan. This detail needs to be included in the main text and incorporated into the interpretation of the findings, as there are likely mechanistic differences between spontaneous memory replay/reinstatement vs. active rehearsal.

      We thank the reviewer for this suggestion. We will include these experimental details in the main text and incorporate them into the interpretation of our findings in the context of spontaneous memory replay/reinstatement vs. active rehearsal (Liu et al., 2019; Wimmer et al., 2020).

      (6) Information about the general linear model used to discover the 16 ROIs that showed a subsequent memory effect is missing, such as: covariates in the model (motion, etc.), group analysis approach (parametric or nonparametric), whether and how multiple-comparisons correction was performed, if clusters were overlapping at all or distinct, if the total number of clusters was 16 or if this was only a subset of regions that showed the effect.

      We apologize for the missing methodological details. In the revised manuscript, we will provide complete information on the general linear model used to identify the 16 ROIs, including: the event regressors and parametric modulators included in the model, nuisance covariates (motion parameters, white matter and CSF regressors), the group-level analysis approach and statistical thresholding, the method for multiple-comparisons correction, whether the 16 ROIs represent all significant clusters or a subset, and whether any clusters were spatially overlapping. We will also clarify how peak voxels were selected for ROI definition.

      Reviewer #3 (Public review):

      This paper uses a novel method to look at how stable brain states and the transitions between them promote memory formation during encoding and post-encoding rest in children. I think the paper has some weaknesses (detailed below) that mean that the authors fall short of achieving their aims. Although the paper has an interesting methodological approach, the authors need better logic, and are potentially "double dipping" in their results - meaning their logic is circular. I think the method that they are using could be useful to the broader neuroimaging community, although they need to make this argument clearer in the paper.

      We thank Reviewer #3 for recognizing the novelty of our approach and its potential utility for the broader neuroimaging community.

      (1) The authors use children as their study subjects but fail to reconcile why children are used, if the same phenomena are expected to be seen in adults (or only children), and if and how their findings change with age across an age range that spans middle childhood to early adolescence. They need to include more consideration for the development of their subject population. The authors should make it clear why and how memory was tested in children and not adults. Are adults expected to encode and consolidate in a similar manner to children? Do the findings here also apply to adults? How was the age range of 8-13-year-old children selected? Why didn't the authors look at change with age? Does memory performance change with age? Do the BSDS dynamics change with age in the authors' sample?

      Our study was motivated by the observation that while adult studies have documented memory replay and reinstatement, very little is known about whether these dynamic state-level mechanisms operate during middle childhood, a period characterized by substantial improvements in episodic memory ability and ongoing maturation of frontoparietal and hippocampal–cortical circuits. The age range of 8–13 was defined a priori based on typical developmental classifications of middle childhood through early adolescence, representing a period when episodic memory abilities are developing rapidly.

      In response to the reviewer's specific questions: (a) we will conduct exploratory analyses testing whether memory accuracy, BSDS state dynamics (occupancy, mean lifetime, transitions), and brain–behavior correlations vary as a function of age within our sample; (b) we will clearly discuss whether adults are expected to show similar patterns, drawing on the extant adult literature; and (c) we will acknowledge as a limitation that our sample size (N = 24) and narrow age range provide limited statistical power for detecting continuous age-related changes, and that a dedicated cross-sectional or longitudinal developmental design would be needed to draw firm conclusions about developmental trajectories. Please also see responses to Reviewer #1 point 5 and Reviewer #2 point 1.

      (2) The authors look for brain state dynamics within a preselected set of ROIs that are selected because they display a subsequent memory effect. This is problematic because the state that is most associated with subsequent memory (S3, or State 3) is also the one that shows most activity in these regions (that have already been a priori selected due to displaying a subsequent memory effect). This logic is circular. It would be helpful if they could look at brain state dynamics in a more ROI agnostic whole brain approach so that we can learn something beyond what a subsequent memory analysis tells us. I think the authors are "double dipping" in that they selected regions for further analysis based on a subsequent memory association (remembered > forgotten contrast) and then found states within those regions showing a subsequent memory effect to further analyze for being associated with subsequent memory. Would it be possible instead to do a whole-brain analysis (something a bit more agnostic to findings) using the BSDS framework, and then, from a whole-brain perspective, look for particular brain states associated with subsequent memory? As it stands, it looks like S3 (state 3) has greater overall activation in all brain regions associated with subsequent memory, so it makes sense that this brain state is also most associated with subsequent memory. The BSDS analysis is therefore not adding anything new beyond what the authors find with the simple subsequent memory contrast that they show in Figure 1C. 
      This particularly affects the following findings: (a) active-encoding state occupancy rate correlated positively with memory accuracy, (b) transitions to the active-encoding state were beneficial / Conversely, transitions toward the inactive state (S4) were detrimental, with incoming transitions showing negative correlations with memory accuracy / The active-encoding state serves as a "hub" configuration that facilitates memory formation, while pathways leading to this state enhance performance and transitions away from it impair encoding.

      We appreciate this critique, which raises an important concern about analytical circularity.

      a) Why BSDS adds information beyond the static subsequent memory contrast. The reviewer notes that S3 (the active-encoding state) shows high activation in the same regions selected by the subsequent memory contrast, and therefore questions whether BSDS provides new information. We respectfully argue that BSDS captures dimensions of neural organization that a static contrast cannot. Specifically: (i) the subsequent memory contrast identifies which regions are differentially active for remembered vs. forgotten items, averaged across the entire encoding session, and therefore provides no temporal information about when or for how long these regions are co-active; (ii) BSDS reveals the moment-to-moment temporal evolution of brain states, including the duration and stability of each configuration (mean lifetime), which independently predicts behavior; (iii) BSDS uniquely captures transition dynamics (the rates and patterns of switching between states), which we show are predictive of memory in ways not derivable from the contrast map (e.g., transitions from S2→S3 positively predict memory, transitions toward S4 negatively predict memory); and (iv) BSDS characterizes the full covariance structure among regions within each state, revealing distinct connectivity patterns (e.g., the high clustering coefficient and global efficiency of S3), which are not captured by univariate activation contrasts. Thus, while the ROI selection is informed by the subsequent memory effect, the information BSDS extracts from those regions (temporal dynamics, transition patterns, and multivariate covariance) is orthogonal to the information used for selection.

      b) Additional validation. To address the circularity concern empirically, we will conduct additional analyses using independently defined ROIs, such as network templates from previous studies or meta-analytic (Neurosynth) ROIs (He et al., 2023; Meer et al., 2020; Taghia et al., 2018), rather than ROIs selected on the basis of the subsequent memory contrast.

      (3) The task used to test memory in children seems strange. Why should children remember arbitrary scenes? How this was chosen for encoding needs to be made clear. There needs to be more description of the memory task and why it was chosen. Why was scene encoding chosen? What does scene encoding have to do with the stated goal of (a) "Understanding how children's brains form lasting memories", (b) "optimizing education" and (c) "identifying learning disabilities"? What was the design of the recognition memory test? How many novel scenes were included in the test, and how were they chosen? How close were the "new" images to previously seen "old" images? Was this varied parametrically (i.e., was the similarity between new and old images assessed and quantified?)

      Scene encoding was chosen for several reasons: (a) scenes are rich, complex stimuli that engage the hippocampal–parahippocampal memory system, eliciting robust subsequent memory effects suitable for BSDS modeling; (b) scene encoding recruits distributed networks spanning visual cortex, MTL, and frontoparietal regions, enabling detection of multi-region brain states; and (c) scene encoding paradigms have been widely used in both adult and developmental studies of episodic memory and replay (Tambini et al., 2017; Tompary et al., 2017), facilitating comparison with prior work.

      Regarding the recognition test: participants viewed 200 images (100 old, 100 new), with novel scenes drawn from the same categories (buildings and natural scenes) but chosen to be perceptually distinct from studied images. Similarity between old and new images was not parametrically manipulated or quantified; we will note this limitation. We will also expand the main text to include full task details, and we have deleted claims about implications for educational optimization and learning disability identification (see also Reviewer #3 point 7).

      (4) They ultimately found four brain states during encoding. It would be helpful if they could make the logic and foundation for arriving at this number clear.

      The number of brain states is not predetermined by the user but is automatically determined by the BSDS algorithm through Bayesian automatic relevance determination (ARD). The model is initialized with a maximum number of possible states, and during inference, states that contribute minimally to explaining the data are effectively pruned: their associated parameters are driven to near-zero by the ARD prior. In our data, the model converged on four states. This is a key advantage of BSDS over conventional HMM approaches, which require the user to specify the state number a priori. We will clarify this process in the revised Methods and Results, referencing the original BSDS methodology paper (Taghia et al., 2018) for full mathematical details.

      (5) There is already extant work on whether brain states during post-encoding rest predict memory outcomes. This work needs to be cited and referred to. The present manuscript needs to be better situated within prior work. The authors should look at the work by Alexa Tompary and Lila Davachi. They have already addressed many of the questions that the authors seek to answer. The authors should read their papers (and the papers they cite and that cite them) and then situate their work within the prior literature.

      We agree that the manuscript must be better situated within the existing literature on post-encoding rest and memory consolidation. We will revise the Introduction and Discussion to engage with the foundational work in adults by Tompary & Davachi (2017, Neuron; 2024, eLife) on consolidation-related hippocampal–mPFC representational overlap, as well as Tambini & Davachi (2013, PNAS; 2019, Trends in Cognitive Sciences) on hippocampal persistence during post-encoding rest and awake reactivation (Tambini et al., 2019; Tambini et al., 2017; Tompary et al., 2017). We will explicitly discuss how our BSDS-based approach to state-level reinstatement complements and extends these earlier findings, which largely focused on region-specific pattern similarity or hippocampal–cortical connectivity, by characterizing reinstatement at the level of dynamic, whole-network configurations.

      (6) The authors should back up the claim that "successful episodic memory formation critically depends on the temporal coordination between these systems. Brain regions must coordinate their activity through dynamic functional interactions, rapidly reconfiguring their activity and connectivity patterns in response to changing cognitive demands and stimulus characteristics." Do they have any specific evidence supporting this claim?

      The claim that episodic memory depends on temporal coordination and dynamic functional interactions is supported by several lines of evidence: (a) within our study, the significant correlations between state transition rates and memory performance directly demonstrate that dynamic inter-state communication predicts memory outcomes; (b) prior studies show that hippocampal–prefrontal theta coherence during encoding predicts subsequent memory (e.g., Zielinski et al., 2020); and (c) recent work demonstrates that rapid reconfiguration of large-scale brain networks supports cognitive functions including working memory (Braun et al., 2015; Shine et al., 2018) and episodic encoding (Phan et al., 2024). We will revise this passage to include specific citations and to make clear that our own transition–behavior correlations constitute direct evidence for this claim.

      (7) These claims seem overstated: "this work has broad implications for understanding memory function in children, for developing educational interventions that enhance memory formation, and enabling early identification of children at risk for learning disabilities." Can the authors add citations that would support these claims, or if not, remove them?

      We thank the reviewer for raising this point. We agree that the current framing overstates the practical implications. We have now removed these claims and remark on future studies that are needed here.

      References

      (1) Braun, U., Schafer, A., Walter, H., Erk, S., Romanczuk-Seiferth, N., Haddad, L., . . . Bassett, D. S. (2015). Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc Natl Acad Sci U S A, 112(37), 11678-11683.

      (2) He, Y., Liang, X., Chen, M., Tian, T., Zeng, Y., Liu, J., . . . Qin, S. (2023). Development of brain-state dynamics involved in working memory. Cerebral Cortex.

      (3) Lee, B., Young, C. B., Cai, W., Yuan, R., Ryman, S., Kim, J., . . . Menon, V. (2025). Dopaminergic modulation and dosage effects on brain state dynamics and working memory component processes in Parkinson’s disease. Nature Communications, 16(1), 2433.

      (4) Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e614.

      (5) Meer, J. N. v. d., Breakspear, M., Chang, L. J., Sonkusare, S., & Cocchi, L. (2020). Movie viewing elicits rich and reliable brain state dynamics. Nature Communications, 11(1), 5004.

      (6) Phan, A. T., Xie, W., Chapeton, J. I., Inati, S. K., & Zaghloul, K. A. (2024). Dynamic patterns of functional connectivity in the human brain underlie individual memory formation. Nature Communications, 15(1), 8969.

      (7) Ryali, S., Supekar, K., Chen, T., Kochalka, J., Cai, W., Nicholas, J., . . . Menon, V. (2016). Temporal Dynamics and Developmental Maturation of Salience, Default and Central-Executive Network Interactions Revealed by Variational Bayes Hidden Markov Modeling. PLoS Comput Biol, 12(12), e1005138.

      (8) Shine, J. M., & Poldrack, R. A. (2018). Principles of dynamic network reconfiguration across diverse brain states. Neuroimage, 180, 396-405.

      (9) Stevner, A. B. A., Vidaurre, D., Cabral, J., Rapuano, K., Nielsen, S. F. V., Tagliazucchi, E., . . . Kringelbach, M. L. (2019). Discovery of key whole-brain transitions and dynamics during human wakefulness and non-REM sleep. Nature Communications, 10(1), 1035.

      (10) Taghia, J., Cai, W., Ryali, S., Kochalka, J., Nicholas, J., Chen, T., & Menon, V. (2018). Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nature Communications, 9(1), 2505.

      (11) Tambini, A., & Davachi, L. (2019). Awake Reactivation of Prior Experiences Consolidates Memories and Biases Cognition. Trends in Cognitive Sciences, 23(10), 876-890.

      (12) Tambini, A., Rimmele, U., Phelps, E. A., & Davachi, L. (2017). Emotional brain states carry over and enhance future memory formation. Nature Neuroscience, 20(2), 271-278.

      (13) Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e225.

      (14) Verde, M. F., Macmillan, N. A., & Rotello, C. M. (2006). Measures of sensitivity based on a single hit rate and false alarm rate: The accuracy, precision, and robustness of d′, Az, and A′. Perception & Psychophysics, 68(4), 643-654.

      (15) Wickens, T. D. (2001). Elementary signal detection theory. Oxford University Press.

      (16) Wimmer, G. E., Liu, Y., Vehar, N., Behrens, T. E. J., & Dolan, R. J. (2020). Episodic memory retrieval success is associated with rapid replay of episode content. Nature Neuroscience, 23(8), 1025-1033.

      (17) Zeng, Y., Xiong, B., Gao, H., Liu, C., Chen, C., Wu, J., & Qin, S. (2024). Cortisol awakening response prompts dynamic reconfiguration of brain networks in emotional and executive functioning. Proceedings of the National Academy of Sciences, 121(52), e2405850121.

      (18) Zielinski, M. C., Tang, W., & Jadhav, S. P. (2020). The role of replay and theta sequences in mediating hippocampal-prefrontal interactions for memory and cognition. Hippocampus, 30(1), 60-72.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Lin et al. examines the role of EXOC6A in ciliogenesis and its relationship with the interactor myosin-Va using a range of approaches based on the RPE1 cell line model. They establish its spatio-temporal organization at centrioles, the forming ciliary vesicle and ciliary sheath using ExM, various super-resolution techniques, and EM, including correlative light and electron microscopy. They also perform live imaging analyses and functional studies using RNAi and knockout. They establish a role of EXOC6A together with myosin-Va in Golgi-derived, microtubule- and actin-based vesicle trafficking to and from the ciliary vesicle and sheath membranes. Defects in these functions impair robust ciliary shaft and axoneme formation due to defective transition zone assembly.

      Strengths:

      The study provides very high-quality data that support the conclusions. In particular, the imaging data is compelling. It also integrates all findings in a model that shows how EXOC6A participates in multiple stages of ciliogenesis and how it cooperates with other factors.

      Weaknesses:

      The precise role of EXOC6A remains somewhat unclear. While it is described as a component of the exocyst, the authors do not address its molecular functions and whether it indeed works as part of the exocyst complex during ciliogenesis.

      We sincerely thank Reviewer 1 for the thoughtful evaluation of our manuscript and the constructive comments provided. We are especially grateful for the recognition of the quality and significance of our imaging data and the comprehensive model we propose regarding EXOC6A’s role in ciliogenesis. We did not address the function of other components of the exocyst complex during ciliogenesis. However, in our biochemical analyses, Myosin‑Va specifically co‑immunoprecipitated with EXOC6A but not with the other exocyst subunits tested (EXOC5 and EXOC7) (Fig. 4E), indicating a selective interaction between EXOC6A and the Myo‑Va transport machinery.

      Reviewer #2 (Public review):

      Summary:

      The molecular mechanisms underlying ciliogenesis are not well understood. Previously, work from the same group (Wu et al., 2018) identified myosin-Va as an important protein in transporting preciliary vesicles to the mother vesicles, allowing for initiation of ciliogenesis. The exocyst complex has previously been implicated in ciliogenesis and protein trafficking to cilia. Here, Lin et al. investigate the role of exocyst complex protein EXOC6A in cilia formation. The authors find that EXOC6A localizes to preciliary vesicles, ciliary vesicles, and the ciliary sheath. EXOC6A colocalizes with Myo-Va in the ciliary vesicle and the ciliary sheath, and both proteins are removed from fully assembled cilia. EXOC6A is not required for Myo-Va localization, but Myo-Va and EHD1 are required for EXOC6A to localize in ciliary vesicles. The authors propose that EXOC6A vesicles continually remodel the cilium: FRAP analysis demonstrates that EXOC6A is a dynamic protein, and live imaging shows that EXOC6A fuses with and buds off from the ciliary membrane. Loss of EXOC6A reduces, but does not eliminate, the number of cilia formed in cells. Any cilia that are still present are structurally abnormal, with either bent morphologies or the absence of some transition zone proteins. Overall, the analyses and imaging are well done, and the conclusions are well supported by the data. The work will be of interest to cell biologists, especially those interested in centrosomes and cilia.

      Strengths:

      The TEM micrographs are of excellent quality. The quality of the imaging overall is very good, especially considering that these are dynamic processes occurring in a small region of the cell. The data analysis is well done and the quantifications are very helpful. The manuscript is well-written and the final figure is especially helpful in understanding the model.

      Weaknesses:

      Additional information about the functional and mechanistic roles of EXOC6A would improve the manuscript greatly.

      We sincerely thank Reviewer 2 for the thoughtful and encouraging evaluation of our work. We are grateful for the recognition of the strengths of our study, including the quality of the TEM micrographs, the rigor of our imaging and data analysis, and the clarity of our manuscript and proposed model.

      We have expanded our analyses in the revised manuscript to better define EXOC6A’s contribution to ciliary function. Specifically, we examined the trafficking of two critical ciliary membrane-associated proteins: GPR161, a G-protein-coupled receptor involved in Sonic hedgehog (Shh) signaling, and BBS9, a core component of the BBSome complex essential for ciliary membrane protein transport. Our new data (Fig. 7C) show that both GPR161 and BBS9 fail to localize to the cilium in EXOC6A knockout cells, in contrast to wild-type controls where their ciliary localization is robust. This new evidence significantly strengthens the understanding of EXOC6A’s role.

      Reviewer #3 (Public review):

      Summary:

      Lin et al. report on the dynamic localization of EXOC6A and Myo-Va at pre-ciliary vesicles, ciliary vesicles, and ciliary sheath membrane during ciliogenesis using three-dimensional structured illumination microscopy and ultrastructure expansion microscopy. The authors further confirm the interaction of EXOC6A and Myo-Va by co-immunoprecipitation experiments and demonstrate the requirement of EHD1 for the formation of EXOC6A-labeled ciliary vesicles. Additional experiments using gene silencing by siRNA and pharmacological tools identified the involvement of dynein, microtubules, and actin in the transport mechanism of EXOC6A-labeled vesicles to the centriole, as they have previously reported for Myo-Va. Notably, loss of EXOC6A severely disrupts ciliogenesis, with the majority of cells becoming arrested at the ciliary vesicle (CV) stage, highlighting the involvement of EXOC6A at later stages of ciliogenesis. As the authors observe dynamic EXOC6A-positive vesicle release and fusion with the ciliary sheath, this suggests a role in membrane and potentially membrane protein delivery to the growing cilium past the ciliary vesicle stage. While CEP290 localization at the forming cilium appears normal, the recruitment of other transition zone components, exemplified by several MKS and NPHP module components, was impaired in EXOC6A-deficient cells.

      Strengths:

      (1) By applying different microscopy approaches, the study provides deeper insight into the spatial and temporal localization of EXOC6A and Myo-Va during ciliogenesis.

      (2) The combination of complementary siRNA and pharmacological tools targeting different components strengthens the conclusions.

      (3) This study reveals a new function of EXOC6A in delivering membrane and membrane proteins during ciliogenesis, both to the ciliary vesicle as well as to the ciliary sheath.

      (4) The overall data quality is high. The investigation of EXOC6A at different time points during ciliogenesis is well schematized and explained.

      Weaknesses:

      (1) Since many conclusions are based on EXOC6A immunostaining, it would strengthen the study to validate antibody specificity by demonstrating the absence of staining in EXOC6A-deficient cells.

      (2) While the authors generated an EXOC6A-deficient cell line, off-target effects can be clone-specific. Validating key experiments in a second independent knockout clone or rescuing the phenotype of the existing clone by re-expressing EXOC6A would ensure that the observed phenotypes are due to EXOC6A loss rather than unintended off-target effects.

      (3) Some experimental details are lacking from the materials and methods section. No information on how the co-immunoprecipitation experiments have been performed can be found. The concentrations of pharmacological agents should be provided to allow proper interpretation of the results, as higher or lower doses can produce nonspecific effects. For example, the concentrations of ciliobrevin and nocodazole used to treat RPE1 cells are not specified and should be included. More precise settings for the FRAP experiments would help others reproduce the presented data. Some details for the siRNA-based knockdowns, such as incubation times, can only be found in the figure legends.

      Taken together, the authors achieved their goal of elucidating the role of EXOC6A in ciliogenesis, demonstrating its involvement in vesicle trafficking and membrane remodeling in both early and late stages of ciliogenesis. Their findings are supported by experimental evidence. This work is likely to have an impact on the field by expanding our understanding of the molecular machinery underlying cilia biogenesis, particularly the coordination between the exocyst complex and cytoskeletal transport systems. The methods and data presented offer valuable tools for dissecting vesicle dynamics and cilium formation, providing a foundation for future research into ciliary dysfunction and related diseases. By connecting vesicle trafficking to structural maturation of an organelle, the study adds important context to the broader description of cellular architecture and organelle biogenesis.

      We sincerely thank Reviewer 3 for the thorough and thoughtful assessment of our manuscript. We greatly appreciate the recognition of the strengths of our study, including the use of advanced microscopy techniques, complementary functional tools, and the conceptual contributions regarding EXOC6A's role in vesicle trafficking and membrane remodeling during ciliogenesis.

      Below, we detail how we have addressed the specific suggestions for improvement:

      (1) Validation of EXOC6A Immunostaining Specificity

      To directly address the reviewer’s concern regarding antibody specificity, we have included new control immunofluorescence panels in Figure S3E-F, which show a complete loss of EXOC6A signal in two independent knockout (KO) clones. These data confirm the specificity of the EXOC6A antibody used throughout the study and reinforce the accuracy of our localization analyses at different stages of ciliogenesis.

      (2) Addressing Potential Clone-Specific or Off-Target Effects

      To ensure that the observed phenotypes are attributable to EXOC6A loss and not due to off-target effects, we performed parallel analyses using two independent KO clones, both of which exhibited identical defects in ciliogenesis, including arrest at the ciliary vesicle stage and impaired cilia assembly (Fig. S3C-D).

      In addition, we conducted rescue experiments by re-expressing EXOC6A in the KO background, which effectively restored ciliogenesis. Quantitative analysis of the rescue data has been added to the revised manuscript (Figure S6B), providing further support that the observed phenotype is specifically due to EXOC6A deficiency.

      (3) Expanded Methodological Details

      We have expanded the Materials and Methods section to include:

      - A detailed protocol for co-immunoprecipitation experiments, including lysis conditions, antibody concentrations, and washing steps.

      - The precise concentrations and treatment durations for all pharmacological agents used, including ciliobrevin and nocodazole.

      - Comprehensive details on the siRNA-mediated knockdowns, including oligonucleotide sequences, transfection reagents, and incubation durations.

      Recommendations for the authors:

      Reviewing Editor Comments:

      After further consultation, all 3 reviewers agreed that this is an important study with high-quality data, in particular the imaging data. They also considered most of the evidence convincing, but overall they termed it "solid" for two main reasons: first, they would have liked to see a validation of the EXOC6A antibody specificity, and second, they suggest that you demonstrate for at least key experiments the phenotypes with a second KO clone, to exclude clonal effects. In principle, rescue would be suited to address this, but the issue here is that the presented rescue is not very robust.

      We sincerely thank the Editor and all reviewers for their constructive and thoughtful evaluation of our manuscript. We are especially grateful for the recognition of the high-quality imaging data, the experimental rigor, and the significance of our findings to the field of ciliogenesis.

      We fully acknowledge the two principal concerns raised during further consultation: (1) the need for validation of EXOC6A antibody specificity, and (2) the importance of confirming the phenotypes in an independent knockout clone to exclude clonal artifacts. We have taken both of these points seriously and have now addressed them through additional experiments and analyses, as detailed below:

      (1) Validation Using Independent Knockout Clones

      To rigorously validate antibody specificity and eliminate the possibility of clonal variation, we have characterized a second independent EXOC6A knockout (KO) clone. We confirmed complete loss of EXOC6A expression in both clones using three orthogonal approaches: genotyping, immunoblotting, and immunofluorescence (Fig. S3). Both KO clones exhibit indistinguishable phenotypes, including arrest at the ciliary vesicle stage and impaired cilia formation (Fig. S3D). 

      (2) Rescue Phenotype Validation with Statistical Significance

      In response to concerns about the robustness of the rescue, we have now included statistical analysis of the rescue experiments. A two-tailed Student’s t-test comparing ciliogenesis between the EXOC6A KO and rescue (GFP-EXOC6A re-expression) conditions shows a statistically significant improvement (p = 0.0041) (Fig. S6B). While we acknowledge that the rescue is partial—likely due to limitations of overexpression systems—the statistically significant recovery provides strong genetic evidence that the phenotypes are specific and reversible. These data are now included in the revised Figure S6.

      (3) Functional Consequences of EXOC6A Loss on Ciliary Membrane Protein Trafficking

      To further strengthen the mechanistic conclusions, we expanded our study to include the trafficking of two functional ciliary membrane proteins. We show that in EXOC6A KO cells, both BBS9 (a component of the BBSome complex) and GPR161 (a GPCR involved in Shh signaling) fail to enter the cilium. These results suggest that EXOC6A is required not only for early structural events in ciliogenesis, but also for establishing a competent transition zone, critical for ciliary membrane protein recruitment. These findings are detailed in the revised Figure 7C and corresponding Results.

      We believe that these additional experiments and clarifications directly address the concerns and significantly strengthen the robustness and impact of our study.

      The reviewers also made additional suggestions regarding functional and mechanistic insights that would strengthen the manuscript even further.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should include control IF panels for the specificity of the EXOC6A stainings at the various ciliogenesis stages using the KO cell line.

      We thank the reviewer for this important suggestion. We have now included the requested immunofluorescence (IF) control panels to validate the specificity of the EXOC6A antibody. As shown in the newly added Figure S3, EXOC6A immunofluorescence signal is completely absent in EXOC6A knockout (KO) cells at CV (Fig. S3E) and cilia membrane (Fig. S3F) stages, whereas robust and stage-specific signals are observed in wild-type cells. These results confirm the specificity of the endogenous EXOC6A staining used throughout the study and validate the spatiotemporal localization patterns reported in the main figures.

      (2) It would be informative to compare EXOC6A KO and RNAi to determine whether the only partially impaired ciliogenesis phenotype may be a consequence of cellular adaptation.

      We appreciate the reviewer’s concern regarding potential cellular adaptation or clone-specific effects. To address this, we examined the ciliogenesis phenotype in two independent EXOC6A KO clones generated using distinct sgRNA targeting strategies. As shown in Figure S3, both KO clones displayed a highly consistent phenotype characterized by a pronounced arrest at the ciliary vesicle (CV) stage and a significant reduction in mature cilium formation.

      The reproducibility of this phenotype across multiple independently derived clones strongly argues against clonal variability or long-term adaptive compensation as the underlying cause. Instead, these results support the conclusion that the observed ciliogenesis defects are a direct and specific consequence of EXOC6A loss.

      (3) It remains unclear whether EXOC6A's function in ciliogenesis is part of the exocyst complex. This is currently implied by the context in which it is introduced and discussed, although the authors avoid any direct statement about this. Do the authors observe similar phenotypes by knocking down any other exocyst subunit? In any case, this issue should be discussed.

      We thank the reviewer for raising this conceptual point. This study did not explore the functions of other components of the exocyst complex during ciliogenesis, which warrants further investigation in the future. However, in our biochemical analyses, Myosin‑Va specifically co‑immunoprecipitated with EXOC6A but not with the other exocyst subunits tested (EXOC5 and EXOC7) (Fig. 4E), indicating a selective interaction between EXOC6A and the Myo‑Va transport machinery.

      Reviewer #2 (Recommendations for the authors):

      To clarify the roles of EXOC6A in ciliogenesis, I suggest the following:

      (1) Myo-Va is involved in both the intracellular and extracellular ciliogenesis pathways. The authors show that EXOC6A has a role in the intracellular ciliogenesis pathway. Does it also participate in the extracellular pathway?

      We thank the reviewer for this insightful question. Given that Myo-Va functions in both intracellular and extracellular ciliogenesis pathways, it is indeed plausible that EXOC6A may also participate in the extracellular pathway. However, the current study was specifically focused on elucidating the molecular mechanisms of intracellular ciliogenesis using RPE1 cells, which exclusively undergo this pathway. Assessing EXOC6A’s role in the extracellular pathway would require the use of specialized models (e.g., polarized epithelial cells such as MDCK or IMCD3), which fall beyond the scope of this manuscript.

      (2) In the live imaging movies (Fig 3C, 3D, supp movie 4 and 5), the authors observe tubular structures and puncta with EXOC6A and conclude that these are dynamic vesicles/membranes. While the movies are suggestive of membrane-like behavior, it would be helpful to show that these puncta and tubules have membrane, perhaps by staining with a membrane dye.

      We appreciate the reviewer’s suggestion to validate the membrane identity of EXOC6A-positive structures. While we did not perform membrane dye staining in the current study, we agree this approach would provide additional confirmation. Nevertheless, the dynamic behaviors observed in our live-cell imaging—including membrane-like tubulation, fusion, and fission—strongly support the interpretation that EXOC6A puncta and tubules

      (3) It is unclear how the EXOC6A tubules and vesicles are delivered, and the extent to which MyoVa plays a role. The authors co-label EXOC6A and MyoVa in Supp Fig 2, but EXOC6A dynamics seem very different here, as compared to Fig 3D - there are fewer tubules and puncta and less movement of either tubules or puncta between time points. Does expression of MyoVa decrease EXOC6A membrane dynamics? Or is it required for EXOC6A membrane dynamics?

      We thank the reviewer for this observation. The apparent differences in EXOC6A dynamics between Supplementary Figure 2 and Figure 3D most likely reflect cell-to-cell variability in dynamic behavior, which is common in live-cell imaging. Both figures were derived from the same stable cell line co-expressing EXOC6A and Myo-Va-GTD. Moreover, our analysis shows that Myo-Va-GTD overexpression does not suppress EXOC6A dynamics, nor is it required for membrane remodeling per se. However, Myo-Va is essential for EXOC6A recruitment to the ciliary vesicle, as shown by the loss of EXOC6A localization in Myo-Va KO cells (Fig. 4A).

      (4) The authors show that loss of EXOC6A affects the localization of some transition zone proteins. Does this subsequently lead to defects in transition zone function?

      We agree with the reviewer that structural defects in the transition zone (TZ) should be linked to its function. To address this, we examined the localization of two well-characterized ciliary membrane-associated proteins: BBS9 and GPR161. Both proteins failed to localize to the cilia in EXOC6A knockout cells, despite proper recruitment in wild-type controls (Fig. 7C). Although we did not examine the exact functions of GPR161 and BBS9, our results suggest that the loss of EXOC6A may impair TZ function, particularly its gating capacity for membrane protein trafficking.

      (5) Additional information about how the MKS proteins are regulated by EXOC6A would be helpful to understand the mechanisms by which EXOC6A builds the transition zone. Does EXOC6A directly bind to MKS proteins, or are the MKS proteins delivered by EXOC6A-containing vesicles during ciliogenesis?

      We appreciate the reviewer’s questions regarding the mechanistic relationship between EXOC6A and MKS module proteins. In this study, we did not explore the mechanism by which EXOC6A builds the transition zone. This is an interesting topic that merits future investigation.

      Reviewer #3 (Recommendations for the authors):

      Recommended modifications:

      (1) The co-immunoprecipitation experiments suggest an interaction between EXOC6A and Myo-Va; however, the presence of a faint band in the IgG control raises some uncertainty. To reinforce this conclusion, the authors could demonstrate that the interaction is absent in the EXOC6A knockout cell line.

      We thank the reviewer for this careful observation. We acknowledge the presence of a faint Myo‑Va signal in the IgG control lane. Myosin‑Va is a highly abundant cytoskeletal motor protein and can occasionally exhibit low‑level nonspecific binding to agarose beads during immunoprecipitation assays. Importantly, the Myo‑Va signal co‑immunoprecipitated with endogenous EXOC6A is substantially stronger and specifically enriched compared with the IgG control, supporting a specific interaction.

      (2) Figure S5: The partial rescue of the EXOC6A phenotype is not entirely convincing. A statistical test to assess the significance of the observed differences may help to strengthen the authors' conclusion.

      We appreciate the reviewer’s suggestion to validate the rescue experiment. We have now performed a pairwise two‑tailed Student’s t‑test comparing ciliogenesis efficiency between EXOC6A knockout cells and rescue cells expressing GFP‑EXOC6A. As shown in the revised Figure S6 (original Figure S5), re‑expression of EXOC6A resulted in a statistically significant recovery of ciliogenesis (p = 0.0041). While the rescue is partial—likely due to inherent limitations of plasmid‑based expression systems, including variable transfection efficiency and imperfect restoration of endogenous protein levels—the statistically significant improvement confirms that the ciliogenesis defect is specifically caused by EXOC6A loss. Figure S6 and its legend have been updated accordingly.

      (3) A detailed description of the EXOC6A knockout strategy should be included.

      The Methods section has been expanded to include a comprehensive description of the CRISPR/Cas9‑mediated EXOC6A knockout strategy, including sgRNA sequences, genomic target sites, and validation approaches. Additionally, we now include Figure S3, demonstrating complete loss of EXOC6A protein expression in two independent knockout clones, confirming the efficiency and specificity of the gene‑editing strategy.

      (4) The labeling in Figure 6 is confusing; assigning a separate letter to each panel would improve clarity.

      Figure 6 has been reorganized for clarity: the original panels have been subdivided and relabeled as 6A/6A’ and 6B/6B’, respectively. The figure legend and all corresponding references in the main text have been updated accordingly.

      (5) Lines 109-112: The cell line used is not well described. While experts might understand that Dox is used to induce expression of the transgenes, this should be better explained for non-expert readers.

      We have revised the text to clearly explain that doxycycline (Dox) is used to induce transgene expression via a Tet‑On inducible system. This clarification has been added to the main text.

      (6) Line 180: replace "labels" with "structures".

      We have revised the text as suggested.

      (7) Line 189: the EXOC6A recruitment to the membrane structures seems to be occurring on a short timescale that should be specified. In this context, "immediately" appears unscientific.

      We have revised the sentence to specify that EXOC6A recruitment occurs within seconds, based on our live‑cell imaging data, providing a more accurate temporal description.

      (8) Lines 280-282: We recommend rewording to soften this statement. Actin and microtubule inhibitors affect the entire cytoskeletal network; more specific experiments would be required to assess whether the transport of vesicles is defective.

      We have reworded the statement to indicate that the accumulation of these vesicles at the mother centrioles is highly sensitive to disruption of dynein or microtubules, suggesting that efficient transport of these vesicles may depend on the integrity of the microtubule network. However, more experiments are required to confirm this conclusion. 

      (9) Lines 428-433: Similarly, we recommend rewording this statement as it presents the authors' current model, which is in line with the presented data but would require more rigorous investigation.

      We have revised this section to describe the mechanism as a working model supported by our data, while acknowledging that further investigation will be required to fully establish the proposed hierarchy and molecular details.

      Questions and comments to consider:

      (1) 15-30% of cells can form cilia-like structures in the EXOC6A KO cells, although membrane transport should be reduced. It would be interesting to investigate whether these cilia are only formed intracellularly and fail to reach the cell surface.

      We thank the reviewer for this insightful question. Using both immunofluorescence and electron microscopy, we observed that a subset of ciliary membranes in EXOC6A KO cells do appear to fuse with the plasma membrane. However, due to the low frequency and heterogeneous morphology of these structures, we were unable to reliably quantify this population. 

      (2) In the Western blot shown in Figure 4, EXOC6A appears at multiple molecular weights when detected with the anti-EXOC6A antibody. Providing a possible explanation for this shift would be helpful.

      We clarify that the apparent molecular weight shift likely results from gel distortion during electrophoretic separation. Importantly, the specificity of the major EXOC6A band was rigorously validated by its complete absence in EXOC6A knockout lysates, confirming that the detected signal corresponds to EXOC6A.

      (3) The Western blot in Figure 5B is not fully convincing; including additional independent blots would be nice.

      We thank the reviewer for this suggestion. Figure 5B has been replaced with a blot from an independent experiment, improving clarity and reproducibility.

      (4) According to the materials and methods section, siRNA-mediated knockdown of targets was performed using a single siRNA per gene, which could result in off-target effects. It would be advisable to use several different siRNAs for a single target to exclude off-target effects or, in case this has been done, to cite the relevant references.

      We appreciate this concern. The siRNAs used in this study were previously validated in our earlier work (Wu et al., Nat Cell Biol 2018), where both specificity and efficiency were rigorously tested. We have now explicitly cited this reference in the Materials and Methods section to justify the selection of these reagents.

      (5) The abbreviation CFLEM is uncommon for correlative (fluorescence) light and electron microscopy; the authors should consider using the standard abbreviation CLEM.

      We have replaced “CFLEM” with the standard term CLEM (Correlative Light and Electron Microscopy) throughout the manuscript and figure legends.

      (6) The term "M-centriole" is uncommon and should at least be introduced. The use of the term "mother centriole" is recommended.

      We have replaced “M‑centriole” with the standard term “mother centriole” throughout the manuscript and figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lipid transfer proteins (LTPs) play a crucial role in intermembrane lipid exchange within cells. However, the molecular mechanisms that govern this activity remain largely unclear. Specifically, the way in which LTPs surmount the energy barrier to extract a single lipid molecule from a lipid bilayer is not yet fully understood. This manuscript investigates the influence of membrane properties on the binding of Ups1 to the membrane and the transfer of phosphatidic acid (PA) by the LTP. The findings reveal that Ups1 shows a preference for binding to membranes with positive curvature. Moreover, coarse-grained molecular dynamics simulations indicate that positive curvature decreases the energy barrier associated with PA extraction from the membrane. Additionally, lipid transfer assays conducted with purified proteins and liposomes in vitro demonstrate that the size of the donor membrane significantly impacts lipid transfer efficiency by Ups1-Mdm35 complexes, with smaller liposomes (characterized by high positive curvature) promoting rapid lipid transfer.

      This study offers significant new insights into the reaction cycle of phosphatidic acid (PA) transfer by Ups1 in mitochondria. Notably, the authors present compelling evidence that, alongside negatively charged phospholipids, positive membrane curvature enhances lipid transfer - an effect that is particularly relevant at the mitochondrial outer membrane. The experiments are technically robust, and my primary feedback pertains to the interpretation of specific results.

      (1) The authors conclude from the lipid transfer assays (Figure 5) that lipid extraction is the rate-limiting step in the transfer cycle. While this conclusion seems plausible, it should be noted that the authors employed high concentrations of Ups1-Mdm35 along with less negatively charged phospholipids in these reactions. This combination may lead to binding becoming the rate-limiting factor. The authors should take this point into consideration. In this type of assay, it is challenging to clearly distinguish between binding, lipid extraction, and membrane dissociation as separate processes.

      We have included a detailed consideration of this issue on page 11 of the revised manuscript.

      (2) The authors should discuss that variations in the size of liposomes will also affect the distance between them at a constant concentration, which may affect the rate of lipid transfer. Therefore, the authors should determine the average size and size distribution of liposomes after sonication (by DLS or nanoparticle analyzer, etc.)

      We have included DLS measurements for all liposome sizes (page 6) (SupFig. 2A). Because the intensity distribution in DLS measurements is disproportionately sensitive to larger particles, we also conducted cryo-EM analysis of vesicles of different sizes (page 6) (SupFig. 2B).

      We also now discuss the challenges posed by a fixed membrane-binding surface, which can lead to variations in vesicle spacing when liposomes of different sizes are used, and the possible influence of these variations on the interpretation of results (pages 10-11).

      (3) The authors use NBD-PA in the lipid transfer assays. Does the size of the donor liposomes affect the transfer of NBD-PA and DOPA similarly? Since NBD-labeled lipids are somewhat unstable within lipid bilayers (as shown by spontaneous desorption in Figure 5B), monitoring the transfer of unlabeled PA in at least one setting would strengthen the conclusion of the swap experiments.

      To experimentally address this comment, we explored several different approaches. We first performed transfer experiments using unlabelled lipids, following the general procedures described in the manuscript. After the transfer reaction, we attempted to separate donor and acceptor vesicles by centrifugation and subsequently analyzed the samples by high-resolution mass spectrometry and thin-layer chromatography. Despite considerable effort, we were not able to reliably separate the differently sized liposomes. In particular, small liposomes proved difficult to handle during centrifugation, which is a well-known challenge (Kučerka et al. 1994, BBA; Boucrot et al. 2012, Cell). In addition, liposomes exhibited a tendency to cross-link in the presence of protein, further complicating the separation. Even if this separation step were straightforward, an important limitation of such an approach is that it is very difficult to monitor lipid transfer with sufficient time resolution. Much of the relevant activity occurs within the first 20–30 seconds, and precise interruption at defined time points would be essential.

      We therefore set out to establish a fluorescence-based assay that would allow us to follow lipid transfer in real time. For this, we adapted a dequenching-type assay based on a PE-coupled fluorescein dye, whose fluorescence is quenched in the proximity of negative charges (e.g., negatively charged lipid headgroups). In principle, this assay should allow us to monitor the movement of negatively charged PA lipids away from donor membranes. Although a fluorescein-based passive lipid-transfer assay has been described previously (Richens et al., 2017), it is used only rarely in the lipid-transfer field. While establishing this assay, we encountered several technical challenges. For example, immediately after protein addition, fluorescence intensity changed in unexpected ways that could not be attributed to lipid transfer. Such effects have been reported in the literature (Wall et al., 1995) and are most likely caused by changes in membrane charge density upon protein binding. After extensive fine-tuning of the experimental conditions and careful evaluation of the data, we were ultimately able to demonstrate that lipid-transfer rates are significantly higher with smaller than with larger liposomes. These results confirm our initial observations, and importantly, they were obtained using unlabelled PA.

      The revised manuscript now includes this independent lipid-transfer assay demonstrating the transfer of non-labelled PA (page 11) (SupFig. 4).

      (4) The present study suggests that membrane domains with positive curvature at the outer membrane may serve as starting points for lipid transport by Ups1-Mdm35. Is anything known about the mechanisms that form such structures? This should be discussed in the text.

      We included a detailed consideration of this interesting point in the discussion section on page 13-14.

      Reviewer #2 (Public review):

      Summary:

      Lipid transfer between membranes is essential for lipid biosynthesis across different organelle membranes. Ups1-Mdm35 is one of the best-characterized lipid transfer proteins, responsible for transferring phosphatidic acid (PA) between the mitochondrial outer membrane (OM) and inner membrane (IM), a process critical for cardiolipin (CL) synthesis in the IM. Upon dissociation from Mdm35, Ups1 binds to the intermembrane space (IMS) surface of the OM, extracts a PA molecule, re-associates with Mdm35, and moves through the aqueous IMS to deliver PA to the IM. Here, the authors analyzed the early steps of this PA transfer - membrane binding and PA extraction - using a combination of in vitro biochemical assays with lipid liposomes and purified Ups1-Mdm35 to measure liposome binding, lipid transfer between liposomes, and lipid extraction from liposomes. The authors found that membrane curvature, a previously overlooked property of the membrane, significantly affects PA extraction but not PA insertion into liposomes. These findings were further supported by MD simulations.

      Strengths:

      The experiments are well-designed, and the data are logically interpreted. The present study provides an important basis for understanding the mechanism of lipid transfer between membranes.

      Weaknesses:

      The physiological relevance of membrane curvature in lipid extraction and transfer still remains open.

      We thank the reviewer for the constructive feedback on our work. We agree that the physiological relevance of membrane curvature in lipid extraction and transfer remains an open question. Our data show that Ups1 binding to native-like OM membranes under physiological pH conditions is curvature-dependent, supporting the idea that this mechanism may optimize lipid transfer in vivo. While the intricate biophysical basis of this behaviour can only be dissected in vitro, these findings offer valuable insight into how curvature may functionally regulate Ups1 activity in the cellular context. To directly test this, it will be important in future studies to identify Ups1 mutants that lack curvature sensitivity and assess their performance in vivo, which will help clarify the physiological importance of this mechanism.

      Reviewer #3 (Public review):

      The manuscript by Sadeqi et al. studies the interactions between the mitochondrial protein Ups1 and reconstituted membranes. The authors apply synthetic liposomal vesicles to investigate the role of pH, curvature, and charge on the binding of Ups1 to membranes and its ability to extract PA from them. The manuscript is well written and structured. With minor exceptions, the authors provide all relevant information (see minor points below) and reference the appropriate literature in their introduction. The underlying question of how the energy barrier for lipid extraction from membranes is overcome by Ups1 is interesting, and the data presented by the authors could offer a valuable new perspective on this process. It is also certainly a challenging in vitro reconstitution experiment, as the authors aim to disentangle individual membrane properties (e.g., curvature, charge, and packing density) to study protein adsorption and lipid transfer. I have one major suggestion and a few minor ones that the authors might want to consider to improve their manuscript and data interpretation:

      Major Comments:

      The experiments are performed with reconstituted vesicles, which are incubated with recombinant protein variants and quantitatively assessed in flotation and pelleting assays. According to the Materials and Methods section, the lipid concentration in these assays is kept constant at 5 µM. However, the authors change the size of the vesicles to tune their curvature. Using the same lipid concentration but varying vesicle sizes results in different total vesicle concentrations. Moreover, larger vesicles (produced by freeze-thawing and extrusion) tend to form a higher proportion of multilamellar vesicles, thus also altering the total membrane area available for binding. Could these differences in the experimental system account for the variation in binding? To address this, the authors would need to perform the experiments either under saturated (excess protein) conditions or find an experimental approach to normalize for these differences.

      To experimentally address this comment, we have conducted a detailed structural analysis of liposomes of different sizes using cryo-EM to determine the degree of multilamellarity and to estimate how much membrane surface is available for protein binding. We found that, as expected, liposomes extruded through a 400 nm filter were partly multilamellar, but about 75 % of the initially calculated membrane surface was still available (SupFig. 3A). For 50 nm extruded liposomes, this number went up to about 93 %, and for sonicated liposomes it was about 94 %. Given that we found about 70 % binding of Ups1 to sonicated liposomes, while this number went down to about 40 % with 50 nm liposomes and to about 30 % for 400 nm extruded liposomes, we can rule out that the effects we observe are due to an increased or decreased available membrane binding area.

      Additionally, we performed experiments with increasing amounts of lipids to analyse the impact of lipid concentration on Ups1 membrane binding, comparing 400 nm extruded liposomes with sonicated liposomes. Interestingly, while we do observe increased binding of Ups1 to sonicated liposomes at concentrations between 2.5 mM and 10 mM, no major increase in binding was observed with 400 nm extruded liposomes. Ups1 binding to sonicated liposomes greatly exceeded binding to 400 nm extruded liposomes under all tested conditions (page 7) (SupFig. 3B).
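
      The area argument above can be checked with a few lines of arithmetic. The sketch below rescales the quoted binding percentages by the quoted fraction of accessible membrane surface. The linear rescaling is a simplifying assumption made here for illustration (binding taken as proportional to accessible area); it is not the analysis used in the manuscript.

```python
# Percentages quoted in the response above.
available_surface_pct = {
    "sonicated": 94.0,
    "50 nm extruded": 93.0,
    "400 nm extruded": 75.0,
}
bound_ups1_pct = {
    "sonicated": 70.0,
    "50 nm extruded": 40.0,
    "400 nm extruded": 30.0,
}


def normalized_binding(bound_pct, available_pct):
    """Rescale binding as if 100 % of the calculated surface were accessible.

    Assumes binding scales linearly with accessible area (illustrative only).
    """
    return bound_pct / available_pct * 100.0


normalized = {
    name: normalized_binding(bound_ups1_pct[name], available_surface_pct[name])
    for name in bound_ups1_pct
}

for name, value in normalized.items():
    print(f"{name}: {value:.1f} % bound after area normalization")
```

      Under this assumption the ordering of the three preparations is unchanged after normalization, which is the point made in the response.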

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figures 1, 2, and 3 - In the flotation assays, the Ups1-containing fractions differ between experiments. The presence of liposomes in these fractions should be confirmed, for example, by fluorescence measurements. In relation to this, the broad low MW bands in Supplementary Figure 3 may reflect liposomes (mixed micelles of lipids and SDS?), as their fractionation patterns coincide with those of Ups1 at pH 5.5-6.7 but deviate at pH 7.0 and 7.5. Could the authors clarify this discrepancy?

      Flotation profiles vary with changing conditions of the experiment. We have included a picture of a gel showing the Coomassie staining and the fluorescence of the used lipids side by side to show that the protein bands co-migrate together with liposomes (SupFig. 5). 

      (2) Figures 2, 3, and 5 - The sizes of the liposomes (400 nm and 50 nm) should be experimentally confirmed, e.g., by dynamic light scattering (DLS).

      We have included DLS measurements confirming the differences in liposome sizes. Please see our answer to point 2 of Reviewer 1.

      (3) Figure 4C - The free energy landscape for different phospholipids is interesting. What about other acidic phospholipids, such as PS?

      This is indeed an interesting point. Our molecular dynamics simulations show that PE has a similar free energy landscape to PA, while PC is significantly different. This suggests that headgroup size may play a major role. For intra-mitochondrial PS transport, a specific protein complex consisting of Ups2/Mdm35 has been identified, and it will be an interesting question for future studies whether PS transfer is regulated by similar factors.

      (4) Supplementary Figure 2 - The deformation of liposomes by Ups1 is interesting. Does this depend on the presence of PA or other acidic phospholipids?

      We asked ourselves the same question throughout the project. As pointed out in the manuscript, the membrane-deforming activity of Ups1 is relatively mild when compared to proteins found, for example, in endocytosis. This made a proper static analysis challenging. We weren’t able to unambiguously show whether other acidic phospholipids showed comparable effects to PA.

      (5) It may not be easy to assess experimentally, but the OM in mitochondria should have scramblase activity. Then, such scramblase activity could influence the observed effects of membrane curvature on Ups1-mediated PA transfer.

      (6) It would be helpful to discuss this possibility in the manuscript.

      In the revised version of the manuscript, we now discuss the existence of scramblases, such as Sam50 and VDAC, in the outer mitochondrial membrane with regard to their likely effect on membrane packing (page 13 - 14). We considered a co-reconstitution experiment, analysing in vitro the impact that a scramblase in liposomes might have on lipid transfer, to be outside the scope of this study.

      (7) Figure 6 is not referenced in the main text.

      Thank you, this oversight was corrected.

      (8) The non-abbreviated forms of LUV and SUV should be defined in the text upon first use.

      We now include a definition in the manuscript.

      (9) The term "transfer velocity" would be better expressed as "transfer rate".

      We agree, and we changed the wording accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) As flotation assays are a central technique of the study, readers who are not familiar with this method could benefit from a few explanatory sentences and appropriate references in the introduction section.

      Figure 1B now contains an updated version of a cartoon outlining the flotation assay and a description in the manuscript (page 4) that should make it easier to understand the assay. We have also included a direct reference within the methods section to a paper describing this assay in more detail.

      (2) Related to the major point, but also to improve the manuscript overall, the authors could add DLS (for size distribution and zeta potential) and cryo-EM (for multilamellarity analysis) data. This would aid future efforts to reproduce their observations.

      In the revised version of the manuscript we include DLS and zeta potential measurements as well as a detailed analysis of liposome multilamellarity by cryo-EM (also see answer to point 2 by Reviewer 1) (SupFig. 2A & B; SupFig. 3E).

      (3) Could the authors state the specific zeta potentials of the negatively charged (under varying pH) and neutral liposomes and relate these to natural membranes?

      We have included zeta potential measurements of differently charged liposomes and changed the text accordingly (page 8) (SupFig. 3E).

      (4) Changes in pH affect several characteristics of membranes (including lipid dipoles, charge, packing density, fluidity, and phase separation), particularly charge density. This experimental system does not allow all of these factors to be disentangled and studied separately. Some of the observations presented in Figures 2 and 5 could also be explained by these effects.

      The effects of pH on various membrane properties, such as lipid headgroup dipoles, lipid packing, interfacial tension, and others, are well described in the literature. For example, it was implied that increasing pH leads to phosphatidic acid (PA) becoming more negatively charged when in proximity to phosphatidylethanolamine (PE). We already discuss this effect in the manuscript, as our observation that Ups1 binding to membranes depends on negatively charged lipids but nevertheless increases with decreasing pH is unexpected.

      As pointed out, many of the parameters mentioned above are beyond control in our assays, and a systematic analysis of each of these factors with respect to Ups1 membrane binding and lipid transfer would be well beyond the scope of this manuscript. We have therefore included a passage discussing this issue in more detail (page 4-5).

      (5) Is the curvature simulated in the theoretical models comparable to the curvature of the liposome systems (e.g., a sphere of 100 nm diameter)?

      The simulated curvature spans a defined range, with the highest curvature corresponding to vesicles with diameters of approximately 15 nm. This corresponds reasonably well to the vesicle size distribution as analyzed by cryo-EM.

      References

      Connerth, M., Tatsuta, T., Haag, M., Klecker, T., Westermann, B., & Langer, T. (2012). Intramitochondrial transport of phosphatidic acid in yeast by a lipid transfer protein. Science, 338(6108), 815-818. https://doi.org/10.1126/science.1225625

      Lu, J., Chan, C., Yu, L., Fan, J., Sun, F., & Zhai, Y. (2020). Molecular mechanism of mitochondrial phosphatidate transfer by Ups1. Commun Biol, 3(1), 468. https://doi.org/10.1038/s42003-020-01121-x

      Miliara, X., Garnett, J. A., Tatsuta, T., Abid Ali, F., Baldie, H., Perez-Dorado, I., Simpson, P., Yague, E., Langer, T., & Matthews, S. (2015). Structural insight into the TRIAP1/PRELI-like domain family of mitochondrial phospholipid transfer complexes. EMBO Rep, 16(7), 824-835. https://doi.org/10.15252/embr.201540229

      Miliara, X., Tatsuta, T., Berry, J. L., Rouse, S. L., Solak, K., Chorev, D. S., Wu, D., Robinson, C. V., Matthews, S., & Langer, T. (2019). Structural determinants of lipid specificity within Ups/PRELI lipid transfer proteins. Nat Commun, 10(1), 1130. https://doi.org/10.1038/s41467-019-09089-x

      Miliara, X., Tatsuta, T., Eiyama, A., Langer, T., Rouse, S. L., & Matthews, S. (2023). An intermolecular hydrogen-bonded network in the PRELID-TRIAP protein family plays a role in lipid sensing. Biochim Biophys Acta Proteins Proteom, 1871(1), 140867. https://doi.org/10.1016/j.bbapap.2022.140867

      Potting, C., Tatsuta, T., Konig, T., Haag, M., Wai, T., Aaltonen, M. J., & Langer, T. (2013). TRIAP1/PRELI complexes prevent apoptosis by mediating intramitochondrial transport of phosphatidic acid. Cell Metab, 18(2), 287-295. https://doi.org/10.1016/j.cmet.2013.07.008

      Richens, J. L., Tyler, A. I. I., Barriga, H. M. G., Bramble, J. P., Law, R. V., Brooks, N. J., Seddon, J. M., Ces, O., & O'Shea, P. (2017). Spontaneous charged lipid transfer between lipid vesicles. Sci Rep, 7(1), 12606. https://doi.org/10.1038/s41598-017-12611-0

      Wall, J., Golding, C. A., Van Veen, M., & O'Shea, P. (1995). The use of fluoresceinphosphatidylethanolamine (FPE) as a real-time probe for peptide-membrane interactions. Mol Membr Biol, 12(2), 183-192. https://doi.org/10.3109/09687689509027506

      Watanabe, Y., Tamura, Y., Kawano, S., & Endo, T. (2015). Structural and mechanistic insights into phospholipid transfer by Ups1-Mdm35 in mitochondria. Nat Commun, 6, 7922. https://doi.org/10.1038/ncomms8922

    1. Author response:

      Reviewer 1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al. (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
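
      For readers who want to see how a predicted force-extension curve of this kind is obtained, the following is a minimal sketch using the widely used Marko-Siggia interpolation formula for the worm-like chain. The 50 nm persistence length is the value quoted above. The 1000 nm contour length and the thermal energy of 4.11 pN nm are assumptions chosen for illustration; they are not taken from the manuscript, and the manuscript may use a refined form of the model.

```python
KBT_PN_NM = 4.11        # thermal energy near room temperature (assumed), pN*nm
PERSISTENCE_NM = 50.0   # consensus dsDNA persistence length quoted above


def wlc_force(extension_nm, contour_nm, persistence_nm=PERSISTENCE_NM):
    """Marko-Siggia interpolation: force (pN) at a given end-to-end extension."""
    z = extension_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must satisfy 0 <= extension < contour length")
    return (KBT_PN_NM / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def extension_at_force(force_pn, contour_nm, persistence_nm=PERSISTENCE_NM):
    """Invert wlc_force by bisection (force rises monotonically with extension)."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm, persistence_nm) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical tether: 1000 nm contour length, motor stalling near 6 pN.
contour_nm = 1000.0
stall_extension_nm = extension_at_force(6.0, contour_nm)
print(f"extension at 6 pN: {stall_extension_nm:.0f} nm "
      f"({stall_extension_nm / contour_nm:.1%} of contour length)")
```

      Because the curve is very steep near full extension, a stalled motor sits within a narrow range of extensions, which is consistent with the remark above that the exact force-extension details do not affect the stall-duration comparison.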

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We will soften this term in our revision to “nearly parallel to the microtubule”. In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We appreciate the reviewer’s point and we will work to streamline and define terms in our revision.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution. Our preliminary analysis suggests that roughly 20% of events are missed due to this 60 nm cutoff. This will change our unloaded duration numbers slightly, but this will not alter our conclusions.
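
      The relationship between a detection cutoff and the fraction of missed events can be made explicit. The sketch below assumes that slip distances follow a single exponential distribution starting at zero, which is an idealization of the distribution described above; the function names are illustrative.

```python
import math


def missed_fraction(cutoff_nm, mean_slip_nm):
    """Fraction of exponentially distributed slips shorter than the cutoff."""
    return 1.0 - math.exp(-cutoff_nm / mean_slip_nm)


def mean_slip_from_missed(cutoff_nm, missed):
    """Mean slip distance implied by a given missed fraction at a given cutoff."""
    return -cutoff_nm / math.log(1.0 - missed)


# Numbers quoted above: 60 nm cutoff, roughly 20 % of events missed.
implied_mean_nm = mean_slip_from_missed(60.0, 0.20)
print(f"implied mean slip distance: {implied_mean_nm:.0f} nm")
print(f"check, missed fraction: {missed_fraction(60.0, implied_mean_nm):.2f}")
```

      Under this idealization, a 20 % loss at a 60 nm cutoff corresponds to a mean slip distance several times larger than the cutoff itself.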

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point.

      More importantly, what we do in Fig. 3 is to separate out the ramps from the stalls and, using a statistical model, we compute a separate duration parameter (which is the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations is different for the three motors, which is interesting in itself.
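
      The history-independence argument made in this response can be illustrated with a short simulation. The sketch below assumes a single constant detachment rate (exponentially distributed lifetimes) and shows that starting the clock at an arbitrary later time leaves the mean remaining duration unchanged. The parameter values are arbitrary and are not taken from the data.

```python
import random


def mean_residual_duration(tau_s, start_clock_s, n_events=200_000, seed=0):
    """Mean remaining lifetime of events that survive until start_clock_s.

    Lifetimes are drawn from an exponential distribution with mean tau_s,
    i.e. a constant, history-independent detachment rate.
    """
    rng = random.Random(seed)
    residuals = []
    for _ in range(n_events):
        lifetime = rng.expovariate(1.0 / tau_s)
        if lifetime > start_clock_s:
            residuals.append(lifetime - start_clock_s)
    return sum(residuals) / len(residuals)


tau_s = 2.0  # arbitrary illustrative mean duration, seconds
for start in (0.0, 0.5, 1.5):
    estimate = mean_residual_duration(tau_s, start)
    print(f"clock started at {start:.1f} s: mean remaining duration {estimate:.2f} s")
```

      If detachment were not history independent (for example, a multi-step process), the mean remaining duration would depend on where the clock starts, so this property is an assumption that needs to hold for the comparison to be valid.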

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We agree that force after slippage is much smaller than at stall, and we plan to clarify that section of text. However, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al. (Toleikis et al., 2020) and Sudhakar et al. (Sudhakar et al., 2021). In particular, Sudhakar et al. used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al., who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. We plan to clarify these points in the text of the revision.

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      We will address these points in our revision.

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      We will address this point in our revision.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although these non-parametric methods make no assumptions for the distribution, they require the user to know exactly where the start time is.

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, we make two points. The first is that the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s don’t affect the fit very much). The second is that when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed with the single-exponential fits very well (Table S2). For instance, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections due to finite microtubule lengths.
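
      As a rough illustration of why right censoring need not bias an exponential duration estimate, the sketch below uses the standard maximum-likelihood estimator for exponentially distributed durations with right censoring (total observed time divided by the number of uncensored detachments). It is not the Markov model or Bayesian inference described in the Supplementary section; the function names, the simulated 2.75 s mean duration, and the 6 s cutoff are assumptions chosen only for illustration.

```python
import random


def censored_exponential_mean(durations, censored):
    """MLE of the mean duration for exponential data with right censoring.

    durations: observed times in seconds.
    censored:  True where the run was cut short (for example by reaching the
               end of the microtubule) rather than ending in detachment.
    """
    n_events = sum(1 for c in censored if not c)
    if n_events == 0:
        raise ValueError("need at least one uncensored event")
    # Total time at risk divided by the number of observed detachments.
    return sum(durations) / n_events


def simulate(true_mean=2.75, cutoff=6.0, n=20000, seed=1):
    """Draw exponential durations and truncate them at a hypothetical cutoff."""
    rng = random.Random(seed)
    durations, censored = [], []
    for _ in range(n):
        t = rng.expovariate(1.0 / true_mean)
        durations.append(min(t, cutoff))
        censored.append(t > cutoff)
    return durations, censored


durations, censored = simulate()
naive = sum(durations) / len(durations)  # ignores censoring, biased low
corrected = censored_exponential_mean(durations, censored)
print(f"naive mean: {naive:.2f} s, censoring-corrected: {corrected:.2f} s")
```

      In this toy simulation the naive average underestimates the true duration, while the censoring-aware estimate recovers it, which is the same qualitative property claimed for the Markov model above.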

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6A kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. There are also two faint horizontal red traces, which are labeled DNA diffusing by (smearing over a large area during a single frame). Panel S6B shows run durations of motors with DNA attached. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      Our interpretation of the unloaded kinesin-DNA result is that the much slower diffusion constant of the DNA relative to the motor alone enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration. In contrast, such detachment events for motors alone normally result in the motor diffusing away from the microtubule, terminating the run. This argument has been used to reconcile the longer single-motor run lengths in the gliding assay versus the bead assay (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, then it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event. We will address this point in the revision.

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We will change this text to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Essentially, yes, it is overdefined, but that is by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
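
      A rate constant that varies exponentially with load is conventionally written in a Bell-type form, k(F) = k0 · exp(F·δ / k<sub>B</sub>T). The sketch below evaluates that expression; it is a generic illustration, not the fitted model from the manuscript, and the function name, the unloaded rate, and the distance parameters are hypothetical values.

```python
import math

KBT_PN_NM = 4.1  # thermal energy near room temperature (pN*nm), assumed


def load_dependent_rate(k0, force_pn, delta_nm):
    """Bell-type rate: k0 scaled exponentially by the work force * delta.

    A positive delta_nm makes the rate rise with load (slip-bond-like);
    a negative delta_nm makes it fall with load (catch-bond-like).
    """
    return k0 * math.exp(force_pn * delta_nm / KBT_PN_NM)


# Hypothetical parameters, for illustration only.
k0 = 1.0  # unloaded rate, 1/s
for force in (0.0, 2.0, 4.0, 6.0):
    slip_like = load_dependent_rate(k0, force, delta_nm=1.0)
    catch_like = load_dependent_rate(k0, force, delta_nm=-1.0)
    print(f"F = {force:.0f} pN: slip-like {slip_like:.2f} /s, "
          f"catch-like {catch_like:.2f} /s")
```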

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the revision we will note in the Discussion that this behavior was indicative of an ideal bond (not a catch-bond), and we will also add a sentence in the Introduction highlighting this work.

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high compliance once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. To address it, we calculated the predicted stiffness of the dsDNA by taking the slope of the theoretical force-extension curve in Fig. 1B. Below 650 nm extension, the stiffness is <0.001 pN/nm; it reaches 0.01 pN/nm at 855 nm, and at 960 nm where the force is 6 pN the stiffness is roughly 0.2 pN/nm. That value is higher than the quoted 0.05 pN/nm trap stiffness, but for reference, at this stiffness, an 8 nm step leads to a 1.6 pN jump in force, which is reasonable. Importantly, the stiffness of kinesin motors has been estimated to be in the range of 0.3 pN/nm (Coppin et al., 1996; Coppin et al., 1997). Granted, this stiffness is also nonlinear, but what this means is that even at stall, our dsDNA tether has a similar predicted compliance to the motor that is pulling on it. We will address this point in our revision.
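
      The stiffness values quoted above can be approximated from the slope of a worm-like chain force-extension curve. The sketch below uses the Marko-Siggia interpolation formula (Marko and Siggia, 1995) and its analytical derivative. The contour length (1020 nm), persistence length (50 nm), and thermal energy (4.1 pN·nm) are assumed values chosen for illustration, not parameters taken from the manuscript, and the function names are hypothetical.

```python
KBT_PN_NM = 4.1        # thermal energy near room temperature (pN*nm), assumed
PERSISTENCE_NM = 50.0  # dsDNA persistence length (nm), assumed
CONTOUR_NM = 1020.0    # tether contour length (nm), assumed


def wlc_force(x_nm, contour=CONTOUR_NM, persistence=PERSISTENCE_NM):
    """Marko-Siggia interpolation: force (pN) at extension x_nm."""
    if not 0.0 <= x_nm < contour:
        raise ValueError("extension must be in [0, contour length)")
    z = x_nm / contour
    return (KBT_PN_NM / persistence) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def wlc_stiffness(x_nm, contour=CONTOUR_NM, persistence=PERSISTENCE_NM):
    """Analytical slope dF/dx (pN/nm) of the interpolation formula."""
    if not 0.0 <= x_nm < contour:
        raise ValueError("extension must be in [0, contour length)")
    z = x_nm / contour
    return (KBT_PN_NM / (persistence * contour)) * (0.5 / (1.0 - z) ** 3 + 1.0)


for x in (650.0, 855.0, 960.0):
    print(f"x = {x:.0f} nm: F = {wlc_force(x):.2f} pN, "
          f"k = {wlc_stiffness(x):.4f} pN/nm")

# Force jump produced by one 8 nm step at the 960 nm stiffness.
step_nm = 8.0
print(f"force jump per step: {step_nm * wlc_stiffness(960.0):.2f} pN")
```

      With these assumed parameters the formula gives roughly 6 pN and 0.2 pN/nm at 960 nm extension, so a single 8 nm step at that stiffness corresponds to a force jump of about 1.6 pN.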

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. We plan to address this in our revision. Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      Due to the compliance of the dsDNA, the 95% limits for the initial attachment position are +/- 290 nm (Fig. S2). Thus, some apparent ‘jumps’ from the detached state are expected. We will take a closer look at why there are jumps for kinesin-2 that aren’t apparent for kinesin-1 or -3.

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References:

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Coppin, C.M., J.T. Finer, J.A. Spudich, and R.D. Vale. 1996. Detection of sub-8-nm movements of kinesin by high-resolution optical-trap microscopy. Proc Natl Acad Sci U S A. 93:1913-1917.

      Coppin, C.M., D.W. Pierce, L. Hsu, and R.D. Vale. 1997. The load dependence of kinesin's mechanical cycle. Proc Natl Acad Sci U S A. 94:8539-8544.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schaffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schaffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      eLife Assessment

      The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.

      We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that significant additional feasibility studies are required. As a comparison, the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003) achieved ~75% coverage of ~6,000 proteins directly from an established protocol without any prior significant feasibility studies, at least to our knowledge. While the C. elegans genome encodes roughly 3 times as many proteins, we would argue that our tagging protocol may even be less labor intensive, as it does not involve any cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: “They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.”

      Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.

      The proposed cost and timeline align quite favorably with established large-scale consortium projects: e.g., ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab or faster with distributed effort at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.
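
      The scale implied by these figures can be checked with a back-of-envelope calculation. The sketch below is only an illustration: the gene count of 20,000 is an assumed round number, every pooled injection is assumed to succeed on the first pass, and the $2.2 million reagent cost is assumed to apply to the genes that remain untagged. The function name and returned fields are hypothetical.

```python
def tagging_plan(n_genes, fraction_tagged, pool_size, years, reagent_cost_usd):
    """Rough throughput and per-gene cost for a pooled tagging effort."""
    remaining = round(n_genes * (1.0 - fraction_tagged))
    pools = -(-remaining // pool_size)  # ceiling division
    return {
        "remaining_genes": remaining,
        "pooled_injections": pools,
        "pools_per_year": pools / years,
        "reagent_cost_per_gene_usd": reagent_cost_usd / remaining,
    }


plan = tagging_plan(
    n_genes=20_000,         # assumed round number of protein-coding genes
    fraction_tagged=0.08,   # ~8% already tagged, as stated above
    pool_size=3,            # three targets per injection mix
    years=5.5,              # midpoint of the 5-6 year estimate
    reagent_cost_usd=2.2e6,
)
for key, value in plan.items():
    print(f"{key}: {value:,.2f}")
```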

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.

      Strengths:

      This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.

      Weaknesses:

      Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.

      Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but the manuscript's primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and, importantly, to test the following key parameters: (1) success rate of triple pools with previously untested reagents at novel targets; (2) utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets which we already knew could be edited quite efficiently, with the 3rd target a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking 3 random genes and replacing the 3rd highly efficient locus with another less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.

      The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.

      Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.

      We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798 and PMID: 24685391). For instance, WGS of strains after CRISPR/Cas9 found negligible off-target effects (PMID: 25249454, PMID: 30420468 – using similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892 using other methods). Targeted sequencing studies have reported similar findings, using various CRISPR/Cas9 methods, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, the introduction of backcrossing should not reasonably be a routine part of the initial tagging.

      Lastly, if one wants to backcross at a later stage, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.

      Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.

      Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s suggestion that injecting three targets together may be no more helpful than targeting each gene separately. Why would one tag a single gene at a time rather than three, if the whole point of the paper is to demonstrate the scalability of tagging, meaning that joint tagging can cut the effort of tagging all genes by a factor of 3? It is important to keep in mind that the rate-limiting step for tagging the whole genome is the number of injections that can be done per day. Since there is no cloning to generate the repair templates/guides and all other reagents are commercially available and not sample specific, these can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput 3-fold and in our view does not provide any disadvantages, as individual tags can be isolated independently if desired.

      Beyond the numerous technical advantages pooling provides (including lower cost and higher throughput for making injection mixes as well as imaging), our results show that it yields epistemic benefits as well: we would never have noted the subcellular pattern in Fig. 6B, C with different sets of mitochondria being marked by different mitochondrial proteins had we imaged them separately or even aligned to a pan-mitochondrial landmark. As we mentioned in the discussion, grouping proteins predicted to localize to the same compartment together can simultaneously test how uniform or differentiated such compartments are during the screen.

      The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.

      We do not think that the utility of current BFPs is very limiting. The theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which was useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat less ideal, but we have shown that many proteins are expressed high enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissection scope become visible with long exposure cameras of widefield microscopes or modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much higher. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.

      Some literature reviews, particularly in the Introduction and Abstract, rely too much on recent examples from the authors' laboratory instead of presenting the state of the field. I'd like to have known what exactly has been done with simultaneous injection targeting multiple loci more thoroughly, comparing what has been accomplished to date by various laboratories' advances to date.

      We are not sure what the reviewer is referring to when bemoaning that the Abstract and Introduction are too focused on our paper and not presenting the state of the field. In the Abstract, we do not refer to any literature. In the Introduction, we cite 28 papers, 6 of those from our lab (4 of which provide examples of protein tags). We do not believe that this can be fairly called an unbalanced presentation of the state of the field.

      This being said, we will gladly expand our Introduction to provide more background on co-CRISPRing. Labs have routinely used co-conversion (“coCRISPR”) markers for picking out their intended edits (e.g., point mutations or insertions), as it has been shown by multiple groups that a CRISPR/Cas9 edit at one locus correlates with efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple FP-sized insertions have not been routinely attempted. We and only one other group have successfully attempted it using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale and using entirely untested reagents at novel sites – critical parameters to determine for a whole genome approach. So, we test here (1) the efficiency of triple insertions and (2) the chance of getting them with new and untested guides and reagents.

      In our view, since we have to use some injection/coCRISPR marker anyway for those genes which are not expressed at dissecting-scope visible levels (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes our approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can screen with higher power methods for the dimmer targets, while enabling us to co-isolate other intended targets. Insertions, being often heterozygous in F1, can be segregated independently if desired, or homozygosed together to facilitate maintenance then outcrossed individually by those interested in studying specific genes in more detail.

      In the revised version of this manuscript, we will discuss some of these points in the first paragraph of the results section:

      “In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014).”

      “These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci.”

      Reviewer #2 (Public review):

      The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?

      We consider this paper to have two purposes: (1) to motivate the community to come together to consider such a genome-wide tagging approach; (2) to provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel, interesting insights.

      As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.

      We agree that the basic principle is similar. However, it was not clear that triple pooling three novel large edits would work, given the numbers in our original paper or that it would be scalable.

      The dpy-10 coCRISPR marker previously used is a highly efficient single site, with close to 100% hit rate. We also knew in the earlier study that the two pooled insertions already worked quite efficiently and did not disrupt the function of targeted proteins. Exchanging these plus dpy-10 for three novel tags was not guaranteed to succeed for many potential reasons, including both biological and technical. For instance, such a “marker free” approach necessitates that a significant number of targets in the genome should be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with current best fluorophores. The chance of disrupting gene function by tagging was also not explored in detail in C. elegans, nor whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole genome tagging. We have clarified some of these points in the text.

      As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community.

      The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.

      Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A Table with all the observations must be included. This can replace the descriptions of the observations with the different lines, which could be somewhat laborious for the reader, and are often wrong. There are numerous mistaken expectations of protein expression here, but two examples include:

      We are not convinced that expectations are mistaken. Below we respond to the reviewer’s specific examples, and we are open to hearing from the reviewer about additional cases.

      (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).

      There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.

      The expression of acdh-10 is annotated in multiple scRNA datasets as intestine and epidermal enriched (Packer et al 2019, highest intestine and hyp; Ghaddar et al 2023 intestine, sheath and BWM, and even oocyte). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.

      However, this raises an important question: why have different paralogs doing the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that “there are no published studies about this enzyme, so we really don't know for sure what it's doing” is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information of protein function could indicate that the involved pathway requires tissue specific regulation.

      (2) The expectation that HXK-1 is ubiquitously expressed.

      Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).

      The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.

      Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.

      We note that the Ghaddar et al. and CeNGEN/Taylor et al. datasets do not. The scRNA paper cited by the referee (PMID: 38816550) also shows enrichment in neurons and pharynx, which we did not note. In our view, these in fact further support our goals: often, transcript datasets alone (frequently used to infer tissue function) do not sufficiently predict protein expression. One can post hoc find an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.

      To clarify these points, we will add the following to the discussion section:

      “We also noted unexpected cell type dependent distributions of proteins involved in broadly important metabolic processes such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell type specific differences at the protein level. Importantly, many genes encoding metabolic enzymes including acdh-10 and hxk-1 have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues require unique metabolic needs and indicate which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”

      The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.

      We will add this information to the table, including annotated expression levels in young adults from various datasets (but not larval datasets, as we did not image these). We note that each of these studies uses a different pipeline and reports different metrics (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless they are integrated and analyzed together.

      Reviewer #3 (Public review):

      Summary:

      The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.

      Strengths:

      The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.

      We are grateful for the referee’s appreciation that whole proteome tagging is feasible.

      Weaknesses:

      Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.

      (1) Have the authors investigated if the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Electra be better than mTagBFP2? How does mScarlet3-S2 compare to mScarlet3?

      All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given Electra is not that much brighter on paper than mTagBFP2 we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is quite a bit dimmer than mScarlet3 on paper – the advantage is that it has higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.

      (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.

      mSG and mSc3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.

      The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript describes the pattern of relaxed selection observed at spermatogenesis genes in gorillas, presumably due to the low sperm competition associated with single-male polygyny. The analyses to detect patterns of selection are very thorough, as are the follow-up analyses to characterize the function of these genes. Furthermore, the authors take the extra steps of in vivo determination of function with a Drosophila model.

      This is an excellent paper. It addresses the interesting phenomenon of relaxation of selection as a genomic signal of reproductive strategies using multiple computational approaches and follow-up analyses by pulling in data from GO, mouse knockouts, human infertility database, and even Drosophila RNAi experiments. I really appreciate the comprehensive and creative approach to analyze and explore the data. As far as I can tell, the analyses were performed soundly and statistics are appropriate. The Introduction and Discussion sections are thoughtful and well-written. I have no major criticisms of the manuscript.

      We thank you for your kind words!

      The main area that I would suggest for improvement is in the "Caveats and Limitations" section of the Discussion. Currently, the first paragraph of this section states the obvious that genetic manipulation of gorillas is not feasible. Beyond a reminder to the reader that this was a rationale for the Drosophila work, it isn't really adding much insight. The second paragraph is a brief discussion of the directionality of change. I think it comes across as overly simplistic, with a sort of "well, we can never know" feel. Obviously, there are plenty of researchers who do model change to infer direction and causation, and there are plenty of published papers attempting to do so with respect to mating systems in primates.

      We understand these statements might seem trivial, but they are meant to fully acknowledge, particularly to non-evolutionary biologists, the fact that we can’t do the genetics to “prove” these putatively deleterious mutations really are so (hence the statement about forward/reverse genetic experiments), nor causation (since this mating system evolved once in the history of gorillas we cannot know directionality in this lineage, although we could infer it if we had species in which different stages were extant, for example).

      I do not think the authors need to remove these paragraphs, but I do encourage them to turn the "Caveats and Limitations" section into something more meaningful by addressing limitations of the work that was actually done rather than limitations of hypothetical things that were not done. A few areas come to mind. First, the authors should discuss the effect of gene-tree vs species-tree inconsistencies in the analyses, which could affect the identification of gorilla-specific amino acid changes and/or the dN/dS estimates. Incomplete lineage sorting is very common in primates including the gorilla-chimp-human splits (Rivas-González et al. 2023). It would be nice to hear the authors' thoughts on how that might affect their analyses. Second, the dN/dS-based analyses assume the neutrality of synonymous substitutions. Of course, that assumption is not completely true; it might be true enough, and the authors should at least note it as a caveat. Third, and potentially related, is the consideration that these protein-coding genes may be functioning in other ways such as via antisense transcription. The genes under relaxed selection may be on their way to becoming pseudogenes and evolving as such at the sequence level, but many pseudogenes continue to be transcribed sense or anti-sense for a regulatory purpose. I don't think there is a way to incorporate this into the authors' analyses but it would be nice to see it acknowledged as a caveat or limitation.

      We thank you for the helpful suggestion and have added a discussion of these issues in the reworked Caveats and limitations section (lines 639 - 710).

      Reviewer #1 (Recommendations for The Authors):

      This is an excellent paper with thorough and creative approaches to address an interesting connection between genotype and phenotype. Stylistically the paper is very well written.

      We thank you for your kind words.

      Page 3: I suggest deleting the word "vaginal" so the sentence reads "... the evolution of female traits such as anatomical features that allow female control...". Most of the well-documented examples of cryptic female choice are in animals that do not have vaginas like insects, fish, and birds, including the reference given at the end of the sentence (Brennan et al. 2007 on waterfowl).

      We agree and have made this edit.

      Page 3: I would delete the words "multimale-multifemale" when discussing gorillas, to make the sentence read "Most gorillas, for example, live in groups with age-graded...". The use of "multimale-multifemale" here is not exactly wrong, but can be confusing to the reader since the authors essentially use "multimale-multifemale" as a synonym for "polygamous" in the previous paragraph.

      We agree and have made this edit.

      The writing in the Materials and Methods fluctuates between present and past tense. The authors should pick a consistent style, probably past tense by convention.

      We have edited the Materials and Methods only to use past tense.

      "Drosophila" is italicized sometimes, but not sometimes not. Make consistent.

      To ensure consistency, italics were used only when genus and species were shown together (i.e., Drosophila melanogaster).

      In the main text, a few reference typos/confusions:

      Box 1, Figure 1B caption: I believe this "Dixson, n.d." reference should be Dixson (2009), if it refers to the book (Oxford Press).

      Yes, that is the case. Thank you for having spotted this. The reference has been corrected.

      Page 21: The authors use the term "false exons" and "fake exons" in the same paragraph. Are these the same thing? If so, just use "false exons" both times.

      These are the same, we have changed fake to false.

      Page 22-23, maybe elsewhere: The Smith et al. reference includes Martin's first name.

      Thank you for bringing this issue to our attention. The reference has been corrected.

      Page 25: in the parenthetical listing of scientific species names, the word "and" should not be italicized. In this same section, there's really no reason to include "gorilla" as the subspecies. It isn't given for the other species.

      Corrected.

      Page 27: Missing period in the second paragraph after "(Guyonnet et al. 2012)".

      Corrected.

      Page 29: Should read "... available in gnomAD that would allow us to exclude..." (or possibly "... available in gnomAD that would allow the exclusion of ...").

      Corrected.

      Page 33, figure legend of Appendix Figure 1A: "gray line" not "gray liner".

      Corrected.

      Box 1, Figure 1A: This is confusing in a few ways. First, the gorilla red dot is labeled "Gorilla", but the chimpanzee and bonobo dots are not labeled. Perhaps in the legend the colors could be indicated, such as "... percentage of body mass for gorilla (red), common chimpanzee (dark blue), and bonobo (light blue)"? Secondly, the bar chart shows the testes/body mass ratio but it is not clear what they are scaled to. Should there be a second y-axis on the right side of the plot?

      The bar chart showed the testis weight/body weight ratio (log), but it is not really necessary. We have removed the bar chart and labeled chimpanzees and gorillas.

      Figure 1D: I found myself confused by the vertical label of "Percent of genes with w>1 in Gorilla". Because all genes are in the stacked histogram, my first thought was that ~99% of the genes have w>1 (gray). Would be more clear if the label was the same as 1G ("Percent of genes").

      We agree and have made this change.

      The text in the figures is extremely small. I don't know what it will look like once it is fully formatted for publication, so I'll leave those concerns to the editor/publisher.

      We will wait until the proofs to determine if this figure needs to be split into multiple figures with larger text.

      References in the reference section need a LOT of cleaning up. It does not appear that any manual editing was done. Please check for consistency in capitalization, italicization, abbreviations, missing information, etc. The level of neglect to this section is frankly unprofessional.

      I (VJL) apologize for this; it is entirely my fault. To explain but not justify, I have dyslexia, and the shifting combination of text, numbers, punctuation, fonts, and font styles makes it difficult to see the inconsistencies. To mitigate this, I use a reference manager to format references (like everyone else) and almost always have someone proofread the reference section, but I didn’t do that with this manuscript. I apologize for the oversight. My dedicated co-authors have cleaned the reference section.

      Reviewer #2 (Public Review):

      As outlined in the public review, this is a nicely executed molecular evolutionary study. The analyses and overall patterns described in gorillas appear rigorous and convincing. The fundamental limitation here is a lack of comparative context to specifically establish the connection to mating system or the uniqueness of these overall patterns to gorillas.

      We thank the reviewer for the compliments. However, there is some confusion about the hypothesis we tested. We hypothesized that genes involved in male reproductive biology would have relaxed selective constraints in gorillas because of their mating system, not that polygynous mating systems would lead to relaxed selection. While that may be true, it is not the hypothesis we tested, nor do we state that the overall pattern we observe is unique to gorillas. Our data, however, support our claims: 1) We performed an unbiased selection scan in gorillas and identified genes with K<1, an evolutionary signature of reduced selection intensity; 2) We found that those genes were enriched for male reproductive functions; and 3) Some of those genes had effects on male reproduction in both Drosophila screens and in infertile men. These are the results one would expect if our hypothesis were true.

      To partly address the concern that our results do not have a connection to mating systems or may be an overall pattern rather than a gorilla-specific one, we ran RELAX using the same dataset but in the elephant seal, another species with a highly polygynous mating system. Although elephant seals are a polygynous species, they differ from gorillas in that their spermatogenesis does not undergo persistent deterioration, but instead follows a seasonal pattern. According to the comprehensive study by Laws [The Elephant Seal (Mirounga Leonina Linn.): III. The physiology of reproduction; Scientific Reports, 15, Falkland Islands Dependencies Survey, 1956], male gamete production is upregulated during the mating season and is mostly inactive throughout the rest of the year. Of the 573 genes with K<1 in gorillas only 14 also have K<1 in elephant seals, which had 350 genes with K<1. A GO analysis of the 350 elephant seal K<1 genes does not identify enrichment in spermatogenesis-related terms. In fact, the list of GO terms is quite broad. A potential, if admittedly speculative, interpretation of these findings is that although polygynous, the selective pressure on elephant seal spermatogenesis is not relaxed (unlike in gorillas) because of the seasonal nature of their mating period. In other words, by having a temporally narrower window for reproductive success than gorillas, the selective constraint on male gametogenesis in seals is not weakened. Regardless, the low overlap in relaxed genes between the two tested polygynous species supports the view that this reproductive strategy is probably associated with different evolutionary signatures in the genome (depending on the species), a likely reflection of the complex, nuanced and multi-factorial aspects of such strategies. We include this analysis in the Appendix (lines 1112 - 1132).
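      One way to put the "low overlap" above in context is to compare it with the overlap expected for two unrelated gene sets. The sketch below is a minimal illustration using the counts quoted in the response (573, 350 and 14); it assumes, which the response does not state, that both lists were drawn from the same background of 13,310 genes, and it is not an analysis performed by the authors. The function names are hypothetical. Under these assumptions the chance expectation is about 15 shared genes.

```python
from math import comb


def expected_overlap(n_a: int, n_b: int, background: int) -> float:
    """Mean overlap of two independent random gene sets from one background."""
    return n_a * n_b / background


def prob_overlap_at_most(k: int, n_a: int, n_b: int, background: int) -> float:
    """Hypergeometric P(overlap <= k) for two independent random gene sets."""
    total = comb(background, n_b)
    favourable = sum(
        comb(n_a, i) * comb(background - n_a, n_b - i) for i in range(k + 1)
    )
    return favourable / total


# Counts quoted in the response.
GORILLA_RELAXED = 573
SEAL_RELAXED = 350
OBSERVED_OVERLAP = 14
# Assumption: the same 13,310-gene background applies to both species.
BACKGROUND = 13_310

if __name__ == "__main__":
    mean = expected_overlap(GORILLA_RELAXED, SEAL_RELAXED, BACKGROUND)
    tail = prob_overlap_at_most(
        OBSERVED_OVERLAP, GORILLA_RELAXED, SEAL_RELAXED, BACKGROUND
    )
    print(f"expected overlap by chance: {mean:.2f}")
    print(f"P(overlap <= {OBSERVED_OVERLAP}) under independence: {tail:.3f}")
```

      An observed overlap close to the chance expectation would be consistent with the two species' relaxed gene sets being unrelated.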

      While there is much that I like about the study and approach, this is a substantial shortcoming that really limits the significance of the study, especially given that lineage-specific patterns were also analyzed by Scally et al. (2012) over a decade ago.

      While Scally et al. (2012) reported the initial sequencing, assembly, and analyses of the gorilla genome, the method they used to characterize selective pressure on coding genes - the branch and branch-site model implemented in PAML - is misspecified to detect relaxed selection (PMID: 25540451). Under relaxed selection, the d<sub>N</sub>/d<sub>S</sub> of sites under purifying selection will move towards 1, the d<sub>N</sub>/d<sub>S</sub> of sites under positive selection will also move towards 1, and some sites will not experience a change in d<sub>N</sub>/d<sub>S</sub>. The PAML test used by Scally et al. (2012) averages d<sub>N</sub>/d<sub>S</sub> across all sites, rather than having distinct rate categories for each of the three selection classes. A change in d<sub>N</sub>/d<sub>S</sub> toward 1 under the PAML model can arise because the strength of positive selection is weaker in the foreground lineage than the background lineage, even if there is still positive selection acting on some sites. Averaging across all sites also means there is little power to detect relaxed selection, even when selection is in fact relaxed. Furthermore, the PAML test used by Scally et al. (2012) is underpowered to detect relaxed selection because it depends on selective regimes in background species. Scally et al. (2012) also used six species, which underpowers their test of relaxation, because if one or more of those species experience an increase in their d<sub>N</sub>/d<sub>S</sub> rate, the background rate will increase, giving the appearance of a decrease in the gorilla lineage even if its d<sub>N</sub>/d<sub>S</sub> rate has not changed. We elaborate on this in the Appendix section (lines 1036 - 1073). Finally, the method implemented in PAML does not allow for synonymous rate variation across sites or multi-nucleotide mutations per codon. Ignoring synonymous rate variation dramatically inflates the false positive rates in selection tests (PMID: 32068869), as does ignoring multi-nucleotide mutations (PMID: 29967485 and PMID: 37395787); we have added a discussion of these issues in our Caveats and limitations section (lines 683 - 710).
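      The argument that averaging across sites hides relaxation can be illustrated with a toy calculation. This is a minimal sketch with hypothetical numbers, not the authors' analysis: it assumes three site classes (purifying, neutral, positive) and a RELAX-style parameterization in which each class rate in the foreground is the background rate raised to the power K, so that K<1 moves every class toward 1. The class rates, proportions and function names are all illustrative assumptions.

```python
from math import log


def relax_omegas(omegas, k):
    """Raise each site-class omega to the power k (k < 1 relaxes selection)."""
    return [w ** k for w in omegas]


def mean_omega(omegas, proportions):
    """Site-averaged omega, as a single-rate branch model would summarize it."""
    return sum(w * p for w, p in zip(omegas, proportions))


# Hypothetical background classes: purifying, neutral, positive selection.
BACKGROUND_OMEGAS = [0.05, 1.0, 4.0]
PROPORTIONS = [0.70, 0.25, 0.05]
K = 0.5  # hypothetical relaxation parameter for the foreground lineage

if __name__ == "__main__":
    foreground = relax_omegas(BACKGROUND_OMEGAS, K)
    for bg, fg in zip(BACKGROUND_OMEGAS, foreground):
        print(f"class omega {bg:.3f} -> {fg:.3f} (|log omega| {abs(log(bg)):.2f} -> {abs(log(fg)):.2f})")
    print(f"site-averaged omega, background: {mean_omega(BACKGROUND_OMEGAS, PROPORTIONS):.3f}")
    print(f"site-averaged omega, foreground: {mean_omega(foreground, PROPORTIONS):.3f}")
```

      In this example the purifying and positive classes both shift substantially toward 1, yet the two shifts largely cancel in the site average, which is why a model with a single averaged rate has little to work with.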

      Reviewer #2 (Recommendations for The Authors):

      Specific comments

      Framing: Overall, the connection to mating system is referred to with variable levels of certainty, some appropriate, others overstated. The paper title uses 'coincident' which is appropriate, but also at odds with the stronger conclusions that are emphasized throughout. Elsewhere the phrasing is much stronger (abstract, discussion) implying a direct statistical association with mating system variation that has not been established. Elsewhere the term 'association' is used in the same manner, but in instances where a statistical association is tested and demonstrated (tests of enrichment, etc).

      We are unsure why the Reviewer considers our claims overstatements. The patterns of molecular evolution we found are ‘associated,’ and 'coincident with,' and we believe our results are ‘compelling’. Our tests for relaxed and positive selection are statistically associated with a polygynous social system which we a priori hypothesized. We have taken care to ensure a more consistent framing of this connection throughout the manuscript to avoid potential misinterpretations of causality.

      Page 7, elsewhere- It is essential to compare the reported patterns (percentage of relaxed genes in gorilla, patterns of enrichment, etc) to other primate lineages to identify if this number is enriched due to mating system or if these patterns are unusual for sperm genes across mammals. The implication here and throughout is that the specific pattern reflects specific aspects of gorilla mating biology, but this is never established. Additionally, it would be interesting to know the relative number of genes under positive selection across species (or across great apes).

      We agree that if we were using a PAML-like approach, these controls would be informative. But with the RELAX method the foreground K is compared to the background K; K only becomes significantly less than one if there is a relaxation in the intensity of selection in the foreground. If these patterns were common to sperm genes across mammals, the background and foreground K would not be significantly different. Our a priori hypothesis was that genes related to male reproductive biology would show evidence of a decrease in the intensity of selection (both positive and purifying), which we tested and found to be true. In this regard, we can conclude that the gorilla mating system is associated with patterns of molecular evolution in the species’ genome.
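      The notion of K being "significantly less than one" can be made concrete with a likelihood ratio test. The sketch below assumes, as we understand the published RELAX framework, that a null model with K fixed at 1 is compared against an alternative with K free, and that twice the log-likelihood difference is referred to a chi-squared distribution with one degree of freedom. The log-likelihoods, the fitted K and the function names are hypothetical and are not taken from the manuscript.

```python
from math import erfc, sqrt


def lrt_p_value(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a 1-degree-of-freedom likelihood ratio test.

    For a chi-squared variable with one degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return erfc(sqrt(stat / 2.0))


def classify(k_hat: float, p_value: float, alpha: float = 0.05) -> str:
    """Label a gene from its fitted K and the test P-value."""
    if p_value >= alpha:
        return "no significant change"
    return "relaxed" if k_hat < 1.0 else "intensified"


if __name__ == "__main__":
    # Hypothetical fit for one gene: K fixed at 1 versus K estimated freely.
    lnl_null, lnl_alt, k_hat = -10234.7, -10229.1, 0.42
    p = lrt_p_value(lnl_null, lnl_alt)
    print(f"K = {k_hat}, P = {p:.4g}, call: {classify(k_hat, p)}")
```

      Because the test asks whether the foreground departs from the selection regime shared with the background, a pattern common to all lineages would not by itself yield a significant K<1.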

      While we too would find it interesting to know the relative number of genes under positive selection across species (or across great apes), that is not the study we performed and is beyond the scope of this one (and we only identified 96 genes that were positively selected in gorilla suggesting that few genes are positively selected across species).

      Page 8, bottom, elsewhere- "13,491 background set" elsewhere this is 13,310 (abstract). The number of genes here is different, and the set seems to change across multiple parts of the paper without explanation. This could be a simple typo, however, it may affect statistical analysis if the problem is widespread, especially when assessing enrichment of (presumably) small sets of genes.

      This is partly true and partly a typo. We generated 13,491 alignments, 13,310 of which had HUGO gene symbols. These 13,310 genes were used in all subsequent studies. We have re-written the text to clarify this point, and have added a statement: “We thus generated a dataset of 13,491 orthologous coding gene alignments from the genomes of 261 Eutherian mammals, corresponding to 62.7% of all protein-coding genes in the gorilla genome. Of the 13,491 alignments, 13,310 had an identifiable HUGO gene symbol and were used in all subsequent analyses (lines 158 - 162).”

      Related to this, it is difficult to determine how many genes these GO associations are based on. Even small numbers of genes can result in very significant results with these tests. How many genes are these associations based on? This connection is a key component of the overall narrative that changes in sperm competition have a large effect on genome-wide shifts.

      All analyses are based on the 13,310 genes with identifiable HUGO gene symbols, including over-representation analyses (ORA). Our dataset submitted with this manuscript includes these 13,310 genes (as well as the genes with K<1 and K>1). The foreground set comprises the 578 genes with K<1; these genes are given in Figure 1 – source data 3. The minimum number of genes annotated in a GO or pathway term was 3. It is unlikely that statistically significant GO term enrichments result from only a few genes annotating to each term: although that scenario could produce small P-values, the false discovery rate would be high, and readers can decide what false discovery rate they are willing to accept.
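      For readers who want to see how raw enrichment P-values relate to a false discovery rate, the following is a minimal sketch of the Benjamini-Hochberg adjustment. It is an assumption that this procedure corresponds to the FDR reported with the ORA results; the response does not name the method, and the P-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P-values for four GO terms.
    raw = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw P = {p:.3f}  adjusted = {q:.4f}")
```

      A term supported by very few genes can have a small raw P-value and still receive a large adjusted value once many terms are tested, which is the situation the response describes.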

      How many of these 578 genes are plausibly related to reproduction? Apologies if I missed this detail, but Figure 3 does not convey this. Could you speak to this directly in the text and include a table or supplemental table of the GO terms to show the differences in enrichment between classes of genes, and counts per term?

      These data are included in Figure 3 – source data 1.

      One of the key results is the relative frequency of relaxed constraint versus positive selection. This is expected on some level as the form of recurrent positive directional selection detected with these models is usually relatively rare. However, it is not at all clear that it is rarer in gorillas versus other mammals, as implied.

      Our comparison of relaxed constraint to positive selection was intended to explore whether more genes experienced one pattern of molecular evolution or the other within gorillas; we do not imply that it is rarer in gorillas than in other mammals.

      Likewise, I was wondering how the dataset itself may be biased toward this result. If I understand correctly, you are requiring very high levels of conservation (251/261 genes) for inclusion in the dataset, resulting in ~60% of all gorilla genes being included. Rapidly evolving genes that are targets of recurrent positive selection often also tend not to be highly conserved across such a deep phylogenetic sample. It would be good to acknowledge this potential bias when implying meaning to the differences in relative rates of the two forms of selection.

      Our results are unlikely to be subject to this bias. The RELAX test relies on accurately estimating K in background lineages, which requires that we include as many species as possible. The tradeoff is a reduction in the number of genes included in the dataset due to evolutionary dynamics across a wide range of species. However, it is not that 40% of the genes are excluded because they are evolving so rapidly that we cannot identify or align them; the exclusion mainly reflects the fact that we cannot identify the gene in 251 of the 261 species included in the dataset (due to gene loss, etc).

      Page 9 - The results here (and in Figure 3D) shows that relaxed genes are enriched broadly across spermatogenesis cell types except for Sertoli cells. But the Sertoli cells and a few non-significant cell types are the only thing to compare to. Instead, it would be interesting to identify single cell expression patterns from other tissues- or even bulk RNA as sc-RNA may be limited in the species. This would show that these genes are enriched in testis compared to other tissues, as opposed to just being broadly expressed. Additionally, the authors could compare to the other primate testis sc-RNA available in Murat et al. Without such comparisons the interpretations here seem limited.

      We did not test whether genes with K<1 were enriched in other cell types because: 1) we had an a priori hypothesis that genes with K<1 would be enriched in cells involved in male reproduction, rather than enriched in cell types in the testis compared to any other cell type; and 2) the number of genes with K<1 is relatively small and the number of known cell types is very large; at least one estimate points to ~400 major cell types in a higher primate (PMID: 37722043). Using a P-value threshold of 0.05 from a hypergeometric or Fisher's exact test and a Bonferroni correction to control for multiple hypothesis testing, we would need the P-value for enrichment in any cell type to be below 0.000125, which we are unlikely to achieve.
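      The threshold quoted above follows directly from dividing the significance level by the number of tests. A minimal sketch of that arithmetic, using the 0.05 level and the ~400 cell types stated in the response (the function name is illustrative):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05        # family-wise significance level quoted in the response
N_CELL_TYPES = 400  # approximate number of major cell types quoted in the response

if __name__ == "__main__":
    threshold = bonferroni_threshold(ALPHA, N_CELL_TYPES)
    print(f"per-cell-type P-value threshold: {threshold:.6f}")
```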

      More comprehensive functional comparisons could provide evidence that even though relaxed constraint is present in all lineages, perhaps relaxed constraints in the gorilla lineages are more related to sperm formation and function.

      The RELAX test is a relative one; while relaxed constraint may be present in other lineages, to observe a statistically significant K<1 in gorillas the degree of relaxation would have to have a greater effect size in gorilla than in other lineages.

      I was also a little unclear what to make of the interpretation of K<1 versus K>1 enrichment by cell type. The enrichment of K<1 is called out as noteworthy because this is when the spermatogenesis-specific genes begin to be expressed, but then the K>1 result is dismissed as occurring during pachytene, which is a transcriptionally permissive state of testis. To be clear, pachytene is also a critical checkpoint for fertility, and enhanced purifying selection at this step could be reasonably interpreted as being at odds with the entire erosion of reproduction argument. This seems to be a selective interpretation for the overall narrative. Also, permissive transcription is not limited to the pachytene stage, and the relaxation of constraint concomitant with increased specificity and permissive expression during the later stages of spermatogenesis is a well-known result in mammals, and not anything that can be ascribed to gorillas and their change in mating system.

      We agree with the Reviewer’s comment and have removed the K<1 versus K>1 interpretation from the manuscript.

      Page 13 - The LOF enrichment identified from this random sampling is borderline significant. An improved approach would be to perform permutations of random samplings and identify the range of significance based on 1000+ permutations.

      We have redone the burden test with population-matched groups to confirm the reliability of this association (lines 435 - 446). In addition, we now acknowledge in the Caveats and limitations section that our observations could benefit from a permutation analysis (lines 695 - 697).
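
      For readers unfamiliar with the kind of permutation analysis mentioned here, a generic label-permutation test can be sketched as below. This is an illustrative assumption, not the procedure used or planned in the manuscript: the function name, the test statistic (difference in mean per-individual variant burden), and the toy burden values are all hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(cases, controls, n_perm=1000, seed=0):
    """One-sided permutation P-value for mean(cases) - mean(controls).

    Group labels are shuffled n_perm times; the P-value is the share of
    shuffles whose difference is at least as large as the observed one,
    with the usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = _mean(cases) - _mean(controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_cases]) - _mean(pooled[n_cases:])
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-individual counts of loss-of-function variants.
    infertile = [2, 1, 3, 0, 2, 1, 2, 3]
    fertile = [1, 0, 1, 0, 2, 0, 1, 1]
    print("permutation P-value:", permutation_p_value(infertile, fertile))
```

      Repeating the random sampling in this way gives a distribution of the statistic under the null, so that a borderline result can be judged against a range of outcomes rather than a single draw.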

      Page 17, bottom- Statements like these are overstating the correlation as the comparative analyses were not shown.

      We agree and have edited the text to avoid potential overstatements.

      This is good to include the role of female reproductive tract. Shouldn't the unbiased screen pull these out anyway? The authors did find some female GO terms enriched. What additional information or experiments would be needed to test the hypothesis of female compensation? The expectations for this should be made clearer.

      Given the nature of these putative female compensatory mechanisms (primarily acting on the oviduct and lower uterus, as speculated in lines 586 – 601), it is currently impossible to functionally test them in gorillas. The continued development of in vitro systems mimicking the female reproductive tract may allow such studies in the future.

      Page 18, middle- Pleiotropy is an important consideration and this paragraph discusses some valuable points. However, this is another section that could be improved by discussing the relaxed constraints in later spermatogenesis, which likely suggests that genes expressed in later stages are less pleiotropic and more testis- specific.

      We agree and have added a brief discussion of this in lines 619 - 622: “It is also possible that the negative consequences of deleterious pleiotropy become less pronounced at later stages of spermatogenesis as meiotic and post-meiotically expressed genes are enriched for testis-specific functions (PMID: 36544022).”

      Page 27, Bottom- The criteria for selection of genes to target here are interesting and disconnected from the claimed interpretation of the results. If you're targeting genes with reliable expression in Drosophila, it is not surprising that a percentage of them will lead to fertility loss. Shouldn't the background be a random set of testis-expressed genes? This test would show that relaxed constraint is a strong way to screen for fertility genes. Additionally, the authors previously showed that these genes were enriched in sc-RNA data in gorilla, and likely other species. Suggesting that you identified genes 'lacking evidence' of a role in spermatogenesis in previous studies is misleading, when many of these genes are present in testis RNA datasets and enriched for sperm GO terms. I would argue that genes found to be expressed in testis and spermatogenesis-specific cell types certainly have evidence of being involved in spermatogenesis.

      We thank you for the helpful suggestion. We have generated a new background group composed of a random set of testis-expressed genes. More specifically, by looking at previously published Drosophila testis expression data (PMID: 30249207), we randomly selected 156 genes with TPM>1 (transcript per million) and determined the percentage of them with reported spermatogenic / male fertility defects in Drosophila. We observed that 18 (11.5%) had been previously demonstrated to be functionally required for male reproductive fitness. This percentage is slightly higher than what we had previously observed for a random selection of Drosophila genes (9.6% - an update, using the latest available data, to the 7.7% reported in the original version). Nevertheless, both figures are still well below the 27.6% hit rate we found for the Drosophila orthologs of the gorilla K<1 genes. We have added this new information to the manuscript (lines 380 - 386).

      Regarding the potential correlation between expression and function in spermatogenesis, we and others have shown that the majority of the protein-coding genome is expressed during spermatogenesis in both vertebrate and invertebrate species (PMID: 39388236). Although the reasons for such widespread transcription in the male germ line are not entirely clear, it advises a cautious approach in terms of correlating expression with function. Indeed, our recent analysis of 920 genes reliably expressed in insect and mammalian spermatogenesis revealed that only 27.2% of them caused male reproductive impairment when individually silenced in the Drosophila testis (PMID: 39388236). Since genetic redundancy is a factor that needs to be taken into consideration when dealing with such a central biological process for the survival of a species, we take the more stringent approach of only considering a gene to be functionally involved in spermatogenesis if there is phenotypical evidence (from our RNAi assay or from previous publications) that its disruption is associated with spermatogenic impairment and/or abnormal fertility. We have added this clarification to the manuscript (lines 349 - 363).

      Page 17 "Our data ... suggests that gorillas may be at the lowest limit of male reproductive function that can be maintained by natural selection (at least in mammals or vertebrates)." I realize this is the speculation section, but this is a massive overstatement. There is absolutely nothing in your data or results that support this statement, nor is this supported by the extensive comparative reproductive data in mammals. For example, there are many mammalian systems that show lower metrics of reproductive function than gorillas. For example, the sperm abnormality indices in Box 1F are nowhere near as severe as found in many species that still somehow manage to reproduce.

      We agree and have edited the text to avoid potential overstatements (see above).

      Reviewer #3 (Recommendations for The Authors):

      (1) More discussion is needed as to whether their results could be explained by a reduction in effective population size in gorillas.

      Thank you for raising this important point. As you know, reduced effective population size can lead to an increased load of deleterious mutations and relaxed selection intensity. However, we do not believe that it substantially affects our observations. Indeed, relatively few genes have K<1 and those are enriched in sperm biology. Given that a reduced effective population size will plausibly increase the load of deleterious mutations and relax selection across many genes, it is unlikely that such a broad phenomenon would result in a specific enrichment in genes related to male reproductive biology. We have added this reasoning to the Caveats and limitations section (lines 675 - 682).

      (2) Properly controlled genetic association testing when performing a burden test is essential, and methods that allow for some variants to be associated with increased fertility should be considered. Rare variants are much more likely to show population-specific differences, and selecting humans from two potentially very different cohorts and sample sizes can easily lead to confounding. I suggest performing a principal component analysis to ascertain the degree of genetic differentiation between these cohorts, and use this to guide the selection of a subset of the control cohort as well.

      We agree and have replicated this analysis using only individuals of European descent; our conclusions have not changed but the P-values have become lower (lines 435 - 446).

      (3) Citations should also be included in Table 1, for each relevant phenotype. You may also want to consider a more general comparison of p-values and effect sizes of genome-wide association studies for human male infertility to test for an enrichment in/nearby genes showing relaxed selection along the gorilla lineage. In other words, do the relaxed genes in the gorilla lineage have an enrichment of small p-values for being associated with male infertility.

      Citations have been included in Table 1, as suggested, and the table has been updated to include the latest reported phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We appreciate your comments on assessing the chromatin accessibility of HSCs induced by β-glucan training, as epigenetic reprogramming is known to be one of the underlying mechanisms of trained immunity, as suggested by many groups including our own. To delineate the genome-wide epigenetic reprogramming induced by β-glucan (BG), we reanalyzed publicly available chromatin profiling datasets in which ATAC-seq of HSCs from control and β-glucan-trained mice was performed (accession number: CRA014389). Comparative analysis revealed that HSCs from BG-trained mice showed pronounced enrichment at promoters and distal intergenic regions, key regulatory loci governing transcriptional activity (Fig. S7A). This divergent genomic targeting was further corroborated by distinct signal distribution profiles (Fig. S7B), supporting pronounced upregulation-driven remodeling of the epigenomic landscape induced by BG treatment. Functional annotation of these epigenetically primed promoters via GO term analysis revealed significant enrichment of immune-relevant processes, including leukocyte migration, cell-cell adhesion, and chemotaxis (Fig. S7C). Consistently, KEGG pathway analysis highlighted the enrichment of signaling cascades such as chemokine signaling and cell adhesion molecules (Fig. S7D), reinforcing the involvement of BG-induced trained immunity in inflammatory and mucosal homing pathways.

      Furthermore, promoter-centric enrichment of terms related to “defense response to bacterium” (Fig. S7E) underscored the role of BG in priming antibacterial transcriptional programs, which is a crucial axis for maintaining intestinal homeostasis. Locus-specific examination of chromatin states further validated BG-induced epigenetic modifications in the upstream regions of selected target genes, including Gbp5, Gbp2, S100a8, and Nos2 (Fig. S7F). Collectively, our integrative reanalysis demonstrates that BG reshapes the epigenomic architecture at regulatory elements, thereby orchestrating immune gene expression programs directly relevant to IBD pathophysiology and mucosal immunity (Lines 201-211).

      Reviewer 1 (Recommendations for the authors):

      (1) It’s better to include a schematic summarizing the proposed mechanism for reader clarity.

      We appreciate your comment and have prepared a graphical abstract, shown in Author response image 1.

      Author response image 1.

      (2) Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding the potential off-target or side effects of β-glucan-induced trained immunity. As trained immunity is known to augment inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation–prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan–induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We have discussed this potential caveat in the Discussion (Lines 299-302).

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss, colon damage, improved gut repair, and lowered inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1<sup>+</sup> macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for your positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate your valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. Previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice, primarily in the context of inflammatory genes. We acknowledge that these aspects are not directly addressed in our current manuscript, which aims to build on the foundation of β-glucan-induced trained immunity established by many groups, including ours, and to address its potential as a therapeutic approach in the colitis setting.

      That being said, we fully agree with your suggestion to analyze the epigenetic profile of key pathways. As in our response to the similar question raised by Reviewer 1, we reanalyzed the relevant public datasets and summarize the findings in Supplementary Figure S7. The ATAC-seq analysis further validates and provides an epigenetic basis for the enhanced inflammatory and antibacterial capacity of monocytes, which is seeded in the HSC compartment.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation. Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). These results indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1<sup>+</sup> macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      We thank the reviewer for this important point. We acknowledge that direct in vivo tracking of the adoptively transferred monocytes to confirm their homing to the colon and differentiation into specific macrophage subsets would strengthen the mechanistic link. However, due to technical limitations in reliably tracing the fate of transferred cells in our experimental setting, we were unable to provide this direct evidence. Instead, we present a strong correlative and functional evidence chain that supports the proposed model:

      (a) Following BG pretreatment, we observed a significant decrease in circulating Ly6Chi monocytes specifically at the peak of colitis (day 7, Fig. 5D), concurrent with a marked increase in monocytes/macrophages within the colonic lamina propria (Fig. 2D). This inverse relationship strongly suggests enhanced recruitment of monocytes from the blood into the inflamed colon upon BG training.

      (b) Using CX3CR1-GFP reporter mice, we found that BG pretreatment led to an increased proportion of colonic myeloid cells in an intermediate state (P5: Ly6C<sup>+</sup>MHCII<sup>+</sup>CX3CR1<sup>+</sup>, Fig. 5F). This population represents monocytes actively undergoing differentiation into intestinal macrophages, supporting the idea that BG accelerates the monocyte-to-macrophage transition in situ.

      (c) Our scRNA-seq analysis independently revealed an expansion of monocyte-derived macrophage clusters (e.g., Macro1, Macro2) in BG-treated mice, which express canonical tissue macrophage markers (including Cx3cr1) and genes associated with tissue repair (e.g., Vegfa, Fig. 4A, 5H, 5I).

      These data collectively indicate that BG-trained monocytes exhibit enhanced capacity for colonic recruitment and preferential differentiation toward reparative macrophage subsets, which aligns with the protective phenotype observed after adoptive transfer. We have explicitly noted the absence of direct fate-mapping data as a limitation in the revised Discussion and agree that future studies employing advanced tracing techniques would be valuable to definitively establish this cellular trajectory (Lines 378-380).

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived, suggesting at least that β-glucan pretreatment alters monocyte capacity in a way that directly contributes to the observed alleviation of colitis. However, because we failed to find a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined cell subclusters, we were not able to provide direct causal evidence by specifically depleting cells of a given subcluster. Nevertheless, the results of the monocyte adoptive transfer experiment in Ccr2 KO mice strongly suggest that the presence of monocytes is crucial for this protective effect. We fully acknowledge this as a limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are mainly based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence (Lines 400-404).

      (5) While Rag1<sup>-/-</sup> mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, the contribution of ILCs was not evaluated in our current manuscript examining the BG-trained immunity effect. Because adoptive transfer of trained monocytes into CCR2 KO mice recapitulated the colitis alleviation phenotype, we think that the β-glucan-enhanced protection is at least dependent on trained monocytes. We nevertheless acknowledge that we cannot rule out a possible role of ILCs in this process, and we discuss this limitation in the Discussion of the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded (Lines 322-326).

      Reviewer 2 (Recommendations for the authors):

      (1) The authors do not provide direct mechanistic evidence of TI (e.g., epigenetic and metabolic reprogramming). The absence of such data weakens the mechanistic strength of the TI claim. The authors should soften the terminology to BG-induced myeloid reprogramming suggestive of trained immunity, and acknowledge and discuss this limitation.

      We appreciate your comment highlighting the lack of direct epigenetic and metabolic assessment in our current study. Previous work from our group (S.-C. Cheng) and others has extensively documented the epigenetic and metabolic profiles of monocytes from β-glucan–trained mice, focusing primarily on inflammatory-related genes. Based on this established foundation, our current manuscript focuses on exploring the translational potential of BG-induced trained immunity.

      That said, as mentioned in our response to the identified weakness, we reanalyzed public epigenetic datasets with a focus on pathways related to reparative and antibacterial functions and integrated this analysis into the revised manuscript (Fig. S7, Lines 201-211).

      (2) CX3CR1<sup>+</sup> macrophages' role is not functionally validated. The data relies solely on scRNA-seq and cluster annotations, which are insufficient to confirm functional roles in vivo. Depletion or in vitro studies would provide stronger causal evidence. The authors should acknowledge this limitation in the Discussion.

      We agree that the functional role of CX3CR1<sup>+</sup> macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1<sup>+</sup> macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived, suggesting at least that β-glucan pretreatment alters monocyte capacity in a way that directly contributes to the observed alleviation of colitis. However, because we failed to find a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined cell subclusters, we were not able to provide direct causal evidence. We fully acknowledge this as a limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1<sup>+</sup> macrophage function are mainly based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence (Lines 395-404).

      (3) Rag1<sup>-/-</sup> mice retain innate lymphoid cells (ILCs), particularly ILC3, which are mucosal and produce IL-22, contributing to tissue repair (PMID: 21502992; PMID: 32187516). The potential for BG to activate ILCs remains unexplored in this study. This limits the interpretation of whether the observed protection arises from monocyte/macrophage reprogramming or is partially mediated by residual ILC activity. The authors should explicitly acknowledge this limitation and discuss the possible contribution of ILCs to the observed phenotype.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, the contribution of ILCs was not evaluated in our current manuscript examining the BG-trained immunity effect. Because adoptive transfer of trained monocytes into CCR2 KO mice recapitulated the colitis alleviation phenotype, we think that the β-glucan-enhanced protection is at least dependent on trained monocytes. We nevertheless acknowledge that we cannot rule out a possible role of ILCs in this process, and we discuss this limitation in the Discussion of the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded (Lines 322-327).

      (4) Figure 1-It would help to clarify whether a BG-only control group (without DSS) was included in the design. This would be critical to determine if BG alone alters the colon. If omitted, the authors should clearly state this and consider adding such a group in future experiments. This would help define the baseline effects of BG and support the claim that its benefits are dependent on TI (upon second challenge - DSS).

      We appreciate this valuable suggestion. While we did not perform qPCR to assess mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a dedicated BG-only control group at baseline before DSS treatment (Colitis_d0). These data indicate that BG preconditioning alone does not alter the baseline expression of colon mucosal repair genes.

      (5) Figure 3 - It would strengthen the conclusions to include a vehicle-treated PBS BMT donor control group, or to state its absence. It is unclear whether the protective effect observed in recipients of BG-treated BM is due to trained immunity or to non-specific effects of transplantation, irradiation, or batch variation.

      We fully agree that it is critical to include a vehicle-treated PBS BMT control to rule out any non-specific effects induced by transplantation, irradiation, or batch variation. We did include a blank PBS transfer control every time mice received irradiation, in order to confirm that irradiation had successfully ablated the bone marrow of the irradiated mice. Mice that receive PBS only die after 8 days, whereas mice receiving bone marrow from either the PBS-control or the BG-treatment group survive. We also performed flow cytometry to confirm successful bone marrow transplantation (Fig. S5C). We have added a description of the vehicle-treated BMT control to the Materials and Methods section for clarification (Lines 456-466).

      (6) No gene expression or phenotypic data is provided for monocytes/macrophages in BMT recipients; therefore, it cannot be confidently stated that these cells were reprogrammed. Expression/phenotypic data should be added or discussed.

      We thank the reviewer for raising this important point. We acknowledge that a detailed transcriptomic or phenotypic analysis of donor-derived tissue-resident myeloid cells in the BMT recipients would provide the most direct evidence for their reprogrammed state.

      While our BMT study focused primarily on assessing the transferability of the protective phenotype via endpoint disease parameters and circulating immune cell composition, we present a coherent and compelling line of evidence supporting the conclusion that BG's training effect is maintained within the hematopoietic system of recipients and mediated by reprogrammed myeloid cells:

      (a) A key finding is the significant increase in the proportion of donor-derived Ly6Chi monocytes in the peripheral blood of recipients receiving BG-trained bone marrow (Fig. 3J). This is not a bystander effect but direct evidence that BG-induced training of donor hematopoietic stem/progenitor cells instructs a biased differentiation program towards a specific effector precursor population within the new host, demonstrating the functional persistence of the trained state post-transplantation.

      (b) The core of reprogramming in trained immunity lies in persistent epigenetic and functional changes. Our new analysis of public datasets (Fig. S7) confirms that BG directly reshapes the chromatin accessibility landscape in hematopoietic stem cells (HSCs), particularly at loci regulating immune and antibacterial responses. This provides the fundamental mechanism explaining how the trained phenotype is both long-lasting and transplantable: the reprogramming occurs at the progenitor level.

      (c) The most causally compelling data in our study comes from the independent adoptive transfer experiment, where transfer of purified BG-trained monocytes alone was sufficient to ameliorate colitis in recipient mice (Fig. 3K, L). This definitively proves that the trained monocytes themselves carry the protective functional program. It strongly suggests that these reprogrammed monocytes/macrophages are the likely effectors mediating protection in the BMT model.

      (d) Our interpretation aligns with well-established paradigms in the field. Precedent studies confirm that the BG-trained phenotype (e.g., enhanced cytokine potential) can be transferred via BMT or monocyte adoption. For instance, Haacke et al. (PMID: 40020679) demonstrated that splenic monocytes from BG-trained donors, when transferred into arthritic recipient mice, led to elevated inflammatory cytokine (e.g., Tnf, Il6) expression in recipient joints, directly proving the maintained functional reprogramming of trained cells in a heterologous host environment. This provides a strong precedent supporting the functional activity of transferred trained cells in our model.

      (7) The study is consistent with emerging evidence that distinct TI programs may exist depending on the stimulus and context, including immunoregulatory and tissue-reparative responses (PMID: 35133977; PMID: 31732931; PMID: 32716363; PMID: 30555483). The authors should integrate this perspective into the Discussion to acknowledge that their findings may represent one example of such context-dependent, potentially reparative TI programs. This would place the study within the growing literature describing functional heterogeneity in innate immune training.

      We appreciate this suggestion and have incorporated it into the discussion. In the revised manuscript, we discussed how our findings of BG-induced protective myeloid reprogramming align with the concept of tissue-reparative or immunoregulatory TI, which is distinct from the pro-inflammatory TI phenotypes described in other contexts. By highlighting the functional heterogeneity of innate immune training, we position our work as an example of a stimulus-specific, reparative TI program. (Lines 356-379)

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1<sup>-/-</sup> mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2<sup>-/-</sup> mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      Thank you for the positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we in fact used raw count data as input for DESeq2. FPKM values were used only for visualization purposes, such as in heatmaps and clustering analyses. We have corrected this description in the revised manuscript to accurately reflect our analysis workflow. (Lines 488-499)

      Reviewer 3 (Recommendations for the authors):

      (1) Current best practices recommend working with raw count data when using DESeq2 to ensure statistically robust differential expression analysis between samples. However, for visualization and clustering, like heatmaps, FPKMs can be used. Could the authors explain why they have used FPKM for differential gene expression analysis?

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we in fact used raw count data as input for DESeq2. FPKM values were used only for visualization purposes, such as in heatmaps and clustering analyses. We have corrected this description in the revised manuscript to accurately reflect our analysis workflow. (Lines 488-499)
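      The reviewer's point about within-sample normalization can be illustrated with a small arithmetic sketch. DESeq2 itself is an R package and is not called here; the Python below only computes FPKM from raw counts using the standard definition. The gene names, counts, and lengths are hypothetical and chosen purely for illustration.

```python
def fpkm_table(counts, lengths_bp):
    """Return FPKM per gene: count * 1e9 / (gene length in bp * total mapped fragments).

    The denominator uses the total of this sample only, which is why FPKM is a
    within-sample normalization.
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


# Hypothetical data: gene X has the SAME raw count in both samples.
# Only gene Z differs, which inflates the library total of sample B.
lengths_bp = {"X": 1000, "Y": 1000, "Z": 1000}
counts_a = {"X": 100, "Y": 100, "Z": 800}
counts_b = {"X": 100, "Y": 100, "Z": 1800}

fpkm_a = fpkm_table(counts_a, lengths_bp)
fpkm_b = fpkm_table(counts_b, lengths_bp)

# Gene X appears to drop two-fold in FPKM although its raw count is unchanged.
print(fpkm_a["X"], fpkm_b["X"])
```

      In this toy case the apparent change in gene X comes entirely from the change in gene Z, which is the kind of composition effect that count-based methods are designed to handle in their own normalization step.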

      Minor Comment

      (1) Line 92: remove extra word "that".

      We have removed the extra word “that” from Line 92 in the revised manuscript.

      (2) Line 201: please state here what "GBP" stands for, as it appears first.

      We have defined “GBP” as “Guanylate-Binding Protein” at its first appearance in Line 201. (Line 213)

      (3) Line 235: consider rewriting "we analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid"; added spacing for "day 7", "which", and "the".

      We have revised the sentence in Line 235 to read: “We analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid…” to improve readability. (Lines 246-247)

      (4) Line 290: consider rewriting " as seen in conditions such as rheumatoid arthritis and ...".

      We have revised Line 290 to: “as observed in conditions such as rheumatoid arthritis and…” for clarity. (Lines 301-302)

      (5) Line 375-376: please check sentence starting lower case "with minor modifications, by assessing ".

      We have corrected the sentence to start with a capital letter: “With minor modifications, by assessing…” (Lines 422-423)

      (6) Line 399: kindly consider adding "was" after "cDNA".

      We have revised Line 399 to include “was” as suggested: “cDNA was synthesized…” (Line 446)

      (7) Lines 346-347: consider adding "which" after "monocytes": "We transferred BG-preconditioned monocytes which significantly alleviated clinical symptoms".

      We have revised Lines 346-347 to include “which” as suggested for grammatical clarity. (Lines 385-386)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
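      For readers who want to see the shape of the predicted curve, the following is a minimal sketch of the Marko-Siggia interpolation formula for the worm-like chain, using the 50 nm persistence length quoted above. The contour length used in the example (1000 nm) and the thermal energy value (4.11 pN·nm, room temperature) are assumptions made for illustration and are not taken from the manuscript; the function names are likewise hypothetical.

```python
KBT_PN_NM = 4.11        # thermal energy near room temperature in pN*nm (assumed)
PERSISTENCE_NM = 50.0   # consensus dsDNA persistence length cited in the response


def wlc_force(extension_nm, contour_nm, lp_nm=PERSISTENCE_NM, kbt=KBT_PN_NM):
    """Marko-Siggia interpolation: force (pN) at a given end-to-end extension."""
    z = extension_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kbt / lp_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def wlc_extension(force_pn, contour_nm, tol=1e-9):
    """Invert wlc_force by bisection; force rises monotonically with extension."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-12)
    while hi - lo > tol * contour_nm:
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative contour length only; substitute the actual tether length.
contour_nm = 1000.0
x_stall = wlc_extension(6.0, contour_nm)
print(f"extension at 6 pN: {x_stall:.1f} nm ({x_stall / contour_nm:.1%} of contour)")
```

      Under these assumptions a ~6 pN load corresponds to an extension of roughly 94% of the contour length. The curve is very steep in this regime, so the predicted stall extension is fairly insensitive to the exact stall force, in line with the argument that the stall duration does not depend on the precise force-extension details.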

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We softened this term in our revision to “nearly parallel to the microtubule” (Line 464). In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We acknowledge that our treatment of kinesin-3 was confusing. In response, we deleted any reference to kinesin-3 catch-bond in the Results section, and restricted it to the Discussion where it is interpretation. In Line 635 in the Discussion, we softened the statement of catch-bond activity to “…all three dominant kinesin transport families display catch-bond like behavior at stall…”. We acknowledge that, classically, the catch/slip bond nomenclature refers to simple protein-protein interactions and is easier to interpret there. However, the term ‘catch-bond’ has been used in the literature for myosin, dynein and kinesin, and thus we feel that it is sufficiently established to use it here.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution, and we calculated a corrected kinesin-3 stall duration due to these undetected slips. This data and analysis are included as a new Supplementary Figure S8. In the main text on Lines 283-293 we included the following text:

      “It was notable that the kinesin-3 stall durations at high load are longer than the ramp durations at low load, because this indicates that the kinesin-3 off-rate slows with increasing load. However, because kinesin-3 had the most slip events at stall, we were concerned that there may be undetected slip events below the 60 nm threshold of detection that led to an overestimation of the kinesin-3 stall duration. To test this hypothesis, we plotted the distribution of kinesin-3 slip distances at stall, fit an exponential, and calculated the fraction of missed slip events (Fig. S8). From this analysis, we calculated a correction factor of 1.42 that brought the kinesin-3 stall duration down to 1.33 s. Notably, this stall duration value is still well above the kinesin-3 ramp duration value of 0.75 s in Fig. 3C and thus does not qualitatively change our conclusions.”

      We thank the reviewer for this suggestion.
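      The logic of a missed-event correction of this kind can be sketched as follows. This is an illustration under stated assumptions and not the analysis code behind Fig. S8: it assumes slip distances are exponentially distributed, that only slips above the 60 nm detection threshold are observed, and that every undetected slip would have terminated a stall, so that the observed stall duration is divided by the correction factor. The slip distances in the example are hypothetical.

```python
import math


def truncated_exponential_scale(slips_nm, threshold_nm):
    """Maximum-likelihood mean of an exponential observed only above a threshold.

    For a left-truncated exponential the estimate is mean(sample) - threshold.
    """
    if not slips_nm or min(slips_nm) < threshold_nm:
        raise ValueError("all observed slips must be at or above the threshold")
    scale = sum(slips_nm) / len(slips_nm) - threshold_nm
    if scale <= 0:
        raise ValueError("cannot estimate a positive scale from these data")
    return scale


def missed_fraction(scale_nm, threshold_nm):
    """Fraction of all slips that fall below the detection threshold."""
    return 1.0 - math.exp(-threshold_nm / scale_nm)


def correction_factor(scale_nm, threshold_nm):
    """Ratio of all slips to detected slips, i.e. 1 / P(slip > threshold)."""
    return math.exp(threshold_nm / scale_nm)


def corrected_duration(observed_duration_s, factor):
    """Shorten the observed duration to account for undetected terminating slips."""
    return observed_duration_s / factor


# Hypothetical slip distances (nm), all above a 60 nm detection threshold.
slips = [70.0, 160.0, 250.0]
scale = truncated_exponential_scale(slips, 60.0)
factor = correction_factor(scale, 60.0)
print(scale, missed_fraction(scale, 60.0), factor)
```

      The key design choice is fitting the truncated distribution rather than the raw histogram, since a plain exponential fit to detected slips alone would overestimate the characteristic slip distance and therefore underestimate the missed fraction.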

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring varies as the motor steps up to stall, so if we included the entire run duration it would be difficult to specify what force we were comparing to the unloaded case. More importantly, if we assume that stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point. In addition, what we do in Fig. 3 is to separate the ramps from the stalls and, using a statistical model, compute a separate duration parameter (the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations differs among the three motors, which is interesting in itself.
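      The history-independence argument is the memoryless property of an exponential waiting time: if detachment occurs at a constant rate, the time remaining after any chosen starting point has the same distribution as the full waiting time. A small simulation illustrates this; the mean duration, starting time, and sample size below are arbitrary choices for illustration and do not come from the data.

```python
import random


def residual_mean_after(start_s, mean_s, n=200_000, seed=0):
    """Mean remaining time for events that survive past start_s.

    Durations are drawn from an exponential distribution with the given mean;
    the clock is restarted at start_s for every event still running at that time.
    """
    rng = random.Random(seed)
    samples = [rng.expovariate(1.0 / mean_s) for _ in range(n)]
    residuals = [t - start_s for t in samples if t > start_s]
    return sum(residuals) / len(residuals)


# Restarting the clock at 0 s or at 1 s gives approximately the same mean.
print(residual_mean_after(0.0, 1.0), residual_mean_after(1.0, 1.0))
```

      This holds only for a constant detachment rate; if the off-rate changed with time spent at stall, the residual mean would depend on where the clock was started.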

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We revised this sentence to the following: “In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to continue generating force after a small rearward displacement, rather than fully detaching and ‘resetting’ to zero load.” (Line 339-342)

      It should be noted that, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al (Toleikis et al., 2020) and Sudhakar et al (Sudhakar et al., 2021). In particular, Sudhakar used small, high index Germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al. who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. To address this point, we added in the Discussion on lines 654-656:

      “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      To address the question of neutravidin acting as a roadblock, we did the following. Because of the sequence of injections used to assemble the tensiometer in the flow cell, there are often some residual GFP-kinesin motors that aren’t attached to DNA and thus serve as internal controls for unloaded motility on the neutravidin-functionalized Mt. We quantified the run durations of these free kinesin-GFP and found that their run duration was 0.92 s (95% CI: 0.79 to 1.04 by MEMLET). This is slightly lower but not statistically different from the 1.04 s [0.78, 1.31] on control microtubules in Fig 2A. This result is included in Figure S6 in the revised manuscript.

      We don’t have a precise estimate for the amount of neutravidin on the microtubules. Based on Fig. 3C of Korten and Diez (Korten and Diez, 2008), the reduction in the unloaded run duration that we see corresponds to a ~2% biotinylation ratio. We polymerize Mt with 10% biotinylated tubulin and add 8 nM neutravidin to the flow cell, so in principle the microtubules could be 10% biotin-streptavidin coated. However, there are a number of uncertainties that push this estimate lower: a) the precise degree of biotinylation, b) whether the % biotinylated tubulin in polymerized microtubules is lower than the mixing ratio due to unequal incorporation, and c) what fraction of the biotinylated tubulin is occupied by the neutravidin when using this neutravidin flow-in method. Thus, our best estimate is ~2% biotin-streptavidin functionalization.

      The ramp durations in Fig. 3 provide another argument that biotinylated microtubules are not affecting the motors. Compared to unloaded durations for each motor, the kinesin-1 ramps were longer, the kinesin-2 ramps were the same, and the kinesin-3 ramps were shorter duration. That argues against any systematic effect of biotinylation on motor run durations, with the caveat that family-dependent differences could in principle be masking an effect. The fact that ramp durations aren’t systematically longer or shorter than the unloaded run durations also argues that the stalls we see, which are at the expected extension length of the dsDNA, are not caused by neutravidin roadblocks.

      The final point the reviewer brings up is whether neutravidin may be contributing to the rescues from slip events that we observe. This is difficult to fully rule out. However, because the unloaded run durations aren’t significantly altered by the biotin-streptavidin on the microtubules, we don’t expect the rescue events following a slip to be significantly affected. In principle, we could systematically increase and decrease the biotinylation and see whether the slip rescues change, but we haven’t done this.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history-independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although non-parametric methods such as K-M make no assumptions for the distribution, they require the user to know exactly where the start time is.

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, we make two points. The first is that the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s don’t affect the fit very much). The second is that when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed with the single-exponential fits very well (Table S2). Specifically, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections to the kinesin-3 unloaded run durations due to finite microtubule lengths. To address this point in the revision, we added the following note in Table S2: “* Because the Markov-Bayesian model, which is unaffected by left and right censoring of data, gave the same unloaded run durations for kinesin-3 as the MEMLET fit, we did not correct the kinesin-3 unloaded run durations for any right censoring due to finite microtubule lengths.” We also added the following point in the legend of Fig. S1: “A fraction of kinesin-3 unloaded run durations were limited by the length of the microtubules, but fitting to a model that took into account missed events gave a similar mean duration as an exponential fit, and so no correction was made (Table S2).”
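      To make the censoring argument concrete, here is a minimal sketch of the textbook maximum-likelihood estimate for an exponential duration with right-censored observations. It is a simpler stand-in and not the Markov-Bayesian model described in the Supplementary section: it assumes a constant detachment rate, and the function name and example numbers are hypothetical.

```python
def censored_exponential_mean(durations_s, ended_in_detachment):
    """MLE of the mean duration (inverse off-rate) with right censoring.

    Every observation contributes its observed time to the numerator, but only
    events that actually ended in detachment count in the denominator. Censored
    events (for example, runs cut short by the microtubule end or ramps that
    ended by reaching stall) contribute time without contributing a detachment.
    """
    if len(durations_s) != len(ended_in_detachment):
        raise ValueError("inputs must have the same length")
    n_detachments = sum(1 for ended in ended_in_detachment if ended)
    if n_detachments == 0:
        raise ValueError("at least one uncensored event is required")
    return sum(durations_s) / n_detachments


# Hypothetical runs: the second run was censored (it did not end by detachment).
durations = [1.0, 2.0, 3.0]
detached = [True, False, True]

naive_mean = sum(durations) / len(durations)              # ignores censoring
censored_mean = censored_exponential_mean(durations, detached)
print(naive_mean, censored_mean)
```

      Treating censored events as if they were detachments biases the mean duration downward, which is why the unloaded kinesin-3 duration is described as a lower limit when some runs end at the microtubule tip. Left censoring of the start time does not bias this estimator under a constant rate, because of the memoryless property.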

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6 kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. We clarified this in the revised Figure S6 legend. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment addresses only random-coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      We addressed this point in lines 200-212 of the revised manuscript:

      “We carried out two additional control experiments. First, to confirm that the neutravidin used to link the DNA to the microtubule wasn’t affecting kinesin motility, we analyzed the run durations of kinesin-1 motors on neutravidin-coated microtubules and found no change compared to unlabeled microtubules (Fig. S6). Second, we measured the run duration of kinesin-1 linked to a DNA tether that was not bound to the microtubule and thus was being transported (Fig. S6). The kinesin-DNA run duration was 1.40 s, longer than the 1.04 s of motors alone (Fig. 2A). We interpret this longer duration to reflect the slower diffusion constant of the dsDNA relative to the motor alone, which enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event.”

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We changed this text (Lines 265-267) to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Yes, the model is overdefined, but that is by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
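      As an illustration of what a rate constant that varies exponentially with load looks like, the sketch below uses the generic Bell-type form k(F) = k0·exp(F·δ/kT). The unloaded rate, distance parameters, and sign convention are hypothetical and are not the fitted values for k<sub>S-W</sub> or k<sub>slip</sub>; the labels merely indicate the trend each sign of δ produces.

```python
import math

KT_PN_NM = 4.11  # thermal energy near room temperature, in pN*nm

def load_dependent_rate(k0, force_pn, delta_nm):
    """Bell-type rate constant k(F) = k0 * exp(F * delta / kT).

    k0:       unloaded rate (1/s)
    force_pn: hindering load (pN)
    delta_nm: characteristic distance (nm); its sign sets the trend with load
    """
    return k0 * math.exp(force_pn * delta_nm / KT_PN_NM)

# Hypothetical parameters chosen only to show the three qualitative trends.
k0 = 1.0
for label, delta in [("rate rises with load", 1.0),
                     ("rate independent of load", 0.0),
                     ("rate falls with load", -1.0)]:
    rates = [load_dependent_rate(k0, f, delta) for f in (0.0, 3.0, 6.0)]
    print(label, [round(r, 3) for r in rates])
```

      A detachment rate that rises with load corresponds to slip-bond behavior, a load-independent rate to an ideal bond, and a falling rate to the phenomenology of a catch bond.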

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the Discussion of the revision, we added text to note that this behavior is indicative of an ideal bond (not a catch-bond) on Lines 480-483: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics and instead characteristic of an ideal-bond.” We also added a sentence in the Introduction highlighting this work, Lines 84-87: “Fourth, when kinesin-1 was connected to a bead through a micron-long segment of DNA and hydrodynamic forces were imposed on the bead, motor interaction times were insensitive to hindering loads up to 3 pN, indicative of an ideal-bond.”

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high stiffness once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. We added the following paragraph in Lines 101-111 in the Geometry Consideration section of the Supplementary Methods.

      “Another consideration when comparing the DNA tensiometer to optical trap measurements is the relative stiffness of the trap and dsDNA. Optical trap stiffnesses are generally in the range of 0.05 pN/nm [12,13]. To calculate the predicted stiffness of the dsDNA spring, we computed the slope of the theoretical force-extension curve in Fig. 1B. The stiffness is highly nonlinear and is <0.001 pN/nm below 650 nm extension. At the predicted stall force of 6 pN (960 nm extension), the dsDNA stiffness is ~0.2 pN/nm, which is stiffer than most optical traps, but it is similar to the estimated 0.3 pN/nm stiffness of kinesin motors themselves [12,13]. An 8 nm step at this stiffness leads to a 1.6 pN jump in force, so it is reasonable to expect that motors are dynamically stepping at stall. Therefore, there is no reason to expect that stiffness differences between optical traps and the dsDNA spring are affecting the motor detachment kinetics.”
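      The stiffness values quoted above can be approximated by differentiating a worm-like chain force-extension curve. The sketch below uses the Marko-Siggia interpolation formula and is not necessarily the exact curve used for Fig. 1B. The persistence length (50 nm) and contour length (1020 nm) are assumptions, chosen because they place a force of about 6 pN at 960 nm extension.

```python
KT_PN_NM = 4.11          # thermal energy near room temperature, pN*nm
PERSISTENCE_NM = 50.0    # assumed dsDNA persistence length
CONTOUR_NM = 1020.0      # assumed contour length (not stated in this response)

def wlc_force(x_nm, contour_nm=CONTOUR_NM, persistence_nm=PERSISTENCE_NM):
    """Marko-Siggia interpolation: force (pN) at extension x (nm)."""
    if not 0.0 <= x_nm < contour_nm:
        raise ValueError("extension must lie in [0, contour length)")
    r = x_nm / contour_nm
    return (KT_PN_NM / persistence_nm) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)

def wlc_stiffness(x_nm, contour_nm=CONTOUR_NM, persistence_nm=PERSISTENCE_NM):
    """Analytical slope dF/dx (pN/nm) of the interpolation formula."""
    if not 0.0 <= x_nm < contour_nm:
        raise ValueError("extension must lie in [0, contour length)")
    r = x_nm / contour_nm
    prefactor = KT_PN_NM / (persistence_nm * contour_nm)
    return prefactor * (0.5 / (1.0 - r) ** 3 + 1.0)

for x in (650.0, 960.0):
    print(f"x = {x:.0f} nm: F = {wlc_force(x):.2f} pN, "
          f"k = {wlc_stiffness(x):.4f} pN/nm")

# Force jump for a single 8 nm step taken near stall.
print(f"8 nm step at 960 nm: {8.0 * wlc_stiffness(960.0):.2f} pN")
```

      Under these assumed parameters the stiffness stays below 0.001 pN/nm at 650 nm and is close to 0.2 pN/nm at 960 nm, consistent with the values in the quoted paragraph.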

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. In response to the point from Reviewer #3, we added the following sentence on Lines 654-656: “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (e.g. ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      We agree that at first glance those jumps are puzzling. To investigate this question the first thing we did was to go back to our tensiometer dataset and look systematically at jumps for all three motors. We found roughly 4-6 large jumps like these for all three motors (kinesin-1: 250 +/- 99 nm (mean +/- SD; N=5); kinesin-2: 249 +/- 165 nm (N=6); kinesin-3: 490 +/- 231 nm (N=4)). Thus, although the apparent jumps may be more pronounced due to the specific rebinding kinetics of kinesin-2, this behavior is not unique to this motor. (Note that the motor binding position distribution in Fig. S2 is taken from initial binding positions that follow a clear period of detachment; thus, not all jumps are captured there.)

      Our interpretation is that these apparent jumps are simply a reflection of the long length and high compliance of the dsDNA tether. For instance, below 650 nm extension the stiffness, k, is <0.001 pN/nm (see Reviewer #3, point #1 above). Thus, we expect large fluctuations of the tethered motor when not bound to the microtubule. One reason that these events look like ‘jumps’ is that the sub-ms fluctuations during detached periods are not captured by the ~25 fps movies (40 ms frame acquisition time). Instead, the fitted Qdot position represents the average position during the acquisition window. Actually, due to these rapid fluctuations (and the limited depth of the TIRF illumination field) the position often can’t be determined during these periods of fluctuation (e.g. see gaps at ~2.5 s, 11 s and 24 s in Fig. 1F).

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.
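      One simple way to see why a single exponential cannot stand in for a multi-exponential recovery-time distribution is to compare coefficients of variation: a single exponential always has CV = 1, whereas a mixture of exponentials with well-separated time constants has CV > 1. The sketch below uses hypothetical amplitudes and time constants, not the fitted values from Fig. 4D-F.

```python
import math

def mixture_survival(t, weights, taus):
    """Fraction of events lasting longer than t for a mixture of exponentials."""
    return sum(w * math.exp(-t / tau) for w, tau in zip(weights, taus))

def mixture_mean(weights, taus):
    """Mean duration of the mixture."""
    return sum(w * tau for w, tau in zip(weights, taus))

def mixture_cv(weights, taus):
    """Coefficient of variation (standard deviation / mean) of the mixture."""
    mean = mixture_mean(weights, taus)
    second_moment = sum(w * 2.0 * tau ** 2 for w, tau in zip(weights, taus))
    return math.sqrt(second_moment - mean ** 2) / mean

# Hypothetical three-component distribution (amplitudes sum to 1).
weights = (0.6, 0.3, 0.1)
taus = (0.05, 0.5, 5.0)   # seconds

single_tau = mixture_mean(weights, taus)   # single exponential with the same mean
print("CV, triple exponential:", round(mixture_cv(weights, taus), 2))
print("CV, single exponential:", round(mixture_cv((1.0,), (single_tau,)), 2))
for t in (0.1, 1.0, 10.0):
    print(t, round(mixture_survival(t, weights, taus), 4),
          round(mixture_survival(t, (1.0,), (single_tau,)), 4))
```

      The survival curves printed at the end show how a single exponential matched to the same mean misses both the fast and the slow populations.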

      Recommendations for the authors: 

      Reviewing Editor Comments:

      The reviewers are in agreement with the motivation and approach of this study. The use of DNA tethers is an important advance in tethering motor proteins to gain insight into how motors respond to load. However, all 3 reviewers express reservations on how well the results support the claims. In particular, the use of the term catch bond was problematic, with Reviewer #2 suggesting some alternative nomenclature. Reviewer #1 expressed concern with experimental evidence for the predicted force-extension curve shown in Figure 1. I agree with the reviewers that additional experimental evidence would be required to conclude the catch-bond detachment kinetics of kinesin.

      Reviewer #2 (Recommendations for the authors):

      (1) By eye, the run lengths, e.g., of kin-1 look very long in Figure S1 ... certainly above the expected 1 µm. Please check and comment.

      We agree that the long runs do stick out by eye in this figure. To address this point, we analyzed the run lengths and run times from the kymograph shown in Fig. S1. Fitting the run duration distribution gave t = 1.31 s with a 95% CI of 0.96 to 1.67 s. This is slightly longer than the 1.04 s duration in Fig. 2A, but the 95% CI includes this population mean, and so the S1 data are not significantly different. The run time distribution from the S1 kymograph is given in Author response image 1.

      Author response image 1.

      (2) The upper right kymograph in Figure 4A does not show a motor return to the baseline. Also, the scale bars, etc., are unreadable. Please modify.

      Our purpose for showing the kymographs in Fig. 4A was to show the specific features of slips and fast and slow reattachment. Because we enlarged the kymographs to show those specific features, we could not show the entire return to baseline. As suggested, we enlarged the scale bars and the labels on the kymographs to make them readable.

      Reviewer #3 (Recommendations for the authors):

      (1) The frequent references to 95% confidence intervals disrupt the flow of the text. Perhaps the confidence intervals could be listed in a table rather than in the body of the text.

      We deleted those from the text; they are shown in Fig. 2D and listed in Table S2.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Korten, T., and S. Diez. 2008. Setting up roadblocks for kinesin-1: mechanism for the selective speed control of cargo-carrying microtubules. Lab Chip. 8:1441-1447.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schäffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schäffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17: e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Xiong and colleagues presents a compelling validation of UniDesign, a fully computational protein design framework, by using it to engineer a novel, PAM-relaxed variant of Staphylococcus aureus Cas9 (SaCas9) named KRH. The core achievement is the successful de novo generation of a high-performance nuclease (E782K/N968R/R1015H) solely through in silico modeling, without any subsequent experimental optimization or directed evolution. The authors demonstrate that KRH expands the SaCas9 PAM specificity from NNGRRT to NNNRRT, achieving genome editing and base editing efficiencies across multiple human cell types that are comparable to, and sometimes exceed, the well-known evolution-derived KKH variant. The work positions UniDesign not merely as an analytical tool, but as a powerful engine for the generative design of complex molecular functions, offering a scalable and mechanistically insightful alternative to traditional experimental screening.
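      To make the scale of the PAM relaxation concrete, the sketch below expands the two degenerate PAM patterns into explicit sequences using standard IUPAC nucleotide codes (N = any base, R = A or G). The helper names are illustrative and are not part of UniDesign.

```python
from itertools import product

# Subset of IUPAC nucleotide codes needed for these PAM patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

def expand_pam(pattern):
    """Return the set of concrete DNA sequences matched by a degenerate PAM."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in pattern))}

def matches_pam(sequence, pattern):
    """True if the sequence is compatible with the degenerate pattern."""
    return (len(sequence) == len(pattern) and
            all(base in IUPAC[code]
                for base, code in zip(sequence.upper(), pattern)))

canonical = expand_pam("NNGRRT")   # wild-type SaCas9 PAM
relaxed = expand_pam("NNNRRT")     # PAM reported for the KRH variant

print(len(canonical), len(relaxed))          # 64 and 256 concrete sequences
print(matches_pam("TTAGGT", "NNGRRT"))       # A at position 3: not canonical
print(matches_pam("TTAGGT", "NNNRRT"))       # accepted by the relaxed PAM
```

      Relaxing the third position from G to N therefore raises the number of compatible six-nucleotide PAM sequences fourfold, from 64 to 256.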

      Strengths:

      This is an outstanding manuscript that serves as a powerful proof-of-concept for the next generation of computational protein design. The primary selling point, the raw predictive and generative power of UniDesign, is convincingly demonstrated throughout.

      The manuscript shows that the tool can:

      (1) successfully navigate a complex sequence landscape to identify a minimal set of three mutations (KRH) that remodel a critical protein-DNA interface;

      (2) accurately model and balance the delicate interplay between specific base contacts and non-specific backbone interactions to achieve relaxed PAM specificity;

      (3) deliver a final product whose performance is indistinguishable from, and in some cases superior to, a variant that required extensive wet-lab evolution.

      The experimental validation is rigorous, thorough, and directly supports the computational predictions. This work will stand as a landmark study for the field, illustrating that computational design has matured to the point where it can reliably generate sophisticated tools for genome engineering.

      (1) Demonstration of Generative Power:

      The most significant finding is that UniDesign, without any experimental feedback, generated a variant (KRH) that matches the performance of the evolution-derived KKH. This is a remarkable achievement. The iterative design strategy (first reducing PAM bias with R1015H, then restoring binding through non-specific interactions, e.g., N968R and E782K) is a textbook example of rational design, but it is executed entirely by the algorithm. This validates UniDesign's energy function and search algorithm as capable of capturing the subtle biophysical principles governing PAM recognition.

      (2) Mechanistic Insight as a Built-in Feature:

      A key advantage of UniDesign highlighted by this work is its inherent ability to provide mechanistic explanations. The computational models not only predicted which mutations would work (e.g., N968R over N968K in the KRH variant) but also why they work. The structural and energetic analyses showing the bidentate salt bridge formed by Arg968 versus the single bond formed by Lys968 (Figure 4A) are a perfect example of how the tool's output can rationalize functional differences, a level of insight that is rarely attainable from directed evolution campaigns alone.

      (3) Scalability and Accessibility for Engineering:

      The authors explicitly contrast UniDesign's efficiency (minutes to hours per design run) with the computational expense of methods like COMET and the experimental overhead of directed evolution. The improvements to UniDesign v1.2, specifically the mutation-count and sequence-uniqueness penalties, directly address a key challenge in computational design (generating diverse, low-energy point-mutant libraries). This positions the tool as a highly accessible and scalable platform for engineering other CRISPR systems, a point that will be of immense interest to the community.

      We sincerely thank the reviewer for the comprehensive summary and the highly positive and encouraging comments on our manuscript.

      Weaknesses:

      (1) Title and Abstract Emphasis: The title and abstract are effective but could be slightly sharpened to emphasize the primary message. Consider a title like "Fully computational design of a PAM-relaxed SaCas9 variant with UniDesign demonstrates power to match directed evolution." The abstract could more explicitly state upfront that the design was achieved without any experimental iteration.

      We thank the reviewer for these valuable suggestions. We agree that our current title and abstract may be overly objective and neutral, and we will consider refining them during the formal revision.

      (2) Figure 1, Panel M: The data points in panel M are currently presented at a font size that makes them difficult to read, particularly the labels for the many triple-mutant variants. This density obscures the clear identification of the top-performing designs, such as the KRH variant selected for experimental validation. I recommend that the authors increase the font size of all text elements within this panel, including axis labels, tick marks, and data point labels, to improve legibility. If necessary, the panel dimensions can be adjusted or the layout reorganized to accommodate the larger text without compromising clarity. Ensuring this figure is readable is important, as it visually communicates the energetic convergence that led to the selection of KRH.

      We thank the reviewer for these valuable suggestions. We will refine Fig. 1M during the formal revision.

      (3) Generality of the Design Strategy for Other PAM Positions:

      The design strategy focused on relaxing specificity at the highly constrained third position of the PAM (the guanine in NNGRRT). How transferable is this specific strategy (i.e., disrupting a key specific contact and compensating with non-specific backbone binders) to relaxing other positions in the PAM or to other Cas enzymes with different PAM-interaction architectures? A short discussion on this point would help readers understand the broader applicability of the "fine-tuning the balance" principle.

      We thank the reviewer for this insightful question and suggestion. The current study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which eight Cas9 proteins and two Cas12 proteins (each has a different PAM) were investigated. Our computational results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs). For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform similar PAM relaxation designs for other Cas9 or Cas12 proteins, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We will include additional discussion to clarify this point and highlight the broader applicability of our design strategy.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the fully in silico design of a new variant of Staphylococcus aureus Cas9 (SaCas9) using an improved UniDesign workflow.

      The design strategy consists of three sequential steps:

      (1) reducing positional bias at PAM position 3;

      (2) restoring DNA binding through nonspecific interactions;

      (3) combining individually favorable substitutions.

      The overall pipeline is conceptually elegant and logically structured, and the genome-editing activity of the designed variants is comprehensively characterized. The resulting KRH variant exhibits relaxed PAM specificity, expanding the targeting range of SaCas9 across diverse cell types. Notably, the KRH variant demonstrates performance comparable to that of the evolution-derived KKH variant, underscoring the effectiveness of the proposed computational design framework.

      Strengths:

      The design pipeline is entirely computational and does not rely on experimental data for pretraining or iterative optimization.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The computationally generated KRH mutant differs from the experimentally evolved KKH variant by only a single residue, which may reflect insufficient exploration of the available sequence space.

      We thank the reviewer for this insightful critique. In the present study, our strategy was not to allow UniDesign to freely explore all 27 mutable positions simultaneously, but rather to constrain the search to point mutations (e.g., double or triple mutants) within the full sequence space (approximately 20^27). Even with this constraint, UniDesign effectively samples a substantially larger design space than traditional protein engineering approaches.
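      The sizes of these search spaces can be put side by side with a short calculation. The sketch below assumes that each of the 27 mutable positions can take any of the 19 alternative amino acids; the set of substitutions actually enumerated by UniDesign may be smaller.

```python
from math import comb

def count_point_mutants(n_positions, n_mutations, n_alternatives=19):
    """Number of variants carrying exactly n_mutations substitutions.

    Choose which positions are mutated, then choose one of the alternative
    residues at each mutated position.
    """
    return comb(n_positions, n_mutations) * n_alternatives ** n_mutations

N_POSITIONS = 27                      # mutable positions stated in the response
full_space = 20 ** N_POSITIONS        # every position free: about 1.3e35 sequences

singles = count_point_mutants(N_POSITIONS, 1)   # 513
doubles = count_point_mutants(N_POSITIONS, 2)   # 126,711
triples = count_point_mutants(N_POSITIONS, 3)   # 20,062,575

print(f"full space:     {full_space:.2e}")
print(f"single mutants: {singles:,}")
print(f"double mutants: {doubles:,}")
print(f"triple mutants: {triples:,}")
```

      Under this assumption the triple-mutant space alone contains about twenty million variants, far beyond typical experimental screening capacity, while remaining a vanishing fraction of the unconstrained space.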

      Through iterative design, we observed that only certain residue types became enriched at a subset of positions when identifying effective double mutants. These enriched residues were then systematically combined to generate performance-enhancing triple mutants in an automated manner. Although we ultimately selected the KRH mutant for experimental validation due to its high similarity to the known KKH variant, UniDesign also proposed additional multi-mutants that are distinct from KKH.

      Reviewer #3 (Public review):

      Summary:

      This study reports KRH, a SaCas9 variant computationally engineered via UniDesign to recognize an expanded NNNRRT PAM with substantially enhanced editing efficiency at non-canonical sites. KRH achieves genome- and base-editing efficiencies comparable to or exceeding the evolution-derived KKH variant across multiple human cell types, demonstrating that computational design can effectively remodel PAM specificity while preserving nuclease activity.

      Strengths:

      The research follows a clear line of reasoning, and the results appear sound. The computational design strategy presented offers a valuable alternative to directed evolution, with potential applicability beyond Cas9 engineering.

      We thank the reviewer for the concise and accurate summary of our manuscript.

      Weaknesses:

      The benchmarking of the UniDesign method is insufficient. It remains unclear how its performance compares to other protein design algorithms, whether the energy function parameters were systematically optimized, and whether the design strategy can be generalized to other Cas9 orthologs or genome engineering tasks.

      We thank the reviewer for this valuable critique. The present study builds upon our previous work on CRISPR–Cas PAM recognition modeling using UniDesign (PMID: 37078688), in which many of these concerns were systematically addressed. In that study, UniDesign was benchmarked against Rosetta, a well-established protein design platform, across eight Cas9 proteins and two Cas12 proteins, each recognizing distinct PAM sequences.

      Our results demonstrated that UniDesign effectively captures the mutual preferences between natural PAMs and native PAM-interacting amino acids (PIAAs) across these CRISPR–Cas systems. For example, UniDesign accurately predicted the canonical PAMs of SpCas9 and SaCas9 as NGG and NNGRRT, respectively; conversely, given their canonical PAMs, UniDesign successfully recapitulated the corresponding PIAAs in both systems.

      These findings provide the foundation for the present study and motivate our selection of SaCas9 as a representative system to explore PAM relaxation, thereby further demonstrating UniDesign’s predictive power through experimental validation. Although we did not perform analogous PAM relaxation designs for other Cas9 or Cas12 proteins in this work, we believe that the UniDesign framework is broadly generalizable and can be readily extended to these systems. We will incorporate additional discussion in the revised manuscript to address these points and clarify the broader applicability of our approach.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      While the results show some loss in the eyelid meibomian glands, there is significant gland retention in HSD3b6 KO mice, as shown in Figure 2. This is supported by the lack of DEG patterns showing downregulation of Meibum lipid genes (AWAT2, Far2, Soat1, Plin2, SCD, etc.), and no decrease in Pparg expression, known to be critical for meibomian gland lipid gene expression.

      Weaknesses:

      It should be noted that while the authors indicate that CD38 is significantly up-regulated in the HSD3b6 KO mouse, the increase was not sufficient to show a significant adjusted P-value. Bulk RNA sequencing also shows no significant change in meibum lipid gene expression for aged mice that are treated with 78c, an inhibitor of CD38, which the authors indicate increases NAD levels, leading to increased meibomian gland size compared to vehicle-treated mice. Unfortunately, there was no increase in meibum lipid gene expression with 78c, as identified by adjusted P-value. However, it should be noted that the supplemental file covering DEG expression was labeled as a Microarray analysis. This did not include the 78c+NMN treated mice, which the authors contend show a more impactful effect on the meibomian gland.

      We thank the reviewer for the careful evaluation and insightful comments regarding the interpretation of meibomian gland phenotypes and gene expression profiles.

      Regarding the point on the apparent retention of meibomian gland structure and the lack of downregulation of key lipid-related genes (e.g., Awat2, Far2, Soat1, Plin2, Scd, and Pparg), we agree that these observations are important for interpreting the extent of gland dysfunction. In the revised manuscript, we will more clearly present and discuss the RNA-seq data, including the expression profiles of representative meibomian gland lipid genes (and other DEGs), to better contextualize these findings.

      With respect to Cd38 expression, we acknowledge that the statistical significance based on adjusted P-values was limited in the current microarray dataset. To address this point, we will perform additional validation using targeted quantitative PCR with specific primers to more accurately assess Cd38 expression changes.
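
      The distinction between a raw and an adjusted P-value can be sketched as follows. The source does not state which multiple-testing correction was applied; the sketch below assumes the Benjamini-Hochberg procedure, a common default in differential expression tools, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used only as an illustration; the correction method
    behind the adjusted P-values in the dataset is not stated in the source.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes; the first is below 0.05 on its own.
raw = [0.04, 0.5, 0.6, 0.9]
adj = benjamini_hochberg(raw)
```

      In this toy input, the first gene has a raw P-value below 0.05 but an adjusted value above it, which is the situation the reviewer describes for Cd38.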

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors demonstrate strong correlations between a pro-inflammatory state, the activity of an intracrine hormone (3 beta-hydroxysteroid dehydrogenase, 3B-HSD), and the NAD co-factor. Specifically, in a 3B-HSD knockout mouse, there was an upregulation in pro-inflammatory cytokines and increased CD38+ cells (CD38 is an enzyme that depletes NAD, a necessary cofactor for 3B-HSD activity). Conversely, induction of inflammation in the eyelids resulted in reductions in 3B-HSD activity. Supplementation with 5 alpha-dihydrotestosterone (DHT) or the NAD precursor NMN, and inhibition of CD38 activity (78c), corrected the pathologies observed in both the 3B-HSD knockout mouse and the pro-inflammatory model (LPS injection into eyelids).

      Strengths:

      The experiments were performed with good rigor, assessing the impact of inflammation and 3B-HSD activity using multiple model systems. The endpoints represented a combination of transcriptional changes, protein quantification, enzymatic activity, and immunofluorescent microscopy. The authors use human tissue from both younger and older individuals to justify their hypotheses that increased CD38 + cells and reduced 3B-HSD quantity exist in older individuals. The data provide the foundation for assessing more global changes to the tear film and ocular surface.

      Weaknesses:

      The main weaknesses of the study include the following:

      (1) An absence of information on meibomian gland health, tear film, and ocular surface.

      (2) Too few human subjects to validate the hypotheses.

      Conclusion:

      Overall, this study demonstrates an important relationship that exists between intracrine signaling, inflammation, and cofactor signaling. It represents a novel approach in therapeutic design for patients with meibomian gland dysfunction.

      We thank the reviewer for the positive evaluation of our study and for recognizing the rigor of the experiments, the use of multiple model systems, and the potential of the data to provide a foundation for further investigation.

      Regarding the points raised under weaknesses, we agree that evaluation of meibomian gland function, tear film, and ocular surface phenotypes would provide important additional insight. In the present study, we focused primarily on the structural phenotype of the meibomian gland, particularly gland size, as a primary feature of MGD. We acknowledge that pathological assessments of gland function and ocular surface conditions have not been fully addressed. We will clearly state this limitation and expand the Discussion to position these aspects as important directions for future investigation.

      With respect to the limited number of human samples, we acknowledge that this is an important consideration for validating the translational relevance of our findings. We will revise the manuscript to more explicitly address this limitation and interpret the human data with appropriate caution.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate whether disruption of intracrine steroid hormone metabolism contributes to meibomian gland dysfunction and proposed a "vicious cycle" of gland dysfunction and inflammation, using a global Hsd3b6 knockout mouse model. The work addresses an important aspect of MGD, but its impact may be limited unless the intracrine mechanism can be more clearly distinguished from systemic hormonal effects.

      Strengths:

      This study addressed an important question. The hormonal regulation of the meibomian gland has long been recognized. If clarified, the concept of local steroid metabolism influencing gland homeostasis could have implications for understanding disease mechanisms and identifying therapeutic targets.

      Weaknesses:

      The use of a global knockout makes it difficult to separate local intracrine effects from systemic hormonal changes, and key controls and hormone measurements are lacking.

      LPS-induced inflammation may not reflect the chronic nature of MGD.

      We thank the reviewer for the thoughtful evaluation and for highlighting the importance of distinguishing intracrine mechanisms from systemic hormonal effects.

      We agree that, as currently presented, the use of a global Hsd3b6 knockout model makes it difficult to fully separate local intracrine effects from systemic hormonal changes. This point is also consistent with the major concern raised in the editorial assessment regarding the need to more clearly establish the proposed intracrine mechanism. To address this issue, we will strengthen the evidence for intracrine regulation by incorporating additional analyses. Specifically, we will assess systemic testosterone levels in Hsd3b6 knockout mice and include appropriate controls using orchidectomized (ORX) mice. These analyses will help to better distinguish local intracrine mechanisms from systemic hormonal influences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) As mentioned above, numerous studies have reported that the number of MuSCs declines with aging. The authors' claim is valid, as Pax7 and Vcam1 were widely used for these observations. However, age-related differences have also been reported even when using these markers (Porpiglia et al., Cell Stem Cell 2022; Liu et al., Cell Rep 2013). (a) When comparing geriatric Vcam1⁺ MuSCs with young MuSCs in this study, did the authors observe any of the previously reported differences? (b) Furthermore, would increasing the sample size in Figure 1 reveal a statistically significant difference? The lack of significance appears to result from variation within the young group. (c) In addition, this reviewer requests the presentation of data on MuSC frequency in geriatric control mice using CD200 and CD63 in the final figure.

      (a) When comparing geriatric Vcam1<sup>+</sup> MuSCs with middle aged MuSCs, we found 1,428 DEGs, where 701 genes were downregulated and 727 genes were upregulated (Fig. S3E). Some of the pathways altered were similar to previously reported differences, such as alterations in the autophagy-lysosome related genes and PI3K-Akt Pathways. However, these alterations did not affect the functional integrity of geriatric Vcam1<sup>+</sup> MuSCs (Fig. 3 A-F). On the other hand, greater alterations were observed in geriatric Vcam1<sup>-</sup> MuSCs, accompanied by functional impairment. We have added further elaborations in the manuscript to reflect the comment from the reviewer (pg. 17, lines 369-379).

      (b) Thank you for this helpful comment. We understand the reviewer’s concern that the variability within the young group may contribute to the absence of statistical significance. We respectfully note that the variance observed in the young cohort could be biologically expected rather than technical noise. Multiple studies have shown that young adult MuSCs display substantial transcriptional and functional heterogeneity as they undergo postnatal myogenic maturation (e.g., Biressi et al., 2010; Tierney & Sacco, 2016; Motohashi & Asakura, 2014). This broader heterogeneity naturally increases variance in marker distribution within young samples. We would also like to clarify that our main conclusions are not solely based on differences in the overall proportion of YFP⁺ and Lin⁻ cells among age groups. Instead, we also rely on the functional and phenotypic heterogeneity that specifically emerges in geriatric MuSCs.

      Although the young group shows greater biological variation, the mean values are relatively similar among the groups. Multiple independent datasets in our study including functional performance and molecular profiles consistently show that the total MuSC frequency does not markedly decline with aging. For these reasons, even if the sample size is increased, we do not expect a change in the overall interpretation of this result. We have revised the Results section to acknowledge the variability observed in the young group and to emphasize that total MuSC frequency is not central to the conclusions of this study (pg. 6, lines 129-134).

      (c) MuSC frequencies in geriatric control mice, measured using CD200 and CD63, are reported in the figure legend of Fig. 5F (pg. 39, lines 825-828).

      (2) Can the authors identify any unique characteristics of Pax7-VCAM-1 GERI-MuSCs using only the data generated in this study, without relying on public databases? For example, reduced expression of Vcam1 and Pax7. The results of such analyses should be presented.

      In Fig. S2C, using the bulk RNA sequencing data generated in this study, we observe reduced expression of both Pax7 and Vcam1 in the Pax7-VCAM-1 GERI-MuSC population. To better highlight this finding, we have added text in the Results section that explicitly describes the reduced Pax7 expression and Vcam1 loss as distinguishing features of Pax7-VCAM-1 GERI-MuSCs in our dataset (pg. 9, lines 199-200).

      (3) In the senolysis experiment, the authors state that GERI-MuSCs were depleted. However, no data are provided to support this conclusion. Quantitative cell count data would directly address this concern. In addition, the FACS profile corresponding to Figure 4D should be included.

      In Figure 4D, we quantified the frequency of VCAM1 Low YFP positive Lin negative MuSCs after senolysis treatment. This analysis shows a clear trend toward a decrease in the GERI subpopulation, although the difference did not reach conventional statistical significance in this experiment (t test p = 0.0596). We have therefore revised the text to describe this as a reduction trend rather than complete depletion, and we now explicitly report the p value in the Results section (pg. 12, lines 270-272). Furthermore, representative FACS profiles for Figure 4D are now included with the quantification (pg. 38, lines 811-814).

      (4) Figure S4: It remains unclear whether DHT enhances regenerative ability through restoration of the VCAM1 expression in GERI-MuSCs, as DHT also acts on non-MuSC populations. Analyses of the regenerative ability of Senolysis+DHT mice may help to clarify this issue.

      We thank the reviewer for this important insight. We agree that DHT can act on non-stem cell populations in the muscle environment and therefore we cannot conclusively attribute the improved regenerative performance solely to restoration of VCAM1 expression in GERI-MuSCs. To address this concern, we have revised the Discussion to explicitly state this limitation and to clarify that DHT may influence multiple cell types that contribute to muscle regeneration. We also indicate that combined senolysis plus DHT treatment would be an informative future approach, although additional animal experiments were not feasible within the scope of the current study (pg. 18, lines 382-390).

      (5) Why are there so many myonuclear transcripts detected in the single-cell RNA-seq data? Was this dataset actually generated using single-nucleus RNA-seq? This reviewer considers it inappropriate to directly compare scRNA-seq and snRNA-seq results.

      Regarding the question of why many myonuclear transcripts were detected and whether this dataset was generated using single nucleus RNA sequencing, we confirm that the experiments were performed using single cell RNA sequencing. The presence of myonuclear transcripts likely reflects partial nuclear leakage or fragmentation during the enzymatic dissociation of aged muscle tissue. This is a known technical issue when preparing single cell suspensions from adult or geriatric skeletal muscle.

      To avoid inappropriate interpretation, we identified the myonuclear transcript enriched cluster and excluded it from all downstream analyses that involve MuSC comparison. Therefore, our major conclusions do not rely on this cluster. We have revised the Results text to clearly state that the dataset was generated using single cell RNA sequencing and to explain how myonuclear transcript-positive cells were handled (pg. 8, lines 176-181).

      Reviewer #2 (Public review):

      In this study, Kim et al. explore the heterogeneity within the aged MuSC population using a mouse model that enables lineage tracing of MuSCs throughout life. The questions addressed in the manuscript are highly relevant to the fields of aging and stem cell biology, and the experimental approach overcomes limitations of earlier studies. However, some of the claims would benefit from additional data analysis, and the central claim of the identification of a "previously unrecognized subpopulation" of aged MuSCs should be evaluated in light of prior work that has also examined MuSC heterogeneity in aging.

      Specific points:

      (1) As a general comment that is transversal to multiple figures, several experiments should include a direct comparison to a young cohort. Previous studies have shown that the depletion of subpopulations with aging is observed early in the aging process, for example, the loss of Pax7-high MuSCs is observed already in 18‐month‐old mice (Li, 2019, doi: 10.15252/embj.2019102154). Using only mice at 12-14 months as the control group is therefore insufficient to claim that no changes occur with aging.

      We thank the reviewer for suggesting a comparison between the aged mice and a young cohort, and we acknowledge that previous studies have observed depletion of subpopulations early in the aging process. However, this study is specifically designed to delineate the transition from the middle-aged to the geriatric stage, rather than to characterize differences that are already well established in young versus geriatric comparisons. Previous studies have extensively documented the decline in MuSC function between young and aged animals, whereas the process and timing by which these changes emerge remain unclear. Our results show that major alterations in MuSC phenotype and identity are detected predominantly in the geriatric stage rather than at the middle-aged stage. To avoid any misunderstanding, we have revised the text to clearly state that the primary objective of this work is to define the critical shift that occurs from middle-aged to geriatric muscle stem cells (pg. 3-4, lines 67-71).

      (2) One of the central claims of the manuscript is a challenge to the notion that MuSCs number declines with age. However, the data analysis associated with the quantification of YFP+ cells needs to be expanded to support this conclusion. The authors present YFP+ cells only as a proportion of Lin-neg cells. Since FAP numbers are known to decrease with aging, a stable proportion of YFP+ cells would simply indicate that MuSCs decline at the same rate as FAPs. To more accurately assess changes in MuSC abundance, the authors should report absolute numbers of YFP+ cells normalized to tissue mass (cells/ mg of muscle).

      We thank the reviewer for this helpful suggestion. We agree that a proportion-based analysis alone does not fully exclude the possibility that MuSCs and FAPs decrease at similar rates during aging. At the time of isolation, muscle mass was not recorded, so we are unable to report YFP<sup>+</sup> cell numbers normalized to tissue weight as requested. To partially address this limitation, we have now clarified our gating strategy in the Methods and Figure 1 to explicitly indicate Sca1<sup>+</sup> FAP exclusion (pg. 6, lines 121-122; pg. 22, lines 460-463). These analyses do not support a major selective loss of MuSCs relative to other mesenchymal populations with aging.
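
      The reviewer's point about proportions versus absolute numbers can be illustrated with a small arithmetic sketch. All counts and masses below are invented for illustration and are not data from the study; the function and field names are hypothetical.

```python
def summarize(musc_count, lin_neg_count, muscle_mg):
    """Report MuSC abundance two ways: as a share of Lin-negative cells
    and as an absolute density per milligram of muscle."""
    return {
        "percent_of_lin_neg": 100.0 * musc_count / lin_neg_count,
        "cells_per_mg": musc_count / muscle_mg,
    }


# Hypothetical numbers only: both MuSCs and the Lin-negative pool halve with age.
young = summarize(musc_count=2000, lin_neg_count=40000, muscle_mg=50.0)
old = summarize(musc_count=1000, lin_neg_count=20000, muscle_mg=50.0)

# The proportion is identical in both groups (5%), yet the density per mg halves.
print(young, old)
```

      In this toy case a stable proportion coexists with a twofold drop in cells per milligram, which is why normalization to tissue mass would be needed to separate the two scenarios.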

      (3) The authors emphasize that several studies use VCAM1 as a surface marker to identify MuSCs. However, many other groups rely on α7-integrin, and according to Figure 1D, the decline in ITGA7 expression within the YFP+ population is not significant. Therefore, the suggestion that MuSC numbers have been misquantified with aging would apply only to a subset of studies. If the authors can demonstrate that YFP+ cell numbers (normalized per milligram of tissue) remain unchanged in geriatric mice, the discussion should directly address the discrepancies with studies that quantify MuSCs using the Lin−/α7-integrin+ strategy.

      We thank the reviewer for this important comment. We agree that VCAM1 is only one of several commonly used surface markers for MuSC identification and that many studies quantify MuSCs using the Lin negative and ITGA7 positive strategy. That is why in our study, in addition to VCAM1, we also examined ITGA7 expression within the YFP positive population. Although the mean ITGA7 level did not significantly decline, the variance among geriatric MuSCs was significantly increased based on the F test. This supports the idea that aging does not uniformly reduce marker expression but instead increases phenotypic instability, which could lead to under detection of a subset of MuSCs even when ITGA7 is used as the primary marker. We have added this interpretation to the Discussion (pg. 16, lines 346-355).
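
      The comparison of variances described above can be sketched as a variance-ratio (F) statistic. This is a minimal illustration and not the authors' analysis code: the intensity values are invented, the function name is hypothetical, and the P-value step is omitted because it requires the F distribution (available, for example, as `scipy.stats.f`).

```python
from statistics import mean, variance


def variance_ratio(group_a, group_b):
    """Return the F statistic (sample variance of a over b) and the two
    degrees of freedom needed to look up a P-value in an F distribution."""
    f_stat = variance(group_a) / variance(group_b)
    return f_stat, len(group_a) - 1, len(group_b) - 1


# Hypothetical ITGA7 intensities: same mean, very different spread.
middle_aged = [10.0, 11.0, 9.0, 10.0, 10.0]
geriatric = [4.0, 16.0, 10.0, 7.0, 13.0]

f_stat, df_num, df_den = variance_ratio(geriatric, middle_aged)
print(mean(middle_aged), mean(geriatric), f_stat, df_num, df_den)
```

      Here the two groups share the same mean while the variance differs by a large factor, which is the pattern a test on means alone would miss. The classical F test assumes approximately normal data, so a more robust alternative may be preferable for flow cytometry intensities.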

      (4) The authors focus their attention on a population of VCAM-low/VCAM-neg subpopulation of MuSCs that is enriched in aging. However, the functional properties of this same population in middle-aged (or young) mice are not addressed. Thus, it remains unclear whether geriatric VCAM-low/VCAM-neg MuSCs lose regenerative potential or whether this subpopulation inherently possesses low regenerative capacity and simply expands during aging.

      We thank the reviewer for this comment. In young and middle aged mice, the VCAM low or VCAM negative population is extremely small, nearly absent in most samples. The emergence and expansion of this population is therefore a feature that becomes detectable only at the geriatric stage. Given that these cells are not present in appreciable numbers earlier in life, the reduced regenerative performance observed in geriatric VCAM1<sup>low</sup> MuSCs likely reflects a phenotype that arises during aging rather than an inherent property of a pre-existing subpopulation. We have added this clarification to the Results section (pg. 7, lines 142-146).

      (5) According to Figure 1F, the majority of MuSCs appear to fall within the category of VCAM-low or VCAM-neg (over 80% by visual estimate). It would be important to have an exact quantification of these data. As a result, the assays testing the proliferative and regenerative capacity of VCAM-low/negative cells are effectively assessing the performance of more than 80% of geriatric MuSCs, which unsurprisingly show reduced efficiency. Perhaps more interesting is the fact that a population of VCAM-high geriatric MuSCs retains full regenerative potential. However, the existence of MuSCs that preserve regenerative potential into old age has been reported in other studies (Garcia-Prat, 2020, doi: 10.1038/s41556-020-00593-7; Li, 2019, doi: 10.15252/embj.2019102154). At this point, the central question is whether the authors are describing the same aging-resistant subpopulations of MuSCs using a new marker (VCAM) or whether this study truly identifies a new subpopulation of MuSCs. The authors should directly compare the YFP+VCAM+ aged cells with other subpopulations that maintain regenerative potential in aging.

      We thank the reviewer for this comment. First, in response to the request for precise quantification, we now provide the proportions of VCAM1-high and VCAM1-low/negative MuSCs in each age group in the figure legends for Fig.1F (pg. 34-35, lines 765-772). In geriatric mice, VCAM1 low/negative MuSCs represent approximately 44.6% ± 35.7%, whereas VCAM-high MuSCs represent 3.9% ± 1.8%. The substantial variability reflects mouse-to-mouse heterogeneity at very advanced ages.

      Importantly, our conclusions do not rely solely on the observation that a large fraction of geriatric MuSCs exhibit reduced regenerative potential. Rather, the VCAM-low state represents a transcriptionally and functionally distinct subpopulation that emerges specifically in the geriatric stage, and exhibits molecular signatures not present in young or mid-aged MuSCs. We have expanded the Results and Discussion to clarify this point.

      Regarding whether VCAM-high geriatric MuSCs correspond to previously reported “aging-resistant” MuSCs (e.g., Garcia-Prat 2020; Li 2019), we agree that there may be conceptual overlap, as both populations retain regenerative activity. However, those studies identified resilient MuSCs based on mitochondrial or Pax7-high properties, whereas our classification is based on surface VCAM1 intensity, and we currently lack direct evidence that these populations are equivalent. We have therefore added a statement acknowledging this possibility while clarifying that our work does not claim that VCAM1-high MuSCs represent a newly discovered resilient subset, but instead focuses on the emergence and characterization of the VCAM-low dysfunctional subpopulation (pg. 16, lines 346-355).

      (6) In Figure 3F, it is unclear from the data presentation and figure legend whether the authors are considering the average of fiber sizes in each mouse as a replicate (with three data points per condition), or applied statistical analysis directly to all individual fiber measurements. The very low p-values with n=3 are surprising. It is important to account for the fact that observations from the same mouse are correlated (shared microenvironment, mouse-specific effects) and therefore cannot be considered independent.

      We thank the reviewer for raising this important statistical point. We fully agree that individual myofibers from the same mouse are not independent biological replicates. In morphometric analyses of regenerated muscle, however, it is standard practice to analyze the full CSA distribution across all regenerated fibers, as the distribution itself (rather than a per-mouse mean) provides the biologically relevant measure of regeneration quality.

      The original analysis therefore treated each regenerated fiber as a component of the overall CSA distribution, not as an independent biological replicate, and the statistical comparison was performed at the level of distributions rather than per-mouse replication. We agree that per-mouse averaged CSA values would also be informative, but the raw data were not archived in a format that allows reconstruction of mouse-specific fiber subsets.

      Importantly, the group-level CSA distribution differences are robust and remain clearly detectable regardless of statistical approach. We have added clarification in the figure legend to explicitly describe how CSA measurements were obtained and analyzed (pg. 36, lines 796-800).
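
      The per-mouse approach raised by the reviewer can be sketched as follows. This is not the analysis performed in the study: the CSA values, mouse labels, and function names are hypothetical, and Welch's t statistic is used only as one reasonable way to compare per-mouse means.

```python
from math import sqrt
from statistics import mean, stdev


def per_mouse_means(fibers_by_mouse):
    """Collapse each mouse's fiber CSA values to one mean, so that the
    mouse (not the fiber) is the unit of replication."""
    return [mean(values) for values in fibers_by_mouse.values()]


def welch_t(a, b):
    """Welch's t statistic for two groups of per-mouse means."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical CSA values (square micrometers), two fibers per mouse for brevity.
group_high = {"m1": [900.0, 1100.0], "m2": [1000.0, 1200.0], "m3": [800.0, 1000.0]}
group_low = {"m4": [500.0, 700.0], "m5": [600.0, 800.0], "m6": [400.0, 600.0]}

high_means = per_mouse_means(group_high)  # three values, one per mouse
low_means = per_mouse_means(group_low)
t_stat = welch_t(high_means, low_means)
print(high_means, low_means, t_stat)
```

      With this design each group contributes three values regardless of how many fibers were measured, so the test reflects n = 3 mice rather than the total fiber count.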

      (7) Regarding Figure 5, it is unclear why ITGA7, a classical surface marker for MuSCs that appears unchanged in aged YFP+ MuSCs (Fig. 1F), is considered inadequate for detecting and isolating GERI-MuSCs.

      We thank the reviewer for raising this point. As shown in Figure 1F, the mean ITGA7 expression level does not significantly decline in geriatric YFP positive MuSCs. However, the variance of ITGA7 expression is significantly increased in geriatric MuSCs based on the F test, indicating instability in surface marker expression. This suggests that a fraction of MuSCs may fall below the conventional gating threshold for ITGA7 during aging. Therefore, ITGA7 remains effective for identifying a large portion of MuSCs but may under detect the subset of geriatric MuSCs with reduced marker expression. We have revised the Discussion to clarify this point (pg. 16, lines 346-355).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 3B: In the colony formation assay, the authors should specify the number of biological replicates and the number of cells analyzed per mouse.

      We have now added the number of biological replicates and the number of cells analyzed per mouse in the figure legend of Figure 3B (pg. 37, lines 790-791).

      (2) Figure 3F: The replication number is indicated as n = 3, which appears to refer to the number of transplanted mice. How many myofibers were analyzed in each transplanted mouse? The authors should provide a more detailed description of the methodology in the Figure legend or M&M.

      We thank the reviewer for the question and clarify that n = 3 refers to three independent transplanted mice per group. For each mouse, the entire TA muscle was cryosectioned and immunostained, and all regenerated fibers containing centrally located nuclei were included in the CSA quantification. We have added clarification in the Figure legend to indicate that quantification was performed on all regenerated fibers from each mouse (pg. 37, lines 796-800).

      (3) Figure 4: The RNA-seq results are presented as a single dataset per sample. If multiple experiments were performed, individual datasets should be shown. Replicated analyses are essential to ensure the reliability of the findings.

      In response to the reviewer's comment, we confirm that the RNA sequencing in Figure 4 was performed with 3-4 independent biological replicates for each condition. These replicates showed very consistent sequencing quality and gene expression profiles and were therefore combined for the differential expression analysis. We have revised the Materials and Methods to clearly describe the number of biological replicates and the analysis workflow (pg. 25, line 543).

      (4) Line 148: If the authors examined MyoG expression, it should be described as committed myoblasts.

      We have now changed the term from myoblasts to committed myoblasts (pg. 8, line 168).

      (5) Typo and Referencing Errors:

      (a) Line 244: The term 'Antide' appears to be a typo.

      We thank the reviewer for noting this point. ‘Antide’ is not a typo but the correct name of a GnRH antagonist (Antide acetate). To avoid confusion, we have revised the text to specify ‘Antide, a GnRH antagonist’ at its first mention (pg. 13, line 289).

      (b) Lines 278, 280: Please correct Figure 5H to Figure 5F.

      We apologize for this error. We have fixed the figure notations accordingly (pg. 15, lines 326-330).

      (c) Some references are incomplete or inappropriate (ex. line 49, line 71, line 86, line 109).

      We apologize for this error. We have fixed the references accordingly (pg. 4, line 94, pg.6, line 117).

      (d) Line 49: Skeletal muscle regeneration is orchestrated primarily by tissue resident stem cells, known as muscle stem cells (MuSCs) or satellite cells (Relaix et al., 2021). The following paper should be cited:

      Satellite cell of skeletal muscle fibers.

      MAURO A. J Biophys Biochem Cytol. 1961 Feb;9(2):493-5.

      The reference has been revised (pg. 3, line 49).

      (e) Line 109: Paired box protein 7 (Pax7) is a transcription factor widely recognized as a defining marker of MuSCs (Sambasivan et al., 2011). The following paper should be cited:

      Pax7 is required for the specification of myogenic satellite cells.

      Seale P, Sabourin LA, Girgis-Gabardo A, Mansouri A, Gruss P, Rudnicki MA. Cell. 2000 Sep 15;102(6):777-86.

      The reference has been revised (pg.6, line 117).

      (6) Lines 73-74: Many rejuvenation studies define 'aged' mice as 12 to 24 months old. This reviewer is not aware of any studies that have examined 12-month-old MuSCs as a model of aging.

      We apologize for this error. We have fixed the numbers to 18 months accordingly (pg. 4, line 94).

      Reviewer #3 (Recommendations for the authors):

      (1) Geriatric versus aged mice in the MuSC subpopulation analysis. The authors use geriatric mice (>28 months) to demonstrate the loss of VCam expression in MuSCs and propose that this accounts for previous reports of decreased MuSC numbers in aged contexts. However, as noted in their introduction, most reports use "aged" mice, which are typically around 24 months old, which is biologically distinct from the geriatric stage. This distinction makes it difficult to conclude that the reported decline in MuSC numbers in aged mice can be explained by the phenomenon observed only in geriatric mice (Line 289). The authors should test whether VCam expression is altered in aged (24-month-old) mice to strengthen this argument.

      We appreciate the reviewer’s thoughtful comment and agree that 24 month old mice are commonly used as an aged reference in the literature. However, prior studies using 18 to 24 month old animals have reported inconsistent results regarding whether and to what extent MuSCs decline during this period. To avoid ambiguity from intermediate aging stages, we purposefully selected geriatric mice older than 28 months, a condition under which MuSC depletion has been more consistently reported in previous studies. Notably, our data show that even at this stage MuSC abundance is not dramatically reduced, which makes it unlikely that a robust decline would already be present at 24 months. We have clarified this rationale in the revised text. Although investigating the precise timing of the emergence of these changes at earlier time points is an important future direction, it is beyond the scope of the present study.

      (2) Variability and bimodal distributions.

      Figure 1b: The decline in VCAM+ MuSCs in geriatric mice shows high variability - 3 of 7 replicates align more closely with young/mid-aged levels. Please clarify this variability.

      We thank the reviewer for pointing out the variability. We agree that there is heterogeneity in the extent of VCAM1 reduction across geriatric mice. This variability likely reflects animal-to-animal differences in the onset and progression of aging-related phenotypes, which are known to vary at very advanced ages. Importantly, despite this variability, all geriatric samples contain a detectable VCAM1 low population that is not observed in young or middle-aged mice, and the overall trend is consistent across all replicates. We have clarified this in the revised manuscript (pg. 6, lines 125-127).

      Figure 1c: While the Mid and Geriatric groups are tightly clustered, the Young group appears bimodal, which challenges the claim (Line 118) that values are "comparable across ages." Since all males were used and it is not sex related, what is driving this bimodal distribution?

      We appreciate the reviewer’s observation regarding the variability in the young group. Muscle stem cells in young adult mice are known to encompass diverse transcriptional and functional substates, which contribute to greater biological heterogeneity at this stage (Biressi et al. 2010; Tierney & Sacco 2016; Motohashi & Asakura 2014). As aging progresses, these substates gradually converge toward a common functional phenotype, resulting in more uniform profiles in middle-aged and geriatric mice. Therefore, the bimodal appearance in the young group likely reflects the broader developmental heterogeneity of early adult MuSCs rather than a technical discrepancy. We have added this explanation to the revised Results section (pg. 6, lines 129-134).

      Figure 4D: Geriatric replicates also display a trimodal distribution. This should be addressed throughout - what is causing these types of distribution, and how does this impact significance tests and conclusions?

      We appreciate the reviewer’s observation regarding the multimodal distribution. We interpret this pattern as reflecting increased individual variability that becomes more pronounced at the geriatric stage. Even though aging affects all mice, the extent and timing of age-related phenotypic changes can vary considerably across individuals at very advanced ages. This leads to broader divergence in VCAM1 expression states among geriatric mice. Consistent with this interpretation, the abundances of the VCAM1 High and VCAM1 Low/- populations are significantly negatively correlated across mice (Fig. S3F). We have clarified this interpretation in the text and note that the statistical analysis was performed using the mouse as the biological replicate, so this variability does not alter the overall conclusion (pg. 12-13, lines 270-278).

      (3) The fate of the Vcam-low/negative cells should be better assessed. For example, Line 180: Colony formation is low/absent in VCAM-low/- cells. Are these cells still viable? Cell death assays are needed. Is expansion capacity truly impaired, or are the cells simply non-viable? Using gene expression as the only means (Line 300) to suggest not dying is insufficient.

      We thank the reviewer for this important point. We agree that we lack direct evidence that these cells are viable, and that an apoptosis or viability assay would further strengthen our study. However, we cautiously infer that they are viable because these cells can be isolated by FACS and yield high-quality RNA sequencing libraries, which would not be possible if they were undergoing cell death. Moreover, the transcriptomic data indicate upregulation of stress-response and senescence-associated pathways rather than apoptotic or necrotic signatures. These findings suggest that VCAM low or negative cells are alive but exhibit reduced proliferative and regenerative capacity. We have revised the text to clarify that our data reflect impaired function rather than loss of viability and that apoptosis assays represent a direction for future investigation (pg. 16, lines 360-366).

      (4) Transplant assays are suggestive, but could use additional characterization. Lines 191 & Figure 3E-F: While representative images match quantification, areas at the edge of VCAM-low/- TAs show signs of regeneration. Please include lower-magnification images. Additionally, assess early post-transplant engraftment efficiency - do certain populations experience a higher loss rate (cell death)? YFP-tracing would also help confirm the donor contribution to fibers.

      While we did not collect additional early time-point samples for new engraftment analyses, we carefully re-examined all available transplantation data, including the distribution and density of YFP<sup>+</sup> donor-derived cells in early post-injury sections. We did not observe patterns suggestive of differential early cell loss between VCAM-high and VCAM-low groups. Thus, although we cannot formally quantify early engraftment efficiency, the existing evidence does not support a model in which differential donor-cell retention accounts for the observed regenerative differences.

      We also attempted direct YFP co-staining of regenerated myofibers, but as reported by several groups, the YFP signal within mature or regenerating myofibers is often diminished or inconsistent after fixation and permeabilization, making reliable fiber-level YFP detection technically challenging in our system. Instead, we confirmed donor contribution using PBS-injected control muscles, which lack donor MuSCs, and showed that PBS-injected muscles never generated YFP<sup>+</sup> fibers. This demonstrates that endogenous MuSCs do not contribute to YFP<sup>+</sup> myofibers in our model, and therefore indirectly supports our suggestion that any YFP<sup>+</sup> regenerated fiber necessarily originates from transplanted donor cells. We hope the reviewer understands the technical limitations.

      (5) Figure S3D: mRNA profiling suggests Mid-aged MuSCs are more distinct from Geriatric Vcam-hi than expected. This should be addressed or at least elaborated on in text.

      We appreciate this insightful comment. We agree that mid aged VCAM high MuSCs show detectable transcriptional differences from geriatric VCAM high cells. This pattern likely reflects the fact that some aging related molecular changes begin to accumulate gradually during the middle aged stage even before overt functional decline or VCAM1 loss becomes evident. Importantly, however, these transcriptomic shifts do not lead to the emergence of the VCAM low dysfunctional phenotype that is uniquely present in geriatric muscle. We have added clarification to the text noting that molecular alterations arise progressively while the major phenotypic transition in VCAM1 expression and regenerative impairment occurs at the geriatric stage (pg.11, 238-244).

      (6) The conclusion of senescence needs more support. Lines 218-226: p16 is elevated in VCAM-low/- cells, but drawing conclusions on senescence from 1-2 markers (mRNA) is insufficient. DQ Treatment: It's unclear how DQ alters cell composition in the absence of clear senescence markers (besides p16). Since DQ targets BCL-2/anti-apoptotic pathways, analyzing these signaling cascades is necessary. Line 255: The term "terminally senescent" is contradictory. These may be pre-senescent. It's also surprising DQ would target such cells, and further clarification is needed. Lines 307-313: Proposing a revised definition of senescence is premature. These cells may be pre-senescent, and multiple ways to senescence exist (replicative, stress-induced, etc.). Please clarify.

      We agree with the reviewer that the term 'terminally senescent' may be premature and potentially contradictory. Although p16 is elevated in this population, we acknowledge that one or two mRNA markers are insufficient to establish bona fide senescence, and that multiple senescence programs exist, including replicative, stress-induced, and mitochondrial-associated pathways. We have revised this to 'senescent-like' throughout the manuscript to better reflect the complexity of this state. Also, although beyond the scope of this study, we now emphasize that future studies incorporating additional senescence markers, functional assays, and lineage tracing will be required to determine the precise senescence status of VCAM-low MuSCs (pg.17-18, lines 381-392).

      Regarding DQ treatment, we agree that DQ is not selective for senescent cells, as it targets BCL-2–related survival pathways. The reduction of VCAM-low cells after DQ treatment therefore indicates increased dependence on survival signaling in this population rather than providing direct evidence of senescence. We have revised the text to clarify this interpretation (pg.12-13, lines 270-278).

      (7) Figure 5C: The Pax7+ cells appear interstitial rather than sublaminar. This raises questions about the specificity of staining. Providing lower-magnification images with these as insets may help.

      We thank the reviewer for this helpful comment. We agree that the high-magnification image in Figure 5C may give the impression that Pax7<sup>+</sup> cells are interstitial due to the limited field of view. We regret to inform the reviewer that low-magnification images for this sample are not available, as these images were obtained by confocal imaging in which we recorded only areas of interest. Therefore, we are unable to provide an additional panel at this time, and we hope the reviewer understands.

      (8) CD63 and CD200 expression on Pax7-YFP traced cells. Figure 5: YFP-traced geriatric MuSCs co-stained for CD63 and CD200 are essential. Current data only show expression in Young traced cells. It's crucial to confirm whether protein/surface expression persists in geriatric YFP+ (traced) cells. The current Figure 5 F does not appear to include YFP tracing for geriatrics.

      We thank the reviewer for highlighting the importance of confirming CD63 and CD200 expression specifically in Pax7-YFP traced MuSCs from geriatric muscle. The datasets shown in Figure 5F were generated from wild-type C57BL/6 mice using a standard MuSC gating strategy rather than Pax7-YFP animals. All geriatric Pax7-YFP mice available for this study were exhausted during earlier experiments, and additional tissue is not available for new co-staining or FACS analyses. We now state this technical limitation in the manuscript and clarify that the geriatric CD63/CD200 data were obtained from conventionally isolated MuSC populations rather than YFP-traced cells (pg.18-19, lines 407-416).

      Minor points:

      (1) Please show the outliers in addition to the concentric circles. Figures 1B, C, and F are examples, but this should be addressed throughout.

      Outliers have been added where applicable.

      (2) Figure 2C: Was a significance test performed between the 5 dpi and "geri" fractions?

      We thank the reviewer for this important point. We have now performed the requested statistical comparison between the 5 dpi fraction and the geriatric VCAM1-defined subpopulations using the same analysis framework applied in Figure 2 (Kruskal–Wallis test followed by Dunn’s multiple comparisons).

      While 5 dpi MuSCs differed significantly from young MuSCs (adjusted p = 0.0139), the comparisons between 5 dpi and each geriatric subgroup (VCAM-high, -mid, and -low) did not reach statistical significance after correction for multiple testing (adjusted p = 0.17, 0.15, and 0.17, respectively). These results have been added to the legend of the revised Figure 2C (pg. 36, lines 777-780).

      Importantly, we now clarify in the text that although 5 dpi muscles display a prominent increase in VCAM1-high cells at the population level, this increase does not statistically exceed the variability observed within geriatric subpopulations under the conservative non-parametric testing framework used.

      (3) Line 155: The phrase "Surprisingly, all clusters mapped to quiescent clusters" is misleading; this is expected given the population type.

      We thank the reviewer for this helpful comment. We have revised the sentence to remove the misleading wording and now describe the observation more accurately (pg. 8 lines 180-181).

      (4) Line 211: The figure notation should be corrected from Figure S4E to Figure S3E.

      We apologize for this error. We have fixed the figure notation for Figure S4E to S3E (pg. 11, line 247).

      (5) Line 216: "All of which" seems overstated. Many populations share similar profiles with minor differences.

      We appreciate the reviewer’s comment. We agree that the phrase “all of which” overstated the degree of divergence among clusters. We have revised the wording to more accurately reflect the data (pg. 11-12, lines 252-253).

      (6) Line 270: The notations for panels D, E, and F need to be updated to match the figure. Panel "H" is not indicated in Figure 5.

      We apologize for this error. We have fixed the figure notations accordingly (pg. 15, lines 326-336).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Xu et al. reported base-resolution mapping of RNA pseudouridylation in five bacterial species, utilizing recently developed BID-seq. They detected pseudouridine (Ψ) in bacterial rRNA, tRNA, and mRNA, and found growth phase-dependent Ψ changes in tRNA and mRNA. They then focused on mRNA and conducted a comparative analysis of Ψ profiles across different bacterial species. Finally, they developed a deep learning model to predict Ψ sites based on RNA sequence and structure.

      This is the first comprehensive Ψ map across multiple bacterial species, and systematically reveals Ψ profiles in rRNA, tRNA, and mRNA under exponential and stationary growth conditions. It provides a valuable resource for future functional studies of Ψ in bacteria.

      We thank Reviewer 1 for the supportive and positive comments, particularly for highlighting the novelty and value of our comprehensive pseudouridine landscapes across multiple bacterial species as a valuable resource for the scientific community.

      Ψ is highly abundant on non-coding RNA such as rRNA and tRNA, while its level on mRNA is very low. The manuscript focuses primarily on mRNA, which raises questions about the data quality and the rigor of the analysis. Many conclusions in the manuscript are speculative, based solely on the sequencing data but not supported by additional experiments.

      We appreciate the insightful comments of Reviewer 1. We fully agree that Ψ is highly abundant on rRNA and tRNA, while its fractions on mRNA are generally lower. Ψ is highly conserved at specific positions in rRNA and tRNA, such as Ψ within tRNA T‑arm (position 55), where it plays essential roles in tRNA structural folding, tRNA stability, and mRNA translation, across plants, mammals, and bacteria[1–3]. However, most Ψ sites in mRNA exhibit lower fractions compared to rRNA and tRNA. This phenomenon is also widely observed in HeLa cell mRNA and plant mRNA, as evidenced by bisulfite-induced deletion sequencing and 2-bromoacrylamide-assisted cyclization sequencing[3–5]. In bacteria, the modifications on mRNA are harder to map and quantify, due to its low abundance in total RNA and difficulty in bacterial rRNA removal. This highlights the significance of our study.

      To demonstrate our data quality and analytical rigor, we first present the most convincing sites in bacteria as benchmark sites. Specifically, we detected 9 out of 10 known conserved pseudouridine (Ψ) sites in E. coli across two biological replicates [6], all displaying notable modification fractions. The Ψ516 site in E. coli 16S rRNA, which serves as a benchmark site, consistently exhibited a high modification fraction (~100%) under multiple growth conditions, underscoring the robustness of our method. In other strains, we also observed conserved 16S rRNA Ψ sites.

      To further demonstrate strong reproducibility and sensitivity, we selected three positive Ψ sites from two independent biological replicates for experimental validation, alongside one negative control site, using the pseU‑TRACE method[6]. Ct values were first normalized to the corresponding Ct value of the negative control site, and the treated samples were then further normalized to their corresponding input controls (new Supplementary Fig. 2e).

      Four sites were tested with pseU‑TRACE: a Ψ site at position 944 on 23S rRNA, a negative control site located within the guaA gene, a Ψ site within the clpV1 gene, and an intergenic Ψ site located between the guaA and guaB genes. We successfully validated the three Ψ sites in P. aeruginosa. The detailed pseU‑TRACE experimental procedures and corresponding data figures have been added to the Results and Methods sections of the revised manuscript (Lines 171–175, 594–617).
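
      The two-step Ct normalization described above can be illustrated with a short sketch. It assumes that both normalizations are performed by subtraction on the Ct scale (the usual ΔΔCt convention); the response does not state this explicitly, and the function name and the Ct values below are hypothetical.

```python
def delta_delta_ct(ct_site_treated, ct_neg_treated, ct_site_input, ct_neg_input):
    """Two-step Ct normalization for one candidate site (illustrative only).

    Step 1: normalize each Ct to the negative-control site in the same library.
    Step 2: normalize the treated sample to its matching input control.
    Assumption: both steps are subtractions on the Ct scale.
    """
    d_treated = ct_site_treated - ct_neg_treated  # step 1, treated library
    d_input = ct_site_input - ct_neg_input        # step 1, input library
    return d_treated - d_input                    # step 2


# Hypothetical Ct values, not taken from the manuscript.
example = delta_delta_ct(24.0, 20.0, 21.0, 20.0)
print(example)
```

      Working on the Ct scale keeps the calculation independent of any assumption about amplification efficiency; converting to a fold change would require that additional assumption.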

      Previous transcriptome-wide mapping of Ψ has primarily relied on CMC-based methods to induce RT truncation signatures at the modified sites, and these methods exhibit limited Ψ detection sensitivity because of low labeling efficiency[5]. In contrast, the BID-seq method used in this study provides substantially higher sensitivity of Ψ detection, particularly for low-stoichiometry Ψ sites within mRNA. The high reliability and quantitative performance of BID-seq have been extensively validated in prior work using mammalian cells and synthetic Ψ-containing oligonucleotides[4].

      To further ensure robustness and minimize false positives—when identifying low-level mRNA Ψ sites through bioinformatic analysis—we have applied stringent and uniform filtration criteria to all candidate sites on mRNA (new Supplementary Table 1):

      (1) Total sequencing coverage >20 reads in both ‘Treated’ (BID-seq; Σd<sub>t</sub> > 20) and ‘Input’ libraries (Σd<sub>i</sub> > 20);

      (2) An average deletion count >5 in ‘Treated’ libraries;

      (3) An average modification fraction >0.02 (2%) in ‘Treated’ libraries;

      (4) A deletion ratio in ‘Treated’ libraries at least two-fold higher than that in ‘Input’ libraries.

      Sites with a Ψ stoichiometry >0.5 (50%) were classified as highly modified. These filtration criteria are now explicitly described in the Methods section (Lines 739–745). We strictly adhered to these Ψ site identification standards in all subsequent analyses and functional studies.
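
      For illustration, the four criteria and the 50% threshold can be written as a simple predicate. This is a minimal sketch rather than the authors' pipeline: the `SiteStats` record and its field names are hypothetical, and it assumes that the per-site summary values (coverage, mean deletion count, mean modification fraction, and deletion ratios) have already been computed upstream.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Per-site summary values (hypothetical field names)."""
    treated_coverage: int          # total reads in 'Treated' (BID-seq) libraries
    input_coverage: int            # total reads in 'Input' libraries
    mean_treated_deletions: float  # average deletion count in 'Treated'
    mean_fraction: float           # average modification fraction in 'Treated'
    treated_deletion_ratio: float  # deletion ratio in 'Treated'
    input_deletion_ratio: float    # deletion ratio in 'Input'


def passes_filters(s: SiteStats) -> bool:
    """Apply criteria (1)-(4) as listed in the response."""
    return (
        s.treated_coverage > 20 and s.input_coverage > 20              # (1)
        and s.mean_treated_deletions > 5                               # (2)
        and s.mean_fraction > 0.02                                     # (3)
        and s.treated_deletion_ratio >= 2 * s.input_deletion_ratio     # (4)
    )


def is_highly_modified(s: SiteStats) -> bool:
    """A retained site with stoichiometry above 0.5 is 'highly modified'."""
    return passes_filters(s) and s.mean_fraction > 0.5


# Hypothetical sites, not taken from the manuscript.
kept = SiteStats(35, 40, 8.0, 0.12, 0.12, 0.01)
low_coverage = SiteStats(18, 40, 8.0, 0.12, 0.12, 0.01)
high_background = SiteStats(35, 40, 8.0, 0.12, 0.12, 0.08)
print(passes_filters(kept), passes_filters(low_coverage), passes_filters(high_background))
```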

      Finally, to address concerns regarding reproducibility, we calculated the overlap of mRNA Ψ sites and the correlation of Ψ fractions between two biological replicates; these results are presented in the new Supplementary Fig. 2a,d.

      Overall, we have revised the manuscript to clarify these methodological strengths and to validate mRNA Ψ detection. We have also toned down all speculative conclusions and linked them more clearly to the actual sequencing data; these conclusions await future functional validation.

      Reviewer #2 (Public review):

      Summary:

      In this study, Xu et al. present a transcriptome-wide, single-base resolution map of RNA pseudouridine modifications across evolutionarily diverse bacterial species using an adapted form of BID-Seq. By optimizing the method for bacterial RNA, the authors successfully mapped modifications in rRNA, tRNA, and, importantly, mRNA across both exponential and stationary growth phases. They uncover evolutionarily conserved Ψ motifs, dynamic Ψ regulation tied to bacterial growth state, and propose functional links between pseudouridylation and bacterial transcript stability, translation, and RNA-protein interactions. To extend these findings, they develop a deep learning model that predicts pseudouridine sites from local sequence and structural features.

      Strengths:

      The authors provide a valuable resource: a comprehensive Ψ atlas for bacterial systems, spanning hundreds of mRNAs and multiple species. The work addresses a gap in the field - our limited understanding of bacterial epitranscriptomics, by establishing both the method and datasets for exploring post-transcriptional modifications.

      We thank Reviewer 2 for the supportive and positive comments. We appreciate the reviewer’s recognition of the novelty and value of our work in providing a comprehensive pseudouridine atlas across multiple bacterial species.

      Weaknesses:

      The main limitation of the study is that most functional claims (i.e., translation efficiency, mRNA stability, and RNA-binding protein interactions) are based on correlative evidence. While suggestive, these inferences would be significantly strengthened by targeted perturbation of specific Ψ synthases or direct biochemical validation of proposed RNA-protein interactions (e.g., with Hfq).

      We thank Reviewer 2 for the constructive feedback. We fully agree that our functional claims regarding translation efficiency, mRNA stability, and RNA-binding protein interactions rely primarily on correlative evidence from existing datasets rather than direct experimental validation. We agree that the perturbation of specific pseudouridine synthases and direct biochemical validation of proposed RNA-protein interactions (for instance, with Hfq) would substantially strengthen the conclusions on bacterial Ψ function. We have added a discussion of this limitation of our current study to the Discussion section (Lines 517–523). Given the scope of our current work, we leave such validation experiments to future research.

      Additionally, the GNN prediction model is a notable advance, but methodological details are insufficient to reproduce or assess its robustness.

      In response to methodological concerns regarding our pseU_GNN prediction model, we have undertaken substantial improvements to address these issues comprehensively. We have updated the complete codebase on GitHub (https://github.com/Dylan-LT/pseU_NN.git) with comprehensive documentation and a user-friendly prediction tool specifically designed for Ψ site prediction across the four bacterial species examined in this study.

      We further systematically evaluated multiple neural network architectures and implemented critical architectural refinements. Specifically, we incorporated bidirectional LSTM (bid-LSTM) layers upstream of the transformer block to more effectively capture sequential dependencies and contextual information in RNA sequences. This enhanced architecture demonstrates substantially improved predictive performance, achieving an AUC-ROC of 0.89 on independent test datasets using 41-nucleotide input sequences (new Figure 6).
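
      As an illustration of what a 41-nucleotide model input could look like, the sketch below extracts a window around a candidate site and one-hot encodes it. This is an assumption-laden example, not the pseU_NN code: it assumes the window is centered on the candidate uridine (20 nt on each side), that positions beyond the transcript ends are padded with `N`, and that bases are one-hot encoded over A/C/G/U. The function names are hypothetical.

```python
ALPHABET = "ACGU"


def window_41(seq: str, center: int, flank: int = 20, pad: str = "N") -> str:
    """Return the (2 * flank + 1)-nt window centered on seq[center].

    Positions that fall outside the sequence are filled with `pad`.
    """
    left = max(0, center - flank)
    right = min(len(seq), center + flank + 1)
    left_pad = pad * (flank - (center - left))
    right_pad = pad * (flank - (right - 1 - center))
    return left_pad + seq[left:right] + right_pad


def one_hot(window: str) -> list:
    """Encode each base as a 4-element indicator; unknown bases become all zeros."""
    return [[1 if base == b else 0 for b in ALPHABET] for base in window]


# Hypothetical 60-nt transcript; position 31 is a U.
transcript = "ACGU" * 15
w = window_41(transcript, 31)
encoded = one_hot(w)
print(len(w), w[20], len(encoded), len(encoded[0]))
```

      Centering the window places the candidate site at a fixed index (here 20), so a sequence model sees the site in the same position for every example.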

      We have revised Figure 6 and Supplementary Fig. 7, along with their corresponding content and figure legends (Lines 428–430, 434–436, 440–447, 1065–1073), to reflect these architectural improvements and performance enhancements. We have also expanded the Methods section (Lines 679–708) to describe the model architecture, the validation methods, and the evaluation score calculation in detail, to ensure reproducibility and transparency.

      Reviewer #3 (Public review):

      Summary:

      This study aimed to investigate pseudouridylation across various RNA species in multiple bacterial strains using an optimized BID-seq approach. It examined both conserved and divergent modification patterns, the potential functional roles of pseudouridylation, and its dynamic regulation across different growth conditions.

      Strengths:

      The authors optimized the BID-seq method and applied this important technique to bacterial systems, identifying multiple pseudouridylation sites across different species. They investigated the distribution of these modifications, associated sequence motifs, their dynamics across growth phases, and potential functional roles. These data are of great interest to researchers focused on understanding the significance of RNA modifications, particularly mRNA modifications, in bacteria.

      We thank Reviewer 3 for the supportive and positive assessment. We are particularly grateful for the reviewer’s acknowledgment of the value of our analyses on modification distribution, sequence motifs, growth‑phase dynamics, and potential functional roles, which we hope will be of broad interest to researchers studying bacterial RNA modifications, particularly mRNA Ψ.

      Weaknesses:

      (1) The reliability of BID-seq data is questionable due to a lack of experimental validations.

      We thank Reviewer 3 for the constructive feedback. We have undertaken comprehensive revisions to address the concerns regarding manuscript structure and information organization. We have incorporated pseU‑TRACE experiments and data quality results to provide orthogonal validation of Ψ detection, strengthening the robustness of our work.

      Here we reproduce our response from the Reviewer 1 section:

      “To further demonstrate strong reproducibility and sensitivity, we selected three positive Ψ sites from two independent biological replicates for experimental validation, alongside one negative control site, using the pseU‑TRACE method[6]. Ct values were first normalized to the corresponding Ct value of the negative control site, and the treated samples were then further normalized to their corresponding input controls (new Supplementary Fig. 2e).

      Four sites were tested with pseU‑TRACE: a Ψ site at position 944 on 23S rRNA, a negative control site located within the guaA gene, a Ψ site within the clpV1 gene, and an intergenic Ψ site located between the guaA and guaB genes. We successfully validated the three Ψ sites in P. aeruginosa. The detailed pseU‑TRACE experimental procedures and corresponding data figures have been added to the Results and Methods sections of the revised manuscript (Lines 171–175, 594–617).”

      (2) The manuscript is not well-written, and the presented work shows a major lack of scientific rigor, as several key pieces of information are missing.

      We thank Reviewer 3 for the suggestion. We have restructured the main text to present a clearer logical flow, with the key objectives explicitly stated in the Introduction and Conclusions sections (Lines 83–96, 171–175, 428–447, 517–523) and with data figures directly addressing these stated aims (Supplementary Fig. 1–7).

      (3) The manuscript's organization requires significant improvement, and numerous instances of missing or inconsistent information make it difficult to understand the key objectives and conclusions of the study.

      We thank Reviewer 3 for the constructive feedback. All supplementary figures have been updated with detailed figure legends, methodology descriptions, and consistent formatting. We also systematically inspected and resolved instances of missing or inconsistent information throughout the main text and supplementary materials (Supplementary Fig. 1–7; Supplementary Table 1). To enhance computational reproducibility, we have updated our GitHub repository with well-documented code and developed user-friendly prediction tools for Ψ identification across the four bacterial species examined in this study.

      (4) The rationale for selecting specific bacterial species is not clearly explained, and the manuscript lacks a systematic comparison of pseudouridylation among these species.

      We thank Reviewer 3 for the constructive feedback. The bacterial species analyzed in this study were selected based on both diversity and significance. K. pneumoniae, B. cereus, and P. aeruginosa are top model human pathogens responsible for a wide range of clinically significant infections, yet transcriptome-wide pseudouridylation has not been systematically explored in these organisms[7–9]. P. syringae, the most important model plant pathogen, was included to extend our analysis beyond human pathogens and to examine Ψ modification in a distinct ecological and evolutionary context, where epitranscriptomic regulation also remains poorly characterized[10]. Importantly, the selected species represent both Gram-positive (B. cereus) and Gram-negative (K. pneumoniae, P. aeruginosa, and P. syringae) bacteria, spanning substantial differences in genome size, GC content, lifestyle, and pathogenic strategies. This diversity enables a comparative framework for examining conserved and species-specific pseudouridylation patterns across bacterial lineages.

      To address the reviewer’s concern, we have revised the manuscript to more clearly articulate the rationale for species selection and have added a comparative analysis highlighting similarities and differences in Ψ site distribution and modification levels among these species (Lines 83–96). We also systematically compared Ψ-carrying motifs by analyzing the sequence context of the 10 bases flanking Ψ sites in bacterial mRNA (new Supplementary Fig. 4).

      References

      (1) Leppik, M., Liiv, A. & Remme, J. Random pseuoduridylation in vivo reveals critical region of Escherichia coli 23S rRNA for ribosome assembly. Nucleic Acids Res. 45, (2017).

      (2) Rajan, K. S. et al. A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei. Nat. Commun. 14, (2023).

      (3) Li, H. et al. Quantitative RNA pseudouridine maps reveal multilayered translation control through plant rRNA, tRNA and mRNA pseudouridylation. Nat. Plants 11, 234–247 (2025).

      (4) Dai, Q. et al. Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution. Nat. Biotechnol. 41, 344–354 (2023).

      (5) Xu, H. et al. Absolute quantitative and base-resolution sequencing reveals comprehensive landscape of pseudouridine across the human transcriptome. Nat. Methods 21, 2024–2033 (2024).

      (6) Fang, X. et al. A bisulfite-assisted and ligation-based qPCR amplification technology for locus-specific pseudouridine detection at base resolution. Nucleic Acids Res. 52, (2024).

      (7) Wyres, K. L., Lam, M. M. C. & Holt, K. E. Population genomics of Klebsiella pneumoniae. Nat. Rev. Microbiol. 18, (2020). https://doi.org/10.1038/s41579-019-0315-1

      (8) Kerr, K. G. & Snelling, A. M. Pseudomonas aeruginosa: a formidable and ever-present adversary. J. Hosp. Infect. 73, (2009). https://doi.org/10.1016/j.jhin.2009.04.020

      (9) Ehling-Schulz, M., Lereclus, D. & Koehler, T. M. The Bacillus cereus Group: Bacillus Species with Pathogenic Potential. Microbiol. Spectr. 7, (2019).

      (10) Xin, X. F., Kvitko, B. & He, S. Y. Pseudomonas syringae: What it takes to be a pathogen. Nat. Rev. Microbiol. 16, (2018). https://doi.org/10.1038/nrmicro.2018.17

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This important study functionally profiled ligands targeting the LXR nuclear receptors using biochemical assays in order to classify ligands according to pharmacological functions. Overall, the evidence is solid, but nuances in the reconstituted biochemical assays and cellular studies and terminology of ligand pharmacology limit the potential impact of the study. This work will be of interest to scientists interested in nuclear receptor pharmacology.

      Strengths:

      (1) The authors rigorously tested their ligand set in CRTs for several nuclear receptors that could display ligand-dependent cross-talk with LXR cellular signaling and found that all compounds display LXR selectivity when used at ~1 µM.

      (2) The authors tested the ligand set for selectivity against two LXR isoforms (alpha and beta). Most compounds were found to be LXRbeta-specific.

      The majority of ligands were found to be LXRβ-selective; however, examples of non-selective and LXRα-selective ligands were identified. It should be noted that this is a small compound set of literature ligands with reasonable structural diversity.

      (3) The authors performed extensive LXR CRTs, performed correlation analysis to cellular transcription and gene expression, and classification profiling using heatmap analysis, seeking to use relatively easy-to-collect biochemical assays with purified ligand-binding domain (LBD) protein to explain the complex activity of full-length LXR-mediated transcription.

      Weaknesses:

      (1) The descriptions of some observations lack detail, which limits understanding of some key concepts.

      Changes to the submitted manuscript hopefully add clarity. Several observations reinforce aspects of the literature and are a corollary of the observation that the majority of ligands with agonist activity more strongly stabilize/induce coactivator-bound complexes with LXRβ. This results in general LXRβ selectivity for agonists and also more variability in the response of LXRα to different ligand chemotypes. The most significant observations were for partial agonists that stabilize corepressor binding, in particular of the complex with LXRα.

      (2) The presence of endogenous NR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data.

      This is generally a confounding factor for ligands with apparent antagonist activity and is a source of ambiguity in designating inverse agonists across the nuclear receptor research field. Theoretically, this could also impact weak and partial agonists; however, this requires further study.

      (3) The normalization of biochemical assay data could confound the classification of graded activity ligands.

      Normalization to TO (100%) and vehicle (0%) is applied to most data. It is not clear how this confounds data interpretation. TO is a very reliable and reproducible agonist without significant bias towards LXR isoforms.
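      As an illustration of the normalization described above, the following minimal sketch rescales a raw signal so that the vehicle control reads 0% and the reference agonist reads 100%. The function name, the control means, and the sample values are hypothetical and chosen only for the example; they are not data from the manuscript.

```python
def normalize_response(raw, vehicle_mean, reference_mean):
    """Rescale a raw signal so vehicle = 0% and the reference agonist = 100%."""
    span = reference_mean - vehicle_mean
    if span == 0:
        raise ValueError("reference and vehicle controls give identical signals")
    return 100.0 * (raw - vehicle_mean) / span


# Hypothetical in-plate control means and sample wells (arbitrary signal units).
VEHICLE_MEAN = 200
REFERENCE_MEAN = 1000
samples = [200, 600, 1000, 1200, 100]

normalized = [normalize_response(s, VEHICLE_MEAN, REFERENCE_MEAN) for s in samples]
print(normalized)
```

      Values above 100% or below 0% are retained rather than clipped, so a ligand that exceeds the reference agonist or falls below vehicle remains visible after normalization.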

      (4) The presence of >1 coregulator peptide in the biplex (n=2 peptides) CRT (pCRT) format will bias the LBD conformation towards the peptide-bound form with the highest binding affinity, which will impact potency and interpretation of TR-FRET data.

      Multiplex assays must be optimized to balance binding affinity of the coregulator peptides (bear in mind these are somewhat-artificial small peptide constructs that are hoped to reflect binding of the much larger coregulator protein itself). Since the dominant theory of NR tissue-selectivity is based on the cellular availability (read concentration) of coregulators, this balance exists in a cellular context.

      (5) Correlation graphical plots lack sufficient statistical testing.

      Correlations are now supported by statistical data and we have added hierarchical clustering analysis.

      (6) Some of the proposed ligand pharmacology nomenclature is not clear and deviates from classifications used currently in the field (e.g., hard and soft antagonist; weak vs. partial agonist, definition of an inverse agonist that is not the opposite function to an agonist).

      Classifications used currently in the field vary from one NR to another and the use of partial and inverse agonist, in particular, is usually qualitative, unclear, and often misleading. We expand on these classifications with respect to our use of labels to classify pCRT response to LXR ligands. In agreement with the reviewer, we have replaced IA (inverse agonist) with RA (reverse agonist) as a label specifically associated with pCRT analysis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript by Laham and co-workers, the authors profiled structurally diverse LXR ligands via a coregulator TR-FRET (CRT) assay for their ability to recruit coactivators and kick off corepressors, while identifying coregulator preference and LXR isoform selectivity.

      The relative ligand potencies measured via CRT for the two LXR isoforms were correlated with ABCA1 induction or lipogenic activation of SRE, depending on cellular contexts (i.e., astrocytoma or hepatocarcinoma cells). While these correlations are interesting, there is some leeway to improve the quantitative presentation of these correlations. Finally, the CRT signatures were correlated with the structural stabilization of the LXR:coregulator complexes. In aggregate, this study curated a set of LXR ligands with disparate agonism signatures that may guide the design of future nonlipogenic LXR agonists with potential therapeutic applications for cardiovascular disease, Alzheimer's, and type 2 diabetes, without inducing mechanisms that promote fat/lipid production.

      Strengths:

      This study has many strengths, from curating an excellent LXR compound set to the thoughtful design of the CRT and cellular assays. The design of a multiplexed precision CRT (pCRT) assay that detects corepressor displacement as a function of ligand-induced coactivator recruitment is quite impressive, as it allows measurement of ligand potencies to displace corepressors in the presence of coactivators, which cannot be achieved in a regular CRT assay that looks at coactivator recruitment and corepressor dissociation in separate experiments.

      Weaknesses:

      I did not identify any major weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Page 2. "The endogenous ligands ... activate LXR via canonical or alternate mechanisms." What is an alternate mechanism?

      Small modifications to the Fig. 1 caption identify a mechanism alternative to the canonical mechanism: LXR transcriptional complexes are RXR heterodimers that can be activated by a canonical mechanism of coregulator recruitment or an alternative de-repression mechanism.

      (2) Page 5: "Notably, the 25 amino acid SRC-1 peptide is the only coactivator tested for LXR binding that has the fluorophore remote from the coactivator peptide." What does this mean, and could it influence the results?

      The sentence has been expanded to clarify the meaning. Notably, the 25 amino acid SRC-1 peptide is the only coactivator, amongst those tested for LXR binding, which has the fluorophore remote from the coactivator peptide: i.e., the only coactivator tested that uses a fluorophore-labeled anti-tag antibody to bind the tagged coactivator rather than a fluorophore-labeled coactivator. In methods based on fluorescent tags (CRT, TR-FRET, fluorescence polarization, etc.), a fluorophore that interacts directly with the receptor can generate a maximal signal that differs depending on this interaction: i.e., the identity of the coregulator used in CRT can influence the response. As seen in Figures 6 and S6, maximal response is dependent on ligand and coregulator.

      (3) Page 5: "The [CRT] assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." The dose-dependent activity in the CRT assays is more classically defined as a functional "potency", not "affinity".

      The text is changed to remove “measure of affinity”: The assay measures the ligand-dependent EC<sub>50</sub> for ligand-induced coactivator recruitment to LXR; the affinity of the ligand for the LXR:coregulator complex contributes to this potency.
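      To make the distinction between potency and efficacy concrete, the sketch below evaluates a standard four-parameter logistic (Hill) dose-response model. It is offered only as an illustration: the function name, the EC50 of 50 nM, and the maximal responses are hypothetical, and the model is a generic one rather than the fitting procedure used in the manuscript.

```python
def hill_response(conc, ec50, top=100.0, bottom=0.0, hill=1.0):
    """Four-parameter logistic (Hill) response at a given ligand concentration."""
    if conc <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# Hypothetical ligands with the same potency (EC50) but different efficacy (top).
EC50_NM = 50.0
full_at_ec50 = hill_response(EC50_NM, EC50_NM, top=100.0)
partial_at_ec50 = hill_response(EC50_NM, EC50_NM, top=40.0)
full_at_saturation = hill_response(1_000_000.0, EC50_NM, top=100.0)

print(full_at_ec50, partial_at_ec50, full_at_saturation)
```

      At a concentration equal to the EC50, each ligand gives half of its own maximal response, so EC50 reports potency independently of the maximal response that separates full from partial agonists.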

      (4) Page 5: "Perhaps surprisingly, considering the description of multiple LXR ligands as partial agonists, most agonists studied gave maximal response at the same level as T0, behaving as full agonists." Can the authors speculate as to why partial agonist activity is not observed in their CRT assays when it has been observed in CRT assays for other nuclear receptors?

      This section has been reworded. Please note the apparent partial agonist activity observed in CRT assays for multiple coactivators, as shown in Figures 6 and S6 (also see (2) above). Although many LXR ligands have been reported to display partial agonist activity, most agonists studied in this specific biotin-SRC-1 CRT assay gave maximal response at the same level as T0, behaving as full agonists.

      (5) Page 5: "Conformational cooperativity of LBD residues beyond these two amino acids leads to different conformations of Leu274 and Ala275 that generally favor ligand binding to LXRβ." Where are these residues located? Why are they important?

      We have simplified this paragraph that introduces the interesting observations and interpretation of Ding et al. to illustrate potential contributions to isoform selectivity: The ligand binding pockets of the two LXR isoforms differ by only one amino acid located in helix-3 (H3: LXRα-Val263 and LXRβ-Ile277). Interestingly, correction of this difference by mutation of these residues to alanine (V263A and I277A) was observed to lower, but not to ablate, isoform selectivity in reporter assays.[108] Supported by modeling studies, this observation by Ding et al. led to the suggestion that conformational cooperativity of LBD residues beyond these two amino acids generally favors ligand binding to LXRβ. Therefore, most reported ligands, including those examined in the current work, are LXRβ-selective or non-selective.

      (6) Some correlation plots are described to show "poor" correlations without showing the underlying statistical fits. All correlation plots should show Pearson and Spearman correlation coefficients and p-values within the figures.

      This section of the manuscript has been completely reworked with full correlation analysis and statistics. There is no substantive change in data interpretation.
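      For readers unfamiliar with the two coefficients requested by the reviewer, the following minimal sketch computes Pearson and Spearman correlations in plain Python. The pEC50 values are hypothetical and the helper names are assumptions made for the example; p-values are not computed here and would normally be obtained from a statistics package.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pEC50 values for five ligands (biochemical vs. cellular assay).
biochemical = [5.0, 6.0, 7.0, 8.0, 9.0]
cellular = [5.1, 5.3, 6.0, 7.5, 9.5]

r = pearson(biochemical, cellular)
rho = spearman(biochemical, cellular)
print(round(r, 3), round(rho, 3))
```

      Because the hypothetical relationship is monotonic but not linear, the rank-based Spearman coefficient is higher than the Pearson coefficient, which is why reporting both can be informative.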

      (7) The normalization of TR-FRET data could introduce undesired bias when comparing activities. The methods section should provide more details about normalization of CRT data, including stating whether the control compounds' activity data were collected on the same CRT 384-well plate on the same day, or different plates, or different days, etc.

      This is now clarified in the SI Materials and Methods section. In-plate controls are always used.

      (8) The authors describe their pCRT assay as "multiplex", whereas "biplex" might be more accurate, as they only used two peptides.

      Biplex is commonly used referring to qPCR. Bio-Plex is a commercial version of an antibody assay. Duplex is obviously a term used in nucleic acid research. Therefore, multiplex is a simpler, more generic term that we feel is suitable and can be extended to add a third coregulator.

      (9) The pCRT assays use the same peptide concentrations (200 nM). However, the peptides will have different affinities for the LBD, which may bias ligand-dependent pCRT profiles. The peptide that binds with higher affinity in the absence of ligand will bias the LBD conformation and impact ligand affinity. Can the authors comment on any limitations of the pCRT approach vs. a normal CRT? Did the authors perform any optimization to see if increasing peptide concentrations (>200 nM) or having different concentrations (e.g., 400 nM SRC1 and 200 nM NCorR2) influences the pCRT data, extracted parameters, correlations, etc.?

      As we write in the Limitations section, our assays are focused on ligand-dependence, whereas other excellent studies focus more on coregulator-dependence. The length and affinity of peptide constructs vary and therefore it is important to “balance” corepressor and coactivator concentrations. The most important conclusions from our pCRT assays concern the ability of some ligands to stabilize corepressor binding in the monoplex CRT and the universal ability of coactivator complex stabilization to eject the corepressor in the multiplex assay. Furthermore, without measurements and correlations in “natural” cellular contexts, the CRT data obtained in cell-free conditions is somewhat artificial. We evaluated a range of peptide concentrations to assess signal-to-background and overall assay performance. Each new receptor added to the panel underwent rigorous optimization to establish robust and reliable assay conditions. This included identifying a suitable positive control for each receptor, determining the optimal coregulator selection and concentration, and refining other key parameters such as buffer composition and total well volume. The concentrations reported represent the optimized balance, producing a strong, reproducible signal without oversaturation or disproportionate contribution from any individual assay component.

      (10) Page 11. The authors introduce a few ligand classification terms that are not standard in the field and unclear: "soft" vs. "hard" antagonist, "weak" vs. "partial" agonist, and their definition of an inverse agonist that, in classical pharmacologic terms, should have an opposite (inverse) function to an agonist. Furthermore, the presence of endogenous LXR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data. See the following paper for an example of ligand-dependent classification and activation mechanisms when there are endogenous cellular ligands at play: https://elifesciences.org/articles/47172

      The paragraph discussing nomenclature went through many iterations of terminology and a further paragraph was removed that discussed problems with ligand classification in the broader field of NR pharmacology: this has now been added back. We apologise for not citing the excellent Strutzenberg et al. paper on RORɣ pharmacology, which is now included. In this paper, Griffin and co-workers also use terms that are not standard in the field, such as “silent agonist”, which covers, in part, ligands that we describe as “weak agonists”. A standard, definitive lexicon of terms across NRs is unfortunately problematic. We have added two paragraphs:

      The nomenclature for NR ligands often lacks precision and differs across NR classes. SERM (a subset of selective NR modulator) is used to describe varied families of ER ligands that show tissue-selective agonist and/or antagonist actions. Unfortunately, “partial agonist” is also widely used to describe SERMs, even though its use is usually pharmacologically incorrect and biased agonist may be a more accurate label.[124] The majority of reported ER ligands are SERMs, even some that cause ER degradation, because they are transcriptionally active. Consequently, the term “pure antagonist” (PA) has been used to differentiate transcriptionally null ligands[125]; although, pure antagonist/antiestrogen was originally introduced to describe antagonism of both AF1 and AF2 functions.[90]

      Elegant work by Griffin’s team on RAR-related orphan receptor C (RORɣ) is interesting, because it used a combination of HDX-MS and CRT and defined categories of RORɣ ligands.[126] In addition to full agonist, “silent agonist” was introduced to include endogenous and synthetic partial agonists; although, by definition, partial agonists should antagonize full agonists. On the antagonist side of the spectrum, “active antagonist” was used to describe ligands that reduce cellular activity to baseline; and “inverse agonist” for ligands that reduce cellular transcription below baseline and induce recruitment of corepressors. Curiously, inverse agonist has almost never been used to describe ER ligands and is used frequently for other NR ligands, mostly for ligands that reduce transcription below baseline, without any evidence for corepressor recruitment. GSK2033 and SR9238 show inverse agonist activity in cells (Figs 3, 5); however, neither is capable of recruiting SMRT2 or NCOR2 to LXR (Fig. 7).

      (11) Figure 9A and Figure S8. Could hierarchical clustering analysis be used to more rigorously compare the activities of the ligands?

      We have now added hierarchical clustering analysis (Figs 4, S4). It should be noted that the value of such an analysis is much higher when the number of ligands is increased.
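      As a rough illustration of how hierarchical clustering groups ligands by their response profiles, the sketch below implements agglomerative clustering with average linkage and Euclidean distance in plain Python. The linkage method, distance metric, ligand names, and response values are all assumptions made for this example; the settings used for the manuscript figures are not specified here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two response profiles."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                dist = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical normalized responses (%) per ligand:
# [coactivator/LXRalpha, coactivator/LXRbeta, corepressor/LXRalpha, corepressor/LXRbeta]
profiles = {
    "agonist_1": [100, 110, -80, -90],
    "agonist_2": [95, 105, -75, -85],
    "antagonist_1": [-10, -5, 20, 30],
    "antagonist_2": [-12, -8, 22, 33],
}

merges = average_linkage(profiles)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 2))
```

      With only four hypothetical ligands the grouping is obvious by eye; the approach becomes more informative as the ligand set grows.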

      (12) How does cellular potency correlate to pCRT vs. CRT potencies? Does pCRT better explain cellular potency?

      We have added this specific correlation (multiplex CRT vs. monoplex CRT).

      (13) The authors should provide an SI table of parameters (potency values) used for correlation and heatmap analyses.

      Tables have been added to SI accordingly.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has many strengths, but can still be improved by addressing the following critiques:

      (1) I am surprised the team did not find a ligand with a higher efficacy than T0. Please would you explain why T0 seems to have maxed out ligand efficacy for both LXRalpha and LXRbeta?

      Several ligands gave superior efficacy to T0 in cell-based reporter assays and in CRT assays shown in Figures 6 and S6: AZ876, BE1218, and MK9 gave maximal response higher than that of T0.

      (2) In the subsection, "Activity and isoform selectivity of LXR ligands", you mentioned that "The assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." This is incorrect. EC50 is a measure of ligand potency, not affinity.

      See Reviewer-1 (3)

      (3) In Figure 3 it is unclear what was used to normalize the antagonist responses in Panel F. Also, I recommend changing the y-axis of Panel F to -100 to 50 to get a better view of the response.

      This has been clarified: zero is vehicle control. Change to y-axis is made.

      (4) In Figure 4, the correlation R-squared values should be presented as a Table to have a better qualitative assessment of the correlations. It is challenging to judge which correlations are better by relying only on visual inspection. I also recommend moving the two panels from Figure S3 to Figure 4 as panels E and F.

      Extensive changes to Figure 4 have been made in response to this comment and that of Reviewer 1, who wanted these values in the figures: Reviewer-1 points (6) and (12).

      (5) In Figure 5, the fold changes in panels G, H, and I could better be presented as a bar graph. Also, the cytotoxicity of ligands needs to be assessed. For instance, in BE1218, there is a sharp decrease in fold change going from ~1 uM to ~10 uM. This will also confirm if the downward trends for SR9238 and GSK2033 are "real" and not as a result of cells dying off at higher ligand concentrations.

      Across our many studies on potent NR ligands, at concentrations above 3 uM, cell growth inhibition is observed. This is true for ER ligands, such as tamoxifen, with explanations in the literature including membrane disruption and low-affinity cytoplasmic binding proteins. We include cell viability measurements in Supplemental as a specific response to the reviewer’s query. There is no loss of cell viability in HepG2 cells.

      (6) Several ligands induce recruitment of coactivators but with minimal ability to displace corepressors. Physiologically, what would be the expected effect of these ligands on LXR activity?

      We have defined such ligands from pCRT analysis as weak agonists (WA); however, pCRT shows WA ligands induce corepressor loss in the presence of coactivator. Depending on coregulator balance and isoform expression and the importance of the derepression mechanism in a specific cell context, WA ligands might be expected to be differentiated from SA (strong agonist) ligands.

      (7) In the subsection, "synchronous coregulator recruitment by multiplex, precision CRT" you mentioned that "For LXRbeta, the correlation between SRC1 recruitment in monoplex and multiplexed CRT is good," but the data is not shown. I think it would be better to show this data for transparency.

      See query (4) and Reviewer-1. Done.

      (8) In Figure 9, Panel A, the heat map is quantitated as 0-150. Is this fold change? If so, add this label to the figure legend.

      It is Normalized Response as %, which is now added.

      (9) In Figure 9, Panel B, please explain why in all cases, CoA-bound LXR resides at a higher energy level than the CoR-bound, and the apo LXR is at a lower energy level than the CoA-bound protein. A coregulator-bound (holo) protein structure is generally a lower energy (more stable) structure than the unbound (apo) protein. The binding of a coregulator stabilizes the protein's conformation and shifts the equilibrium towards a more thermodynamically favorable state. Using the same argument, it does not make sense to me that the CoR-bound LXR is on the same energy level as the apo LXR.

      This schema reflects our observations in pCRT. No signal was observed for coactivator-bound (holo) protein in the absence of ligand, whereas a signal was observed for corepressor-bound (holo) protein in the absence of ligand. Therefore, the CoA-bound LXR is higher energy than apo-LXR (+ unbound CoA). Conversely, the signal for CoR-bound LXR can be reduced or increased by ligands, requiring the CoR-bound LXR to be of similar energy to apo-LXR (+ unbound CoR).

      (10) In the Figure 9b caption, "measured at 1uM" pertains to the concentration of ligand or coregulator? This is unclear. You should report the concentration of both ligand and coregulator.

      Clarified in caption.

      (11) In Figure S4, signal for SR9238 shoot up to ~300 units for ligand concentrations >3 uM. Please explain what could have contributed to this anomalous activation and why this was moved to the Supplementary File and not shown in the main figure (Figure 5).

      The HepG2-SRE assay is a nano-luc reporter assay, unlike the CCF-ABCA1 that is a firefly luciferase assay. There is substantial anecdotal evidence that furimazine/nano-luc is susceptible to stabilization enhancement. The RT-PCR data presented in Fig. 5 confirms that this is an artifact for some biphenyl sulfones.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete. It could be strengthened by the use of sensitive RNA in situ hybridization approaches.

      Thank you for your valuable assessment. RNA in situ hybridization evidence has been added to the revised manuscript (Figure 5A-D) to support that GSC tumors produce BMP ligands.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Figure 1). These tumors contain non-mutant cells - termed SGC for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Figure 2). The authors demonstrate that block in differentiation in SGC is a result of suppression of bam expression (Figure 2). They present data suggesting that in 73% of SGCs, BMP signaling is low (assessed by dad-lacZ) (Figure 3) and proliferation is less in SGCs vs GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Figure 4). They show data that bam-mutant cells secrete Dpp, but this data is not compelling (see below) (Figure 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Figure 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment.

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells.

      (3) Appropriate use of quantification and statistics.

      We greatly appreciate your valuable comments.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc, or in a few germaria?

      This is a good question. Because the SGC phenotype depends on the presence of both germline tumor clones and out-of-niche wild-type germ cells, our quantification was restricted to germaria containing both. In 14-day-old fly ovaries, 70% of germaria (432/618) met this criterion (Line 103). Each of them contained an average of 1.5 SGCs (Figure 1K).

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      Our attempts to induce ovarian hs-FLP germline clones by heat-shocking adult flies were unsuccessful, with very few clones being observed. Therefore, we shifted our approach to an earlier developmental stage. Successful induction was achieved by subjecting late-L3/early-pupal animals to a twice-daily heat shock at 37°C for 6 consecutive days (2 hours per session with a 6-hour interval, see Lines 331-335) (Zhao et al., 2018).

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      These 20-25% of SGCs are bamP-GFP<sup>+</sup> dad-lacZ<sup>-</sup>, not bam<sup>+</sup> dad-lacZ<sup>+</sup> (see Figure 2C and 3D). They would be cystoblast-like cells that may have initiated a differentiation program toward forming germline cysts (see Lines 122-130). The 70-75% of SGCs that have low BMP signaling exhibit GSC-like properties, including: 1) dot-like spectrosomes; 2) dad-lacZ positivity; 3) absence of bamP-GFP expression. While additional markers would be beneficial, we think that this combination of properties is sufficient to classify these cells as GSC-like.

      (4) All experiments except Figure 1I (which shows a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than Figure 1) with hs-flp?

      Yes, we initially identified the SGC phenotype through hs-FLP-mediated mosaic analysis of bam or bgcn mutants in ovaries. However, as noted in our response to Weakness (2), this approach was very labor-intensive. Therefore, we switched to using the more convenient nos>FLP system for subsequent experiments. In our observations, there was no difference between these two approaches in inducing the SGC phenotype.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at a young female (like 2-day-old). I assume that the nos>flp is working in larval and pupal stages, and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      These are very good questions. The SGC phenotype was consistent over the 14-day analysis period (Figure 1J) and was specifically dependent on the presence of germline tumor clones. In 14-day-old fly ovaries, these clones were both larger and more frequent than in younger flies. This age-dependent enhancement in clone size and frequency significantly improved our quantification efficiency (see Lines 101-112).

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact the clonal analyses diagrammed in Figure 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated, so it is not possible to discern one vs two copies of GFP.

      Thank you for this valuable comment. It was also difficult for us to distinguish 1 and 2 copies of GFP in the Drosophila ovary. In Figure 4A-F, to resolve this problem, we used a triple-color system, in which red germ cells (RFP<sup>+/+</sup> GFP<sup>-/-</sup>) are bam mutant, yellow germ cells (RFP<sup>+/-</sup> GFP<sup>+/-</sup>) are wild-type, and green germ cells (RFP<sup>-/-</sup> GFP<sup>+/+</sup>) are punt or med mutant. In Figure 4G-J, we quantified the SGC phenotype only in black germ cells (GFP<sup>-/-</sup>), which are wild-type (control) or mad mutant. In Figure 6, we quantified the SGC phenotype only in green germ cells (both GFP<sup>+/+</sup> and GFP<sup>+/-</sup>), all of which are wild-type.

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with the dpp-lacZ enhancer trap in Figure 5A, B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries, and yet LacZ is very faint in Figure 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues, including the ovary.

      Thank you for this critical comment. The settings of immunofluorescent staining and confocal parameters in the original Figure 5A were the same as those in 5B. In our observations, the levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results from the revised Figure 5. Instead, the HCR-FISH data have been added (Figure 5A-D) to support that bam mutant germline tumors secrete BMP ligands.

      (8) In Figure 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      No. Given that bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in inducing the SGC phenotype (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would be redundant and would not alter the key conclusion of our study. Thank you for your understanding!

      Reviewer #2 (Public review):

      While the study by Zhang et al. provides valuable insights into how germline tumors can non-autonomously suppress the differentiation of neighboring wild-type germline stem cells (GSCs), several conceptual and technical issues limit the strength of the conclusions.

      Major points:

      (1) Naming of SGCs is confusing. In line 68, the authors state that "many wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology." However, bam or bgcn mutant GSCs are also referred to as "SGCs," which creates confusion when reading the text and interpreting the figures. The authors should clarify the terminology used to distinguish between wild-type SGCs and tumor (bam/bgcn mutant) SGCs, and apply consistent naming throughout the manuscript and figure legends.

      We apologize for any confusion. In our manuscript, the term "SGC" is reserved specifically for wild-type germ cells that maintain a GSC-like morphology outside the niche. bam or bgcn mutant germ cells are referred to as GSC-like tumor cells (Lines 89-90), not SGCs.

      (a) The same confusion appears in Figure 2. It is unclear whether the analyzed SGCs are wild-type or bam mutant cells. If the SGCs analyzed are Bam mutants, then the lack of Bam expression and failure to differentiate would be expected and not informative. However, if the SGCs are wild-type GSCs located outside the niche, then the observation would suggest that Bam expression is silenced in these wild-type cells, which is a significant finding. The authors should clarify the genotype of the SGCs analyzed in Figure 2C, as this information is not currently provided.

      The SGCs analyzed in Figure 2A-C are wild-type, GSC-like cells located outside the niche. They were generated using the same genetic strategy depicted in Figures 1C and 1E (with the schematic in Figure 1B). The complete genotypes for all experiments are available in Source data 1.

      (b) In Figures 4B and 4E, the analysis of SGC composition is confusing. In the control germaria (bam mutant mosaic), the authors label GFP⁺ SGCs as "wild-type," which makes interpretation unclear. Note, this is completely different from their earlier definition shown in line 68.

      The strategy to generate SGCs in Figure 4B-F (with the schematic in Figure 4A) is different from that in Figure 1C-F, H, and I (with the schematic in Figure 1B). In Figure 4B-F, we needed to distinguish punt<sup>-/-</sup> (or med<sup>-/-</sup>) from punt<sup>+/-</sup> (or med<sup>+/-</sup>) germ cells. As noted in our response to Reviewer #1’s Weakness (6), it was difficult for us to distinguish one and two copies of GFP in the Drosophila ovary. Therefore, we chose to use the triple-color system to distinguish these germ cells in Figure 4B-F (see genotypes in Source data 1).

      (c) Additionally, bam<sup>+/-</sup> GSCs (the first bar in Figure 4E) should appear GFP<sup>+</sup> and Red<sup>+</sup> (i.e., yellow). It would be helpful if the authors could indicate these bam<sup>+/-</sup> germ cells directly in the image and clarify the corresponding color representation in the main text. In Figure 2A, although a color code is shown, the legend does not explain it clearly, nor does it specify the identity of bam<sup>+/-</sup> cells alone. Figure 4F has the same issue, and in this graph, the color does not match Figure 4A.

      The color-to-genotype relationships for the schematics in Figures 2A and 4E are provided in Figures 1B and 4A, respectively. Due to the high density of germ cells, it is impractical to label each genotype directly in the images. In contrast to Figure 4E, the colors in Figure 4F do not represent genotypes; instead, blue denotes the percentage of SGCs, and red denotes the percentage of germline cysts, as indicated below the bar chart.

      (2) The frequencies of bam or bgcn mutant mosaic germaria carrying [wild-type] SGCs or wild-type germ cell cysts with branched fusomes, as well as the average number of wild-type SGCs per germarium and the number of days after heat shock for the representative images, are not provided when Figure 1 is first introduced. Since this is the first time the authors describe these phenotypes, including these details is essential. Without this information, it is difficult for readers to follow and evaluate the presented observations.

      Thank you for this constructive suggestion. These quantification data have been added to the revised Figure 1 (Figure 1J, K).

      (3) Without the information mentioned in point 2, it causes problems when reading through the section regarding [wild-type] SGCs induced by impairment of differentiation or dedifferentiation. In lines 90-97, the authors use the presence of midbodies between cystocytes as a criterion to determine whether the wild-type GSCs surrounded by tumor GSCs arise through dedifferentiation. However, the cited study (Mathieu et al., 2022) reports that midbodies can be detected between two germ cells within a cyst carrying a branched fusome upon USP8 loss.

      Unlike wild-type cystocytes, which undergo incomplete cytokinesis and lack midbodies, those with USP8 loss undergo complete cell division, with the presence of midbodies (white arrow, Figure 1F’ from Mathieu et al., 2022) as a marker of the late cytokinesis stage (Mathieu et al., 2022).

      (a) Are wild-type germ cell cysts with branched fusomes present in the bam mutant mosaic germaria? What is the proportion of germaria containing wild-type SGCs versus those containing wild-type germ cell cysts with branched fusomes?

      (b) If all bam mutant mosaic germaria carry only wild-type GSCs outside the niche and no germaria contain wild-type germ cell cysts with branched fusomes, then examining midbodies as an indicator of dedifferentiation may not be appropriate.

      We appreciate your critical comment. bam mutant mosaic germaria indeed contained wild-type germline cysts, as evidenced by an SGC frequency of ~70%, rather than 100% (see Figures 2H, 4F, 4J, 6F, 6I, and Figure 6-figure supplement 3C). Since the SGC phenotype depends on the presence of bam or bgcn mutant germline tumors, we quantified it as “the percentage of SGCs relative to the total number of SGCs and germline cysts that are surrounded by germline tumors” (see Lines 103-108). Quantifying the SGC phenotype as "the percentage of germaria with SGCs" would be imprecise. This is because the presence and number of SGCs were variable among germaria with bam or bgcn mutant germline clones, and a small number of germaria entirely lacked these clones. The data of "SGCs per germarium with both germline clones and out-of-niche wild-type germ cells" have been added to the revised Figure 1 (Figure 1K).

      (c) If, however, some germaria do contain wild-type germ cell cysts with branched fusomes, the authors should provide representative images and quantify their proportion.

      Such germaria could be found in Figure 2G, 3B, 3C, 6D, 6E, and 6H. The percentage of germline cysts can be calculated by “100% - SGC%”.
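
      The relationship between the two percentages can be illustrated with a minimal sketch. It assumes only the definition quoted above (SGCs relative to the total of SGCs plus germline cysts surrounded by germline tumors); the function name and the counts are hypothetical and are not taken from Source data 2.

```python
def sgc_percentage(n_sgc: int, n_cysts: int) -> float:
    """Percentage of SGCs among all tumor-surrounded wild-type units.

    n_sgc   -- number of wild-type SGCs surrounded by germline tumors
    n_cysts -- number of wild-type germline cysts surrounded by germline tumors
    """
    total = n_sgc + n_cysts
    if total == 0:
        raise ValueError("no tumor-surrounded wild-type germ cells were scored")
    return 100.0 * n_sgc / total


# Hypothetical counts, chosen only to illustrate the arithmetic.
sgc_pct = sgc_percentage(n_sgc=70, n_cysts=30)
cyst_pct = 100.0 - sgc_pct  # germline cyst percentage = 100% - SGC%
print(f"SGC: {sgc_pct:.1f}%, cysts: {cyst_pct:.1f}%")
```

      Because the denominator counts germ cell units rather than germaria, the value is unaffected by germaria that lack germline clones altogether.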

      (d) In line 95, although the authors state that 50 germ cell cysts were analyzed for the presence of midbodies, it would be more informative to specify how many germaria these cysts were derived from and how many biological replicates were examined.

      As noted in our response to points a) and b) above, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for analyzing the phenotype. For this experiment, we examined >50 such germline cysts via confocal microscopy. As the analysis was performed on a defined cellular population, this sample size should be sufficient to support our conclusion.

      (4) Note that both bam mutant GSCs and wild-type SGCs can undergo division to generate midbodies (double cells), as shown in Figure 4H. Therefore, the current description of the midbody analysis is confusing. The authors should clarify which cell types were examined and explain how midbodies were interpreted in distinguishing between cell division and differentiation.

      We assayed for the presence or absence of midbodies specifically within the wild-type germline cysts surrounded by bam or bgcn mutant tumors, not within the tumors themselves (Lines 96-97). As detailed in Lines 90-100, the absence of midbodies was used as a key criterion to exclude the possibility of dedifferentiation.

      (5) The data in Figure 5 showing Dpp expression in bam mutant tumorous GSCs are not convincing. The Dpp-lacZ signal appears broadly distributed throughout the germarium, including in escort cells. To support the claim more clearly, the authors should present corresponding images for Figures 5D and 5E, in which dpp expression was knocked down in the germ cells of bam or bgcn mutant mosaic germaria. Showing these images would help clarify the localization and specificity of Dpp-lacZ expression relative to the tumorous GSCs.

      Thank you for your constructive comment. RNA in situ hybridization data have been added to support the conclusion that bam or bgcn mutant germline tumors secrete BMP ligands (Figure 5A-D).

      (6) While Figure 6 provides genetic evidence that bam mutant tumorous GSCs produce Dpp to inhibit the differentiation of wild-type SGCs, it should be noted that these analyses were performed in a dpp⁺/⁻ background. To strengthen the conclusion, the authors should include appropriate controls showing [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] SGCs and [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] germ cell cysts without heat shock (as referenced in Figures 6F and 6I).

      Schematic cartoons in Figure 6A and 6B demonstrate that these analyses were performed in a dpp<sup>+/-</sup> background. Figure 6-figure supplement 1 indicates that dpp<sup>+/-</sup> or gbb<sup>+/-</sup> does not affect GSC maintenance, germ cell differentiation, or female fly fertility. Figure 6C is the control for 6D and 6E, and 6G is the control for 6H, with quantification in 6F and 6I. We used nos>FLP, not the heat shock method, to induce germline clones in these experiments (see genotypes in Source data 1).

      (7) Previous studies have reported that bam mutant germ cells cause blunted escort cell protrusions (e.g., Kirilly et al., Development, 2011), which are known to contribute to germ cell differentiation (e.g., Chen et al., Frontiers in Cell and Developmental Biology, 2022). The authors should include these findings in the Discussion to provide a broader context and to acknowledge how alterations in escort cell morphology may further influence differentiation defects in their model.

      Thank you for teaching us! We now discuss these two papers in the revised manuscript (Lines 197-199).

      (8) Since fusome morphology is an important readout of SGCs vs. differentiation, all the clonal analyses should include fusome staining.

      SGCs are readily distinguishable from multicellular germline cysts based on morphology. In some clonal-analysis experiments, fusome staining was not feasible due to technical limitations such as channel saturation or antibody incompatibility. Thank you for your understanding!

      (9) Figure arrangement. It is somewhat difficult to identify the figure panels cited in the text due to the current panel arrangement.

      The figure panels were arranged to optimize space while ensuring that related panels are grouped in close proximity for logical comparison. We would be happy to consider any specific suggestions for an alternative layout that could improve clarity.

      (10) The number of biological replicates and germaria analyzed should be clearly stated somewhere in the manuscript, ideally in the Methods section or figure legends. Providing this information is essential for assessing data reliability and reproducibility.

      The detailed quantification information is labeled directly in figures or described in figure legends, and all raw quantification data are provided in Source data 2.

      Reviewer #3 (Public review):

      Summary:

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring stem cells.

      Strengths:

      This study addresses an important biological question concerning the interaction between germline tumor cells and WT germline stem cells in the Drosophila ovary. If the findings are substantiated, they could provide valuable insights applicable to other stem cell systems.

      We greatly appreciate your valuable comments.

      Weaknesses:

      Previous work from Xie's lab demonstrated that bam and bgcn mutant GSCs can outcompete WT GSCs for niche occupancy. Furthermore, a large body of literature has established that the interactions between escort cells (ECs) and GSC daughters are essential for proper and timely germline differentiation (the differentiation niche). Disruption of these interactions leads to arrest of germline cell differentiation in a status with weak BMP signaling activation and low bam expression, a phenotype virtually identical to what is reported here. Thus, it remains unclear whether the observed phenotype reflects "direct inhibition by tumor cells" or "arrested differentiation due to the loss of the differentiation niche." Because most data were collected at a very late stage (more than 10 days after clonal induction), when tumor cells already dominate the germarium, this question cannot be solved. To distinguish between these two possibilities, the authors could conduct a time-course analysis to examine the onset of the WT GSC-like single-germ-cell (SGC) phenotype and determine whether early-stage tumor clones with a few tumor cells can suppress the differentiation of neighboring WT GSCs with only a few tumor cells present. If tumor cells indeed produce Dpp and Gbb (as proposed here) to inhibit the differentiation of neighboring germline cells, a small cluster or probably even a single tumor cell generated at an early stage might prevent the differentiation of their neighboring germ cells.

      Thank you for your critical comment. The revised manuscript now includes a time-course analysis of the SGC phenotype (Figure 1J). Our data in Figure 6 demonstrate that BMP ligands from germline tumors are required to inhibit SGC differentiation. Furthermore, we have incorporated into the manuscript the possibility that disruption of the differentiation niche may also contribute to the SGC phenotype (Lines 197-199).

      The key evidence supporting the claim that tumor cells produce Dpp and Gbb comes from Figures 5 and 6, which suggest that tumor-derived dpp and gbb are required for this inhibition. However, interpretation of these data requires caution. In Figure 5, the authors use dpp-lacZ to support the claim that dpp is upregulated in tumor cells (Figure 5A and 5B). However, the background expression in somatic cells (ECs and pre-follicular cells) differs noticeably between these panels. In Figure 5A, dpp-lacZ expression in somatic cells is clearly higher than in 5B, and the expression level in tumor cells appears comparable to that in somatic cells (dpp-lacZ single channel). Similarly, in Figure 5B, dpp-lacZ expression in germline cells is also comparable to that in somatic cells. Providing clear evidence of upregulated dpp and gbb expression in tumor cells (for example, through single-molecule RNA in situ) would be essential.

      We greatly appreciate your critical comment. In our data, the expression levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results in the revised Figure 5. RNA in situ hybridization data have been added to visualize the expression of BMP ligands within bam mutant germline tumor cells (Figure 5A-D).

      Most tumor data present in this study were collected from the bam[86] null allele, whereas the data in Figure 6 were derived from a weaker bam[BG] allele. This bam[BG] allele is not molecularly defined and shows some genetic interaction with dpp mutants. As shown in Figure 6E, removal of dpp from homozygous bam[BG] mutant leads to germline differentiation (evidenced by a branched fusome connecting several cystocytes, located at the right side of the white arrowhead). In Figure 6D, fusome is likely present in some GFP-negative bam[BG]/bam[BG] cells. To strengthen their claim that the tumor produces Dpp and Gbb to inhibit WT germline cell differentiation, the authors should repeat these experiments using the bam[86] null allele.

      Although a structure resembling a "branched fusome" is visible in Figure 6E (right of the white arrowhead), it is an artifact resulting from the cytoplasm of GFP-positive follicle cells, which also stain for α-Spectrin, projecting between germ cells of different clones (see the merged image). In both our previous (Zhang et al., 2023) and current studies, bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in its ability to block GSC differentiation and induce the SGC phenotype (Figure 1J). Given this, we believe that repeating the extensive experiments in Figure 6 with the bam<sup>Δ86</sup> allele would be scientifically redundant and would not change the key conclusion of our study.

      It is well established that the stem niche provides multiple functional supports for maintaining resident stem cells, including physical anchorage and signaling regulation. In Drosophila, several signaling molecules produced by the niche have been identified, each with a distinct function - some promoting stemness, while others regulate differentiation. Expression of Dpp and Gbb alone does not substantiate the claim that these tumor cells have acquired the niche-like property. To support their assertion that these tumors mimic the niche, the authors should provide additional evidence showing that these tumor cells also express other niche-associated markers. Alternatively, they could revise the manuscript title to more accurately reflect their findings.

      Dpp and Gbb are the key niche signals from cap cells for maintaining GSC stemness. Our work demonstrates that germline tumors can specifically mimic this signaling function, not the full suite of cap cell properties, to create a non-cell-autonomous differentiation block. The current title “Tumors mimic the niche to inhibit neighboring stem cell differentiation” reflects this precise concept: a partial, functional mimicry of the niche's most relevant activity in this context. We feel it is an appropriate and compelling summary of our main conclusion.

      In the Method section, the authors need to provide details on how dpp-lacZ expression levels were quantified and normalized.

      Because of the highly variable expression levels in terminal filament and cap cells, we have omitted the dpp-lacZ results in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Not all readers may be familiar with the nos>FLP/FRT or hs-FLP/FRT systems. It would be helpful if the authors could briefly introduce these genetic mosaic systems and explain how they were used in this study before presenting the results.

      Thank you for this constructive suggestion. A brief introduction to these systems has been added to the revised manuscript (Lines 64-70).

      (2) Line 68-70: "Surprisingly, ...outside the niche retained a GSC-like single-germ-cell (SGC) morphology, even when encapsulated within egg chambers (Figure 1C, D, Figure 1-figure supplement 1)."

      (3) The figure citation is not appropriate, as Figures 1C and 1D do not show "single germ cells (SGCs) encapsulated within egg chambers." To improve clarity, the authors could revise the sentence as follows: "Surprisingly, wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology (Figures 1C and D), even when encapsulated within egg chambers (Figure 1-figure supplement 1)." This modification would make the description consistent with the figure content and easier for readers to follow.

      Thank you for teaching us! The manuscript has been revised following this suggestion (Lines 70-73).

      (4) Line 106-110. The description is confusing. The authors state, "Under normal conditions... Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C)." However, Figure 2B shows the bam mutant mosaic germaria, and Figure 2C does not specify the genotypes of the germaria used for the analysis of GSCs, CBs, and SGCs. The authors should clarify the experimental conditions and genotypes corresponding to each panel. In addition, it would be more informative to indicate how many germaria these quantified GSCs, CBs, and SGCs were derived from.

      (5) Throughout the manuscript, the authors report the number of SGCs analyzed (e.g., Lines 149-151). However, it would be more informative to also indicate how many germaria these quantified SGCs were derived from. Providing this information would help readers assess the sampling size and variability across biological replicates.

      Thank you for your suggestion. As shown in Figure 2B, these wild-type (RFP-positive) GSCs and CBs were also derived from bam mutant mosaic germaria. The phrase "under normal conditions" has been deleted from the revised manuscript to prevent any potential ambiguity. Given the specificity of the SGC phenotype, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for its quantification (Lines 103-108). The data of “SGCs per germarium with both germline clones and out-of-niche wild-type germ cells” have been added to the revised Figure 1K.

      Reviewer #3 (Recommendations for the authors):

      (1) Additionally, the authors should clarify what the "red dot" signal in the GFP-positive cap cell in Figure 3 F (left panel) represents.

      The “red dot” is an asterisk that is used to mark a cap cell (Line 620).

      (2) Finally, on line 266, "bamP-GFP-positive" should be corrected to "bamP-GFP-negative."

      It should be “bamP-GFP-positive”, not “bamP-GFP-negative” (see Figure 2B).

      Reference:

      Mathieu, J., Michel-Hissier, P., Boucherit, V., and Huynh, J.R. (2022). The deubiquitinase USP8 targets ESCRT-III to promote incomplete cell division. Science 376, 818-823.

      Zhang, Q., Zhang, Y., Zhang, Q., Li, L., and Zhao, S. (2023). Division promotes adult stem cells to perform active niche competition. Genetics 224.

      Zhao, S., Fortier, T.M., and Baehrecke, E.H. (2018). Autophagy Promotes Tumor-like Stem Cell Niche Occupancy. Curr Biol 28, 3056-3064.e3053.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work provides evidence that slender T. brucei can initiate and complete cyclical development in Glossina morsitans without GlcNAc supplementation, in both sexes, and importantly in non-teneral flies, including salivary-gland infections.

      Comparative transcriptomics show early divergence between slender- and stumpy-initiated differentiation (distinct GO enrichments), with convergence by ~72 h, supporting an alternative pathway into the procyclic differentiation program.

      The work addresses key methodological criticisms of earlier studies and supports the hypothesis that slender forms may contribute to transmission at low parasitaemia.

      Strengths:

      (1) Directly tackles prior concerns (no GlcNAc, both sexes, non-teneral flies) with positive infections through to the salivary glands.

      (2) Transcriptomic time course adds some mechanistic depth.

      (3) Clear relevance to the "transmission paradox"; advances an important debate in the field.

      Weaknesses:

      (1) Discrepancy with Ngoune et al. (2025) remains unresolved; no head-to-head control for colony/blood source or microbiome differences that could influence vector competence.

      We acknowledge that a direct head-to-head comparison was not performed and that microbiome composition can affect vector competence. However, both the tsetse flies used in Ngoune et al. (2025) and those in our study originated from the same colony and were maintained under comparable standard laboratory conditions. In both cases, flies were fed on sheep blood through identical silicone membrane systems, minimizing potential differences.

      (2) Lacks in vivo feeding validation (e.g., infecting flies directly on parasitaemic mice) to strengthen ecological relevance.

      Our study deliberately focused on controlling experimental variables through the use of an artificial feeding system, which allows for standardization of parasite dose and exposure conditions. This approach facilitates reproducibility and direct comparison with previous studies. In addition, it appears questionable to us whether feeding flies on infected laboratory mice really adds ecological relevance.

      (3) Mechanistic inferences are largely correlative (although not requested, there is no functional validation of genes or pathways emerging from the transcriptomics).

      Functional validation of individual genes or pathways was not undertaken in this study. Instead, the aim was to identify and compare transcriptional signatures associated with slender-to-procyclic versus stumpy-to-procyclic differentiation, and to directly address previous criticism of the original finding that slender bloodstream forms are capable of infecting the tsetse fly.

      (4) Reliance on a single parasite clone (AnTat 1.1) and one vector species limits external validity.

      Incorporating additional pleomorphic T. brucei clones and alternative tsetse species would undoubtedly broaden our understanding of parasite-vector interactions, and studies using fresh field isolates and wild-caught tsetse flies would be even more informative. However, in order to directly address the specific concerns raised against our original study (Schuster et al., 2021), it was essential to employ the same parasite clone and vector species.

      We further emphasize that the pleomorphic clone used here is a well-characterized and widely employed T. brucei strain that closely reflects parasites encountered under natural conditions. Likewise, Glossina morsitans represents the standard vector species used in the majority of tsetse laboratories, thereby ensuring reproducibility and facilitating comparison with existing work in the field.

      Reviewer #2 (Public review):

      Summary:

      This paper is an exciting follow-up to two recent publications in eLife: one from the same lab, reporting that slender forms can successfully infect tsetse flies (Schuster, S et al., 2021), and another independent study claiming the opposite (Ngoune, TMJ et al., 2025). Here, the authors address four criticisms raised against their original work: the influence of N-acetyl-glucosamine (NAG), the use of teneral and male flies, and whether slender forms bypass the stumpy stage before becoming procyclic forms.

      Strengths:

      We applaud the authors' efforts in undertaking these experiments and contributing to a better understanding of the T. brucei life cycle. The paper is well-written and the figures are clear.

      Weaknesses:

      We identified several major points that deserve attention.

      (1) What is a slender form? Slender-to-stumpy differentiation is a multi-step process, and most of these steps unfortunately lack molecular markers (Larcombe et al, 2023). In this paper, it is essential that the authors explicitly define slender forms. Which parameters were used? It is implicit that slender forms are replicative and GFP::PAD1-negative. Isn't it possible that some GFP::PAD1-negative cells were already transitioning toward stumpy forms, but not yet expressing the reporter? Transcriptomically, these would be early transitional cells that, upon exposure to "tsetse conditions" (in vitro or in vivo), could differentiate into PCF through an alternative pathway, potentially bypassing the stumpy stage (as suggested in Figure 4). Given the limited knowledge of early molecular signatures of differentiation, we cannot exclude the possibility that the slender forms used here included early differentiating cells. We suggest:

      (1.1) Testing the commitment of slender forms (e.g., using the plating assay in Larcombe et al., 2023), assessing cell-cycle profile, and other parameters that define slender forms.

      (1.2) In the Discussion, acknowledging the uncertainty of "what is a slender?" and being explicit about the parameters and assumptions.

      We appreciate the critical evaluation concerning the identity of slender forms and potential presence of intermediate forms displaying slender morphology yet exhibiting cell-cycle arrest, as proposed in Larcombe et al. (2023). Indeed, our original paper is entitled “Unexpected plasticity in the life cycle of Trypanosoma brucei.” It is precisely this phenotypic plasticity that enables slender parasites to transition directly into the procyclic insect stage. Notably, we have shown that even monomorphic trypanosome strains are capable of undergoing this transition in the fly, and such strains are not considered to represent “intermediate” or “half-stumpy” forms. Consequently, while the question “what constitutes a slender parasite?” may be of conceptual interest, it currently is, in our view, not central to the biological conclusions of this study.

      Nevertheless, we have now included an additional section in our Discussion that compares the slender cells used in our study with the commitment classification introduced by Larcombe et al. Our infection experiments were conducted using cells that meet the Larcombe criteria of “true slender cells”, characterized by the absence of PAD1 expression and the maintenance of a slender morphology (Supplementary Figure 3A, B, following FACS sorting). Moreover, these cells are not cell-cycle arrested but continue to proliferate (Supplementary Figure 3C). Accordingly, our experimental assumptions and parameters align with those of previous studies, in which continuous cell division, lack of cell-cycle arrest, lack of PAD1 expression, and slender morphology are still established markers defining the slender bloodstream form.

      (1.3) Clarifying in the Materials and Methods how cultures were maintained in the 3-4 days prior to tsetse infections, including daily cell densities. Ideally, provide information on GFP expression, cell cycle, and morphology. While this will not fully resolve the concern, it will allow future reinterpretation of the data when early molecular events are better understood.

      We thank the reviewer for this helpful suggestion. Details on the maintenance of T. brucei cultures and culture conditions, including cell density, are provided in our previous publication (Schuster et al., 2021). In the present study, cultures were routinely monitored prior to infection to ensure that the cells used were GFP-negative and exhibited the characteristic slender morphology.

      For infections performed with higher cell numbers, fluorescence-activated cell sorting (FACS) was used to obtain a 100% GFP-negative population, thereby avoiding the need for daily monitoring of GFP fluorescence. This approach ensured that all infection experiments were initiated with a homogeneous population of slender bloodstream forms.

      (2) Figure 1: This analysis lacks a positive control to confirm that NAG is working as expected. It would strengthen the paper if the authors showed that NAG improves stumpy infection. Once confirmed, the authors could discuss possible differences in the tsetse immune response to slender vs. stumpy forms to explain the absence of an effect on slender infections.

      The enhancing effect of N-acetylglucosamine (NAG) on stumpy-form infections of T. brucei is well established and widely accepted in the field (e.g. Peacock et al., 2006, 2012). In the present Research Advance, our objective was to directly address the specific concerns raised in response to our previous publication (Schuster et al., 2021), in which NAG supplementation during stumpy infections was already included and shown to function as expected. Accordingly, the aim here was not to reiterate the established role of NAG in promoting stumpy infections, but rather to directly examine infections initiated by slender bloodstream forms in the absence of NAG, thereby approximating more natural conditions.

      (3) Figure 2. To conclude that teneral flies are less infected than non-teneral flies, data from Figures 1 and 2 must be directly comparable. Were these experiments performed simultaneously? Please clarify in the figure legends. Moreover, the non-teneral flies here are still relatively young (6-7 days old), limiting comparisons with Ngoune, TMJ et al. 2025, where flies were 2-3 weeks old.

      The experiments presented in Figures 1 and 2 were not performed simultaneously. Importantly, the comparison between teneral and non-teneral flies was not intended as a direct quantitative comparison across experiments, but rather to assess infection outcomes under distinct physiological states of the vector. It is well established that teneral flies are generally more susceptible to T. brucei infection than non-teneral flies, a phenomenon commonly referred to as the “teneral phenomenon.”

      Our objective was to demonstrate that slender bloodstream forms are also capable of establishing infections in non-teneral flies, thereby directly addressing concerns raised in the comment on our original study (Schuster et al., 2021) that the experimental set-up may have created an unnaturally permissive environment. The data presented here in fact support the conclusion that slender forms can contribute to disease transmission under more natural conditions.

      A key determinant of the increased susceptibility of teneral flies is the incomplete maturation of the peritrophic matrix (PM) (Walshe et al., 2011; Haines, 2013). In Glossina morsitans morsitans, the PM reaches its full length along the midgut approximately 84 hours post-eclosion (Lehane and Msangi, 1991). In addition, teneral flies have not yet taken a bloodmeal prior to the infective one, a factor known to further increase susceptibility (Haines, 2013).

      In the present paper, non-teneral flies were selected that had received two non-infectious bloodmeals prior to the infective challenge. At 6-7 days post-eclosion, these flies possessed a fully established PM, which is known to increase refractoriness to infection (Walshe et al., 2011), while still being sufficiently young to survive the time required for T. brucei to complete its developmental cycle. This is an important point, as our timing allowed robust interpretation of infection outcomes, without the substantial loss of flies (approximately 40%) that has been reported to occur prior to dissection in Ngoune et al., 2025.

      (4) Figure 3. The PCA plot (A) appears to suggest the opposite of the authors' interpretation: slender differentiation seems to proceed through a transcriptome closer to stumpy profiles. Plotting DEG numbers (panel C) is informative, but how were paired conditions selected? Besides, plotting of the number of DEGs between consecutive time points within and between parasite types is also necessary. There may also be better computational tools to assess temporal relationships. Finally, how does PAD1 transcript abundance change over time in both populations? It would also be important to depict the upregulation of procyclic-specific genes.

      Regarding the PCA plot (Figure 3A), we agree that slender form differentiation transiently exhibits transcriptomic similarities to stumpy form profiles. However, as discussed in the paper, this overlap specifically reflects shared early differentiation responses rather than the adoption of a full stumpy-like transcriptome. The overall trajectory and clustering pattern indicate that slender-derived parasites follow a distinct differentiation path that, as expected, ultimately converges with the procyclic stage, consistent with our interpretation.

      For the DEG analysis (Figure 3C), paired conditions were selected based on biologically meaningful time points corresponding to key stages in the differentiation process, allowing for direct comparisons between slender- and stumpy-derived populations either for the same timepoints following addition of cis-aconitate (Supplementary Figure 5) or timepoints plotting close on the PCA (Supplementary Figure 6).

      We also appreciate the recommendation to consider alternative computational approaches for assessing temporal relationships. While our current analysis provides robust insights into transcriptomic transitions, we agree that future studies employing different tools could further refine our observations.

      Finally, we have included the expression dynamics of PAD1 and PAD2 in the Supplementary Data (Supplementary Figure 8). The expression profile for procyclic-specific genes can now be found in Supplementary Figure 9.

      (5) Could methylcellulose in the medium sensitize parasites to QS-signal, leading to more frequent and/or earlier differentiation, despite low densities? If so, cultures with vs. without methylcellulose might yield different proportions of early-differentiating (yet GFP-negative) parasites. This could explain discrepancies between the Engstler and Rotureau labs despite using the same strain. The field would benefit from reciprocal testing of culture conditions. Alternatively, the authors could compare infectivity and transcriptomes of their slender forms under three conditions: (i) in vitro with methylcellulose, (ii) in vitro without methylcellulose, and (iii) directly from mouse blood.

      The original description of stumpy induction factor (SIF)-mediated quorum sensing in Trypanosoma brucei was performed by the Boshart laboratory using (a) the same cell line employed in the present study and (b) an identical HMI-9 medium supplemented with the same amount of methylcellulose (Reuner et al., 1997; Vassella et al., 1997). All relevant controls were comprehensively reported in those studies in the late 1990s. There is therefore no experimental or historical basis to suggest that methylcellulose sensitises parasites to stumpy differentiation. Moreover, the viscosity of HMI-9-methylcellulose remains well below the threshold required to impose a diffusion barrier for small molecules such as peptides. Consequently, accumulation of SIF as a result of increased medium viscosity can be excluded on physical grounds.

      The present Research Advance was conducted with a focused objective, namely, to directly address the specific concerns raised in response to our original publication (Schuster et al., 2021). Expanding the study to include additional experimental conditions, such as systematic comparisons of cultures grown with and without methylcellulose, or analyses of parasites freshly isolated from mouse blood, would have extended the scope well beyond what is useful for a Research Advance and would have diluted the central purpose of this contribution.

      Recommendations for authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your perseverance in filling the gaps flagged by others - these data strengthen the story.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: The use of teneral flies is not mentioned in the text or the legend

      Thank you: we added this to the main text and figure legend (lines 103 and 140).

      (2) Figure 1 legend (line 2): Typo - "with or 60 nm" should read "with or without 60 nm."

      Thank you: this has been corrected (line 141).

      (3) Figure 2. Please provide the FACS gating strategy and cell numbers before and after sorting

      The cell number before gating is 1x10<sup>7</sup> cells, and 1x10<sup>6</sup> cells were collected via FACS for infection experiments. This is stated in the Materials & Methods section (lines 473 and 478).

      (4) Figure 3. RNAseq data presentation could be improved:

      (a) Clarify which type of differentially expressed genes are shown in panels B and C (presumably those upregulated in slender forms and those upregulated in stumpy forms).

      Thank you: the information has now been added to the figure legend (lines 279 and 282).

      (b) The color code in panel A is inverted relative to panels B and C.

      Thank you: this has been corrected (figure 3B and C).

      (c) The GO-term analysis represents an important conclusion and should be moved to the main figure.

      As a Research Advance, this paper is restricted in the number of figures and therefore the decision had to be made to move the GO-term analysis to the Supplements.

      (d) Provide dataset quality control in the supplement (genes detected per sample, sample consistency, replicate correlations, etc.).

      Sequencing analysis is now explained in detail in the Materials & Methods section (lines 515 - 528).

      (5) Figure legends: Indicate how many times each experiment was performed and the number of independent biological replicates.

      The number of replicates (and flies per replicate) is stated for both infection experiments in the respective figure legends (lines 143 and 203/04). For the RNA sequencing, it is stated in the main text, and we now have also added the information to the figure legend (lines 219 and 276/77).

      (6) Discussion: Despite the ongoing debate about midgut pH, could the authors also comment on other evidence suggesting that stumpy forms are better adapted to the fly?

      The pH of the midgut has been determined by the Acosta-Serrano laboratory. We have cited the paper (Liniger et al., 2003) in lines 328-330 of the discussion. Furthermore, we have discussed the developing mitochondria of stumpy forms, the expression of Krebs cycle enzymes, and the proposed higher resistance to proteolytic stress (Vickerman, 1965; Brown et al., 1973; Hamm et al., 1990; Reuner et al., 1997; Nolan et al., 2000).

    1. Author response:

      Reviewer #1 (Public review):

      (1) While the manuscript convincingly documents distinct expression patterns, the functional consequences of these differences remain unexplored. The conclusions regarding non-redundant roles would benefit from functional perturbation experiments. Relatedly, the authors propose that tnfa and tnfb may play different immunological roles, but the mechanistic basis underlying these differences is not addressed. For example, do the two cytokines engage different receptors or signaling pathways? Do they trigger distinct downstream transcriptional programs?

      We agree that functional analysis of Tnfb is relevant. However, the focus of the current manuscript (Tools and Resources article type) was to report the generation and validation of the new tnfb reporter line, and we feel that functional data are better suited to a separate manuscript. In fact, these data will be part of a follow-up manuscript that will be forthcoming soon.

      (2) Some imaging-based observations appear largely qualitative. Additional quantitative analyses, such as statistical comparisons of expression levels across time points or cell populations, would strengthen the robustness of the conclusions. For instance, in Figure 4, the expression levels of tnfa and tnfb reporter transgenes in immune cells should be quantitatively compared between control and amputated conditions.

      In Figure 4, we focus on which cells express either cytokine, not on when they express it or whether a given cell expresses more or less eGFP/mCherry. In addition, tnfb:mCh-F and tnfa:eGFP-F expression is membrane-bound because these proteins are farnesylated, whereas il1b:eGFP is not farnesylated and has a cytoplasmic distribution. Because of possible biases due to the different distribution or abundance of cytoplasmic versus farnesylated proteins within a cell, we never compared maximum eGFP to maximum mCherry within a treatment group.

      (3) It would also be important to clarify whether the distinct maturation kinetics of the fluorescent reporters were taken into account when interpreting expression timing. Since GFP typically matures more rapidly than mCherry in vivo, the authors should comment on whether this difference could influence the apparent expression kinetics of tnfa versus tnfb.

      In Figure 5, we do count the cells expressing either cytokine, and use the eGFP/mCherry signal to infer how early these cells express the cytokine. However, we do not directly compare maximum eGFP or mCherry fluorescence intensity per cell, which, especially at the early time points, could be biased by differences in protein maturation; we only score the presence of eGFP or mCherry in a cell. We could not directly compare or account for differences in protein maturation, as we do not possess il1b and tnfa transgenic lines driving mCherry expression for comparison (and, to our knowledge, none are available in other laboratories). Based on the results obtained, however, it appears that the earlier maturation of eGFP compared to mCherry may not influence the outcome of the analysis: no single tnfa:eGFP-F+ cells were observed at any time point, and single il1b:eGFP+ cells were observed only 6h after amputation, whereas eGFP/mCherry double-positive cells could be observed as early as 2h after amputation. Any bias should influence the period between 1h and 2h, and we did not look at intervals shorter than 1h.

      Reviewer #2 (Public review):

      (1) Lack of functional analysis; these lines are a potentially valuable tool, but so far provide no clue regarding the role of tnfb. Is it a pro-inflammatory cytokine acting in synergy with tnfa, or is it an antagonist? What are its receptor(s)? What signalling pathways and downstream genes does it induce? Addressing at least some of these questions should greatly increase the impact of the paper.

      Please refer to response to Reviewer #1 point 1.

      We will address the other recommendations to the authors, as they will improve the manuscript.

    1. Author Response:

      eLife assessment:

      The study provides an important advance towards understanding how spatial and temporal transcriptional programs are integrated to regulate lineage-specific chromatin and enhancer activation. The functional evidence is currently incomplete, but the current data provide a solid correlative and conceptual foundation. Functional experiments directly linking Gsb occupancy to chromatin state and regulation of some lineage-specific targets would further strengthen the causal interpretation of the model. Clarifying the scope of conclusions and explicitly acknowledging the technical limitations of current chromatin assays would provide a more balanced interpretation of the manuscript.

      We thank the reviewers and editors for their comments on our manuscript. We address here the concerns raised by them.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      It has long been known that Drosophila embryonic ventral nerve cord neuroblasts incorporate both spatial and temporal transcription factor expression to generate 30 distinct neuroblasts and lineages per hemisegment. This manuscript aims to elucidate the mechanism by which this integration of spatial and temporal transcription factors occurs through "direct regulation" or "epigenetic regulation". Direct regulation is defined as both spatial and temporal factors binding to open chromatin and working together to dictate specific lineages. Epigenetic regulation is defined as a spatial factor priming the chromatin in a neuroblast-specific manner to allow for the integration of temporal factors to generate specific lineages. The authors conclude that there is a two-step model in which a spatial transcription factor code "primes" the chromatin in terms of accessibility and then recruits temporal factors to ensure lineage-specific enhancer activation.

      We thank the reviewer for this clear and succinct summary and for accurately capturing the central idea of the model we propose. In particular, we appreciate that the reviewer highlights the distinction between the previously proposed “direct regulation” and “epigenetic regulation” models, which our work suggests may operate together within neuroblast lineages through a combinatorial spatial transcription factor code.

      Strengths:

      The authors tested two models, "direct regulation" vs "epigenetic regulation" in a well-defined pool of neural stem cells during normal development.

      We thank the reviewer for recognizing this aspect of the study.

      Weaknesses:

      The data in this study cannot clearly substantiate these two models.

      Overall, there are a number of issues that are inconsistent and not supportive of the model proposed in this manuscript. Firstly, there is no evidence of pioneer factor activity in any of the NB lineages described - i.e., any changes in chromatin accessibility being shown over time. The authors must show chromatin conformation changes during the window of spatial transcription factor expression in order to convince the readers of this phenomenon.

      Thank you for raising this point. In most studies, pioneer or chromatin-priming activity is inferred from a transcription factor’s ability to bind regions of relatively low accessibility and to remodel chromatin upon perturbation, rather than from direct developmental time-course measurements of chromatin accessibility.

      In our study we provide two lines of evidence consistent with such activity. First, TaDa profiling shows that Gsb occupies both accessible loci and regions that are relatively less accessible in NB5-6. Second, ectopic expression of Gsb in the non-cognate NB7-4 lineage results in clear chromatin remodelling, with loci both gaining and losing accessibility (Fig. 6). These perturbation experiments demonstrate that Gsb is sufficient to alter chromatin accessibility in vivo and therefore support a chromatin-priming role for it.

      We agree that a developmental time-course would be very informative. The difficulty is that, in this system, the relevant sequence unfolds extremely rapidly and across two different cellular contexts. Spatial transcription factors such as Gsb are expressed in the neuroectoderm, neuroblasts are then specified and delaminate, and Hb expression begins almost immediately after NB formation — on the order of minutes to tens of minutes. Before delamination there is no neuroblast to target with NB-specific drivers, and once the NB forms the temporal program is already underway. More generally, resolving chromatin accessibility changes across this transition would require temporally precise profiling at very high resolution in vivo, likely with live or near-live methods, and is not feasible with the Dam-based lineage-restricted approaches currently available.

      Secondly, the phenotypic data do not align with the sequencing data - the story would be more cohesive if the sequencing data and phenotypic data were in the same NB subtypes. On one hand, we are shown that Gsb misexpression induces loss of chromatin accessibility in NB7-4; however, in the widespread loss model, we are not shown a phenotype in these NB7-4 - which suggests that the chromatin accessibility at these sites (sites that have already been distinguished as SoIs for that NB subtype) does not play an important role in distinguishing NB7-4 identity. However, the authors report loss of NB3-5 identity but have no evidence as to how the chromatin has changed (or if it has at all) in that subtype, leaving the readers to wonder how the loss of identity occurred.

      Thank you for raising this point regarding the alignment between the chromatin and phenotypic analyses. The reviewer’s comment made us realise that the rationale for these experiments may not have been sufficiently clear in the original manuscript and could therefore be perceived as misaligned. We therefore explain the logic of the experimental design here and will edit the manuscript in the revision to clarify this point for readers.

      The chromatin experiments were designed to test whether Gsb is capable of remodelling chromatin when introduced into a non-cognate lineage. For this purpose, NB7-4 provided a suitable lineage with clean genetic access for TaDa/CATaDa experiments, allowing us to assess whether ectopic Gsb expression can alter chromatin accessibility in vivo.

      The functional role of Gsb, however, was examined within the spatial domain in which it is normally expressed. We knocked down Gsb broadly and early in development and assayed its effects on NB5-6. Consistent with its established role in row-5/6 patterning, reduction of Gsb disrupted the specification of NB5-6 identity. In the converse experiment, broad misexpression of Gsb led to a partial expansion of NB5-6 markers. Because spatial patterning in the ventral nerve cord is organized into mutually exclusive row identities, changes in NB5-6 specification can be accompanied by reciprocal effects in neighbouring lineages. In our experiments, this is reflected in changes in markers of adjacent identities, particularly NB3-5. For this reason, NB3-5 markers provide a sensitive and informative readout of altered NB5-6 specification in the phenotypic analyses.

      We recognize that this point may not have been clear in the original manuscript. To avoid similar confusion for readers, we will make this reasoning explicitly clear in the revision.

      Reviewer #2 (Public review):

      Summary:

      This article by Bhattacharya et al. investigates how neural stem cells (NSCs, NBs) in Drosophila integrate spatial and temporal cues to activate neuron-specific terminal selector (TS) genes. Prior to this work, it was understood that NSCs utilize spatial transcription factors (STFs) and temporal transcription factors (TTFs) to determine lineage identity and birth order, but the mechanisms of integration were not fully elucidated. The authors employed chromatin profiling techniques to analyze the binding of STFs and TTFs in two specific neuroblast lineages, NB5-6 and NB7-4. They found that Gsb (an STF) binds both accessible and less-accessible chromatin in NB5-6, while En (another STF) binds only to pre-accessible chromatin in NB7-4. The findings support an "STF code" where the combination of pioneer and non-pioneer spatial factors, along with temporal factors, triggers neuroblast-specific enhancer activation and determines lineage identity.

      We appreciate the reviewer’s careful summary of our findings and their clear articulation of the STF-code framework that emerges from the work.

      Strengths:

      The experiments are well-executed, the interpretations are generally sound, and the figures are clear and elegant. However, some conclusions are drawn too broadly without essential functional data. Therefore, additional work is needed to more effectively convey the central message.

      We thank the reviewer for their positive assessment of the experiments, interpretation, and figures, and we respond to their specific concerns below.

      Weaknesses:

      (1) Integration of TaDa and functional data on Gsb for the STF model

      The authors demonstrate that TaDa profiling maps Gsb binding across the genome and identifies candidate chromatin-priming sites in NB5-6. Gsb LOF/GOF experiments reveal effects on NB identity. Combining TaDa data with LOF and GOF analyses indicates that Gsb influences NB5-6 specification by binding to both open and relatively closed chromatin, helping maintain NB5-6 identity while limiting NB3-5 fate.

      However, the study does not establish a direct link between specific LOF/GOF phenotypes and particular genomic targets. For instance, analyzing Gsb occupancy at lineage-specific identity factors or terminal selector genes (such as Lbe, Ap, or Eya for NB5-6; and Ems, etc., for NB3-5) in wild-type and manipulated conditions (Gsb misexpression) would directly connect chromatin binding to the regulation of fate determinants. These investigations would strengthen the mechanistic connection between the correlative TaDa profiles and the observed identity changes, supporting the idea that Gsb functions as a context-dependent chromatin-priming factor within the STF code, rather than as a generic transcription factor.

      We thank the reviewer for this very helpful suggestion. We agree that illustrating how the TaDa binding profiles relate to known lineage determinants will help connect the genome-wide chromatin data to the developmental phenotypes. In the revision, we will therefore examine Gsb occupancy at several genes associated with NB5-6 and NB3-5 identity (including Lbe, Ap, Eya, and Ems).

      (2) Gsb misexpression reveals bidirectional chromatin remodelling

      Experiments with ectopic Gsb expression demonstrate bidirectional chromatin remodeling in NB7-4, showing decreases in accessibility at some binding sites and increases at others. While the authors show that Gsb can disrupt chromatin upon misexpression, interpreting its "pioneer-like" or chromatin-priming activity is complex due to several factors: the misexpression occurs in a non-native lineage, the direct versus indirect effects rely on whole-embryo Dam-Gsb peaks instead of NB7-4-specific binding, and heat-shock-induced chromatin changes are not fully accounted for. These issues make it challenging to definitively determine Gsb's role in chromatin priming.

      A complementary approach would be to perform Gsb knockdown/loss-of-function in its native NB5-6 lineage and profile chromatin accessibility (TaDa or CATaDa). This would allow a cleaner, more physiologically relevant assessment of Gsb's contribution to priming, SoI establishment, and Hb recruitment. Such an experiment would strengthen the causal link between Gsb occupancy and chromatin state and clarify whether Gsb truly acts as a context-dependent pioneer in vivo, rather than producing indirect effects due to ectopic misexpression.

      We thank the reviewer for this thoughtful comment. We agree that the ectopic Gsb misexpression experiment in NB7-4 should be interpreted as a test of chromatin-remodelling capacity rather than as a fully physiological assay of Gsb function in its native NB5-6 context. At the same time, we note that ectopic expression in a non-native lineage is a standard approach used to assess pioneering or chromatin-remodelling capacity, precisely because it tests whether a factor can alter chromatin outside its endogenous setting. In the revision, we will explicitly discuss this distinction.

      We also agree that NB7-4-specific Gsb occupancy under misexpression would provide a cleaner distinction between direct and indirect effects. In the current manuscript, we infer likely direct effects from overlap with whole-embryo Gsb Dam profiles: loci that lose accessibility upon Gsb misexpression overlap whole-embryo Gsb binding, whereas loci that gain accessibility generally do not. We interpret this as support for the idea that decreased accessibility is more likely to reflect direct Gsb action, whereas increased accessibility is more likely to be indirect. We will clarify this logic in the revision.
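
      The overlap logic described above can be sketched as follows. This is a minimal illustration only, not the analysis pipeline used in the manuscript: the interval coordinates, the function names (`overlaps`, `fraction_overlapping`), and the "any overlap counts" criterion are all assumptions introduced for the example.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(loci: List[Interval], peaks: List[Interval]) -> float:
    """Fraction of loci that overlap at least one binding peak."""
    if not loci:
        return 0.0
    hits = sum(any(overlaps(locus, peak) for peak in peaks) for locus in loci)
    return hits / len(loci)


# Toy data (invented for illustration; not values from the study)
gsb_peaks = [("2R", 100, 200), ("3L", 500, 650)]   # whole-embryo Dam-Gsb peaks
lost_loci = [("2R", 150, 250), ("3L", 600, 700)]   # loci losing accessibility
gained_loci = [("2R", 900, 1000), ("X", 100, 200)]  # loci gaining accessibility

lost_frac = fraction_overlapping(lost_loci, gsb_peaks)
gained_frac = fraction_overlapping(gained_loci, gsb_peaks)
print(f"lost: {lost_frac:.2f}, gained: {gained_frac:.2f}")
```

      In this toy case, a high overlap fraction for loci that lose accessibility and a low fraction for loci that gain accessibility would correspond to the pattern we interpret as direct versus indirect effects.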

      Regarding the reviewer’s suggestion of profiling chromatin accessibility after Gsb loss in native NB5-6, we completely agree that this would be an important complementary experiment. However, this experiment is not currently possible in our system. Gsb is required before NB specification/delamination, whereas available NB5-6 Gal4 drivers turn on only after this stage, precluding the use of RNAi. Early mutant analysis is also technically difficult because homozygous mutant embryos cannot be readily identified at the required stage, and the TaDa/CATaDa approach in this system requires large amounts of input material collected during the very short Hb window. We also tested an early CRISPR-based strategy using maternally contributed Cas9, but in this context the NB5-6 driver is lost, preventing TaDa/CATaDa profiling. We will therefore revise the manuscript to acknowledge that the current misexpression data support chromatin-remodelling capacity and are consistent with context-dependent priming, while not definitively establishing endogenous priming activity in NB5-6.

      (3) En is not a pioneer factor

      The authors conclude that Engrailed (En) is not a pioneer factor, based on the observation that En binding correlates with accessible chromatin and that En is not enriched at NB5-6-specific SOIs. However, this conclusion is not sufficiently supported by the functional data.

      We thank the reviewer for raising this point. We agree that, in several places, our wording was stronger than warranted by the data. For example, we stated that this pattern “argues against a pioneer role for En” and that the results “indicate that En does not act as a pioneer factor.” We agree that these statements are too definitive given the current evidence. Below, we address each of the reviewer’s specific concerns and explain the reasoning behind our original interpretation.

      First, the absence of En binding at NB5-6-specific SOIs does not necessarily indicate an inability to engage closed chromatin. These regions were not selected for the presence of En consensus motifs, so their lack of occupancy may simply reflect the absence of En binding motifs rather than a lack of pioneering capacity. A systematic motif analysis at NB5-6-specific SOIs is needed to determine whether En binding sites are present but unoccupied.

      We agree that the absence of En binding at NB5-6-specific SOIs alone would not be sufficient to infer a lack of pioneering activity, particularly if these loci do not contain En consensus motifs. That observation was only the starting point for our interpretation. Our reasoning was based on several additional lines of evidence from the genome-wide analysis:

      (1) When we examined En binding genome-wide, we consistently found that En occupancy in NB7-4 is restricted to regions of accessible chromatin.

      (2) Loci that are less accessible in NB7-4 show no detectable En occupancy.

      (3) Accessibility is strongly predictive of En binding: chromatin accessibility is markedly higher at En-bound loci than at En-unbound loci.

      Taken together, these patterns suggested to us that En binding in this lineage occurs primarily at pre-accessible chromatin rather than at less accessible regions that would require priming.
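
      The comparison in point (3) can be illustrated with a small sketch. The accessibility values and the bound/unbound labels below are invented for illustration and are not data from the manuscript; the use of the median as a summary statistic is likewise an assumption.

```python
from statistics import median

# Hypothetical table: (accessibility signal, En-bound?) per locus
loci = [
    (8.2, True), (7.5, True), (6.9, True),
    (1.1, False), (0.8, False), (2.4, False),
]

# Split accessibility values by En occupancy
bound = [signal for signal, is_bound in loci if is_bound]
unbound = [signal for signal, is_bound in loci if not is_bound]

median_bound = median(bound)
median_unbound = median(unbound)
print(f"median accessibility: bound={median_bound}, unbound={median_unbound}")
```

      A markedly higher summary value at bound than at unbound loci is the kind of pattern that informed our interpretation; on real data, a formal statistical test would be needed.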

      Our interpretation was also guided by the broader literature. To our knowledge, neither Drosophila Engrailed nor its vertebrate homologues (EN1/EN2) have been reported to bind nucleosome-occluded DNA or initiate chromatin opening, which further informed our original interpretation.

      That said, we agree with the reviewer that these observations are suggestive rather than definitive. We will therefore temper the language throughout the manuscript so that we do not make categorical claims about En lacking pioneer activity. We will also perform the suggested motif analysis at NB5-6-specific SOIs to determine whether En binding motifs are present at these loci, which should help clarify whether the lack of En occupancy reflects motif availability or chromatin state.

      Second, the claim that En lacks pioneer activity relies solely on a single steady-state TaDa/DamID occupancy assay at one developmental stage. Because pioneer factor interactions can be transient, low-affinity, and stage-specific, such binding may not be detected by TaDa, which also depends on local GATC density and methylation kinetics and may yield false negatives. Given these technical limitations, the absence of En binding at less accessible regions does not definitively rule out a priming role.

      We take the reviewer’s point that our data cannot definitively rule out En as a pioneer. At the same time, it may be useful to clarify that TaDa is not a snapshot assay. Because Dam-mediated methylation accumulates over time while the fusion protein is expressed, even weak or transient interactions can leave a detectable signal when averaged across many cells and across the duration of the expression window.

      This cumulative nature of the assay is why our consistent observation of strong enrichment of En at accessible loci, and no detectable enrichment at less accessible regions across the genome, led us to infer that En binding in NB7-4 is strongly conditioned on chromatin accessibility. We nevertheless agree that this does not definitively exclude rare or transient interactions below the detection threshold of the assay, and we will temper the language in the manuscript accordingly.

      In the absence of direct functional assays (En LOF/GOF), the authors should explicitly acknowledge these technical and conceptual limitations and tone down the claim that "En lacks pioneer activity".

      Yes, we will do that!

      (4) Clarity of STF-code Model and Central Message

      The manuscript begins by presenting two models, direct and epigenetic, but the central takeaway of the paper is not clear. Specifically, the nuanced roles of the spatial factors Gsb and En as chromatin-priming versus stabilizing/effector factors within an STF code, and the resulting division of labor, are not clearly illustrated. The distinction between Gsb as a chromatin-priming factor and En as a cofactor-dependent activator/stabilizer should be explicitly presented in a stepwise model for better clarity. The authors could strengthen this by providing a schematic with two sequential stages illustrating how neuroblast identity factors (STF code) change chromatin states to drive lineage-specific enhancer activation. The schematic can be shown from the neuroectoderm to individual NB lineages to make it more panoramic.

      We thank the reviewer for this suggestion and for clearly articulating the conceptual point. As the reviewer points out, the literature has generally framed spatial–temporal integration as two alternative models—direct regulation at pre-accessible enhancers versus epigenetic priming by spatial factors. Our results suggest that elements of both mechanisms may operate within a lineage through a combinatorial STF code, with different spatial factors playing distinct roles (for example, Gsb contributing to chromatin priming, while En acts primarily at pre-accessible enhancers together with Hb). We agree that this central idea would benefit from being illustrated more explicitly. In the revision we will add a schematic summarizing this proposed two-step model and clarify the relevant parts of the text.

      (5) Identification of Priming Factors in NB7-4

      While the authors suggest that an unknown priming factor might be responsible for establishing sites of integration in NB7-4, they do not identify or explore potential candidates for this role. Further investigation into what factors might be involved in chromatin priming in NB7-4 could provide a more complete understanding of the mechanisms at play.

      We agree that identifying the factor responsible for establishing sites of integration in NB7-4 would be very informative. However, doing so would require substantial additional experiments to systematically test candidate spatial factors and assess their effects on chromatin accessibility in this lineage. Our goal in the present study was to establish how spatial and temporal cues are integrated at lineage-specific enhancers rather than to fully dissect all components of the STF code in each lineage. Identifying the priming factor in NB7-4 is therefore an important next step that we intend to pursue in future work, and we will clarify this point in the Discussion.

      (6) Functional Validation of STF Code Components

      The study proposes an STF code for each neuroblast lineage, but the specific components of these codes, beyond Gsb and En, are not fully explored. Identifying and validating additional factors that contribute to the STF code in each lineage could strengthen the conclusions.

      We agree that identifying additional components of the STF codes operating in each lineage would be very informative. Our goal in this study was not to comprehensively define all spatial factors involved in each lineage, but rather to understand how spatial and temporal inputs are integrated at lineage-specific enhancers. By examining two well-characterized spatial factors with distinct properties -- Gsb in NB5-6 and En in NB7-4 -- we aimed to illustrate how different members of an STF code can play distinct roles in shaping chromatin accessibility and enhancer activation. Identifying additional factors that contribute to these lineage-specific codes will be an important direction for future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript titled "Dynamic Architecture of Mycobacterial Outer Membranes Revealed by All-Atom Simulations", Brown et al. describe outcomes of all-atom simulation of a model outer membrane of mycobacteria. This compelling study provided three key insights:

      (1) The likely conformation of mycolic acids, the unusually long-chain alpha-branched beta-hydroxy fatty acids of the mycomembrane, is the extended U or Z type rather than the compacted W type. (2) Outer leaflet lipids such as PDIM and PAT provide regional vertical heterogeneity and disorder in the mycomembrane that is otherwise prevented in a mycolic acid-only bilayer. (3) Removal of specific lipid classes from the symmetric membrane systems leads to significant changes in membrane thickness and resilience to high temperatures.

      In addition to the three key insights, we would like to add one more: (4) the asymmetric mycomembrane exhibits a phase transition from a disordered outer leaflet to an ordered inner leaflet.

      Strengths:

      The authors take a step-wise approach in building the complexity of the membrane and highlight the limitations of each of the approaches. A case in point is the use of supraphysiological temperature of 333 K or even higher temperatures for some of the simulations. Overall, this is a very important piece of work for the mycobacterial field, and will help in the development of membrane-disrupting small molecules and provide important insights for lipid-lipid interactions in the mycomembrane.

      We appreciate the Reviewer’s positive view of our work.

      Weaknesses:

      (1) The authors used alpha-mycolic acids only for their models. The ratios of alpha, keto, and methoxy-mycolic acids are known in the literature, and it may be worth including these in their model. Future studies can be aimed at addressing changes in the dynamic behavior of the MOM by altering this ratio, but the inclusion of all three forms in the current model will be important and may alter the other major findings of the current study.

      We agree that adjusting the ratios of mycolates may impact the dynamic behavior of the MOM. However, including various ratios of these lipids would require substantial additional work and introduce unnecessary complexity to our model; believe it or not, the current work took more than 3 years. Investigations into the effects of mycolate structure in the MOM would be interesting and suitable for future studies.

      (2) The findings from the 14 different symmetric membrane systems developed with the removal of one complex lipid at a time are very interesting but have not been analysed/discussed at length in the current manuscript. I find many interesting insights from Figures S3 and S5, which I find missing in the manuscript. These are as follows:

      (a) Loss of PDIM resulted in reduced membrane thickness. This is a very important finding given that loss of PDIM can be a spontaneous phenomenon in Mtb cultures in vitro and that this is driven by increased nutrient uptake by PDIM-deficient bacilli (Domenech and Reed, 2009 Microbiology). While the latter is explained by the enhanced solute uptake by several PE/PPE transporter systems in the absence of PDIM (Wang et al, Science 2020), the findings presented by Brown et al could be very important in this context. A discussion on these aspects would be beneficial for the mycobacterial community.

      Following the Reviewer’s suggestion, we have added the following to the Discussion section.

      “The outer leaflet symmetric bilayers, comprised of trehalose-derived glycolipids and PDIMs, reveal PDIM-dependent thickness. As observed in both symmetric outer leaflet systems and asymmetric systems, PDIM migrates to the bilayer midplane, causing the upper leaflet to bulge and increasing the overall thickness. Reduced thickness in the systems lacking PDIM, an important virulence factor for Mtb, may allow for higher nutrient uptake. This corroborates a 2009 study in which Domenech and Reed found a correlation between PDIM absence in vitro and attenuated virulence (Domenech and Reed, 2009).”

      (b) I find it interesting that loss of PAT or DAT does not change membrane thickness (Figure S3). While both PAT and PDIM can migrate to the interleaflet space, loss of PDIM and PAT has a different impact on membrane thickness. It is worth explaining what the likely interactions are that shape membrane thickness in the case of the modelled MOM.

      We have added the following to the section titled “Outer leaflet lipids drive unexpected membrane heterogeneity and softness of the Mycomembrane”.

      “Although PAT also migrates to the bilayer midplane, the PAT-deficient bilayers did not exhibit reduced thickness as the PDIM-deficient bilayers did (Supporting Information Table S1). This may be due to fewer PAT than PDIM moving to the bilayer midplane. In the All_Lipids systems, PDIM migrates first, bulging the upper leaflet and reducing lipid headgroup crowding (Supporting Information Figs. S5, S6). In this slightly less crowded environment, hydrophobic forces from PAT’s tails overcome the hydrophilic forces from the trehalose headgroup, causing some PATs to move deeper into the hydrophobic region.”

      (c) Figure S5: Is the presence of SGL driving PDIM and PAT to migrate to the inter-leaflet space? Again, a discussion on major lipid-lipid interactions driving these lipid migrations across the membrane thickness would be useful.

      We have added the following to the section titled “Outer leaflet lipids drive unexpected membrane heterogeneity and softness of the Mycomembrane”.

      “Additionally, in SGL-deficient bilayers, fewer PDIMs and PATs move to the bilayer midplane. This may be due to the highly methylated lipid tails of SGL. When present in the bilayer, these methyl groups may disrupt lipid packing and increase fluidity, allowing more PDIMs to move into the hydrophobic region. Supporting Information Figure S8 shows the average lipid order parameter along each lipid tail for all outer leaflet symmetric systems. Without SGL, lipid tails are consistently more ordered, supporting the notion that SGL’s methylated tails are disrupting lipid packing. Further studies are necessary to investigate the effect of glycolipid-deficient compositions on the dynamic properties of the asymmetric MOM.”

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports all-atom molecular dynamics simulations on the outer membrane of Mycobacterium tuberculosis. This is the first all-atom MD simulation of the MTb outer membrane and complements the earlier studies, which used coarse-grained simulation.

      The Reviewer is correct in that this is the first MD simulation of the Mtb outer membrane with diverse lipid types.

      Strengths:

      The simulation of the outer membrane consisting of heterogeneous lipids is a challenging task, and the current work is technically very sound. The observation about membrane heterogeneity and ordered inner leaflets vs disordered outer leaflets is a novel result from the study. This work will also facilitate other groups to work on all-atom models of mycobacterial outer membrane for drug transport, etc.

      We appreciate the Reviewer’s positive view of our work.

      Weaknesses:

      Beyond a challenging simulation study, the current manuscript only provides qualitative explanations on the unusual membrane structure of MTb and does not demonstrate any practical utility of the all-atom membrane simulation. It will be difficult for the general biology community to appreciate the significance of the work, based on the manuscript in its current form, because of the high content of technical details and limited evidence on the utility of the work.

      Major Points:

      (1) The simulation by Basu et al (Phys Chem Chem Phys 2024) has studied drug transports through mycolic acid monolayers. Since the authors of the current study have all atom models of MTb outer membrane, they should carry out drug transport simulations and compare them to the outer membranes of other bacteria through which drugs can permeate. In the current manuscript, it is only discussed in lines 388-392. Can the disruption of MA cyclopropanation be simulated to show its effect on membrane structure?

      We acknowledge the potential for simulations of drug transport through our MOM model. However, we believe with the current timescale, these simulations may be better suited for a coarse-grained model of the MOM. We plan to do this in the future, but it is out of the scope of the current study. We have added the following to the Discussion section to address this point.

      “Additionally, coarse-grained models of the outer membrane could aid in drug-transport studies, potentially revealing energetic pathways by which novel antibiotics penetrate the complex cell envelope over larger timescales.”

      (2) In line 277, the authors mention six simulations that mimic lipid knockout strains. The results of these simulations, specifically the outcomes of in silico knockout of lipids, are not described in detail.

      We have added the following to the Discussion section to show the effect of glycolipid composition on the deuterium order parameter.

      “The outer leaflet symmetric bilayers, comprised of trehalose-derived glycolipids and PDIMs, reveal PDIM-dependent thickness. As observed in both symmetric outer leaflet systems and asymmetric systems, PDIM migrates to the bilayer midplane, causing the upper leaflet to bulge and increasing the overall thickness. Reduced thickness in the systems lacking PDIM, an important virulence factor for Mtb, may allow for higher nutrient uptake. This corroborates a 2009 study in which Domenech and Reed found a correlation between PDIM absence in vitro and attenuated virulence (Domenech and Reed, 2009). Although PAT also migrates to the bilayer midplane, the PAT-deficient bilayers did not exhibit reduced thickness as the PDIM-deficient bilayers did. This may be due to fewer PAT than PDIM moving to the bilayer midplane. In the All_Lipids systems, PDIM migrates first, bulging the upper leaflet and reducing lipid headgroup crowding. In this slightly less crowded environment, hydrophobic forces from PAT’s tails overcome the hydrophilic forces from the trehalose headgroup, causing some PATs to move deeper into the hydrophobic region. Additionally, in SGL-deficient bilayers, fewer PDIMs and PATs move to the bilayer midplane. This may be due to the highly methylated lipid tails of SGL. When present in the bilayer, these methyl groups may disrupt lipid packing and increase fluidity, allowing more PDIMs to move into the hydrophobic region. Supporting Information Figure S8 shows the average lipid order parameter along each lipid tail for all outer leaflet symmetric systems. Without SGL, lipid tails are consistently more ordered, supporting the notion that SGL’s methylated tails are disrupting lipid packing. Further studies are necessary to investigate the effect of glycolipid-deficient compositions on the dynamic properties of the asymmetric MOM.”
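
For readers less familiar with the order parameter mentioned above, the following is a minimal sketch of how a deuterium order parameter profile is conventionally computed, using the standard definition S_CD = |<(3 cos²θ − 1)/2>|, where θ is the angle between a C–H bond and the bilayer normal. It is not the authors' analysis pipeline, which is not described in this response; the function names and the toy angles are hypothetical and serve only as an illustration.

```python
import math


def s_cd(angles_deg):
    """Deuterium order parameter |<(3 cos^2(theta) - 1) / 2>| for one carbon.

    angles_deg: angles in degrees between a C-H bond vector and the bilayer
    normal, pooled over lipids and trajectory frames.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    total = 0.0
    for angle in angles_deg:
        c = math.cos(math.radians(angle))
        total += 0.5 * (3.0 * c * c - 1.0)
    return abs(total / len(angles_deg))


def tail_profile(angles_by_carbon):
    """Map carbon index -> S_CD, i.e. the order profile along one lipid tail."""
    return {carbon: s_cd(angles) for carbon, angles in sorted(angles_by_carbon.items())}


# Hypothetical toy input (NOT simulation data): the carbon near the headgroup
# has C-H bonds close to perpendicular to the normal, the deeper carbon does not.
toy_angles = {2: [80.0, 90.0, 100.0], 10: [50.0, 90.0, 130.0]}
profile = tail_profile(toy_angles)
```

In this convention, larger values indicate more ordered tails, which is the sense in which tails are described as "more ordered" without SGL.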

      (3) Figure 5 shows PDIM and PAT-driven lipid redistribution, which is a significant novel observation from the study. However, comparison of 3B and 3D shows that at 313K, the movement of the PDIM head group is much less. Since MD simulations are sensitive to random initial seeds, repeated simulations with different random seeds and initial structures may be necessary.

      The difference in headgroup movement at different temperatures can be attributed to higher kinetics at 333K, causing the lipids to move faster. The relatively slow speed and computational load of running all-atom simulations make it difficult to simulate these lower temperatures on the timescales necessary to observe full aggregation of PDIM. However, CG simulations may be sufficient to sample these events. We have addressed this by adding the following to the Results section.

      “We also observed a stark difference in the speed with which PDIM and PAT migrate to the center at different temperatures. PDIM molecules do not fully aggregate at the membrane center until about 1500 ns at 313K, whereas they accumulate within 500 ns at 333K (Fig. 5B, 5D). This can be attributed to higher kinetics at 333K, causing the lipids to move faster. Coarse-grained models may be sufficient to observe full aggregation of hydrophobic species at the membrane midplane at lower temperatures.”

      (4) As per Figure 1, in the initial structure, the head group of PAT should be on the membrane surface, similar to TDM and TMM, while PDIM is placed towards the interior of the outer membrane. However, Figure 5 shows that at t=0, PAT has the same Z position as PDIM. It will be necessary to provide Z-position Figures for TMM and TDM to understand the difference. Is it really dependent on the chemical structure of the lipid moiety or the initial position of the lipid in the bilayer at the beginning of the simulation?

      We have added the following to the Results section to address this comment.

      “In all symmetric outer leaflet simulations, PDIM and PAT sit just below the headgroups of other lipids at the start of production, due to our equilibration scheme. During the last step of equilibration, lipid headgroups are allowed to move freely, which initiates migration to the membrane center and causes the slight difference between PDIM/PAT and the other lipids’ headgroup positions (Supporting Information Figs. S5, S6).”

      Minor Point:

      In view of the complexity of the system undertaken for the study, the manuscript in its current form may not be informative for readers who are not experts in molecular simulations.

      This work represents the first atomistic simulation of the mycobacterial outer membrane. While not perfectly realistic, as it does not include arabinogalactan or peptidoglycan, it does have extensive descriptions of each lipid simulated and their relevance to the survival of Mtb.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The interface to build and set up all atom coordinates of the outer membrane of Mycobacterium tuberculosis should be available from CHARMM-GUI.

      The current manuscript is meant as a proof of concept for simulating bilayers composed of complex mycobacterial lipids. The current study itself took more than 3 years. Since we have developed CHARMM-GUI, the lipids described in this paper may be available in CHARMM-GUI in the future, but that is not the aim of this paper. Initial structures and final 50 ns of the simulations are available to readers (see Data Acknowledgements).

      (2) The difference between symmetric and asymmetric systems in Figures 2K and 2L is not at all clear, neither in the legend to the figure nor in the manuscript text. The color codes in 2K and 2L should be described with clarity. The authors should provide schematic diagrams similar to Figure 1 to explain each of the simulation systems they are discussing. This will clarify the difference between symmetric and asymmetric systems.

      We have updated Figure 1 to clearly show which systems are symmetric and which are asymmetric.

      (3) The first two sub-sections of the RESULT section discuss symmetric mycolic acid bilayers. The observations on thermal resilience and phase transitions are interesting, but the relevance of symmetric mycolic acid bilayers (Figures 3 & 4) to the major focus of the current manuscript (i.e., outer membrane consisting of multiple lipids) is not clear.

      Most previous simulations only focused on monolayers of mycolic acids. Our symmetric bilayers are used to provide reasonable APL and system compositions for the asymmetric membrane, so as to avoid area mismatch. We can also gain insights into how these unique lipids behave in symmetric bilayers, which may be useful to scientists aiming to study simpler membranes in the context of drug permeation or pore formation. These points have been addressed in the following addition to the Introduction section.

      “We have also used the equilibrated symmetric bilayers to estimate reasonable areas per lipid (APL) and facilitate the modeling of stable asymmetric systems.”
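
The area-matching idea described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' protocol: each leaflet is assumed to have a single effective area per lipid taken from its equilibrated symmetric bilayer, and the numerical values are placeholders, not results from the study.

```python
def lipids_for_area(target_area_nm2, apl_nm2):
    """Number of lipids whose combined area best fills the target leaflet area."""
    if apl_nm2 <= 0:
        raise ValueError("area per lipid must be positive")
    return max(1, round(target_area_nm2 / apl_nm2))


def relative_area_mismatch(n_outer, apl_outer_nm2, n_inner, apl_inner_nm2):
    """Fractional difference between the preferred areas of the two leaflets."""
    area_outer = n_outer * apl_outer_nm2
    area_inner = n_inner * apl_inner_nm2
    return abs(area_outer - area_inner) / max(area_outer, area_inner)


# Placeholder values for illustration only (NOT taken from the manuscript).
TARGET_AREA_NM2 = 60.0   # lateral box area shared by both leaflets
APL_OUTER_NM2 = 0.75     # hypothetical APL from an outer-leaflet symmetric bilayer
APL_INNER_NM2 = 0.50     # hypothetical APL from an inner-leaflet symmetric bilayer

n_outer = lipids_for_area(TARGET_AREA_NM2, APL_OUTER_NM2)
n_inner = lipids_for_area(TARGET_AREA_NM2, APL_INNER_NM2)
mismatch = relative_area_mismatch(n_outer, APL_OUTER_NM2, n_inner, APL_INNER_NM2)
```

Choosing lipid counts this way keeps both leaflets close to their preferred area; using equal counts in both leaflets with these placeholder APLs would instead leave a mismatch of about one third.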

    1. Author response:

      General Statements

      First, we would like to thank the editor at Review Commons for the efficient handling of our manuscript. We also apologize for our delayed response.

      We would like to thank all three reviewers for their careful evaluation of our work and their constructive feedback, which will provide a valuable basis for improving the figures and the text. We expect to complete the revision quickly, following the plan described below.

      We would like to note that the reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on the following point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this does not restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). We will revise the manuscript text accordingly to clarify this point.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies using ribosome/phage display generated against Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror smc deletion phenotype including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between Bacillus and S. pneumoniae Smc to narrow down to specific regions within the Bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may likely be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/ unbinding kinetics, stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in ATP/ DNA-bound state? And, do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it would be informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not easy even for a smaller selection of sybodies. We have data that show direct binding of Smc to sybodies by various methods including ELISA, pull-downs and by biophysical methods (GCI). Initially, we omitted these data from the manuscript as we are convinced that the mapping data obtained with chimeric SMC proteins is more definitive and relevant. During the revision we will incorporate the ELISA data showing direct binding and also indicating a lack of preference for a specific state of Smc.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out has not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the main binding site is located on the SMC coiled coils, the latter scenario would likely be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with GFP and performed live cell imaging. This showed that they are all roughly equally expressed and that they localize as foci in the cell, presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We will include these data in the revised version of the manuscript.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes are similar to those of the null mutant rather than the ATP-hydrolysis defective EQ mutant, which displays even more severe growth phenotypes. We will add the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state (add Vazquez Nunez et al., 2021).”

      “ELISA data confirm that nearly all clones bind Smc-ScpAB; however, their binding shows little or no dependence on the presence of ATP or DNA.”

      Minor comments:

      (1) It was surprising that no sybodies were found that could target both bacillus and spneu Smc. For example, sybodies targeting the head regions of Smc that might work in a more universal manner. Could the authors comment on the coverage of the sybodies across the protein structure?

      It is rather common that sybodies (like antibodies and nanobodies) exhibit strong affinity differences between highly conserved proteins (> 90 % identity). The underlying reasons for such strong discrimination are i) location of less conserved residues primarily at the target protein surface and ii) the large interaction interface between sybody and target, which offers multiple vulnerabilities for disturbance, in particular through bulky side chains resulting in steric clashes. Another frequently observed phenomenon is sybody binding to a dominant epitope, which also often applies to nanobodies and antibodies. Good examples of this are the dominant epitopes on SARS-CoV-2 RBDs.

      (2) Growth curves (Fig. S3) show a large jump in recovery in growth under sybody induction conditions. Could the authors address this observation here and in the text?

      We suppose that this recovery represents suppressor mutants and/or (more likely) improved growth in the absence of functional Smc during nutrient limitation (see Gruber et al., 2013 and Wang et al., 2013). We will add this statement to the text.

      (3) L41- Sentence correction: Loop can be removed.

      Ah, yes, sorry for this confusing error. Thank you.

      (4) L525 - bsuSmc 'E' :extra E can be removed.

      To do. Thank you.

      (5) References need to be properly formatted.

      To do. Thank you.

      (6) The authors should add in figure legend for Fig 1i) details on representation of the purple region, and explain the grey strokes for orientation of the loop.

      To do.

      (7) How many cells were analysed in the cell biological assays? Legends should include this information.

      To be included.

      Reviewer #1 (Significance):

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled coil domains of SMC proteins have been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function and, in parallel, demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Review: "Single Domain Antibody Inhibitors Target the Coiled Coil Arms of the Bacillus subtilis SMC complex" by Ophélie Gosselin et al, Review Commons RC-2025-03280

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      In summary, the authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Some specific comments:

      Line 75: "likely stabilizing otherwise rare intermediates of the conformational cycle." - sorry, why is that being concluded? Why not stabilizing longer-lived conformations?

      We will clarify this statement!

      Line 89: Sorry, possibly our lack of understanding: why first ribosome and then phage display?

      Ribosome display allows around 10^12 sybodies to be screened per selection round (technically unrestricted library size), while for phage display, the library size is restricted to around 10^9 sybodies because production of a phage library requires transformation of the phagemid plasmid into E. coli, which introduces a diversity bottleneck. This is why the sybody platform starts off with ribosome display. It switches to phage display from round 2 onwards because the output of the initial round of ribosome display is around 10^6 sybodies, which can be easily transferred into the phage display format. Phage display is used to minimize selection biases. For more information, please consult the original sybody paper (PMID: 29792401).
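
The library-size argument above can be checked with a few lines of arithmetic. The sketch below uses only the order-of-magnitude figures quoted in this response (10^12, 10^9 and 10^6); the function and variable names are hypothetical and chosen for illustration.

```python
# Order-of-magnitude figures quoted in the response above.
RIBOSOME_DISPLAY_CAPACITY = 10**12  # sybodies screenable per ribosome display round
PHAGE_DISPLAY_CAPACITY = 10**9      # limit set by E. coli transformation efficiency
ROUND1_OUTPUT = 10**6               # approximate pool size after round 1


def fraction_covered(pool_size, capacity):
    """Fraction of a pool that a display format with the given capacity can sample."""
    return min(1.0, capacity / pool_size)


def headroom(pool_size, capacity):
    """How many times over the pool fits into the format's capacity."""
    return capacity // pool_size


# Starting with phage display would sample only a small part of the naive library.
naive_coverage_by_phage = fraction_covered(RIBOSOME_DISPLAY_CAPACITY, PHAGE_DISPLAY_CAPACITY)

# After one ribosome display round, the enriched pool fits into phage display.
round2_coverage_by_phage = fraction_covered(ROUND1_OUTPUT, PHAGE_DISPLAY_CAPACITY)
round2_headroom = headroom(ROUND1_OUTPUT, PHAGE_DISPLAY_CAPACITY)
```

With these figures, phage display alone would cover roughly one thousandth of the naive library, whereas the round 1 output fits into the phage format about a thousand times over, which is the rationale for the order of the two methods.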

      Line 100: Why was only lethality selected? Less severe phenotypes not clear enough?

      Yes, colony size is more difficult to score robustly, as the sizes of individual transformant colonies can vary quite widely. The number of isolated sybodies was at the limit of further analysis.

      Line 106: Could it be tested somehow if convex and concave library sybodies fold in Bs?

      We did not focus on the non-functional sybody candidates, and only sybodies of the loop library turned out to cause functional consequences at the cellular level. Notably, we will include gfp-imaging showing that non-lethal sybodies are expressed at levels similar to those of toxic sybodies. Given the identical scaffold of concave and loop sybodies (they only differ in their CDR3 length), we expect that the concave sybodies fold in the cytoplasm of B. subtilis. For the convex sybodies, which have a different scaffold, this will be tested.

      Line 125: Could Pxyl be repressed by glucose?

      To our knowledge and experience, repression by glucose (catabolite repression) does not work well in this context in B. subtilis.

      Line 131: The SMC replacement strain is a cool experiment and removes a lot of doubts!

      Thank you! (we agree).

      Line 141: The mapping is good and looks reliable, but looks and feels like a tour de force? Of course, some cryo-EM would have been lovely (lines 228-229 understood, it has been tried!).

      Yes, we have made several attempts at structural biology. Unfortunately, Smc-ScpAB is not well suited for cryo-EM in our hands and crystallography with Smc fragments and sybodies did not yield well-diffracting crystals.

      Line 179: Mmmh. Do we not assume DNA binding on top of the dimerised heads to open the CC (clamp)?

      We will clarify the text here.

      Line 187: Having sybodies that presumably keep the CC together (closing) and some that do not allow them to come together correctly (opening) is really cool and probably important going forward.

      Thank you!

      Figure 1 Ai is not very colour-blind friendly.

      We are sorry for this oversight. We will try to make the color scheme more inclusive. Thank you for the notification.

      Optional: did the authors see any spontaneous mutations emerge that bypass the lethal phenotype of sybody expression?

      No, we did not observe spontaneous mutations suppressing the phenotype, possibly due to the limited number of cell generations observed. We tried to avoid suppressors by limiting growth, but this may indeed be a good future approach to further fine-map the binding sites and to obtain insights into the mechanism of inhibition.

      Optional: we think it would be nice to try some biochemical experiment with BMOE/cysteine-crosslinked B. subtilis Smc in the mid-region (4N or next to it) of the Smc coiled coils to try to further strengthen the story. Some of the authors are experts in this technique and strains might already exist?

      We have indeed tried to study the impact of sybody binding on Smc conformation by cysteine cross-linking. However, we were not convinced by the results and thus prefer not to draw any conclusions from them. We will add a corresponding note to the text.

      Reviewer #2 (Significance):

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Thank you!

      Reviewer #3 (Evidence, reproducibility and clarity):

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single - and catalytically trapped - form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it is biased towards sybodies that bind to Smc in a quite special way, it seems. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      As explained above, we are quite confident that the Smc ATPase mutation did not bias the selection in an obvious way. The surprising bias towards coiled coil binding sites likely has other explanations, as the coiled coils likely form a preferred epitope recognized by sybodies.

      Line 105: Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis. This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results much, but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then likely few other sybodies are effectively lethal in B. subtilis, with the exception of the ones isolated and characterized. We have added this notion to the manuscript. We have also tested the expression of non-lethal sybodies by gfp-tagging and imaging. These results will be included in the revision.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed. I believe the authors should state, though, that this is not a replication block, but a failure to segregate origins.

      We agree that this is an important point and will add a corresponding comment to the text.

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised that the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts at testing direct binding, with mixed outcomes, and decided not to include those results in light of the stronger and more relevant in vivo mapping. However, we will add ELISA results and briefly discuss grating coupled interferometry (GCI) data and pull-downs.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, rather indirect experiments. This leads to the point "Revealing Smc arm dynamics through synthetic binders" in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation, at least this is not clearly shown.

      We agree and will carefully rephrase this statement. Thank you.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are otherwise largely neglected in the SMC literature, likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Reviewer #3 (Significance):

      The work describes the generation and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

      Description of analyses that authors prefer not to carry out

      As pointed out above, there are a few minor points that we prefer not to address experimentally. In particular, we do not consider it necessary to determine the expression levels of sybodies that were non-inhibitory. We also wish to note that we attempted to obtain additional structural and biochemical data and to that end performed cryo-EM, crystallography and cysteine cross-linking experiments. Unfortunately, we did not obtain sybody complex structures, and the cross-linking data were not conclusive. We also wish to note that the first author has finished her PhD and left the lab, which limits our capacity to add additional experiments. However, as the reviewers also pointed out, the main conclusions are well supported by the data already.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tkacik et al describe their efforts to reconstitute and biochemically characterize ARAF, BRAF, and CRAF proteins and measure their ability to be paradoxically activated by current clinical and preclinical RAF inhibitors. Paradoxical activation of MAPK signaling is a major clinical problem plaguing current RAF inhibitors, and the mechanisms are complex and relatively poorly understood. The authors utilize their preparations of purified ARAF, BRAF, and CRAF kinase domains to measure paradoxical activation by type I and type II inhibitors, utilizing MEK protein as the substrate, and show that CRAF is activated in a similar fashion to BRAF, whereas ARAF appears resistant to activation. These data are analyzed using a simple cooperativity model with the goal of testing whether paradoxical activation involves negative cooperativity between RAF dimer binding sites, as has been previously reported. The authors conclude that it does not. They also test activation of B- and CRAF isoforms prepared in their full-length autoinhibited states and show that under the conditions of their assays, activation by inhibitors is not observed. In a particularly noteworthy part of the paper, the authors show that mutation of the N-terminal acidic (NtA) motif of ARAF and CRAF to match that of BRAF enhances paradoxical activation of CRAF and dramatically restores paradoxical activation of ARAF, which is not activated at all in its WT form, indicating a clear role for the NtA motif in the paradoxical activation mechanism. Additional experiments use mass photometry to measure BRAF dimer induction by inhibitors. The mass photometry measurements are a relatively novel way of achieving this, and the results are qualitatively consistent with previous studies that tracked BRAF dimerization in response to inhibitors using other methods. 
Overall, the paper establishes that WT CRAF is paradoxically activated by the same inhibitors that activate BRAF, and that ARAF contains the latent potential for activation that appears to be controlled by its NtA motif. The biochemical activation data for BRAF are qualitatively consistent with previous work.

      Strengths:

      While previous studies have put forward detailed molecular mechanisms for paradoxical activation of BRAF, comparatively little is known about the degree to which ARAF and CRAF are prone to this problem, and relatively little biochemical data of any sort are available for ARAF. Seen in this light, the current work should be considered of substantial potential significance for the RAF signaling field and for efforts to understand paradoxical activation and design new inhibitors that avoid it.

      Weaknesses:

      There are, unfortunately, some significant flaws in the data analysis and fitting of the RAF activation data that render the primary conclusion of the paper about the detailed activation mechanism, namely that it does not involve negative cooperativity between active sites, unjustified. This claim is made repeatedly throughout the manuscript, including in the title. Unfortunately, their data analysis approach is overly simplistic and does not probe this question thoroughly. This is the primary weakness of the study and should be addressed. A full biochemical modeling approach that accurately captures what is happening in the experiment needs to be applied in order for detailed inferences to be drawn about the mechanism beyond just the observation of activation.

      The authors' analysis of their RAF:MEK "monomer" paradoxical activation data (Figures 1, 3, and Tables 1, 2) suffers from two fundamental flaws that render the resulting AC50/IC50 and cooperativity (Hill) parameters essentially uninterpretable. Without explaining or justifying their choice, the authors use a two-phase cooperative binding model from GraphPad Prism to fit their activation/inhibition data. This model is intended to describe cooperative ligand binding to multiple coupled sites within a preformed receptor assembly, and does not provide an adequate description of what is happening in this complicated experiment. Specifically, it has two fundamental flaws when applied to the analysis in question:

      (a) It does not account for ligand depletion effects that occur with high-affinity drugs, and that profoundly affect the shapes of the dose-response curves, which are what are being fit 

      The chosen model is one of a class of ligand-binding models that are derived by assuming that the free ligand concentration is effectively equal to the total ligand concentration. Under these conditions, binding curves have a characteristic steepness, and the presence of cooperativity can be inferred from changes in this steepness as described by a Hill coefficient. However, many RAF inhibitors, including most of the type II inhibitors in this study, bind to the dimerized forms of at least one of the RAF isoforms with ultra-high affinity in the picomolar range (particularly apparent in Figure 1 with LY inhibiting BRAF). Under these conditions, the model assumption is not valid. Instead, binding occurs in the high-affinity regime in which the drug titrates the receptor and effectively all the added drug molecules bind, so there is hardly any free ligand (see e.g. Jarmoskaite and Herschlag eLife 2020 for a full description of this "titration" regime). The shapes of the curves under these conditions reflect the total amount of RAF protein (and to some extent drug affinity), rather than the presence of cooperativity. Fitting dose response curves with the chosen model under these conditions will result in conflating binding affinity and protein concentration with cooperativity.

      (b) It does not model the RAF monomer-dimer equilibrium, which is dramatically modulated by drug binding, rendering the results RAF-concentration dependent in a manner not accounted for by the analysis.

      The chosen analysis model also fails to consider the monomer-dimer equilibrium of RAF. This has two ramifications. Since drug binding is coupled to dimerization to a very strong degree, the observed apparent affinities of drug binding (reflected in AC50 and IC50 values) are functions of the concentration of RAF molecules used in the experiment. Since dimerization affinities are likely different for ARAF, BRAF, and CRAF, the measured AC50 values also cannot be compared between isoforms. This concentration dependence is not addressed by the authors. A related issue is that the model assumes drug binding occurs to two coupled sites on preformed dimers, not to a mixture of monomers and dimers. "Cooperativity" parameters determined in this manner will reflect the shifting monomer-dimer equilibrium rather than the cooperativity within dimers. Additionally, the inhibition side of the activation/inhibition curves is driven by binding of the drug to the single remaining site on the dimer, not to two coupled sites, and so one cannot determine cooperativity values for this process in this manner.
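
The concentration dependence raised in point (b) can be illustrated with a minimal sketch. It is not taken from the manuscript: it assumes a simple two-state equilibrium 2M ⇌ D with a single dissociation constant, ignores drug and ATP entirely, and uses hypothetical names (`dimer_fraction`, `kd_dimer_nM`) and made-up values.

```python
import math

def dimer_fraction(total_nM: float, kd_dimer_nM: float) -> float:
    """Fraction of protomers found in dimers for 2M <-> D, with Kd = [M]^2 / [D].

    Assumption: a plain monomer-dimer equilibrium with no ligand coupling.
    """
    # Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd
    # => 2[M]^2 + Kd*[M] - Kd*total = 0, solved for the positive root.
    monomer = (-kd_dimer_nM
               + math.sqrt(kd_dimer_nM ** 2 + 8.0 * kd_dimer_nM * total_nM)) / 4.0
    return 1.0 - monomer / total_nM

# Hypothetical concentrations and Kd, for illustration only.
for total in (1.0, 10.0, 100.0, 1000.0):
    frac = dimer_fraction(total, kd_dimer_nM=100.0)
    print(f"{total:7.1f} nM total protein -> {frac:.3f} of protomers dimeric")
```

Under these assumptions the dimeric fraction rises with total protein concentration, so any readout that depends on dimer formation would be expected to shift when the RAF concentration, or the isoform-specific dimerization affinity, changes.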

      As a result of both of these issues, the parameters reported in the tables do not correctly reflect cooperativity and cannot be used to infer the presence or absence of negative cooperativity between RAF dimer subunits. To address these major issues, the authors would need to apply a data analysis/fitting procedure that correctly models the biochemical interactions occurring in the sample, including both the monomer-dimer equilibrium and how this equilibrium is coupled to drug binding, such as that developed in e.g., Kholodenko Cell Reports 2015. Alternatively, the authors should remove the statements claiming a lack of negative cooperativity from the manuscript and alter the title to reflect this.

      The bell-shaped dose-response model that we employed models the sum of two dose-response curves, one that activates and one that inhibits. This is a simple way of capturing the essence of paradoxical activation: the superposition of drug-induced activation at low inhibitor concentrations with inhibition at higher concentrations. That said, we agree completely with the reviewer that the model does not capture the complexity of what is happening in the experiment. We worked extensively with the Kholodenko model (which we implemented in KinTek Explorer), which accounts for the effect of drug on the monomer/dimer equilibrium and for the affinity of drug for each protomer of a dimer (and can therefore model positive or negative cooperativity as well as non-cooperative binding). We could obtain excellent fits with this model with positive cooperativity (perhaps not surprising considering that this is a 12-parameter model), with reasonable Kd values for drug binding and monomer/dimer equilibrium. However, we ultimately chose not to include this analysis when we realized that the fits were not at steady-state. The underlying Kon and Koff rates for the reasonable Kd values for monomer/dimer formation were unreasonably slow. We could also obtain superficially reasonable fits with negative or non-cooperative binding, but close inspection revealed that they did not accurately fit the steepness of the inhibition phase of the dose-response curves for type II inhibitors. Even the Kholodenko model does not capture all the key aspects of our experiment, perhaps most notably competition with ATP, the effect of ATP on the monomer/dimer equilibrium, and the divergent conformations of the kinase required for binding ATP vs a type II inhibitor. We put some effort into explicitly including ATP in the model, but quickly decided that it was beyond our modeling expertise (and it also was not feasible to implement in KinTek Explorer).
In the end, we settled on the bell-shaped dose-response model because it was the simplest model that fit the data. We expect to include a supplemental figure/note in the revised manuscript to discuss our work with the Kholodenko model. We will also acknowledge the limitations of the bell-shaped dose response model.
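
For readers unfamiliar with this class of model, the following is a minimal sketch of a generic bell-shaped curve written as the sum of an activating and an inhibiting Hill term. It is an illustration only: the function name, parameter names and values are hypothetical, and the exact parameterisation used by the fitting software in the study may differ.

```python
def bell_shaped_response(conc, baseline, peak, floor, ac50, ic50, n_act, n_inh):
    """Generic bell-shaped dose-response: activation followed by inhibition.

    conc must be > 0. All parameter names are illustrative assumptions.
    """
    # Activating phase: rises from baseline toward peak around ac50.
    activation = (peak - baseline) / (1.0 + (ac50 / conc) ** n_act)
    # Inhibiting phase: pulls the signal from peak down to floor around ic50.
    inhibition = (peak - floor) / (1.0 + (ic50 / conc) ** n_inh)
    return baseline + activation - inhibition

# Hypothetical parameters; a steeper inhibition phase is encoded by n_inh > 1.
params = dict(baseline=1.0, peak=5.0, floor=0.0,
              ac50=1.0, ic50=1e4, n_act=1.0, n_inh=2.0)
for conc in (1e-3, 1.0, 1e2, 1e4, 1e6):
    print(f"{conc:10.3g} -> {bell_shaped_response(conc, **params):.3f}")
```

In this form the two Hill exponents are purely descriptive shape parameters. They summarise the steepness of each phase but do not by themselves distinguish cooperativity from other sources of steepness, which is the limitation discussed above.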

      This reviewer is also concerned that the steepness of the inhibition phase of the curves may be the result of enzyme titration with these tight-binding inhibitors, rather than a result of positive cooperativity. We are reasonably sure that this is not the case. The shape of these curves and the IC50/AC50 values obtained are relatively insensitive to enzyme concentration, and we will include additional data in our revision to demonstrate this. Also, the steep Hill slopes are unique to the type II inhibitors, which require a distinct inactive conformation of the kinase. Type I inhibitor SB590885 is similarly potent to the type II inhibitors, but does not exhibit this effect. If we were simply titrating enzyme, we would expect to see this with SB590885 as well.

      Also, we will clarify in the revised manuscript that our interpretation of positive cooperativity of inhibition by type II inhibitors is also supported by our prior work with 14-3-3-bound RAF dimers (Tkacik et al, JBC 2025). This is a much simpler experiment, as dimers are pre-formed. We have now done a thorough study of the effect of enzyme concentration on the IC<sub>50</sub> and apparent cooperativity in dimer inhibition, which we will include in our revised manuscript. These experiments confirm that we are not in a regime where we are titrating enzyme.

      As an aside, with respect to models that incorporate free inhibitor concentration, we did try to fit our 14-3-3-bound dimer inhibition data (in Tkacik et al, JBC 2025) with the Morrison equation for tight-binding inhibitors, which does take into account free ligand concentration. The fits were not reasonable with type II inhibitors, at least in part due to the non-ATP-competitive behavior of the type II drugs. Also the Morrison equation does not model cooperativity.
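
As background to the Morrison equation mentioned above, the sketch below evaluates its standard form for a single-site, reversible tight-binding inhibitor. It is an illustration under stated assumptions, not the analysis performed in either study: the function name and the numerical values are hypothetical, and, as noted in the text, this equation does not model cooperativity.

```python
import math

def morrison_fractional_activity(inhibitor, enzyme, ki_app):
    """Fractional activity v_i/v_0 from the Morrison (quadratic) equation.

    inhibitor, enzyme: total concentrations; ki_app: apparent Ki (same units).
    Accounts for depletion of free inhibitor by binding to the enzyme.
    """
    s = enzyme + inhibitor + ki_app
    # Concentration of enzyme-inhibitor complex (physical root of the quadratic).
    bound = (s - math.sqrt(s * s - 4.0 * enzyme * inhibitor)) / 2.0
    return 1.0 - bound / enzyme

# Hypothetical numbers: in the titration regime (ki_app << enzyme) the
# half-inhibition point tracks enzyme concentration: IC50 = ki_app + enzyme/2.
ki_app = 0.01
for enzyme in (0.1, 1.0, 10.0):
    ic50 = ki_app + enzyme / 2.0
    activity = morrison_fractional_activity(ic50, enzyme, ki_app)
    print(f"[E] = {enzyme:5.1f}: activity at I = {ic50:.3f} is {activity:.3f}")
```

The relation IC50 = Ki,app + [E]/2 that follows from this equation is why measuring IC50 at several enzyme concentrations is a common check for the titration regime: an IC50 that does not scale with enzyme concentration argues against simple enzyme titration.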

      Some other points to consider

      (1) The observation that ARAF is not activated by type II inhibitors is interesting. A detailed comparison of the activation magnitudes between inhibitors and between A-, B-, and CRAF is hampered by the arbitrary baseline signal in the assay, which arises from a non-zero FRET ratio in the absence of any RAF activity. The authors might consider background correcting their data using a calibration curve constructed using MEK samples of known degrees of phosphorylation, so that they can calculate turnover numbers and fold activation values rather than an increase over baseline. This will likely reveal that the activation effects are more substantial than they appear against the high background signal.

      We will explore this for our revision.

      (2) The authors note that full-length autoinhibited 14-3-3-bound RAF monomers are not activated by type I and II inhibitors. However, since this process involves the formation of a RAF dimer from two monomers, the process would also be expected to be concentration dependent, and the authors have only investigated this at a single protein concentration. Since disassembly of the autoinhibited state must also occur before dimerization, it might be expected to be kinetically disfavored as well. Have the authors tested this?

      Good points. We have carried out this experiment at more than one enzyme concentration and differing reaction times, and also failed to see activation. However, we have not systematically explored either variable.

      (3) ATP concentration modulates activation. While this is an interesting observation, some of this analysis suffers from the same issue discussed above, of not considering high-affinity binding effects. For instance, LY is not affected by ATP concentration in their data (Figure 4D), but this is easily explained as being due to its very tight binding affinity, resulting in titration of the receptor and the shape of the inhibition curve reflecting the amount of RAF kinase in the experiment and not the effective Kd or IC50 value.

      As discussed above, we’ve convinced ourselves that we are not simply titrating enzyme. It occurred to us that such an effect could explain both the steepness of the inhibition curves with LY and other type II inhibitors and the apparent ATP-insensitivity. Our studies of concentration-dependence and the correlation of this effect with the type II binding mode argue against this possibility.

      Finally, as an overarching comment to this Reviewer and the others, we understand well that our enzyme inhibition studies (here and in Tkacik 2025) do not rise to the level of a formal demonstration of cooperative ligand binding. We envision a future study in which we could address this directly, perhaps by using single molecule fluorescence to observe on/off rates for binding of fluorescently tagged inhibitors to immobilized RAF dimers. (This is clearly beyond the scope of the present work).

      Reviewer #2 (Public review):

      This manuscript by Tkacik et al. uses in vitro reconstituted systems to examine paradoxical activation across RAF isoforms and inhibitor classes. The authors conclude that paradoxical activation can be explained without invoking negative allostery and propose a general model in which ATP displacement from an "open monomer" promotes dimerization and activation. The biochemical work is technically sound, and the systematic comparison across RAF paralogs (along with mutational/functional analysis) across inhibitor classes is a strength.

      However, the central mechanistic conclusions are overgeneralized relative to the experimental systems, and several key claims, particularly the dismissal of negative allostery and the proposed unifying model in Figure 6, are not directly supported by the data presented. Most importantly, the absence of RAS, membranes, and relevant regulatory context fundamentally limits the physiological relevance of several conclusions, especially regarding the current clinical type I.5 RAF inhibitors and paradoxical activation.

      Overall, this is a potentially valuable biochemical study, but the manuscript would benefit from more restrained interpretation, clearer framing of scope, and revisions to the model and title to better reflect what is actually tested.

      (1) A central issue is that the biochemical system lacks RAS, membranes, 14-3-3 and endogenous regulatory factors that are known to be required for paradoxical RAF and MAPK activation in cells. As previous work has repeatedly shown and the authors also acknowledge, paradoxical activation by RAF inhibitors is RAS-dependent in cells, and this dependence presumably explains why full-length autoinhibited RAF complexes are refractory to activation in the authors' assays.

      Importantly, the absence of paradoxical activation by type I.5 inhibitors in this system is therefore not mechanistically informative. Type I.5 inhibitors (e.g., vemurafenib, dabrafenib, encorafenib), but not Paradox Breakers (e.g., plixorafenib), robustly induce paradoxical activation in cells because binding of the inhibitor to inactive cytosolic RAF monomer promotes a conformational change that drives RAF recruitment to RAS in the membrane, promoting dimerization. The inability of the type I.5 inhibitor to suppress the newly formed dimers is the basis of the pronounced paradoxical activation in cells. In the absence of RAS and membrane recruitment, failure to observe paradoxical activation in vitro does not distinguish between competing mechanistic models.

      As a result, conclusions regarding inhibitor class differences, and especially the generality of the proposed model, should be substantially tempered.

      We will emphasize the limitations of our highly simplified experimental system in the revised manuscript, and temper some of our interpretations. And while the lack of membranes/RAS/14-3-3 in our system and the lack of observed PA with type I.5 inhibitors is a limitation of our study, we disagree that it renders our study of type I.5 inhibitors mechanistically uninformative. As seen here and consistent with prior studies, the binding mode of these compounds disfavors formation of the kinase dimer. While this may be overcome by 14-3-3 binding and other effects in the cellular context, it reflects a fundamental mechanistic difference as compared with type I and type II inhibitors, which also exhibit paradoxical activation.

      (2) The authors argue that their data argue against negative allostery as a central feature of paradoxical activation. However, the presented data do not directly test negative allostery, nor do they exclude it. The biochemical assays do not recreate the cellular context in which negative allostery has been inferred. Further, structural data showing asymmetric inhibitor occupancy in RAF dimers cannot be dismissed on the basis of alternative symmetric structures alone, particularly given the dynamic nature of RAF dimers in cells.

      Most importantly, negative allostery was proposed to explain paradoxical activation by Type I.5 RAF inhibitors, yet these inhibitors do not paradoxically activate in the assays presented here. The absence of paradoxical activation in this system, therefore, cannot be used to argue against a mechanism that is specifically invoked to explain cellular behavior not recapitulated by the assay.

      To be clear, we are not dismissing the possibility of negative cooperativity. And we do not think of our model as an alternative to the negative cooperativity model – rather it is a generalization that can account for paradoxical activation by diverse inhibitor classes, irrespective of positive, negative or non-cooperative modes of inhibition. We will emphasize these points in the revised manuscript.

      If negative allostery were a requisite feature of PA, we would not expect to see PA with type II inhibitors. As discussed in our response to Reviewer 1, we see clear evidence of positively cooperative inhibition of 14-3-3-bound RAF dimers by type II inhibitors (Tkacik JBC 2025) and in the present study, we find clear paradoxical activation by type II inhibitors (and there are many reports in the literature of PA by type II inhibitors in cellular contexts).

      (3) The model presented in Figure 6 is conceptually possible but remains speculative. Key elements of the model, including RAS engagement, membrane recruitment, 14-3-3 rearrangements, and the involvement of cellular kinases and phosphatases, are explicitly absent from the experimental system. Accordingly, the model is not tested by the data presented and should not be framed as a validated or general mechanism. The figure and accompanying text should be clearly labeled as a working or conceptual model rather than a mechanistically supported conclusion.

      We will revise the text to more clearly reflect that this is a working model, and importantly, that it is based on a large literature in this area in addition to the relevant experimental work in this manuscript.

      (4) The manuscript states that type I.5 inhibitors do not induce paradoxical activation in the biochemical assay because their C-helix-out binding mode disfavors dimerization. While this is true in isolation, it overlooks the well-established fact that type I.5 inhibitors (with the exception of paradox breakers) clearly promote RAS-dependent RAF dimerization in cells. This distinction is critical and should be explicitly acknowledged when interpreting the in vitro findings.

      We will explicitly make this point in the revised manuscript.

      (5) The title suggests a general mechanism for paradoxical activation across RAF isoforms and inhibitor classes, whereas the data primarily address type I and type II inhibitors acting on isolated kinase-domain monomers. A more accurate framing would avoid the term "general" and confine the conclusions to C-helix-in (type I/II) RAF inhibitors in a reduced biochemical context.

      As noted above, and in our response to Reviewer 3 below, we will clarify how the data in the present manuscript contribute to the model, and that the model is based more broadly on the literature on PA and on our insights into RAF structure and regulation. We will also revise the title to avoid the implication that the model arises mainly from the experimental data in the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Tkacik et al. systematically characterized all three RAF kinase isoforms in vitro with all three types of RAF inhibitors (Type I, I1/2, and II) to investigate the mechanism underlying paradoxical activation.

      In this study, the authors reconstituted heterodimers of A-, B-, and C-RAF kinase domains bound to non-phosphorylatable MEK1 (SASA), mimicking the monomeric auto-inhibited state of RAF. These "RAF monomers" were tested for MEK phosphorylation with an increasing concentration of all three types of RAF inhibitors (Type I, I1/2, and II). This study is reminiscent of a previous study of the same team measuring RAF kinase activity in the presence of all three types of inhibitors in the context of dimeric RAF isoforms stabilized by 14-3-3 proteins (Tkacik et al 2025 JBC). RAF monomers had little to no activity at low concentrations of inhibitors (consistent with their "monomeric state"). Addition of type I1/2 inhibitor did not induce paradoxical activation as, in this context, they do not induce RAF dimerization required for activation, as observed by MP. Addition of type I and type II inhibitors led to paradoxical activation consistent with the RAF dimerization induced by these inhibitors, as observed by MP. Interestingly, type II inhibitors induced activation only for B- and C-RAF and not A-RAF.

      At high concentrations of type II inhibitors, kinase activity is inhibited with a strong or weak positive cooperativity for BRAF and CRAF, respectively. This observation is very similar to what the authors previously observed with their dimeric RAF system. Interestingly, when the NtA motif is modified by phosphomimetic mutations in A- and C-Raf, basal kinase activity is stronger, but most importantly, inhibitor-induced paradoxical activation is much stronger with both type I and II inhibitors. This demonstrates that mutation of the NtA motif of ARAF and CRAF sensitized them to paradoxical activation by type II inhibitors.

      The authors also tested the effect of ATP in the paradoxical activation observed in their RAF "monomer" system. As previously published in their assay with 14-3-3 stabilized dimeric RAF, the authors observed an expected shift of the IC50 with Type I inhibitors, while Type II inhibitors seem to behave as a non-competitive inhibitor. The authors next reconstituted the MAP kinase pathway (with RAF monomers at the top of the phosphorylation cascade) to test paradoxical activation amplification. Again, Type I1/2 inhibitors did not induce paradoxical activation, while Type I and II inhibitors did. The authors tested the inhibitors with FL auto-inhibited RAF/MEK/14-3-3 complexes, where, contrary to the "RAF monomers" experiments, FL B- and C-RAF were not paradoxically activated but were inhibited by all three types of inhibitors.

      Overall, Tkacik et al. tackle an important question in the field for which definitive experiments and thorough biochemical investigation to understand the molecular mechanisms for the inhibitor-induced paradoxical activation are still missing, and of high importance for future drug development.

      Strengths:

      The biochemical experiments here are rigorously executed, and the results obtained are highly informative in the field to decipher the intricate mechanisms of RAF activation and inhibitor-induced paradoxical activation.

      Weaknesses:

      The interpretation of the results in the context of the current state of the art is ambiguous and raises questions about the relevance of introducing a new model for inhibitor-induced paradoxical activation, particularly since the findings presented here do not clearly contradict established paradigms. I believe some clarification and precision are required.

      While our model does not conflict with established paradigms (because it can allow for negative cooperativity) our experimental findings (here and in Tkacik et al JBC 2025) are in conflict with the negative allostery model. We will work to clarify this in the revised manuscript.

      Main comments:

      (1) Figure 2:

      The authors comment on the expected greater increase (for a cascade assay) in the magnitude of ERK phosphorylation compared to what was observed for MEK phosphorylation. However, this observation might reflect the stoichiometries used in the assay, with roughly 40 times more MEK than RAF (250 nM vs 6 nM), which might favour pERK vs pMEK.

      The authors should clarify their rationale for the protein concentration used in this assay and explain how protein stoichiometry was taken into account for the interpretation of their results.

      The Reviewer makes a good point: the concentrations and ratios chosen are expected to make a substantial difference in the observed amplification. We intended this experiment more as a qualitative demonstration of cascade amplification and will clarify this in the revised manuscript.

      In addition, the authors should justify comparing pMEK and pERK TR-FRET values when different anti-phospho antibodies were used. Antibodies may have distinct binding affinities for their epitopes. Could this not lead to differences in FRET signal amplitudes that complicate direct comparison?

      Also a good point; we will note this limitation in the revised manuscript.

      (2) Supplementary Figure 2:

      The authors mentioned that the inhibitors did not activate the FL auto-inhibited RAF complexes; however, they did inhibit the TR-FRET signal.

      Can the authors comment on the origin of the observed basal activity? Would the authors expect self-release of the RAF kinase protein from the auto-inhibited state in the absence of RAS, leading to dimerization and activation? Alternatively, do the inhibitors at low concentration relieve the auto-inhibited state, thereby driving dimerization and activation?

      We think that the baseline activity that is being inhibited is due to low concentrations of active dimer in our autoinhibited state preparations.

      Did the authors test the addition of RAS protein in their in vitro system to determine whether "soluble" RAS is sufficient to release the protective interactions with RBD/CRD/14-3-3 and lead to inhibitor-induced paradoxical activation of FL RAF?

      We did not, but we’ve thought about it. We expect that soluble RAS would not be activating. We have previously carried out extensive studies of BRAF activation by soluble vs. farnesylated RAS in a membrane environment (liposomes) and observed partial activation in the latter (Park et al, Nature Communications 2023).

      (3) Figure 5B:

      The authors said that the Kd values obtained from their MP assay are consistent with prior studies of RAF homodimerization and RAF:MEK heterodimerization. While this is true for the previous studies of RAF:MEK interaction by BLI (performed by the same team), the Kd of isolated RAF kinase homodimerization has been measured at ~30µM by AUC in the cited refs (24, 27 & 37).

      The authors should discuss the discrepancy between their Kd of homodimerization and the reported Kd values in the literature. At the concentration used for MP, it is surprising to observe RAF dimerization while the Kd of homodimerization has been measured at ~30µM (in the absence of MEK).

      We will cite/discuss these differences in our revised manuscript.
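
      The reviewer's expectation can be made concrete with a small mass-action calculation. The sketch below assumes a simple 2M <-> D homodimerization equilibrium with Kd = [M]^2/[D]; the function name and the 0.1 µM example concentration are illustrative assumptions, not values taken from the manuscript.

```python
import math


def dimer_fraction(total_protomer_uM: float, kd_uM: float) -> float:
    """Fraction of protomers found in dimers for 2M <-> D with Kd = [M]^2 / [D]."""
    # Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd.
    # Solving the quadratic for the free monomer concentration [M]:
    monomer = (kd_uM / 4.0) * (math.sqrt(1.0 + 8.0 * total_protomer_uM / kd_uM) - 1.0)
    return 1.0 - monomer / total_protomer_uM


# Hypothetical example: 0.1 uM total protomer with a 30 uM Kd.
print(f"{dimer_fraction(0.1, 30.0):.4f}")
```

      Under these assumptions, when the total protomer concentration equals the Kd, half of the protomers are in dimers; far below the Kd, the dimer fraction becomes very small, which is why dimers observed at low concentration would imply a tighter apparent Kd.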

      Would the authors expect the presence of MEK to influence the homodimerization affinity for the isolated KD?

      Perhaps, but likely only modestly. We do not think this explains the discrepancy noted above.

      (4) Conclusions:

      Several times in the introduction and the conclusion, the authors suggest that the negative allostery model (where "inhibitor binding to one protomer of the dimer promotes an active but inhibitor-resistant conformation in the other") is a model that applies to all types of RAF inhibitors (I, I1/2, and II).

      However, from my understanding and all the references cited by the authors, this model only applies to type I1/2 inhibitors, where indeed the aC IN conformation in the second (inhibitor-free) protomer of the RAF dimer might be incompatible with the type I1/2 inhibitors inducing aC OUT conformation. The type I and type II inhibitors are aC IN inhibitors and are expected to bind both protomers from RAF dimers with similar affinities. Therefore, the negative allostery model does not apply to the type I and type II inhibitors. The difference in the mechanism of action of inhibitors is even used to explain the difference in the concentration range in which inhibitor-induced activation is observed in cells. The description of the state of the art in this study is confusing and does not help to properly understand their argumentation to revise the established model for paradoxical RAF activation.

      We will work to clarify these complicated issues in the revised manuscript. While the reviewer is correct that the negative allostery model was developed in the context of type I.5 inhibitors, there are many examples in the literature of it being used to explain PA by type I and type II inhibitors as well.

      Can the authors clarify their analysis of the state of the art on the different mechanisms of action for the paradoxical activation of RAF by the different types of RAF inhibitors?

      We’ll try!

      (5) Conclusions:

      "Our results suggest that negative allostery (or negative cooperativity) is not a requisite feature of paradoxical activation. The type I and type II inhibitors studied here induce RAF dimers and exhibit paradoxical activation but do so without evidence of negative cooperativity, nor do they appear to inhibit intentionally engineered RAF dimers with negative cooperativity (25). Indeed, type II inhibitors exhibit apparent positive cooperativity while type I inhibitors are non-cooperative inhibitors of RAF dimers (25)."

      Can the authors explain how results on the paradoxical activation induced by type I and type II inhibitors inform or challenge a model that specifically applies to type I1/2 inhibitors?

      As noted above, the negative allostery model has also been widely applied irrespective of inhibitor type (rightly or wrongly). Essentially any review or discussion of the topic will explain in one way or another how inhibitor binding to one side of a dimer leaves the opposite side active but resistant to inhibitor. Our model is agnostic with respect to cooperativity of inhibition; essentially, we are pointing out a simple circumstance that seems to have been lost in the focus on negative allostery. Paradoxical activation is a result of drug action on RAF monomers, while inhibition is a result of drug action on RAF dimers. Because these are distinct molecular species/complexes, they can be expected to differ in their affinity for RAF inhibitors, irrespective of type. Because binding of ATP in the active site of RAF monomers stabilizes the inactive monomeric state, displacing ATP can promote activation/dimerization. For any inhibitor that is more potent at displacing ATP from a monomer than from an active dimer, we could expect to observe a window of paradoxical activation.
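
      The monomer-versus-dimer argument above can be illustrated with a deliberately simple toy calculation. This is a sketch under stated assumptions, not the model or analysis from the manuscript: it assumes one-site binding to each species with hypothetical apparent constants `k_monomer` and `k_dimer` (which would fold in ATP competition), and it uses the product of inhibitor-bound monomer and inhibitor-free dimer sites as a rough proxy for activity.

```python
import math


def occupancy(conc: float, k_app: float) -> float:
    """One-site binding: fraction of sites occupied at a given inhibitor concentration."""
    return conc / (conc + k_app)


def toy_activity(conc: float, k_monomer: float, k_dimer: float) -> float:
    """Toy proxy: inhibitor-bound monomers (pushed toward dimers) times uninhibited dimer sites."""
    return occupancy(conc, k_monomer) * (1.0 - occupancy(conc, k_dimer))


def peak_concentration(k_monomer: float, k_dimer: float) -> float:
    """Concentration that maximizes toy_activity; the derivative vanishes at the geometric mean."""
    return math.sqrt(k_monomer * k_dimer)


# Hypothetical constants in arbitrary units: the monomer binds 100-fold more tightly.
for conc in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(conc, round(toy_activity(conc, k_monomer=1.0, k_dimer=100.0), 3))
```

      In this toy picture the readout rises and then falls with concentration, and the height of the peak grows as the dimer constant becomes larger relative to the monomer constant, which is one way to picture a window of paradoxical activation without invoking cooperativity.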

      The authors often refer to their previous study (reference 25), where they tested the inhibition of all three types of inhibitors with engineered RAF dimers. While I agree with the authors that in reference 25 the type I and type II inhibitors inhibit RAF dimers without exhibiting negative cooperativity (as expected from the literature and the current model), the authors did observe some negative cooperativity for type I1/2 inhibitors in their study, most particularly for the type I1/2 PB (with Hill slopes ranging from -0.4 to -0.9, indicative of negative cooperativity).

      Correct! Although we do note the caveat that weak inhibition can also give rise to apparent negative cooperativity.
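
      For readers less familiar with Hill slopes, the short sketch below shows why a slope magnitude below 1 corresponds to a shallow, drawn-out inhibition curve. It uses the standard Hill form for an inhibitory dose-response; the function names are illustrative, and slope magnitudes are passed as positive numbers (an inhibitory curve is often reported with a negative sign, as in the values quoted above).

```python
def fractional_activity(conc: float, ic50: float, hill: float) -> float:
    """Remaining activity for an inhibitory Hill curve with slope magnitude `hill`."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)


def fold_span_10_to_90(hill: float) -> float:
    """Fold change in concentration needed to go from 10% to 90% inhibition."""
    # 10% inhibition occurs at ic50 * 9**(-1/hill) and 90% at ic50 * 9**(1/hill).
    return 81.0 ** (1.0 / hill)


for hill in (0.4, 0.9, 1.0, 2.0):
    print(hill, round(fold_span_10_to_90(hill), 1))
```

      A slope of 1 spans 81-fold in concentration between 10% and 90% inhibition, whereas shallower slopes span far more. This is also why weak or incomplete inhibition, which flattens the observable part of the curve, can resemble negative cooperativity.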

      While the observation that type II inhibitors display positive cooperativity is both novel and very interesting, from what I understand the results from Tkacik et al. 2025 and the current study appear more in line with the current paradigm in the field (which describes paradoxical activation with negative cooperativity for type I1/2 inhibitors and no negative cooperativity for the type I and II inhibitors) rather than disproving the current model and supporting a new one.

      In this context, can the authors clarify how their results challenge the current model for paradoxical activation?

      While the differences in binding modes and structural effects of type I.5 vs type I and type II inhibitors are well known in the field, we do not know of any work that suggests paradoxical activation arises from anything other than negative allostery. As one example to the contrary, Rasmussen et al. observe allosteric coupling asymmetry in binding of type II inhibitors to BRAF and attribute the observed paradoxical activation to “induction of dimers with one inhibited and one catalytically active subunit” (Rasmussen et al., eLife 2024). They also studied type I inhibitors in this work, but did not observe paradoxical activation.

      (6) Conclusions:

      The authors describe the JAB34 experiment from Poulikakos et al. 2010 to conclude that "While this experiment cleanly demonstrates inhibitor-induced transactivation of RAF dimers, it is important to recognize that the differential inhibitor sensitivity of the two subunits in this experiment is artificial - it is engineered rather than induced by inhibitor binding as the negative allostery model proposes."

      Indeed, the JAB34 experiment demonstrated inhibitor-induced transactivation, but the Poulikakos et al. 2010 study does not discuss differential inhibitor sensitivity. The negative allostery model was proposed later by the Poulikakos team in other papers (Yao et al. 2015 and Karoulia et al. 2016), in which JAB34 was not used.

      Can the authors clarify how the JAB34 experiments question differential inhibitor sensitivity?

      Good point, we neglected to discuss the Yao and Karoulia papers and will do so in our revised manuscript.

      (7) Conclusions:

      "Considering that the conformation required for binding of type I.5 inhibitors destabilizes RAF dimers, it is unclear how an inhibitor binding to one protomer would be able to transmit an allosteric change to the opposite protomer, if that inhibitor's binding causes the existing dimer to dissociate."

      The authors should comment on whether 14-3-3 proteins might overcome negative regulation by type I1/2 inhibitors, similar to what has been shown for ATP, which acts as a dimer breaker like type I1/2 inhibitors.

      Certainly we expect that they will, and we will discuss this in our revised manuscript.

      (8) Conclusions:

      "Furthermore, the complex effects of type I.5 inhibitors on dimer stability and the clear resistance of active RAF dimers to these inhibitors complicates interpretation of inhibition data - weak or incomplete inhibition of an enzyme can be difficult to discern from true negative cooperativity (43). As we discuss below, the clear resistance of RAF dimers to type I.5 inhibitors is alone sufficient to explain their ineffective inhibition during paradoxical activation, without invoking negative allostery." 

      The authors should explain how they reconcile this statement and their proposal of a new model that does not rely on negative allostery with their previous findings showing negative cooperativity for RAF dimer inhibition with type I1/2 inhibitors.

      As discussed above and in responses to other Reviewers, we do not exclude negative cooperativity for type I.5 inhibitors. That said, we are skeptical, even in light of our own findings of apparent negative cooperativity by type I.5 compounds, due in part to the caveats the reviewer highlights above.

      (9) Conclusions:

      Here, the authors propose a new universal model to explain paradoxical activation of RAF by all types of RAF inhibitors:

      " Our findings here, in light of structural studies of RAF complexes and prior cellular investigations of paradoxical activation, lead us to a model for paradoxical activation that does not rely on negative allostery and is consistent with activation by diverse inhibitor classes. In this model, the open monomer complex is the target of inhibitor-induced paradoxical activation (Figure 6). Binding of ATP to the RAF active site stabilizes the inactive conformation of the open monomer, which disfavors dimerization. Displacement of ATP by an ATP-competitive inhibitor, irrespective of class, alters the relative N- and C-lobe orientations of the kinase to promote dimerization (30, 35). Once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK."

      From my understanding, the novelty of this new model is twofold: a) the open monomer is the target of the inhibitor-induced paradoxical activation and b) once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK.

      Novelty a) implies, as the authors stated, that "Inhibitor-induced activation and inhibition act on distinct species - activation on the open monomer and inhibition on the 14-3-3-stabilized dimer". The authors should explain what they mean by "activation of the open monomer", while only RAF dimers are catalytically active (except for BRAF V600E mutant)?

      We will clarify – by activation we mean promoting conversion of the open monomer to a dimer.

      For novelty b), the authors should explain more clearly what experimental results support this new model.

      We will more explicitly detail how our results here as well as prior work in the field support this model.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Interestingly, the observed rearrangements induced by Zn<sup>2+</sup> were not limited to the protein region proximal to the extracellular binding site but extended to the intracellular side of the channel. This finding agrees with previous studies showing that some extracellular H<sub>v</sub>1 inhibitors, such as Zn<sup>2+</sup> or AGAP/W38F, can cause long-range structural changes propagating to the intracellular vestibule of the channel (De La Rosa et al. J. Gen. Physiol. 2018, and Tang et al. Brit J. Pharm 2020). The authors should consider adding these references.

      We added the suggested references to the Results section.

      Since one of the main goals of this work was to validate Acd incorporation and the spectral FRET analysis approach to detect conformational changes in hHv1 in preparation for future studies, the authors should consider removing one subunit from their dimer model, recalculating FRET efficiencies for the monomer, and comparing the predicted values to the experimental FRET data. This comparison could support the idea that the reported FRET measurements can inform not only on intrasubunit structural features but also on subunit organization.

      We calculated the predicted intrasubunit FRET efficiency and presented the results in the new Figure S10. Pearson’s coefficient decreased from 0.48 for the dimer to 0.18 for the monomer, suggesting the experimental FRET contains information about subunit organization. This was added to the text.
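
      For reference, the kind of comparison described here (predicted versus experimental FRET values summarized by Pearson's coefficient) can be sketched as follows. This is a generic, minimal implementation with illustrative names; it is not the authors' analysis code and it includes no data from the study.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences of values."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    # Covariance and variances are left unnormalized because the factor 1/n cancels in the ratio.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0.0 or var_m == 0.0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_p * var_m)
```

      In such a comparison, each element would pair the FRET efficiency predicted from a structural model (dimer or monomer) with the measurement for the same labeled site, so a higher coefficient for the dimer model indicates that it tracks the site-to-site variation more closely.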

      Reviewer #2 (Public review):

      (1) Tryptophan and tyrosine exhibit similar quantum yields, but their extinction coefficients differ substantially. Is this difference accounted for in your FRET analysis? Please clarify whether this would result in a stronger weighting of tryptophan compared to tyrosine.

      We accounted for differences in the extinction coefficients of Trp and Tyr in our calculations, which are detailed in the Supplementary Text. The assumptions result in a stronger contribution from Trp than from Tyr.

      (2) Is the fluorescence of acridon-2-ylalanine (Acd) pH-dependent? If so, could local pH variations within the channel environment influence the probe's photophysical properties and affect the measurements?

      The fluorescence of acridone, the fluorophore in Acd, is not pH-dependent between pH 2 and 9 (Stephen G.S. and Sturgeon R.J. Analytica Chimica Acta. 1977). This was added to the text.

      (3) Several constructs (e.g., K125Tag, Y134Tag, I217Tag, and Q233Tag) display two bands on SDS-PAGE rather than a single band. Could this indicate incomplete translation or premature termination at the introduced tag site? Please clarify.

      Yes, the additional bands in the WB are due to the termination of translation for the mentioned protein constructs. We added a note in the legend of Figure 2 regarding this point.

      (4) In Figure 5F, the comparison between predicted FRET values and experimentally determined ratio values appears largely uninformative. The discussion on page 9 suggests either an inaccurate structural model or insufficient quantification of protein dynamics. If the underlying cause cannot be distinguished, how do the authors propose to improve the structural model of hHv1 or better describe its conformational dynamics?

      We understand the confusion about this point. We are not planning to improve the structural model with FRET between Trp/Tyr and Acd. We modified the text to avoid confusion regarding this point. We plan to use Acd as a transition metal ion FRET (tmFRET) donor to study the conformational dynamics of hH<sub>v</sub>1 in the future (Discussion). 

      (5) Cu<sup>2+</sup>, Ru<sup>2+</sup>, and Ni<sup>2+</sup> are presented as suitable FRET acceptors for Acd. Would Zn<sup>2+</sup> also be expected to function as an acceptor in this context? If so, could structural information be derived from zinc binding independently of Trp/Tyr?

      Transition metal ion FRET (tmFRET) uses a fluorophore as the donor and a transition metal ion chelator as the acceptor. For FRET to occur between these donor-acceptor pairs, the fluorescence spectrum of the donor must overlap the absorption spectrum of the metal ion (Zagotta et al., eLife. 2021; Zagotta et al., Biophys J. 2024; Gordon et al., Biophys J. 2024). Zn<sup>2+</sup> does not absorb visible light, so tmFRET cannot occur for this divalent metal.

      (6) The investigated structure is most likely dimeric. Previous studies report that zinc stabilizes interactions between hHv1 monomers more strongly than in the native dimeric state. Could this provide an explanation for the observed zinc-dependent effects? Additionally, do the detergent micelles used in this study predominantly contain monomers or dimers?

      Our full-length hH<sub>v</sub>1 in Anz3-12 detergent micelles is predominantly a dimer, as demonstrated in the new panel of Figure S5. From our data, we cannot compare the effects of zinc between monomers and dimers.

      (7) hHv1 normally inserts into a phospholipid bilayer, as used in the reconstitution experiments. In contrast, detergent micelles may form monolayers rather than bilayers. Could the authors clarify the nature of the micelles used and discuss whether the protein is expected to adopt the same fold in a monolayer environment as in a bilayer?

      We used Anzergent 3-12 detergent micelles, which stabilize hH<sub>v</sub>1 in solution. We indicated this in the Results and Materials and Methods sections. We are also intrigued by whether protein folding and conformational dynamics differ between detergent micelles and proteoliposomes, but our data do not provide an answer to this question. We found that the proteoliposomes used for measuring the hH<sub>v</sub>1 function don’t have enough Acd signals to record their spectra, preventing us from performing the same FRET measurements between Trp/Tyr and Acd in liposomes. Still, detergent-solubilized hH<sub>v</sub>1 is functional upon reconstitution, demonstrating that its functional folding is not irreversibly altered in micelles.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) On page 9, the reference to Figure S11 should be corrected to Figure S10.

      We thank the reviewer for catching this mistake. It was corrected in the updated version.

      (2) On page 9, multiple prior studies describing zinc binding to hHv1 should be acknowledged, for example:

      Musset et al. (2010), J. Physiol., 588, 1435-1449;

      Jardin et al. (2020), Biophys. J., 118, 1221-1233.

      References were added to the text.

      (3) On page 11, the statement "with Acd incorporated ... we can interrogate its gating mechanism in unprecedented detail" appears overly strong relative to the data presented. Another phrasing might be appropriate.

      The sentence was changed. It now reads: “With Acd incorporated at multiple sites in full-length hH<sub>v</sub>1, it will be possible to interrogate conformational changes across the protein’s different structural domains using Acd as a tmFRET donor to understand its molecular mechanisms.”

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      While the authors have proved their hypothesis by temporally increasing the activity of cholinergic neurons at different life stages through the auxin-inducible degron system, their work raises two major concerns. First, they might want to discuss the conflicting data from Zullo et al (Nature 2019, vol 574, pp 359-364). For example, the authors show that increasing the activity of acr-2-expressing neurons after the 7th day of adulthood increases lifespan. However, Zullo et al (2019) show that the reciprocal experiment, inhibiting cholinergic neuron activity on the 1st day or the 8th day of adulthood, also increases lifespan. Is this because the two studies are using different promoters, that of the acr-2 ACh receptor (this work) versus that of the unc-17 vesicular ACh transporter (Zullo et al., 2019)? The two genes are expressed in different subsets of cells that do not completely overlap. CeNGEN shows that acr-2 is expressed in motor and non-motor neurons, but some of these neurons are also different from those that express unc-17. Is it possible that different cholinergic neurons also have opposite lifespan effects during adulthood? Or is it because both lack of signaling and hypersignaling can lead to a long-life phenotype? Leinwand et al (eLife 2015, vol 4, e10181) previously suggested that disturbing the balance in neurotransmission alone can extend lifespan. A simple discussion of these possibilities in the Discussion section is likely sufficient. Or can the auxin treatment and removal be confounding factors? Loose and Ghazi (Biol Open 2021, vol 10, bio058703) show that auxin IAA alone can affect lifespan and that this effect can depend on the time the animal is exposed to the auxin.

      We thank the reviewer for the thoughtful comments and valuable suggestions. In response, we have expanded the Discussion section to address the points raised, as detailed below.

      We fully agree with the reviewer that the different results between our study (activating acr-2-expressing neurons) and Zullo et al. (inhibiting unc-17-expressing neurons) are most likely due to the distinct cholinergic neurons targeted. Our new preliminary data further support this neuron-specific model, as inhibition of acetylcholine synthesis at mid-late life stages produces opposing lifespan effects in different cholinergic neurons. At the same time, we cannot rule out the alternative possibility raised by the reviewer (eLife, 2015) that both activation and inhibition of neuronal activity may extend lifespan by similarly disrupting the balance of neurotransmission. This hypothesis requires further experimental validation in the context of cholinergic motor neurons. Regarding the potential technical concern related to auxin exposure (Biol Open, 2021), our control experiments using 0.5 mM auxin did not show non-specific lifespan effects.

      Accordingly, in the revised manuscript, we have discussed the first two possibilities in the Discussion by stating (page 17-18): “Nevertheless, it is still unclear whether other neuronal populations share similar temporal regulatory mechanisms. A previous study reported that inhibiting cholinergic neurons activity (using unc-17 promoter) extends lifespan regardless of timing[2], which is different from the temporal lifespan regulation we observed in cholinergic motor neurons (using acr-2 promoter). This discrepancy is likely due to differences in subsets of neurons, as the unc-17 promoter labels a broad repertoire of cholinergic neurons, while the acr-2 promoter mainly marks cholinergic motor neurons[53]. Thus, the distinct lifespan-modulating effects of cholinergic motor neurons may be overshadowed by opposing contributions from other cholinergic subtypes when a mixed population is manipulated. Alternatively, both activation and inhibition of cholinergic activity may perturb neurotransmission balance, leading to similar effects on lifespan[54]. It will be interesting to test these hypotheses in future studies.”

      Second, the daf-16-dependence of the early longevity-inhibiting effect of ACh signaling needs clarification and further experimentation. The authors present a model in Figure 6D, where DAF-16 inhibits longevity. This contradicts published literature. Libina et al (Cell 2003, vol 115, pp 489-502) have shown that intestinal DAF-16 increases lifespan. From the authors' data, it is possible that ACh signaling inhibits DAF-16, not promotes it as they have drawn in Figure 6D.

      We thank the reviewer for this important point. We agree that intestinal DAF-16 promotes longevity. Our original model in Figure 6D aimed to show that the larval pathway shortens lifespan by inhibiting DAF-16, not that DAF-16 itself shortens lifespan. The arrowhead style used in the original Figure 6D might have given the impression that DAF-16 shortens lifespan; we apologize for the confusion. We have now fixed this error in Figure 6D. In addition, as suggested, we have performed additional daf-16 experiments (see below).

      In Figure 3F, the authors used Pacr-2::TeTx, which inhibits cholinergic neuron activity, to show an increase in the expression of DAF-16 targets. Why did the authors not use the worms that express the transgene Pacr-2::syntaxin(T254I), which increases cholinergic neuron activity? What happens to the expression of DAF-16 targets in these animals? Does their expression go down? What happens if intestinal daf-16 is knocked down in animals with increased cholinergic neuron activity, instead of reduced cholinergic neuron activity?

      Thanks for these insightful questions. In Figure 3F-H, we used TeTx instead of syntaxin(T254I) to investigate the function of DAF-16 in the early-stage pathway for two main reasons. First, the Pacr-2::TeTx transgene extends lifespan in early life by inhibiting cholinergic activity, which provides a genetic background complementary to that of syntaxin(T254I) for characterizing the role of DAF-16. Second, the TeTx pathway is expected to activate DAF-16 and upregulate its target genes. This approach is more sensitive than measuring gene downregulation in Pacr-2::syntaxin(T254I) transgenic worms.

      We fully agree with the reviewer that performing the corresponding experiments in the syntaxin(T254I) background would strengthen the overall evidence. As suggested, we have now examined the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, and performed intestine-specific RNAi of daf-16 in the same background. We found that these worms exhibit downregulation of DAF-16 target genes. Furthermore, intestinal daf-16 knockdown did not further shorten the already reduced lifespan of these transgenic worms. Together, these results from both the TeTx and syntaxin(T254I) lines confirm that cholinergic motor neurons require DAF-16 in the intestine to regulate lifespan. These new data are now described in Figure S5A-5D (pages 11-12): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons.”, and “RNAi of daf-16 in the intestine abolished the ability of cholinergic motor neurons to regulate lifespan at early life stage (Figure 3G, 3H and Figure S5C-S5E).”

      Recommendations for The Authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) “The Methods section needs to be clarified/expanded.”

      (a) “For example, are the authors using indole-3-acetic acid or a synthetic auxin? How long does it take for syntaxin to be made after the removal of the auxin?”

      We have now included the auxin identity and the recovery time in the Methods section on auxin treatment by stating (page 24): “natural auxin indole-3-acetic acid (G&K Scientific)”, and “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (b) “How much FUDR was used in some of the lifespan assays?”

      We used 2 μg/mL FUDR in some of the lifespan assays. We have now included the concentration in the Methods section on lifespan assays by stating (page 23 line 526): “2 μg/mL 5-Fluoro-2’-deoxyuridine (FUDR) was included in assays involving TeTx transgene worms, unc-31 and unc-17 mutant worms, which show a defect in egg laying.”

      (c) “In line 494 of the Methods section, worms were anesthetized with 50 mM sodium azide. That concentration seems a bit high.”

      It is an error indeed. We used 5 mM NaN3. This has now been fixed in the text and in line 548.

      (d) “What are the concentrations of the transgenes used in the extrachromosomal arrays?”

      We have now included the concentrations in the Methods section on strains and genetics by stating (line 507-509 on page 22): “Microinjections were performed using standard protocols. Each plasmid DNA listed above in the transgenic line was injected at a concentration of 50 ng/μL. Each marker for RNAi was co-injected at a concentration of 25 ng/μL.”

      (2) “Gene expression can vary in different parts of the worm intestine. Do the measurements in Figure 6C represent the entire intestine or only certain parts of the intestine?”

      We have now specified the intestinal area used for quantification in the Methods section on microscopy by stating (page 24): “and the entire intestine area was selected by ImageJ”, and in the legend of Figure 6C by stating (page 36): “The entire intestinal area was selected for measurement.”

      (3) “In Figure S1C, does tph-1 have a slight effect? Might serotonin partly counteract the effects of ACh?”

      We thank the reviewer for raising this interesting point regarding the potential role of serotonin. We have re-examined our data in Figure S2C (the original Figure S1C) and agree that loss of tph-1 partly counteracted the lifespan-shortening effect of the Pacr-2::syntaxin(T254I) transgene in the early life stage, though the whole-life suppression effect is slight. To assess whether the acr-2 promoter-driven manipulation might directly affect serotonergic neurons, we checked the CeNGEN database. We found that acr-2 transcripts can be detected in serotonergic neurons (ADF, HSN, and NSM), but the levels are extremely low. It is therefore unlikely that the Pacr-2::syntaxin(T254I) transgene exerts its primary effect by substantially altering serotonin release. While a potential indirect interaction between cholinergic and serotonergic signaling in lifespan regulation remains possible, it falls beyond the primary focus of the current study. We would like to follow up on this in future studies. We have now pointed this out in the text by stating (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (4) “Where else is GAR-2 expressed? Might there be redundancies between neuronal and intestinal GAR-2?”

      We appreciate this insightful question. Based on available single-cell gene expression atlases of C. elegans at both embryonic and adult stages[1,2], gar-2 expression has been detected not only in neurons and the intestine, but also in additional tissues such as the muscle. Given the observed lack of effect of neuronal or intestinal gar-2 RNAi on the ability of cholinergic motor neurons to extend lifespan in mid-late life, and as also suggested by another reviewer, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, while muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppressed this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 primarily in the intestine and GAR-2 primarily in muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 in the muscle will be further investigated in future studies. The new data are now described in Figure S8 by stating (pages 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H) had an effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (1) Packer, J. S. et al. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science 365, doi:10.1126/science.aax1971 (2019).

      (2) Roux, A. E. et al. Individual cell types in C. elegans age differently and activate distinct cell-protective responses. Cell Rep 42, 112902, doi:10.1016/j.celrep.2023.112902 (2023).

      (3) Chun, L. et al. Metabotropic GABA signalling modulates longevity in C. elegans. Nat Commun 6, 8828, doi:10.1038/ncomms9828 (2015).

      (4) Izquierdo, P. G. et al. Cholinergic signaling at the body wall neuromuscular junction distally inhibits feeding behavior in Caenorhabditis elegans. J Biol Chem 298, 101466, doi:10.1016/j.jbc.2021.101466 (2022).

      (5) “In line 344, please correct "fwork" to "work".”

      This has now been fixed.

      (6) “In line 360, please correct "acts" to "act".”

      This has now been fixed.

      (7) “Please check citations within the main text. Some of the citations do not fit the cited material. For example, in line 112, reference 28 is not about GABAergic neurons.”

      We thank the reviewer for pointing out these important details. We have now carefully checked and corrected the citations throughout the manuscript as suggested.

      Reviewer #2 (Recommendations for The Authors):

      (1) “How are the authors assessing the efficacy of the TeTx manipulations in their strains? Likely TeTx has a concentration-dependent effect. Are there any phenotypes associated with the loss of cholinergic signaling? Also, does TeTx expression in cholinergic neurons alter the neuronal activity of other associated neurons, or alter muscle integrity?”

      Thanks for the question. Our observations show that overexpression of TeTx results in defects including small size, slow growth, egg-laying deficiencies, and severe locomotion impairment, which are all associated with the loss of cholinergic signaling. While we did not directly examine the activity of interconnected neurons in our strains, we tested muscle integrity by recording the muscle response to 1 mM levamisole and found that overexpression of TeTx does not affect muscle integrity. To circumvent these pleiotropic complications, we instead employed syntaxin(T254I) transgenic worms, which exhibit only slight locomotion defects, to further characterize the temporal effect of cholinergic motor neurons on lifespan. These data are now described in Figure S1A by stating (page 6): “Overexpression of TeTx induces characteristic phenotypes of cholinergic deficiency, such as developmental delay and severe locomotion impairment[32], yet does not compromise muscle function (Figure S1A).”

      (2) “The authors are expressing TeTx throughout the lifespan of the animal, including during development. How does this contribute to the organismal phenotype?”

      As described above, chronic TeTx expression from the egg stage results in developmental delay, which is similar to the developmental phenotype of unc-17 mutant worms defective in acetylcholine transmission. However, unlike TeTx overexpression, the unc-17 mutation has no effect on lifespan[3], indicating that the developmental delay caused by TeTx overexpression may not account for the lifespan phenotype.

      (3) “A previous study has shown that increasing cholinergic activity by altering ACR-2 expression can cause neurodegeneration (DOI: https://doi.org/10.1523/JNEUROSCI.1515-10.2010). Does overexpressing syntaxin, or AID-mediated degradation of syntaxin cause motor neuron degeneration, which could also contribute to the lifespan phenotype?”

      We thank the reviewer for raising this important point regarding potential motor neuron degeneration. In response, we performed confocal microscopy to assess the motor neurons. We found that worms expressing the transgene Pacr-2::syntaxin::mCherry do not exhibit a defect in the number or morphology of labeled neuronal cell bodies compared to control worms expressing Pacr-2::mCherry. This observation indicates that chronic, increased cholinergic activity through syntaxin overexpression, under our experimental conditions, does not induce motor neuron degeneration. These data are now described in Figure S1B by stating (page 7): “This transgene simply shortened lifespan without causing a pleiotropic effect (Figure 1B), and critically, without inducing motor neuron degeneration (Figure S1B).”

      (4) “Figures 1I-1L: The authors do not show how long it takes for the expression of syntaxin to be restored following the removal of auxin from plates. This would be important to assess the age-dependent effects of neuronal signaling.”

      We thank the reviewer for pointing this out. In general, complete restoration of syntaxin expression occurred within 24 hours after auxin withdrawal. We have now pointed this out in the text by stating (the last sentence on page 24): “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (5) “In Figures S1A-E: Although the mutant backgrounds decrease the lifespan of animals expressing the Pacr2::syntaxin(T254I) transgene, the lifespan of these transgenic animals appears to be extended compared to what was shown in Figure 1B. Is this the case? (can these experiments be repeated alongside wild-type N2s to assess if their lifespan is indeed extended compared to the N2?). Also, if so, could it be that the lifespan effects are modified to different extents by other small neurotransmitters?”

      We thank the reviewer for pointing this out. All the experiments presented in the current Figure S2 (original Figure S1) were performed with wild-type N2 controls, which are now included in the updated Figure S2. These data show that, in the Pacr-2::syntaxin(T254I) transgenic background, loss of unc-25 (GABA) or tph-1 (serotonin) leads to a further extension of lifespan, while loss of other genes had no effect. Importantly, while the unc-25 mutation also extends lifespan in wild-type worms, the tph-1 mutation does not. This observation indicates that the lifespan effects of cholinergic signaling can be modulated by serotonin. We have now pointed this out in the text by stating (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (6) “RNAi of several of the receptors appear to modulate wild-type lifespan. Although I understand that this is not the main focus of the manuscript, the fact that this occurs should be mentioned in the results and discussed later on.”

      We thank the reviewer for pointing this out. As suggested by the reviewer, we have now pointed this out in the text by stating (page 9): “Notably, RNAi of several ACh receptors such as acr-11 appears to shorten wild-type lifespan, whereas RNAi of several other ACh receptors such as acr-9 extends wild-type lifespan, suggesting lifespan-modulating potential of ACh receptors (Figure S3).”

      (7) “Cholinergic signaling and ACR-6 have been previously shown to regulate pharyngeal pumping/feeding behavior. (https://doi.org/10.1016/j.jbc.2021.10146). Could the requirements for ACR-6/cholinergic signaling in longevity be related to caloric restriction/nutritional intake which in turn could be expected to alter DAF-16 and HSF-1 activity? These previous studies should be referenced and discussed.”

      Thanks for the suggestion. As suggested by the reviewer, we have examined the pumping rate of acr-6 mutant worms. Our results showed that the acr-6 mutation slightly reduced the pumping rate. As the decrease is relatively minor, we do not expect a major dietary restriction (DR) effect, though we cannot completely rule out such a possibility. Furthermore, as acr-6 acts in the pharynx to regulate pumping but in the intestine to mediate the role of cholinergic signaling in lifespan, we do not expect the pumping defect to make a major contribution to our pathway. These new data are now described in Figure S4I. As suggested by the reviewer, we have now pointed this out in the text by stating (page 10): “Previous data has shown that cholinergic signaling and ACR-6 may control pharyngeal pumping[42]. As expected, we found that acr-6 mutation slightly reduced pumping rates (Figure S4G).”

      (8) “The expectation for the studies in Figure 3/DAF-16, is that animals expressing Ex[Pacr-2::syntaxin(T254I)], should have downregulated DAF-16 in the intestine. This needs to be shown through some method (increased daf-16 activation upon loss of cholinergic signaling does not necessarily imply that the converse is also true).”

      We thank the reviewer for the insightful suggestion. The reviewer suggested that we perform additional measurements to confirm that DAF-16 is the downstream transcription factor in the intestine. Specifically, the reviewer suggested testing whether signaling from the syntaxin(T254I) transgene could inhibit DAF-16 activity. We have now followed the reviewer’s suggestion by performing two different assays. First, as also suggested by the first reviewer, we measured the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, which exhibited downregulation of these genes, consistent with the notion that increasing cholinergic motor neuron activity inhibits DAF-16. These data are now described in Figure S5A. Second, we performed an assay to detect the subcellular localization pattern of DAF-16 in the intestine. We found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, suggesting that ACR-6 inhibits DAF-16, which is consistent with our model. These new data are now described in Figure S5E. As suggested by the reviewers, we have now pointed this out in the text by stating (page 11): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons. To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promoted nuclear translocation of DAF-16, confirming that ACh signaling inhibits DAF-16 activity (Figure S5B).”

      (9) “Similarly, it would be good to have additional lines of evidence that signaling through GAR-3 impinges on HSF1, and that the lifespan effects are not due to non-specific effects of hsf-1 knockdown, which could lead to several un-related deficiencies and compromise lifespan (Figure 5b).”

      We thank the reviewer for the valuable suggestions. The reviewer correctly noted that the observed lifespan effect from hsf-1 RNAi could involve non-specific deficiencies. In response, we performed an assay to detect HSF-1 subcellular localization in the intestine upon gar-3 overexpression by using the strain EQ87 (iqIs28[pAH71(hsf-1p::hsf-1::gfp) + pRF4(rol-6)]). We found that the induced nuclear translocation of HSF-1 was weak. This result suggests that GAR-3 may modulate HSF-1 activity through a mechanism distinct from, or more subtle than, robust nuclear accumulation, or that its effect is highly dependent on the expression level and timing.

      (10) “Figure 6: An N2 control should be provided to assess the specificity of the mCherry signal from the intestine (given autofluorescence in the animals' gut).”

      Thanks for the suggestion. As suggested by the reviewer, we have now included the control in Figure S10.

      Reviewer #3 (Recommendations for The Authors):

      (1) “While the model is consistent with the data, there are alternatives that were not addressed. Additionally, there are some deficiencies in the interpretation of results that should be addressed, in my opinion. Possibly most importantly given the claims, the authors should address an alternative model: that it is the level of acetylcholine signaling that matters. Is it possible that the level of auxin-inducible degradation of syntaxin(T254I) in acr-2 expressing cells is age dependent, such that one level increases lifespan and the other shortens it, and that the timing doesn't matter at all? A chronic dose response to auxin concentration would address if the level of syntaxin is a non-monotonic determinant of lifespan.”

      We sincerely thank the reviewer for raising this important alternative model. The reviewer suggested that the apparent temporal effect we observed might instead be explained by an age-dependent change in the efficiency of the AID system in degrading syntaxin(T254I) in acr-2 expressing cells. That is, different levels of acetylcholine signaling, rather than timing, would produce opposite lifespan outcomes. We agree that this is a formal possibility that our current data cannot fully rule out. On the other hand, other data in the manuscript suggest otherwise. For example, the expression of ACR-6 and GAR-3 in the intestine exhibited a temporal switch between early and mid-late life, providing support for a time-dependent mechanism. In addition, the differential requirement for the downstream transcription factors DAF-16 and HSF-1 in early and mid-late life, respectively, provides further evidence supporting a temporal mechanism. Thus, while we agree that the possibility raised by the reviewer cannot be formally ruled out, the temporal mechanism we proposed may play an important role.

      The reviewer suggested performing a chronic dose-response experiment with varying auxin concentrations. In fact, when we first employed the AID system to temporally manipulate motor neuron output at different life stages, we tested potential effects of auxin concentration. Using the soma-expressed TIR1 system, we found that restoring syntaxin(T254I) activity from day 10 of adulthood extends lifespan, regardless of whether the prior suppression was maintained with 0.1 mM or 0.5 mM auxin. This suggests that the pro-longevity effect is likely not triggered by differences in the efficacy of prior suppression within this concentration range. We acknowledge that the tested dose range may not cover potential threshold concentrations. Furthermore, we cannot exclude the possibility of a non-linear relationship between auxin concentration and degradation efficiency. We agree that a comprehensive chronic dose-response analysis remains a valuable future direction, and we plan to employ more precise tools in the future to investigate the interplay between signal level and temporal context in lifespan regulation. The auxin concentration data are now described in Figure S1C-1D by stating (page 7): “Comparable outcomes were obtained with both 0.1 mM and 0.5 mM auxin treatments (Figure S1C-1D).” As suggested by the reviewer, we have discussed the alternative model in the Discussion by stating (page 19): “An alternative mechanism based on differential levels of cholinergic signaling could also contribute to the observed lifespan effects.”

      (2) “Several times, including in several section headings, it is claimed that daf-16 (eg line 205-206) and acr-6 (eg line 185-186) function "early in life". This was not tested, so the claim is not warranted. For instance, these genes could act later in life to respond to signals made or sent early in life, or they could act both early and late, or only early (as they claim).”

      We thank the reviewer for this precise and important clarification. The reviewer is correct that our genetic interventions do not by themselves define the temporal window.

      Our experimental rationale was based on the observation that the lifespan-shortening effect of Pacr-2::syntaxin(T254I) expression is similar whether it is induced throughout life or specifically during larval stages (early life), indicating the detrimental effect results from enhanced motor neuron output in early life. Therefore, we used the lifelong expression paradigm as a tool to genetically dissect the downstream pathway triggered by early-life neuronal activation. We acknowledge the reviewer's point that this design does not formally prove that daf-16 or acr-6 acts only in early life; they could be required continuously or again later. However, we would like to note that our expression data show that the gut expression of ACR-6 is restricted to early life, which is consistent with a primary early-life function in this context.

      To reflect this more accurate interpretation, we have revised all relevant statements, including section headings. We now consistently state that daf-16 is required for the lifespan-shortening effect of cholinergic motor neurons, rather than claiming it functions "in early life". We have also toned down the discussion regarding their temporal function by stating (page 12): “Because this lifespan-shortening effect results from enhanced motor neuron output in early life and overwrites its beneficial effect at later stages, we propose this signaling circuit mediates the lifespan-shortening effect in early life.”

      (3) “In line 118, they note that such intervention led to a complex effect on the lifespan curve "by initially promoting worm's survival followed by inhibiting it at later stages." I think that while findings from later experiments support a time-dependent lifespan effect stemming from syntaxin function in the cholinergic motor neurons, this experiment's TeTx expression in those neurons is not time-dependent. Lifespan is an endpoint measure, so there is no sense in which a non-timed perturbation has an early or late effect on an individual. Rather, the effect on survival they observed is at the population level, their intervention increases the average lifespan while decreasing the worm-to-worm variation in lifespan.”

      We thank the reviewer for the critical and precise comment regarding our interpretation of the survival curves of TeTx transgenic worms. As suggested by the reviewers, we have revised the text by stating (page 6): “Surprisingly, such intervention led to a complex effect on the population survival curve by reducing both early mortality and the proportion of long-lived individuals (Figure 1A). Specifically, the 25% lifespan of these worms was prolonged, while their 75% and maximal lifespan were slightly shortened, leading to a mean lifespan slightly increased or unchanged compared to that of wild-type worms. This suggests that inhibiting cholinergic motor neurons may exert temporally distinct effects on survival, leading to decreased individual variation in lifespan.”
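
      The population-level summary described above (25% lifespan, 75% lifespan, maximal lifespan, mean, and spread) can be illustrated with a minimal sketch. This is not the analysis pipeline used in the manuscript: the function names, the nearest-rank percentile definition, and the two cohorts of death ages are all hypothetical and chosen only to reproduce the qualitative pattern, and the sketch assumes no censored animals.

```python
import math
from statistics import mean, pstdev


def lifespan_percentile(death_days, fraction):
    """Earliest age (days) by which at least `fraction` of the cohort has died.

    Nearest-rank definition; assumes every animal has an observed death age
    (no censoring), which is a simplification for illustration.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(death_days)
    # Index of the animal whose death first brings cumulative mortality
    # to at least `fraction` of the cohort.
    k = math.ceil(fraction * len(ordered)) - 1
    return ordered[k]


def summarize(death_days):
    """Collect the summary statistics discussed in the response."""
    return {
        "p25": lifespan_percentile(death_days, 0.25),
        "p75": lifespan_percentile(death_days, 0.75),
        "max": max(death_days),
        "mean": mean(death_days),
        "sd": pstdev(death_days),  # worm-to-worm variation in lifespan
    }


# Hypothetical death ages (days of adulthood), for illustration only.
control = [8, 10, 12, 14, 16, 18, 20, 22]
treated = [12, 13, 14, 15, 16, 17, 18, 19]

for name, cohort in (("control", control), ("treated", treated)):
    print(name, summarize(cohort))
```

      In this made-up example the treated cohort has a longer 25% lifespan, shorter 75% and maximal lifespans, a nearly unchanged mean, and a smaller standard deviation, which is the pattern the revised text attributes to the population survival curve rather than to any individual animal.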

      (4) “The layout of the plots separating the responses of wild type and mutants to different panels makes it often difficult to interpret the results. For instance, do acr-6, gar-3, and other receptor mutants or knockdowns affect lifespan on their own? If they do, it matters to the interpretation whether they live longer or shorter than the wild type: which of the mutants phenocopy the lack of a lifespan-extending signal that activates them? Which phenocopy lacks a lifespan-shortening signal that activates them? Could they phenocopy the effect of an inhibitory signal? And critically, are the effects of these mutants on lifespan consistent with their model?”

      “The paper would be stronger if they determined when ACR-6 and GAR-3 functions are necessary and sufficient. Is it possible that the receptor doesn't matter, just that there be one of the two expressed in the intestine, and that other mechanisms determine the lifespan response to modulation of syntaxin(T254I)? What does time-dependent knockdown of these receptors do to daf-16 and hsf-1 localization and to the transcription of the targets of these transcription factors?”

      We thank the reviewer for these insightful comments. We have addressed the points as follows:

      As suggested, we have reorganized the lifespan data in Figure S4 to directly compare wild type and mutant/RNAi conditions within the same panels. This new presentation clarifies the autonomous effects of these genes. The data show that loss of acr-6 or gar-2 (via RNAi or mutation) has minimal effect on lifespan. Notably, acr-8 RNAi shortens lifespan, whereas the acr-8 mutation does not, supporting our hypothesis of tissue-specific or compensatory roles for this receptor, as detailed in our following response to point (5). The reviewer's key question regarding when these receptors are necessary and sufficient is central to our model. We agree with the reviewer that complementary loss-of-function experiments with temporal precision, such as time-specific knockdown of the two receptors, would provide even stronger evidence. To this end, we attempted to generate endogenous degron-tagged alleles of acr-6 and gar-3 to apply the AID system for precise, stage-specific degradation. Unfortunately, despite multiple design attempts and screening efforts, we were unable to obtain homozygous strains with the desired genomic edits using either the gRNA we used to knock in mCherry or other gRNAs. Consequently, we are currently unable to perform the ideal temporally controlled loss-of-function experiments suggested by the reviewer.

      (5) “Why does RNAi but not mutation of acr-8 and gar-2 suppress the lifespan shortening effect of Pacr-2::syntaxin(T254I)?”

      Thanks for this important question regarding the differential effects of feeding RNAi versus mutation of acr-8 and gar-2. The discrepancy likely arises from the potential off-target effects of RNAi. RNAi is not strictly specific as it may target other related genes, generating a non-specific effect, whereas precise mutations in acr-8 and gar-2 alone may not produce the same effect.

      (6) “sid-1(-); Ex[Pacr-2::tetx] lives longer than sid-1(-) in daf-16(+) worms in Figure 3G, so it is very hard to interpret the lack of effect of Pacr-2::tetx in daf-16(-) worms, since this transgene behaves differently in sid-1 mutants than in wild type worms. This would be clear if the two plots were combined (appropriately, since it is the same experiment). It looks like daf-16 RNAi has a shortening effect in the sid-1 mutant, but not in sid-1 mutants expressing Pacr-2::tetx.”

      Thanks for this helpful suggestion. As suggested by the reviewer, we have now merged Figure 3G and 3H into one figure to present as Figure S5F. This combined presentation clarifies the comparison and shows that intestinal daf-16 RNAi shortens lifespan in both sid-1 mutants and sid-1 mutants expressing Pacr-2::TeTx.

      Reviewer #4 (Recommendations for The Authors):

      (1) “Lines 50-52: I would replace "leading to increased incidents in age-related diseases and probability of death" with "leading to the onset of age-related diseases and increased probability of death". Instead of "such an aging process" I would use "the aging process".”

      This has now been fixed.

      (2) “Figure 2E-F: By rescuing the expression of ACR-6 in neurons or intestinal cells alone, the authors show that the release of ACh from cholinergic neurons has effects on the intestine to shorten lifespan. Is ACR-6 expressed in other tissues (e.g. muscle?) It might be interesting to assess whether ACh also regulates lifespan through activating the ACR-6 receptor in other tissues or specifically targets the intestine. This question is partially answered with the tissue-specific RNAi experiments for DAF-16, but it is possible that ACR-6 also modulates other pathways beyond the tested transcription factors.”

      Analyzing the role of other tissues could also be applied to understand how GAR-3 influences lifespan. Along these lines, it would be interesting to expand the tissue-specific knockdown experiments for GAR-3 to other tissues. More importantly, these experiments can address whether activation of ACR-6 and GAR-3 can also have different effects on lifespan by regulating distinct tissues in addition to the intestine, and not only due to temporal expression patterns. For instance, whereas DAF-16 regulates lifespan primarily through its effects in the intestine, HSF1 could have effects on additional tissues. Although it would be interesting to perform these experiments, I understand that the authors' main focus is the nervous system-gut axis.

      We thank the reviewer for the insightful suggestions regarding the potential tissue-specific functions of ACR-6 and GAR-3. As noted in our response to point #6, endogenous expression imaging indicates that ACR-6 and GAR-3 are primarily expressed in neurons and the intestine, with weak expression of GAR-3 in the muscle, so we tested the muscle. We found that muscle-specific RNAi of gar-2 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, whereas muscle-specific RNAi of gar-3 did not. This result further supports the conclusion that GAR-3 primarily exerts this effect in the intestine.

      (3) “Can the authors specify in the corresponding figure legend at what age they tested sod-3 and mtl-1 expression in Pacr-2::TeTx worms (Figure 3F)? This is important to support the conclusions of the paper. Along these lines, can the authors also specify at what age they quantified the expression of HSF-1 targets (Figure 5F).”

      Thanks for the suggestion. As recommended, we have now provided the worm age in Figure 3F (day 1 adult) and Figure 5F legends (day 10 adult).

      (4) “To further strengthen the authors' conclusions, it might be interesting to examine the intracellular localization of DAF-16 in the intestine of Pacr-2::TeTx and syntaxin(T254I) worms compared to controls.”

      We thank the reviewer for this valuable suggestion, which was also raised by another reviewer. In response, we examined the subcellular localization of DAF-16 in the intestine. Direct imaging in the Pacr-2::TeTx or Pacr-2::syntaxin(T254I) backgrounds was technically challenging because their fluorescent protein tags (YFP or mCherry) would interfere with the detection of DAF-16::GFP. Therefore, we adopted an alternative approach by modulating the activity of acr-6, the intestinal acetylcholine receptor that transmits cholinergic signals from motor neurons to DAF-16. We found that acr-6 RNAi promotes the nuclear translocation of DAF-16. These new data are presented in Figure S5E by stating (page 11): “To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, confirming that ACh signaling modulate DAF-16 activity (Figure S5B).”

      (5) “The results with gar-2 RNAi are fascinating. I am very curious (and I assume potential readers too) about what tissues mediate the mid-late life effects of GAR-2 in longevity. Perhaps the authors could add experiments in a couple of other tissues known to regulate organismal lifespan (e.g. muscle). However, I totally understand why the authors focused on GAR-3, especially because both GAR-3 and ACR-6 have effects on the intestine and this is sufficient for the main conclusions of the paper.”

      We sincerely thank the reviewer for the insightful suggestion and for highlighting the potential role of GAR-2. In response, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, while muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppresses this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 primarily in the intestine and GAR-2 primarily in the muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 will be investigated in future studies. The new data have now been described in Figure S8 by stating (page 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H) had effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (6) “Figure 6: It seems that the genes are also expressed in the muscle. Can the authors include images of other tissues in supplementary figures?”

      Thanks for the suggestion. We have now included images of whole worms expressing mCherry, which was knocked into the endogenous locus of gar-3 or acr-6 by CRISPR, in Figure S10. However, we did not detect strong expression of gar-3 or acr-6 in the muscle under the conditions examined, which may reflect the low endogenous protein expression level of the two genes in the muscle, although the CeNGEN website shows they are expressed in the muscle. Determining the precise spatiotemporal expression profiles of these receptors will likely require more sensitive methods. We plan to address this important question in future studies by using such refined approaches.

    1. Author response:

      General Statements

      We thank all three reviewers for their time taken to provide valuable feedback on our manuscript, and for appreciating the quality and usefulness of our data and results presented in our study. We have improved the manuscript based on their suggestions and provide a detailed, point-by-point response below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community.

      Thank you for your positive feedback.

      There are several single-cell methodologies that all claim to co-profile chromatin modifications and gene expression from the same individual cell, such as CoTECH, Paired-tag and others. Although T-ChIC employs pA-MNase and IVT to obtain these modalities from single cells, which is different, could the authors provide some direct comparisons among all these technologies to see whether T-ChIC outperforms them?

      In a separate technical manuscript describing the application of T-ChIC in mouse cells (Zeller, Blotenburg et al., 2024), we have provided a direct comparison of data quality between T-ChIC and other single-cell methods for chromatin-RNA co-profiling (please refer to Fig. 1C, D and Fig. S1D, E of the preprint). We show that, compared to other methods, T-ChIC is able to better preserve the expected biological relationship between the histone modifications and gene expression in single cells.

      In the current study, T-ChIC profiled H3K27me3 and H3K4me1 modifications, and these data look great. How about other histone modifications (e.g., H3K9me3 and H3K36me3) and transcription factors?

      While we haven’t profiled these other modifications using T-ChIC in zebrafish, we have previously published high quality data on these histone modifications using the sortChIC method, on which T-ChIC is based (Zeller, Yeung et al., 2022). In our comparison, we find that histone modification profiles between T-ChIC and sortChIC are very similar (Fig. S1C in Zeller, Blotenburg et al., 2024). Therefore, the method is expected to work as well for the other histone marks.

      T-ChIC can detect full length transcription from the same single cells, but in FigS3, the authors still used other published single cell transcriptomics to annotate the cell types, this seems unnecessary?

      We used the published scRNA-seq dataset with a larger number of cells to homogenize our cell type labels with these datasets, but we also cross-referenced our cluster-specific marker genes with ZFIN and homogenized the cell type labels with ZFIN ontology. This way our annotation is in line with previous datasets but not biased by them. Due to the relatively small size of our data, we didn’t expect to identify unique, rare cell types, but our full-length total RNA assay helps us identify non-coding RNAs such as miRNA previously undetected in scRNA assays, which we have now highlighted in new figure S1c.

      Throughout the manuscript, the authors found some interesting dynamics between chromatin state and gene expression during embryogenesis, independent approaches should be used to validate these findings, such as IHC staining or RNA ISH?

      We appreciate that the ISH staining could be useful to validate the expression pattern of genes identified in this study. But to validate the relationships between the histone marks and gene expression, we need to combine these stainings with functional genomics experiments, such as PRC2-related knockouts. Due to their complexity, such experiments are beyond the scope of this manuscript (see also reply to reviewer #3, comment #4 for details).

      In Fig2 and FigS4, the authors showed H3K27me3 cis spreading during development, this looks really interesting. Is this zebrafish specific? H3K27me3 ChIP-seq or CutTag data from mouse and/or human embryos should be reanalyzed and used to compare. The authors could speculate some possible mechanisms to explain this spreading pattern?

      Thanks for the suggestion. In this revision, we have reanalysed a mouse ChIP-seq dataset of H3K27me3 during embryonic development by Xiang et al. (Nature Genetics 2019) and find similar evidence of spreading of H3K27me3 signal from their pre-marked promoter regions at E5.5 epiblast upon differentiation (new Figure S4i). This observation, combined with the fact that the mechanism of pre-marking of promoters by PRC1-PRC2 interaction seems to be conserved between the two species (see Hickey et al., 2022; Mei et al., 2021; Chen et al., 2021), suggests that the dynamics of H3K27me3 pattern establishment are conserved across vertebrates. But we think high-resolution profiling via a method like T-ChIC would be more useful to demonstrate the dynamics of signal spreading during mouse embryonic development in the future. We have discussed this further in our revised manuscript.

      Reviewer #1 (Significance):

      The authors have a longstanding focus and reputation on single cell sequencing technology development and application. In this current study, the authors developed a novel single-cell multi-omic assay termed "T-ChIC" to jointly profile the histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing and conclusions are beneficial to the community.

      Thank you very much for your supportive remarks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Joint analysis of multiple modalities in single cells will provide a comprehensive view of cell fate states. In this manuscript, Bhardwaj et al developed a single-cell multi-omics assay, T-ChIC, to simultaneously capture histone modifications and full-length transcriptome and applied the method on early embryos of zebrafish. The authors observed a decoupled relationship between the chromatin modifications and gene expression at early developmental stages. The correlation becomes stronger as development proceeds, as genes are silenced by the cis-spreading of the repressive marker H3k27me3. Overall, the work is well performed, and the results are meaningful and interesting to readers in the epigenomic and embryonic development fields. There are some concerns before the manuscript is considered for publication.

      We thank the reviewer for appreciating the quality of our study.

      Major concerns:

      (1) A major point of this study is to understand embryo development, especially gastrulation, with the power of scMulti-Omics assay. However, the current analysis didn't focus on deciphering the biology of gastrulation, i.e., lineage-specific pioneer factors that help to reform the chromatin landscape. The majority of the data analysis is based on the temporal dimension, but not the cell-type-specific dimension, which reduces the value of the single-cell assay.

      We focussed on the lineage-specific transcription factor activity during gastrulation in Figure 4 and S8 of the manuscript and discovered several interesting regulators active at this stage. During our analysis of the temporal dimension for the rest of the manuscript, we also classified the cells by their germ layer and “latent” developmental time by taking the full advantage of the single-cell nature of our data. Additionally, we have now added the cell-type-specific H3K27me3 demethylation results for 24hpf in response to your comment below. We hope that these results, together with our openly available dataset would demonstrate the advantage of the single-cell aspect of our dataset.

      (2) The cis-spreading of H3K27me3 with developmental time is interesting. Considering H3k27me3 could mark bivalent regions, especially in pluripotent cells, there must be some regions that have lost H3k27me3 signals during development. Therefore, it's confusing that the authors didn't find these regions (30% spreading, 70% stable). The authors should explain and discuss this issue.

      Indeed, we see that ~30% of the bins enriched in the pluripotent stage spread, while 70% do not seem to spread. In line with earlier observations (Hickey et al., 2022; Vastenhouw et al., 2010), we find that H3K27me3 is almost absent in the zygote and is still being accumulated until 24hpf and beyond. Therefore, the majority of the sites in the genome still seem to be in the process of gaining H3K27me3 until 24hpf, explaining why we see mostly “spreading” and “stable” states. Considering most of these sites are at promoters and show signs of bivalency, we think that these sites are marked for activation or silencing at later stages. We have discussed this in the manuscript (“discussion”). However, in response to this and the earlier comment, we went back and searched for genes that show H3K27me3 demethylation in the most mature cell types (at 24 hpf) in our data, and found a subset of genes that lose H3K27me3 after acquiring it earlier. Interestingly, most of the top genes in this list are well-known as developmentally important for their corresponding cell types. We have added this new result and discussed it further in the manuscript (Fig. 2d,e, Supplementary table 3).

      Minors:

      (1) The authors cited two scMulti-omics studies in the introduction, but there have been lots of single-cell multi-omics studies published recently. The authors should cite and consider them.

      We have cited more single-cell chromatin and multiome studies focussed on early embryogenesis in the introduction now.

      (2) T-ChIC seems to have been presented in a previous paper (ref 15). Therefore, Fig. 1a is unnecessary to show.

      Figure 1a shows a summary of our zebrafish T-ChIC workflow, which contains the unique sample multiplexing and sorting strategy to reduce batch effects, which was not applied in the original T-ChIC workflow. We have now clarified this in “Results”.

      (3) It's better to show the percentage of cell numbers (30% vs 70%) for each heatmap in Figure 2C.

      We have added the numbers to the corresponding legends.

      (4) Please double-check the citation of Fig. S4C, which may not relate to the conclusion of signal differences between lineages.

      The citation seems to be correct (Fig. S4C supplements Fig. 2C, but shows mesodermal lineage cells) but the description of the legend was a bit misleading. We have clarified this now.

      (5) Figure 4C has not been cited or mentioned in the main text. Please check.

      Thanks for pointing it out. We have cited it in Results now.

      Reviewer #2 (Significance):

      Strengths:

      This work utilized a new single-cell multi-omics method and generated abundant epigenomics and transcriptomics datasets for cells covering multiple key developmental stages of zebrafish.

      Limitations:

      The data analysis was superficial and mainly focused on the correspondence between the two modalities. The discussion of developmental biology was limited.

      Advance:

      The zebrafish single-cell datasets are valuable. The T-ChIC method is new and interesting.

      The audience will be specialized and from basic research fields, such as developmental biology, epigenomics, bioinformatics, etc.

      I'm more specialized in the direction of single-cell epigenomics, gene regulation, 3D genomics, etc.

      Thank you for your remarks.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors.

      Major concerns

      (1) Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset.

      Thanks for pointing this out. We had, in fact, performed T-ChIC experiments in four rounds of biological replicates (independent clutches of embryos) and merged the data to create our resource. Although not all timepoints were profiled in each replicate, two timepoints (10 and 24hpf) are present in all four, and the cell type compositions of the replicates at these two timepoints are very similar. We have added new plots in figure S2f and a new supplementary table (#1) to highlight the presence of biological replicates.

      (2) The TF‑activity regression model uses an arbitrary R<sup>2</sup> ≥ 0.6 threshold; cross‑validated R<sup>2</sup> distributions, permutation‑based FDR control, and effect‑size confidence intervals are needed to justify this cut‑off.

      Thank you for this suggestion. We did use 10-fold cross validation during training and obtained the R<sup>2</sup> values of TF motifs from the independent test set as an unbiased estimate. However, the cutoff of R<sup>2</sup> > 0.6 to select the TFs for classification was indeed arbitrary. In the revised version, we now report the FDR-adjusted p-values for these R<sup>2</sup> estimates based on permutation tests, and select TFs with a cutoff of padj < 0.01. We have updated our supplementary table #4 to include the p-values for all tested TFs. However, we see that our arbitrary cutoff of 0.6 was, in fact, too stringent, and we can classify many more TFs based on the FDR cutoffs. We also updated our reported numbers in Fig. 4c to reflect this. Moreover, supplementary table #4 contains the complete list of TFs used in the analysis to allow others to choose their own cutoff.
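
      For readers unfamiliar with the procedure, the selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the response does not specify the permutation scheme or the multiple-testing correction, so the sketch shuffles the held-out observations against fixed predictions and applies a Benjamini-Hochberg adjustment. All function names are hypothetical and are not taken from the authors' code.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observations."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_p_value(y_true, y_pred, n_perm=999, seed=0):
    """One-sided p-value: fraction of shuffles with R^2 >= the observed R^2.

    Assumption: the null is built by shuffling held-out observations
    against fixed predictions; the original analysis may differ.
    """
    rng = random.Random(seed)
    observed = r_squared(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      In such a scheme, one p-value would be computed per TF motif, the full list adjusted once, and TFs kept where the adjusted value falls below the chosen cutoff (0.01 in the response above).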

      (3) Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes.

      We agree that independent validation of the functions of our predicted TFs on target gene activity would be important. During this revision, we analysed recently published scRNA-seq data of Saunders et al. (2023), which includes CRISPR-mediated F0 knockouts of a couple of our predicted TFs, but the scRNA-seq was performed at later stages (24hpf onward) compared to our H3K4me1 analysis (which was 4-12 hpf). Therefore, we saw off-target genes being affected in lineages where these TFs are clearly not expressed (attached Fig 1). We therefore didn’t include these results in the manuscript. In future, we aim to systematically test the TFs predicted in our study with CRISPRi or similar experiments.

      (4) The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression.

      We appreciate the suggestion; PRC2 disruption followed by T-ChIC or other forms of validation would indeed be needed to confirm whether the H3K27me3 spreading is causally linked to the silencing of the identified target genes. But performing this validation is complicated for multiple reasons. First, due to the EZH2 contribution from maternal RNA and the contradicting effects of various EZH2 zygotic mutations (depending on where the mutation occurs), the only properly validated PRC2-related mutant seems to be the maternal-zygotic mutant MZezh2, which requires germ cell transplantation (see Rougeot et al., 2019 and San et al., 2019 for details). Second, the use of inhibitors has been described in other studies (den Broeder et al., 2020; Huang et al., 2021), but these studies do not show a validation of the H3K27me3 loss or a phenotype similar to that of the MZezh2 mutants, and inhibitors can present unwanted side effects and toxicity at a high dose, affecting gene expression results. Moreover, in an attempt to validate, we performed our own trials with the EZH2 inhibitor (GSK123) and saw that this time window might be too short to see the effect within 24hpf (attached Fig. 2). Therefore, this validation is a more complex endeavor beyond the scope of this study. Nevertheless, our further analysis of H3K27me3 de-methylation on developmentally important genes (new Fig. 2e-f, Sup. table 3) adds more confidence that the polycomb repression plays an important role, and provides enough ground for future follow-up studies.

      Minor concerns

      (1) Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development.

      We agree that H3K27me3 alone would not be sufficient to fully understand the repressive chromatin state. Extension to other chromatin marks and DNA methylation will be the focus of our follow-up work.

      (2) Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided.

      As mentioned in the manuscript, we provide an open-source pre-processing pipeline “scChICflow” to perform all these steps (github.com/bhardwaj-lab/scChICflow). We have now also provided the configuration files on our Zenodo repository (see below), which can simply be plugged into this pipeline together with the fastq files from GEO to obtain the processed dataset that we describe in the manuscript. Additionally, we have now clarified the peak calling and post-processing steps in the manuscript.

      (3) Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added.

      We have now publicly released the .h5ad files with raw counts, normalized counts, and complete gene and cell-level metadata, along with signal tracks (bigwigs) and peaks on GEO. Additionally, we now also released the source datasets and notebooks (Rmarkdown format) on Zenodo that can be used to replicate the figures in the manuscript, and updated our statements on “Data and code availability”.

      (4) Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference.

      Thank you for spotting them. We have fixed these issues.

      Reviewer #3 (Significance):

      The method is technically innovative and the biological insights are valuable; however, several issues-mainly concerning experimental design, statistical rigor, and functional validation-must be addressed to solidify the conclusions.

      Thank you for your comments. We hope to have addressed your concerns in this revised version of our manuscript.

      Author response image 1.

      (1) (Top) Expression of tbx16, which was one of the common TFs detected in our study and also targeted by Saunders et al. by CRISPR. tbx16 expression is restricted to the presomitic mesoderm lineage by 12hpf, and is mostly absent from 24hpf cell types. (Bottom) DE genes detected in different cellular neighborhoods (circled) in tbx16 crispants from the 24hpf subset of cells in Saunders et al. None of these DE genes were detected as “direct targets” in our analysis and therefore seem to be downstream effects. (2) Effect of 3 different concentrations of EZH2 inhibitor (GSK123) on global H3K27me3, quantified by flow cytometry using a fluorescently coupled antibody (the same as we used in T-ChIC) in two replicates. The cells were incubated between 3 and 10 hpf and collected afterwards for this analysis. We observed a small shift in H3K27me3 signal, but it was inconsistent between replicates.

      References

      Chen, Z., Djekidel, M. N., & Zhang, Y. (2021). Distinct dynamics and functions of H2AK119ub1 and H3K27me3 in mouse preimplantation embryos. Nature Genetics, 53(4), 551–563.

      den Broeder, M. J., Ballangby, J., Kamminga, L. M., Aleström, P., Legler, J., Lindeman, L. C., & Kamstra, J. H. (2020). Inhibition of methyltransferase activity of enhancer of zeste 2 leads to enhanced lipid accumulation and altered chromatin status in zebrafish. Epigenetics & Chromatin, 13(1), 5.

      Hickey, G. J., Wike, C. L., Nie, X., Guo, Y., Tan, M., Murphy, P. J., & Cairns, B. R. (2022). Establishment of developmental gene silencing by ordered polycomb complex recruitment in early zebrafish embryos. eLife, 11, e67738.

      Huang, Y., Yu, S.-H., Zhen, W.-X., Cheng, T., Wang, D., Lin, J.-B., Wu, Y.-H., Wang, Y.-F., Chen, Y., Shu, L.-P., Wang, Y., Sun, X.-J., Zhou, Y., Yang, F., Hsu, C.-H., & Xu, P.-F. (2021). Tanshinone I, a new EZH2 inhibitor restricts normal and malignant hematopoiesis through upregulation of MMP9 and ABCG2. Theranostics, 11(14), 6891–6904.

      Mei, H., Kozuka, C., Hayashi, R., Kumon, M., Koseki, H., & Inoue, A. (2021). H2AK119ub1 guides maternal inheritance and zygotic deposition of H3K27me3 in mouse embryos. Nature Genetics, 53(4), 539–550.

      Rougeot, J., Chrispijn, N. D., Aben, M., Elurbe, D. M., Andralojc, K. M., Murphy, P. J., Jansen, P. W. T. C., Vermeulen, M., Cairns, B. R., & Kamminga, L. M. (2019). Maintenance of spatial gene expression by Polycomb-mediated repression after formation of a vertebrate body plan. Development (Cambridge, England), 146(19), dev178590.

      San, B., Rougeot, J., Voeltzke, K., van Vegchel, G., Aben, M., Andralojc, K. M., Flik, G., & Kamminga, L. M. (2019). The ezh2(sa1199) mutant zebrafish display no distinct phenotype. PloS One, 14(1), e0210217.

      Saunders, L. M., Srivatsan, S. R., Duran, M., Dorrity, M. W., Ewing, B., Linbo, T. H., Shendure, J., Raible, D. W., Moens, C. B., Kimelman, D., & Trapnell, C. (2023). Embryo-scale reverse genetics at single-cell resolution. Nature, 623(7988), 782–791.

      Vastenhouw, N. L., Zhang, Y., Woods, I. G., Imam, F., Regev, A., Liu, X. S., Rinn, J., & Schier, A. F. (2010). Chromatin signature of embryonic pluripotency is established during genome activation. Nature, 464(7290), 922–926.

      Zeller, P., Blotenburg, M., Bhardwaj, V., de Barbanson, B. A., Salmén, F., & van Oudenaarden, A. (2024). T-ChIC: multi-omic detection of histone modifications and full-length transcriptomes in the same single cell. In bioRxiv (p. 2024.05.09.593364). https://doi.org/10.1101/2024.05.09.593364

      Zeller, P., Yeung, J., Viñas Gaza, H., de Barbanson, B. A., Bhardwaj, V., Florescu, M., van der Linden, R., & van Oudenaarden, A. (2022). Single-cell sortChIC identifies hierarchical chromatin dynamics during hematopoiesis. Nature Genetics. https://doi.org/10.1038/s41588-022-01260-3

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.
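
      As a rough illustration of the gaze weighting described in this summary, the sketch below simulates a single trial of a standard two-item aDDM, in which the value of the unattended item is discounted by a factor theta. It is a minimal sketch, not the model fitted in the manuscript: the parameter values, the fixation format, and the function name are assumptions chosen for illustration, and the sequential-lottery structure of the task is not modelled.

```python
import random


def simulate_addm(value_left, value_right, fixations, drift=0.002,
                  theta=0.3, noise_sd=0.02, threshold=1.0, seed=0):
    """Simulate one trial of a two-item attentional drift diffusion model.

    fixations: sequence of (item, n_steps), item being "left" or "right".
    Returns (choice, steps_taken); choice is None if no bound is reached.
    All parameter defaults are illustrative, not fitted values.
    """
    rng = random.Random(seed)
    evidence = 0.0  # positive favours left, negative favours right
    steps = 0
    for item, n_steps in fixations:
        # The unattended item's value is discounted by theta (0 <= theta <= 1).
        if item == "left":
            mean_step = drift * (value_left - theta * value_right)
        else:
            mean_step = drift * (theta * value_left - value_right)
        for _ in range(n_steps):
            evidence += mean_step + rng.gauss(0.0, noise_sd)
            steps += 1
            if evidence >= threshold:
                return "left", steps
            if evidence <= -threshold:
                return "right", steps
    return None, steps
```

      With theta below 1, two equally valued items no longer produce zero mean drift: evidence drifts toward whichever item is currently fixated, which is the gaze bias the model predicts. Setting theta to 1 removes the bias and recovers an unweighted accumulator.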

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.

      We have addressed this comment with new GLMs. The new GLM1 includes both non-gaze-weighted and gaze-weighted regressors and finds that the vmPFC and striatum reflect gaze-weighted sampled value, while the preSMA reflects gaze-weighted accumulated value. We have now dropped the old GLM3 and added two other GLMs, one that explicitly interacts accumulated value with accumulated dwell, and the other that considers only partial gaze discounting. These analyses all support the preSMA as encoding gaze-weighted accumulated value.
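
      The logic of entering both regressors in one model can be sketched with a toy two-regressor least-squares fit. This is an illustration only and is not the authors' fMRI GLM (there is no haemodynamic convolution, no nuisance regressors, and no group-level inference); the function name and the synthetic data are hypothetical. The point is that when a value regressor and its gaze-weighted counterpart compete in the same fit, each slope reflects what that regressor explains over and above the other.

```python
def ols_two_regressors(x1, x2, y):
    """Least-squares slopes for y ~ b0 + b1*x1 + b2*x2 (via centering)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero when the regressors are collinear
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Synthetic example: the signal depends only on gaze-weighted value.
value = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gaze_share = [0.2, 0.8, 0.5, 0.9, 0.3, 0.7]  # hypothetical dwell proportions
value_gaze = [v * g for v, g in zip(value, gaze_share)]
signal = [2.0 * vg for vg in value_gaze]

b_value, b_value_gaze = ols_two_regressors(value, value_gaze, signal)
```

      In this noiseless toy case, the slope on the unweighted regressor is approximately zero and the slope on the gaze-weighted regressor is approximately 2, even though the two regressors are correlated. Fitting the gaze-weighted regressor alone could not distinguish this case from a pure value signal, which is the reviewer's concern.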

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the preSMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space and leverages theoretical predictions about where effects should occur. Our novel contribution that gaze modulation of accumulated evidence signals was reflected in preSMA activity builds naturally on established findings. However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

      Recommendations for the authors:

      Editor Comments:

      Reviewer 1 in particular makes a number of suggestions for additional analyses that would help to strengthen the evidence supporting your conclusions.

      We thank the editor and the reviewers for the helpful suggestions for improving our manuscript. We discuss our efforts to address each point below.

      Reviewer #1 (Recommendations for the authors):

      (1) To address my concerns about GLM2, the first thing to do might be to simply show the correlation between the regressors used across the three different models (e.g., as a figure in the methods). Although the authors have done a good job to ensure that AV and SV are decorrelated when including them both in the same model, they haven't shown us whether the regressors used in, for example, GLM2 are correlated/similar to the regressors used in GLM1. This is important information for interpretation.

      Thank you for raising concerns about the overlap between different models. We agree that additional information regarding the correlation among sample-level regressors would aid readers in understanding the differences among the analyses. We now include this information in Figure 7 in the Methods section, as requested. While |SV| was uncorrelated with gaze-weighted |SV| (|SV<sub>Gaze</sub>|; Pearson’s r = 0.002, p = 0.848), lagged |AV| was significantly correlated with lagged, gaze-weighted |AV| (lagged |AV<sub>Gaze</sub>|; r = 0.365, p < 2.2 × 10<sup>-16</sup>).

      (2) The acid test for gaze-modulation of value signals would be to show that the gaze-modulated signals explain the fMRI results over and above the non-gaze-modulated signals. This could simply mean including SVgaze and SV (and equivalent terms for AV) within the same GLM. Following from point (1), the authors may point out that these terms are highly correlated - yes, but the GLM will then test for the effects of SVgaze *over and above* the effects of SV. (In fact, although I'd normally caution against orthogonalisation - it would here be totally legitimate to orthogonalise SVgaze w.r.t. SV).

      We appreciate the reviewer’s suggestions for more robust tests of the presence of gaze-weighted signals. For reasons highlighted in our response above, we were initially hesitant to include both types of regressors in the same model due to their significant correlation. However, we now report the results of this analysis in the main text as the new GLM 1. This model incorporates both gaze-weighted and non-gaze-weighted terms. For each contrast we used the same procedures as reported in the main text (family-wise error corrected at p < 0.05 and cluster-forming thresholds at p < 0.005).

      In the vmPFC, we found significant effects of both |∆SV| (peak voxel: x = -14, y = 44, z = -12; t = 3.90, p = 0.0190) and |∆SV<sub>Gaze</sub>| (peak voxel: x = 4, y = 38, z = -4; t = 5.21, p = 0.004), but no effects of |∆AV| or |∆AV<sub>Gaze</sub>|. The striatum also showed a significant correlation with |∆SV<sub>Gaze</sub>| (peak voxel: x = 22, y = 20, z = -10; t = 5.10, p = 0.014), but no other regressors.

      In the pre-SMA, we found a significantly positive relationship with both |∆AV| (peak voxel: x = 4, y = 14, z = 50; t = 4.75, p < 0.001) and |∆AV<sub>Gaze</sub>| (peak voxel: x = 4, y = 18, z = 50; t = 2.98, p = 0.032). In contrast, the dlPFC (x = 40, y = 34, z = 26; t = 6.83, p < 0.001) and IPS (x = 42, y = -50, z = 42; t = 5.16, p = 0.010) were only correlated with |∆AV|. No other significant contrasts emerged.

      These results provide direct support for the presence of gaze-modulated value signals in the brain, which we now describe in the main text Results section.

      (3) With regards to GLM3, it would help to provide a bit more detail on what the time series looks like for the gaze regressor in this model - is it the entire timeseries of gaze (which presumably shifts back/forth between options multiple times within each trial) which is being convolved with the HRF? This seems different from how gaze is being calculated in GLM2, where it is amalgamated into an 'average gaze difference' within a sample between left/right options, if I understand the text correctly?

      We apologize for the lack of details regarding how we operationalized the gaze regressors in our analyses. You are correct that the gaze regressor was calculated differently in GLM2 and GLM3.

      However, in response to the reviewer’s points above (Major Point 2) and below (Major Point 4, Minor Point 1), we have decided to drop the old GLM3 from the paper while incorporating a revised GLM1 (combining old GLM1 and GLM2) and two new GLMs (see responses to Major Point 4 and Minor Point 1) to provide clearer evidence for gaze modulation of accumulated value in the brain.

      (4) Also, is there not a reason why it isn't more appropriate to interact AV with *previously deployed gaze difference* (accumulated across previous samples) in this model, rather than the current gaze location? The latter seems to rely upon the indirect linkage via the behavioural modelling result, which seems to weaken the claim.

      We thank the reviewer for this suggestion. We agree that our original GLM3 approach was limited because it interacted AV with current binary gaze location, which relies on the indirect behavioral relationship we established (i.e., that current gaze is negatively correlated with accumulated past gaze).

      The original GLM2 (which is now incorporated into the new GLM1) implemented something similar to what the reviewer is suggesting as it used gaze-weighted values accumulated across all previous samples. Specifically, in GLM2, the gaze-weighted accumulated value (AV<sub>gaze</sub>) was calculated as the sum of all previous sampled values, each weighted by the proportion of gaze allocated to each option during that sampling period.

      However, to more directly test whether accumulated evidence signals are modulated by accumulated gaze allocation we have now run an additional analysis (GLM2). In this analysis we have revised the old GLM3 to include additional regressors: ∆SV, lagged ∆AV, current gaze location, accumulated dwell advantage, ∆SV × current gaze location, and lagged ∆AV × accumulated dwell advantage.

      The two new regressors were defined as follows:

      Accumulated dwell advantage: For each sample t, accumulated dwell advantage represents the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      ∆AV × accumulated dwell advantage: The interaction between accumulated values and accumulated dwell advantage, which directly tests whether brain regions encoding accumulated value are modulated by the history of gaze allocation.

      This approach is conceptually similar to old GLM2’s gaze-weighting method, but allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation.
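
      The two regressor definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used for the paper: the function names, the toy per-sample dwell times, and the choice to assign a neutral value of 0 to the first sample (which has no gaze history) are all assumptions made here for illustration.

```python
def accumulated_dwell_advantage(dwell_left, dwell_right):
    """For each sample t, return (L - R) / (L + R), where L and R are the
    total dwell times on the left and right options up to sample t-1."""
    advantages = []
    total_left = 0.0
    total_right = 0.0
    for left, right in zip(dwell_left, dwell_right):
        total = total_left + total_right
        # Assumption: with no prior gaze (first sample) the advantage is 0.
        advantages.append((total_left - total_right) / total if total > 0 else 0.0)
        total_left += left
        total_right += right
    return advantages


def lagged_accumulated_value(delta_sv):
    """For each sample t, return the sum of signed sampled values before t."""
    lagged = []
    running = 0.0
    for value in delta_sv:
        lagged.append(running)
        running += value
    return lagged


def interaction_regressor(delta_sv, dwell_left, dwell_right):
    """Lagged delta-AV multiplied by accumulated dwell advantage, per sample."""
    av = lagged_accumulated_value(delta_sv)
    adv = accumulated_dwell_advantage(dwell_left, dwell_right)
    return [a * d for a, d in zip(av, adv)]


# Hypothetical three-sample trial (dwell times in seconds, values left minus right).
dwell_left = [1.0, 0.5, 2.0]
dwell_right = [0.0, 1.5, 1.0]
delta_sv = [2.0, -1.0, 3.0]
print(accumulated_dwell_advantage(dwell_left, dwell_right))
print(interaction_regressor(delta_sv, dwell_left, dwell_right))
```

      In this sketch the interaction is positive whenever past gaze and past evidence favor the same side, which matches the sign convention described above.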

      Here, we found that the pre-SMA showed a positive correlation with the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.10, p = 0.0258). Surprisingly, the striatum also showed a correlation with this term (peak: x = -16, y = 10, z = -6; t = 4.07, p = 0.0176). No other ROIs showed significant relationships.

      This analysis provides additional evidence that pre-SMA encodes accumulated value signals that are modulated by accumulated gaze allocation, without relying on indirect relationships between current and past gaze. We now report these results in the main text as GLM2 as follows:

      “To more directly test whether accumulated evidence signals were modulated by accumulated gaze allocation throughout a trial, we conducted additional, exploratory analyses. Specifically, we ran a GLM that incorporated the following two terms: accumulated dwell advantage and ∆AV × accumulated dwell advantage, in addition to ∆SV, the current gaze location, and ∆SV × current gaze location.

      We calculated accumulated dwell advantage as follows: For each sample t, accumulated dwell advantage is the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      We also included the interaction between accumulated dwell advantage and ∆AV (i.e., signed accumulated evidence). This interaction term is positive when gaze is primarily to the left and left has more value or when gaze is primarily to the right and right has more value. This interaction term directly tests whether brain regions encoding accumulated evidence are modulated by the history of gaze allocation. This approach allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation itself.

      This GLM revealed a positive correlation between pre-SMA activity and the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.01, p = 0.026). Surprisingly, the striatum also showed a correlation with this term (peak voxel: x = -16, y = 10, z = -6; t = 4.07, p = 0.018). Additionally, activity in the dlPFC was positively correlated with ∆SV (peak voxel: x = -36, y = 34, z = 22; t = 3.96, p = 0.016). No other ROIs showed significant relations.

      This analysis provides additional evidence that the pre-SMA encodes accumulated value signals that are modulated by the history of gaze allocation.”

      Minor

      (1) "In Trial A, the subject looks left 30% of the time and right 70% of the time. In Trial B, the subject looks left 70% of the time and right 30% of the time. In Trial A, the net input value ("drift rate") would be |0.3 ∙ 7 − 0.7 ∙ 3| = 0. In Trial B, the drift rate would be |0.7 ∙ 7 − 0.3 ∙ 3| = 4." I may be missing something, but isn't this consistent with an aDDM with theta=0, rather than theta=0.3-0.5 as is typically found?

      The reviewer raises an important point about our assumptions regarding attentional discounting. We agree that our approach could be problematic as it may assume stronger discounting than has been observed in the literature.

      To address this concern, we calculated drift on a sample-by-sample basis before aggregating to the trial level. Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent fixating left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. This approach preserves the fine-grained temporal dynamics of gaze-dependent value accumulation that would be lost by calculating gaze proportions only at the trial level.

      Using this sample-level method in a mixed-effects logistic regression predicting choice (left vs. right), we estimated subject-specific values of θ = γ/β. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25). These estimates are somewhat higher than the typical aDDM findings of attentional bias (θ = 0.3–0.5). This may reflect the drawn-out nature of this task relative to prior aDDM tasks.

      Next, we ran a new GLM that incorporated these θ estimates in the sampled value estimates. For this GLM3, we computed θ-weighted sampled-value (|∆TWSV|) as:

      TWSV = (G<sub>Left</sub> × (V<sub>Left</sub> – θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> – θV<sub>Left</sub>)).

      Similar to GLM1, we computed an accumulated value signal based on the lagged sum of previous samples’ |∆TWSV| (i.e., |∆TWAV|).
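
      As a worked illustration of these formulas, the sketch below computes β, γ, and the θ-weighted sampled value for a single sample, then builds a lagged accumulated signal. It is a minimal sketch rather than our analysis code: the function names are hypothetical, it assumes the two gaze proportions sum to one (G<sub>Right</sub> = 1 – G<sub>Left</sub>), and it assumes the accumulated signal is the absolute value of the running sum of signed sample values from earlier samples. The example values (7 and 3, with 30% of gaze to the left) are taken from the reviewer's Trial A.

```python
import math


def beta_gamma(g_left, v_left, v_right):
    """Return (beta, gamma) for one sample, assuming g_right = 1 - g_left."""
    g_right = 1.0 - g_left
    beta = g_left * v_left - g_right * v_right
    gamma = g_right * v_left - g_left * v_right
    return beta, gamma


def theta_weighted_sv(g_left, v_left, v_right, theta):
    """Theta-weighted sampled value: the unattended option is scaled by theta."""
    g_right = 1.0 - g_left
    return (g_left * (v_left - theta * v_right)
            - g_right * (v_right - theta * v_left))


def lagged_abs_accumulated(signed_values):
    """For each sample t, return |sum of signed values from samples before t|."""
    out = []
    running = 0.0
    for value in signed_values:
        out.append(abs(running))
        running += value
    return out


# Reviewer's Trial A: left = 7, right = 3, 30% of gaze on the left option.
beta, gamma = beta_gamma(0.3, 7.0, 3.0)
twsv = theta_weighted_sv(0.3, 7.0, 3.0, theta=0.77)

# Algebraically, the theta-weighted value equals beta + theta * gamma.
print(math.isclose(twsv, beta + 0.77 * gamma, abs_tol=1e-9))
```

      Setting θ = 0 recovers full discounting of the unattended option (the value reduces to β), while θ = 1 removes the gaze effect entirely and returns the plain value difference.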

      We found significant positive effects of |∆TWSV| in the vmPFC (peak voxel: x = -14, y = 44, z = -12; t = 3.57, p = 0.0270) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.0198), but in no other ROI.

      In contrast, we found significant positive relationships between |∆TWAV| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.0014), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.0040), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆TWAV| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.0410). No other significant contrasts emerged.

      We now report this additional analysis as GLM3 in the main text, as follows:

      “In our first set of analyses, we implicitly assumed complete discounting of non-fixated information, in contrast with previous studies that have generally found only partial discounting (Krajbich et al., 2010; Sepulveda et al., 2020; Smith & Krajbich, 2019; Westbrook et al., 2020). To verify that our results are robust to inter-subject variability in attentional discounting, we estimated subject-level attentional discounting parameters and then re-estimated our original GLM with new, recalculated gaze-weighted value regressors.

      Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent gazing left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. We then ran a mixed-effects logistic regression predicting choice (left vs. right) as a function of β and γ and then calculated subject-specific values of θ = γ/β. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25).

      Next, for the GLM, we computed θ-weighted sampled-value (|∆SV<sub>θ</sub>|) as:

      SV<sub>θ</sub> = (G<sub>Left</sub> × (V<sub>Left</sub> − θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> − θV<sub>Left</sub>))

      Similar to the original GLM, we computed an accumulated value signal, |∆AV<sub>θ</sub>|, based on the lagged sum of previous samples’ |∆SV<sub>θ</sub>|.

      We found significant positive effects of |∆SV<sub>θ</sub>| in the vmPFC (peak voxel: x = -14, y = 44, z = -12; t = 3.57, p = 0.027) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.020), but in no other ROI.

      In contrast, we found significant positive relationships between |∆AV<sub>θ</sub>| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.001), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.004), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆AV<sub>θ</sub>| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.041). No other significant contrasts emerged.

      In summary, these analyses provide additional evidence that the vmPFC encodes gaze-weighted sampled value signals and the pre-SMA encodes gaze-weighted accumulated value signals, though other correlations also emerged.”

      (2) The reporting of statistical results in the fMRI could be sharpened - e.g. in the figure legends, don't just say "Voxels thresholded at p < .05.", but make clear whether you mean FWE whole-brain corrected (I think you do from the methods) or whether this is uncorrected for display; similarly, for the peak voxels, report the associated Z statistic at that voxel rather than just "negative beta".

      We agree that it is important to include additional details regarding how we reported the statistical results. We now clarify our procedures in the main text:

      “We report results using FWE-corrected statistical significance of p < 0.05 and a cluster significance threshold of p < 0.005.”

      We now also report the T statistics for peak voxels.

      (3) A couple of the citations are slightly wrong - e.g., Kolling et al 2012 shouldn't be cited as arguing for decision conflict, as in fact it argues strongly against this account and in favour of a foraging account of ACC activity. Similarly, Hunt et al 2018 doesn't provide support for decision conflict; instead, it shows signals in ACC show evidence accumulation for left/right actions over time (although not whether these accumulator signals are gaze-weighted, in the same way as the present study).

      We thank the reviewer for pointing out these mistakes in our citations. We have revised the references throughout.

      Reviewer #2 (Recommendations for the authors):

      (1) In some places, the introduction would benefit from fleshing out certain points. For example it is stated “For instance, decisions that are less predictable also tend to take more time (Konovalov & Krajbich, 2019) and can be influenced by attention manipulations (Parnamets et al., 2015; Tavares et al., 2017; Gwinn et al., 2019; Bhatnagar & Orquin, 2022). The quantitative relations between these measures argue for an evidence-accumulation process.” It is not clear why the relations between them argue for an EA process, and the reader would benefit from some further explanation.

      We thank the reviewer for this helpful suggestion. We agree that the original text did not sufficiently explain why these relationships support evidence-accumulation models. We have revised the introduction to better articulate the mechanistic basis for this claim.

      This revision clarifies these points in the main text:

      “Decisions like this are thought to rely on a bounded, evidence-accumulation process that depends on factors such as the value of the sampled information and shifts in attention. According to this framework, when two options are similar in value, evidence accumulates more slowly towards the decision threshold, resulting in longer response times (RT) and more opportunity for shifts in attention to influence the choice outcome. In contrast, when one option is clearly superior, evidence accumulates more rapidly and the decision is made quickly with less of a relation between gaze and choice. This choice process produces reliable, quantitative patterns in choice, RT, and eye-tracking data (Ashby et al., 2016; Callaway et al., 2021; Gluth et al., 2018; Krajbich et al., 2010; Smith & Krajbich, 2018). For instance, decisions with similar values are more random (i.e., less predictable), tend to take more time (Konovalov & Krajbich, 2019), and can be experimentally manipulated by diverting attention towards one option more than the other (Bhatnagar & Orquin, 2022; Gwinn et al., 2019; Pärnamets et al., 2015; Pleskac et al., 2022; Tavares et al., 2017). Critically, these behavioral measures do not simply correlate; rather, they exhibit precise quantitative relationships consistent with evidence accumulation models (Konovalov & Krajbich, 2019).”

      (2) Some of the study hypotheses also need to be clarified. What are the hypotheses regarding how SV and AV should translate to BOLD in an input vs integrator region? Larger SV/AV = larger BOLD? What predictions would be made for a time-on-task or conflict region? Are the predictions the same or different? Clarifying this will help the reader to understand to what extent the gaze manipulation is pivotal in identifying integrator regions.

      We thank the reviewer for this excellent suggestion. We agree that it is useful to clearly articulate our hypotheses about BOLD signal predictions for different aspects of the model, and why gaze manipulation is critical for distinguishing between them. We have now expanded the introduction to clarify these predictions.

      For input regions, we predicted a straightforward positive relationship: larger sampled value (|ΔSV|) should produce larger BOLD activity. Input regions encode the momentary evidence being sampled (i.e., the relative value of currently presented stimuli). Consistent with prior work (Bartra et al., 2013), we expected such activity in the vmPFC and ventral striatum.

      Critically, we also predicted that these sampled value signals should be modulated by gaze location. The attentional drift-diffusion model (aDDM; Krajbich et al., 2010) posits that attended items receive full value weight while unattended items are discounted. Consistent with prior work (Lim et al., 2011), we expected stronger vmPFC/striatum activity when the higher-value item is fixated compared to when the lower-value item is fixated.

      For integrator regions, we predicted an analogous positive relationship: larger accumulated value (|ΔAV|) should produce more BOLD activity. Accumulator regions encode the summed evidence over the course of the decision. Consistent with prior work (Hare et al. 2011; Gluth et al. 2021; Pisauro et al. 2017) we expected such activity in the pre-SMA, dlPFC, and IPS.

      As with sampled value, we predicted that integrator activity should reflect gaze-weighted accumulated value. Just as inputs are modulated by current gaze, the accumulated evidence should be weighted by the history of gaze allocation over the entire trial.

      Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time.

      The conflict account predicts that BOLD activity should scale with inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long RT (Pisauro et al. 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoid this ambiguity – we analyze the effect of accumulated value at each point in time, not just at the time of decision. In this case, conflict should be inversely correlated with accumulated value. Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of values.

      A more serious concern is the potential link to putative time-on-task BOLD activity. Accumulated value inevitably increases with time, leading to a correlation between the two variables (Grinband et al. 2011; Holroyd et al., 2018; Mumford et al. 2024). This is where the gaze data become particularly important. Time-on-task regions should show no relation with gaze allocation. After accounting for non-gaze-weighted accumulated value, only accumulator, and not time-on-task, regions should show a relation with gaze-weighted accumulated value. The results of the revised GLMs provide exactly such evidence.

      We have edited the manuscript to make clear to readers why our gaze manipulation was not merely exploratory but rather a theoretically-motivated test to distinguish between competing models of decision-related neural activity.

      We have clarified our study hypotheses in the Introduction as follows:

      “We hypothesized that we would find (1) a positive correlation between gaze-weighted |SV| and activity in the reward network (the ventromedial prefrontal cortex (vmPFC) and ventral striatum), and (2) a positive correlation between gaze-weighted |AV| in the pre-supplementary motor area (pre-SMA) (Aquino et al., 2023), dorsolateral prefrontal cortex (dlPFC), and intraparietal sulcus (IPS).”

      We have also added clarifying text about conflict and time-on-task to the Discussion as follows: “Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time. The conflict account predicts that BOLD activity should scale with the inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long response times (Pisauro et al., 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoided this ambiguity by analyzing the effect of accumulated value at each point in time, not just at the moment of decision. Under this approach, conflict should be inversely correlated with accumulated value (as higher accumulated evidence indicates less similarity between options). Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of option values.

      A more serious concern is the potential confound with time-on-task BOLD activity. Accumulated value inevitably increases with time within a trial, leading to a correlation between the two variables (Grinband et al., 2011; Holroyd et al., 2018; Mumford et al., 2024). This is where the gaze data were particularly important. Time-on-task regions should show no relation with gaze allocation patterns. After accounting for non-gaze-weighted accumulated value, only accumulator regions, and not time-on-task regions, should show a relationship with gaze-weighted accumulated value. The results of our analyses provide exactly such evidence: preSMA activity was positively correlated with gaze-weighted accumulated value, even when accounting for previous gaze history and individual differences in attention discounting.”

      (3) The authors allude to there being a correlation between SV and AV on this task, but the correlation is never reported. Please report the correlation with and without the removal of T-1.

      We appreciate the reviewer pointing out this omission. We now report all correlations between SV and both the lagged and non-lagged versions of AV in the Methods section (Fig. 7). SV was significantly correlated with the full calculation of AV (Pearson’s r = 0.27). In contrast, the correlation between SV and lagged AV was much weaker, though still statistically significant (Pearson’s r = 0.06).

      (4) When examining relationships between SV, AV, and choice probability, the authors note that a larger coefficient for SV compared to AV is an inevitable consequence of an SSM choice process. Please explain why this is the case.

      The reviewer is correct in observing that this point was not made sufficiently clear in the main text. We have now expanded the explanation in the behavioral results section.

      The key insight is that in sequential sampling models, choices occur when accumulated evidence reaches a decision threshold. Importantly, the perceived value of each sample consists of the true underlying value plus random noise. The final sample (SV) is what pushes the accumulated evidence over the threshold, which creates a selection bias: decisions tend to occur when the noise component of SV happens to be positive and large. This means that the perceived final SV systematically overestimates the true SV, biasing upward the regression coefficient for the effect of SV on choice. In contrast, AV represents the sum of all previous sampled evidence, samples that we know did not lead to a choice. These samples are thus more likely to have had a negative or small noise component, meaning that the perceived AV systematically underestimates the true AV. This biases downwards the regression coefficient for the effect of AV on choice.

      On net, we expect that even when sample evidence is weighted equally over time in the true decision process, regression analyses will inevitably show larger coefficients for the effects of SV than for those of AV. This is a statistical artefact of the threshold-crossing mechanism, and not a reflection of differential weighting. We have incorporated this explanation into the revised manuscript to make clear why this pattern is an expected consequence of the SSM framework:
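
      The selection effect described above can be illustrated with a small simulation. The sketch below is not part of our analyses: it assumes a simple symmetric random walk with zero drift, Gaussian noise, and an arbitrary threshold, and the function names and parameter values are hypothetical. It does not run the choice regression itself; it only compares the noise in the final sample with the noise in earlier samples, both signed relative to the eventual choice.

```python
import random
import statistics


def simulate_trial(rng, drift=0.0, noise_sd=1.0, threshold=3.0, max_samples=1000):
    """Accumulate noisy samples until |evidence| reaches the threshold.

    Returns (choice, noises), where choice is +1 or -1, or None if the
    threshold was not reached within max_samples.
    """
    evidence = 0.0
    noises = []
    for _ in range(max_samples):
        noise = rng.gauss(0.0, noise_sd)
        noises.append(noise)
        evidence += drift + noise
        if abs(evidence) >= threshold:
            return (1 if evidence > 0 else -1), noises
    return None, noises


def selection_effect(n_trials=5000, seed=0):
    """Mean choice-aligned noise for the final sample vs. earlier samples."""
    rng = random.Random(seed)
    final_aligned = []
    earlier_aligned = []
    for _ in range(n_trials):
        choice, noises = simulate_trial(rng)
        if choice is None or len(noises) < 2:
            continue  # skip trials with no earlier samples to compare against
        final_aligned.append(choice * noises[-1])
        earlier_aligned.extend(choice * n for n in noises[:-1])
    return statistics.mean(final_aligned), statistics.mean(earlier_aligned)


final_mean, earlier_mean = selection_effect()
print(f"final sample: {final_mean:.2f}, earlier samples: {earlier_mean:.2f}")
```

      Under these assumptions every sample is weighted equally by construction, yet the final sample's noise is systematically larger in the direction of the choice than the noise in earlier samples, which is the asymmetry that inflates the ∆SV coefficient relative to ∆AV.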

      “The larger coefficient for ∆SV compared to ∆AV is an inevitable consequence of an SSM choice process. In SSMs, a choice occurs when accumulated evidence reaches a threshold. Critically, perceived value for any given sample consists of the true underlying value plus random noise. The final sample (∆SV) is what pushes the accumulated evidence over the threshold, which creates a selection effect: decisions tend to be made when the noise component of ∆SV is relatively large and aligned with the ultimate choice, causing the perceived final ∆SV to systematically overestimate the true ∆SV. As a result, the regression coefficient for the effect of final ∆SV on choice is overestimated. In contrast, ∆AV represents the sum of all previous evidence, which includes samples that were insufficient to trigger a choice and thus more likely to have noise components that favored the non-chosen option. This means that the perceived ∆AV systematically underestimates the true ∆AV. As a result, the regression coefficient for the effect of ∆AV on choice is underestimated. This creates an inherent asymmetry between ∆SV and ∆AV: even when the true decision process weights evidence equally over time, regression analyses will show larger coefficients for ∆SV than ∆AV. For any data generated by an SSM, regressing choice probability on final ∆SV and total ∆AV would produce a larger coefficient for ∆SV due to this threshold-crossing selection effect.”
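
      The selection effect described above can be illustrated with a toy simulation. The sketch below is not the authors' model or analysis: the function names (`simulate_ssm`, `fit_linear`), the threshold, the noise level, and the use of an ordinary least-squares linear probability fit in place of a logistic regression are all assumptions made for illustration. Every sample is weighted equally in the simulated accumulation, yet the fitted coefficient for the final sample value is expected to exceed that for the previously accumulated value.

```python
import numpy as np

def simulate_ssm(n_trials=5000, threshold=3.0, noise_sd=1.0, seed=0):
    """Toy sequential sampling process (illustrative assumptions only).

    Each sample has a true value difference sv ~ N(0, 1); the decision maker
    perceives sv plus Gaussian noise and adds it, equally weighted, to the
    running evidence. A choice is made once |evidence| reaches the threshold.
    """
    rng = np.random.default_rng(seed)
    final_sv = np.empty(n_trials)   # true value of the sample that triggered the choice
    prior_av = np.empty(n_trials)   # true summed value of all earlier samples
    choice = np.empty(n_trials)     # +1 or -1
    for i in range(n_trials):
        evidence = 0.0
        av = 0.0
        while True:
            sv = rng.normal(0.0, 1.0)
            evidence += sv + rng.normal(0.0, noise_sd)
            if abs(evidence) >= threshold:
                break
            av += sv  # only samples that did NOT trigger a choice enter AV
        final_sv[i] = sv
        prior_av[i] = av
        choice[i] = np.sign(evidence)
    return final_sv, prior_av, choice

def fit_linear(final_sv, prior_av, choice):
    """Least-squares fit of choice (+1/-1) on final SV and prior AV."""
    X = np.column_stack([np.ones_like(final_sv), final_sv, prior_av])
    beta, *_ = np.linalg.lstsq(X, choice, rcond=None)
    return beta[1], beta[2]

sv, av, choice = simulate_ssm()
beta_sv, beta_av = fit_linear(sv, av, choice)
print(f"beta_SV = {beta_sv:.3f}, beta_AV = {beta_av:.3f}")
```

      Because the simulated accumulator treats every sample identically, any gap between the two coefficients in this sketch comes from conditioning on the threshold crossing rather than from differential weighting.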

      (5) It is not clear to me why the authors single out the pre-SMA only in the abstract when IPS and dlPFC also show stronger correlations with AV and exhibit gaze modulation in the authors' final non-linear analysis. Further explanation is required in the Discussion and I would also suggest amending the Abstract because the 'Most importantly' claim will not be meaningful for the reader.

      We appreciate the reviewer’s point. In the revised manuscript, we have included several new GLMs, including the new GLM1 that looks at gaze-weighted AV, above and beyond the effect of non-gaze-weighted AV. That analysis only supports pre-SMA. We have now clarified this in the Abstract as follows:

      “Finally, we found gaze-modulated accumulated-value signals, above and beyond the non-gaze-modulated signals, in the pre-supplementary motor area (pre-SMA), providing novel evidence that visual attention has lasting effects on decision variables and suggesting that activity in the pre-SMA reflects accumulated evidence.”

      (6) Some discussion of statistical power would be warranted given that a sample of 23 is now considered small by current fMRI standards.

      We appreciate the reviewer raising this important issue. We acknowledge that our sample size of 23 subjects (with only 20 having useable eye-tracking data) is on the small side by current fMRI standards. However, we believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space and leverages theoretical predictions about where effects should occur. Our novel contribution that gaze modulation of accumulated evidence signals was reflected in pre-SMA activity builds naturally on established findings.

      However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

    1. Author response:

      Thank you for considering our manuscript, “Engineering ATP Import in Yeast Uncovers a Synthetic Route to Extend Cellular Lifespan” (eLife-RP-RA-2025-109761) for publication in eLife. We appreciate the time and effort invested by the reviewers and editors.

      We have carefully read the eLife assessment and both public reviews. After thorough evaluation, we believe there is a significant factual misunderstanding that has propagated through both reviews and fundamentally affected the interpretation of our central findings and the overall evaluation.

      We must also express concern regarding the review process duration. We were informed that the manuscript experienced an extended review period (107 days) due to a delay from a third reviewer. Ultimately, we received only two reviews.

      The concerns raised about obvious internal contradictions or technical inconsistencies in our manuscript stem not from flawed data but from a misinterpretation of measurement directionality.

      We also acknowledge that the legend of Figure 5 should be more explicit, and that the Methods section should describe the experimental design that leads to the inverse correlation of the arbitrary units (AU).

      Together, these omissions led both reviewers to misinterpret the ATP measurements presented in Figure 5, specifically the directionality of the fluorescence-based ATP readout. In this assay, arbitrary units (AU) are inversely correlated with intracellular ATP abundance: higher AU values correspond to lower ATP levels. This inverse relationship was described in the Results section and in the figures marked with “Low versus High”, but it appears to have been overlooked. As a result, reviewers interpreted Figure 5 as contradicting Figure 2, when in fact the two datasets are fully consistent.

      Because this misunderstanding affected interpretation of the foundational ATP data, it appears to have influenced evaluation of all downstream conclusions. For example, neither reviewer meaningfully engaged with:

      - The identification of distinct cell death trajectories.

      - The mitochondrial dependency of NTT1-associated toxicity.

      - The integration of ATP depletion with mitochondrial function.

      - The distinction between intracellular ATP manipulation and extracellular ATP sensing mechanisms.

      We fully understand that when foundational data appear contradictory, reviewers naturally deprioritize downstream conclusions. However, in this case, the foundational contradiction does not exist; it arises from a misreading of the reporter’s scale.

      From the Results section of the manuscript:

      “Our analysis of ATP abundance throughout the yeast lifespan showed that yeast cells are born with low ATP levels, which gradually increase during their lifespan. Some cells completed their lifespan without any observable reduction in ATP abundance, while others showed a drastic decrease in ATP levels during late life (Fig. 5A–D, Supplementary File S3), consistent with previous observations supporting two modes of yeast lifespan, mediated by mitochondrial and/or SIR2 function (42,46–49). Consistent with our data presented in Figure 2, we also observed significantly lower ATP abundance in NTT1-expressing cells throughout their entire lifespan compared to Wt control cells (Fig. 5A–C). Furthermore, these cells displayed significantly reduced mean and maximum replicative lifespan (RLS), directly indicating that intracellular ATP depletion shortens lifespan (Fig. 5D). Next, we assessed RLS and age-associated ATP changes under ATP supplementation. We found that exposing NTT1 cells to medium supplemented with 10 µM ATP restored intracellular ATP levels (Fig. 5A–C) and significantly (p = 4.03E-18) increased both mean and maximum RLS to levels comparable to WT cells (Fig. 5D).”

      This section explicitly explains that Figure 5 is consistent with Figure 2. LC-MS data (Figure 2) show intracellular ATP depletion in NTT1 cells under baseline conditions and restoration upon extracellular ATP supplementation. Figure 5 shows the same pattern longitudinally. The apparent contradiction raised by both reviewers stems entirely from misreading the directionality of the AU scale.

      In the public assessment, concerns are raised about:

      - “Internally inconsistent, particularly regarding intracellular ATP measurements”

      - “Mismatched ATP measurements”

      - “Conceptual model contradicted by the data”

      - “The plots in Figure 5 make it seem like exogenous ATP addition lowers intracellular ATP…”

      These statements arise directly from the reversed interpretation of the AU scale. If the inverse relationship had been recognized, these perceived inconsistencies would not exist. Unfortunately, this misunderstanding then influenced broader interpretations, including the conclusion that the fundamental NTT1 model is internally contradictory.

      Similarly, Reviewer #2 states that LC-MS and QUEEN reporter data conflict and that ATP supplementation appears to lower intracellular ATP. This again reflects the same directional misunderstanding. There is no conflict between Figure 2 and Figure 5. Both show reduced ATP in NTT1 cells and restoration upon ATP supplementation.

      A second major point concerns the bidirectional transporter hypothesis. Reviewer #1 suggests that NTT1 may be bidirectional. However, NTT1 is well-characterized in the literature as a nucleotide transporter that exchanges extracellular ATP for intracellular ADP. We clearly described this in Figure 1C and cited the appropriate primary literature. The suggestion that we failed to consider directionality appears to stem from the same misinterpretation of intracellular ATP levels. We agree that clarifying the role of ADP/AMP depletion in NTT1-expressing cells would strengthen the manuscript, and we are prepared to revise the text to more explicitly describe how intracellular nucleotide exchange dynamics contribute to ATP depletion under baseline conditions.

      We also note that several criticisms, such as:

      - “Incorrect scale bars”

      - “Figure 5C does not match 5AB”

      - “Conceptual model contradicted by the data”

      - “No apparent correlation between ATP levels and lifespan”

      are all rooted in this central misunderstanding of how ATP abundance is represented in the fluorescence measurements.

      To address this constructively during the next revision, we are willing to:

      (1) Revise all relevant figure legends to explicitly state that AU values are inversely correlated with ATP abundance. We will also expand the Materials and Methods section to clarify the inverse correlation and/or generate new figures to minimize confusion.

      (2) Add clarifying annotations directly onto the figures.

      (3) Include new figures for further validation of observed nucleotide changes.

      (4) Expand our RNA-seq data analyses.

      (5) Expand discussion of nucleotide exchange dynamics and transporter directionality.

      (6) Address remaining concerns with additional analyses, experiments, and clarifications throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells, as well as 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can map both proteins and 3D contacts. It utilizes DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, likely because of the limited appropriate controls and the lack of any validation experiments. I think the study requires better proteomic controls, and some validation experiments of the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would assist those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable to non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using the mtDNA as a comparator is not a great comparison - there is no reason to think the telomeres/centromeres would look like mtDNA as a whole. The mito proteome is much less complex. It is going to provide a large number of false positives. The centromere/telomere comparison is ok, if one is interested in what's different between those two repetitive elements.

      We appreciate the reviewer's point. In fact, we selected mitochondrial DNA as a target for precisely the reason the reviewer notes: mtDNA should be spatially distinct from the nuclear targets, allowing us to determine whether we were in fact seeing spatially distinct proteins at the interorganelle (mtDNA vs. telomeres/centromeres) and intraorganelle (telomeres vs. centromeres) levels.

      But the more realistic use case of this method would be "what is at a specific genomic element"? A purely nuclear-localized control would be needed for that. Or a genomic element that has nothing interesting at it (I do not know of one).

      We have now added two studies, in Figure 4 and Figure 5, detailing the use of OMAP to investigate specific genomic elements: the Hox clusters (HOXA and HOXB) and a haplotype-specific analysis of X-chromosome inactivation centers in female murine (EY.T4) cells. The controls in these cases are more specific and in line with those suggested by the reviewer, as we (1) compare HOXA and HOXB with or without EZH2 inhibition using the same sets of probes and (2) specifically compare the region surrounding the XIC in female cells for the inactive and active X chromosomes.

      You can see this in the label-free work: non-specific, nuclear GO terms are enriched likely due to the random plus non-random labeling in the nucleus. What would a Telo vs general nucleus GSEA look like? (GSEA should be used for quantitative data, not GO). That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or Cut and Run/Tag is used.

      We performed GSEA on the enrichment scores for the label-free proteomics data from the SAINT output in Figure 1D, and we note that several of these proteins (e.g., those highlighted in Figure 2A: TERF1, CENPN, TOM70) have already been extensively validated to co-localize to these locations.

      In response to the reviewer's request for additional validation, we analyzed ChIP-seq data for several proteins to determine whether they were enriched surrounding specific loci. In the HOXA/B analysis, we found that HDAC3 and TCF12 were enriched at HOXB compared to HOXA, and SMARCB1 and ZC3H13 were enriched at HOXA compared to HOXB (Figure 4C). For these four selected proteins, ChIP data confirmed increased peak calls at HOXB for HDAC3 and TCF12 and at HOXA for SMARCB1 and ZC3H13 (Figure 4D).

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being looked at? Maybe compared to mitochondria, but it's hard to believe there are not a lot of false positives in those blue clouds. I believe the authors are more seeing mito vs nucleus + Telo than the stated comparison. For example, if you have no labeling in the nucleus in the control (Figures 1C and 2C) you cannot separate background labeling from specific labeling. Same with mito vs. nuc+Telo. It is not the proper control to say what is specifically at the Telo.

      We agree with the reviewer that compared to mitochondrial targeting, there could be non-specific nuclear comparisons. We note again though that we purposefully stayed away from using the word “specifically” when describing the proteomics work developed here. The reason being that we are not atlasing a large number of targets to define specificity. Instead, we highlight in Figure 2 that we did observe differences in proteins associating with telomeres and mitochondrial DNA. That may be non-specific, and in fact, this is also why we decided to include two nuclear targets to determine what might be specifically enriched. Thus, we compared centromeric and telomeric protein enrichment as determined by OMAP and observed consistent differential enrichment of shelterin proteins at telomeres (Figure 2I) and CENP-A complex members at centromeres (Figure 2J). We could have done the relative comparisons to no-oligo controls, analogous to how CASPEX compared targeted analyses to no-sgRNA controls (PMID: 29735997). However, we found that the mitochondrial targeted samples were generally better as a comparator because (1) we have clear means to validate differences and (2) the local environment around DNA is being labeled.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      Assuming the nuclear control were the same and the backgrounds were derived from the same cellular samples, it is unclear how this ratio-of-ratios ([Telo/Ctrl]/[Cent/Ctrl]) experiment would differ inherently from the direct comparison between Telo and Centromere. More likely, adding the extra ratios would increase the artifactual variance in the estimates, reducing the power of the comparisons, as has been seen previously in proteomics data using ratio-of-ratios comparisons (Super-SILAC).
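
      The variance argument can be made concrete with a small numerical sketch. The abundances, the noise level, and the assumption of independent Gaussian noise on the log scale are illustrative choices, not values from the study. With a single shared control measurement the control term cancels and the ratio-of-ratios equals the direct comparison; with separately measured controls the log-ratio variance roughly doubles under these assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
noise_sd = 0.3  # assumed log-scale measurement noise, identical for every channel

# Hypothetical true log-abundances of one protein in each sample type
log_telo, log_cent, log_ctrl = 2.0, 1.0, 0.5

# Simulated noisy log-intensity measurements
telo = log_telo + rng.normal(0, noise_sd, n)
cent = log_cent + rng.normal(0, noise_sd, n)
ctrl_a = log_ctrl + rng.normal(0, noise_sd, n)  # control measured alongside Telo
ctrl_b = log_ctrl + rng.normal(0, noise_sd, n)  # control measured alongside Cent

# Direct comparison: log(Telo / Cent)
direct = telo - cent

# Ratio-of-ratios with one shared control measurement: the control cancels
shared = (telo - ctrl_a) - (cent - ctrl_a)

# Ratio-of-ratios with independently measured controls: control noise adds in
separate = (telo - ctrl_a) - (cent - ctrl_b)

print("variance, direct:            ", direct.var())
print("variance, shared control:    ", shared.var())
print("variance, separate controls: ", separate.var())
```

      All three estimators target the same log-ratio in this sketch; they differ only in how much measurement noise they carry.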

      (2) A second major drawback is the lack of validation experiments. References to literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of newly described proximal proteins need to be validated by an orthogonal method, like ChIP qPCR, other genomic methods, or gel shifts if they are likely to directly bind DNA. It is ok to have false positives in a challenging assay like this. But it needs to be well and clearly estimated and communicated.

      We appreciate the reviewer's point. To be clear, we have not made any claims about new proteins at specific loci. Instead, we validated that known telomere- and centromere-associated proteins were consistently enriched by DNA OMAP (Figure 2). We also want to emphasize that, while such work would be valuable, the current paper is not an atlasing paper to define the full and specific proteomes of two genomic loci. We instead show how this method can be used to observe quantitative differences in proteins enriched at certain loci (HOXA/B work, Figure 4) and even between haplotypes (Xi/Xa work, Figure 5).

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      We appreciate the reviewer's point and have added the following text to the discussion: “Additionally, we show that this method is also able to detect DNA-DNA contacts through biotinylation of loop anchors. Our approach functions similarly to 4C[86]. However, our approach of biotin labeling of contacts does not rely on pairwise ligation events. Thus, detection of contacts through DNA O-MAP will vary in the sampling of DNA-DNA contacts in comparison.”

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      We took the reviewer's point and have worked to scale down the DNA OMAP experiments while revising this manuscript. As noted in Figure 5, we have been able to scale this work down to plates with ~10x fewer cells than in our initial experiments. This is on top of the initial DNA OMAP work in Figures 1 and 2, as well as our additional work in Figure 4, where we use 30-60 million cells in solution, which is still 10x less material than previous work (PMID: 29735997). Thus, the newest DNA OMAP platform uses ~100x fewer cells than previous work.

      (5) Comments like "Compared to other repetitive elements in the human genome...." appear to circumvent the fact that this method is still (apparently) largely limited to repetitive elements. Other than Glopro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So, this isn't quite the advancement you are implying. Plus, the overlap with telomeric proteins and other studies should be addressed. However, that will be challenging due to the controls used here, discussed above.

      As noted above, we have added Figures 4 and 5 to address the reviewer's concerns by targeting multiple non-repetitive loci (the HOXA and HOXB clusters and a 4.5Mb region straddling the X-inactivation center on both the active and inactive X homologs). Targeting the regions around the X-inactivation center shows the potential to perform haplotype-resolved proteome analysis of chromatin interactors.

      We addressed the telomeric protein overlap specifically in Figure 1F. We agree with the reviewer that the controls used dramatically change which proteins are considered enriched. The goal of the network analysis was to show (1) that we identify proteins previously observed in telomere proteomic datasets and (2) that we gain a more complete view of proteins by capturing more known interacting proteins than many previous methods, as was noted for the RNA OMAP platform (PMID: 39468212). For example, we observed enrichment of PRPF40A in the telomeric DNA OMAP data. In the BioPlex interactome, PRPF40A was observed to interact with TERF2IP and TERF2, suggesting that through these interactions PRPF40A may colocalize at telomeres. Similarly, we observed enrichment of SF3A1, SF3B1, and SF3B2. The SF3 proteins are known regulators of telomere maintenance (PMID: 27818134), but have not previously been observed in telomeric proteomics datasets, except now in DNA OMAP.

      We have added the following text to the Results to clarify these points:

      “To benchmark DNA O-MAP, we compared the full set of telomeric proteins to proteins observed in five established telomeric datasets (PICh, C-BERST, CAPLOCUS, CAPTURE, BioID)12,14,16,35,36 (Figure 1F). DNA O-MAP captured both previously observed telomeric interacting proteins (shelterins) as well as telomere associated proteins (ribonucleoproteins). We identified multiple heterogeneous nuclear ribonucleoproteins (hnRNPs) previously annotated as telomere-associated, including HNRNPA1 and HNRNPU. HNRNPA1 has been demonstrated to displace replication protein A (RPA) and directly interact with single-stranded telomeric DNA to regulate telomerase activity37–39. HNRNPU belongs to the telomerase-associated proteome40 where it binds the telomeric G-quadruplex to prevent RPA from recognizing chromosome ends41. We mapped DNA O-MAP enriched telomeric proteins to the BioPlex protein interactome and observed that in addition to capturing proteins from previously observed telomeric datasets (Figure 1F), DNA O-MAP enriched for interactors of previously observed telomeric proteins. Previous data found RBM17 and SNRPA1 at telomeres, and in BioPlex these proteins interact with three SF3 proteins (SF3A1, SF3B1, SF3B2). Though they were not identified in previous telomeric proteome datasets, all three of these SF3 proteins were enriched in the DNA O-MAP telomeric data. Furthermore, through interactions with G-quadruplex binding factors, these SF3 proteins are regulators of telomere maintenance (PMID: 27818134). Taken together, this data supports the effectiveness of DNA O-MAP for sensitively and selectively isolating loci-specific proteomes.”

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). Theoretically, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. This method also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage to be tested is the lack of cell line engineering that allows its application to primary cell lines or tissue.

      We thank the reviewers for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5).

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      Our primary claim for DNA OMAP is that it requires orders of magnitude fewer cells than previous studies. Based on comments along these lines from both reviewers, we performed DNA OMAP targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5). For the X chromosome targeting, we used ~3 million cells per condition with methods that we optimized during revision. When targeting HOXA and HOXB, we were able to identify HDAC3 and TCF12 enrichment at HOXB compared to HOXA as well as ZC3H13 and SMARCB1 enrichment at HOXA compared to HOXB, which is consistent with ChIP-seq reads from ENCODE for these proteins (Figure 4C, D). Both the HOX and X chromosome work help to address limitations noted in the Gauchier et al. paper that the reviewer cites, as both show progress towards overcoming “the major signal-to-noise ratio problem will need to be addressed before they can fully describe the specific composition of single-copy loci”.

      The authors first tested their method's specificity at repetitive telomeric regions, and like other approaches, expected low-abundant telomere-specific proteins were absent (for example, all subunits of the telomerase holoenzyme complex). Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      We analyzed ChIP-seq reads for our HOXA and HOXB targets (Figure 4C, D), which recapitulate our findings for four of our differentially enriched proteins. We also note that with the addition of the nonrepetitive loci (Figures 4 and 5), we have performed DNA OMAP on seven different targets (telomeres, pericentromeres, mitoDNA, HOXA, HOXB, Xi, and Xa) and identified expected targets at each of these. The consistency of these data, which mirrors the consistency of the RNA implementation of OMAP (PMID: 39468212), reinforces that we can successfully enrich local proteomes at genomic loci.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      Based on this critique we have gone back through the manuscript to improve the fairness of our comparisons and expanded the limitations in our discussion section.

      To the point about fixation, Schmiedeberg et al., which the reviewer references, does describe crosslinking requiring longer interactions (~5 s). Yet, as featured in reviews, many additional studies have found that “it has been possible to perform ChIP on transcription factors whose interactions with chromatin are known from imaging studies to be highly transient” (Review PMID: 26354429). We note similar results in the proteomics analysis of Subbotin and Chait, who state that the linkage of lysine-based fixatives like formaldehyde and “glutaraldehyde to reactive amines within the cellular milieu were sufficient to preserve even labile and transient interactions” (PMID: 25172955).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the robustness of the findings appears to be preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci.

      We thank the reviewers for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOX clusters and part of the X chromosome) in the revised manuscript (Figures 4 and 5).

      Furthermore, comparisons with previous techniques are unfounded since the authors have not provided direct comparisons with the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We have made the comparisons as best as possible. In fact, we found it difficult to find examples of recent implementations of many of these methods. Purchasing the exact mass spectrometers or performing every version of chromatin proteomics would be well beyond the scope of this work. On the other hand, OMAP has already generated data for three manuscripts. We are making the claim that, using the instrumentation and methods available to us, we were able to reduce the number of cells required to analyze a given genomic locus. We then applied TMT multiplexing to further improve the throughput and perform replicate analyses. To fully validate that one protein exists at one locus and no other would require exhaustive atlasing of protein-genomic interactions, which would be well beyond the scope of this single paper. Similarly, ChIP for every target identified to assess an empirical FDR would be well beyond the scope of this work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In summary, all three reviewers raised major concerns about the limitations of the method, many of which could be resolved by more precise and transparent language about these limitations. If you choose to resubmit a revised version, you should address questions like: What scale does "individual locus" refer to? At what scale can the method map protein-DNA interactions at individual targeted loci, rather than large repetitive domains? What is the estimated false discovery rate for a set of enriched proteins? The eLife assessment for this version of the manuscript is based on reviewer concerns. Note that this assessment can be updated after receiving a response to reviewer comments.

      Reviewer #1 (Recommendations for the authors):

      (1) The first couple of paragraphs make it sound like your method would exclusively benefit from sample multiplexing with MS-based proteomics. That is a bit misleading. The other stated methods use TMT. They don't use it to compare very different genomic (or compartmental) regions, but there is no reason cberst, glopro or CasID could not.

      A good point and we have updated the manuscript to reflect this. While previous methods generally did not use TMT, they could be adapted to do so and, similar to OMAP, improved by the use of more replicates in their analyses.

      (2) Please make the colors in 1F for the dataset overlap easier to read. 2 and 4+ are too similar.

      We appreciate the comment on making the colors easier to discern. Along these lines we’ve changed the color of “2” to make it easier to distinguish from “4+”.

      (3) Label as many dots as legible in your volcano plots.

      We’ve labeled a number of proteins that are relevant to the discussion in this paper as well as some additional proteins. We feel that additional labeling would detract from the points that we are trying to make in individual figure panels about groups of proteins, rather than general remodeling of all proteins.

      (4) Figure 2E needs a divergent color scheme since it crosses 0. And is it scaled, log-transformed, or both? And compared to what then?

      Figure 2E (heatmap) is z-scaled relative protein abundance measurements based on TMTpro reporter ion signal to noise (“s/n”). We have added additional information to the legend to highlight the information that the reviewer points out here. For the color, we are unsure of what is being asked for, as above 0 is red and below 0 is blue.
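
      As an illustration of the z-scaling described above, the following minimal sketch scales each protein's reporter-ion signal-to-noise values across channels so that values above the protein's mean are positive and values below are negative. The protein names and s/n values are hypothetical, and the use of the sample standard deviation is an assumption; the response does not specify which estimator was used.

```python
from statistics import mean, stdev


def z_scale(values):
    """Return z-scores for one protein across TMT channels.

    Assumption: the sample standard deviation is used for scaling.
    A protein with no variation across channels is mapped to zeros.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical s/n values: two "target" channels followed by two "control" channels.
sn_by_protein = {
    "hypothetical_protein_A": [410.0, 395.0, 120.0, 135.0],
    "hypothetical_protein_B": [80.0, 95.0, 300.0, 310.0],
}

scaled = {name: z_scale(vals) for name, vals in sn_by_protein.items()}

for name, zs in scaled.items():
    print(name, [round(z, 2) for z in zs])
```

      Because each row is centred on zero, a two-sided colour scale (for example blue below 0 and red above 0) maps directly onto relative depletion and enrichment.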

      (5) Unclear what you are implying with "...only 1-2 biological replicates." I would omit or clarify.

      Fair point, we have updated the manuscript to omit this section to simplify the introduction.

      (6) H2O2 and biotin phenols might be toxic to living organisms. But so is 4% PFA and ISH. I realize you are trying to justify your new approach but you don't need to do it with exaggerated contrasts. This O-MAP is a great approach and probably more likely for people to adopt it because it's DNA ISH based. Plus, with the crosslinking, you are likely not displacing proteins via Cas9 landing.

      We appreciate the reviewer’s comments about adoption and lack of protein displacement. We’ve scaled back on the claims and added more about limitations owing to crosslinking and ISH.

      (7) How much genome does the Cent regions take up? You state 500 kb for Telos.

      In the text we delineate how large of a region the PanAlpha probes target “The genome-wide binding profile of the pan-alpha probe closely overlaps with centromeres (Figure S1) and covers approximately 35 Mb of the genome according to in silico predictions.” Additionally, we’ve added Table S4 to summarize target locus sizes for all of the included targets.

      (8) You seem to be underestimating the lysine labeling. Is that after TMT labeling and analysis? If so, you're already ignoring what couldn't be seen. I don't think it's that important but you included it, so please describe clearly why it's an issue and how much of an issue it is. How does that relate to lit values? And it's not just TMTpro, it's any lysine labeler.

      We appreciate the reviewer’s point about specifying the reasoning and the lack of clarity around overall lysine labeling. That 1.38% is the number of peptides with remainder modifications due to formaldehyde crosslinking. For overall acylation of lysines with TMT labels, we generally expect (and achieve) >97% labeling of lysines with TMT reagents, as the Kuster and Carr labs nicely demonstrated across a range of labeling conditions (PMID: 30967486).

      Decrosslinking is a critical step generally for proteomics workflows on fixed or FFPE tissues and thus we sought to explore whether we could achieve sufficiently low residual lysine alkylation to enable protein quantitation by TMTpro reagents (or any lysine labeler, as the reviewer notes). For TMTpro-based methods on peptides, this is less of a concern generally as protease cleavage frees new primary amines at the N-termini of peptides which can be labeled for quantitation. But in part since we are describing a proteomics method on fixed tissues we wanted to share these data and the potential inclusion of residual fixation modifications for readers to potentially take into consideration when performing this method.

      Reviewer #3 (Recommendations for the authors):

      Liu et al. describe an original locus labelling approach that enables the isolation of specific genomic regions and their associated proteins. I have mixed views on this work, which, in my opinion, remains preliminary at this stage. Establishing the proteome of a single chromatin region is one of the most complex challenges in chromatin biology, as extensively discussed in Gauchier et al. (2020). Any breakthrough towards this goal is of significant interest to the community, making this manuscript potentially compelling. Indeed, some data suggest that the method works for repetitive DNA to some extent. However, much of the data is not very convincing, and in the case of small DNA targets, it argues against the use of DNA-O-MAP.

      In contrast to existing methods, DNA-O-MAP combines locus-specific hybridisation in situ (using affordable oligonucleotides) with proximity biotinylation. A major advantage of this strategy over other locus-specific biotinylation methods is the possibility of extensively washing excess or non-specifically hybridised probes before the biotinylation reaction, theoretically limiting biotinylation to the target region and thus significantly enhancing the signal-to-noise ratio. Other methods involving proximity biotinylation, such as targeted dCas9, do not have this capacity, meaning biotinylation occurs not only at the locus where a small fraction of dCas9 molecules is targeted but also around non-bound dCas9 molecules (representing the vast majority of dCas9 expressed in a given cell). This aspect potentially represents an interesting advance.

      We thank the reviewer for their thoughts and critiques, which we hope have in part relieved concerns pertaining to limitation on repetitive elements. To the latter points, we confirmed this with new specificity analysis that showed labeling to be highly specific to a given probe locus (Figure S3).

      Below, I outline the significant issues:

      The manuscript implies that DNA-O-MAP has better sensitivity than earlier techniques like CAPTURE, GLOPRO, or PICh. The authors state that PICh uses one trillion cells (which I doubt is accurate), and other methods require 300 million cells, whereas DNA-O-MAP uses only 60 million cells, suggesting the latter is more feasible. However, these earlier experiments were conducted almost 15 and 6 years ago, when mass spectrometry (MS) sensitivity was considerably lower than that of current instruments. The authors cannot know whether the proteome obtained by previous methods using 60 million cells, but analysed with current MS technology, would yield results inferior to those of DNA-O-MAP. Unless the authors directly compare these methods using the same number of cells and identical MS setups, I find their argument unjustified and misleading.

      Based on the instrumentation listed, we actually do have a good idea of how sensitivity changes may have affected identifications and overall sensitivity. For example, the CASPEX data was collected on an Orbitrap Fusion Lumos, while our data was collected on an Orbitrap Fusion Eclipse. From our work characterizing these two instruments during the Eclipse development (PMID: 32250601), we do actually know that the ion optics improvements boosted sensitivity of the Eclipse used in our work compared to the Lumos by ~50%, meaning if GLOPRO was run on an Eclipse it would still require >200 million cells per replicate for input.
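
      The scaling argument above can be written out as a short calculation. This is a minimal sketch that assumes the required cell input scales inversely with instrument sensitivity, which is a simplification; the ~300 million cell input for GLoPro and the ~50% sensitivity gain are the figures quoted in this response.

```python
def scaled_cell_input(reported_cells: float, sensitivity_gain: float) -> float:
    """Estimate the cell input needed on a more sensitive instrument.

    Assumption: required input is inversely proportional to sensitivity,
    so a 50% gain (0.5) divides the input by 1.5.
    """
    if sensitivity_gain <= -1.0:
        raise ValueError("sensitivity_gain must be greater than -1")
    return reported_cells / (1.0 + sensitivity_gain)


glopro_cells = 300e6  # ~300 million cells per replicate, as quoted for GLoPro
eclipse_gain = 0.5    # ~50% sensitivity improvement, Eclipse vs. Lumos

estimate = scaled_cell_input(glopro_cells, eclipse_gain)
print(f"Estimated input on the newer instrument: {estimate / 1e6:.0f} million cells")
```

      Under this assumption the estimate is on the order of 200 million cells per replicate, which remains well above the 60 million cells discussed for DNA O-MAP.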

      It is suggested that DNA-O-MAP is capable of 'multiplexing', whereas previous methods are not. This statement is also misleading. As I understand it, the targeted regions do not originate from a common pool of cells. Instead, TMT multiplexing only occurs after each group of cells has been independently labelled (Telo, Centro, Mito, control). Therefore, previous methods could also perform multiplexing with TMT. Moreover, it is unclear how each proteome was compared: one would expect many more proteins from centromeres than from telomeres (I am unsure about the number of mitochondria in these cells) since these regions are significantly larger than telomeres (possibly 10 to 100 times larger?). Have the authors attempted to normalise their proteomics data to the size (concatenated) of each target? This is particularly relevant when comparing histone enrichment at chromatin regions of differing sizes.

      We agree with the reviewers that this was overstated. In fact the GLOPRO paper notes that they performed a MYC analysis with a previous generation of TMT that could multiplex 10 samples. We have amended the manuscript to be more specific in those contexts. As stated in the methods section, “Samples were column normalized for total protein concentration”, to account for the amount of protein and size of the different targets.

      Figure 1C shows streptavidin dots resembling telomeres. To substantiate this claim, simultaneous immunofluorescence with a telomere-specific protein (e.g., TRF1 or TRF2) is required. It is currently unknown whether all or only a subset of telomeres are targeted by DNA-O-MAP, and it is also unclear if some streptavidin foci are non-telomeric. Quantification is needed to indicate the reproducibility of the labelling (the same comment applies to the centromere probes later in the manuscript; an immunofluorescence assay with CENPB would be informative, alongside quantifications).

      We understand the reviewer’s concern about specificity and reproducibility of DNA-O-MAP. To address this we have added analysis showing the efficiency and specificity of our FISH and biotin labeling for Telomere, PanAlpha, and Mitochondria targeting oligos (Figure S3). We found that biotin deposition was highly specific to the intended targets with an average across the three probes of 98% specificity.

      Perhaps more importantly, the authors suggest that it may be possible to enrich proteins that are not necessarily present at the target locus but are instead in spatial proximity (e.g., RNA polymerase I subunits enriched upon centromere targeting). Does this not undermine the purpose of retrieving locus-specific proteomes?

      The goal of DNA OMAP is to identify a local neighborhood of proteins around a specific genomic locus, similar to GLOPRO. As we now note in the work presented in Figures 4 and 5, these neighborhoods are inherently interesting for comparison of quantitative changes that occur around a genomic locus.

      Possibly related to the previous issue, when DNA-O-MAP is used to assess DNA-DNA interactions, probes covering regions of 20-25 kb are employed. Therefore, one would expect these regions to be significantly biotinylated compared to flanking regions. However, Genome Browser screenshots indicate extensive biotinylation signals spanning several megabases around the 20-25 kb targets. If the method were highly resolutive, the target region would be primarily enriched, with possibly discrete lower enrichment at distant interacting regions. The lack of discrete enrichment suggests poor resolution, likely due to the likely large scale of proximity biotinylation. This compromises the effectiveness of DNA-O-MAP, especially if it is intended to target small loci with complex sequences. Could the authors quantify the absolute number of reads from the target region compared to those from elsewhere in the genome (both megabases around the locus and other chromosomes, where many co-enriched regions seem to exist)? This would provide insights into both enrichment and specificity.

      Thanks for this suggestion; we have included a new Figure S8 to look at normalized read depth as a function of distance from the genomic target. The resolution of DNA OMAP, like that of all peroxidase-mediated proximity labeling methods, depends not on the sequence length of the DNA region but on the 30-40 nm of physical space around the HRP molecule that is targeted to the genomic locus.
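
      To make the idea of read depth as a function of distance concrete, here is a minimal sketch that bins reads by their distance from the nearest edge of a target interval and reports the fraction of all reads in each bin. It is not the analysis behind Figure S8: the coordinates, bin size, and read positions are hypothetical, and all reads are assumed to lie on the same chromosome as the target.

```python
from collections import Counter


def depth_by_distance(read_positions, target_start, target_end, bin_size, n_bins):
    """Fraction of all reads falling in each distance bin from the target.

    Distance is 0 for reads inside the target and otherwise the distance
    to the nearest target edge. Reads beyond the last bin are still
    counted in the total, so the fractions need not sum to 1.
    """
    counts = Counter()
    for pos in read_positions:
        if target_start <= pos <= target_end:
            dist = 0
        elif pos < target_start:
            dist = target_start - pos
        else:
            dist = pos - target_end
        bin_index = dist // bin_size
        if bin_index < n_bins:
            counts[bin_index] += 1
    total = len(read_positions)
    if total == 0:
        return [0.0] * n_bins
    return [counts[b] / total for b in range(n_bins)]


# Hypothetical 25 kb target and read start positions (single chromosome).
reads = [1_000_500, 1_010_000, 1_020_000, 990_000,
         1_030_000, 1_150_000, 1_300_000, 2_500_000]
profile = depth_by_distance(reads, 1_000_000, 1_025_000,
                            bin_size=100_000, n_bins=3)
print(profile)
```

      A profile that falls off steeply in the first bins would indicate labeling concentrated near the target, whereas a flat profile would indicate broad signal.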

      Minor Issues:

      (1) Page 3, second paragraph: It is unclear why probes producing a visible signal in situ necessarily translates to their ability to retrieve a specific proteome.

      We have revised the manuscript to de-emphasize the visible signal aspect of probe targeting and re-emphasize our initial point that the number of probes needed to properly target unique regions makes the use of locked nucleic acid probes cost-prohibitive. The basic point, though, is that, as we and others previously showed with RNA OMAP (PMID: 39468212) and APEX/proximity labeling strategies, the ability to deposit and visualize biotin generally translates directly to recovery of proximally labeled proteins (PMID: 26866790).

      (2) Page 3, last paragraph: "to reach a higher degree of enrichment...": Has it been demonstrated that direct protein biotinylation provides higher enrichment of relevant proteins? Certainly, there is higher enrichment of proteins, but whether they are relevant is another matter.

      Our point here was that the methods using direct protein biotinylation have higher levels of enrichment and thus require fewer cells than the previously mentioned PICh method, which is why we wrote the following: “In the case of GLoPro, APEX-based proximity labeling enhanced protein detection sensitivity, reducing the input required for each replicate analysis to ~300 million cells—a 10-fold reduction in cell input compared to PICh which used 3 billion cells.”

      Regarding whether these proteins are relevant, we show enrichment of known proteins that are critical to the function of their occupied genomic region at telomeres and centromeres. Additionally, we’ve added quantitative comparisons to assess relevance in our analysis of Hox and our targeted region of the X chromosome through comparisons to ChIP data at these regions. The improved enrichment that we’ve established in our initial submission as well as in the updated version also means that we can further scale down the number of cells required.

      (3) Figure 2B is misleading; it appears as though all three regions are targeted in the same cell, suggesting true multiplexing, which, I believe, is not the case.

      To avoid any potential confusion about how the samples were derived we’ve updated this figure panel to show three separate cells, each with a different region being targeted.

      (3) If I understand correctly, the 'no probe' control should primarily retrieve endogenously biotinylated proteins (carboxylases), which are mainly found in mitochondria. Why does the Pearson clustering in Supplementary Figure 2 not place this control proteome closer to the mitochondrial proteome?

      We assume that the ~10 carboxylases are biotinylated at the same levels in all cells, yet the proportion of these carboxylases relative to all enriched proteins for a given target is markedly reduced. Thus, as a proportion of the enriched proteome, we note in Figure S4 that mitochondrial DNA OMAP enriches proteins besides the carboxylases. We believe this explains why the ‘no probe’ sample can be clearly separated along PC2 in Figure 2D.

      (4) Was CENPA enriched in the centromere DNA-O-MAP? If not, have the authors scaled up (e.g., with ten times more cells) to see if the local proteome becomes deeper and detects relevant low-abundance proteins like CENPA or HJURP? This would be very informative.

      We did not observe CENPA, and we had originally contemplated the experiment the reviewer suggested, but noted that CENPA has only two tryptic peptides (>7 AA, <35AA), and they are both in the commonly phosphorylated region of the protein. Rather than scale up these experiments, we decided to attempt DNA OMAP on the non-repetitive locus experiments.

      (5) Using a few million cells, I do not see how the starting chromatin amount could range from 0.5 to 7 mg, as shown in Figures 2 and 3. How were these figures calculated? One diploid cell contains approximately 6 pg of DNA/chromatin, which means one billion cells represent about 6 mg of DNA/chromatin (a typical measurement for these methods).

      Thanks to the reviewer for catching this, that should have been the total lysate amount, not chromatin mass. We have corrected Figures 2 and 3.
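
      The reviewer's back-of-the-envelope check can be reproduced in a few lines. This is a minimal sketch using the reviewer's approximation of 6 pg of DNA per diploid cell; the function and constant names are illustrative only.

```python
PG_PER_DIPLOID_CELL = 6.0  # reviewer's approximation of DNA per diploid cell
PG_PER_MG = 1e9            # 1 mg = 1e9 pg


def dna_mass_mg(n_cells: float) -> float:
    """Approximate total DNA mass in milligrams for a given number of diploid cells."""
    return n_cells * PG_PER_DIPLOID_CELL / PG_PER_MG


one_billion = dna_mass_mg(1e9)    # the reviewer's reference point
sixty_million = dna_mass_mg(60e6) # the DNA-O-MAP input cited in the review

print(f"1 billion cells:  {one_billion:.2f} mg DNA")
print(f"60 million cells: {sixty_million:.2f} mg DNA")
```

      With tens of millions of cells the expected DNA mass is well below 1 mg, which is consistent with the 0.5 to 7 mg values having been total lysate rather than chromatin.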

      (6) Figure S1: There is no indication of the metrics used for the shades of red.

      We have added a gradient legend to depict this.

      (7) What is the purpose of HCl in the experiment?

      HCl treatment was done to reduce autofluorescence for imaging (PMID: 39548245).

      (8) I could not find the MS dataset on the server using the provided accession number (PDX054080).

      Thank you for pointing this out; we have confirmed that the dataset is now public and added the new datasets for the Xi/Xa and Hox studies. We also note that the accession should be “PXD054080”.

      (9) Why desthiobiotin instead of biotin?

      We have tested both; desthiobiotin was helpful to reduce adsorption to surfaces. Either biotin or desthiobiotin can be used, though, for OMAP.

    1. Author response:

      The following is the authors’ response to the original reviews

      General Statements

      We are delighted that all reviewers found our manuscript to be a technical advance by providing a much sought after method to arrest budding yeast cells in metaphase of mitosis or both meiotic metaphases. The reviewers also valued our use of this system to make new discoveries in two areas. First, we provided evidence that the spindle checkpoint is intrinsically weaker in meiosis I and showed that this is due to PP1 phosphatase. Second, we determined how the composition and phosphorylation of the kinetochore changes during meiosis, providing key insights into kinetochore function and providing a rich dataset for future studies.

      The reviewers also made some extremely helpful suggestions to improve our manuscript, which we have now implemented:

      (1) Improvements to the discussion. Following the reviewers’ recommendation, we have focused our discussion on the novel findings of the manuscript and drawn out some key points of interest that deserve more attention.

      (2) We added a new Figure 5 to help interpret the mass spectrometry data, to address Reviewer #3, point 4.

      (3) We added a new control experiment to address minor point 1 from Reviewer #3. Our experiment to confirm that SynSAC relies on endogenous checkpoint proteins was missing, for comparison, the cell cycle profile of cells in which SynSAC was not induced. We have performed this experiment and the new data are shown as part of a new Figure 2.

      (4) We included representative images of spindle morphology, as requested by Reviewer #1, point 2, in Figure 1.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve an MI or MII delay. The ABA promotes dimerizing fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division. Overall, I have only a few minor suggestions.

      We appreciate the reviewer’s support of our study.

      (1) In wild-type - Pds1 levels are high during M1 and A1, but low in MII. Can the authors comment on this? In line 217, what is meant by "slightly attenuated"? Can the authors comment on how anaphase occurs in presence of high Pds1? There is even a low but significant level in MII.

      The higher levels of Pds1 in meiosis I compared to meiosis II have been observed previously using immunofluorescence and live imaging[1–3]. Although the reasons are not completely clear, we speculate that there is insufficient time between the two divisions to re-accumulate Pds1 prior to separase re-activation. We added the following sentence at Line 218: “In wild-type cells, Pds1 levels are higher in meiosis I than in meiosis II, likely because the interval between the divisions is too short to allow Pds1 reaccumulation [1,2,4]. This pattern was also observed in SynSAC strains in the absence of ABA (Figure 3A).”

      We agree “slightly attenuated” was confusing and we have re-worded this sentence to read “However, ABA addition at the time of prophase release resulted in Pds1<sup>securin</sup> stabilisation throughout the time course, consistent with delays in both metaphase I and II”. (Line 225).

      We do not believe that either anaphase I or II occurs in the presence of high Pds1. Western blotting represents the amount of Pds1 in the population of cells at a given time point. The time between meiosis I and II is very short even when cells are treated with ABA. For example, in Figure 2B (now Figure 3B), spindle morphology counts show that at 105 minutes, 40% of cells had anaphase I spindles (and will be Pds1 negative), while ~20% had metaphase I and ~20% metaphase II spindles (and will be Pds1 positive). In contrast, due to the better efficiency of the meiosis II arrest, anaphase II hardly occurs at all in these conditions, since anaphase II spindles (and the second nuclear division) are observed at very low frequency (maximum 10%) from 165 minutes onwards. Instead, metaphase II spindles partially or fully break down, without undergoing anaphase extension. Taking Pds1 levels from the western blot and the spindle data together leads to the conclusion that at the end of the time course, these cells are biochemically in metaphase II, but unable to maintain a robust spindle. Spindle collapse is also observed in other situations where meiotic exit fails, and potentially reflects an uncoupling of the cell cycle from the programme governing gamete differentiation[3,5,6]. We re-wrote this section as follows (Line 222).

      “Note that Pds1 levels do not fully decline in this population-based analysis as the short duration of meiotic stages results in a mixed-stage population. For example, at the anaphase I peak (90 minutes) around 30% of cells remain in prior stages in which Pds1 levels are expected to be high. However, ABA addition at the time of prophase release resulted in Pds1<sup>securin</sup> stabilisation throughout the time course, consistent with delays in both metaphase I and metaphase II (Figure 3B). Anaphase I spindles nevertheless appeared with delayed kinetics, peaking at ~40% at 105 min. Concurrently, ~40% of cells remained in metaphase I or II and were therefore Pds1-positive, accounting for the persistent Pds1 signal on the western blot. In contrast, anaphase II spindles are observed at low frequency (maximum 10%) from 165 minutes onwards because metaphase II spindles give way to post-meiotic spindles, without undergoing anaphase II extension (Figure 1D).”

      (2) The figures with data characterizing the system are mostly graphs showing time course of MI and MII. There is no cytology, which is a little surprising since the stage is determined by spindle morphology. It would help to see sample sizes (ie. In the Figure legends) and also representative images. It would also be nice to see images comparing the same stage in the SynSAC cells versus normal cells. Are there any differences in the morphology of the spindles or chromosomes when in the SynSAC system?

      We have now included representative images as Figure 1D along with a schematic Figure 1C. This shows that there are no differences in spindle morphology or nuclei (chromosomes cannot be observed at this resolution), except of course the number of cells with a particular spindle morphology at a given time. We added the following text confirming that there is no change in spindle morphology (Line 174). “We scored spindle morphology after anti-tubulin immunofluorescence to determine cell cycle stage (Figure 1C). Prophase, metaphase I, anaphase I, metaphase II, anaphase II and post-meiotic spindles appeared successively over the timecourse in both the absence and presence of ABA (Figure 1D). While SynSAC dimerisation did not alter characteristic spindle morphologies, it changed their distribution over time.”

      The number of cells scored (at least 100 cells per timepoint) is given in the figure legends.

      (3) A possible criticism of this system could be that the SAC signal promoting arrest is not coming from the kinetochore. Are there any possible consequences of this? In vertebrate cells, the RZZ complex streams off the kinetochore. Yeast don't have RZZ but this is an example of something that is SAC dependent and happens at the kinetochore. Can the authors discuss possible limitations such as this? Does the inhibition of the APC affect the native kinetochores? This could be good or bad. A bad possibility is that the cell is behaving as if it is in MII, but the kinetochores have made their microtubule attachments and behave as if in anaphase.

      In our view, the fact that SynSAC does not come from kinetochores is a major advantage as this allows the study of the kinetochore in an unperturbed state. It is also important to note that the canonical checkpoint components are all still present in the SynSAC strains, and perturbations in kinetochore-microtubule interactions would be expected to mount a kinetochore-driven checkpoint response as normal. Indeed, it would be interesting in future work to understand how disrupting kinetochore-microtubule attachments alters kinetochore composition (presumably checkpoint proteins will be recruited) and phosphorylation but this is beyond the scope of this work. In terms of the state at which we are arresting cells – this is a true metaphase because cohesion has not been lost but kinetochore-microtubule attachments have been established. This is evident from the enrichment of microtubule regulators but not checkpoint proteins in the kinetochore purifications from metaphase I and II. While this state is expected to occur only transiently in yeast, since the establishment of proper kinetochore-microtubule attachments triggers anaphase onset, the ability to capture this properly bioriented state will be extremely informative for future studies. We acknowledge however that we cannot completely rule out unwanted effects of the system, as in any synchronisation system, and where possible findings with the system should be backed up with an orthogonal approach. We appreciate the reviewers’ insight in highlighting these interesting discussion points and we have re-written the relevant paragraph in the discussion, starting line 545.

      Reviewer #1 (Significance):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII and is chemically inducible, because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve a MI or MII delay. The ABA promotes dimerization of fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing, is consistent with published data, and indicates that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall, the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

      We appreciate the reviewer’s enthusiasm for our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II, and for comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      I have only a couple of minor comments:

      (1) I would add the Suppl Figure 1A to main Figure 1A. What is really exciting here is the arrest in metaphase II, so I don't understand why the authors characterize metaphase I in the main figure, but not metaphase II. But this is only a suggestion.

      Thanks for the suggestion. We agree and have moved the data for both meiosis I and meiosis II to make a new main Figure 2.

      (2) In line 197, the authors state: ...SynSAC induced a more pronounced delay in metaphase II than in metaphase I. However, in lines 229 and 240 the authors talk about a "longer delay in metaphase I compared to metaphase II"... this seems to be a mix-up.

      Thank you for pointing this out; this is indeed a typo and we have corrected it.

      (3) The authors describe striking differences for both protein abundance and phosphorylation for key kinetochore associated proteins. I found one very interesting protein that seems to be very abundant and phosphorylated in metaphase I but not metaphase II, namely Sgo1. Do the authors think that Sgo1 is not required in metaphase II anymore? (Top hit in suppl Fig 8D).

      This is indeed an interesting observation, which we plan to investigate as part of another study in the future. Indeed, data from mouse oocytes indicate that shugoshin-dependent cohesin protection is already absent in meiosis II [7], though whether this is also true in yeast is not known. Furthermore, this does not rule out other functions of Sgo1 in meiosis II (for example, promoting biorientation). We have included a paragraph in the discussion in the section starting line 641.

      Reviewer #2 (Significance):

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that it induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can also be used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner.

      We are grateful to the reviewer for their support.

      Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      For many purposes the enrichment and extended time for sample collection are sufficient, as we demonstrate here. However, as pointed out by the reviewer below, the system can be improved by use of the 4A-RASA mutations to provide a stronger arrest (see our response below). We did not experiment with higher ABA concentrations or repeated addition since the very robust arrest achieved with the 4A-RASA mutant made this unnecessary.

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not place more emphasis on this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      We agree that the 4A-RASA mutant is the best tool to use for the arrest and going forward this will be our approach. We collected the proteomics data and the data on the SynSAC mutant variants concurrently, so we did not know about the improved arrest at the time the proteomics experiment was done. Because very good arrest was already achieved with the unmutated SynSAC construct, we could not justify repeating the proteomics experiment which is a large amount of work using significant resources. We highlighted the potential of using the 4A-RASA variant more strongly as follows:

      Line 312, Results:

      “These findings also indicate that spc105<sup>(1-455)</sup>-4A-RASA is the preferred SynSAC variant, particularly where metaphase I arrest is the goal.”

      Line 598, Discussion: “Finally, the stronger and more prolonged SynSAC arrest obtained using the PP1 binding site mutant spc105<sup>(1-455)</sup>-4A-RASA prompts its consideration as an alternative tool for future studies, particularly where meiosis I arrest is important. At the time of performing the kinetochore immunoprecipitations, these mutations were not yet available but, as we have demonstrated, wild type SynSAC protein fragments nevertheless yielded sufficiently enriched populations of metaphase I and II cells to allow reliable detection of stage-specific kinetochore proteins and phosphorylations. Going forward, however, we consider SynSAC-4A-RASA to be the optimal tool for inducing metaphase arrests.”

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      We agree these are intriguing findings that highlight key differences in the wiring of the spindle checkpoint in meiosis and mitosis, as well as the potential for future studies; however, we can currently only speculate as to the underlying cause. The effect of the RASA mutation in mitosis is unexpected and unexplained. However, the fact that the 4A-RASA mutation causes a stronger delay in meiosis I compared to mitosis can be explained by a greater prominence of PP1 phosphatase in meiosis. Indeed, our data (now Figure 7A) show that the PP1 phosphatase Glc7 and its regulatory subunit Fin1 are highly enriched on kinetochores at all meiotic stages compared to mitosis.

      We agree that the improved growth of the RVAF mutant is intriguing, along with the reduced metaphase I delay, which together point to a role of Aurora B-mediated phosphorylation also in S. cerevisiae, though previous work has not supported such a role [8].

      We have re-written and expanded the paragraph in the discussion related to the mutation of the RVSF motif starting line 564 to reflect these points.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged-samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      While we agree with the reviewer that at first glance, normalising to no tag appears to be the most appropriate normalisation, in practice there is very low background signal in the no tag sample which means that any random fluctuations have a big impact on the final fold change used for normalisation. This approach therefore introduces artefacts into the data rather than improving normalisation.

      To provide reassurance that our kinetochore immunoprecipitations are specific, and that the background (no tag) signal is indeed very low, we have provided a new figure showing the volcano plots comparing kinetochore purifications at each stage with their corresponding no tag control (Figure 5).

      It is also important to note that our experiment looks at relative changes of the same protein over time, which we expect to be relatively small in the whole cell lysate. We previously documented proteins that change in abundance in whole cell lysates throughout meiosis [9]. In that study, we found that relatively few proteins significantly change in abundance. We added a sentence to this effect in the discussion (Line 632): “Although some variation could reflect global changes in protein abundance during meiosis, we previously found that only a few proteins undergo dynamic abundance changes during the meiotic divisions [9], so this is unlikely to fully explain the kinetochore composition differences observed.”

      Our aim in the current study was to understand how the relative composition of the kinetochore changes and, for this, we believe that a direct comparison to Dsn1, the central kinetochore protein that we immunoprecipitated, is the most appropriate normalisation.
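
The normalisation argument above can be illustrated with a minimal sketch. All intensities below are hypothetical arbitrary units and "ProteinX" is a placeholder name; none of this is taken from the study's data or analysis pipeline. The sketch only shows why dividing by a near-zero no-tag background is unstable, whereas scaling each sample to the bait (Dsn1) is not.

```python
# Hypothetical intensities (arbitrary units); not data from the study.

def fold_change_vs_no_tag(tagged, no_tag):
    """Fold change of a protein's signal in the tagged IP over the no-tag control."""
    return tagged / no_tag

def scale_to_bait(sample, bait="Dsn1"):
    """Express each protein's abundance relative to the bait protein in the same sample."""
    bait_level = sample[bait]
    return {protein: level / bait_level for protein, level in sample.items()}

# Same tagged signal, but the near-zero background fluctuates by one unit:
# the fold change halves even though nothing changed at the kinetochore.
fc_low_background = fold_change_vs_no_tag(1000.0, 1.0)
fc_high_background = fold_change_vs_no_tag(1000.0, 2.0)

# Scaling to the bait compares the composition of the purified material itself.
metaphase_one = {"Dsn1": 2000.0, "ProteinX": 1000.0}
metaphase_two = {"Dsn1": 1000.0, "ProteinX": 250.0}
ratio_mi = scale_to_bait(metaphase_one)["ProteinX"]
ratio_mii = scale_to_bait(metaphase_two)["ProteinX"]

print(fc_low_background, fc_high_background, ratio_mi, ratio_mii)
```

In this toy case a one-unit change in background moves the no-tag fold change from 1000 to 500, while the bait-scaled ratios depend only on signals that are well above background.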

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      We strongly agree with this point and we have re-framed the discussion to focus on the novel findings, as also raised by the other reviewers and noted above.

      Finally, minor concerns are:

      (1) Meiotic progression in SynSAC strains lacking Mad1, Mad2 or Mad3 is severely affected (Fig. 1D and Supp. Fig. 1), making it difficult to assess whether, as the authors state, the metaphase delays depend on the canonical SAC cascade. In addition, as a general note, graphs displaying meiotic time courses could be improved for clarity (e.g., thinner data lines, addition of axis gridlines and external tick marks, etc.).

      We added the requested data, which is now part of Figure 2. This now clearly shows that mad2 and mad3 mutants have very similar meiotic cell cycle profiles in the SynSAC background whether or not ABA is added. Please note that we removed the mad1 mutant from this analysis as technical difficulties prevented the strain from entering meiosis well.

      We have improved graphs throughout, as suggested: data lines are thinner, and axis gridlines and external tick marks are included. We added an arrow to indicate the time of ethanol/ABA addition.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis, this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore think this experiment is unnecessary.
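
The sensitivity argument above can be made concrete with a short calculation. This is a sketch under simplifying assumptions that are not stated in the response: every chromosome mis-segregates independently and at the same per-meiosis rate, and the rate used below is purely illustrative.

```python
def p_any_missegregation(per_chromosome_rate, n_chromosomes=16):
    """Probability that at least one of n chromosomes mis-segregates in a meiosis,
    assuming independent events with an equal rate per chromosome."""
    return 1.0 - (1.0 - per_chromosome_rate) ** n_chromosomes

illustrative_rate = 0.01  # hypothetical 1% mis-segregation per chromosome per meiosis

# A single GFP-labelled chromosome reports only its own mis-segregation events.
detected_with_one_label = p_any_missegregation(illustrative_rate, n_chromosomes=1)

# Spore viability is affected if any of the 16 chromosomes mis-segregates.
detected_by_viability = p_any_missegregation(illustrative_rate, n_chromosomes=16)

print(detected_with_one_label, detected_by_viability)
```

Under these assumptions, a defect that a single labelled chromosome would reveal in about 1% of meioses would affect roughly 15% of tetrads, which is the sense in which viability is the more sensitive readout.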

      (3) It is surprising that, although SAC activity is proposed to be weaker in metaphase I, the levels of CPC/SAC proteins seem to be higher at this stage of meiosis than in metaphase II or mitotic metaphase (Fig. 4A-B).

      We speculate that the challenge in biorienting homologs that are held together by chiasmata, rather than back-to-back kinetochores, results in a greater requirement for dynamic error correction in meiosis I. Interestingly, the data with the RASA mutant also point to increased PP1 activity in meiosis I, and we additionally observed increased levels of PP1 (Glc7 and Fin1) on meiotic kinetochores, consistent with the idea that cycles of error correction and silencing are elevated in meiosis I. We have re-written and expanded the discussion section starting line 565 to reflect these points.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (5) Several typographical errors should be corrected (e.g., "Kinvetochores" in Fig. 4 legend, "250uM ABA" in Supp. Fig. 1 legend, etc.)

      Thank you for pointing these out, they have been corrected and we have carefully proofread the manuscript.

      Reviewer #3 (Significance):

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

      References

      (1) Salah, S.M., and Nasmyth, K. (2000). Destruction of the securin Pds1p occurs at the onset of anaphase during both meiotic divisions in yeast. Chromosoma 109, 27–34.

      (2) Matos, J., Lipp, J.J., Bogdanova, A., Guillot, S., Okaz, E., Junqueira, M., Shevchenko, A., and Zachariae, W. (2008). Dbf4-dependent CDC7 kinase links DNA replication to the segregation of homologous chromosomes in meiosis I. Cell 135, 662–678.

      (3) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Developmental Cell 4, 711–726. https://doi.org/10.1016/S1534-5807(03)00130-8.

      (4) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Dev Cell 4, 711–726. https://doi.org/10.1016/s1534-5807(03)00130-8.

      (5) Attner, M.A., and Amon, A. (2012). Control of the mitotic exit network during meiosis. Molecular Biology of the Cell 23, 3122–3132. https://doi.org/10.1091/mbc.E12-03-0235.

      (6) Pablo-Hernando, M.E., Arnaiz-Pita, Y., Nakanishi, H., Dawson, D., del Rey, F., Neiman, A.M., and de Aldana, C.R.V. (2007). Cdc15 Is Required for Spore Morphogenesis Independently of Cdc14 in Saccharomyces cerevisiae. Genetics 177, 281–293. https://doi.org/10.1534/genetics.107.076133.

      (7) El Jailani, S., Cladière, D., Nikalayevich, E., Touati, S.A., Chesnokova, V., Melmed, S., Buffin, E., and Wassmann, K. (2025). Eliminating separase inhibition reveals absence of robust cohesin protection in oocyte metaphase II. EMBO J 44, 5187–5214. https://doi.org/10.1038/s44318-025-00522-0.

      (8) Rosenberg, J.S., Cross, F.R., and Funabiki, H. (2011). KNL1/Spc105 Recruits PP1 to Silence the Spindle Assembly Checkpoint. Current Biology 21, 942–947. https://doi.org/10.1016/j.cub.2011.04.011.

      (9) Koch, L.B., Spanos, C., Kelly, V., Ly, T., and Marston, A.L. (2024). Rewiring of the phosphoproteome executes two meiotic divisions in budding yeast. EMBO J 43, 1351–1383. https://doi.org/10.1038/s44318-024-00059-8.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Taylar Hammond and colleagues identified new regulators of the G1/S transition of the cell cycle. They did so by screening publicly available data from the Cancer Dependency Map and identified FAM53C as a positive regulator of the G1/S transition. Using biochemical assays, they then show that FAM53C interacts with the DYRK1A kinase to inhibit its function. They show in RPE1 cells that loss of FAM53C leads to a DYRK1A + P53-dependent cell cycle arrest. Combined inactivation of FAM53C and DYRK1A in a TP53-null background caused S-phase entry with subsequent apoptosis. Finally, the authors assess the effect of FAM53C deletion in a cortical organoid model, and in Fam53c knockout mice. Whereas proliferation of the organoids is indeed inhibited, mice show virtually no phenotype.

      The authors have revised the manuscript, and I respond here point-by-point to indicate which parts of the revision I found compelling, and which parts were less convincing. So the numbering is consistent with the numbering in my first review report.

      (1) The p21 knockdowns are a valuable addition, and the claim that p53 targets other than p21 are involved in the FAM53C RNAi-mediated arrest is now much more solid. Minor detail: if S4D is a quantification of S4C, it is hard to believe that the quantification was done properly (at least the DYRK1Ai conditions). Perhaps S4C is not the best representative example, or some error was made?

      We appreciate the concern from the Reviewer. As explained in the first round of revisions, we have mostly used an immunoassay based on capillary transfer (WES system), which is very quantitative (much more than classical immunoblot). As for the other WES assays, the panel in S4C is a representation from the signal in the capillary from one of the experiments we performed (in many ways, we should simply not show these representations but readers and reviewers expect them). We agree that this was not visually the most representative, likely because of the saturation of the signal, and we replaced it with another one.

      (2a) I appreciate the decision to remove the cyclin D1 phosphorylation data. A more nuanced model now emerges. It is not clear to me however why the Protein Simple immunoassay was used for experiments with RPE cells, and not the cortical organoids. Even though no direct claims are made based on the phospho-cyclin D data in Figure 5E+G, showing these data suggests that FAM53C deletion increases DYRK1A-mediated cyclin D1 phosphorylation. I find it tricky to show these data, while knowing now that this effect could not be shown in the RPE1 cells.

      The Reviewer raises a valid point. The data we had presented in the first version of the manuscript were strongly suggestive of changes in Cyclin D1 phosphorylation and protein stability but we followed the Reviewer’s advice to remove them from the revised manuscript because the effects were sometimes small. We decided to keep these data in the organoid model because we felt this is a question that many readers would have (how do changes in FAM53C affect Cyclin D levels?). As the Reviewer mentions, we did not draw conclusions about this but we felt and still feel it is important to connect the dots, even if imperfectly, between FAM53C and the cell cycle, and these data in Figure complement the data in Figure 3F. The experiments with RPE-1 cells were mostly performed in the Sage lab with the WES assay while the experiments with organoids were largely performed in the Pasca lab where more ‘classic’ immunoblots are routinely used. More generally, some antibodies work better with one method vs. the other and we often go back and forth between the two.

      (2b) The quantifications of the immunoassays are not convincing. In multiple experiments, the HSP90 levels vary wildly, which indicates big differences in protein loading if HSP90 is a proper loading control. This is, for example, problematic for the interpretation of Figures 3F and S3I. The cyclin D1 "bands" look extremely similar between siCtrl and siFAM53C (Fig S3I); in fact, the two series of 6 samples with different dosages of DYRK1Ai seem an identical repetition of each other. I did not have the option to overlay them, but it would be important to check if a mistake was made here. The cyclin D1 signals aside, the change in cycD1/HSP90 ratios seems to be entirely caused by differences in HSP90 levels. Careful re-analysis of the raw data and more equal loading seem necessary. The same goes (to a lesser extent) for S3J+K.

      As mentioned above, the representation of the fluorescence signal may be important for readers who are used to seeing immunoblots (Western blots), but the quantification is performed on the values directly obtained from the WES system from ProteinSimple. In these experiments, we make sure that the numbers we obtain are in a validated range, allowing us to use the values, even if sometimes the loading is a bit different between lanes. The sensitivity of the WES assay allows for high accuracy in intra-well quantification, which in turn allows for accurate inter-well quantification once loading control normalization is completed.
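
The loading-control normalisation discussed in this exchange amounts to a per-well ratio. The following is a minimal sketch with hypothetical signal values, not values exported from the WES system. It also shows the behaviour the reviewer asks about: when the target signal is identical in two wells, any difference in the normalised ratio comes entirely from the loading control.

```python
def normalize_to_loading_control(target, loading_control):
    """Divide each well's target signal by the loading-control signal from the same well."""
    if len(target) != len(loading_control):
        raise ValueError("target and loading control must have one value per well")
    return [t / c for t, c in zip(target, loading_control)]

# Hypothetical per-well signals (arbitrary units).
cyclin_d1 = [100.0, 100.0]  # identical target signal in both wells
hsp90 = [200.0, 100.0]      # loading control differs two-fold

ratios = normalize_to_loading_control(cyclin_d1, hsp90)
print(ratios)
```

Whether such a ratio difference is meaningful depends on both signals lying within the validated range of the assay, which is the condition the response refers to.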

      (2c) The new model in Fig S4L: what do the arrows at the right, from FAM53C and p53, that merge and point straight towards S-phase mean? They suggest that p53 (and FAM53C) directly promote S-phase progression, but most likely this is not what the authors intended.

      Very good point. We were trying to be inclusive of various signaling pathways that may be implicated in the regulation of the cell cycle by this group of proteins. FAM53C does promote S-phase entry (more cycling when FAM53C is overexpressed) but we removed the arrow coming from p53, which is certainly not a positive regulator of cell cycle progression. Thank you for helping us correct this mistake.

      (3) Clear; nicely addressed.

      (4) Thank you for correcting.

      (5) I appreciate that the authors are now more careful to call the IMPC analysis data preliminary. This is acceptable to me, but nevertheless, I suggest that the authors seriously consider taking this part out entirely. The risk of chance findings and the extremely skewed group sizes (as reviewer #2 had pointed out) hamper the credibility of this statistical analysis.

      We appreciate this concern but feel that it is important for the community to be aware of these phenotypes so other investigators either study FAM53C in different genetic contexts or, for example, generate a conditional knockout allele to study more acute effects of FAM53C loss during development and in adult mice. We believe that the text is carefully written and acknowledge the caveats of small sample sizes in some statistical analyses.

      Reviewer #2 (Public review):

      The authors sought to identify new regulators of the G1/S transition by mining the Cancer Dependency Map (DepMap) co-dependency dataset. This analysis successfully identified FAM53C, a poorly characterized protein, as a candidate. The strength of the paper lies in this initial discovery and the subsequent biochemical work convincingly showing that FAM53C can directly interact with the kinase DYRK1A, a known cell cycle regulator.
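
For readers unfamiliar with co-dependency mining, the general idea can be sketched as ranking genes by how well their dependency scores correlate with those of a query gene across cell lines. The gene names and scores below are hypothetical, and the use of Pearson correlation is an assumption made for illustration; this is not the authors' pipeline and does not call any DepMap API.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_codependencies(query, effects):
    """Rank all other genes by correlation of their scores with the query gene."""
    scores = {gene: pearson(effects[query], values)
              for gene, values in effects.items() if gene != query}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical gene-effect scores across four cell lines (more negative = more dependent).
effects = {
    "GENE_A": [-1.0, -0.5, 0.0, 0.5],
    "GENE_B": [-2.0, -1.0, 0.0, 1.0],   # tracks GENE_A across cell lines
    "GENE_C": [0.5, 0.0, -0.5, -1.0],   # opposite pattern to GENE_A
}

ranked = rank_codependencies("GENE_A", effects)
print(ranked)
```

Genes at the top of such a ranking share a dependency profile with the query and are candidates for acting in the same pathway; genes at the bottom show the opposite pattern.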

      The authors then present evidence, primarily from acute siRNA knockdown in RPE-1 cells, that loss of FAM53C induces a strong G1 cell cycle arrest. Their follow-up investigation proposes a model where FAM53C normally inhibits DYRK1A, thereby protecting Cyclin D from degradation and preventing p53 activation, to allow for G1/S progression. The authors have commendably addressed some concerns from the initial review: they have now demonstrated the G1 arrest using two independent siRNAs (an improvement over the initial pool), shown the effect in several additional cancer cell lines (U2OS, A549, HCT-116), and developed a more nuanced model that incorporates p53 activation, which helps to explain some of the complex data.

      However, a central and critical weakness persists. The entire functional model is built upon the very strong G1 arrest phenotype observed in vitro following acute knockdown. This finding is in stark contrast to data from other contexts. As the authors note, the knockout of Fam53c in mice results in minimal phenotypes, and the DepMap data itself suggests the gene is largely non-essential in most cancer cell lines.

      This major discrepancy creates two competing interpretations:

      As the authors suggest, FAM53C has a critical role in the cell cycle, but its loss is rapidly masked by compensatory mechanisms in long-term knockout models (like iPSCs and mice) or in established cancer cell lines.

      The strong acute G1 arrest is an experimental artifact of the siRNA-mediated knockdown, and not a true reflection of FAM53C's primary function.

      The authors' new controls (using two individual siRNAs and showing the arrest is RB-dependent) make an off-target effect less likely, but they do not definitively rule it out. The gold-standard experiment to distinguish between these two possibilities-a rescue of the phenotype using an siRNA-resistant cDNA-has not been performed.

      Because this key control is missing, the foundation of the paper's functional claims is not as solid as it needs to be. While the study provides an interesting and valuable new candidate for the cell cycle field to investigate, readers should be cautious in accepting the strength of FAM53C's role in the G1/S transition until this central discrepancy is definitively resolved.

      We appreciate this concern from the Reviewer. Genetically, FAM53C is linked to a number of genes coding for known regulators of the G1/S transition and its loss of function would be predicted to lead to G1 arrest based on these genetic interactions. As the Reviewer nicely summarizes, we have data in several cell types, including non-cancerous immortalized cells (RPE-1) and several cancer cell lines, that FAM53C acute knock-down leads to a G1 arrest. Our data also indicate that this arrest is RB dependent and p53 independent. Furthermore, genetic knockout of FAM53C in iPSC-derived human cortical organoids results in decreased proliferation. All these elements point to a role for FAM53C in G1/S. We performed some pilot rescue experiments, as suggested by the Reviewer, but these preliminary assays could not identify the right “dose” of FAM53C. We agree that it will be important in future studies to develop better genetic systems in which FAM53C can be manipulated genetically. However, our overexpression experiments show increased proliferation, providing more support for a role of FAM53C at the G1/S transition of the cell cycle.

      Reviewer #3 (Public review):

      Summary:

      In this study Hammond et al. investigated the role of Dual-specificity Tyrosine Phosphorylation regulated Kinase 1A (DYRK1) in the G1/S transition. By exploiting the Dependency Map portal, they identified a previously unexplored protein, FAM53C, as a potential regulator of the G1/S transition. Using RNAi, they confirmed that depletion of FAM53C suppressed proliferation of human RPE1 cells and that this phenotype was dependent on the presence of the RB protein. In addition, they noted increased levels of CDKN1A transcript and p21 protein that could explain the G1 arrest of FAM53C-depleted cells but, surprisingly, they did not observe activation of other p53 target genes. Proteomic analysis identified DYRK1 as one of the main interactors of FAM53C and the interaction was confirmed in vitro. Further, they showed that purified FAM53C blocked the ability of DYRK1 to phosphorylate cyclin D in vitro, although the activity of DYRK1 was likely not inhibited (judging from the modification of FAM53C itself). Instead, it seems more likely that FAM53C competes with cyclin D in this assay. The authors claim that the G1 arrest caused by depletion of FAM53C was rescued by inhibition of DYRK1, but this was true only in cells lacking functional p53. This is quite confusing, as DYRK1 inhibition reduced the fraction of G1 cells in p53 wild-type cells as well as in p53 knock-outs, suggesting that FAM53C may not be required for regulation of DYRK1 function. Instead of focusing on the impact of FAM53C on cell cycle progression, the authors moved towards investigating its potential (and perhaps more complex) roles in the differentiation of iPSCs into cortical organoids and in mice. They observed a lower level of proliferating cells in the organoids, but whether that reflects an increased activity of DYRK1 or is just an off-target effect of the genetic manipulation remains unclear. Even less clear is the phenotype in FAM53C knock-out mice. The authors did not observe any significant changes in survival or in organ development, but they noted some behavioral differences. Whether and how these are connected to the rate of cellular proliferation was not explored. In summary, the study identified a previously unknown role of FAM53C in proliferation but failed to explain the mechanism and its physiological relevance at the level of tissues and the organism. Although some of the data might be of interest, in their current form the data are too preliminary to justify publication.

      Major comments:

      (1) Whole study is based on one siRNA to Fam53C and its specificity was not validated. Level of the knock down was shown only in the first figure and not in the other experiments. The observed phenotypes in the cell cycle progression may be affected by variable knock-down efficiency and/or potential off target effects.

      We fully acknowledge these limitations in our study. First, we agree that the efficiency of the knock-down can be variable across experiments; unfortunately, antibodies against FAM53C are currently still not optimal and immunoassays against this protein have not always been reliable in our hands. It will be important in the future to develop better antibodies for this poorly studied factor. Second, we also agree that the siRNA pool is perhaps not optimal (note that we used a pool, not a single siRNA). We provide data in the manuscript that single siRNAs (from the pool) also arrest cells in G1. Our data also show that this arrest is observed in several cell lines (cancerous and non-cancerous), in a p53-independent but RB-dependent way. We further note that we also provide data in cortical spheroids derived from CRISPR/Cas9 knockout iPSCs showing a similar inhibition of proliferation, validating our observations in a completely orthogonal system. Finally, overexpression studies support a role for FAM53C at the G1/S transition (i.e., FAM53C overexpression is sufficient to promote proliferation).

      (2) Experiments focusing on the cell cycle progression were done in a single cell line RPE1 that showed a strong sensitivity to FAM53C depletion. In contrast, phenotypes in IPSCs and in mice were only mild suggesting that there might be large differences across various cell types in the expression and function of FAM53C. Therefore, it is important to reproduce the observations in other cell types.

      As mentioned above, we have observed cell cycle arrest in several cancer cell lines (U2OS, A549, HCT-116) and in iPSC-derived organoids. We acknowledge that RPE-1 cells seem most sensitive to the knock-down and, currently, we do not understand why. In the future, it will be critical to gain a better understanding of the cellular/genetic contexts in which FAM53C plays more important roles in the G1/S transition; it will be also critical to understand what mechanisms may compensate for loss of FAM53C in cells, in culture and in vivo.

      (3) Authors state that FAM53C is a direct inhibitor of DYRK1A kinase activity (Line 203), however this model is not supported by the data in Fig 4A. FAM53C seems to be a good substrate of DYRK1 even at high concentrations when phosphorylation of cyclin D is reduced. It rather suggests that DYRK1 is not inhibited by FAM53C but perhaps FAM53C competes with cyclin D. Further, authors should address if the phosphorylation of cyclin D is responsible for the observed cell cycle phenotype. Is this Cyclin D-Thr286 phosphorylation, or are there other sites involved?

      We completely agree with the Reviewer that the functional interactions between FAM53C and DYRK1A will need to be explored further. Our data (and other data from mass spectrometry experiments in other contexts) support a model in which FAM53C binds to DYRK1A. Genetic analyses indicate that FAM53C is antagonistic to DYRK1A function. Our phosphorylation assays show decreased DYRK1A activity when FAM53C is present. Because our data also show that DYRK1A phosphorylates FAM53C, there may be more than one level of functional interaction between the two proteins, including effects by DYRK1A on FAM53C through its phosphorylation activity. We state in the text that our data suggest “that FAM53C may be a competitive substrate and/or an inhibitor of DYRK1A”, and we agree that we cannot provide a stronger conclusion at this point.

      We believe that genetic data from DepMap and our data support a model in which Cyclin D is downstream of FAM53C in its regulation of the G1/S progression. As discussed with Reviewer #1, it has proven challenging to investigate how FAM53C may control the phosphorylation and degradation of Cyclin D. Thr286 is certainly a critical phosphorylation site, and this residue can be phosphorylated by DYRK1A, but whether FAM53C and DYRK1A engage with other residues or domains is not known and should be the focus of future studies.

      (4) At many places, information on statistical tests is missing and SDs are not shown in the plots. For instance, what statistics was used in Fig 4C? Impact of FAM53C on cyclin D phosphorylation does not seem to be significant. In the same experiment, does DYRK1 inhibitor prevent modification of cyclin D?

      We thank the Reviewer for this comment. We made sure in the revised version to mention all the statistical tests used.

      (5) Validation of SM13797 compound in terms of specificity to DYRK1 was not performed.

      We provided tables in Figure S3 that summarize the biochemical characterization of this DYRK1A inhibitor (performed by Biosplice Therapeutics, where this compound was developed).

      (6) A fraction of cells in G1 is a very easy readout but it does not measure progression through the G1 phase. Extension of the S phase or G2 delay would indirectly also result in reduction of the G1 fraction. Instead, authors could measure the dynamics of entry to S phase in cells released from a G1 block or from mitotic shake off.

      This is an interesting point raised by the Reviewer. It is correct that we only performed a more in-depth characterization of cell cycle phenotypes in certain contexts (e.g., cell counting, EdU incorporation) (see Figures 1 and S1). It is possible that different cell types adapt differently to loss or overexpression of FAM53C, and assays to synchronize the cells, including by mitotic shake-off, may be useful in future experiments to further characterize the cell cycle of FAM53C mutant cells.

      Comments to the revised manuscript:

      In the revised version of the manuscript, the authors addressed most of the critical points. They now include new data with depletion of FAM53C using single siRNAs that show a small but significant enrichment of the G1 population. This G1 arrest is likely caused by a combined effect of induced p21 expression and decreased levels of cyclin D1. The authors observed that inhibition of DYRK1 rescued cyclin D1 levels in FAM53C-depleted cells, suggesting that FAM53C may inhibit DYRK1. This possibility is also supported by in vitro experiments. On the other hand, inhibition of DYRK1 did not rescue the G1 arrest upon depletion of FAM53C, suggesting that FAM53C may also have a DYRK1-independent role in G1. Functional rescue experiments with cyclin D1 mutants and detection of DYRK1 activity in cells would be necessary to conclusively explain the function of FAM53C in progression through G1 phase, but unfortunately these experiments were technically not possible. Knock-out of FAM53C in iPSCs and in mice suggests that FAM53C may have additional functions besides cell cycle control and/or that adaptation may have occurred in these model systems. Overall, the study implicated FAM53C in fine-tuning DYRK1 activity in cells, which may to some extent influence the progression through G1 phase. In addition, FAM53C may also have DYRK1-independent and cell cycle-independent functions that remain to be addressed by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      All my minor points (6-11) were addressed adequately. No further comments.

      Reviewer #2 (Recommendations for the authors):

      The paper's conclusions would be substantially strengthened and the primary concern about off-target effects could be definitively resolved by performing one of the following two experiments:

      (1) Perform a rescue experiment. This would involve transfecting RPE-1 cells with an expression vector for an siRNA-resistant FAM53C cDNA (alongside a control vector) and then treating the cells with the FAM53C siRNAs. If the G1 arrest is a true on-target effect, the cells expressing the resistant cDNA should be "rescued" and continue to proliferate, while the control cells arrest. This is the most direct and standard way to validate a phenotype derived from siRNA.

      (2) Use an acute gene deletion approach that bypasses siRNAs entirely. The authors could use a lentiviral gRNA/Cas9 system to induce acute knockout of FAM53C in RPE-1 cells and assess the cell cycle phenotype at an early time point (e.g., 48-72 hours post-infection). This would provide a direct comparison to the acute siRNA knockdown, and if it recapitulates the strong G1 arrest, it would confirm the phenotype is due to FAM53C loss and not an artifact of the RNAi machinery. The current knockout models (iPSC, mice) are stable and long-term, which allows for the compensatory mechanism argument; an acute knockout would be a much stronger control. The authors could then also follow the fate of the cells and determine the nature of the suspected compensatory mechanisms.

      Addressing this central point is critical for the credibility of the proposed G1/S control element.

      As discussed above, the observations of similar phenotypes in four cell lines (RPE-1 cells and three cancer cell lines) using a pool of siRNAs and in cortical organoids derived from iPSCs using a knockout approach strongly support our results. But we agree that our current study has limitations, including the lack of genetic re-introduction of FAM53C in knock-down or mutant cells. We also note that strong genetic evidence points to a role for FAM53C at the G1/S transition. We hope that some of the readers will be excited by FAM53C as an understudied factor with possible critical roles in fundamental cell biology and human diseases, and future studies will continue to investigate its function in cells using additional approaches.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors use an effort-exertion task and an effort-based decision-making task, while recording brain dynamics with EEG, to test the hypothesis that the brain processes reward outcomes for effort differentially when they are earned for oneself versus others.

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      We thank Reviewer #1 for the affirmative appraisal of our manuscript as well as the thoughtful and insightful comments, which have enabled us to significantly improve the manuscript.

      (1) Inferences rely heavily on the results of mixed effects models which may or may not be properly specified and are not supported by complementary analyses.

      We thank Reviewer #1 for raising this critical issue of model specification. We have re-fitted our mixed-effects models and performed complementary analyses to validate the robustness of our findings. Specifically, we adopted the maximal converging random-effects structure (including random slopes for Recipient, Effort, and Magnitude where feasible) while ensuring model stability (see Responses to Reviewer #1’s Recommendations point 2). Crucially, our primary findings, including the Recipient × Effort and Recipient × Effort × Magnitude interactions, remained robust. Furthermore, additional analyses confirmed that these results were not confounded by factors such as response speed and subjective effort rating (see Responses to Reviewer #1’s Recommendations point 5).
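
      The idea of a "maximal converging random-effects structure" can be illustrated with a minimal sketch. It assumes a hypothetical `fit` callback that reports whether a model with a given set of random slopes converged. The function names, the grouping label `participant`, and the toy convergence rule are illustrative assumptions, not the authors' actual procedure or software.

```python
from itertools import combinations


def candidate_structures(slopes):
    """Yield random-slope subsets from the maximal set down to intercept-only."""
    for size in range(len(slopes), -1, -1):
        for subset in combinations(slopes, size):
            yield subset


def formula_for(subset, group="participant"):
    """Build an lme4-style random-effects term, e.g. (1 + Effort | participant)."""
    terms = " + ".join(("1",) + tuple(subset))
    return f"({terms} | {group})"


def select_maximal_converging(slopes, fit):
    """Return the first (largest) slope subset for which fit(subset) is True.

    `fit` is a hypothetical callback standing in for a real model fit; it
    should return True only if the model converged without warnings.
    """
    for subset in candidate_structures(slopes):
        if fit(subset):
            return subset
    return None


def toy_fit(subset):
    """Toy stand-in (assumption): pretend models with three slopes fail."""
    return len(subset) <= 2


chosen = select_maximal_converging(("Recipient", "Effort", "Magnitude"), toy_fit)
print(formula_for(chosen))
```

      In practice the callback would wrap a real mixed-effects fit and check its convergence diagnostics; the search order (largest structures first) is the only part this sketch is meant to convey.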

      (2) Also, not all results hang together in a sensible way. For example, participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. Given that participants took longer to complete tasks when earning effort for others, it is conceivable that participants might have been working less hard for others versus themselves, and this may complicate the interpretation of results.

      We thank Reviewer #1 for this insightful point (which also relates to Reviewer #3’s point 5). In our study, participants were asked to rate three specific dimensions: Effort (“How much effort did you exert to complete each effort condition when earning rewards for yourself [or the other person]?”), Difficulty (“How much difficulty did you perceive in each effort condition when earning rewards for yourself [or the other person]?”), and Liking (“How much did you like each effort condition when earning rewards for yourself [or the other person]?”).

      We acknowledge Reviewer #1’s concern that the lower subjective effort ratings for others seem to contradict the higher disliking and longer completion times. We propose that in this paradigm, subjective effort ratings are susceptible to demand characteristics and likely captured motivational engagement (e.g., “how hard I tried” or “how willing I was”) rather than perceived task demands. To disentangle these factors, we included a measure of perceived task difficulty, which is anchored in task properties and is less prone to social desirability biases (Harmon-Jones et al., 2020; Wright et al., 1990). We found no differences in perceived difficulty between self- and other-benefiting trials (Figure 2D), suggesting that the task demands were perceived as equivalent across conditions. To examine this interpretation more directly, we analyzed correlations among participants’ ratings of difficulty, effort, and liking. As illustrated in Figure S1, we found no correlation between difficulty and effort ratings. Crucially, liking ratings were negatively correlated with difficulty ratings.

      More importantly, our performance data contradict the interpretation that participants “worked less hard” for others in terms of task completion. While participants took longer to complete tasks for others, they maintained comparable, near-ceiling success rates for self (97%) and other (96%) recipients (b = -0.46, p = 0.632; Supplementary Table S1). This dissociation suggests that although participants were less motivated (e.g., lower subjective ratings, longer completion times, and greater disliking) to work for others, they ultimately exerted the necessary physical effort to achieve successful outcomes. Thus, the results consistently point to a decrease in prosocial motivation (consistent with prosocial apathy) rather than a failure of effort exertion.

      Wright, R. A., Shaw, L. L., & Jones, C. R. (1990). Task demand and cardiovascular response magnitude: Further evidence of the mediating role of success importance. Journal of Personality and Social Psychology, 59(6), 1250-1260. https://doi.org/10.1037/0022-3514.59.6.1250

      Harmon-Jones, E., Willoughby, C., Paul, K., & Harmon-Jones, C. (2020). The effect of perceived effort and perceived control on reward valuation: Using the reward positivity to test a dissonance theory prediction. Biological Psychology, 107910. https://doi.org/10.1016/j.biopsycho.2020.107910

      Reviewer #2 (Public review):

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences the processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      An important strength of the study is that the amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We appreciate Reviewer #2’s positive appraisal of our manuscript. We are fortunate to receive your thoughtful and insightful suggestions and have revised the manuscript accordingly.

      (1) Although the obtained results are highly plausible, I am concerned whether the reward positivity (RewP) and P3 were adequately measured. The RewP and P3 were defined as the average voltage values in the time intervals 300-400 ms and 300-440 ms after feedback onset, respectively. So they largely overlapped in time. Although the RewP measure was based on frontocentral electrodes (FC3, FCz, and FC4) and the P3 on posterior electrodes (P3, Pz, and P4), the scalp topographies in Figure 3 show that the RewP effects were larger at the posterior electrodes used for the P3 than at frontocentral electrodes. So there is a concern that the RewP and P3 were not independently measured. This type of problem can often be resolved using a spatiotemporal principal component analysis. My faith in the conclusions drawn would be further strengthened if the researchers extracted separate principal components for the RewP and P3 and performed their statistical analyses on the corresponding factor scores.

      We thank Reviewer #2 for raising this issue. We would like to clarify that these two components were time-locked to different types of feedback and therefore reflect neural responses to distinct stages of the prosocial effort task. Specifically, the P3 was time-locked to performance feedback (the effort-completion cue; e.g., the tick shown in Figure 1B), whereas the RewP was time-locked to reward feedback (e.g., the display of “+0.6”). Thus, despite the numerical similarity in the post-stimulus windows, the components capture neural activity evoked by independent events separated in time, corresponding to the performance monitoring versus reward evaluation stages of the task. To avoid misunderstanding, we have made this distinction more explicit in the revised manuscript, which now reads, “Single-trial RewP amplitude was measured as mean voltage from 300 to 400 ms relative to reward feedback onset (i.e., reward delivery) over frontocentral channels (FC3, FCz, FC4). We also measured the parietal P3 (300–440 ms; averaged across P3, Pz, and P4) in response to performance feedback (i.e., effort completion), given its relationship with motivational salience (Bowyer et al., 2021; Ma et al., 2014)” (page 27, para. 1, lines 2–6).
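
      The measurement described above can be sketched as a mean over a time window and an electrode cluster, computed on epochs that are time-locked to different events. This is a minimal illustration with made-up voltages; the data layout (a dictionary of per-channel sample lists) and the function name are assumptions, not the authors' pipeline.

```python
def mean_amplitude(epoch, times_ms, window_ms, cluster):
    """Mean voltage over a time window and a cluster of channels.

    epoch:     dict mapping channel name -> list of voltages, one per sample
    times_ms:  sample times in ms relative to the time-locking event
    window_ms: (start, end) in ms, inclusive on both ends
    cluster:   channel names to average over
    """
    start, end = window_ms
    idx = [i for i, t in enumerate(times_ms) if start <= t <= end]
    if not idx:
        raise ValueError("no samples fall inside the window")
    values = [epoch[ch][i] for ch in cluster for i in idx]
    return sum(values) / len(values)


# Toy data (hypothetical): one sample every 100 ms from 0 to 500 ms.
times = [0, 100, 200, 300, 400, 500]

# Epoch time-locked to reward feedback, used for the RewP.
reward_epoch = {
    "FC3": [0.0, 1.0, 2.0, 4.0, 6.0, 1.0],
    "FCz": [0.0, 1.0, 2.0, 5.0, 7.0, 1.0],
    "FC4": [0.0, 1.0, 2.0, 3.0, 5.0, 1.0],
}
# Separate epoch time-locked to performance feedback, used for the P3.
performance_epoch = {
    "P3": [0.0, 1.0, 1.0, 2.0, 4.0, 1.0],
    "Pz": [0.0, 1.0, 1.0, 3.0, 5.0, 1.0],
    "P4": [0.0, 1.0, 1.0, 1.0, 3.0, 1.0],
}

rewp = mean_amplitude(reward_epoch, times, (300, 400), ("FC3", "FCz", "FC4"))
p3 = mean_amplitude(performance_epoch, times, (300, 440), ("P3", "Pz", "P4"))
print(rewp, p3)
```

      Because the two calls operate on different epochs, the overlap of the 300–400 ms and 300–440 ms windows does not by itself imply that the same samples enter both measures.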

      Reviewer #3 (Public review):

      This study investigates how effort influences reward evaluation during prosocial behaviour using EEG and experimental tasks manipulating effort and rewards for self and others. Results reveal a dissociable effect: for self-benefitting effort, rewards are evaluated more positively as effort increases, while for other-benefitting effort, rewards are evaluated less positively with higher effort. This dissociation, driven by reward system activation and independent of performance, provides new insights into the neural mechanisms of effort and reward in prosocial contexts.

      This work makes a valuable contribution to the prosocial behaviour literature by addressing areas that previous research has largely overlooked. It highlights the paradoxical effect of effort on reward evaluation and opens new avenues for investigating the mechanisms underlying this phenomenon. The study employs well-established tasks with robust replication in the literature and innovatively incorporates ERPs to examine effort-based prosocial decision-making - an area insufficiently explored in prior work. Moreover, the analyses are rigorous and grounded in established methodologies, further enhancing the study's credibility. These elements collectively underscore the study's significance in advancing our understanding of effort-based decision-making.

      We thank Reviewer #3 for the positive assessment. We are particularly encouraged by the reviewer’s recognition of our novel integration of ERPs to uncover the distinct effects of effort on reward evaluation for self versus others. We have carefully addressed the specific recommendations raised in the subsequent comments to further strengthen the rigor and clarity of the manuscript.

      (1) Incomplete EEG Reporting: The methods indicate that EEG activity was recorded for both tasks; however, the manuscript reports EEG results only for the first task, omitting the decision-making task. If the authors claim a paradoxical effect of effort on self versus other rewards, as revealed by the RewP component, this should also be confirmed with results from the decision-making task. Omitting these findings weakens the overall argument.

      We thank Reviewer #3 for giving us the opportunity to clarify the specific roles of our two tasks. The primary aim of our study is to elucidate the neural after-effects of effort exertion on subsequent reward evaluation during prosocial acts. The prosocial effort task was specifically designed for this purpose, as it involves actual effort expenditure followed by reward outcomes. Furthermore, this task uses preset effort-reward combinations, ensuring balanced trial counts and adequate signal-to-noise ratios across conditions, a critical requirement for robust ERP analysis. In contrast, the prosocial decision-making task was included specifically to quantify behavioral preference (i.e., prosocial effort discounting) rather than neural reward processing. Specifically, this task involves choices without immediate effort execution and reward feedback, making it impossible to examine the neural after-effects of effort exertion. However, the decision-making task remains indispensable for our study structure: it provides an independent behavioral measure of prosocial apathy, which allowed us to link individual differences in behavioral motivation to the neural dissociations observed in the prosocial effort task (as detailed in our Responses to Reviewer #3’s point 2). Thus, the two tasks provide complementary, rather than redundant, insights into the behavioral and neural mechanisms of prosocial effort.

      (2) Neural and Behavioural Integration: The neural results should be contrasted with behavioural data both within and between tasks. Specifically, the manuscript could examine whether neural responses predict performance within each task and whether neural and behavioural signals correlate across tasks. This integration would provide a more comprehensive understanding of the mechanisms at play.

      We thank Reviewer #3 for this insightful and helpful suggestion. We agree that linking neural signatures with behavioral patterns is crucial for establishing the functional significance of our ERP findings. Regarding within-task association, it is important to note that the prosocial effort task was designed to require participants to exert fixed, preset levels of physical effort to earn uncertain rewards. This experimental control was necessary to standardize effort exertion across self-benefiting and other-benefiting trials, thereby minimizing confounds such as differences in physical or perceived effort prior to the feedback phase. Indeed, the neural after-effects remained after controlling for these behavioral measures (i.e., response speed and self-reported effort; as detailed in Responses to Reviewer #1’s Recommendations point 5). Furthermore, unlike the prosocial effort task, the decision-making task inherently precludes the examination of the neural after-effects of effort; therefore, within-task association in this task was not possible.

      Given these considerations, we focused on the cross-task association. We examined whether the neural after-effects of effort (indexed by the RewP) in the prosocial effort task were modulated by individual differences in effort discounting. We used the K value estimated from the prosocial decision-making task as the index of effort discounting. We entered the K value (log-transformed and z-scored) as a continuous predictor into the mixed-effects models of RewP amplitudes. The full regression estimates for the model are presented in Table S1 (left).
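
      The preprocessing of the K value can be sketched as follows. This is a minimal illustration with made-up values; the choice of natural logarithm and of the sample standard deviation (n − 1) are assumptions, since the response does not specify them.

```python
import math
import statistics


def log_zscore(values):
    """Log-transform positive values, then standardize to mean 0 and SD 1."""
    logs = [math.log(v) for v in values]   # natural log (assumed)
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)            # sample SD, n - 1 (assumed)
    return [(x - mean) / sd for x in logs]


# Hypothetical discounting parameters for three participants.
k_values = [0.01, 0.1, 1.0]
k_z = log_zscore(k_values)
print(k_z)
```

      The resulting z-scores do not depend on the base of the logarithm, because changing the base rescales every log value by the same constant, which standardization removes.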

      We observed a significant four-way interaction among recipient, effort, magnitude, and K value (b = 0.58, p = 0.013). To decompose this complex interaction, we performed simple slopes analyses separately for self- and other-benefiting trials at high and low levels of reward magnitude and discounting rate (±1 SD). As shown in Figure S2, for self-benefiting trials, the effort-enhancement effect on the RewP was significant only for participants with high discounting rates at low reward magnitude (b = 1.02, 95% CI = [0.22, 1.82], p = 0.012). In contrast, participants with low discounting rates exhibited no significant effort effect (b = -0.37, 95% CI = [-0.89, 0.15], p = 0.159). At high reward magnitude, simple slopes analyses detected no significant effort effects for either high (b = 0.35, 95% CI = [-0.44, 1.14], p = 0.383) or low (b = 0.45, 95% CI = [-0.07, 0.97], p = 0.093) discounting individuals. These findings strongly support the cognitive dissonance account (Aronson & Mills, 1959): those who find effort most aversive are most compelled to inflate the value of small rewards to justify their exertion. For these individuals, the completion of a costly action for a small reward may trigger a stronger internal justification effect, resulting in an amplified neural reward response.

      For other-benefiting trials, participants with low discounting rates exhibited a significant effort-discounting effect at high reward magnitude (b = -0.97, 95% CI = [-1.74, -0.20], p = 0.014). In contrast, no significant effort effects were observed for participants with high discounting rates at either high (b = -0.45, 95% CI = [-0.97, 0.08], p = 0.098) or low (b = -0.16, 95% CI = [-0.69, 0.38], p = 0.564) reward magnitudes, nor for participants with low discounting rates at low reward magnitude (b = 0.14, 95% CI = [-0.64, 0.92], p = 0.729). These results suggest that the justification mechanism observed for self-benefiting effort appears absent for other-benefiting effort. Instead, we observed a persistent effort discounting before, during, and after effort expenditure, which was most pronounced in individuals with low effort sensitivity (low K) when reward magnitude was high. This seemingly paradoxical pattern might be interpreted through the lens of disadvantageous inequity aversion (Fehr & Schmidt, 1999). Specifically, the combination of high personal effort and high monetary reward for another person creates a salient disparity between the participant’s incurred cost and the recipient’s gain. Although low-K individuals are behaviorally willing to tolerate this cost, their neural valuation system may nonetheless track the “unfairness” of this asymmetry, thereby attenuating the neural reward signal (Tricomi et al., 2010). These insights suggest that facilitating prosocial behavior may require not just lowering costs, but potentially framing outcomes to trigger the effort justification mechanisms that drive the effort paradox observed in self-benefiting acts (Inzlicht & Campbell, 2022).
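
      The logic of a simple slopes analysis at ±1 SD of a z-scored moderator can be sketched for the simplest case, a two-way interaction. The coefficients below are hypothetical and chosen only for illustration; the reported model contains higher-order terms, so its simple slopes combine more coefficients than this sketch does.

```python
def simple_slope(b_predictor, b_interaction, moderator_value):
    """Slope of the predictor at a given moderator value.

    For y = b0 + b1*x + b2*z + b3*x*z, the slope of x at z is b1 + b3*z.
    """
    return b_predictor + b_interaction * moderator_value


# Hypothetical coefficients (not taken from the reported models).
b_effort = 0.30            # effect of effort at the mean of the moderator
b_effort_by_k = 0.50       # effort x K interaction

slope_high_k = simple_slope(b_effort, b_effort_by_k, +1.0)  # +1 SD
slope_low_k = simple_slope(b_effort, b_effort_by_k, -1.0)   # -1 SD
print(slope_high_k, slope_low_k)
```

      With a z-scored moderator, the values +1 and −1 correspond directly to one standard deviation above and below the sample mean, which is why standardizing K before modeling simplifies this decomposition.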

      To confirm this four-way interaction, we also replaced the K value with the high-effort choice proportions in the decision-making task and observed a similar four-way interaction among recipient, effort, magnitude, and high-effort choice proportions (b = -0.58, p = 0.014; see Table S1 for detailed regression estimates). Together, this cross-task analysis not only provides a more comprehensive understanding of the mechanisms at play but also justifies the inclusion of the prosocial decision-making task. We sincerely thank Reviewer #3 for this valuable suggestion, which has significantly strengthened our manuscript. We have included this analysis (page 16, para. 2; page 17, paras. 1–2) and discussed the results (page 20, para. 2, lines 10–15; page 20, para. 3; page 21, para. 1, lines 1–8) in the revised manuscript.

      Aronson, E., & Mills, J. (1959). The effect of severity of initiation on liking for a group. The Journal of Abnormal and Social Psychology, 59(2), 177-181. https://doi.org/10.1037/h0047195

      Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114(3), 817-868. http://www.jstor.org/stable/2586885

      Tricomi, E., Rangel, A., Camerer, C. F., & O'Doherty, J. P. (2010). Neural evidence for inequality-averse social preferences. Nature, 463(7284), 1089-1091. https://doi.org/10.1038/nature08785

      (3) Success Rate and Model Structure: The manuscript does not clearly report the success rate in the prosocial effort task. If success rates are low, risk aversion could confound the results. Additionally, it is unclear whether the models accounted for successful versus unsuccessful trials or whether success was included as a covariate. If this information is present, it needs to be explicitly clarified. The exclusion criteria for unsuccessful trials in both tasks should also be detailed. Moreover, the decision to exclude electrodes as independent variables in the models warrants an explanation.

      We appreciate the opportunity to clarify these points. We have now explicitly reported the descriptive statistics and the results of a mixed-effects logistic model on response success in the revised manuscript (page 8, para. 1, lines 2–4; Supplementary Table S1). Participants achieved similarly high success rates in both self (M = 97%) and other trials (M = 96%; Figure S3). As shown in Table S2, success rates decreased as effort increased (b = -4.77, p < 0.001). However, no other effects reached significance (ps > 0.245). These near-ceiling success rates indicate strong task engagement and effectively rule out risk aversion as a potential confound.

      Regarding model structure, we excluded unsuccessful trials from statistical analyses because they were rare and distributed equally across conditions. Given the near-ceiling performance, we did not include success rate as a covariate, as it offers limited variance.

      Finally, we did not include electrodes as an independent variable because our hypotheses focused on condition effects rather than topographic differences. Following established research (e.g., Krigolson, 2018; Proudfit, 2015), we averaged RewP amplitudes across a frontocentral cluster (FC3, FCz, and FC4) and P3 amplitudes across a parietal cluster (P3, Pz, and P4), where activity is typically maximal. Averaging across these theoretically grounded clusters improves the signal-to-noise ratio and provides more reliable estimates of the underlying components. We have explicitly included this rationale in the revised manuscript, which reads, “Data were averaged across the selected electrode clusters to improve signal-to-noise ratio and reliability” (page 27, para. 1, lines 9–10).
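
      As a purely illustrative sketch of the cluster averaging described above, the computation amounts to a mean over the selected channels on each trial. The amplitudes below are simulated, and the channel order, array layout, and function name are hypothetical; this is not the analysis pipeline itself.

```python
import numpy as np

# Hypothetical channel order and simulated amplitudes (microvolts).
# Real values would come from the preprocessed, epoched EEG data.
channels = ["FC3", "FCz", "FC4", "P3", "Pz", "P4"]
rng = np.random.default_rng(0)
amplitudes = rng.normal(loc=5.0, scale=2.0, size=(40, len(channels)))  # trials x channels


def cluster_mean(data, channel_names, cluster):
    """Average trial-wise amplitudes across the channels listed in `cluster`."""
    idx = [channel_names.index(ch) for ch in cluster]
    return data[:, idx].mean(axis=1)


rewp = cluster_mean(amplitudes, channels, ["FC3", "FCz", "FC4"])  # frontocentral cluster
p3 = cluster_mean(amplitudes, channels, ["P3", "Pz", "P4"])       # parietal cluster
```

      Each resulting vector holds one value per trial, which can then serve as the dependent variable in a trial-level model.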

      Proudfit, G. H. (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology, 52(4), 449-459. https://doi.org/10.1111/psyp.12370

      Krigolson, O. E. (2018). Event-related brain potentials and the study of reward processing: Methodological considerations. International Journal of Psychophysiology, 132(Pt B), 175-183. https://doi.org/10.1016/j.ijpsycho.2017.11.007

      (4) Prosocial Decision Computational Modelling: The prosocial decision task largely replicates prior behavioural findings but misses the opportunity to directly test the hypotheses derived from neural data in the prosocial effort task. If the authors propose a paradoxical effect of effort on self-rewards and an inverse effect for prosocial effort, this could be formalised in a computational model. A model comparison could evaluate the proposed mechanism against alternative theories, incorporating the complex interplay of effort and reward for self and others. Furthermore, these parameters should be correlated with neural signals, adding a critical layer of evidence to the claims. As it is, the inclusion of the prosocial decision task seems irrelevant.

      We thank Reviewer #3 for this thoughtful suggestion regarding the value of computational modelling. We fully agree that formalizing mechanisms is crucial, but we would like to clarify why a computational model of decision-making cannot directly capture the paradoxical after-effects observed in our neural data. The paradoxical after-effect of effort exertion we report refers to experienced utility (i.e., how prior costs modulate the hedonic consumption of a reward), whereas the decision task measures decision utility (i.e., how prospective costs and benefits are integrated to guide choice). We included the prosocial decision task to establish a behavioral baseline and replicate the well-documented phenomenon of prosocial apathy. Consistent with prior work (e.g., Lockwood et al., 2017; Lockwood et al., 2022), our data show that at the decision stage (ex-ante), effort functions as a universal cost: participants discounted rewards for both self and others, differing only quantitatively (steeper discounting for others). It is only after effort is exerted (ex-post) that the pattern reverses: effort is valued for self but remains costly for others, representing a qualitative shift. Crucially, incorporating a "paradoxical valuation" parameter (i.e., effort as a reward) into our decision model would mathematically contradict the behavioral reality. Since participants actively avoided high-effort options, a model assuming effort adds value might fail to fit the choice data. The theoretical novelty of our study lies precisely in this temporal dissociation: whereas self-benefiting effort paradoxically enhances reward valuation, other-benefiting effort induces a persistent reward devaluation.

      To address the reviewer’s interest in bridging these two domains, we examined whether these distinct stages are linked at the level of individual differences. We hypothesized that an individual’s sensitivity to prospective effort cost (discounting rate K) might modulate their susceptibility to the retrospective neural after-effect. As detailed in our Responses to Reviewer #3’s point 2, we found that for self-benefiting trials, high-discounting individuals showed an effort-enhancement effect on the RewP at low reward magnitude, while for other-benefiting trials, low-discounting individuals exhibited effort-discounting effects at high reward magnitude. We sincerely thank Reviewer #3 for this valuable suggestion, which has allowed us to link the two tasks and has improved our understanding of the mechanisms at play.

      Lockwood, P. L., Hamonet, M., Zhang, S. H., Ratnavel, A., Salmony, F. U., Husain, M., & Apps, M. A. J. (2017). Prosocial apathy for helping others when effort is required. Nature Human Behaviour, 1(7), 0131. https://doi.org/10.1038/s41562-017-0131

      Lockwood, P. L., Wittmann, M. K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., & Apps, M. A. J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Current Biology, 32(19), 4172-4185.e4177. https://doi.org/10.1016/j.cub.2022.08.010

      (5) Contradiction Between Effort Perception and Neural Results: Participants reported effort as less effortful in the prosocial condition compared to the self condition, which seems contradictory to the neural findings and the authors' interpretation. If effort has a discounting effect on rewards for others, one might expect it to feel more effortful. How do the authors reconcile these results? Additionally, the relationship between behavioural data and neural responses should be examined to clarify these inconsistencies.

      This point aligns with the issues raised in Reviewer #1’s point 2. We acknowledge the apparent discrepancy between lower reported effort in the prosocial condition and the neural discounting effect. As detailed in our Responses to Reviewer #1’s point 2, we reconcile this by proposing that subjective effort ratings in this paradigm likely reflect motivational engagement (e.g., “how hard I tried” or “how willing I was”) rather than perceived task demands. Under this interpretation, the lower effort ratings for others reflect a withdrawal of engagement (consistent with prosocial apathy), which conceptually aligns with, rather than contradicts, the neural discounting effect. To validate this, we contrasted effort ratings with difficulty ratings (a more reliable index of objective demand). Our correlational analysis revealed no significant relationship between difficulty and effort ratings (r = -0.21, p = 0.196), suggesting that they capture distinct constructs. Furthermore, liking ratings were negatively correlated with difficulty ratings (r = -0.43, p = 0.011) but not with effort ratings (r = 0.32, p = 0.061), further dissociating the two measures. Crucially, as detailed in our Responses to Reviewer #1’s Recommendations point 5, our RewP effects remained significant even after controlling for individual effort ratings. This demonstrates that the neural effort-discounting effect for others is a physiological signature that operates independently of the subjective report bias.

      (6) Necessary Revisions to Manuscript: If the authors address the issues above, corresponding updates to the introduction and discussion sections could strengthen the narrative and align the manuscript with the additional analyses.

      We thank Reviewer #3 for the above insightful and helpful comments. We have carefully addressed the issues raised above and have updated the manuscript accordingly, including the abstract, introduction, results, and discussion sections.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The two biggest concerns I have are

      - Whether the mixed-effect models are properly specified, and

      - Whether the main interaction between the Recipient and effort on the reward positivity (RewP) reflects different levels of effort exertion when working for self versus others.

      We thank Reviewer #1 for identifying these two critical issues. We have carefully considered these points and conducted additional analyses to address them. Below, we provide a detailed response to each concern, explaining how we have improved the model specification and ruled out alternative interpretations regarding effort exertion.

      (2) On the first point, I noticed that the authors selectively excluded random effects for Effort and Magnitude when regressing RewP on Effort, Magnitude, Recipient, and Valence. This is important because the key result in the paper is a fixed effect two-way interaction between Recipient and Effort and a three-way interaction between Recipient, Effort, and Magnitude. It is not clear that these results will remain significant when Effort and Magnitude are included as random effects in the model. Thus the authors should justify their exclusion as random effects, and/or show that the results don't depend on including those random effects in the model. The same logic applies to the specification of other mixed effects models (e.g. the effect of Magnitude in the model predicting RTs).

      We thank Reviewer #1 for raising this important methodological point. We fully agree that including random slopes wherever possible reduces Type 1 error rates and yields more conservative tests of fixed effects. In our analyses, we determined the random effects structure for each model using singular value decomposition (SVD). Specifically, we began with a maximal model that included by-participant random slopes for all main effects and interactions as well as a participant-level random intercept. When the model failed to converge or yielded a singular fit, we applied SVD to identify redundant dimensions (i.e., components explaining zero variance) and iteratively removed these terms until convergence was achieved. This procedure allowed us to retain the maximal converging random-effects structure while ensuring model stability. We have clarified this procedure in the revised manuscript as follows, “For each model, we fitted the maximal random-effects structure and, when the model was overparameterized, used singular value decomposition to simplify the random-effects structure until the model converged” (page 28, para. 1, lines 5–8).

      Regarding the RewP model, including all variables (i.e., Recipient, Effort, Magnitude, and Valence) in the random-effects structure resulted in a boundary (singular) fit. Examination of the variance-covariance structure of the random effects revealed that the random slopes for Valence and Magnitude were perfectly negatively correlated (r = -1.00), indicating severe overparameterization. In our original submission, we removed the random slopes for Effort and Magnitude because the SVD analysis indicated redundant dimensions in the model structure.
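
      The redundancy check described above can be illustrated with a minimal NumPy sketch. The standard deviations and the remaining correlations are hypothetical; only the perfect negative correlation between the Valence and Magnitude slopes mirrors the reported estimate. A singular value that is numerically zero flags a dimension of the random-effects covariance matrix that carries no variance.

```python
import numpy as np

# Hypothetical by-participant random-effect standard deviations
# (intercept, Valence slope, Magnitude slope); illustrative values only.
sds = np.array([1.5, 0.8, 0.5])

# Correlation matrix in which the Valence and Magnitude slopes
# correlate at exactly -1.00, as in a singular fit.
corr = np.array([
    [1.0, 0.2, -0.2],
    [0.2, 1.0, -1.0],
    [-0.2, -1.0, 1.0],
])
cov = np.outer(sds, sds) * corr  # random-effects covariance matrix

# Singular values give the variance carried by each orthogonal dimension.
singular_values = np.linalg.svd(cov, compute_uv=False)
explained = singular_values / singular_values.sum()

# Dimensions explaining (numerically) zero variance are redundant.
n_redundant = int(np.sum(explained < 1e-8))
```

      In this toy case one of the three dimensions is redundant, so one random-effect term can be dropped without losing any modelled variance; which term to drop remains a substantive choice.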

      However, we agree with the Reviewer that retaining slopes for variables involved in key interactions is crucial. Therefore, we re-evaluated the model strategy: instead of removing Effort and Magnitude, we removed the random slope for Valence (which was the primary source of the perfect correlation). This modification successfully resolved the singularity while allowing us to retain the random slopes for the critical variables (i.e., Effort and Magnitude).

      Critically, this updated model yielded the same pattern of results as our original submission: the two-way interaction between Recipient and Effort and the three-way interaction between Recipient, Effort, and Magnitude remained significant (see Table S3). As expected, including the random slopes for Effort and Magnitude yielded a more conservative test of the fixed effects. While the critical three-way interaction remained significant (p = 0.019), the simple slope for the Self condition at high reward magnitude shifted slightly from significant (p = 0.041) to marginally significant (p = 0.056). However, the effect size remained largely unchanged (b = 0.42 vs. original b = 0.43), and the dissociation pattern, where self-benefiting trials show a positive trend while other-benefiting trials show a significant negative slope, remains robust and is statistically supported by the significant interaction. We have adopted this updated model in the revised manuscript and updated the relevant sections accordingly. Finally, note that we have removed the RewP table from the Supplementary Materials because the RewP model results are now presented as a figure in the main text (as suggested by Reviewer #1’s Recommendations point 3).

      We have also carefully verified the random effects structures for other mixed-effects models, including the RT and Performance-P3 models in the prosocial effort task, as well as the decision time and decision choice models in the prosocial decision-making task. The updated information is detailed as follows:

      Regarding the RT model, we replaced it with a more reasonable model of response speed (button presses per second), as suggested by Reviewer #1 (see our responses to Reviewer #1’s Recommendations point 4 for details).

      Regarding the performance-P3 model, the random-effects structure could only support Effort, as in our original submission; thus, the results remain unchanged.

      Regarding the decision time model, we have updated our results to include the quadratic effort term, as suggested by Reviewer #1 (see our responses to Reviewer #1’s Recommendations point 6 for details).

      Regarding the decision choice model, we included Recipient, Effort, and Magnitude in the random-effects structure. As shown in Table S4, the results remain largely consistent with the original model, except for a newly significant interaction between effort and magnitude. Follow-up simple slopes analyses revealed that the discounting effect of effort was more pronounced at low reward magnitude (M − 1SD: b = -2.69, 95% CI = [-3.09, -2.29], p < 0.001) than at high reward magnitude (M + 1SD: b = -2.38, 95% CI = [-2.82, -1.94], p < 0.001).

      In summary, we have improved the model specification following Reviewer #1’s suggestion. Crucially, the results remain qualitatively consistent with our original findings. We have updated the Results section, figures (Figures 2, 4, and 5), and OSF documents (including a new R Markdown file and an HTML output file detailing the final results) to reflect these analyses. Additionally, we have explicitly stated the method used for calculating p-values in the mixed-effects models (page 28, para. 1, lines 8–10), which was omitted in the original submission.

      (3) Regarding the mixed models, it would also be good to show a graphical depiction summarizing key effects (e.g. the Recipient by Effort interaction on RewP) rather than just showing the predictions of the fitted mixed effects models.

      This point is well-taken. Please see Figure S4, which visualizes the key effects and has now been included in the revised manuscript as Figure 4A.

      (4) Finally, regarding the mixed effect models of RTs - given the common finding that RTs are not normally distributed, the Authors might be better off regressing 1/RT (interpreted as speed rather than latency) since 1/RT will often make distributions less asymmetric and heavy-tailed.

      We thank Reviewer #1 for this helpful suggestion regarding data distribution. In our original analysis, the dependent variable was “completion time” (i.e., the latency to complete the required button presses within the 6-s window). We agree that these raw latency data exhibited characteristic non-normality (see Figure S5, Left). Based on Reviewer #1’s suggestion, we adopted “response speed” (calculated as button presses per second) as the dependent variable. As expected, this transformation substantially improved the normality of the distribution (see Figure S5, Right). We have refitted the mixed-effects model using this speed metric. Critically, the results largely replicated the patterns observed in our original model, with the exception that the main effect of reward magnitude did not reach significance in the speed model (see Table 5). Given the superior distributional properties of the speed metric, we have replaced the original latency analysis with the response speed model in the revised manuscript. We have updated the Results section (page 8, para. 1, lines 4–9) and Figures 2B–C accordingly.
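
      To make the latency-to-speed conversion concrete, the following sketch uses simulated data. The required press count, the distribution of completion times, and the helper name are all hypothetical, and the simulation ignores the 6-s cap for simplicity. It only illustrates why a reciprocal-type transform tends to reduce right skew.

```python
import numpy as np


def sample_skewness(x):
    """Biased sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


# Simulated right-skewed completion times (seconds) for a fixed,
# hypothetical requirement of 20 button presses.
rng = np.random.default_rng(1)
required_presses = 20
completion_time = 2.0 + rng.lognormal(mean=0.0, sigma=0.6, size=500)

# Response speed: button presses per second.
speed = required_presses / completion_time

skew_time = sample_skewness(completion_time)
skew_speed = sample_skewness(speed)
```

      Comparing `skew_time` with `skew_speed` shows the long right tail of the latencies being compressed, which is the property that motivates modelling speed rather than latency.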

      (5) Regarding the level of effort exerted, there are two reasons to suspect that participants exerted less for others versus themselves. The first is that they were slower to complete the button pressing for others versus themselves. The second is that they reported paradoxically less subjective effort for others versus self (paradoxical because they also reported liking the task less for others versus self). The explanation for both may be that they exerted less effort for others versus self and this has important implications for interpreting the main effects. If they exerted less effort for others, this may partly account for the key Recipient:Effort and Recipient:Effort:Magnitude interactions in the mixed effects regression of RewP. Do either median effort durations or self-reported effort predict the magnitude of the Recipient:Effort and Recipient:Effort:Magnitude interactions (if these were included as random effects)? If so, that would provide evidence supporting this story. Alternatively, if median durations or self-reported effort were included as covariates, do these interactions still obtain? In any case, the Authors should include caveats regarding this potential explanation of the self-versus-other interactions with effort and magnitude on the RewP (or explain why this cannot explain the interactions).

      We thank Reviewer #1 for raising this important interpretational issue. We acknowledge the concern that differences in physical exertion or perceived effort could potentially confound the neural findings. However, we argue that the observed RewP effects are not driven by these factors for several reasons.

      First, the prosocial effort task enforced fixed effort thresholds (10%–90% of their maximum effort level) across self-benefiting and other-benefiting trials. Importantly, participants achieved ceiling-level success rates that were highly comparable between self-benefiting (97%) and other-benefiting (96%) trials, indicating that they successfully exerted the required effort across conditions.

      Second, regarding the slower response speed for others (we used response speed instead of completion time, as the former is more suitable for statistical analysis; see details in Responses to Reviewer #1’s Recommendations point 4), we interpret this as a reduction in motivation rather than a reduction in the amount of effort exerted. Similarly, as detailed in our Responses to Reviewer #1’s point 2, subjective effort ratings in this paradigm appear to be influenced by demand characteristics and do not reliably track physical exertion. For instance, liking ratings were associated with difficulty (r = -0.43, p = 0.011) rather than effort (r = 0.32, p = 0.061) ratings.

      To empirically rule out the possibility that these behavioral differences account for the neural effect, we followed the reviewer’s suggestion and re-ran the mixed-effects model predicting RewP amplitudes with trial-by-trial response speed and subjective effort rating included as covariates. These control analyses revealed that neither response speed (b = -0.07, p = 0.614) nor self-reported effort (b = 0.10, p = 0.186) significantly predicted RewP amplitudes (see Table S6). Most importantly, the key interactions of interest (Recipient × Effort and Recipient × Effort × Magnitude) remained significant and virtually unchanged. These findings suggest that the observed neural after-effects of prosocial effort are not driven by variations in motor execution or perceived effort.

      Minor comments:

      (6) In Figure 5A a quadratic effect (not a linear effect) seems fairly obvious in decision times as a function of effort level. This makes sense given that participants are close to indifference, on average, around the 50-70% effort level. I recommend fitting a model that has a quadratic predictor and not just a linear predictor when regressing decision times on effort levels.

      We thank Reviewer #1 for this insightful suggestion. We agree that decision times likely track decision conflict, which typically peaks near indifference points (e.g., moderate effort levels). Accordingly, we reanalyzed the decision time data using a mixed-effects model that included both linear and quadratic terms for effort. As detailed in Table S7, this analysis revealed a significant quadratic main effect of effort, which was further qualified by a significant interaction between the quadratic effort term and reward magnitude. Decomposition of this interaction (Figure S6) revealed that the quadratic effort effect was more pronounced at low reward magnitude (M − 1SD: b = -160.10, 95% CI = [-218.30, -101.90], p < 0.001) than at high reward magnitude (M + 1SD: b = -99.50, 95% CI = [-157.60, -41.40], p = 0.001). However, we found no significant interactions involving the quadratic effort term and recipient. We have updated the Results section (page 13, para. 2; page 14, para. 1) and Figures 5A–B (right panel) to reflect these findings.
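
      The logic of adding a quadratic effort term can be sketched with ordinary least squares on simulated data. This is deliberately simpler than the mixed-effects model reported above: there are no random effects, and the effort grid, noise level, and peak location are hypothetical. A negative quadratic coefficient corresponds to decision times that peak at intermediate effort.

```python
import numpy as np

# Simulated decision times (ms) that peak at a moderate effort level.
rng = np.random.default_rng(2)
effort = np.tile(np.array([0.1, 0.3, 0.5, 0.7, 0.9]), 60)  # hypothetical effort grid
decision_time = 1500 - 900 * (effort - 0.6) ** 2 + rng.normal(0, 40, effort.size)

# Center effort so the linear and quadratic terms are less collinear.
effort_c = effort - effort.mean()

# Design matrix: intercept, linear term, quadratic term.
X = np.column_stack([np.ones_like(effort_c), effort_c, effort_c ** 2])
coef, *_ = np.linalg.lstsq(X, decision_time, rcond=None)
intercept, b_linear, b_quadratic = coef

# Effort level at which predicted decision time is maximal (vertex of the parabola).
peak_effort = effort.mean() - b_linear / (2 * b_quadratic)
```

      With a purely linear predictor the inverted-U shape would be absorbed into the residuals, which is why the quadratic term is needed to capture a conflict-related peak.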

      (7) The distinction between the effort and decision-making tasks wasn't super clear from the main text. A sentence early on in the results section could be useful for readers' understanding.

      This point is well taken. In the revised manuscript, we have clarified this distinction at the beginning of the Results section (page 6, para. 2, lines 1–10). In addition, we have explicitly indicated the corresponding task within each subsection heading in the Results:

      “2.1 Investing effort for others is less motivating than for self in the prosocial effort task” (page 7)

      “2.2 Effort adds reward value for self but discounts reward value for others in the prosocial effort task” (page 9)

      “2.3 Reward is devalued by effort to a higher degree for others than for self in the prosocial decision-making task” (page 13)

      (8) To what does "three trials" refer on lines 143-144?

      Thank you for raising this point. Participants completed three trials in which they were asked to press a button as rapidly as possible with their non-dominant pinky finger for 6000 ms. The maximum effort level was operationalized as the average button-press count across the three trials. To improve clarity, we have also provided a more detailed description in the Results section, which reads: “The mean maximum effort level (i.e., the average button-press count across three 6000-ms trials; see Procedure for details) ….” (page 7, para. 1, lines 1–2).

      (9) It is unclear how the authors select their time windows for ERP analyses.

      We thank Reviewer #1 for this comment. Measurement parameters (i.e., time windows and channel sites) were determined based on the grand-averaged ERP waveforms and topographic maps collapsed across all conditions. This procedure is orthogonal to the conditions of interest and prevents bias in the selection of measurement windows and channels, consistent with the “orthogonal selection approach” (Luck & Gaspelin, 2017). We have clarified this point in the revised manuscript, which now reads, “Measurement parameters (time windows and channel sites) were determined from the grand-averaged ERP waveforms and topographic maps collapsed across all conditions, which was thus orthogonal to the conditions of interest (Luck & Gaspelin, 2017)” (page 27, para. 1, lines 6–9).

      Luck, S., & Gaspelin, N. (2017). How to get statistically significant effects in any ERP experiment (and why you shouldn't). Psychophysiology, 54(1), 146-157.

      (10) There are a few typos throughout. For example, Line 124 should read "other half benefitted...", Line 127 should read "interest at each effort level...", "following" on Line 369, and Supplemental table titles incorrectly spell the word "Results".

      We thank Reviewer #1 for catching these errors. We have corrected all the specific typos noted (page 6, para. 2, lines 11 and 15; page 22, para. 3, line 2; Supplementary Table S2). Furthermore, we have conducted a thorough proofreading of the entire text and supplementary materials to ensure linguistic accuracy and consistency throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Lines 84-86. "The RewP ... has its neural sources in the anterior cingulate cortex (Gehring & Willoughby, 2002) and ventral striatum (Foti et al., 2011)." This is a better reference for the ACC source: https://pubmed.ncbi.nlm.nih.gov/23973408/. And perhaps remove the reference to the ventral striatum; most people would agree that activity in the ventral striatum cannot be measured with scalp EEG.

      We thank Reviewer #2 for providing the updated reference, which has been cited in the revised manuscript. We agree that activity in the VS cannot be reliably measured with scalp EEG and thus have removed the reference to the VS. The revised sentence now reads, “… has its neural sources in the anterior cingulate cortex (Gehring & Willoughby, 2002; Hauser et al., 2014)” (page 4, para. 2, lines 12–13).

      (2) Lines 152-153. What exactly is shown in Figure 2A? How did the authors average across subjects?

      We thank Reviewer #2 for raising this issue. Figure 2A depicts the distribution of the maximum effort level, defined as the average button-press count across three 6000-ms trials completed before the prosocial effort task. In these trials, participants were instructed to press the button as rapidly as possible with their non-dominant pinky fingers. To improve clarity, we have revised the figure caption as: “(A) Distribution of the maximum effort level (i.e., the average button-press count across three 6000-ms trials) across participants” (Figure 2).

      (3) Lines 160-164. "As expected (Figure 2D), participants perceived increased effort as more difficult ... and more disliking (b = -0.62, p < 0.001) when the beneficiary was others than themselves." Does this sentence describe the main effect of the beneficiary or the interaction between beneficiary and effort level, as the start of the sentence ("increased effort") suggests?

      We thank Reviewer #2 for pointing out this ambiguity. The sentence describes the main effect of beneficiary rather than the interaction between beneficiary and effort level. In the revised manuscript, we have rephrased the sentence as: “They felt less effort (b = -0.32, p = 0.019) and more disliking (b = -0.62, p = 0.001) for other-benefiting trials compared to self-benefiting trials” (page 9, para. 1, lines 4–6).

      (4) Lines 195-196. "..., we conducted post-hoc simple slopes analyses at -1 SD ("Low") and + SD ("High") reward magnitude." I did not understand what the authors meant with these reward magnitudes, given that the actual potential rewards were ¥0.2, ¥0.4, ¥0.6, ¥0.8, and ¥1.0.

      In our analyses, the actual reward magnitudes (¥0.2, ¥0.4, ¥0.6, ¥0.8, and ¥1.0) were z-scored and entered as a continuous regressor in the mixed-effects models. Post-hoc simple slopes analyses were then conducted at ±1 SD from the mean of the z-scored reward magnitude. To clarify, we have revised the sentence as “… we conducted post-hoc simple slopes analyses at 1 standard deviation (SD) below (“Low”) and above (“High”) the mean reward magnitude” (page 11, para. 2, lines 8–9). This standard method for testing simple effects for continuous predictors is recommended by Aiken and West (1991).

      Aiken, L. S., West, S. G., & Reno, R. R. (1991). Multiple regression: Testing and interpreting interactions. Sage.
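
      The ±1 SD convention can be made concrete with a short sketch. The five reward levels are those of the task; the coefficient values, the function name, and the use of the sample SD computed over the five levels (rather than over trials) are assumptions made for illustration only.

```python
import numpy as np

magnitudes = np.array([0.2, 0.4, 0.6, 0.8, 1.0])  # reward levels in yuan

# z-score the magnitudes (sample SD, ddof=1, is an assumption here).
mean, sd = magnitudes.mean(), magnitudes.std(ddof=1)
z = (magnitudes - mean) / sd

# Raw reward values that correspond to -1 SD ("Low") and +1 SD ("High").
low_raw, high_raw = mean - sd, mean + sd


def simple_slope(b_effort, b_interaction, z_magnitude):
    """Effort slope at a given z-scored magnitude in a model with an
    effort x magnitude interaction."""
    return b_effort + b_interaction * z_magnitude


# Hypothetical fixed-effect estimates, not values from the manuscript.
b_effort, b_interaction = -0.30, 0.20
slope_low = simple_slope(b_effort, b_interaction, -1.0)
slope_high = simple_slope(b_effort, b_interaction, 1.0)
```

      Note that `low_raw` and `high_raw` fall between the presented reward levels: the "Low" and "High" labels mark points on the continuous regressor rather than any of the five actual rewards.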

      (5) Lines 253 and 275. I would not call this a computational model. The authors fit a curve to data, there is no model of the computations involved.

      This point is well taken. We have replaced “computational model” with “discounting” (Figure 5) and “parabolic discounting model” (page 15, para. 1, line 15).
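
      For readers unfamiliar with the term, a parabolic discounting function is commonly written as SV = M − K·E², where SV is subjective value, M is reward magnitude, E is effort level, and K is the discounting rate. The exact parameterization fitted in the manuscript is not restated in this response, so the form below is an assumption based on that common formulation, and all numerical values are hypothetical.

```python
import numpy as np


def subjective_value(magnitude, effort, k):
    """Parabolic effort discounting: magnitude minus k times squared effort."""
    return magnitude - k * np.square(effort)


effort_levels = np.array([0.1, 0.3, 0.5, 0.7, 0.9])  # proportion of maximum effort (hypothetical grid)
magnitude = 1.0                                       # reward on offer (hypothetical)
k_self, k_other = 0.8, 1.4                            # hypothetical rates; larger K means steeper discounting

sv_self = subjective_value(magnitude, effort_levels, k_self)
sv_other = subjective_value(magnitude, effort_levels, k_other)
```

      Under this form a larger K for other-benefiting trials lowers subjective value at every effort level, and the gap widens as effort increases, which is the pattern described as steeper discounting for others.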

      (6) Line 710. Figure S1 does not show topographic maps of the P3, as the figure caption suggests.

      We thank Reviewer #2 for identifying this oversight. We have now included topographic maps of the P3 in Figure S1.

      (7) Please check language in lines 33 (effect between), 38 (shape), 49 (highest cost form?), 74 (tunning), 90 (omit following), 127 (interest on at each effort level), 135 (press buttons >> rapidly press a button?), 142 (motivated), 219 (should low be high?), 265-266 (missing word), 275 (confirmed by following), 292 (an action can be effortful, a feeling cannot), 315 (when it comes into), 330-331 (data is plural; the aftereffect of prosocial effect), 387 (interest on at each effort level), 405 (should quickly be often?).

      We thank Reviewer #2 for the careful review and feedback about these language issues. We have revised all the phrasing you identified. The corrections are as follows:

      Line 33: “effect between” has been changed to “effects for” (page 2, para. 1, line 6).

      Line 38: “shape” has been updated to “shapes” (page 2, para. 1, line 13).

      Line 49: “highest cost form?” has been revised to “the most common cost type” (page 3, para. 1, lines 7–8).

      Line 74: “tunning” has been corrected to “tuning” (page 4, para. 2, line 1).

      Line 90: omit following. Done (page 5, para. 1, line 2).

      Line 127: “interest on at each effort level” has been corrected to “liking for each effort level” (page 6, para. 2, line 15).

      Line 135: “press buttons” has been updated to “rapidly press a button” (the caption of Figure 1).

      Line 142: “motivated” has been revised to “motivating” (page 7).

      Line 219: should low be high? Yes, we have corrected this (the caption of Figure 4).

      Lines 265–266: The missing word “with” has been inserted (page 15, para. 1, line 2).

      Line 275: “confirmed by following” has been revised to “corroborated by a parabolic …” (page 15, para. 1, line 15).

      Line 292: an action can be effortful, a feeling cannot. We have changed the word “effortful” to “effort” (page 18, para. 2, line 3).

      Line 315: “when it comes into” has been revised to “when it came to” (page 19, para. 1, line 10).

      Lines 330–331: These two expressions have been revised to “our data establish …” and “the after-effect of prosocial effort” (page 20, para. 1, lines 2–3).

      Line 387: “interest on at each effort level” has been corrected to “interest at each effort level” (page 23, para. 2, line 5).

      Line 405: should quickly be often? We agree that “quickly” might imply latency or speed of a single press, whereas the task required maximizing the frequency of presses within the time window. To capture this meaning accurately, we have revised the phrase to “pressed a button as rapidly as possible” (implying repetition rate) in the revised manuscript (page 24, para. 2, lines 3–4).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Del Rosario et al characterized the extent and cell types of sibling chimerism in marmosets. To do so, they took advantage of the thousands of SNPs that are transcribed in single-nucleus RNA-seq (snRNA-seq) data to identify the sibling genotype of origin for all sequenced cells across 4 tissues (blood, liver, kidney, and brain) from many marmosets. They found that chimerism is prevalent and widespread across tissues in marmosets, which has previously been shown. However, their snRNA-seq approach allowed them to identify precisely which cells were of sibling origin, and which were not. In doing so they definitively show that sibling chimerism across tissues is limited to cells of myeloid and lymphoid lineages. The authors then focus on a large sample of microglia sequenced across many brain regions to quantify: (1) variation in chimerism across brain regions in the same individual, and (2) the relative importance of genetic vs. environmental context on microglia function/identity.

      (1) Much like across different tissues in the same individual, they found that the proportion of chimeric microglia varies across brain regions collected from the same individuals (as well as differing from the proportion of sibling cells found in the blood of the same animals), suggesting that cells from different genetic backgrounds may differ in their recruitment and/or proliferation across regions and local tissue contexts, or that this may be linked to stochastic bottleneck effects during brain development.

      (2) Their (admittedly smaller sample size) analyses of host-sibling gene expression showed that the local environment dominates genotype.

      All told, this thoughtful and thorough manuscript accomplishes two important goals. First, it all but closes a previously open question on the extent and cell origins of sibling chimerism. Second, it sets the stage for using this unique model system to examine, in a natural context, how genetic variation in microglia may impact brain development, function, and disease.

      The conclusions of this paper are well supported by the data, and the authors exert appropriate care when extrapolating their results that come from smaller samples. However, there are a few concerns that should be addressed.

      The "modest correlation" mentioned in lines 170-172 does not take into account the uncertainty in estimates of each chimeric cell proportion (although the plot shows those estimates nicely). This is particularly important for the macrophages, which are far less abundant. Perhaps a more appropriate way to model this would be in a binomial framework (with a random effect for individuals of origin). Here, you could model the sibling identity of each macrophage as a function of the proportion of sibling-origin microglia and then directly estimate the percent variance explained.

      We appreciate this good suggestion. We performed an analysis along these lines, and found that it supported the conclusion of a lack of strong relationship between microglial and macrophage chimerism. In particular (and as we now have added to the Methods):

      “To analyze Fig. 2D in a way that takes into account the uncertainty in the estimate of the chimeric cell proportion, we fit a binomial generalized linear mixed-effects model in R using the command glmer( y~(1|indiv) + chimerism_micro, family=binomial), where y is a vector (of length 1,333) containing the genomic identity of each macrophage (either host or twin), 1|indiv models a random effect for the identity of each animal, and chimerism_micro is the microglia chimerism of the animal’s brain. The fixed-effect P-value for chimerism_micro was 0.795, indicating that microglial chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction. The estimate for the intercept was -0.8115 and the estimate for chimerism_micro was 0.3106, which indicates that the probability that a cell is a macrophage given the microglia chimerism fraction was only 0.57 (plogis(-0.8115+0.3106)).”

      We have added the following in the main text:

      “We investigated further by performing a statistical test that takes into account the uncertainty in the estimates of the chimeric cell proportion using a binomial framework (Methods); in this analysis, microglia chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction (Methods). This suggests that in addition to the cell’s genome, other factors such as local host environment play a role in differential recruitment, proliferation or survival of the sibling cells. (We note that macrophages often transit the fluid-filled perivascular space, with a substantially different migration history and arrival dynamics than microglia.)”

      Given this new analysis, and our original observation that the Pearson correlation was only 0.31, we believe that other factors in addition to the cell’s genome play a role in differential recruitment or survival of sibling cells.
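
      The estimates quoted above are on the logit scale, and R's plogis() is the inverse-logit function that maps them back to a probability. As a minimal sketch in Python (the function names here are ours, not the authors', and the per-animal random effect is ignored), the conversion could look like this, where the probability refers to whichever outcome level the model codes as 1:

```python
import math


def inv_logit(x: float) -> float:
    """Inverse-logit (logistic) function; the counterpart of R's plogis()."""
    return 1.0 / (1.0 + math.exp(-x))


# Fixed-effect estimates quoted in the Methods text (logit scale).
INTERCEPT = -0.8115
SLOPE = 0.3106


def predicted_probability(chimerism_micro: float) -> float:
    """Model-implied probability for a microglia chimerism fraction in [0, 1].

    Assumption: only the fixed effects are used; the random intercept
    for each animal is set to zero.
    """
    return inv_logit(INTERCEPT + SLOPE * chimerism_micro)


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        p = predicted_probability(fraction)
        print(f"chimerism_micro={fraction:.2f} -> p={p:.3f}")
```

      Because the slope is small relative to the intercept, the predicted probability changes only modestly across the full range of chimerism_micro, which is consistent with the weak relationship described above.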

      A similar (albeit more complicated because of the number of regions being compared) approach could be applied to more rigorously quantify the variation in chimerism across brain regions (L198-215; Figure 4). This would also help to answer the question of whether specific brain regions are more "amenable" to microglia chimerism than others.

      We performed the analysis along these lines and added the following in the Methods section:

      “We used the same framework to further analyze Fig. 4. We included brain region as a covariate in the binomial framework: glmer( y~(1|indiv) + brain_reg + assay, family=binomial), where y is a vector (of length 48,439) containing the genomic identity of each microglia, and assay is either “Drop-seq” or “10X”. The brain regions assayed in Fig. 4 are the cortex, hippocampus, hypothalamus, striatum, thalamus, and basal forebrain. All these brain regions were statistically significant predictors for microglia chimerism fraction (all P-values<2x10<sup>-16</sup>), supporting the conclusion that chimerism varies across brain regions. We also re-analyzed Supplementary Fig. 4 (Fig. 4B in original manuscript) using the same framework and found that 18 out of 27 brain substructures were statistically significant predictors for microglia chimerism fraction.”

      We have added the following sentences in the main text:

      “We used the binomial generalized linear mixed-model framework and found that all brain regions were statistically significant predictors for microglia chimerism fraction, supporting the conclusion that chimerism varies across brain regions (Methods).

      Analysis of finer brain substructures showed a similar result (Supplementary Fig. 4; the binomial generalized linear mixed-model framework determined that 18 out of 27 brain substructures were statistically significant as predictors for microglia chimerism fraction, Methods).”

      While the sample size is small, it would be exciting to see if any microglia eQTL are driven by sibling chimerism across the marmosets.

      We like this idea, but our study is underpowered for eQTL analysis since we only have 14 data points in the correlation analysis (eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses).

      L290-292: The authors should propose ways in which they could test the two different explanations proposed in this paragraph. For instance, a simulation-based modeling approach could potentially differentiate more stochastic bottleneck effects from recruitment-like effects.

      While intriguing, the gene expression comparison (Figure 5) is extremely underpowered. It would be helpful to clarify this and note the statistical thresholds used for identifying DEGs (the black points in the figure).

      We agree; to help clarify this for readers, we added the following sentence at the end of the paragraph discussing Fig. 5A-C.

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, was modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due, e.g., to genetic differences between siblings. We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      And in the caption of Fig. 5A-C, we have included the statistical threshold for identifying DEGs:

      “In (A) to (C), each point represents a gene; its location on the plot represents the level of expression of that gene among microglia with two different genomes in the same animal. x- and y-axes: normalized gene expression levels (number of transcripts per 100,000 transcripts). FC: fold-change of gene expression, female/male for XIST. Fold-change and P-values were calculated using the binomTest method from the edgeR package (Robinson et al., 2010). Differentially expressed genes (black dots) were defined as: FDR Q-value<0.05 and fold-change>1.5 (in either direction) and the gene must be expressed in at least 10% of at least one of the two sets of microglia being compared.”
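
      The three criteria in this caption combine into a single filter. The sketch below is an illustration only: the function and parameter names are hypothetical, and the Q-value and fold-change are assumed to have been computed upstream (the authors used edgeR's binomTest in R).

```python
def is_deg(q_value: float,
           fold_change: float,
           frac_expressing_a: float,
           frac_expressing_b: float,
           q_max: float = 0.05,
           fc_min: float = 1.5,
           min_frac: float = 0.10) -> bool:
    """Return True if a gene meets all three thresholds stated in the caption.

    fold_change is a ratio (set A / set B), so "in either direction"
    means > fc_min or < 1 / fc_min.
    frac_expressing_a/b are the fractions of cells in each set of
    microglia that express the gene.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    passes_fdr = q_value < q_max
    passes_fold_change = fold_change > fc_min or fold_change < 1.0 / fc_min
    passes_expression = max(frac_expressing_a, frac_expressing_b) >= min_frac
    return passes_fdr and passes_fold_change and passes_expression
```

      Requiring expression in at least one of the two sets, rather than both, keeps genes that are essentially silent in one genome, such as XIST in male-derived cells.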

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports a novel and quite important study of chimerism among common marmosets. As the authors discuss, it has been known for years that marmosets display chimerism across a number of tissues. However, as the authors also recognize, the scope and details of this chimerism have been controversial. Some prior publications have suggested that the chimerism only involves cells derived from hematopoietic stem cells, while other publications have suggested more cell types can also be chimeric, including a wide range of cell types present in multiple organs. The present authors address this question and several other important issues by using snRNA-seq to track the expression of host and sibling-derived mRNAs across multiple tissues and cell types. The results are clear and provide strong evidence that all chimeric cells are derived from hematopoietic cell lineages.

      This work will have an impact on studies using marmosets to investigate various biological questions but will have the biggest impact on neuroscience and studies of cellular function within the brain. The demonstration that microglia and macrophages from different siblings from a single pregnancy, with different genomes expressing different transcriptomes, are commonly present within specific brain structures of a single individual opens a number of new opportunities to study microglia and macrophage function as well as interactions between microglia, macrophages, and other cell types.

      Strengths:

      The paper has a number of important strengths. This analysis employs the first unambiguous approach providing a clear answer to the question of whether sibling-derived chimeric cells arise only from hematopoietic lineages or from a wider array of embryonic sources. That is a long-standing open question and these snRNA-seq data seem to provide a clear answer, at least for the brain, liver, and kidney. In addition, the present authors investigate quantitative variation in chimeric cell proportions across several dimensions, comparing the proportion of chimeric cells across individual marmosets, across organs within an individual, and across brain regions within an individual. All these are significant questions, and the answers have important implications for multiple research areas. Marmosets are increasingly being used for a range of neuroscience studies, and a better understanding of the process that leads to the chimerism of microglia and macrophages in the marmoset brain is a valuable and timely contribution. But this work also has implications for other lines of study. Third, the snRNA-seq data will be made available through the Brain Initiative NeMO portal and the software used to quantify host vs. sibling cell proportions in different biosamples will be available through GitHub.

      Weaknesses:

      I find no major weaknesses, but several minor ones. First, the main text of the manuscript provides no information about the specific animals used in this study, other than sex. Some basic information about the sources of animals and their ages at the time of study would be useful within the main paper, even though more information will be available in the supplementary material.

      We moved the table containing animal information (age at time of study, sex, source, tissues analyzed) from Supplementary Table 1 into the main text as Table 1. We also added the following sentences starting on line 140:

      “Brain snRNA-seq was performed on 11 animals (6 adults, 3 neonates and 1 six-month-old; Table 1). All were unrelated except for CJ006 and CJ007, which are birth siblings, and CJ025 and CJ026, which are (non-birth) siblings. All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and a non-clinical contract research organization in Massachusetts. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single cell atlas of the marmoset brain. The three neonates had died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      Second, it is not clear why only 14 pairs of animals were used for estimating the correlation of chimerism levels in microglia and macrophages. Is this lower than the total number of pairwise comparisons possible in order to avoid using non-independent samples? Some explanation would be helpful.

      Only birth siblings (twins and triplets) can be meaningfully included in this analysis. The 14 pairs of animals we used to estimate the correlation of chimerism levels in microglia and macrophages included all pairs that we could use for this analysis: eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses.
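
      The count of 14 follows from each host contributing one comparison per sibling genome detected in its brain. A minimal sketch of that arithmetic (the variable and function names are ours):

```python
def count_host_sibling_pairs(siblings_per_host: list[int]) -> int:
    """Each host animal contributes one pairwise comparison per sibling genome."""
    return sum(siblings_per_host)


# Eight hosts with microglia from one sibling, three hosts with microglia from two.
siblings_per_host = [1] * 8 + [2] * 3
n_pairs = count_host_sibling_pairs(siblings_per_host)  # 8 + (2 * 3) = 14
```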

      Finally, I think more analysis of the consistency and variability of gene expression in microglia across different regions of the brain would be valuable. Are there genetic pathways expressed similarly in host and sibling microglia, regardless of region of the brain? Are there pathways that are consistently expressed differently in host vs sibling microglia regardless of brain region?

      For brain-region differences in microglial gene expression, we are under-powered and would only be scratching the surface of a question (interesting but beyond the focus and scope of this paper) that needs deeper experimental sampling.

      For the questions about sibling-sibling differences (regardless of which sibling is host) and recurring host-sibling differences, we can do a stronger analysis, because these analyses have similar power to each other. We describe this analysis in the revised manuscript as follows:

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the number of microglia in most animals, especially guest microglia, was modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due, e.g., to genetic differences between siblings.”

      We also, as suggested, tried to get beyond single-gene analyses to expression of programs/pathways, by performing latent factor analysis on the single-cell gene expression measurements. 

      “Following the method described in (Ling et al., 2024), we performed latent factor analysis using probabilistic estimation of expression residuals (PEER; Stegle et al., 2010) on the gene-by-donor expression matrix of microglia. We started by creating a gene-by-cell matrix of microglia gene expression from all animals, and we normalized the matrix using SCTransform version 2 (Choudhary and Satija, 2022) with 3000 variable features. We obtained the Pearson residuals from SCT normalization and summed the residuals across cells with the same genome to obtain a gene-by-donor matrix of expression measurements of microglia. We used this matrix as input to PEER and ran the tool with the number of factors set from 9 to 12. For each gene-expression latent factor, to evaluate whether host/sibling identity had a consistent effect on expression levels, we performed a linear regression on host/sibling identity using glm(peer_factor_k ~ host_or_twin). For all factors, the P-values for the effect of host_or_twin were non-significant (greater than 0.1), indicating that no PEER factor was associated with host-vs-twin identity. Thus, we found no large-scale gene expression program that was consistently expressed differently between hosts and twins.”

      We have added the text above to the Methods section, and we added the following at the end of the section on Gene-expression comparisons of host- to sibling-derived microglia (lines 264-267):

      “We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      Gene-expression pathways/factors did (within some animals) show host-twin differences in expression levels, but without a consistent host-twin direction of effect that was shared across the many host-twin comparisons. In particular, we used the PEER analysis that we performed above and calculated the host-sibling expression level difference for each latent factor. Many factors differed in expression in individual cases, though none did so in all cases nor in a consistent-sign manner:

      Author response image 1.

      Difference between host and sibling expression of gene-expression latent factors for each of the 12 factors computed (using PEER) from the single-cell dataset. For a given factor, the factor expression value of the sibling-genome cells is subtracted from that of the host-genome cells and the difference is divided by the maximum of the absolute value of all elements in that factor.
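
      The normalization described in this caption can be written out explicitly. The following is a sketch under our own assumptions: one list of factor values per latent factor (one value per genome), with hypothetical index arguments identifying the host and sibling entries.

```python
def normalized_host_sibling_difference(factor_values: list[float],
                                       host_index: int,
                                       sibling_index: int) -> float:
    """(host - sibling) divided by the largest absolute value in the factor.

    factor_values holds one latent factor's expression value for every
    genome in the dataset; scaling by the factor-wide maximum puts
    different factors on a comparable scale.
    """
    scale = max(abs(value) for value in factor_values)
    if scale == 0:
        raise ValueError("factor is identically zero; difference is undefined")
    return (factor_values[host_index] - factor_values[sibling_index]) / scale


# Hypothetical values for illustration only, not taken from the study.
example = normalized_host_sibling_difference([0.5, -2.0, 1.0], host_index=0, sibling_index=2)
```

      With this scaling, a positive value means higher factor expression in host-genome cells and a negative value means higher expression in sibling-genome cells.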

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the introduction (line 62), the authors mention that chimerism might have shaped behavior in marmosets (and perhaps been selected for). It would be helpful to see this revisited in the discussion. Is it possible that additional genetic variation in immune cells (resident and circulating) provides adaptive benefits and/or disease resistance? In the case of microglia, could the proportion of sibling cells be related (either positively or negatively) to local/regional pathology?

      We liked this suggestion and have added the following in the Discussion:

      “Chimerism could also enable interesting future analyses of whether there are adaptive benefits of chimerism in marmoset immune cells, among which chimerism could in principle allow presentation of a wider variety of antigens for adaptive immunity. In a recent outbreak of yellow fever in Brazil in 2016-2018, marmosets were found to be less susceptible than other primates that lack immune system chimerism, including the howler monkeys (Alouatta), robust capuchins (Sapajus), and titi monkeys (Callicebus) (de Azevedo Fernandes et al., 2021). In studying future outbreaks in marmosets, one could use single-cell RNA-seq and the methods described here to study how genetically distinct immune cells (in the same animal) have differentially migrated to affected tissues and/or assumed "activated" immune cell states. Recent innovations in spatial transcriptomics with sequencing readouts (that detect SNP alleles) may also make it possible to identify any differential recruitment of genetically distinct immune cells to focal infection sites.”

      Minor comments:

      L300: delete "temporal."

      We have revised the text accordingly.

      L305: "more-restricted" should not be hyphenated.

      We have revised the text accordingly.

      L309: "from the non-cell" - delete "the."

      We have revised the text accordingly.

      L367: Louvain, not Louvaine.

      We have revised the text accordingly.

      Figure 2B can be removed - it does not add much information and takes up a lot of space.

      We have moved Figure 2B to panel J of Supplementary Fig. 1 (it is now displayed together with all other animals).

      The same can be said for Figure 4B, which is too tiny. There might be more effective ways to show this variation across animals.

      We have moved Figure 4B to Supplementary Fig. 4 and we have increased the font sizes to make the text in the figures more readable.

      Reviewer #2 (Recommendations for the authors):

      I would suggest providing some basic information about the sources of study animals within the main text. At a minimum, it would be useful to state which colonies are represented in the data, and if there is anything significant about the individual animal histories (e.g. prior exposure to surgical intervention or infectious disease). I believe this basic information should be in the main text, despite the inclusion of a broader range of information in the supplements.

      We appreciate this suggestion and revised lines 143 to 149 of the main text as follows:

      “All animals come from the three main marmoset colonies that comprise the animals in our facilities: New England Primate Research Center (NEPRC), CLEA Japan, and from a non-clinical contract research organization. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single-cell atlas of the marmoset brain (Krienen et al., 2020; Krienen et al., 2023). The three neonates died shortly after birth due to unknown reasons and were subsequently selected for snRNA-seq analysis.”

      I would include the species name (Callithrix jacchus) in line 48.

      On lines 47-48, we now indicate the name of the genus: “Chimerism is common, however, in the Callitrichidae family that consists of the marmosets (Callithrix) and their close relatives the tamarins (Saguinus)...”

      Then on line 65, we now indicate the species name: “Here, we analyze chimerism in the common marmoset (Callithrix jacchus) brain, liver, kidney and blood,...”

      The word "organisms" in line 59 should be "organs."

      We have modified the text accordingly.

      Lines 100-101: I would suggest this would be clearer to readers if it read: "The relative likelihoods of the original source of each cell could be strongly...".

      We have modified the text accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      Weaknesses:

      The neural data analysis is currently limited to auditory evoked potentials aligned with beat timing. A more comprehensive approach is needed to robustly support the proposed developmental trajectory of neural responses to music.

      We thank the reviewer for this comment and would like to clarify that there has been a misunderstanding: our EEG analyses were time-locked to actual tone onsets, not to expected beat positions. For both music and shuffled conditions, ERPs were computed by epoching around all real auditory events present in each stimulus. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli. We have now clarified this further in the revised manuscript (p. 9).

      Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results that make a lot of sense to this reviewer. The additions of EEG recordings in response to music presentations at 3 different infant ages are interesting, and the manipulation of the music stimuli into shuffled, high, and low pitch to capture differences in brain response and spontaneous movements is good. I really enjoyed reading this work and the well-written manuscript.

      Weaknesses:

      I only have two comments. The first is a change to the title. Maybe the title should refer to the first "postnatal" year, rather than the first year of life. There are controversies about when life really starts; it could be in the womb, so using postnatal to refer to the period after birth resolves that debate.

      Thank you very much for your thoughtful suggestion regarding the title. To ensure clarity and to unambiguously indicate that our study focuses on the period after birth, we agree that specifying "first postnatal year" in the title is appropriate. We have revised the title accordingly.

      The other comment relates to the 10 Principal Movements (PMs) identified. I was wondering about the rationale for identifying these different PMs and to what extent many PMs entered in the analyses may hinder more general pattern differences. Infants' spontaneous movements are very variable and poorly differentiated in early development. Maybe, instead of starting with 10 distinct PMs, a first analysis could be run using the combined Quantity of Movements (QoM) without PM distinctions to capture an overall motor response to music. Maybe only 2 PMs could be entered in the analysis, for the arms and for the legs, regardless of the patterns generated. Maybe the authors have done such an analysis already, but describing an overall motor response, before going into specific patterns of motor activation, could be useful to describe the level of motor response. Again, infants provide extremely variable patterns of response, and such variability may potentially hinder an overall effect if the QoM were treated as a cumulated measure rather than one with differentiated patterns.

      We agree that due to the high variability and limited differentiation of infant motor responses at this age, it is important to consider an overall measure of movement in addition to specific PMs. To address exactly this, we had included an analysis in which we combined all 10 PMs into a single global QoM metric. This ‘All PMs’ measure reflects the overall motor response to the different auditory stimuli. For clarity, this result is presented in Figure 5, where we show the denoised global QoM signal and highlight the observed Condition × Age interaction (which averaged QoM for all PMs and is therefore equivalent to QoM without PM distinction). We now emphasize this analysis more clearly in the Results section (p. 16).

      Reviewer #3 (Public review):

      Summary:

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6, and 12 months of age and adults showed enhanced auditory responses to music compared with shuffled music. 6-month-olds also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli, whereas the other groups did not. In addition, whole-body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than to shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths:

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. All steps are clearly detailed. The manuscript is very clear, well-written, and pleasant to read. Figures are well-designed and informative.

      Weaknesses:

      (1) Differences in neural responses to high-pitch vs low-pitch stimuli between 6-month-olds and other infants are difficult to interpret.

      We agree with the reviewer that the differences in neural responses to high-pitch versus low-pitch stimuli between 6-month-olds and other infants are difficult to interpret. We have offered several possible explanations for these findings, including developmental changes in auditory plasticity, social interaction effects, maturation of the auditory system, and arousal or exposure differences. If the reviewer has additional perspectives or alternative explanations, we would be very pleased to incorporate them into the revised manuscript.

      (2) Making some links between the neural and movement responses that are described in this manuscript could be expected, given the study goal. Although kinematic analyses suggested that movement responses are not phase-locked to the music stimuli, analyses of Granger causality between motion velocity and neural responses could be relevant.

      We appreciate the suggestion that exploring links between neural and movement responses would be valuable, especially given the study's goals. We were initially cautious about interpreting potential Granger-causal relations between neural and motor activity, as temporal scale differences between the two measures can easily bias directionality estimates. Neural responses typically occur on the scale of milliseconds, whereas movement unfolds over seconds. As a result, an apparent directional relation might emerge simply due to these intrinsic timescale differences rather than reflecting genuine causal influence.

      Nevertheless, we agree that this relationship warrants further investigation, and we have added exploratory analyses to the supplements (p. 9) examining whether ERP amplitudes correlated with movement measures. To this end, we computed correlations between neural and movement responses using participant-averaged data (not single trials). For neural measures, we extracted mean ERP amplitudes in the post-tone-onset time window encompassing the P1 component, as derived from the cluster-based analyses. For movement measures, we used (1) total movement quantity (mean velocity across the entire trial) and (2) Granger causality F-values reflecting music-to-movement coupling strength. These analyses included comparisons between the music and shuffled music conditions, as well as between the high- and low-pitch conditions. We ran two linear mixed-effects models, with ERP amplitudes as response variables and either QoM or Granger causality F-values as fixed effects; infants were modelled as random intercepts. Across all age groups, we found no significant correlations between ERP amplitudes and movement quantity, either irrespective of condition (p>.124) or when comparing music vs shuffled music (p>.111) or high vs low pitch (p>.071). Likewise, we found no significant correlations between ERP amplitudes and Granger causality F-values, either irrespective of condition (p>.164) or when comparing music vs shuffled music (p>.494) or high vs low pitch (p>.175). The absence of robust correlations suggests that neural sensitivity to musical structure (as indexed by ERPs) and motor responsiveness to music (as indexed by movement quantity or coupling strength) develop somewhat independently during the first year of life. This dissociation aligns with broader developmental theories proposing that perceptual sensitivity often precedes and enables later motor coordination, rather than the two developing together.

      (3) The study considers groups of infants at different ages, but infants within each group might be at different stages of motor development. Was this assessed behaviorally? Would it be possible to explore or take into account this possible inter-individual variability?

      We agree this is important. Infants in each age group fell within a fairly narrow age range (3 months: M=113.04 days, SD=5.68 days, range=98-120 days; 6 months: M=195.88 days, SD=9.46 days, range=182-211 days; 12-13 months: M=380.44 days, SD=14.93 days, range=361-413 days), as detailed in the sample description on p. 37. In addition, we asked parents to report on infants' major motor milestones, specifically their ability to sit and/or walk. At 6 months, 25% of infants were able to sit (N = 20), and at 12 months, 50% of infants were able to walk (N = 18). Given the relatively small group sizes for these milestones, we are concerned that conducting detailed analyses could yield unstable or misleading results that may not generalize beyond our sample. Therefore, we chose to focus on broader analyses that are more robust given our current dataset. We fully support your suggestion that future studies with larger samples and more comprehensive motor assessments will better clarify these developmental trajectories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the analysis and findings on auditory-evoked spontaneous movement are highly interesting, the results from the neural data raise questions about the genuine role of music in the observed evoked and induced responses.

      General comments on the findings related to neural data

      (1) The main neural finding is a larger response in the Music condition compared to the Shuffled Music condition. To address their hypothesis, the authors computed the AEP to tones at the beat position and compared responses between the Music and Shuffled Music conditions, aligning the onset to the expected beat position. However, given that inter-onset intervals were permuted in the Shuffled condition, an AEP time-locked to the expected beat position is not meaningful, as no tone is expected at that time. Therefore, it is expected to have a relatively flat AEP in response to the shuffled condition. Furthermore, given the reduced regularity in the Shuffled condition, the observed difference in ASSR at the beat frequency is expected. Similar results could be obtained using an isochronous sequence of pure tones and a shuffled version of the same sequence. Therefore, these two analyses do not strongly support the conclusion of infants' enhanced neural responses to music.

      The authors could consider comparing AEPs by aligning onsets in the Shuffled condition to the actual tone positions, potentially focusing only on tones with sufficiently long preceding and following IOIs to avoid confounds from short intervals. The two conditions could then be compared with correction for the number of tones. Potential differences in this case could have suggested an impact beyond the auditory evoked responses.

      We agree that ASSR analyses at the beat frequency alone are not sufficient to demonstrate enhanced neural responses to music. However, we would like to clarify that, for the AEP analyses, the EEG data were epoched to all actual tone onsets rather than to the expected beat positions, so these analyses complement the ASSR analysis. Thus, for the shuffled music condition, the EEG was aligned with the real tone onsets present in that sequence, not with hypothetical beat positions derived from a regular rhythm. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli.

      We further clarify this in the Results section on p. 9:

      “Figure 2 shows the average ERPs to the bassline notes in the auditory stimuli, with EEG data time-locked to actual tone onsets (see Methods for details).”

      Finally, following the reviewer’s suggestion, we carried out three control analyses: 1) including only epochs corresponding to bassline tones whose prior inter-onset interval (IOI) exceeded the median IOI duration, 2) including only epochs corresponding to bassline tones whose subsequent IOI exceeded the median IOI duration, and 3) including only epochs corresponding to both melody and bassline tones whose prior and subsequent IOI exceeded the median IOI duration. These analyses yielded event-related potentials in the shuffled music condition that were highly similar to those obtained when all epochs were included (see Figure S1). Therefore, the greater neural response to music compared with shuffled music likely reflects an effect of predictability in the musical condition or, more generally, infants’ disengagement with the shuffled stimuli.
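      For illustration only, the epoch-selection rule behind these three control analyses can be sketched as follows. This is not the authors' pipeline: the function name `select_long_ioi_onsets`, its flags, and the millisecond onset times are hypothetical, and the sketch assumes that "exceeded the median IOI" means strictly greater than the median computed over all IOIs of the sequence.

```python
from statistics import median


def select_long_ioi_onsets(onsets, use_prior=True, use_next=False):
    """Keep tone onsets whose neighbouring inter-onset intervals (IOIs)
    are strictly longer than the median IOI of the sequence.

    onsets    : sorted onset times (any unit, e.g. milliseconds)
    use_prior : require the IOI preceding the tone to exceed the median
    use_next  : require the IOI following the tone to exceed the median
    """
    # IOI i is the gap between onset i and onset i + 1
    iois = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    threshold = median(iois)

    kept = []
    for i, onset in enumerate(onsets):
        prior = iois[i - 1] if i > 0 else None          # first tone has no prior IOI
        following = iois[i] if i < len(iois) else None  # last tone has no next IOI
        if use_prior and (prior is None or prior <= threshold):
            continue
        if use_next and (following is None or following <= threshold):
            continue
        kept.append(onset)
    return kept


# Hypothetical onsets in ms; IOIs are 200, 700, 100, 800, 200 (median 200)
onsets_ms = [0, 200, 900, 1000, 1800, 2000]
prior_only = select_long_ioi_onsets(onsets_ms, use_prior=True, use_next=False)
next_only = select_long_ioi_onsets(onsets_ms, use_prior=False, use_next=True)
both = select_long_ioi_onsets(onsets_ms, use_prior=True, use_next=True)
```

      Tones at the edges of the sequence are dropped whenever the required neighbouring interval does not exist, which is one possible convention among several.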

      It would also be helpful to see whether the authors explored other approaches for evaluating neural responses across conditions, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), and whether these yielded comparable results.

      Thank you for this question. We have not explored these approaches, but we agree that alternative methods for evaluating neural responses, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), could offer complementary insights. Given the scope and focus of the present work, and the already extensive set of neural and behavioral measures reported, we chose to prioritize analyses most directly relevant to our initial research questions. Incorporating further methods might risk complicating the narrative and obscuring the key findings. We appreciate the value of these additional methods and consider them promising avenues for future investigations.

      (2) Another important finding concerns the difference in AEPs between the High Pitch and Low Pitch conditions in 6-month-old infants, a pattern not observed in the younger (3-month) or older (12-month and adult) groups. The authors interpret this as heightened sensitivity to high-pitch sounds, typical of infant-directed speech. However, the absence of this effect at 12 months raises questions. It would be helpful to consider whether this pattern may be influenced by data quality differences across age groups. Additionally, the authors could discuss this observation in relation to studies showing stronger neural tracking of rhythms in infants, particularly for low-frequency sounds (e.g., Lenc et al., Developmental Science, 2022).

      This is an interesting consideration that we investigated further. Regarding data quality differences, we considered different measures and now report these in the methods section (p. 30) and supplements (p. 1).

      “We conducted two analyses to compare the EEG data quality across age groups. First, we compared the number of trials that were included in the final analysis per age group. The trial number did not differ significantly across age groups (p > .361). Second, we calculated the SNR by dividing the EEG power at the frequency of interest (i.e., 2.25 Hz, matching the musical beat) by the background noise in surrounding bins (3rd to 5th bin, see ASSR methodology for further details; c.f., Christodoulou et al., 2018; Cirelli et al., 2014). This division yields a signal-to-noise ratio that can be averaged across conditions and compared across age groups to assess variations in signal quality (especially when focusing on the pitch conditions with the same beat frequency). Here, we find that all three age groups show considerable SNR above 1 (3m: M = 2.569, SD = 1.104; 6m: M = 2.743, SD = 1.001; 12m: M = 1.907, SD = 0.749), with no statistically significant differences (three t-tests, FDR-corrected, p > .134). Importantly, our key comparison of High vs. Low Pitch was performed within each age group, thus controlling for any overall differences in signal quality across groups. Together, these two analyses indicate that signal quality was comparable across age groups.”
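      As a rough sketch of the SNR computation described in this passage (not the authors' code), the ratio can be formed from the power in the frequency bin of interest and the mean power in neighbouring bins. The function names, the sampling rate, the signal duration, and the reading of "3rd to 5th bin" as the third to fifth bins on each side of the target bin are all assumptions made for this example.

```python
import cmath
import math


def bin_power(signal, k):
    """Power of DFT bin k, evaluated directly (adequate for a handful of bins)."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
    return abs(coeff) ** 2 / n ** 2


def snr_at_frequency(signal, sfreq, target_hz, near=3, far=5):
    """Power at target_hz divided by the mean power of the surrounding bins.

    Assumption: the noise estimate uses bins `near`..`far` away from the
    target bin on both sides, skipping the immediately adjacent bins.
    """
    n = len(signal)
    k = round(target_hz * n / sfreq)  # index of the bin closest to target_hz
    if k - far < 1 or k + far > n // 2:
        raise ValueError("neighbouring bins fall outside the usable spectrum")
    offsets = range(near, far + 1)
    neighbours = [k - o for o in offsets] + [k + o for o in offsets]
    noise = sum(bin_power(signal, j) for j in neighbours) / len(neighbours)
    return bin_power(signal, k) / noise


# Hypothetical example: 8 s at 100 Hz gives 0.125 Hz resolution, so 2.25 Hz is bin 18
sfreq, n_samples = 100, 800
clean_tone = [math.sin(2 * math.pi * 2.25 * t / sfreq) for t in range(n_samples)]
```

      A ratio above 1 indicates that power at the beat frequency stands out from the local noise floor; the choice of neighbouring bins affects the absolute value, so comparisons are most meaningful when the same settings are used for every group.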

      Overall, these control analyses seem to support the observed high-pitch sensitivity in the neural response of 6-month-olds specifically, in line with previous research investigating this age range (Trainor & Zacharias, 1998; Fernald & Kuhl, 1987). Moreover, there may be particular changes towards the end of the first year that mark a widening of infants’ attention towards others (beyond their primary caregivers) and objects in their environment (Cooper et al., 1997; Newman & Hussain, 2006), as well as a decrease in exposure to face-to-face interactions with their primary caregivers (Jayaraman et al., 2015). Taken together, research shows that infants' preference for infant-directed speech decreases significantly between 4.5 and 9 months, coinciding with developmental changes in attentional systems and social interaction patterns. This might explain the absence of high-pitch sensitivity in 12-month-olds. However, further research is needed to determine whether, and in which contexts, high-pitch sensitivity to music changes throughout infancy.

      We also edited the discussion in order to compare our results to those of Lenc et al., 2023, p. 23: “It should also be noted that our musical stimuli comprised polyphonic (two-voice) music, carrying sound frequencies falling within the typical range of infant-directed song (~200-400 Hz, Cirelli et al., 2020; Nguyen, Reisner, et al., 2023b; Trainor & Zacharias, 1998). As such, our results might specifically speak for infants’ ability to separate (and prioritize among) simultaneous communicative auditory streams (Marie & Trainor, 2013; Trainor, 2015). Indeed, other studies presenting one-voice pure tone sequences (single isochronous and isotonous tones) with high vs. low pitch - notably at frequencies outside our range (130 vs. 1237 Hz) - have reported stronger neural responses to relatively low frequencies (Lenc et al., 2023). Together, these contrasting observations suggest that pitch prioritization changes not only throughout development but also depends on the polyphonic complexity and spectral characteristics of the perceived stimuli. Further research might investigate this interesting issue further.”

      (3) It would also be helpful if the authors provided more detailed information on the stimuli, including both temporal/rhythmic and spectral content, for the original music, high-pitch and low-pitch variations, and shuffled versions.

      Absolutely. We agree that this is important to report. We have added a table to the Results (Table 1) and a supplementary table (Table S1) reporting the M, SD, and range of the envelope to further describe the temporal and spectral features of the stimuli.

      General comments on the findings related to body kinematics

      (4) Quantification of movement based on the PMs did not lead to any differences between the High Pitch and Low Pitch conditions. However, Granger causality showed high prediction strength for the High Pitch condition. In the discussion, the authors proposed that high-pitch music might have led to higher arousal. If this were the case, one might expect to observe increased movement in the High Pitch condition relative to the Low Pitch condition in the PM analyses. I propose that the authors revise the discussion to address the misalignment between different findings.

      We thank the reviewer for highlighting this important point and welcome the suggestion to clarify the relationship between movement quantification based on principal movements (PMs) and the Granger causality results. We agree that the apparent discrepancy between these measures merits further clarification. The discrepancy suggests that Granger causality may capture subtler temporal coordination between movements and the music, rather than gross movement magnitude. We have incorporated this reasoning into the revised discussion paragraph (pp. 23-24), which now reads:

      “If increased arousal were to result in greater overall movement, we would expect higher movement levels in the high pitch condition; however, this was not observed. QoM analyses based on the PMs did not reveal significant differences between the high pitch and low pitch conditions. This discrepancy may arise because Granger causality captures subtler temporal coordination between movement and music rather than gross movement quantity. Thus, high-pitch music may modulate the timing and coordination of motor responses without necessarily increasing the overall amount of movement. In line with prior work (e.g., Bigand et al., 2024), this interpretation emphasizes that musical coordination often involves changes in coupling strength rather than movement quantity per se.”

      (5) The authors report a lack of periodicity and phase-locked movement in infants. Considering the developmental stage, I assume that spontaneous movements to music have emerged over short periods during each exposition period. Probably to further investigate movement periodicity, which has been previously suggested, the authors can first automatically extract periods of periodic movement and further evaluate the tempo/frequency and synchronization with the stimulus during these specific periods.

      We thank the reviewer for this thoughtful suggestion. We conducted similar analyses prior to submission, using methods comparable to previous studies (Fujii et al., 2014). These analyses did not yield additional insights beyond those already presented in the manuscript, so we opted not to include them initially. For completeness, we briefly mention these results on p. 19:

      “Robustness analyses based on thresholding of variation in the time series to identify movement burst epochs (similar to Fujii et al., 2014) yielded consistent results. No significant movement-to-music synchronization was found across age groups (all ps > .563).”

      It is important to clarify that while movement periodicity in infants listening to music has been previously suggested, the evidence for actual synchronization to musical beats remains limited and has been frequently misinterpreted in the literature. The seminal study by Zentner and Eerola (2010) is often cited as evidence for infant rhythmic entrainment, but their findings actually demonstrated tempo flexibility rather than synchronization, i.e., infants moved faster when the music was faster. Similarly, Fujii et al. (2014) found that while individual infants showed some movement-to-music coordination, this occurred in only 2 out of 11 tested infants (18%), and the authors emphasized that "movement-to-music synchronization is rare in infants and observed at an individual level".

      (6) A last general comment is that the authors try to explain the findings of the current study, providing hypotheses, for instance, on the origin of differences in the neural response to high and low pitch only at 6 months. It would be helpful if the authors also consider the misalignment of results with previous findings.

      We thank the reviewer for this comment and acknowledge the importance of placing our findings in the context of prior research on infant pitch perception, including some apparent inconsistencies such as those noted for Lenc et al. (2023), which we have addressed in our response to comment 2. We agree that results inevitably vary across studies due to differences in methods, stimuli, and participant samples—all factors that contribute to some variability in developmental trajectories observed in the literature.

      Importantly, our observation of a transient difference in neural responses to high versus low pitch emerging at 6 months aligns with existing evidence indicating significant neural reorganization occurring around this age (Carr et al., 2022) and continuing toward 12 months (Kuhl et al., 2014). This may reflect a sensitive developmental window during which infants show heightened sensitivity to prosodic features important for early social and communicative interactions. After this window, attentional and auditory processing priorities shift, which could explain the subsequent decline in pitch sensitivity.

      We emphasize that these interpretations are preliminary, and further systematic investigations—preferably longitudinal studies incorporating diverse pitch ranges and multimodal attentional and neural measures—are needed to delineate the developmental course of pitch sensitivity comprehensively.

      Reviewer #2 (Recommendations for the authors):

      Thank you for the opportunity to read this interesting work.

      Thank you for the constructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) I would suggest replacing "first year of life" with "first post-natal year".

      Thank you for the suggestion. In line with your and Reviewer #2’s comments, we have revised the title to “first postnatal year”.

      (2) Precising the music paradigm and the stimuli nature/timing would be useful at the beginning of the Results section.

      We agree and have added two tables (Table 1 and Table S1 for continued information on the envelope) for further information about the paradigm and stimuli to the beginning of the results section (p.8).

      In addition, the stimuli are also shared on a repository: https://doi.org/10.48557/DCSCFO.

      (3) Since the infants moved during the experiment, EEG data might show movement artefacts. Was the approach used to correct these artefacts satisfactory, even in 12-month-olds who moved more?

      We appreciate the reviewer’s important question regarding artifact correction in infant EEG data, especially given increased movement in older infants. We recognize that movement-related artifacts are an inherent challenge in EEG recordings with infants, and complete elimination of such artifacts is technically difficult (if not impossible). However, several points support the robustness of our ERP findings despite spontaneous movement:

      First, we used a two‐stage pipeline to maximize artifact removal without bias. In the first stage, Artifact Subspace Reconstruction (ASR) repaired brief, high‐variance artifacts by reconstructing contaminated channels from clean data. In the second stage, Independent Component Analysis (ICA, as implemented in ICLabel) decomposed the ASR‐cleaned EEG into independent components, allowing us to remove residual non‐neural artifacts (e.g., eye movements) based on their spatial and spectral features. Both ASR and ICA operate automatically and agnostically to condition or age group, without subjective decisions, ensuring unbiased cleaning and reliable ERP comparisons.

      Second, as noted in our response to Reviewer #1's comment (2), we compared EEG data quality across age groups in two ways. The number of included trials did not differ significantly across age groups (p > .361), and the SNR, calculated by dividing the EEG power at the frequency of interest by the background noise in surrounding bins, showed no statistically significant differences across age groups (three t-tests, FDR-corrected, p > .134). Together, these two analyses indicate that signal quality was comparable across age groups.

      Finally, infant movements during the session were sporadic and, most importantly, not time-locked to tone onsets (see Fig S2). Because artifact rejection (namely, Artifact Subspace Reconstruction and Independent Component Analysis) discarded only those epochs containing large, transient artifacts irrespective of condition, residual movement-related noise would not systematically inflate ERPs.

      (4) The timing of the P200 response peak could be specified in adults as for infants.

      The timing of the P200 in adults is mentioned on page 9: “[…] a second positivity peaking at 158 ms post-stimulus (so-called “P200”, here reaching an amplitude of 0.85 µV).” The timing of the infant P2 is specified on p 10 and 11: “The P2 ranged between 307 and 325 ms post-stimulus and peaked at 316 ms, reaching an average amplitude of 1.026 µV.”

      (5) In infants, the evocation of "peaking at 212ms" is not completely clear: does this timing correspond to the P1 peak at 3 months of age or to the time when the response to music was enhanced compared to shuffled music?

      Thank you for highlighting the need for greater clarity regarding the timing of the P1 peak and its relation to the observed enhancement. We have revised the text to explicitly state that 212 ms corresponds to the P1 peak in 3-month-old infants within the window where the response to music was significantly enhanced compared to shuffled music.

      p.9: “Importantly, and in line with the adults’ data, all infant groups exhibited enhanced P1 amplitudes in response to music compared to shuffled music. Cluster-based permutation (nPerm=1000) testing revealed that 3-month-old infants’ P1 amplitude was enhanced between 177 and 305 ms post-stimulus (cluster-t=1111.90, p=.002). Within this window, the P1 peaked at 212 ms and reached an amplitude of 1.8 µV.”

      (6) It might be useful to put the results of this study into perspective with other studies of infant motor development (e.g., Hinnekens et al, eLife 2023).

      Thank you for pointing out this study. We have integrated the Hinnekens et al. (2023) findings into our discussion of infant motor development toward dance-like behaviors. p.22 “Taking a broader perspective on infants’ motor development, our findings align with research on locomotion across the first 14 months of life, which shows that as the number of motor primitives increases, their intrinsic variability decreases (Hinnekens et al., 2023). Viewed together, these patterns point toward a gradual refinement of motor control: the human motor system first develops the capacity to control individual muscles, and gradually to integrate them into motor synergies that support complex, coordinated behaviours, such as locomotion, musical synchronization, and dance.”

      (7) Regarding the progressive maturation of the auditory/linguistic pathways during infancy, the authors might also refer to (Dubois et al, Cerebral Cortex 2016).

      Thank you for the suggestion. We added the study to the discussion on page 22: “This developmental trajectory aligns with neuroimaging evidence showing that while the ventral linguistic pathway (connecting temporal and frontal regions via the extreme capsule) is well-established at birth, the dorsal pathway—particularly the arcuate fasciculus connecting temporal regions to inferior frontal areas—continues maturing throughout the first postnatal months, with different maturational timelines for dorsal versus ventral connections (Dubois et al., 2016).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue - the fragility of meta-analytic findings - by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong, though some clarifications would further enhance interpretability.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Weaknesses:

      (1) The rationale and mathematical details behind the proposed EOI and ROAR methods are insufficiently explained. Readers are asked to rely on external sources (Grimes, 2022; 2024b) without adequate exposition here. At a minimum, the definitions, intuition, and key formulas should be summarized in the manuscript to ensure comprehensibility.

      (2) EOIMETA is described as being applicable when heterogeneity is low, but guidance is missing on how to interpret results when heterogeneity is high (e.g., large I²). Clarification in the Results/Discussion is needed, and ideally, a simulation or illustrative example could be added.

      (3) The manuscript would benefit from side-by-side comparisons between the traditional FI at the trial level and EOIMETA at the meta-analytic level. This would contextualize the proposed approach and underscore the added value of EOIMETA.

      (4) Scope of FI: The statement that FI applies only to binary outcomes is inaccurate. While originally developed for dichotomous endpoints, extensions exist (e.g., Continuous Fragility Index, CFI). The manuscript should clarify that EOIMETA focuses on binary outcomes, but FI, as a concept, has been generalized.

      Reviewer #2 (Public review):

      Summary:

      The study expands existing analytical tools originally developed for randomized controlled trials with dichotomous outcomes to assess the potential impact of missing data, adapting them for meta-analytical contexts. These tools evaluate how missing data may influence meta-analyses where p-value distributions cluster around significance thresholds, often leading to conflicting meta-analyses addressing the same research question. The approach quantifies the number of recodings (adding events to the experimental group and/or removing events from the control group) required for a meta-analysis to lose or gain statistical significance. The author developed an R package to perform fragility and redaction analyses and to compare these methods with a previously established approach by Atal et al. (2019), also integrated into the package. Overall, the study provides valuable insights by applying existing analytical tools from randomized controlled trials to meta-analytical contexts.
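      To make the notion of "recodings required to lose significance" concrete, the sketch below computes a classical trial-level fragility index for a single 2x2 table. It is offered only as background intuition: it is not the EOI, ROAR, or EOIMETA method, nor the algorithm of Atal et al. (2019), and the function names, the use of a two-sided Fisher exact test, and the rule of recoding non-events to events in the arm with the lower event rate are assumptions of this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of non-event -> event recodings, applied to the arm with the
    lower event rate, needed for the p-value to reach alpha or above.
    Returns None if the comparison is not significant to begin with."""

    def p_value(ea, eb):
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return None
    recode_a = events_a / n_a <= events_b / n_b  # arm chosen once, up front
    ea, eb, flips = events_a, events_b, 0
    while p_value(ea, eb) < alpha:
        if recode_a and ea < n_a:
            ea += 1
        elif not recode_a and eb < n_b:
            eb += 1
        else:
            break  # no participants left to recode in the chosen arm
        flips += 1
    return flips


# Hypothetical trial: 0/50 events in one arm versus 10/50 in the other
example_fi = fragility_index(0, 50, 10, 50)
```

      A small value means that recoding only a few participants would remove statistical significance, which is the intuition that fragility analyses at the meta-analytic level build on.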

      Strengths:

      The author's results support his claims. Analyzing the fragility of a given meta-analysis could be a valuable approach for identifying early signs of fragility within a specific topic or body of evidence. If fragility is detected alongside results that hover around the significance threshold, adjusting the significance cutoff as a function of sample size should be considered before making any binary decision regarding statistical significance for that body of evidence. Although the primary goal of meta-analysis is effect estimation, conclusions often still rely on threshold-based interpretations, which is understandable. In some of the examples presented by Atal et al. (2019), the event recoding required to shift a meta-analysis from significant to non-significant (or vice versa) produced only minimal changes in the effect size estimation. Therefore, in bodies of evidence where meta-analyses are fragile or where results cluster near the null, it may be appropriate to adjust the cutoff. Conducting such analyses (identifying fragility early and adapting thresholds accordingly) could help flag fragile bodies of evidence and prevent future conflicting meta-analyses on the same question, thereby reducing research waste and improving reproducibility.

      Weaknesses:

      It would be valuable to include additional bodies of conflicting literature in which meta-analyses have demonstrated fragility. This would allow for a more thorough assessment of the consistency of these analytical tools, their differences, and whether this particular body of literature favored one methodology over another. The method proposed by Atal et al. was applied to numerous meta-analyses and demonstrated consistent performance. I believe there is room for improvement, as both the EOI and ROAR appear to be very promising tools for identifying fragility in meta-analytical contexts.

      I believe the manuscript should be improved in terms of reporting, with clearer statements of the study's and methods' limitations, and by incorporating additional bodies of evidence to strengthen its claims.

      Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting as metrics for evaluating the fragility and robustness of meta-analyses. The author applies these metrics to three meta-analyses of Vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think the extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

      Specific comments:

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      (2) The author mentions that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al.'s 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      I am very appreciative of the insightful comments you all shared, and in light of them have made several clarifications and revisions. Thank you again, I am grateful to have received such considered feedback and I hope I’ve addressed any outstanding issues. I have replied to each reviewer’s recommendations in this document sequentially for ease of scanning, and am most grateful for the summary strengths and weaknesses, which I have also incorporated into these replies. Thank you again!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript makes the important argument that many meta-analyses are inherently fragile, which aligns with prior work (e.g., PMID: 40999337). Please add the reference to the statements.

      Excellent point, thank you – I’ve expanded the discussion of fragility analysis, and its application to meta-analysis, including this reference.

      (2) The rationale and mathematical underpinnings of the proposed EOI and ROAR methods are not sufficiently explained. While the authors cite Grimes (2022, 2024b), readers are expected to rely heavily on these external sources without adequate exposition in the current paper. This limits the ability to fully evaluate the reasonableness of the methods or to reproduce the approach. I strongly recommend expanding the description of EOI and ROAR within the manuscript.

      I agree fully – I was a little remiss in this scope, as I was worried about overwhelming the reader. However, I was too sparse with detail and have now extended the text to describe the methods as intuitively as possible (see Discussion, subsection “Ellipse of Insignificance and Region of Attainable Redaction”).

      (3) In the Methods, the authors note that EOIMETA is applicable when between-study heterogeneity is low. However, the manuscript provides little guidance on how to interpret results when heterogeneity is high (e.g., larger I² values). I recommend clarifying this issue in the Results or Discussion sections, emphasizing the limitations of EOIMETA under high heterogeneity. Ideally, the authors could include either a small simulation study or an illustrative example to demonstrate the performance of the method in such settings.

      This is an excellent question, and I was remiss for not considering it better in the manuscript. Originally, the simple idea was to just pool the results for EOI, in which case heterogeneity would be an issue. I subsequently added weighted inverse-variance methods to account for situations with increased heterogeneity, so my initial comment was not strictly correct. I’ve changed the text in several places, notably in the methods and in the discussion (see reply point 5).
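
      For readers less familiar with the heterogeneity statistic raised in this point, the following is a minimal sketch of how Cochran's Q and I² are conventionally computed from per-study effect estimates and their variances. It is written in Python purely for illustration, is not taken from the EOIMETA package, and the function name and example inputs are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Return (pooled effect, Cochran's Q, I^2 in percent).

    effects   -- per-study effect estimates (e.g. log risk ratios)
    variances -- the corresponding within-study variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled effect
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical inputs: two equally precise studies with very different effects
pooled, q, i2 = cochran_q_i2([0.0, 1.0], [0.01, 0.01])
```

      Two equally precise studies that disagree strongly give an I² close to 100%, whereas studies that agree give an I² of zero; how any fragility metric should be read at high I² remains the methodological question under discussion here.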

      (4) While EOIMETA is introduced as a generalizable fragility metric for meta-analyses, the illustrative examples would benefit from clearer comparisons with the traditional Fragility Index (FI). Because FI is well established in the RCT literature and familiar to many readers, presenting side-by-side results (e.g., FI at the trial level versus EOIMETA at the meta-analytic level) would provide important context. Such comparisons would also highlight the added value of EOIMETA, underscoring that even when individual trials appear robust under FI, the pooled meta-analysis may remain fragile.

      This is an excellent idea! The new table is given below. Note that the traditional FI is not defined for non-significant results, and EOI is ambiguous for counts <2.
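
      As context for the comparison, the traditional trial-level Fragility Index can be sketched as follows. This is an illustrative Python sketch of the commonly described procedure (flip non-events to events in the arm with fewer events until a two-sided Fisher exact test is no longer significant); it is not the code used to build the table, and all function names and example counts are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def fragility_index(events_a, total_a, events_b, total_b, alpha=0.05):
    """Number of non-event -> event flips needed to lose significance.

    Returns None when the starting result is not significant, since the
    traditional index is not defined in that case.
    """
    def p_value(ea, eb):
        return fisher_two_sided(ea, total_a - ea, eb, total_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return None
    ea, eb, flips = events_a, events_b, 0
    modify_a = events_a <= events_b  # flip in the arm with fewer events
    while p_value(ea, eb) < alpha:
        if modify_a and ea < total_a:
            ea += 1
        elif not modify_a and eb < total_b:
            eb += 1
        else:
            raise RuntimeError("no further flips possible in the chosen arm")
        flips += 1
    return flips


# Hypothetical trial: 0/10 events versus 5/10 events
fi = fragility_index(0, 10, 5, 10)
```

      In this hypothetical trial a single flipped outcome is enough to lose significance, which is the kind of trial-level figure that the side-by-side comparison sets against the meta-analytic measures.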

      (5) The Discussion currently states that the Fragility Index (FI) applies only to binary outcomes. This is not entirely accurate. While the original FI was indeed developed for dichotomous endpoints, subsequent methodological work has extended the concept to other data types, including continuous outcomes (continuous fragility index, CFI). The manuscript should acknowledge this distinction: EOIMETA presently focuses on binary outcomes at the meta-analytic level, but FI more broadly is not restricted to binary data. Adding this clarification, with appropriate citations, would improve accuracy and place EOIMETA more clearly within the broader fragility literature.

      Thank you for this catch – clarified now in the discussion:

      Reviewer #2 (Recommendations for the authors):

      (1) Typos/inconsistencies/writing clarifications: All table and figure legends and titles are missing a period at the end of each sentence. In the sentence "to be estimated by bootstrap methods. Initially, we ran...", there should be a space between "methods" and "Initially" (line 113).

      Apologies, these are now remedied.

      (2) In Table 2, the total number of patients in the meta-analysis of all 12 studies is reported as 133,262, whereas the text states 133,475 patients. Based on my calculations from Figure 2, the total appears to be 133,262. Could you please clarify this discrepancy?

      Certainly – your calculations are correct. The text figure was a typo based on a very early draft where the summation function was not correctly run, and double-counted some cases. This was fixed for the figure but not the text. The text should now match, thank you for spotting this. There are some issues with figure 2, which I will address in the next few points.

      (3) Regarding this point, the meta-analysis by Zhang et al. (2019) shows some inconsistencies in the reported number of patients in the paper. According to the data provided on GitHub the total number of patients is 37671. However, Table 1 of the paper lists 38538 patients, and the main text states "5 RCTs involving 39168 patients." Similarly, for Guo et al. (2023), the main text reports that the meta-analysis included 11 RCTs with 112165 patients, whereas the table lists 111952, which appears consistent with the data available on GitHub. There is also a discrepancy in Zhang et al. (2022), which cites 61853 patients in the introduction but 61223 patients in Table 1. These inconsistencies should be clarified, as even small discrepancies in reported sample sizes can undermine the credibility of the analyses presented.

      Well-spotted – the incorrect figures are artefacts of an early draft with a double-counting summation function, and I should have spotted them and removed them prior to submission. To clarify, the correct figures from each study (which agree with github data) are given in the corrected table 1.

      Thus, there are 38,538 subjects in the Zhang et al 2019 analysis, which matches the first sheet of the github listing. The confusion comes from sheet 2, which was included only with this analysis and which breaks these events down into events / non-events (hence the total non-events being 37,671) but keeps the old labels. This is needlessly confusing, and accordingly I have re-uploaded the data with correct headers for sheet 2. This summation problem was also apparent in the total of figure 2, which has been replaced with a correct version now. Thank you for spotting this!

      (4) In line 158, who does "He" refer to? Please clarify this in more detail.

      Apologies, this was a typo and should have read “the” – now corrected.

      (5) The discrepant results of the RCT by Scragg et al. (2018) between the meta-analysis by Zhang et al. and that by Guo et al. could be presented in a table. This could be included as supplementary material or, preferably, in the main text (Results section).

      To avoid confusion, I will add a version of this to the github files for interested users to explore.

      (6) In the legend of Figure 2, a period is missing at the end of the sentence. Additionally, although it is generally understood, it would be helpful to specify that the numbers in parentheses represent the confidence intervals. Please confirm whether these are 95%, 89%, or 99% confidence intervals.

      Apologies, these are 95% CIs. Clarified now in updated legends.

      (7) The statement of "The more recent and robust methods for fragility analysis (EOI) and redaction (ROAR) have potential applications beyond fragile-by-design RCTs, extending to cohort studies, preclinical work, and even ecological studies, as stated by the author" in line 163. Could you please provide references supporting these claims? I believe the relevant references may be included in the EOI paper, but it would be helpful to cite them here as well.

      This has recently been used in a new analysis, now cited in the introduction with a fuller description of the method for context. Please see the response to reviewer 1, point 2.

      (8) Since the study was previously published as a preprint (https://www.medrxiv.org/content/10.1101/2025.08.15.25333793v1.full-text), this should be mentioned in the manuscript.

      Added as a note now.

      (9) It would also be valuable to include a figure illustrating ROAR for the same meta-analyses presented in Figure 1 for EOI, possibly as supplementary material.

      See reply to point 10.

      (10) Finally, it would be interesting to provide plots of both EOI and ROAR for the meta-analyses of all 12 included studies. These graphs could be replicated using the code examples provided by the author in the original EOI and ROAR publications.

      These have now been added to the github repository as supplementary material.

      (11a) Replications of EOI fragility: eoicfunc.R (github): - In the code provided on GitHub, an error occurred in the "EllipseFromEquation" function within eoifunc. This was due to the PlaneGeometry package not being available for the latest version of R. I attempted several installation methods (using devtools, remotes, and GitHub, as well as direct installation from a URL). However, after adjusting the code, I was able to run the analyses. For the full cohort, including all 12 studies using the EOI approach, I obtained a Minimal Experimental Arm only recoding (xi) = 14 and a Minimal Control Arm only recoding (yi) = 15, whereas the authors reported that 5 recodings were sufficient. It appears that differences in code versions or functions might have slightly affected the results. After downgrading R and running the eoic function with PlaneGeometry successfully installed, the fragility index for the EOI approach was 15 rather than 5.

      Apologies for the issue with PlaneGeometry; I will try to fix this for future iterations. The difference you see is an artefact of running EOIFUNC on pooled data, rather than the dedicated EOIMETA function, with the chief difference being that EOIFUNC doesn’t apply WIV correction. If we simply pool events, this is the output:

      Author response image 1.

      If the reviewer uses the EOIMETA function, which employs inverse-variance weighting, then each trial is defined by a vector of events and non-events in each arm. For all 12 studies, this would be (in R code syntax, or imported from the github file):

      Author response image 2.

      Then they will obtain:

      Author response image 3.

      If the reviewer runs a simple pooled analysis with the weighted inverse-variance correction turned off, they should obtain a similar answer to a simple eoifunc call, save for the zero-count correction difference. EOIMETA, however, weights the sample, and it is this result that is reported in the main paper.
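
      To illustrate in general terms why simple pooling and study-level weighting can disagree, the sketch below contrasts a log risk ratio computed from summed counts with a standard inverse-variance weighted average of per-study log risk ratios. It is a generic Python illustration under stated assumptions, not the EOIMETA or eoifunc implementation; the function names and the two example studies are hypothetical, and no zero-count correction is applied.

```python
from math import log

# Each study: (events_experimental, n_experimental, events_control, n_control)


def naive_pooled_log_rr(studies):
    """Log risk ratio after summing counts across studies (no weighting)."""
    e1 = sum(s[0] for s in studies)
    n1 = sum(s[1] for s in studies)
    e0 = sum(s[2] for s in studies)
    n0 = sum(s[3] for s in studies)
    return log((e1 / n1) / (e0 / n0))


def inverse_variance_log_rr(studies):
    """Fixed-effect inverse-variance weighted mean of per-study log risk ratios."""
    num = den = 0.0
    for e1, n1, e0, n0 in studies:
        if min(e1, e0) == 0:
            # A continuity correction would be needed here; omitted in this sketch
            raise ValueError("zero-event arm: continuity correction not implemented")
        log_rr = log((e1 / n1) / (e0 / n0))
        var = 1 / e1 - 1 / n1 + 1 / e0 - 1 / n0  # large-sample variance of log RR
        num += log_rr / var
        den += 1 / var
    return num / den


# Hypothetical studies with unequal allocation between arms
studies = [(10, 100, 20, 100), (90, 300, 30, 100)]
naive = naive_pooled_log_rr(studies)
weighted = inverse_variance_log_rr(studies)
```

      With these hypothetical counts the summed table shows no effect at all, while the weighted estimate favours the experimental arm, so any measure derived from the two approaches can differ even on identical data.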

      (12) I recalculated the eoic function for Zhang et al. (2019) and found a fragility index (dmin) of 1. FECKUP Vector Length: 0.5722. Minimal Experimental Arm Recoding (xi): 0.7738. Minimal Control Arm Recoding (yi): 0.8499.

      This again appears to be an artefact of using eoifunc rather than eoimeta; with eoimeta, which uses WIV to adjust the studies for heterogeneity effects, this is the reported output:

      Author response image 4.

      (13) Using the previous code (before downgrading R and loading PlaneGeometry), I recalculated the EOI for Zhang et al. (2022) and found Minimal Experimental Arm only recoding (xi) = 55 and Minimal Control Arm only recoding (yi) = 59, results slightly closer to those reported by the authors. After properly loading PlaneGeometry, I recalculated and obtained for Zhang et al. (2022): Fragility index (dmin) = 57; FECKUP Vector Length = 39.948; Minimal Experimental Arm Recoding (xi) = 54.5436; Minimal Control Arm Recoding (yi) = 58.635.

      Again, this appears to be a difference between using eoifunc and eoimeta as the call; I can replicate this result using EOIFUNC:

      Author response image 5.

      But adjusting for study weighting with eoimeta:

      Author response image 6.

      (14) For Guo et al. (2022), the EOI fragility index was 17 [dmin = 17]. FECKUP Vector Length: 11.3721. Minimal Experimental Arm Recoding (xi): -15.6825. Minimal Control Arm Recoding (yi): -16.5167. However, the authors report an EOI fragility of 38. Since I was able to load PlaneGeometry properly and run eoicfunc.R (from GitHub) without errors, the discrepancies likely reflect minor coding or version inconsistencies rather than software limitations.

      These again stem from using eoifunc on simple pooled data versus eoimeta, which adjusts by study.

      (15) Replications of ROAR fragility: roarfunc.R (github): - For Guo et al. (2022), the ROAR fragility calculated using roarfunc.R was 16 [rmin (Redaction Fragility Index) = 16]. FOCK Vector Length: 15.942. Minimal Experimental Arm Redaction (xc): 15.9442. Minimal Control Arm Redaction (yc): 978.8906. In the main text, the author reports a redaction fragility of 37. What might explain these discrepancies?

      Again, this stems from EOIMETA versus EOIFUNC (and roarfunc calls without weighted adjustment). As the reviewer has observed, the fragility increases when there is no study level adjustment, which we have now added to the discussion text.

      (16) In generic_run.R, line 6 contains a bug - it is missing a forward slash (/) between the directory path and the filename. The correct line of code should be: pathload = paste0(pathname, "/", filename, exname). The same issue occurs in generalcode.R.

      Apologies, I will correct this in the upload!

      (17) Theoretical framework: Is there any other method available for comparison besides the one proposed by Atal et al.? Could you include a brief literature review describing alternative approaches?

      To my knowledge, there is not – Xing et al (now referenced) covered this earlier in the year, and I have included an expanded background for this purpose. Please see reply to reviewer 1, point 1.

      (18a) There appears to be no heterogeneity in the meta-analysis in terms of effect sizes and I², likely because most values are quite large, yet the included studies address very different populations (e.g., patients with COPD, NSCLC survivors, older adults, women, and GI cancer survivors). This could have been explained more clearly, including how such diverse literature might influence fragility indices or whether there is a logical rationale for combining these studies. Could you perform a sensitivity analysis or provide a conceptual explanation of how the heterogeneity - or lack thereof - across these trials may affect the fragility indices? Although I² values are small, the conceptual heterogeneity among studies suggests that the pooled results may be comparing fundamentally different clinical contexts, which requires clarification.

      I think this is a very pertinent point. I am unsure as to why these authors combined such diverse populations without any consideration of whether they were comparable, but this is a common problem in meta-analysis. I have added the following to the discussion to address this problem:

      “The use of vitamin D meta-analyses in this work was chosen as illustrative rather than specific, but it is worth noting that there are methodological concerns with much vitamin D research (Grimes et al., 2024). The three studies cited in this work report relatively low heterogeneity in their meta-analyses in both effect sizes and I² values, but it is worth noting that the included studies addressed very different populations, including patients with chronic obstructive pulmonary disease, non-small cell lung cancer survivors, women-only cohorts, older adults, and gastrointestinal cancer survivors. These groups presumably have different risk factors for cancer deaths, and why the authors of these studies combined cohorts with fundamentally different clinical contexts is unclear. Why the heterogeneity appeared so relatively low in different groups is also a curious feature. This goes beyond the scope of the current work, but serves as an example of the reality that meta-analysis is only as strong as its underlying data and methodological rigor in comparing like-with-like, and the conclusions drawn from them must always be seen in context.”

      Reviewer #3 (Recommendations for the authors):

      (1) Line 156, acronym FI not defined.

      Apologies, this is now defined at the outset as “fragility index”.

      (2) Line 158, typo "He"?

      Apologies again, this was a typo and was supposed to read “the”, fixed now.

      (3) Across the manuscript, I think the "re-coding" phrasing may confuse clinical readers. Maybe rephrasing to "flipping event classification" or "flipping group" would be better.

      Excellent point – this has now been modified at the outset.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

      We now demonstrate in Fig. S9 efficient degron-mediated depletion of both NUF2 and SPC24 in cell-cycle arrested cells by Western blotting. We show similar data for siRNA knockdowns. Our siRNA knockdown experiments include a “siDEATH” control that induces cytotoxicity by targeting several essential genes. In Fig. S6a we now show that siDEATH transfection results in strong cytotoxicity and cell death in cycling as well as cell cycle arrested G1/S and G2/M populations, indicating efficient protein depletion. Additionally, in Fig. S6b we now show depletion of NCAPH2 protein levels by siRNA knockdown in cycling as well as cell cycle arrested cell populations by Western blot analysis. We mention these results on page 11 and page 13.

      Reviewer #2 (Public review):

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation, such as components of the condensin and cohesin complexes, are typically essential for viability. Similarly, known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often lead to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implication of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      We discussed our hit selection criteria on page 8 and in the Methods section. Some of the concerns regarding a bias towards non-essential genes are alleviated by the fact that our screen is limited to a relatively short duration of 72 hours rather than the longer timepoints that are generally used to assess essentiality in pooled CRISPR-KO screens, allowing us to identify genes that may be essential if eliminated permanently. In support of this notion, we identify subunits of the essential condensin and cohesin complexes as hits with only limited effect on cell viability. In this case, the Z-score for change in cell number upon NCAPH2 knockout was -0.26, indicating only a mild reduction compared to the average cell number across all targets.

      Other confounding effects on hit selection due to micronuclei formation, cell cycle effects etc. are minimized as we closely monitor micronuclei formation and cell viability in our screen. Finally, aneuploidy is similarly not a confounding factor in hit identification since, as we previously demonstrated, the Ripley’s K-based clustering score is robust to changes in spot number (Keikhosravi, A., et al. 2025).
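
      For orientation, the basic (uncorrected) form of Ripley's K can be sketched as below to show why a K-based score is normalised for the number of spots. This is a generic two-dimensional Python illustration with no edge correction; it is not the clustering score of Keikhosravi et al. (2025), and the function name, the use of 2D coordinates, and the example values are assumptions made for the sketch.

```python
from math import hypot


def ripley_k(points, radius, area):
    """Naive Ripley's K for 2D points in a region of the given area.

    K(r) = area / (n * (n - 1)) * (number of ordered pairs closer than r).
    Dividing by n * (n - 1) normalises the pair count for the number of spots.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if hypot(dx, dy) <= radius:
                close_pairs += 1
    # Each unordered pair counts twice among ordered pairs
    return area * 2 * close_pairs / (n * (n - 1))


# Hypothetical fully clustered patterns with different spot numbers
k_three = ripley_k([(1.0, 1.0)] * 3, radius=0.1, area=10.0)
k_five = ripley_k([(1.0, 1.0)] * 5, radius=0.1, area=10.0)
```

      Both hypothetical patterns give the same K despite different spot counts, and for a spatially random 2D pattern K(r) is expected to be close to πr², so larger values indicate clustering at that radius.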

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      We appreciate these points. Given the presence of one centromere on each chromosome, we used centromeres as surrogate landmarks of higher-order nuclear genome organization and considered centromere patterns as a general indicator of overall genome organization. While the relationship of centromere patterns to other genome features is poorly understood in mammalian cells, a link is suggested by observations in other organisms. For example, in yeast, the clustering of centromeres reflects the overall Rabl configuration of chromosomes. Having said that, we agree that our extrapolation to overall genome organization is somewhat speculative, and we have toned down these conclusions throughout the manuscript.

      We agree that one of the most interesting questions emerging from our study is whether centromere clustering has a functional role. In follow-up studies we will use some of the key regulators identified in these screens to perturb the native centromere distribution and assay for various cellular responses, including in gene expression and genome integrity. These studies will be the subject of future publications.

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-H2 is a subunit of the condensin II complex that promotes CENP-A deposition during G1. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide more mechanistic insight.

      We thank the reviewer for this comment. We now clarify the relationship of these proteins to centromeres in more detail on page 12. While they all have some relationship to centromeres, as would be expected if they contributed to centromere clustering, they represent multiple distinct pathways and processes.

      The observed effects on clustering are unlikely due to aneuploidy as only very limited aneuploidy is observed in our cells and because Ripley’s K measurement of centromere clustering is robust to change in chromosome copy number. Follow-up studies using live cell imaging approaches are currently in progress to address some of these mechanistic questions.

      Finally, the additive effects observed may arise if mild mis-segregation effects are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

      We rephrased the text on page 14 based on the reviewer’s recommendations.

      Reviewer #3 (Public review):

      Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      We now include in Fig. S11 examples of disordered mitotic nuclei observed in the absence of NUF2 or SPC24.

      I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

      We have modified this description on page 4.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Proper characterisation of the cell lines used in the manuscript. Tagged proteins have been known to affect protein levels compared to the parental cell, and where this is the case (or not), it needs to be transparently shown in the manuscript.

      The cell lines to conditionally deplete NCAPH2 and KI67 have previously been published, and they have been characterized to show normal expression levels of the tagged protein (Takagi et al., 2018). We also show quantification of Western blots to compare protein level of tagged SPC24 and NUF2 to that of the untagged proteins in the parental cell line (Fig. S8e-f) and discuss these results on page 11 and page 12.

      (2) Demonstration of protein depletion in the degron cell lines.

      We showed efficient protein depletion in the degron cell lines (Fig. S8c and S8d). In addition, we now show in Fig. S9 depletion of SPC24 and NUF2 in cells arrested at G1/S and G2/M.

      (3) The study examines centromere clustering, but not genome architecture. While it is understood that a complete investigation of genome architecture is beyond the scope of the current study, the interpretation does not match the data. The authors are suggested to pay attention to this point throughout the manuscript and consider their findings in terms of centromere clustering rather than genome architecture, including changing the title accordingly.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and a link to overall genome organization has been suggested in some organisms such as yeast, we have retained the wording in a few select instances, including the title. We also make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      Reviewer #1 (Recommendations for the authors):

      (1) Controls of depletion by western blot in synchronized cells (siRNAs and degrons) are lacking.

      We now show Western blots demonstrating efficient depletion of the target proteins in degron (Fig. S9) and siRNA treated cell-cycle arrested cells (Fig. S6b).

      It would have been very nice to discuss the implications of these findings further. For example, does centromere clustering change gene expression or the repression of pericentromeric heterochromatin? Is centromere clustering associated with specific diseases? How does global chromatin organization affect gene expression, genome stability, etc.? Although some of these aspects are unknown, a discussion about them would have been nice.

      We appreciate these interesting points. These questions are the subject of our ongoing follow-up studies. We now discuss possible consequences of centromere re-organization on gene expression and genome stability on page 18.

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) Clarify Scope and Avoid Overinterpretation

      (a) The study exclusively investigates centromere positioning, without addressing broader aspects of genome architecture.

      (b) There is no established link presented between centromere positioning and higher-order genome organisation.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and observations in yeast suggest such a link, we have retained the wording in a few select instances. We make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      (c) The exclusion criteria used in the screen should be clearly explained, including the implications of selecting only non-essential or redundant genes.

      We discuss on page 8 and in the Methods section the exclusion criteria used in the screen, including the implications for identifying essential genes.

      (d) The authors should discuss why the identified proteins significantly affect centromere clustering but do not impact cell cycle progression.

      We now discuss this topic briefly on page 9. While some hits are expected to affect both cell-cycle progression and centromere clustering (Fig. S4c), it is not a priori expected that all hits would affect both.

      (2) Supplementary Figure 1

      This figure appears unnecessary. The co-localisation between CENP-C and CENP-A is well established in the literature, and the scoring provided does not add essential new information.

      The data was included in response to repeat questions from a centromere expert. We prefer to retain this data for completeness.

      (3) Differential Hits between Cell Lines 

      For hits that behave differently across cell lines, expression data should be provided. Are the genes equally expressed in both cell types? What is the level of depletion achieved?

      It is possible that cell-type-specific hits arise due to differences in expression. They may also arise for multiple other reasons, including cancer vs. non-cancer origin, hTERT immortalization, cell growth properties, variation in the underlying DNA sequences of the Cas9 target loci, and the initial state of centromere clustering, to name a few. Each of these possibilities requires additional experiments to identify the exact reason for the cell-type specificity of a given factor. A full analysis of the reasons for cell-type specificity is, however, beyond the scope of the current study.

      (4) Efficiency of Cell Cycle-Specific Degradation

      Degradation efficiency likely varies across cell cycle stages. The authors should provide Western blots showing the extent of protein depletion at each cell cycle block.

      We provide Western blot data in Fig. S9 to demonstrate efficient knockdown of proteins in G1/S and G2/M arrested cells.

      (5) Figure S6 - Validation of New Cell Lines

      Genotyping data for the newly generated cell lines should be included, along with Western blots using protein-specific antibodies (not just the tag), compared to the parental cell line.

      We provide in Fig. S7c-d genotyping data and in Fig. S8e-f Western blot data to compare levels of tagged and untagged proteins.

      (6) Figure S7 - G2/M Block Efficiency

      The G2/M block appears suboptimal after 20 hours in RO-3306, with only ~50% of cells in G2/M and just 21-27% for Ki-67, where most cells remain in S phase. This raises concerns about the interpretation of mitotic depletion effects. It is possible that cells never progressed from G1 or completed S phase without Ki-67. Prior studies (van Schaik et al., 2022; Stamatiou et al., 2024) have shown delayed and uneven replication of centromeric/pericentromeric regions upon Ki-67 depletion during S phase, which could affect the readout. Live-cell imaging would be a more robust approach to confirm mitotic status.

      For KI67, after RO-3306 treatment, 73% and 67% of cells were arrested at the G2/M boundary in the presence or absence of KI67, respectively (Fig. S10a-b). Upon release from G2/M arrest, the proportion of G1 cells increased from 6-13% to 28-60% for all four factors tested (Fig. S10b, d). Please note that our results are not directly dependent on release efficiency, since we use single-cell staging (Fig. 3b) and selectively analyze only G1 populations (Fig. 5c).

      We are currently working towards live-cell imaging, but this requires the development and characterization of additional cell lines, which is beyond the scope of this study.

      Statistical analyses of cell cycle phase distributions should also be included.

      We include statistical analyses of cell cycle phase distributions in Fig. S4c and Fig. S10c-d, performing t-tests with FDR correction to compare the percentage of cells in G1, S, or G2 in the presence and absence of each factor tested.
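
      As an illustration of the FDR step only, the following is a minimal sketch of a Benjamini-Hochberg adjustment. The response does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name and p-values are hypothetical placeholders rather than values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is
    # never larger than the one ranked above it (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from per-phase t-tests (G1, S, G2) for one factor.
raw_p = [0.004, 0.030, 0.200]
q_values = benjamini_hochberg(raw_p)
for phase, q in zip(["G1", "S", "G2"], q_values):
    print(f"{phase}: adjusted p = {q:.3f}")
```

      In such a scheme, a phase would be called significantly different only if its adjusted p-value falls below the chosen FDR threshold.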

      (7) Aneuploidy Assessment

      Aneuploidy scores for the four key proteins should be provided, ideally using centromere-specific FISH probes.

      While an aneuploidy score for each hit would be an interesting piece of information, we showed in a previous publication that the Ripley’s K-based Clustering Score method used here is robust to aneuploidy (Keikhosravi et al., 2025); aneuploidy would thus not lead to spurious identification of these proteins in our screen.

      (8) Add-Back Experiment (Page 14)

      While the add-back experiment is conceptually strong, its execution could be improved. It should be performed on synchronised cells: deplete the protein in G2/M, arrest in thymidine, then release into G1 without the protein to observe the unclustering phenotype.

      Re-expression should occur during the block, followed by release and analysis in the next G1 phase. This would better demonstrate whether clustering defects from the previous division can be rescued.

      We have attempted these types of long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (9) Statistical Analyses

      Several figures lack statistical analysis, which is essential for data interpretation:

      (a) Figure 1B-E

      (b) Figure 3I

      (c) Figure 4B

      (d) Figure 5B, C, G

      (e) Supplementary Figures S4B and S7

      Statistical analyses were performed for a) Fig. 1b-e, b) Fig. 3i, c) Fig. 4b, d) Fig. 5b-c and the details of the test are mentioned in the corresponding figure legends. We also include statistical tests for Fig. 5g, S5b and S7c-d.

      Minor Comments:

      (1) Page 9: "Reassuringly, in line with known centromere-nucleoli association (Bury, Moodie et al. 2020, van Schaik, Manzo et al. 2022)..."

      The citation "van Schaik, Manzo et al. 2022" is incorrect and should be revised.

      We have removed this reference.

      (2) Page 10:

      "...were grouped into six categories: regulators of chromatin structure, kinetochore proteins, nucleolar proteins, nuclear pore complex components..."

      The authors should note that NUP160, listed as a nuclear pore complex hit, is also a kinetochore component during mitosis and may be linked to mitotic defects.

      We now mention this on page 10.

      (3) Page 12:

      "Progression through S phase was equally efficient in the presence or absence of KI67."

      While bulk S phase progression may appear unaffected, refined analyses (e.g., Repli-seq, EdU patterning) have shown delayed replication of centromeric/pericentromeric regions upon Ki-67 depletion. This should be acknowledged, especially given the study's focus on centromeres (see Schaik et al., 2022; Stamatiou et al., 2024).

      Our statement was meant to describe the results we observed in this study. We indicate that overall progression is not affected, but subtle effects may persist, and we cite the relevant references on page 13.

      (4) Page 12:

      "KI67 is a well-known marker of cell proliferation..."

      The first study demonstrating the dependency of chromosome periphery on Ki-67 was Booth et al., 2014, which should be cited.

      This citation has been added.

      Reviewer #3 (Recommendations for the authors):

      (1) On page 14, paragraph 1, the authors suggest that NCAPH2 and SPC24 act independently on centromere clustering. I'm not convinced that this is the right interpretation of the data. Rather, the lack of an additive phenotype following NCAPH2 and SPC24 dual depletion suggests to me that these two proteins are acting in the same pathway.

      We show that knockdown of NCAPH2 and knockdown of SPC24 have opposite effects on centromere clustering. However, knockdown of SPC24 in NCAPH2-AID cells produces an intermediate level of clustering compared to NCAPH2 depletion or SPC24 knockdown alone. This indicates additive effects. We have modified our description of these results on p. 14.

      (2) The analysis and experimental design in Figure 5g could be improved. For one, I would add statistical comparisons like the other figure panels. Second, the authors would ideally perform AID depletion in a synchronized G2 population before washout during the subsequent G1. This design might make some of the more subtle changes (e.g., KI67-AID) more obvious.

      We now include statistical analysis in Fig. 5g. We have attempted long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (3) In the discussion, the authors allude to centromere clustering data from the NDC80 complex, HMGA1, and other HMGs but fail to direct the reader to where they may find the data. If these data are in Tables S4 and S5, perhaps the authors could make these tables more reader-friendly?

      For each target, the mean Clustering Score Z-score of two biological replicates is given in column H of Tables S4 and S5.

      (4) In my opinion, the term 'clustering score' comes across a bit ambiguous. In most cases, this term appears to refer to the distance between centromeric foci but is used occasionally to refer to the number of centromeric spots. For example, on page 9, paragraph 1, line 3, cluster/clustering is used three times but with slightly different meanings. Perhaps the authors can consider using the word 'clustering' to indicate the number of spots, 'dispersion' to indicate distance between centromeres, and 'radial distribution' to indicate distance from the nuclear center? Or other ways to improve the consistency of the descriptive terms.

      We apologize for not being clear. The Clustering Score is a very specific parameter derived from use of a Ripley’s K clustering algorithm as described in Materials and Methods. We now ensure that the term is used correctly throughout and that the other terms are also used consistently.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses

      As presented, the manuscript has limitations that weaken support for the central conclusions drawn by the authors. Many of the findings align with prior work on this topic, but do not extend those findings substantially.

      An overarching limitation is the lack of temporal resolution in the manipulations relative to the behavioral assays. This is particularly important for anxiety-like behaviors, as antecedent exposures can alter performance. In the open field and elevated zero maze assays, testing occurred 30 minutes after CNO injection. During much of this interval, the targeted neurons were likely active, making it difficult to determine whether observed behavioral changes were primary - resulting directly from SuM neuronal activity - or secondary, reflecting a stress-like state induced by prolonged activation of SuM and related circuits. This concern also applies to the chronic inhibition of ventral subiculum (vSub) neurons during 10 days of CSDS.

      We appreciate the reviewer's concern regarding the timing of CNO administration relative to behavioral testing. The 30-minute interval was selected based on previous studies[1, 2]. This window ensures stable and specific neuronal manipulation while minimizing off-target effects, and it was applied consistently across all experiments. We acknowledge that a shorter interval (~15 min) can be sufficient to produce biological effects in vivo[3, 4]. We repeated the chemogenetic tests 2-3 times to ensure reliable data for statistical analysis. However, we cannot exclude potential side effects of prolonged chemogenetic activation of the SuM, because chemogenetics has poorer temporal resolution than optogenetic manipulation. We agree that employing techniques with higher temporal resolution, such as optogenetics, in future studies would provide an excellent complement to these findings.

      The combination of stressors (foot shock and CSDS) and behavioral assays further complicates interpretation. The precise role of SuM neurons, including SANs, remains unclear. Both vSub and dSub neurons responded to foot shock, but only vSub neurons showed activity differences associated with open-arm transitions in the EZM.

      We agree that the use of multiple stressors (foot shock and CSDS) adds complexity to the interpretation. Our rationale was to test the generality of the SuM response and the role of SANs across different stress modalities (acute vs. chronic). The key finding is that while both vSub and dSub projections to the SuM were activated by the acute stressor of foot shock (Figure 5N-R), only the vSub-SuM pathway showed a significant increase in calcium activity specifically during the anxiety-provoking transition from the closed to the open arms of the EZM (Figure 5I-M). This dissociation suggests a selective role for the vSub-SuM circuit in encoding anxiety-related information, beyond a general response to stress.

      In light of prior studies linking SuM to locomotion (Farrell et al., Science 2021; Escobedo et al., eLife 2024), the absence of analyses connecting subpopulations to locomotor changes weakens the claim that vSub neurons selectively encode anxiety. Because open- and closed-arm transitions are inherently tied to locomotor activity, locomotion must be carefully controlled to avoid confounding interpretations.

      We thank the reviewer for highlighting the important studies linking the SuM to locomotion. We acknowledge this known function and carefully considered it in our analyses. Non-selective activation of the entire SuM did not affect total distance traveled in the open field or the elevated zero maze (Supplemental Figure 2B-C). Although locomotion in the OF and EZM was affected when SANs were targeted, we also compared the distance traveled in the central area of the OF to minimize, to some extent, the influence of locomotion on our estimate of anxiety-related avoidance of the central area (Figure 4I). We agree that future work delineating the specific subpopulations within the SuM that regulate locomotion versus anxiety would be highly valuable.

      Another limitation is the narrow behavioral scope. Beyond open field and EZM, no additional assays were used to assess how SAN reactivation affects other behaviors. Without richer behavioral analyses, interpretations about fear engrams, freezing, or broader stress-related functions of SuM remain incomplete.

      In addition, small n values across several datasets reduce confidence in the strength of the conclusions.

      We acknowledge that the primary focus on OF and EZM tests is a limitation in fully characterizing the behavioral profile of SAN manipulation. These tests were selected as they are well-validated, standard assays for anxiety-like behavior in rodents[5–10]. However, we also included the reward-seeking test, where activation of SANs significantly suppressed sucrose consumption (Figure 4L), suggesting a broader impact on motivational state that is often linked to anxiety. We fully agree with the reviewer that employing a richer behavioral battery (such as tests for social avoidance, conditioned place aversion, or Pavlovian fear conditioning) in future studies will be essential to comprehensively define the functional scope of SuM SANs and to conclusively distinguish their role from that of fear memory engrams.

      Figure level concerns:

      (1) Figure 1: In Figure 1, the acute recruitment of SuM neurons by foot shock is paired with changes in neural activity induced by social defeat stress. Although interesting, the connections of changes induced by a chronic stressor to Fos induction following acute foot shock are unclear and do not establish a baseline for the studies in Figure 3 on activation of SANs by social stressors.

      Thank you for this important comment. We agree that directly linking acute foot shock-induced cFos expression with chronic social defeat stress (CSDS) electrophysiological changes may create an interpretive gap. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We did not intend to imply that the same neuronal population responds identically to both stressors.

      To address this, we have clarified in the text that the purpose of Figure 1 is to show that SuM is responsive to diverse stressors, rather than to establish a direct mechanistic link between acute and chronic activation patterns. The baseline for SAN studies in Figure 3 is established through the TRAP2 tagging protocol following foot shock, independent of the CSDS model. We acknowledge that future studies should compare SAN recruitment across acute vs. chronic stressors to better define their functional overlap.

      (2) Figure 2: The chemogenetic experiments using AAV-hSyn-Gq-DREADDs lack data or images, or hit maps showing viral spread across animals. This omission is critical given the small size of SuM, where viral spread directly determines which neurons are manipulated. Without this, it is difficult to interpret findings in the context of prior studies on SuM circuits involved in threats and rewards.

      Please see Supplemental Figure 2 for the infection area of AAV.

      (3) Figure 3: The TRAP experiments show that the number of labeled neurons following foot shock (Figure 3F) is approximately double that of baseline home-cage animals, though y-axis scaling complicates interpretation. It is unclear whether this reflects true Fos induction, low TRAP efficiency, or baseline recombination.

      We thank the reviewer for pointing out the axis scaling issue. We have modified the y-axis to start from 0. Because the SuM has been reported to play a role in wakefulness in rodents, some basal neuronal activation after 4-OHT i.p. injection is to be expected.

      Overlap analyses are also limited. For example, it is not shown what proportion of foot shock SANs are reactivated by subsequent foot shock. Comparisons of Fos induction after sucrose reward are also weakened by the very low Fos signal observed. If sucrose reward does not robustly induce Fos in SuM, its utility in distinguishing reward- versus stress-activated neurons is questionable. Thus, conclusions about overlap between SANs and socially stressed neurons remain uncertain due to the missing quantification of Fos+ populations.

      Thank you for the question. We have replaced the reactivation chance graph with a new reactivation percentage graph showing the proportion of SANs that were reactivated by subsequent sucrose reward or stress. We used social stress rather than a second foot shock to test the potential generality of foot-shock-tagged neurons. The lower cFos expression after sucrose exposure suggests, first, that the SuM may not be involved in reward regulation, as the reviewer notes; and second, that SANs are more likely to modulate anxiety-like behavior than reward.

      (4) Supplemental Figure 3: The claim that "SANs in the SuM encode anxiety but not fear memory" is not well supported. Inhibition of SANs (Gi-DREADDs) did not alter freezing behavior, but the absence of change could reflect technical issues (e.g., insufficient TRAP efficiency, low expression of Gi-DREADDs). Moreover, the manuscript does not provide a positive control showing that SuM SANs inhibition alters anxiety-like behavior, making it difficult to interpret the negative result. Prior work (Escobedo et al., eLife 2024) suggests SuM neurons drive active responses, not freezing, raising further interpretive questions.

      We agree that we did not provide enough data to conclude that SuM SANs have no regulatory effect on fear memory. The relevant statement has been removed to avoid misunderstanding.

      (5) Figure 4: The statement that corticosterone concentration is "usually used to estimate whether an individual is anxious" (line 236) is an overstatement. Corticosterone fluctuates dynamically across the day and responds to a broad range of stimuli beyond anxiety.

      Thank you for pointing this out. Corticosterone/cortisol, the primary stress hormone, is a well-established biomarker whose levels are elevated in response to stress and in anxiety states[11, 12]. Some studies have also reported that administering corticosterone can produce anxiety-like behaviors in rodents[13–16]. We collected blood samples at the same time point for Figure 4C-D. We agree that the statement at line 236 was an overstatement and have modified it.

      (6) Figures 5-6: The conclusion that vSub neurons encode anxiety-like behavior is not firmly supported. Data from photo-activating terminals in SuM is shown for ex vivo recording, but not in vivo behavior, which would strengthen support for this conclusion. Both vSub and dSub neurons responded to foot shock. The key evidence comes from apparent differential recruitment during open-arm exploration. However, the timing appears to lag arm entry, no data are provided for closed-arm entry, and there is heterogeneity across animals. These limitations reduce confidence in the authors' central claim regarding vSub-specific encoding of anxiety.

      We thank the reviewer for this important point. To address the concern regarding the in vivo behavioral encoding specificity of the vSub-SuM pathway, we further analyzed the in vivo fiber photometry data. The new analysis revealed that calcium activity in vSub-SuM projection neurons exhibited bidirectional, instantaneous, and specific changes during transitions between the open and closed arms of the elevated plus maze: their activity significantly and immediately decreased when mice moved from the open arm to the closed arm (new results shown in Supplemental Figure 5), and conversely, significantly and immediately increased upon transitioning from the closed to the open arm. In contrast, dSub-SuM projection neurons showed no significant change in activity during the same behavioral events. We hope this finding strengthens the case for a role of the vSub-SuM pathway in encoding anxiety-like behavior.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) From the data presented, the authors conclude that "the SuM is the critical brain region that regulates anxiety" (line 190). This interpretation appears overstated, as it downplays well-established contributions of other brain regions and does not place SuM's role within a broader network context. The data support that SuM neurons are recruited by foot shock and, to a lesser extent, by acute social stress. However, the alterations in activity of SuM subpopulations following chronic stress reported in Figure 1 remain largely unexplored, limiting insight into their functional relevance.

      Thank you for the suggestion. We have revised line 190 to be more cautious: “In this study, we combined multiple methods to determine whether the SuM is a brain region that involve in modulating anxiety.”

      (2) The limited temporal resolution of DREADD-based manipulations leaves alternative explanations untested. For example, if SANs encode signals of threat, generalized stress, or nociception, then prolonged activation could indirectly alter behavior in the open field and EZM assays, rather than reflecting direct anxiety regulation.

      We discuss the DREADD method and its limited temporal resolution in the first part of our response.

      (3) The conclusion that "SuM store information about stress but not memory" (line 240) is not fully supported, particularly with respect to possible roles in memory. The lack of a role in memory of events, as opposed to the output of threat or stress memory, may be true, but is functionally untested in presented experiments. The data do indicate activation of the SuM neuron by foot shock, which has been previously reported (Escobedo et al eLife 2024). The changes in SuM activity following chronic stress (Figure 1) are intriguing, but their relationship to "stress information storage" is not clearly established.

      Thank you for your valuable comments. Foot-shock-activated neurons may play a role in modulating either anxiety-like behaviors or emotional memory (fear memory). We recognize that we did not fully test all aspects of anxiety and memory, which resulted in some overstatements in the manuscript. It is more appropriate to focus on “anxiety avoidance”, as indicated by the reduced open-arm exploration in the EZM/EPM.

      Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques to link stress, neuronal ensembles, and circuit function, thereby advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, which highlights functional distinctions between neural circuits underlying anxiety and fear.

      Some aspects would benefit from clarification. For example, how selective is the recruitment of this population to stress compared with other aversive states, and how should one best interpret their definition as "stress-activated neurons" given the relatively modest overlap across stress exposures? In addition, the use of the term "engram" in this context raises conceptual questions. Is it appropriate to describe a neuronal ensemble encoding an emotional state as an engram, a term usually tied to specific memory recall?

      Overall, this work makes a valuable contribution by identifying SuM stress-activated neurons and their ventral subiculum inputs as central elements of the circuitry underlying anxiety. These findings provide a valuable framework for future studies investigating anxiety circuitry and may inform the development of targeted interventions for stress-related disorders.

      We thank the reviewer for raising these important points. We agree that further clarification is warranted. In our study, we compared SAN reactivation across different stimuli: foot shock (acute physical stress), social stress (chronic psychosocial stress), and sucrose reward (non-aversive positive stimulus). As shown in Figure 3, SANs in the supramammillary nucleus (SuM) were significantly reactivated by social stress but not by sucrose reward. Moreover, the c-Fos response in SuM was markedly higher after foot shock compared to home cage controls (Figure 1). While we did not test all possible aversive states (e.g., pain, sickness), our data support that SuM SANs are preferentially recruited by stressors rather than by reward or neutral conditions. We acknowledge that the overlap across stress modalities is not complete, which may reflect differences in stress intensity, duration, or circuit engagement. Future work will systematically compare SAN recruitment across diverse aversive and non-aversive states to further define their selectivity.

      The term “stress-activated neurons” (SANs) here refers to neurons that are reliably activated by at least one type of stressor and can be reactivated by subsequent stress exposure. The partial overlap across stressors likely reflects the diversity of stress responses and the possibility that distinct subpopulations within SuM may encode different aspects of aversive experience. Importantly, chemogenetic activation of SANs was sufficient to induce anxiety-like behavior and elevate corticosterone (Figure 4), supporting their functional role in stress-related behavioral and physiological outputs. We have revised the manuscript to clarify that SANs represent a stress-responsive ensemble rather than a uniform population activated identically by all stressors.

      We appreciate the reviewer’s conceptual caution. In the revised manuscript, we intentionally avoided using the term “engram” to describe SANs. Our focus is on a stress-activated neuronal ensemble that drives anxiety-like behavior, not on memory recall per se. We refer to SANs as an “ensemble” or “population” rather than an engram, consistent with the TRAP-based labeling approach used to capture neurons activated during a specific experience. We agree that “engram” is best reserved for memory-encoding cells and will ensure this distinction remains clear throughout the text.

      Reviewer #3 (Public review):

      Weaknesses:

      The strength of some of the evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. The claim that SuM generates an "anxiety engram" is also judged to be incompletely supported by the evidence. Namely, what is unclear is whether these cells/regions encode anxiety per se versus modulate behaviors (like exploration) that tend to correlate with anxiety. Since many brain regions respond to footshock and other stressors, the response of SuM to these stimuli is not strong evidence for a role in anxiety. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects. As shown in Supplementary Figure 2, stimulating stress-responsive cells in SuM potently reduces general locomotor exploration. This raises concerns that the manipulation could have broader effects that go beyond just changes in anxiety-like behavior. Furthermore, the meaning of an "anxiety engram" is unclear. Would this engram encode stress, the sense of a potential threat, or the behavioral response? A more developed analysis of the behavioral correlates of SuM activity and the behavioral effects of SuM manipulations could give insight into these questions.

      We appreciate the reviewer’s thoughtful critique regarding the specificity of SuM’s role in anxiety and the interpretation of our findings. We acknowledge that SuM has broad functions, including regulating exploration and hippocampal theta. However, our data show that general SuM activation increases anxiety-like measures (reduced open-arm time in EZM, decreased center exploration in OF) without altering total locomotion (Fig. 2, Suppl. Fig. 2). The locomotor reduction in SAN activation experiments (Suppl. Fig. 2F–G) was observed alongside clear anxiety-like behavioral changes (e.g. suppressed reward seeking), suggesting that the effects are not solely due to motor suppression. We agree that the methods we used to estimate anxiety-like behavior are based on the movement of mice during testing, which is a limitation when linking the data to anxiety. It is therefore more appropriate to interpret the results as modulation of anxiety-like behavior (anxiety-related avoidance) rather than of anxiety itself. We have revised the manuscript to use more precise wording and avoid overstatement.

      Our fiber photometry data (Fig. 5) show that vSub–SuM projection neurons increase activity specifically when mice enter open arms of the EZM—a behavioral transition associated with anxiety—whereas dSub–SuM projections do not. This activity correlates with anxiety-related behavior, not merely with movement or stress per se.

      We also agree that the term “engram” may be misleading in this context. In the manuscript, we refer to SANs as a “stress-activated neuronal ensemble” rather than an anxiety engram. Our data indicate that these neurons are recruited by stress and that their reactivation increases anxiety-related avoidance of the open arms. We have revised the text to avoid conceptual overreach and to clarify that SuM SANs likely contribute to a state of sustained anxiety/avoidance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting, including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from noting that the subjects were male in the abstract and discussion of the limitations of the exclusion of females.

      Thank you for the suggestion. We have included the full statistical detail in a separate sheet as Table 1. Also, we have modified the title of the manuscript to reflect the sex of the mice.

      Reviewer #1 (Recommendations for the authors):

      (1) In line 211, the authors state, "we recorded neuronal action potentials via multichannel extracellular recording while the mice were moving in the EPM, a traditional type of maze used to test anxiety in rodents,". However, it is unclear what data is presented in the paper, that is, extracellular recordings from SuM in mice on the elevated plus maze.

      We have deleted the description of the multichannel recording data in the EPM, as these data were removed earlier.

      Minor corrections to the text and figures.

      (2) For bar plots, perhaps clarify how the data is presented. For example, in Figure 4, "The data in B, D, E and I-L are presented as the means ± SEMs," but this does not appear to be plotted as a mean with SEM error bars because the error bars cover all the values.

      Corrected.

      (3) In Figure 5, the white text for EGFP in panel B is very difficult to see.

      Corrected.

      (4) For Figure 5D, it would be helpful to more clearly specify which neurons in SuM were recorded from. Was it SANs or all SuM neurons?

      We did whole-cell recording on all SuM neurons.

      (5) Fos2A-iCreERT2 is mislabeled as "Fos2A-iCreERT" in the methods.

      Corrected.

      (6) The sentence at line 139 "To make sure foot shock induced anxiety won't last until manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7," is unclear as written.

      Thank you for pointing this out. We have modified the sentence to make it clearer. “To make sure mice are on similar basal condition while applying chemo-genetic manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7 (Figure 4 A). The mice that experienced foot shocks showed decreases in the exploration time in the open arms on day 2. However, acute stress-induced anxiety was not detected on day 7 (Figure 4 B), which allow us to compare the reactivation of SANs produced anxiety-like behavior between groups at the same baseline.”

      (7) The details of the viral injections used for ex vivo electrophysiology are not sufficient to understand the experiment and the implications of the data. Which neurons (SANs?) are recorded from, what percent of those had inputs, were the sub-neurons globally labeled or just SANs?

      We performed whole-cell recordings from SuM neurons globally to test whether the projection arises from glutamatergic neurons in the Sub; as shown in Figure 5B, the projection neurons in the Sub exclusively express vglut1. Given this aim, we did not keep any neurons that did not respond to the light stimulation, and we therefore cannot calculate the percentage of neurons receiving input in this case. We have added text to the Methods to state clearly that we recorded from SuM neurons globally.

      (8) The scale used in Figure 6C renders that data unreadable. 120 to 40% changes in body weight are well beyond the variability in the data.

      We have modified the axis (90 to 110%) to show the body weight change more clearly.

      (9) The dose of CNO used, 5 mg/kg, is high, and using lower doses or other DREADD ligands is worth considering.

      Thank you for your valuable comment. We have noticed that others are using relatively lower doses of CNO, or other DREADD ligands that are reported to have much higher affinity and fewer side effects. The dose of 5 mg/kg was adapted from earlier papers using DREADDs[17] and produced no obvious side effects in mice in our experiments, e.g., on locomotion (S Figure 2B), so we kept this dose throughout the project for consistency across the different cohorts of experiments. We are switching to DCZ in the follow-up experiments based on this project to avoid any potential side effects of CNO.

      Reviewer #2 (Recommendations for the authors):

      This is a strong manuscript that provides important insights into the role of the supramammillary nucleus (SuM) and its inputs from the ventral subiculum in regulating anxiety. The combination of behavioral, imaging, electrophysiological, and circuit manipulation approaches is impressive, and the distinction the authors propose between anxiety-related and fear-related circuits is conceptually important.

      There are, however, some points that I think need clarification. The authors emphasize that the hippocampus is essential for fear memory recall, yet they do not directly evaluate whether the SuM-hippocampal pathway might contribute differentially to anxiety versus fear memory. Addressing this would help to explain where the dissociation between the two processes arises.

      Thank you for the suggestion. We realized that we did not collect enough data to exclude a role of these SANs in memory, especially fear memory, whose formation is based on strong emotional training, as mentioned above. The data and relevant discussion have been removed to avoid misunderstanding and overstatement.

      I am also not fully convinced about the definition of the "stress-activated neurons" (SANs). The overlap across repeated stress exposures is quite modest (around 20%), which suggests that this population may not be strictly stress-specific but rather a dynamic subset that is preferentially, though not exclusively, engaged by stress. Related to this, the use of the term "engram" raises conceptual questions. Since the classic engram refers to an ensemble encoding and recalling a specific memory, it is not obvious whether it is appropriate to apply the term to a neuronal population that appears to represent a persistent emotional state. The authors should consider justifying this choice of terminology more carefully or adopting a different term.

      Thank you for your important comments. We agree that the SANs in this manuscript are more likely a dynamic subset than an “engram” engaged exclusively by foot-shock stress. That is why we use “stress-activated neurons” rather than “engram” to describe this neuronal ensemble. To avoid further confusion, we have reduced the use of “engram” across the manuscript.

      Some parts of the text also need more precision. For example, the statement in lines 63-65 that "few studies have explored emotion-related engram cells" is potentially misleading, as most engram studies focus on memories with a strong emotional component. The rationale for this claim should be clarified.

      This sentence has been deleted, since it is not needed to connect the text and is potentially misleading.

      In Figure 1, the choice of methods is also puzzling: cFos immunostaining is used after shock delivery, while electrophysiology is used for the CSDS paradigm. It would be helpful to explain why different readouts were chosen for different stress models, and whether this may affect the comparability of the results.

      Thank you for this important comment. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We chose different methods because acute stress produces a transient effect, whereas chronic stress produces a long-lasting effect. To our knowledge, cFos is a well-established marker of strong neuronal activation, but it has a short lifespan (~4-6 hours) and therefore suits the acute paradigm better. In vivo recording allows us to compare neuronal activity before and after the chronic experiments within subjects and can reveal cumulative effects, which cFos cannot. To address this, we have clarified the purpose of Figure 1 in the text at Lines 112-113: “To investigate if SuM would be responsive to diverse stressors, we next examined whether chronic stress, which different mechanism underlying…”

      Finally, some additional details would strengthen the presentation. The discussion of corticosterone and other physiological markers could be expanded to indicate whether these effects were robust across stress paradigms. Similarly, the relatively modest overlap between SANs activated by different stressors could be framed more explicitly as part of a broader principle of flexible ensemble recruitment in anxiety-related circuits.

      Thank you for your suggestion. We have added more discussion about the corticosterone and the flexibility of SANs in the manuscript. See Line 267-270: “The serum corticosterone concentration can be used as a marker of stress-induced change in the peripheral blood. Previous studies showed serum corticosterone can be increased by various stress stimulation [39–42]; meanwhile, intentionally supplementing the diet with corticosterone can induce anxiety-like behaviors in rodents[43].” and Line 275-281: “However, the reactivation rate of SANs caused by different stressor was relatively lower than the initial activation rate caused by foot shock (Figure 3). This suggests that stress-activated neuronal clusters may have more flexible recruitment principles, with only a small number of neurons potentially encoding emotional information, while most other neurons remain involved in encoding other neural activities. Studies in other field, particularly studies of memory engram, has shown that the sets of neurons activated during learning are dynamic and exhibit high flexibility [44, 45].”

      Overall, the work is of high quality and provides a valuable contribution to the field, but addressing these points would help sharpen the mechanistic claims and ensure that the conceptual framework is as clear and precise as the experimental data.

      Reviewer #3 (Recommendations for the authors):

      (1) Since increased SuM activity is hypothesized to mediate the effects of stress on anxiety-like behavior, a logical step would be to test for necessity by silencing the stress-activated SuM cells.

      We agree this is a logical and valuable experiment. While our current study focused primarily on the sufficiency of SuM/SAN activation to induce anxiety-like behavior, we acknowledge that inhibition experiments would provide critical complementary evidence for necessity. We have added a statement in the Discussion noting that “future studies should examine whether silencing SuM SANs, either during stress exposure or during anxiety testing, can prevent or reduce stress-induced anxiety”. This will help establish a more complete causal role.

      (2) Discuss what is meant by "anxiety engram" and what features of anxiety the labeled cells might encode.

      We concur that “stress-activated neuron (SAN)” is a more precise descriptor than “engram” in this context. We have revised the text to avoid the potentially misleading term “engram” and instead refer to a “stress-activated neuron”. The labeled cells are preferentially reactivated by stress (not reward), and their activation promotes both behavioral avoidance and physiological stress markers (corticosterone). They likely contribute to the maintenance of an anxious state under perceived threat, rather than encoding discrete threat cues or memories.

      (3) A more nuanced analysis of behavioral correlates of SuM activity and/or the behavioral effects of SuM manipulations would strengthen this paper.

      To provide a more nuanced understanding of the behavioral correlates, we have performed additional analyses on our fiber photometry data (now presented in Supplemental Figure 6) and have also planned additional experiments for future studies to deepen our understanding.

      References:

      (1) Jendryka M, Palchaudhuri M, Ursu D, van der Veen B, Liss B, Kätzel D, et al. Pharmacokinetic and pharmacodynamic actions of clozapine-N-oxide, clozapine, and compound 21 in DREADD-based chemogenetics in mice. Sci Rep. 2019;9.

      (2) Koike H, Demars MP, Short JA, Nabel EM, Akbarian S, Baxter MG, et al. Chemogenetic Inactivation of Dorsal Anterior Cingulate Cortex Neurons Disrupts Attentional Behavior in Mouse. Neuropsychopharmacology. 2016;41:1014–1023.

      (3) Guettier J-M, Gautam D, Scarselli M, Ruiz De Azua I, Li JH, Rosemond E, et al. A chemical-genetic approach to study G protein regulation of cell function in vivo. Proceedings of the National Academy of Sciences. 2009;106:19197–19202.

      (4) Wess J, Nakajima K, Jain S. Novel designer receptors to probe GPCR signaling and physiology. Trends Pharmacol Sci. 2013;34:385–392.

      (5) Kraeuter AK, Guest PC, Sarnyai Z. The Elevated Plus Maze Test for Measuring Anxiety-Like Behavior in Rodents. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 69–74.

      (6) Kraeuter AK, Guest PC, Sarnyai Z. The Open Field Test for Measuring Locomotor Activity and Anxiety-Like Behavior. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 99–103.

      (7) Wall PM, Messier C. Methodological and conceptual issues in the use of the elevated plus-maze as a psychological measurement instrument of animal anxiety-like behavior. Neurosci Biobehav Rev. 2001;25:275–286.

      (8) Carobrez AP, Bertoglio LJ. Ethological and temporal analyses of anxiety-like behavior: The elevated plus-maze model 20 years on. Neurosci Biobehav Rev. 2005;29:1193–1205.

      (9) Seibenhener ML, Wooten MC. Use of the open field maze to measure locomotor and anxiety-like behavior in mice. Journal of Visualized Experiments. 2015. 6 February 2015. https://doi.org/10.3791/52434.

      (10) Prut L, Belzung C. The open field as a paradigm to measure the effects of drugs on anxiety-like behaviors: A review. Eur J Pharmacol. 2003;463:3–33.

      (11) Chen Y, Zhou X, Chu B, Xie Q, Liu Z, Luo D, et al. Restraint Stress, Foot Shock and Corticosterone Differentially Alter Autophagy in the Rat Hippocampus, Basolateral Amygdala and Prefrontal Cortex. Neurochem Res. 2024;49:492–506.

      (12) Hassell JE, Nguyen KT, Gates CA, Lowry CA. The Impact of Stressor Exposure and Glucocorticoids on Anxiety and Fear. Curr. Top. Behav. Neurosci., vol. 43, Springer; 2019. p. 271–321.

      (13) Peng B, Xu Q, Liu J, Guo S, Borgland SL, Liu S. Corticosterone attenuates reward-seeking behavior and increases anxiety via D2 receptor signaling in ventral tegmental area dopamine neurons. Journal of Neuroscience. 2021;41:1566–1581.

      (14) Myers B, Greenwood-Van Meerveld B. Elevated corticosterone in the amygdala leads to persistant increases in anxiety-like behavior and pain sensitivity. Behavioural Brain Research. 2010;214:465–469.

      (15) Demuyser T, Deneyer L, Bentea E, Albertini G, Van Liefferinge J, Merckx E, et al. In-depth behavioral characterization of the corticosterone mouse model and the critical involvement of housing conditions. Physiol Behav. 2016;156:199–207.

      (16) Shoji H, Maeda Y, Miyakawa T. Chronic corticosterone exposure causes anxiety- and depression-related behaviors with altered gut microbial and brain metabolomic profiles in adult male C57BL/6J mice. Molecular Brain. 2024;17.

      (17) Manvich DF, Webster KA, Foster SL, Farrell MS, Ritchie JC, Porter JH, et al. The DREADD agonist clozapine N-oxide (CNO) is reverse-metabolized to clozapine and produces clozapine-like interoceptive stimulus effects in rats and mice. Sci Rep. 2018;8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      GID/CTLH-type RING ligases are huge multi-protein complexes that play an important role in protein ubiquitylation. The subunits of its core complex are distinct and form a defined structural arrangement, but there can be variations in subunit composition, such as exchange of RanBP9 and RanBP10. In this study, van gen Hassend and Schindelin provide new crystal structures of (parts of) key subunits and use those structures to elucidate the molecular details of the pairwise binding between those subunits. They identify key residues that mediate binding partner specificity. Using in vitro binding assays with purified protein, they show that altering those residues can switch specificity to a different binding partner.

      Strengths:

      This is a technically demanding study that sheds light on an interesting structural biology problem in residue-level detail. The combination of crystallization, structural modeling, and binding assays with purified mutant proteins is elegant and, in my eyes, convincing.

      Weaknesses:

      I mainly have some suggestions for further clarification, especially for a broad audience beyond the structural biology community.

      We thank the reviewer for the careful evaluation of our manuscript and for the positive and encouraging assessment of our work. We also thank the reviewer for the constructive suggestions to improve clarity for a broader audience and have revised the manuscript accordingly.

      (1) The authors establish what they call an 'engineering toolkit' for the controlled assembly of alternative compositions of the GID complex. The mutagenesis results are great for the specific questions asked in this manuscript. It would be great if they could elaborate on the more general significance of this 'toolkit' - is there anything from a technical point of view that can be generalized? Is there a biological interest in altering the ring composition for functional studies?

      We thank the reviewer for raising this important point. Beyond addressing the specific pairwise assembly mechanisms analyzed in this study, we agree that the broader significance of this engineering toolkit warrants further discussion. The residue-level understanding of CTLH-CRA interfaces not only explains assembly specificity but also enables rational manipulation of ring composition in a controlled manner. We have therefore expanded the end of the discussion section to outline generalizable strategies for CRA-interface disruption and to highlight potential biological applications of altering ring composition for functional studies.

      (2) Along the same lines, the mutagenesis required to rewire Twa1 binding was very complex (8 mutations). While this is impressive work, the 'big picture conclusion' from this part is not as clear as for the simpler RanBP9/10. It would be great if the authors could provide more context as to what this is useful for (e.g., potential for in vivo or in vitro functional studies, maybe even with clinical significance?)

      We thank the reviewer for this important comment and agree that the broader implications of the more complex Twa1 rewiring were not sufficiently emphasized in the original manuscript. Through the competition ITC experiments (Fig. 5), we aimed to demonstrate a concrete application of the Twa1 rewiring. At the same time, we recognize that additional use cases are conceivable. To address this point, we have expanded the discussion section to clarify the conceptual significance of Twa1 rewiring and briefly outline further potential applications of controlled interface manipulation. These additions aim to better contextualize the broader relevance of this approach beyond the specific mechanistic questions addressed in this study.

      (3) For many new crystal structures, the authors used truncated, fused, or otherwise modified versions of the proteins for technical reasons. It would be helpful if the authors could provide reasoning why those modifications are unlikely to change the conclusions of those experiments compared to the full-length proteins (which are challenging to work with for technical reasons). For instance, could the authors use folding prediction (AlphaFold) that incorporates information of their resolved structures and predicts the impact of the omitted parts of the proteins? The authors used AlphaFold for some aspects of the study, which could be expanded.

      We agree with the reviewer that the transferability of the domain constructs to the corresponding full-length proteins is an important consideration. In the original version of the manuscript, we addressed this point by fitting the experimentally determined CTLH-CRA domain structures of muskelin and RanBP9 into the cryo-EM maps of the full-length complexes (Fig. 5d), demonstrating that the applied truncations and fusion strategies are compatible with the architecture observed in the intact assembly. Following the reviewer’s suggestion, we have further strengthened this analysis by adding a new Supplementary Figure 1. In this figure, the experimentally determined CTLH-CRA domain structures are superposed with full-length AlphaFold predictions. This comparison shows that removal of flexible linker regions, such as those between the CTLH and CRA motifs or at terminal segments, does not alter the overall fold or the binding interfaces of the domains. Together, these analyses support the conclusion that the domain constructs faithfully represent the structural and interaction properties of the full-length proteins.

      Reviewer #2 (Public review):

      Summary:

      This is a very interesting study focusing on a remarkable oligomerization domain, the LisH-CTLH-CRA module. The module is found in a diverse set of proteins across evolution. The present manuscript focuses on the extraordinary elaboration of this domain in GID/CTLH RING E3 ubiquitin ligases, which assemble into a gigantic, highly ordered, oval-shaped megadalton complex with strict subunit specificity. The arrangement of LisH-CTLH-CRA modules from several distinct subunits is required to form the oval on the outside of the assembly, allowing functional entities to recruit and modify substrates in the center. Although previous structures had revealed that CTLH-CRA dimerization interfaces share a conserved helical architecture, the molecular rules that govern subunit pairing have not been explored. This was a daunting task in protein biochemistry that was achieved in the present study, which defines this "assembly specificity code" at the structural and residue-specific level.

      The authors used X-ray crystallography to solve high-resolution structures of mammalian CTLH-CRA domains, including RANBP9, RANBP10, TWA1, MAEA, and the heterodimeric complex between RANBP9 and MKLN. They further examined and characterized assemblies by quantitative methods (ITC and SEC-MALS) and qualitatively using nondenaturing gels. Some of their ITC measurements were particularly clever and involved competitive titrations and titrations of varying partners depending on protein behavior. The experiments allowed the authors to discover that affinities for interactions between partners are exceptionally tight, in the pM-nM range, and to distill the basis for specificity while also inferring that additional interactions beyond the LisH-CTLH-CRA modules likely also contribute to stability. Beyond discovering how the native pairings are achieved, the authors were able to use this new structural knowledge to reengineer interfaces to achieve different preferred partnerings.

      Strengths:

      Nearly everything about this work is exceptionally strong.

      (1) The question is interesting for the native complexes, and even beyond that, has potential implications for the design of novel molecular machines.

      (2) The experimental data and analyses are quantitative, rigorous, and thorough.

      (3) The paper is a great read - scholarly and really interesting.

      (4) The figures are exceptional in every possible way. They present very complex and intricate interactions with exquisite clarity. The authors are to be commended for outstanding use of color and color-coding throughout the study, including in cartoons to help track what was studied in what experiments. And the figures are also outstanding aesthetically.

      Weaknesses:

      There are no major weaknesses of note, but I can make a few recommendations for editing the text.

      We are very grateful to the reviewer for this exceptionally positive and thoughtful assessment of our work. We sincerely appreciate the recognition of both the conceptual scope and the technical depth of the study. We are particularly encouraged by the reviewer’s comments regarding the clarity and presentation of the figures. Considerable effort went into ensuring that the structural and biochemical complexity of the CTLH assemblies could be conveyed in a clear and accessible manner, and we are grateful that this was appreciated. We thank the reviewer for the constructive recommendations for textual improvements.

      Reviewer #3 (Public review):

      Summary:

      Protein complexes, like the GID/CTLH-type E3 ligase, adopt a complex three-dimensional structure, which is of functional importance. Several domains are known to be involved in shaping the complexes. Structural information based on cryo-EM is available, but its resolution does not always provide detailed information on protein-protein interactions. The work by van gen Hassend and Schindelin provides additional structural data based on crystal structures.

      Strengths:

      The work is solid and very carefully performed. It provides high-resolution insights into the domain architecture, which helps to understand the protein-protein interactions on a detailed molecular level. They also include mutant data and can thereby draw conclusions on the specificity of the domain interactions. These data are probably very helpful for others who work on a functional level with protein complexes containing these domains.

      Weaknesses:

      The manuscript contains a lot of useful, very detailed information. This information is likely very helpful to investigate functional and regulatory aspects of the protein complexes, whose assembly relies on the LisH-CTLH-CRA modules. However, this goes beyond the scope of this manuscript.

      We thank the reviewer for the detailed review of our manuscript and for the constructive and positive remarks. We greatly appreciate the recognition of the high-resolution structural insights and the value of combining crystallographic data with mutational analyses to elucidate domain-specific interactions. We are also grateful for the acknowledgment that these findings may serve as a useful resource for future functional and regulatory studies of LisH-CTLH-CRA-containing complexes. While such aspects extend beyond the immediate scope of the present study, we hope that the structural framework provided here will facilitate and inspire future investigations addressing these questions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) For the ITC measurements that are less accurate, the authors may want to represent that in the figures with an approximate sign.

      We thank the reviewer for this helpful suggestion. After consideration, we decided not to introduce an approximate sign in the main figures, as this would be inconsistent with the graphical conventions used throughout the manuscript (there is also no equal sign). Since the associated errors are reported directly alongside each K<sub>D</sub> value, we believe that the precision of the measurements is sufficiently conveyed. However, we agree that explicitly marking estimated values can be appropriate in specific cases. We have therefore added approximate signs in Supplementary Fig. 5 for the K<sub>D</sub> estimation of self-association.

      (2) The names of the proteins are from mammals and should probably be capitalized.

      We agree that capitalization is generally appropriate for mammalian protein names. In particular, for proteins such as Rmnd5a, which is identical in sequence between mouse and human, the use of human-style nomenclature would indeed be fully justified. Originally, we chose the current nomenclature to distinguish the proteins studied here from strictly human versions, as most constructs are derived from mouse and one (muskelin) from rat. This approach also avoids inconsistencies between the mouse and rat proteins within the manuscript and maintains alignment with the nomenclature used in our previous publications. For the sake of consistency and continuity, we have therefore retained the original formatting throughout the manuscript.

      (3) For the sequence alignments, it would be good to specify in the legend which organisms these are from, and where the differences are in mouse and rat proteins used in the study, and the human proteins.

      We appreciate this constructive suggestion. We have revised the sequence alignment legends to clearly specify the organism of origin for each sequence included in the analysis. In addition, we have added a new Supplementary Figure 1 presenting the AlphaFold predictions of the mouse proteins and rat muskelin used in this study. Within these models, sequence differences relative to the human proteins are indicated, and variations within the CTLH-CRA domains are explicitly annotated. These additions clarify how the constructs analyzed here relate to their human counterparts.

      (4) A few points about the referencing:

      (a) It was reference 27 that first described the dual-sided interactions where the CRA domain weaves back and forth such that CTLH-CRA<sup>N</sup> and LisH-CRA<sup>C</sup> mediate the contacts on the two sides. This should be cited.

      We fully agree and added the reference accordingly.

      (b) To this reviewer's knowledge, it was references 13 and 9 that resolved the daisy-chain of helical LisH-CTLH-CRA interactions around the oval helical structures.

      We agree with the reviewer that references 13 and 9 resolved the helical LisH-CTLH-CRA daisy-chain arrangement around the oval structure. Reference 13 was already cited in the original manuscript, and we have now added reference 9 to appropriately acknowledge this contribution. We have retained reference 14, although it did not resolve the helical daisy-chain architecture, as it described a related oval assembly of CTLH complex components that remains relevant in the structural context discussed.

      (c) A cryo-EM map with RANBP10 was shown at low resolution in reference 8.

      We agree with the reviewer that a low-resolution cryo-EM map including RANBP10 was reported in reference 8. Our original wording was not sufficiently precise and may have given the impression that RANBP10 had not been characterized. Our intention was to convey that, although cryo-EM maps exist, detailed atomic-level information on subunit interfaces was lacking. We have revised the paragraph accordingly to clarify this point and now cite reference 8 explicitly in this context.

      (d) The Discussion requires referencing.

      We agree with the reviewer that additional referencing improves the clarity and contextualization of the Discussion. We have revised the Discussion section accordingly and added appropriate references to support the statements made.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors focus on the quantification of spike time uncertainties in simulated data and in data recorded at a high sampling rate in cerebellar slices with GCaMP8f, and they demonstrate the high temporal precision that can be achieved with their method to estimate spike timing.

      Strengths:

      - The authors provide solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al. and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - Although the algorithm is compared (in the revised manuscript) to other models to infer individual spikes (e.g., MLSpike), these comparisons could be more comprehensive. Future work that benchmarks this and other algorithms under varying conditions (e.g., noise levels, temporal resolution, calcium indicators) would help assess and confirm the robustness and usability of this algorithm.

      The metrics used for comparison follow the field's benchmarking conventions (see the CASCADE paper, Rupprecht et al. 2021). Developing improved standardized benchmarks would be ideal, but this is beyond the scope of this manuscript.

      - The mathematical complexity underlying the method may pose challenges for experimentalists who may want to use the method for their analyses. While this is not a weakness of the approach itself, it highlights the need for further validation and benchmarking in future work, to build user confidence.

      We acknowledge the challenges of understanding the mathematics underlying our method, but such a study is necessary to ensure its accuracy and reliability. Indeed, we will strive to improve the technique's user-friendliness in future instantiations.

      Reviewer #2 (Public review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity, but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the github repository is well-organized.

      Weaknesses:

      On the other hand, the accuracy of spike train reconstructions is not higher than that of other model-based approaches, and clearly lower than the accuracy of a model-independent approach based on a deep network. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz).

      In the revision, Figure 9 shows that temporal accuracy is very similar between PGBAR and the supervised method, CASCADE, and that PGBAR has a lower false positive rate. These results support the effectiveness of unsupervised Monte Carlo sampling, even with a simple autoregressive model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I'd like to thank the authors for their revisions. Their comments have addressed all my concerns, and I thank them for the clarifications. I have no further comments, except a few minor notes that the authors may consider or not:

      - The paragraph starting in line 367 is newly written and not yet as clear and mature as other parts of the manuscript. In several sentences it is roughly clear what is meant, but the wording lacks precision. For example, "distributions of the average time from ground-truth" seems a bit unclear, maybe "distributions of the average time of estimated spikes from ground-truth spikes" instead. Similarly, "the false detection rate, defined as the difference between detected and ground-truth spikes ..." could be rephrased using the difference between "numbers of spikes" instead of the difference between "spikes". But all of this is minor.

      - In the new Figure 9A, the error bars for the MLSpike method seem to be absent. In the same figure legend, it should be "excess" instead of "excess".

      We thank the reviewer for the feedback. We revised the wording of the new paragraph in response to the reviewer’s suggestions, restored the missing error bar in Figure 9, and corrected the figure legend.
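
      As an illustration of the two quantities whose wording is discussed above, the following is a minimal sketch, not the manuscript's implementation. It assumes that the false-detection measure is the raw difference between the number of detected and the number of ground-truth spikes (the quoted definition is truncated, so any normalization is unknown), and that the timing error is the mean absolute distance from each estimated spike to its nearest ground-truth spike. The function names `nearest_distance` and `spike_metrics` are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(t, sorted_times):
    """Absolute distance from time t to the closest entry of sorted_times."""
    i = bisect_left(sorted_times, t)
    # Only the neighbours just below and just above t can be the closest.
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(abs(t - c) for c in candidates)


def spike_metrics(detected, ground_truth):
    """Return (excess_spikes, mean_time_offset) for one recording.

    excess_spikes: number of detected spikes minus number of ground-truth
        spikes (assumed, unnormalized reading of the false-detection measure).
    mean_time_offset: average absolute time between each detected spike and
        its nearest ground-truth spike, or None if nothing was detected.
    """
    truth = sorted(ground_truth)
    if not truth:
        raise ValueError("ground_truth must contain at least one spike time")
    excess = len(detected) - len(truth)
    if not detected:
        return excess, None
    offsets = [nearest_distance(t, truth) for t in detected]
    return excess, sum(offsets) / len(offsets)


if __name__ == "__main__":
    # Spike times in seconds; values are made up for illustration.
    print(spike_metrics([0.12, 0.49, 0.90], [0.10, 0.50]))
```

      Phrasing the first quantity as a difference between spike counts, as the reviewer suggests, makes the sign convention explicit: positive values indicate excess detections and negative values indicate missed spikes.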

      Reviewer #2 (Recommendations for the authors):

      Comparison to CASCADE: as far as I know there are no CASCADE models that have been trained on ground truth data in the regime of very fast (line scan) sampling, which is rarely used. A fair comparison of spike time estimates between PGBAR and CASCADE should take this into account. This can be done by training a new CASCADE model using the dataset of this paper. Given that performance of PGBAR and CASCADE is very similar already now (except for the false positive rate), a CASCADE model optimized for high sampling rate may be expected to catch up with (or even exceed) the performance of PGBAR. At a minimum, this possibility should be discussed.

      While this may be true, retraining a CASCADE model on high-frequency ground-truth data is beyond the scope of this manuscript. Indeed, a retrained CASCADE model optimized for line-scan or GCaMP8f data could improve performance and potentially match or exceed PGBAR, particularly in reducing false positives.

      Our aim, however, is not to benchmark supervised methods under their optimal retraining conditions, but to provide an unsupervised alternative that does not rely on labeled training data. In practice, retraining supervised models is constrained by the availability of suitable ground-truth datasets and by the uncertainty in how the method generalizes to acquisition regimes that differ substantially from the training set.

      We have therefore added a sentence in the Discussion (at the end of the subsection Comparison with benchmark datasets):

      [...] “While retraining supervised methods such as CASCADE on high-frequency or GCaMP8f ground-truth datasets could further improve their performance, limitations in dataset availability and generalization across acquisition regimes motivate complementary, training-free approaches such as PGBAR.”

      As stated in the manuscript, future extensions, such as using nonlinear biophysical models as the generative model for Monte Carlo–based inference, may further improve spike estimation accuracy.

    1. Author response:

      We thank the reviewing editor and the reviewers for their careful evaluation of our manuscript “Early sleep dependent sensory gating in the olfactory system”, and for their constructive feedback. We are encouraged by the overall positive assessment of the work.

      In the revised version, we will address all the points raised by the reviewers. Below, we outline the main aspects of the revision.

      (1) Contextualization within prior literature.

      We will expand the text to better situate our findings within the existing literature and clarify the specific contribution of our work, particularly with respect to state dependent changes in olfactory bulb activity.

      (2) Distinction between sleep and urethane anaesthesia.

      We will revise the text to more clearly distinguish findings obtained during natural sleep from those obtained under urethane anaesthesia. While avoiding direct equivalence between states, we will clarify that the comparison is intended to highlight shared features of slow wave brain dynamics associated with sensory gating.

      (3) Clarification of analytical methods and statistical criteria.

      We will provide additional details regarding normalisation procedures, surrogate based analysis, and statistical criteria used to assess the presence or absence of coherence and phase amplitude coupling, ensuring consistency across figures.

      (4) Improvements in figures and terminology.

      We will revise figure annotations to improve clarity (axis, colour scales, units and labelling) and ensure consistent terminology throughout the manuscript.

      We believe these revisions will further strengthen the manuscript while preserving its central conclusions.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public Review):

      Strengths

      (1) The definition of highly variable yet highly reproducible sulci such as the slocs-v feeds the community with new anatomo-functional landmarks (which is emphasized by the provision of a probability map in supp. mat., which in my opinion should be proposed in the main body).

      We agree with Reviewer 2 that there is merit to including the probability maps as a main text Figure rather than Supplementary Figure. We have now added it to the main text.

      Weaknesses

      (1) While the identification of the sulci has been done thoroughly with expert validation, the sulci have not been labeled in a way that enables the demonstration of the reproducibility of the labeling.

      Our group was unable to use an approach amenable to calculating inter-rater agreements to expedite the process of defining thousands of sulci at the individual level in multiple regions as this was our first study comprehensively documenting the sulcal organization of this region. Nevertheless, our method followed a rigorous, three-tiered procedure to ensure accurate sulcal definitions were identified in all participants. In the case of this study, authors YT and TG first defined sulci. These sulci were then checked by a trained expert (EHW). Finally, sulcal definitions were finalized by the senior author, an expert neuroanatomist (KSW). We emphasize that this process has produced reproducible anatomical results when charting other regions such as posteromedial cortex (Willbrand et al., 2023 Science Advances; Willbrand et al., 2023 Communications Biology; Maboudian et al., 2024 The Journal of Neuroscience; Ramos Benitez et al., 2024 Neuropsychologia), ventral temporal cortex (Miller et al., 2020 Scientific Reports; Parker et al., 2023 Brain Structure and Function), and lateral prefrontal cortex (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Brain Structure and Function; Willbrand et al., 2023 The Journal of Neuroscience; Willbrand et al., 2024 Brain Structure and Function) across age groups, species, and clinical populations. For the present study, by the time the final tier of our method was reached, we emphasize that a very small percentage (~2%) of sulcal definitions were actually modified. We will include an exact percentage in future publications in LPC/LPOJ.

      Our Methods have been edited to describe these features (Pages 21-22):

      “As this is the first time the sulcal expanse of LPC/LPOJ was comprehensively charted with a focus on pTS, the location of each sulcus was confirmed through a three-tiered procedure for each participant in each hemisphere. First, trained independent raters (Y.T. and T.G.) identified sulci. Second, these definitions were checked by a trained expert (E.H.W.). Third, these labels were finalized by a neuroanatomist (K.S.W.). We emphasize that this procedure has produced reproducible results in our prior work across the cortex (Miller et al. 2021; Voorhies et al. 2021; Yao et al. 2022; Willbrand et al. 2023; Willbrand et al. 2022; Willbrand et al. 2024; Parker et al. 2023; Miller et al. 2020; Willbrand et al. 2022; Willbrand et al. 2023; Maboudian et al. 2024; Ramos Benitez et al. 2024). All LPC sulci were then manually defined and saved as .label files in FreeSurfer using tksurfer tools, from which morphological and anatomical features were extracted. We defined LPC/LPOJ sulci for each participant based on the most recent schematics of sulcal patterning by Petrides (2019) as well as pial, inflated, and smoothed white matter (smoothwm) FreeSurfer cortical surface reconstructions of each individual. In some cases, the precise start or end point of a sulcus can be difficult to determine on a surface (Borne et al., 2020); however, examining consensus across multiple surfaces allowed us to clearly determine each sulcal boundary in each individual. For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres). The specific criteria to identify the slocs and pAngs are outlined in Fig. 1b.”

      Reviewer #3 (Public Review):

      Weaknesses

      (1) The numbers of subjects are inherently limited both in number as well as in being typically developing young adults.

      First, although the sample size of the present study is small in number in comparison to large N, group-level neuroimaging analyses, it is comparable to precision neuroimaging studies examining sulcal features in individual participants (for example, Cachia et al., 2021 Frontiers in Neuroanatomy; Garrison et al., 2015 Nature Communications; Lopez-Persem et al., 2019 The Journal of Neuroscience; Miller et al., 2021 The Journal of Neuroscience; Roell et al., 2021 Developmental Cognitive Neuroscience; Voorhies et al., 2021 Nature Communications; Weiner, 2019 The Anatomical Record; Willbrand, et al., 2022 Science Advances; Willbrand, et al., 2022 Brain Structure & Function; Yao et al., 2022 Cerebral Cortex). We discuss this point in detail in the Limitations subsection of the Discussion (Page 17):

      “This manual method is also arduous and time-consuming, which, on the one hand, limits the sample size in terms of number of participants, while on the other, results in thousands of precisely defined sulci. This push-pull relationship reflects a broader conversation in the human brain mapping and cognitive neuroscience fields between a balance of large N studies and “precision imaging” studies in individual participants (Gratton et al., 2022; Naselaris et al., 2021; Rosenberg and Finn, 2022). Though our sample size is comparable to other studies that produced reliable results relating sulcal morphology to brain function and cognition (for example, Cachia et al., 2021; Garrison et al., 2015; Lopez-Persem et al., 2019; Miller et al., 2021; Roell et al., 2021; Voorhies et al., 2021; Weiner, 2019; Willbrand et al., 2022a, 2022b; Yao et al., 2022), ongoing work that uses deep learning algorithms to automatically define sulci should result in much larger sample sizes in future studies (Borne et al., 2020; Lee et al., 2024, 2025; Lyu et al., 2021). The time-consuming manual definitions of primary, secondary, and PTS also limit the cortical expanse explored in each study, thus restricting the present study to LPC/LPOJ.”

      Second, we used a young adult sample, as this is the standard in the field when charting features of sulci for the first time (for example, Paus et al., 1996 Cerebral Cortex; Chiavaras & Petrides, 2000 Journal of Comparative Neurology; Segal & Petrides, 2012 European Journal of Neuroscience; Zlatkina & Petrides, 2014 Proceedings of the Royal Society B Biological Science; Sprung-Much & Petrides, 2018 Brain Structure & Function; Miller et al., 2021 The Journal of Neuroscience; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 Communications Biology; Drudik et al., 2023 Cerebral Cortex). Nevertheless, it is indeed crucial to confirm that this schematic is translatable to other age groups; however, this exploration is beyond the scope of the present project and is left for future investigation. We have added text to the Limitations subsection of the Discussion to emphasize these points (Pages 17-18):

      “Additionally, the scope of the present study is limited in that the sample was only in young adults. This sample was selected as it is the standard of the field when charting features of sulci for the first time (for example, Paus et al. 1996; Chiavaras and Petrides 2000; Segal and Petrides 2012; Zlatkina and Petrides 2014; Sprung-Much and Petrides 2018; Miller et al. 2021; Willbrand et al. 2022; Willbrand et al. 2023; Drudik et al. 2023). Nevertheless, it is necessary to explore how well this updated schematic translates to different age groups, species, and clinical populations.”

      Finally, it is worth mentioning that we have begun preliminary analyses on the translatability of this schematic, and have shown that it does hold in a pediatric sample (ages 6-18 years old; Author response image 1).

      Author response image 1.

      Example pediatric participant with all LPC/LPOJ sulci identified in both hemispheres. Incidence rates for the variable pTS identified in the present work in a pediatric sample are included below (N = 79 participants).

      (2) While the paper begins by describing four new sulci, only one is explored further in greater detail.

      We focused on the slocs-v as it has a high incidence rate, making it amenable to our analytic pipelines relating sulci to cortical morphology, architecture, and function, as well as cognition (Miller et al., 2021 The Journal of Neuroscience; Voorhies et al., 2021 Nature Communications; Yao et al., 2022 Cerebral Cortex; Willbrand et al., 2022 Science Advances; Willbrand et al., 2023 The Journal of Neuroscience; Maboudian et al., 2024 The Journal of Neuroscience). However, we want to emphasize that throughout the paper there are multiple analyses that further describe the three more variable sulci: 1) detailing their sulcal patterning (Supplementary Tables 1-4) and 2) detailing their morphology and architecture (Supplementary Fig. 6). We do agree though that it is a worthwhile endeavor to further describe these sulci—especially if the data is readily available. As such, to complement our behavioral analysis identifying a relationship between the morphology of the consistent sulci and spatial orientation and considering the well-documented relationship between sulcal incidence and cognition (for review see Cachia et al., 2021 Frontiers in Neuroanatomy), we tested whether the number of variable sulci and the incidence of each variable sulcus specifically were related to spatial orientation. This procedure produced null results on all neuroanatomical variables, which we now mention in the Results (Page 11):

      “Finally, as in prior work examining variably-present PTS in other cortical expanses (for example, Amiez et al., 2018; Cachia et al., 2014; Fornito et al., 2004; Willbrand et al., 2024b), we assessed whether the presence/absence of the more variable PTS identified in the present work (slocs-d, pAngs-v, and pAngs-d) was related to spatial orientation, reasoning, and processing speed task performance. We identified no significant associations between the presence/absence of these sulci in either hemisphere with performance on these tests (ps > .05).”

      (3) There is some tension between calling the discovered sulci new vs acknowledging they have already been reported, but not named.

      To resolve this tension, we have revised the text to 1) ensure proper acknowledgment that sulci have been noticed in this region, 2) point out that these sulci were left unnamed and undescribed, and 3) emphasize that one of the primary goals of this project was to comprehensively detail the sulcal organization of this region at a precise, individual-level considering these often-overlooked sulci.

      This is primarily done at the beginning of the Results (Pages 4-5), where we now write:

      “Four previously undescribed small and shallow sulci in the lateral parieto-occipital junction (LPOJ)

      In previous research in small sample sizes, neuroanatomists noticed shallow sulci in this cortical expanse, but did not describe them beyond including an unlabeled sulcus in their schematic at best (Supplementary Methods and Supplementary Figs. 1-4 for historical details). In the present study, we fully update this sulcal landscape considering these overlooked indentations. In addition to defining the 13 sulci previously described within the LPC/LPOJ, as well as the posterior superior temporal cortex in individual participants (Methods) (Petrides, 2019), we could also identify as many as four small and shallow PTS situated within the LPC/LPOJ that were highly variable across individuals and left undescribed until now (Supplementary Methods and Supplementary Figs. 1-4). Though we officially name and characterize features of these sulci in this paper for the first time, it is necessary to note that the location of these four sulci is consistent with the presence of variable “accessory sulci” in this cortical expanse mentioned in prior modern and classic studies (Supplementary Methods). For four example hemispheres with these 13-17 sulci identified, see Fig. 1a (Supplementary Fig. 5 for all hemispheres).”

      (4) The anatomy of the sulci, as opposed to their relation to other sulci, could be described in greater detail.

      To detail these sulci above and beyond their relation to other sulci, we document the anatomical metrics of all sulci in Supplemental Figure 6:

      Results (Page 8):

      The morphological and architectural features of all LPC/LPOJ sulci are described in Supplementary Fig. 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates how neuropeptidergic signaling affects sleep regulation in Drosophila larvae. The authors first conduct a screen of CRISPR knock-out lines of genes encoding enzymes or receptors for neuropeptides and monoamines. As a result of this screen, the authors follow up on one hit, the hugin receptor, PK2-R1. They use genetic approaches, including mutants and targeted manipulations of PK2-R1 activity in insulin-producing cells (IPCs) to increase total sleep amounts in 2nd instar larvae. Similarly, dilp3 and dilp5 null mutants and genetic silencing of IPCs show increases in sleep. The authors also show that hugin mutants and thermogenetic/optogenetic activation of hugin-expressing neurons caused reductions in sleep. Furthermore, they show through imaging-based approaches that hugin-expressing neurons activate IPCs. A key finding is that wash-on of hugin peptides, Hug-γ and PK-2, in ex vivo brain preparations activates larval IPCs, as assayed by CRTC::GFP imaging. The authors then examine how the PK2-R1, hugin, and IPC manipulations affect adult sleep. Finally, the authors examine how Ca2+ responses through CRTC::GFP imaging in adult IPCs are influenced by the wash-on of hugin peptides. The conclusions of this paper are somewhat well supported by data, but some aspects of the experimental approach and sleep analysis need to be clarified and extended.

      Strengths:

      (1) This paper builds on previously published studies that examine Drosophila larval sleep regulation. Through the power of Drosophila genetics, this study yields additional insights into what role neuropeptides play in the regulation of Drosophila larval sleep.

      (2) This study utilizes several diverse approaches to examine larval and adult sleep regulation, neural activity, and circuit connections. The impressive array of distinct analyses provides new understanding into how Drosophila sleep-wake circuitry is regulated across the lifespan.

      (3) The imaging approaches used to examine IPC activation upon hugin manipulation (either thermogenetic activation or wash-on of peptides) demonstrate a powerful approach for examining how changes in neuropeptidergic signaling affect downstream neurons. These experiments involve precise manipulations as the authors use both in vivo and ex vivo conditions to observe an effect on IPC activity.

      Weaknesses:

      Although the paper does have some strengths in principle, these strengths are not fully supported by the experimental approaches used by the authors. In particular:

      (1) The authors show total sleep amount over an 18-hour period for all the measures of 2nd instar larval sleep throughout the paper. However, published studies have shown that sleep changes over the course of 2nd instar development, so more precise time windows are necessary for the analyses in this study.

      (2) Previously published reports of sleep metrics in both Drosophila larvae and adults include the average number of sleep episodes (bout number) and the average length of sleep episodes (bout length). Neither of these metrics is included in the paper for either the larval sleep or adult sleep data. Not including these metrics makes it difficult for readers to compare the findings in this study to previously published papers in the established Drosophila sleep literature.

      (3) Because Drosophila adult & larval sleep is based on locomotion, the authors need to show the activity values for the experiments supporting their key conclusions. They do show travel distances in Figure 2 - Figure Supplement 1, however, it is not clear how these distances were calculated or how the distances relate to the overall activity of individual larvae during sleep experiments. It is also concerning that inactivation of the PK2-R1-expressing neurons causes a reduction in locomotion speed. This could partially explain the increase in sleep that they observe.

      (4) The authors rely on homozygous mutant larvae and adult flies to support many of their conclusions. They also rely on Gal4 lines with fairly broad expression in the Drosophila brain to support their conclusions. Adding more precise tissue-specific manipulations, including thermogenetic activation and inhibition of smaller populations of neurons in the study would be needed to increase confidence in the presented results. Similarly, demonstrating that larval development and feeding are not affected by the broad manipulations would strengthen the conclusions.

      (5) Many of the experiments presented in this study would benefit from genetic and temperature controls. These controls would increase confidence in the presented results.

      (6) The authors claim that their findings in larvae uncover the circuit basis for larval sleep regulation. However, there is very little comparison to published studies demonstrating that neuropeptides like Dh44 regulate larval sleep. Because hugin-expressing neurons have been shown to be downstream of Dh44 neurons, the authors need to include this as part of their discussion. The authors also do not explain why other neuropeptides in the initial screen are not pursued in the study. Given the effect that these manipulations have on larval sleep in their initial screen, it seems likely that other neuropeptidergic circuits regulate larval sleep.

      We thank Reviewer #1 for the constructive comments. Following the suggestions, we have compared the relative sleep amounts of wild-type controls and Hugin/PK2-R1/IPCs mutants/manipulations between 6-hour and 18-hour periods in the 2nd instar larval stage and found consistent sleep phenotypes. We have also shown the sleep metrics data for larvae and adults. We have included additional data on locomotion and feeding behavior in wild-type controls and Hugin/PK2-R1/IPCs mutants/manipulations, which suggest that the sleep phenotypes of Hugin/PK2-R1/IPCs mutants/manipulations are unlikely to be explained by changes in locomotion or feeding behavior. As pointed out, our study could not exclude the possibility that, in addition to the Hugin/PK2-R1/IPCs axis, other pathways including DH44 could act in larval sleep control. We have included these points in the Discussion. Please see the point-by-point responses for details.
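
      For readers unfamiliar with the sleep metrics requested in point (2), bout number and bout length can be derived from a binarized sleep record as in the following minimal sketch. It is not the authors' analysis pipeline: it assumes that each time bin has already been scored as asleep or awake upstream (the scoring criteria are not specified here), and the function name `bout_metrics` and the `bin_minutes` parameter are hypothetical.

```python
from itertools import groupby


def bout_metrics(asleep, bin_minutes=1.0):
    """Summarize a per-bin sleep record.

    asleep: sequence of truthy/falsy values, one per time bin
        (truthy = scored as asleep; the scoring itself is assumed done upstream).
    bin_minutes: duration of one bin in minutes (hypothetical default).

    Returns total sleep, number of sleep bouts, and mean bout length,
    where a bout is a maximal run of consecutive asleep bins.
    """
    states = [bool(x) for x in asleep]
    lengths = [
        sum(1 for _ in run) * bin_minutes
        for state, run in groupby(states)
        if state
    ]
    bout_number = len(lengths)
    total_sleep = sum(lengths)
    mean_bout_length = total_sleep / bout_number if bout_number else 0.0
    return {
        "total_sleep": total_sleep,
        "bout_number": bout_number,
        "mean_bout_length": mean_bout_length,
    }


if __name__ == "__main__":
    # Made-up record: three bouts lasting 2, 3 and 1 bins.
    print(bout_metrics([0, 1, 1, 0, 0, 1, 1, 1, 0, 1]))
```

      Reporting these values alongside total sleep helps distinguish a change in how often sleep is initiated (bout number) from a change in how long it is maintained (bout length).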

      Reviewer #2 (Public review):

      Summary:

      This study examines larval sleep patterns and compares them to sleep regulation in adult flies. The authors demonstrate hallmark sleep characteristics in larvae, including sleep rebound and increased arousal thresholds. Through genetic and behavioral analyses, they identify PK2-R1 as a key receptor involved in sleep modulation, likely via the HuginPC-IPC signaling pathway. Loss of PK2-R1 results in increased sleep, which aligns with previous findings in hugin knockout mutants. While the study presents significant contributions to the field, further investigation is needed to address discrepancies with earlier research and strengthen mechanistic claims.

      Strengths:

      (1) The study explores a relatively understudied aspect of sleep regulation, focusing on larval development.

      (2) The use of an automated behavioral measurement system ensures precise quantification of sleep patterns.

      (3) The findings provide strong genetic and behavioral evidence supporting the role of the HuginPC-IPC pathway in sleep regulation.

      (4) The study has broader implications for understanding the evolution and functional divergence of sleep circuits.

      Weaknesses:

      (1) The manuscript does not sufficiently discuss previous studies, particularly concerning hugin mutants and their metabolic effects.

      (2) The specificity of IPC secretion mechanisms is unclear, particularly regarding potential indirect effects on Dilp2.

      (3) Alternative circuits, such as the HuginPC-DH44 pathway, require further consideration.

      (4) Functional connectivity between HuginPC neurons and IPCs is not directly validated.

      (5) Developmental differences in sleep regulatory mechanisms are not thoroughly examined.

      We thank Reviewer #2 for the positive comments. As suggested, our study could not exclude the possibility that in addition to the Hugin/PK2-R1/IPCs axis, alternative pathways including the Hugin/DH44 axis could contribute to sleep control in larvae. We have added these points in Discussion. We also have added additional data to show mechanistic differences of larval and adult sleep control. Please see point-to-point responses for details.

      Reviewer #3 (Public review):

      Summary:

      Sleep affects cognition and metabolism, evolving throughout development. In mammals, infants have fast sleep-wake cycles that stabilize in adults via circadian regulation. In this study, the authors performed a genetic screen for neurotransmitters/peptides regulating sleep and identified the neuropeptide Hugin and its receptor PK2-R1 as essential components for sleep in Drosophila larvae. They showed that IPCs express PK2-R1 and silencing IPCs resulted in a significant increase in the sleep amount, which was consistent with the effect they observed in PK2-R1 knock-out mutants. They also showed that Hugin peptides, secreted by a subset of Hugin neurons (Hug-PC), activate IPCs through the PK2-R1 receptor. This activation prompts IPCs to release insulin-like peptides (Dilps), which are implicated in the modulation of sleep. They showed that Hugin peptides induce a PK2-R1 dependent calcium (Ca²⁺) increase in IPCs, which they linked to the release of Dilp3, showing a connection between Hugin signaling to IPCs, Dilp3 release, and sleep regulation. Additionally, the activation of Hug-PC neurons reduced sleep amounts, while silencing them had the opposite effect. In contrast to the larval stage, the Hugin/PK2-R1 axis was not critical for sleep regulation in Drosophila adults, suggesting that this neuropeptidergic circuitry has divergent roles in sleep regulation across different stages of development.

      Strengths:

      This study used an updated system for sleep quantification in Drosophila larvae, and this method allowed precise measurement of larval sleep patterns which is essential for the understanding of sleep regulation.

      The authors performed unbiased genetics screening and successfully identified novel regulators for larval sleep, Hugin and its receptor PK2-R1, making a substantial contribution to the understanding of neuropeptidergic control of sleep regulation.

      They clearly demonstrated the mechanism by which Hugin-expressing neurons influence sleep through the activation of IPCs via PK2-R1 with Ca²⁺ responses and can modulate sleep.

      Based on the demonstrated activation of PK2-R1 by the human Hugin orthologue Neuromedin U, research on human sleep disorders may benefit from the discoveries from Drosophila since sleep-regulating mechanisms are conserved across species.

      Weaknesses:

      The study primarily focused on sleep regulation in Drosophila larvae, showing that the Hugin/PK2-R1 axis is critical for larval sleep but not necessary for adult sleep. The effects of the Hugin axis in the adult are, however, incompletely explained and somewhat inconsistent. PK2-R1 knockout adults also display increased sleep, as does HugPC silencing, at least for daytime sleep. The difference lies in Dilp3/5 mutant animals showing decreased sleep and IPCs seemingly responding with reduced Dilp3 release to PK-2 treatment (Figure 6). It seems difficult to reconcile the authors' conclusions regarding this point without additional data. It could be argued that PK2-R1 still regulates adult sleep, but not via Hugin and IPCs/Dilps.

      Another issue might be that the authors show relative sleep levels for adults using Trikinetics monitoring. From the methods, it is not clear if the authors backcrossed their line to an isogenic wild-type background to normalize for line-specific effects on sleep. Thus, it is likely that each line has differences in total sleep time due to background effects, e.g., their Kir2.1 control line showed reduced sleep relative to the compared genotypes. This might limit the conclusions on the role of Hugin/PK2-R1 on adult sleep.

      We thank Reviewer #3 for the valuable comments. Following the suggestions, we have included additional data on adult sleep phenotypes with IPCs/Dilps and HugPC/PK-2 manipulations. We believe that these additional data further support the idea that the Hugin/PK2-R1/IPCs axis acts differently in larval and adult sleep control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Show all data as individual data points in the graphs. The use of box-and-whisker plots makes it difficult to determine how much variation there is in each experiment.

      Following the comments, we have changed all graphs to dot-and-whisker plots (Figures 1–6; Figure 1—figure supplements 2–4; Figure 2—figure supplement 1; Figure 3—figure supplement 1 and 3; Figure 5—figure supplement 1; and Figure 6—figure supplements 1 and 3).

      (2) Show all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6-hour period of 2nd instar development. Larval sleep changes over the course of 2nd instar development so showing an 18-hour period is not as informative for the different manipulations in the study. This also allows for a more thorough comparison to Szuperak et al 2018.

      Following the comments, we now show all larval sleep metrics (total sleep duration, bout #, bout length, & activity) over the first 6 hours for PK2-R1 KO mutants (Figure 1-figure supplement 5). These PK2-R1 mutant phenotypes are consistent with those described by our sleep amount data over an 18 hr period (Figure 1-figure supplement 5). We thus consistently show all the sleep phenotype data in the 18 hr period window in the 2nd instar larvae in this paper.

      (3) Show activity values for every experiment. Behavior is based on locomotion, so there is a need to show that larvae in each manipulation do not have locomotive defects.

      Following the reviewer’s comments, we now show the activity values for each experiment (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). These data indicate that the changes in sleep amounts in each manipulation are not solely due to locomotion alterations. We have thus added the sentence below at lines 151-156 in the manuscript.

      Locomotion changes were not consistently observed upon either activation or suppression of Hug neurons (Figure 3—figure supplement 1), suggesting that changes in sleep amounts is unrelated to locomotor alterations.

      (4) Provide additional explanation as to why PK2-R1 was pursued in the study. There are several candidates in Figure 1 - Figure Supplement 4 (like sNPF-Gal4, Dh31-Gal4, and Dsk-Gal4) that have effects on sleep. These have also not been studied in the context of larval sleep regulation.

      Following the reviewer’s comments, we have added the following sentences at lines 108-114 in the manuscript.

      The role of PK2-R1 in larval sleep, on the other hand, has been unknown to date. Given its strong expression in insulin-producing cells (Schlegel et al., 2016) and its function as a receptor for the neuropeptide Hugin, which modulates feeding (Schoofs et al., 2014), we hypothesized that PK2-R1 might mediate neuropeptidergic signaling that links metabolic and sleep regulation during development. We thus focused on this gene as a candidate connecting behavioral and endocrine sleep control.

      (5) Insulin manipulations are known to disrupt Drosophila development (Rulifson et al, 2002). Therefore, it would be beneficial to show that larvae develop normally in dilp3 and dilp5 mutants by examining the time to pupal formation in these mutants compared to controls. If the mutant larvae take longer to reach the pupal stage, how do the authors know that the 2nd instar control and mutant larvae are the same developmental age? As indicated above, the developmental age of larvae does affect the total amount of sleep, so this could affect the authors' conclusions.

      We agree that this is an important point in this study. In each experiment, we carefully checked the developmental stage of larvae progeny by mouth hook analysis and measuring larval size and used only larvae with characteristics comparable to wildtype 2nd instar larvae. We have added these descriptions in Methods (line 411–416).

      (6) Figure 1 data is only supported by homozygous mutants & 1 fairly-broadly expressed Gal4 driver. The authors need to show that inactivation of PK2-R1 neurons with more tissue-restrictive Gal4 driver lines has the same effect as the other manipulations to further support the conclusions. Examining sleep in activation of PK2-R1 neurons with the broadly expressed Gal4 driver & UAS-TrpA1 would also provide better support for the conclusions.

      We agree. Indeed, we tried to narrow the expression down to small subsets of neurons using multiple different Gal4 drivers, but unfortunately, we did not obtain potential candidates.

      Therefore, although our data show that the Hugin/PK2-R1 axis contributes to sleep control in larvae, we cannot rule out the possibility that other axes could also function in larval sleep control. We mentioned this point in the original version of the submitted manuscript (lines 134-137).

      (7) Provide more explanation as to how your methods of defining sleep compare/contrast to published papers. It is not clear how many frames = 1 sec in your recordings. The definition of sleep as 12 frames needs to include a time component as well. This allows for easier comparison to other published papers examining Drosophila larval sleep (Szuperak et al 2018; Churgin et al 2019; Poe et al 2023; Poe et al 2024).

      Our recordings were acquired at 0.87 frames per second. We have added this information in Methods (line 431).
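      For readers comparing thresholds across studies, the frame-based criterion can be converted into a duration. The sketch below is only illustrative: it takes the 0.87 frames per second stated above and the 12-frame figure quoted in the reviewer's comment, and it assumes that 12 frames correspond to 12 frame periods. The helper name `frames_to_seconds` is hypothetical and is not taken from the authors' analysis code.

```python
# Hypothetical helper (not from the authors' pipeline): convert a frame-count
# threshold into seconds for a given acquisition rate.

FRAME_RATE_HZ = 0.87          # acquisition rate stated in the response
SLEEP_THRESHOLD_FRAMES = 12   # frame threshold quoted in the reviewer's comment


def frames_to_seconds(n_frames: int, frame_rate_hz: float) -> float:
    """Return the duration in seconds spanned by n_frames frame periods."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return n_frames / frame_rate_hz


threshold_s = frames_to_seconds(SLEEP_THRESHOLD_FRAMES, FRAME_RATE_HZ)
print(f"{SLEEP_THRESHOLD_FRAMES} frames at {FRAME_RATE_HZ} fps = {threshold_s:.1f} s")
```

      Under these assumptions, the 12-frame criterion corresponds to roughly 14 seconds of inactivity.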

      (8) Figure 2 data is only supported by mutants & inactivation with 1 Gal4 driver per cell population. Showing activation of Gal4-expressing cells with UAS-TrpA1 would add more support to the conclusions.

      We already showed the reduced sleep amounts in both HuginGAL4>ReaChR and HuginGAL4>TrpA larvae (Figure 3 C & D) in the original version.

      (9) Need to clarify in the methods how the authors calculated travel distances as a measure of locomotive activity. It's not clear if this is done during larval sleep experiments or in independent experiments. It is also not clear why the y-axes of Figure 2-Figure Supplement 1 are not consistent across the panels. Finally, the authors do see decreases in locomotive activity in PK2-R1>Kir2.1 and in dilp3 mutants, so the conclusions presented in the results section of the paper need to be modified to reflect those results.

      We calculated travel distances from the same video recording datasets used for sleep quantification. We have added this information in Methods (lines 431-435). As the reviewer indicated, locomotor activity was reduced in some conditions/mutants, including PK2-R1 > Kir2.1 and dilp3 mutants, and therefore we cannot exclude the possibility that locomotion changes might contribute to the sleep phenotypes. On the other hand, many of the manipulations of Hugin neurons and IPCs caused a sleep increase without significant changes in locomotor activity (Figure 2—figure supplement 1 and Figure 3—figure supplement 1). It is thus likely that Hugin and IPCs contribute to sleep control independently of locomotion, whereas other neurons labeled by PK2-R1 GAL4 might contribute to locomotion control.

      (10) Given the role that hugin neurons play in Drosophila feeding (Schlegel et al, 2016), the authors should include feeding data for the hugin/PK2-R1 manipulations. It is also unclear from the methods if their thresholding for defining sleep can detect feeding behaviors. Changes in feeding behavior could explain some of the reported increases in sleep if feeding is not classified as a waking but is instead picked up as inactivity.

      We agree that this is an important point. Following the reviewer’s points, we have added feeding amounts of the wild-type control and the HuginPC>Kir2.1 larvae (Figure 3-figure supplement 3). These data suggest that feeding amounts of the HuginPC>Kir2.1 larvae are significantly reduced compared to those of the control. Given that our data analysis typically categorized feeding behavior as “moving (not sleep)” (see Materials and Methods) and that HuginPC>Kir2.1 larvae showed increased sleep amounts compared to the wild-type control, it is likely that the increased sleep amounts in HuginPC>Kir2.1 larvae are unrelated to changes in feeding behavior.

      (11) The Hugin-IPC localization data (Figure 3E) would be better supported by the use of more specific synaptic and dendritic markers. Specifically, expressing Syt-eGFP (axon marker) in hugin neurons & DenMark (dendritic marker) in IPCs. Using GRASP or P2X2 to demonstrate the anatomical/functional connections between hugin & IPC neurons would also provide better support for this conclusion.

      Following the reviewer’s suggestion, we have added Syt-eGFP signals in HuginPC neurons (Figure 4—figure supplement 1). We also tried DenMark expression in IPCs, but we could not obtain dilp3>DenMark F1 progeny for unknown reasons. We also applied GRASP to the HuginPC-IPCs interaction, but we could not detect obvious GRASP signals. It is well known that peptidergic transmission is often independent of conventional synaptic structures and instead occurs by volume transmission, in which peptidergic signals can travel over long distances to target neurons. It is thus possible that IPCs receive Hugin signals from HuginPC neurons through volume transmission.

      (12) Figure 4 is missing temperature controls for thermal activation experiments. Also missing genetic control for UAS/+. It would be more convincing to see experiments in Figure 4 with the more specific hug-PC-Gal4 line instead of the broadly expressed hugin-Gal4 line.

      Following the reviewer’s comments, we have added the control data in Figure 4.

      (13) Representative images for Figure 4B & 4C would provide better support for the quantifications & conclusions presented.

      Following the reviewer’s suggestions, we show representative images for Figure 4B and 4C (please see Author response image 1). We are, however, concerned that these images would not add to readers’ understanding beyond the quantitative data, so we have decided not to add them to the manuscript.

      Author response image 1.

      mCD8::mCherry (top) and CRTC::GFP (bottom) are shown under high-temperature conditions without ("−") or with ("+") hugin neuron activation. "−" denotes a high-temperature genetic control lacking LexAop-TrpA1, thus no thermogenetic activation occurs. CRTC::GFP is shown in pseudocolor.

      (14) A more zoomed-out image of all the IPC neurons in the bath application of hugin peptides (Figure 5D) would help with the interpretation of the results. It's not clear if the authors only measured the same exact neuron in each IPC cluster or if they examined all of the IPC neurons. If they measured all of the IPC neurons, did they observe similar results across the different neurons? How much variability is there in the response of IPC neurons to hugin peptide application?

      For Figure 5, we obtained images of multiple brains from each genotype and quantified the NLI values from all IPC neurons. For reference, we show plots of the CRTC signals of Figure 5C brain by brain (Author response image 2). We have added detailed information on the CRTC analysis in Methods (lines 552-554).

      Author response image 2.

      Distribution of CRTC signals across individual brains. Plots of nuclear localization index (NLI) for individual brains, corresponding to the conditions shown in Figure 5C. The x-axis represents each larval brain preparation, and each dot indicates the NLI value of a single IPC neuron. Horizontal bars represent the median within each brain. These plots illustrate variability both within and across individual brains.
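      The per-brain summary described in this legend (one NLI value per IPC neuron, with a median bar per brain) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the function name `per_brain_medians`, the brain labels, and the numbers are hypothetical placeholders, and the NLI values are taken as already computed.

```python
from statistics import median

# Hypothetical sketch: summarize per-neuron NLI values brain by brain.
# NLI values are assumed to be precomputed; the numbers below are placeholders.


def per_brain_medians(nli_by_brain: dict[str, list[float]]) -> dict[str, float]:
    """Return the median NLI of the IPC neurons measured in each brain."""
    return {
        brain: median(values)
        for brain, values in nli_by_brain.items()
        if values  # skip brains with no measured neurons
    }


example_nli = {
    "brain_1": [0.2, 0.5, 0.4],
    "brain_2": [0.9, 0.7],
}
medians = per_brain_medians(example_nli)
for brain, value in medians.items():
    print(brain, round(value, 3))
```

      Reporting the median per brain alongside the individual neurons makes it possible to see whether an effect is carried by a few brains or is shared across preparations.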

      (15) The conclusion that application of Hug peptides results in dilp3 release is not well supported (Figure 5E). There is a large amount of variation in anti-dilp3 signal. Representative images for these quantifications would be beneficial. The authors also don't directly show that dilp3 vesicles are released. They only see a reduction in antibody accumulation in IPCs. Could there be other reasons for the reduction in accumulation in the IPCs? Would changes in dilp3 gene expression or membrane localization cause a reduction in signal? Showing that actual release of dilp3 is affected by Hug peptides using a reporter like ANF-GFP would be more convincing.

      Following the reviewer’s comments, we have added representative images (Figure 5—figure supplement 2). As for the ex vivo experiments in Figure 5, we treated the extracted brain tissues with Hugin/NMU peptides for only 5 minutes. It is thus most likely that the reduction of Dilps in IPCs is mediated by Hugin/PK2-R1 signal-dependent secretion, rather than by transcriptional control and/or degradation of Dilps.

      (16) Show all sleep metrics (total sleep duration, bout #, bout length, and activity) for adult sleep experiments. Showing relative total sleep for the adult experiments is confusing & would benefit from plots of total average sleep in minutes for each genotype.

      Following the reviewer’s comments, we have added the sleep metrics in adults (Figure 6; Figure 6-figure supplement 3).

      (17) The authors can't conclude that expression patterns of PK2-R1 & hug between larvae & adults are "almost comparable." They don't track neurons over development or immortalize neurons in larvae & check expression patterns in adults. They need to show some type of quantification to support these claims. Or revise the text to remove this conclusion.

      We agree. We have changed our arguments as follows (lines 211-214).

      Interestingly, the expression patterns of PK2-R1 and Hug as well as the morphology of HugPC neurons in adults appeared to be similar to those in larvae (Figure 6—figure supplement 2), implying that the differential roles of Hug in larvae vs adults are likely due to physiological differences in HugPC neurons and/or IPCs.

      (18) For Figure 6, what effect does genetic inactivation of IPCs have on adult sleep? A more specific manipulation of these cells would provide better support for the conclusion that IPC manipulations have distinct effects on larval & adult sleep. The sleep traces for the hugin manipulation & dilp mutants (Figure 6-Figure Supplement 1) also look inconsistent when comparing genetic controls in (Figure 6-Figure Supplement 1D) or when comparing the dilp mutants. Plotting this data as total sleep amount in the day & night (2 separate graphs) would be beneficial. It would also be helpful to see additional sleep traces for these experiments.

      Following the reviewer’s comments, we have added the sleep amounts of dilp3 and dilp5 adults (Figure 6-figure supplement 1C-D) as well as of IPC silencing (Figure 6-figure supplement 3D), shown separately for daytime and nighttime sleep.

      (19) For Figure 6, what effect does thermogenetic activation of hugin neurons have on IPC activity? The authors demonstrate in Figure 5 that thermal activation results in an increase in larval IPC activity, but they do not show these experiments in the adult brain. These experiments would provide more support for their conclusion that hugin has differential effects on IPC activity depending on the developmental age (larvae vs adults).

      Following the reviewer’s comments, we performed thermo-activation of hugin neurons and found no significant effects on adult IPCs (see Author response image 3), consistent with the ex vivo data in Figure 6.

      Author response image 3.

      (20) A figure legend is needed for Figure 7. The model is not self-explanatory, nor is there an adequate explanation in the discussion section.

      We have added legends (line 781-785).

      (21) Since hugin is known to be downstream of Dh44 in larvae, the discussion needs to include comparison to published work on Dh44 in larvae (Poe et al, 2023). The hugin receptor, PK2-R1, is also expressed in Dh44 & DMS neurons (Schlegel et al, 2016), so a discussion of what role Dh44/DMS neurons may play in their model is necessary.

      We agree. We have added discussion as follows in Discussion (lines 313-320).

      We cannot rule out the possibility that other neurons could function downstream of HuginPC neurons in sleep regulation. For instance, given that Dh44 neurons in the brain promote arousal (Poe et al. 2023) and are PK2-R1-positive (Schlegel et al. 2016), Hugin might control sleep in part through Dh44 neurons.

      (22) Minor point: Line 97 should say "resulted in a significant sleep increase." Currently, it says "decrease" which is not what is depicted in the figure.

      We appreciate the reviewer’s point. We have corrected this.

      (23) Minor point: Figure 5 should be renamed as Figure 4 since the text describing the results in Figure 5A & 5B occurs before the text describing the results in Figure 4.

      We understand the point the reviewer raised. However, since Figure 5A explains the experimental setup for all of Figure 5, we would like to keep Figure 5A at its original position.

      Reviewer #2 (Recommendations for the authors):

      First, the study would benefit from a more comprehensive discussion of previous research, particularly the work by Schlegel et al. (2016) and Melcher and Pankratz (2006). A key inconsistency that should be addressed is the observation that hugin mutant larvae exhibit reduced body size and feeding behavior, which may influence Dilp2 secretion. The selective effect on Dilp3 and Dilp5 without affecting Dilp2 warrants further clarification. Conducting conditional gene expression experiments to control hugin, dilp3, and dilp5 expression, along with neuronal activity modulation, would help determine whether the observed effects are direct or secondary consequences.

      Following the reviewer’s comments, we tried to manipulate neuronal activity in IPCs, but unfortunately, expression of Kir2.1 in IPCs caused lethality or very weak animals. Instead, we cited a recent paper that shows differential secretion of Dilp2 and Dilp6 in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added more discussion about selective Dilp3/5 secretion by Hugin-PK2-R1 signals (lines 275-297).

      Second, the specificity of IPC secretion mechanisms should be clarified. Given that IPCs coexpress Dilp2, Dilp3, and Dilp5, it remains unclear how the pathway selectively modulates Dilp3 and Dilp5 while leaving Dilp2 unaffected. Additional experiments, such as electron microscopy, could provide insights into whether anatomical differences in vesicular pools influence peptide secretion. Since hugin mutants are reported to have reduced body size, confirming that Dilp2 secretion remains truly unchanged is crucial for eliminating potential indirect effects.

      We thank the reviewer for the valuable suggestions. Since the selective Dilp secretion mechanisms in IPCs are outside the main scope of this paper, we would like to attempt detailed EM analysis in future studies. We cited a recent paper that shows differential secretion of Dilp2 and Dilp6 from IPCs in a stimulant-dependent manner (Suzawa et al. PNAS 2025) and added more discussion about selective Dilp3/5 secretion by Hugin-PK2-R1 signals (lines 275-297).

      Third, the study should explore the potential role of alternative circuits, such as the HuginPC-DH44 pathway, in sleep regulation. The observation that DH44 mutants exhibit even greater sleep amounts than PK2-R1 mutants suggests the involvement of additional regulatory mechanisms. Prior studies indicate that HuginPC neurons may influence DH44 neuron activity, which could impact sleep. Furthermore, recent findings link DH44 with starvation-induced sleep loss in adult flies. Discussing and experimentally investigating the HuginPC-DH44 axis in larval sleep regulation would provide additional depth to the study.

      As far as we understand, no direct evidence for a HuginPC→DH44 pathway has been reported in either larvae or adults. Instead, DH44 influences Hugin neuron activity in adults (King et al. 2017). We thus examined whether optogenetic DH44 activation could influence HuginPC activity using CRTC analysis, but unfortunately, we could not detect significant changes in HuginPC activity.

      Given that PK2-R1 is expressed in DH44-positive neurons (Schlegel et al. 2016) and that DH44-positive neurons are located in the regions that HuginPC neurons innervate, it is still possible that the HuginPC→DH44 pathway might function in parallel to the HuginPC→IPCs pathway. We feel that this is quite an interesting possibility and would be a good subject for a future paper.

      Fourth, validating the functional connectivity between HuginPC neurons and IPCs using calcium imaging would significantly enhance the study. Employing real-time calcium imaging with GCaMPs would provide direct evidence of synaptic activity between these neuronal populations. Such data would strengthen the claim that the observed sleep regulatory effects result from direct neural communication rather than secondary systemic influences.

      We agree. Indeed, we tried Ca<sup>2+</sup> imaging of HuginPC neurons and IPCs in living larvae as well as in ex vivo preparations, and found that it was technically very difficult to obtain reliable Ca<sup>2+</sup> dynamics data in either the brain of living larvae or ex vivo brain tissue. Therefore, instead of live Ca<sup>2+</sup> imaging, we performed the CRTC analysis using fixed brain preparations. We have added a statement that we tried Ca<sup>2+</sup> imaging in the larval brain, but that it did not work well (lines 555-558).

      Finally, a more detailed discussion of developmental differences in sleep regulatory mechanisms would be beneficial. The manuscript should address why genes involved in sleep modulation during development may function differently from their roles in adult sleep regulation. Providing a conceptual framework or experimental evidence to explain these developmental differences would enhance the study's contribution to understanding the evolution of sleep circuits. Clarifying how these findings fit into broader sleep regulation models would increase the impact of the research.

      We agree. We have added discussion of how factors/circuits involved in sleep modulation during development may function differently from their roles in adult sleep regulation, as follows (lines 349-371); it is rather difficult to discuss why they differ.

      It is thus possible that Hugin/PK2-R1 signaling along the HugPC-IPCs circuitry is suppressed in adults. IPCs in adults receive multiple positive and negative modulatory inputs through GPCRs including the metabotropic GABA<sub>B</sub> receptors (Enell et al., 2010), which suppresses IPC activity and Dilp release in adult IPCs (Enell et al., 2010). It is thus plausible that such negative modulatory inputs to IPCs in adults might counteract with the Hugin/PK2-R1 axis to suppress Dilp release. In addition, our data suggest that Dilps modulate sleep amount in the opposite directions in larvae and adults (Figure 7). Comparing the expression levels and activities of GPCRs in larval and adult IPCs would be essential to better understand how the same modulatory signals over the course of development come to exert differential impacts on sleep. Interestingly, Hugin in adults appears irrelevant for the baseline sleep amount but is required for homeostatic regulation of sleep (Schwarz et al., 2021). Thus, testing if Hugin/PK2-R1 axis is involved in the homeostatic regulation of larval sleep, and how such a system compares to its adult counterpart, may further provide mechanistic insights into how homeostatic sleep regulation matures over development.

      By addressing these aspects, the manuscript will provide a clearer, more robust, and well-supported analysis of larval sleep regulation. These refinements will help improve the study's clarity and impact, ensuring that its findings are effectively communicated to the research community.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 97: "Silencing neurons expressing Oamb and PK2-R1 resulted in a significant sleep decrease?" But there is an increase in sleep amounts from Figure 1A. (Typo error).

      We thank the reviewer for pointing out this typo. We have corrected this typo in the revised version.

      (2) Line 139: "HugPC and IPCs labeled by Dilp3-GAL4 are located in close proximity to each other." While proximity does not equal synaptic connections, direct connectivity of HugPC and IPCs was already shown in larval connectome analyses with HugPC providing the strongest input of larval IPCs (Hückesfeld et al. eLife 2021). This could be cited in this context instead.

      We agree. We have cited this paper in References (line 163).

      (3) Figure 2 Supplement 1: Locomotion speed is affected in PK2-R1 knockouts; what is the significance regarding the observed sleep increase?

      We agree that this is a very important point. As the reviewer pointed out, since locomotion speed was reduced in PK2-R1 KO larvae, the sleep increase phenotype in PK2-R1 KO larvae might be in part due to reduced locomotion. On the other hand, IPC silencing by Kir2.1 caused a sleep increase phenotype without significant changes in locomotion (Figure 2; Figure 2-figure supplement 1). Since PK2-R1 is broadly expressed in the nervous system in addition to IPCs (Figure 2), it is thus possible that PK2-R1 neurons other than IPCs contribute to locomotion control.

      (4) Why are Dilp3 levels changing (increasing) in adult IPCs after PK-2 treatment? This is not mentioned in the text and is not discussed at all.

      As the reviewer indicated, these data were unexpected to us. At this moment, we can only assume that PK-2 acts differently in larval and adult IPCs. We have added this sentence in Results (lines 211-214).

      (5) It has been shown in other publications that Dilps play a role in sleep regulation (Cong et al., Sleep 2015), this study should be cited.

      We have cited this paper in References (line 224).

      (6) The order of discussing figure panels is sometimes confusing, e.g. Figure 6C is discussed at the very end after 6D-F.

      We agree. Indeed, we discussed this order extensively while preparing the first draft. However, we finally decided on the current order, as grouping “sleep phenotype data” and “ex vivo data” should be easier for readers to follow. We have thus kept the current order in the revised submission.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Given the prominent connectivity of C2 and C3 to lamina neurons, we actually expected that lamina processing is also affected. We did the experiment of silencing C2 and recording in the lamina neuron L2 and found no significant difference in their response profile (Author response image 1).

      Author response image 1.

      Calcium responses of L2 axon terminals to full field ON and OFF flashes for controls (grey, N=8 flies, 59 cells) or while genetically silencing C2 using shibire<sup>ts</sup> (magenta, N=4 flies, 26 cells). Traces show mean ± SEM.

      We could include these data in the main manuscript, but we do not really feel comfortable in claiming that C2 and C3 have a specific role in motion processing only, even if it was predominantly affecting medulla neurons. To our knowledge, how peripheral visual circuitry contributes to any other visual behaviors, such as object detection, including the pursuit of mating partners, or escape behaviors, is not well understood. Instead, we added a sentence to the discussion stating that our work does not exclude that, given their wide connectivity, C2 and C3 are also involved in other visual computations.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss in inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for PD direction for the T4 C2 and C3 blocks. Also, I predict that C2 & C3 block statistically different from C3 only, why? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would also be good to have some raw traces to be able to see the differences more clearly, not only polar plots and averages.

      We apologize: the plots in the manuscript showed the mean across all cells, but the statistics were done more conservatively, across flies. We corrected this mismatch, and the figure now shows the mean ± SEM across flies after first averaging across cells within each fly. Thank you for pointing this out. Since we recorded n=6-8 flies per genotype, we did not include violin plots, which would indeed make sense if we showed data for each cell.
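
      The two-stage averaging described above can be sketched as follows. This is a minimal illustration rather than our analysis code: the function name `fly_level_mean_sem`, the dictionary layout, and the toy values are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def fly_level_mean_sem(responses_by_fly):
    """Average cells within each fly, then return (mean, SEM) across flies.

    responses_by_fly: dict mapping a fly ID to a list of per-cell responses
    (hypothetical layout, chosen only for illustration).
    """
    # Stage 1: collapse each fly to a single value, so that the fly,
    # not the cell, is the unit of replication.
    fly_means = [mean(cells) for cells in responses_by_fly.values()]
    n_flies = len(fly_means)
    if n_flies < 2:
        raise ValueError("need at least two flies to estimate the SEM")
    # Stage 2: SEM across flies = sample SD / sqrt(number of flies).
    return mean(fly_means), stdev(fly_means) / sqrt(n_flies)


# Toy data (made up): three flies with different numbers of cells.
example = {"fly1": [1.0, 3.0], "fly2": [2.0, 2.0, 2.0], "fly3": [4.0]}
grand_mean, sem = fly_level_mean_sem(example)
```

      Averaging within flies first keeps a fly with many recorded cells from dominating the estimate and matches the unit used for the statistics.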

      (3) The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with Kir2.1, too.

      We have tried this experiment, but unfortunately, flies were not walking well on the ball, and we were not able to obtain data of sufficient quality.

      Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, the contribution of this inherited feedback in direction-selective computations is not a prerequisite for its emergence, and the title could be re-adjusted.

      We adjusted the title to “are involved in motion detection.”

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      We did not assume large differences between the datasets because Nern et al. 2025 described no major sexual dimorphism. To verify this, we have now plotted C2 and C3 connectivity from the three major EM datasets that include C2/C3 connectivity: the female FAFB dataset (Zheng et al. 2018, Dorkenwald et al. 2024, Schlegel et al. 2024), the male visual system (Nern et al. 2025), and the 7-column dataset (Takemura et al. 2015). We see no major differences (Author response image 2 and Author response image 3).

      Author response image 2.

      Relative pre- and post-synaptic counts for C3 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      Author response image 3.

      Relative pre- and post-synaptic counts for C2 from 3 different data sets. Shown are up to ten post- or pre-synaptic partner neurons.

      The mediating neural correlates from C2 & C3 to T4 & T5 are not clarified; rather, Mi1 is found to be one of them. The study could be improved if the same set of silencing experiments performed for C2-Mi1 were extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating the potential T5 mediators more clearly from the connectomic analysis would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      We fully agree that one could go down this route. Given the widespread connectivity of C2 and C3, and the fact that these are time-consuming experiments with often complex genetics, we had decided to instead study the “compound effect” of C2 and C3 silencing by analyzing T4/T5 physiological properties and motion-guided behavior. We now explicitly explain this logic by saying, “To understand the compound effect of C2 and C3 on motion processing, we focused on the direction-selective T4/T5 neurons, which are downstream of many of the neurons that C2 and C3 directly connect to.”

      Finally, the authors' conclusions derive from the set of experiments they performed in a logical manner. Nonetheless, the Discussion could benefit from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors; why the T4 & T5 responses after C2 & C3 silencing differ between stationary and moving stimuli; and finally, why C2 and not C3 had an effect on T5 DS responses, as the connectivity suggests that C3 outputs to two of the four major T5 cholinergic inputs.

      Apart from the behavioral screen results, we only tested ON edges in our more detailed behavioral characterizations. And while we show phenotypes for the OFF-DS cell T5, it is well established that inhibitory cells that respond to one contrast polarity can function in the pathway with the opposite contrast polarity (e.g., the OFF-selective Mi9 in the ON pathway). We realized that our narrative in the results section was misleading in this regard (we had given the ON selectivity of C2/C3 as one argument why we first focused on the ON pathway) and eliminated this argument.

      For the differential involvement of C2/C3 for T4/T5 responses to stationary and moving stimuli (C2 and C3 silencing affects both T4 and T5 DS responses, but mostly T4 flash responses): We mostly took the disinhibition of flash responses in T4 as a motivation to look more specifically at a potential role in motion-computation. We now added a sentence about the potential emergence of these flash responses to the already extensive discussion paragraph “How could inhibitory feedback neurons affect motion detection in the ON pathway?”

      Last, we added a discussion point about the relationship between C2 and C3 connectivity and the functional consequences, and discussed the fact that C3 connectivity alone does not correlate with a functional role of C3 (alone) in DS computation.

      Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

      We agree that we cannot fully draw this mechanistic argument, but we also do not think that this is a realistic goal of this study. Even in a scenario where one would measure the temporal and spatial properties of “all” neurons that are connected to C2 and C3, this would likely not reveal the full mechanisms linking the single neurons to DS computation, but would require silencing specific connections, or specific molecular components of the connection, or could be complemented by models. A beautiful example where such a mechanistic understanding was achieved, recently published in Nature, essentially focused on a single synaptic connection (between Mi9 and T4) (Groschner et al. 2024), and built on extensive work that had already highlighted the importance of these neurons. We would further argue that the field does not have a good understanding of how T4/T5 responses are translated into behavior. Although possible pathways emerge from connectomes, it is for example not understood why the temporal frequency tuning of T4/T5 substantially differs from the temporal frequency tuning of the optomotor response.

      We therefore would like to highlight that the focus of our study was not to connect all those pieces, but rather to highlight the hitherto unknown overall importance of inhibitory feedback neurons for visual computations along the visual hierarchy, from individual neuron properties, via DS computation, to the temporal precision of the optomotor response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood."

      This is incorrect not only because it is referred to as a general statement, but also because many studies have examined inhibition in flies. It may not be solely GABAergic inhibition, but that is just one type. While some discussions later address feedback from horizontal cells in the retina, etc., there is no mention of work on color vision, which requires feedback. Please rephrase.

      We now say “visual motion processing” in this sentence, and added a sentence on color vision: “... color-opponent signalling requires reciprocal inhibition between photoreceptors as well as feedback inhibition from distal medulla (Dm) neurons (Schnaitmann et al., 2018, Heath et al., 2020, Schnaitmann et al., 2024).”

      (2) Line 197: "Because a previous studies" One or many?, but more important, please cite them.

      We corrected this to “a previous study” and now cite Tuthill et al. 2013.

      (3) Line 172: I noticed a few minor grammatical errors and wording issues, such as the use of "we next" twice in one sentence. "To next identify potential GABAergic neurons that are important for motion computation in the ON pathway, we next intersected 12 InSITE-Gal4." I am bad at picking them out, but since I noticed them, I would strongly suggest looking at the text carefully again.

      We deleted one occurrence of ‘next’, thank you for catching that.

      (4) Question to the authors. Why did you use twice independent lines and not checkers for the white noise analysis in Figure 3e?

      We used flickering bars because many visual system neurons tested in our lab respond with a better signal-to-noise ratio as compared to checkerboards. Flickering bars also appear to be more suited to isolate the spatial surround of neurons. This type of stimulus has been successfully used in previous studies to extract receptive fields of neurons in the fly visual system (Arenz et al. 2017; Leong et al., 2016, Salazar-Gatzimas et al. 2016; Fisher et al. 2015, …).

      (5) Line 248: "Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects..." Please state how here. I would need to go to the methods.

      We added a sentence “C2 was silenced by expression of UAS-shibire<sup>ts</sup> (UAS-shi<sup>ts</sup>) for temporal control of the inhibition of synaptic activity.”

      (6) Much of the work in the blowfly uses picrotoxinin to block GABAergic inhibition in the visual motion pathway. It would be useful to mention some of this early work and its results, particularly that of Single et al. (1997). It might be interesting to reinterpret their results.

      Thank you for pointing this out. We added this paragraph to the discussion: ‘Work in blowflies has found a severe impact of GABAergic signaling for DS in LPTCs downstream of T4 and T5 cells, using application of picrotoxin to the whole brain (Single et al. 1997; Schmid and Bülthoff 1988). Although the loss of DS in LPTCs could originate from direct inhibitory synapses onto LPTCs (Mauss et al. 2015; Ammer et al. 2023), the disruption of GABAergic signaling in upstream circuitry, which reduces DS in T4 and T5, may also contribute to the phenotype seen in LPTCs.’

      Reviewer #2 (Recommendations for the authors):

      The following set of corrections aims to better the scientific and presentation aspects of this work.

      (1) The title of the work implies that C2 and C3 neurons are required for motion processing, whereas the study shows their participation in motion computations, which persists post their silencing. Therefore, "Inhibitory columnar feedback neurons contribute to Drosophila motion processing" would be a more appropriate title.

      We rephrased the title to say that inhibitory feedback neurons “are involved in” motion processing.

      (2) The morphology of C2 and C3 neurons, i.e., ramifications in medulla & cell body in medulla and axonal targeting to lamina, implies their feedback role. It would be important to mention the specific feedback loop they participate in and the role of Mi1 more extensively in lines 36, 120.

      We find it hard to speculate on the specific feedback loops that C2 and C3 are involved in from their widespread input and output connectivity. If we had, we would have wanted to support this by functional measurements of this specific loop, which was not the goal of this study.

      (3) In lines 55-89, the authors explore the instances of feedback inhibition within and across species and modalities. For the Drosophila visual example (lines 76-89), given that it also addresses motion circuits, the following studies should be included:

      Ammer, G., Serbe-Kamp, E., Mauss, A.S., et al. Multilevel visual motion opponency in Drosophila. Nat Neurosci 26, 1894-1905 (2023). https://doi.org/10.1038/s41593-023-01443-z. Mabuchi Y, Cui X, Xie L, Kim H, Jiang T, Yapici N. Visual feedback neurons fine-tune Drosophila male courtship via GABA-mediated inhibition. Curr Biol. 2023 Sep 25;33(18):3896-3910.e7. doi: 10.1016/j.cub.2023.08.034.

      We added a sentence on the Ammer et al. finding to the introduction. Since the introduction paragraph focuses on known physiological effects within the visual system, we did not find a good fit for the Mabuchi et al. study, which focuses on serotonergic feedback neurons with a role far downstream in courtship behavior.

      (4) In lines 102-103, the following work should be referenced: Groschner LN, Malis JG, Zuidinga B, Borst A. A biophysical account of multiplication by a single neuron. Nature. 2022 Mar;603(7899):119-123. doi: 10.1038/s41586-022-04428-3.

      We cited a few of the many papers that used “modeling frameworks” and selected the ones focusing on the entire feedforward circuitry. To also give credit to the Borst lab, we instead added Serbe et al. 2016 here.

      (5) In lines 107-108, the Braun et al. (2023) study has not performed Rdl knockdown experiments in T4 cells; hence, it needs to be better clarified in the text.

      We corrected this in the text.

      (6) Even though the dataset was previously published, a summary plot of the different phenotypes would be very helpful to the reader. Moreover, in line 131, as the study focuses on motion vision, it would be better to use "early motion visual processing" rather than "early visual processing.”

      We added a summary plot of the behavioral screen data to Supplementary figure 1, and rephrased previous line 131.

      (7) The first result section title excludes C3 neurons, even though in lines 172-179 they are addressed; therefore, the C3 inclusion is suggested as in "GABAergic C2 and C3 neurons control behavioral responses to motion cues". The term "required" should be excluded from the title as the other neuronal types encountered in the InSITE drivers were never quantified; thus, the "behavioral requirement" might come from these other neurons as well.

      From the experiments shown in this paragraph alone, we cannot make conclusive claims about C3, as it was also weakly visible in one of our genetic controls in the intersectional strategy that we took (we had written: “This strategy also revealed other GABAergic cell types, including the columnar neuron C3 and the large amacrine cell CT1 which were however also weakly present in the gad1-p65AD control”).

      We changed the title of this paragraph to: A forward genetic behavioral screen identifies GABAergic C2 neurons to be involved in motion detection.

      (8) In line 142, it should be clearly stated that the MultiColor FlpOut technique was used and should also be cited: Nern A, Pfeiffer BD, Rubin GM. Optimized tools for multicolor stochastic labeling reveal diverse stereotyped cell arrangements in the fly visual system. Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2967-76. doi: 10.1073/pnas.1506763112.

      We did not use MCFO clones, but simple Flp-out clones, and the genotype and reference for this were given in the methods: UAS-FRT-CD2y+-FRT-mCD8::GFP; UAS-Flp (Wong et al. 2002). To make this clearer, we now also cite (Wong et al. 2002) in the results section.

      (9) In Figure 1c, a description of RFP should be written as it is already in Supplementary Figure 1c.

      We added this to the Figure caption.

      (10) In line 172, "next" is redundant as it was previously used at the beginning of the sentence.

      Removed

      (11) In line 175, based on both figures that the authors refer to, instead of C2, C3 should be written.

      We do indeed see C3 labeled in the images, but also in a gad1-p65AD control. We thus cannot be sure if C3 indeed reflects the intersection pattern. However, the three lines shown in Figure 1d clearly also label C2, which is not seen in the control condition.

      (12) In line 184, a split-C2 line is used (and a split C3 as in Supplementary Figure 2). It would enhance the credibility of the work, and even make the word "requirement" appropriate afterwards, if this split-C2 line were used for behavioral experiments, as in the Gohl et al., 2011, and Silies et al., 2013 studies.

      We are indeed using the same split-C2 line for imaging and for behavioral experiments in Figure 7. We see Figure 1 (and with that, Silies et al. 2013) as a first-pass screen, from which we obtained candidates, which we then tested more thoroughly throughout the remaining manuscript, with more specific lines. We are no longer using the word “requirement”.

      (13) In lines 186-188, is DenMark used as a postsynaptic marker? If yes, an additional control would be the use of Discs-large (DLG) as a postsynaptic marker, as DenMark would not be restricted to postsynaptic densities.

      Yes, we used DenMark as written in the sentence “we expressed GFP-tagged Synaptotagmin (Syt::GFP) to label pre-synapses together with the dendritic marker DenMark (Nicolai et al., 2010)”. Since our claims about widespread C2 and C3 connectivity are further supported by connectomics, we did not use another postsynaptic marker.

      (14) In line 191, L2 is mentioned as presynaptic, whereas in Figure 2b is clearly postsynaptic.

      We write “This revealed that C2 forms several presynaptic contacts with the lamina neurons L5, L1, and L2”. L5, L1, and L2 are hence postsynaptic to C2, which is what is plotted in Figure 2b.

      (15) In line 197, the "a" in "because a previous studies" should be removed, and these studies should be cited as the authors do in line 514.

      Done as suggested.

      (16) In line 1191, the figure title uses the term "required", whereas the plotted data suggest that T4 and T5 responses remain DS after C2&C3 silencing. Rephrasing to "C2 and C3 affect direction-selective.." would be better suited.

      We replaced “required” with “contribute to”

      (17) In the legend of Figure 2b, the "Counts of synapses" is misleading. The number plotted refers to the percentage of synapse counts from the target neuron.

      Corrected.

      (18) A general question about the C2 and C3 ON selectivity: How would the authors explain the OFF deficits from the published behavioral screening in Supplementary Figure 1a? Do the other InSITE neurons contribute to it? This needs to be further elaborated in the discussion.

      A neuron being ON selective does not imply that it is functionally required in the ON pathway only. In fact, Mi9, a major component of the ON pathway (even if not “required” under many stimulus conditions), is OFF selective.

      Furthermore, both we (Ramos-Traslosheros and Silies, 2021) and others (Salazar-Gatzimas et al. 2019) have shown that both ON and OFF signals are combined in ON and OFF pathways, which is further supported by connectomics data. We clarified the transition from physiology to function in the results section, as already explained above.

      (19) In line 216, the authors image from layer M1, but the reasoning behind this choice is missing. The explanation gap widens as one proceeds to the layer-specific responses in Supplementary Figure 2. Is this because C2 and C3 receive their inputs in M1, as is insinuated in line 219?

      As Supplementary Figure 2 shows, we initially imaged from all layers of the medulla, where C2 arborizes. Because the response properties, including kinetics, weren’t different, we had no reason to believe that C2 is highly compartmentalized. We thus subsequently focused on layer M1, where amplitudes were highest. We clarified this in the text.

      (20) In line 229, it should be clear whether the STRFs come from M1 measurements. STRF analysis in M5, M8, and M9/10 that also verifies the multicolumnar span of C2 and C3 would further strengthen the results. Given the focus of the work on Mi1 and T4/T5, it should be clarified in which medulla layer the Mi1-C2 connections form. Additionally, the reasoning behind showing STRFs from M1 measurements in Figure 3 should be explained, even though Supplementary Figure 2b implies equal responses in M9/10, where Tm1 and Tm4 also receive output from C3.

      We never recorded STRFs in the silenced condition and make no claims about C2 changing spatial properties of Mi1. We added the information that STRFs were recorded in layer M1 to the figure caption. We checked the specific connectivity of C2 and Mi1 and they indeed connect in M1 (Author response image 4), but regardless of this result, there is no evidence for compartmentalization in these columnar neurons.

      Author response image 4.

      Image of a C2 (blue) and Mi1 (yellow) neuron from EM Data (FAFB). Circles depict synapses from C2 to Mi1 in layer M1 of the medulla.

      (21) In Figure 3e, the statistical significance or lack thereof is not visible at the bar plot.

      Consistently throughout the manuscript, we now just indicate if a comparison is significant. If nothing is shown, it means that it is not.

      To clarify this, we added a sentence to the statistics section in the methods, now saying: We show significant differences in figures using asterisks (p<0.05 *, p<0.01 **, p<0.001 ***). Non-significant differences are not further indicated.
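
      For illustration, the asterisk convention stated above can be written as a small lookup. This is only a sketch of the thresholds as quoted; the function name `significance_label` is hypothetical and not part of our analysis code.

```python
def significance_label(p_value):
    """Map a p-value to the asterisk notation; empty string if not significant."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    # Check the strictest threshold first so each p-value gets one label.
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    # Non-significant comparisons are left unmarked in the figures.
    return ""
```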

      Please note that, based on another reviewer comment, we also adapted the analysis of the kernels. This changed the statistics to be significant for the timing of the on peak response (Figure 3e).

      (22) In line 249, it is mentioned that the strongest C2 connection is Mi1; this does not derive from the data shown in Figure 2b.

      We intended to look at medulla neurons, and Mi1 is the most connected medulla neuron to C2. We clarified that in the text, which now reads: “Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects temporal and spatial filter properties of the medulla neurons that provide direct input to T4 neurons. We chose to test Mi1 as it is the medulla neuron most strongly connected to C2.”

      (23) The result section title "C2 & C3 neurons shape response properties of the ON pathway medulla neuron Mi1" does not include C3 results. This would be fundamental to have. As previously mentioned, the neural correlates of this inhibitory feedback loop should be clearly defined, and the current version of this work evades doing so.

      We corrected the title. As discussed elsewhere, it was not the goal of this study to work out the specific contributions of C2 (and C3) to all neurons they connect to, but rather to focus on the compound effect for motion detection.

      (24) In line 276, the following work should be cited: Maisak MS, Haag J, Ammer G, Serbe E, Meier M, Leonhardt A, Schilling T, Bahl A, Rubin GM, Nern A, Dickson BJ, Reiff DF, Hopp E, Borst A. A directional tuning map of Drosophila elementary motion detectors. Nature. 2013 Aug 8;500(7461):212-6. doi: 10.1038/nature12320.

      We added the citation.

      (25) In line 273, the title implies the investigation of the spatial filtering of T4 and T5 cells. This does not take place in the respective result section.

      We changed the title to: “C2 and C3 shape temporal and spatial response properties of T4 and T5 neurons.”

      (26) In line 280, Kir2.1 is used, whereas previously thermogenetic silencing with Shibire<sup>ts</sup> was preferred; could the authors elaborate on this choice in the text, for example, genetic reasons?

      We generally prefer shibire<sup>ts</sup> because of its inducible nature. However, our T4/T5 recordings included more stimuli (motion stimuli) than the Mi1 recordings, and the effect of shi<sup>ts</sup>-mediated silencing by pre-heating the flies (as established by Joesch et al. 2010) was not long-lasting enough for these experiments, which is why we used Kir2.1. In a previous set of experiments, we had tried incubating flies while imaging, but this induced too large movements of the brain, and T4/T5 recordings were not stable enough.

      (27) In lines 290-291, T5 ON suppression is found to be affected by C2 silencing, but the bar plot in Figure 5b uses the OFF-step data. It would be best if the ON-step data for T5 cells were also plotted.

      ON-step data for T5 are plotted in Supplementary Fig. 3e.

      (28) In line 288, "when C2 was also blocked", "also" should be included, as you are referring to double silencing.

      Sorry for the confusion; we cited the wrong figure in that sentence. Here, we wanted to point to the increased response of T4 to the ON-step upon C2 silencing, which was quantified in Supplementary Fig. 3e.

      (29) In line 312, it is important to mention in the discussion why it is the case that C2 and not C3 had an effect on T5 DS responses. C2 outputs to Tm1, whereas C3 to Tm1 and Tm4, based on Figure 2b, with Tm1 and Tm4 being one of the four major cholinergic T5 inputs. Hence, it would be natural to think that C3 and not C2 would affect T5 responses.

      We addressed this in the discussion.

      (30) In lines 326-328, it is crucial to mention the neural correlates that connect C2 and C3 to T4 and T5. Additionally, the Shinomiya et al. (2019) study shows C3 to T4 connections, which are mentioned in the discussion and should be cited in line 429.

      We do not think that mentioning neural correlates at this point is crucial, as these sentences were concluding a paragraph in which we link C2/C3 silencing to T4/T5 responses. We also do not know the neural correlates (except for Mi1), so this would not be accurate.

      We mention the C3-to-T4 connection in both the results and the discussion, and our analysis (Figure 2) stems from the FAFB dataset. We added citations to both results and discussion.

      (31) In Figure 6a, compared to Figure 3b, the term compass plots is used instead of polar plots. It would be best to use one consistent term. Additionally, in Figure 6c, it is not mentioned if the responses across genotypes are the outcome of averaging across subtype responses.

      These two plots are not the same; a compass plot is a sub-category of polar plots. Polar plots, as in Figure 3, show the response amplitude of the neurons to the different directions of motion. Instead, compass plots, as in Figure 6, show vectors that depict the tuning direction and the strength of tuning of individual neurons.

      We added the following sentence to clarify the calculation in Figure 6c: ‘To average responses of all neurons, the PD of each neuron was determined by its maximal response to one of 8 directions shown.’
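
      As a hedged sketch of this step: the snippet below picks each neuron's PD as the direction with the maximal response and then averages across neurons. Rotating each tuning curve so that the PD sits at index 0 before averaging is an assumption made for this illustration, as are the function names, the 45-degree ordering of directions, and the toy values.

```python
from statistics import mean


def align_to_preferred_direction(responses):
    """Rotate one neuron's responses so that index 0 is its PD.

    responses: list of 8 response amplitudes, ordered by stimulus direction
    in equal 45-degree steps (assumed ordering). Ties go to the first maximum.
    """
    pd_index = max(range(len(responses)), key=lambda i: responses[i])
    return responses[pd_index:] + responses[:pd_index]


def average_aligned_tuning(neurons):
    """Average PD-aligned tuning curves across neurons, direction by direction."""
    aligned = [align_to_preferred_direction(r) for r in neurons]
    return [mean(values) for values in zip(*aligned)]


# Toy tuning curves (made up) with different preferred directions.
cell_a = [1.0, 2.0, 5.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # PD at index 2
cell_b = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]  # PD at index 5
avg = average_aligned_tuning([cell_a, cell_b])
```

      With this alignment, index 0 of the average corresponds to the PD and index 4 to the opposite (null) direction, regardless of each neuron's absolute tuning.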

      (32) In line 344, the title could be adjusted to "C2 is controlling the temporal dynamics of ON behavior", under the same reasoning of 'requirements' explained before.

      We think that “is controlling” is a stronger claim than “being required”. For a geneticist, the word “required” simply means that there is a(ny) loss of function phenotype, i.e., a reduction in DS when C2 and C3 are silenced/blocked. Many neurons are sufficient but not required to induce a certain behavior (i.e., they can induce a behavior when ectopically activated, but show no significant loss of function phenotype). We therefore consider it remarkable that C2 and C3 silencing indeed shows a significant reduction in DS.

      However, we do not want to overclaim anything, and the title now reads: “T4 tunes the temporal dynamics of ON behavior”

      (33) In Figure 7c, the plot legend should be "deceleration".

      Corrected

      (34) In line 424, the Braun et al. (2023) experiments were performed in T5 cells as previously mentioned.

      Corrected

      (35) In line 435, the authors mention that both ON-selective C2 and C3 neurons act partially in parallel pathways. In Figure 2b, the upstream circuitry between C2 and C3 is identical. How would they explain the functional-connectivity contradiction?

      In terms of acting in parallel pathways, it is the downstream, not the upstream, connectivity of C2 and C3 that matters, and this is not identical. C2, for example, connects to Mi1, L1, and L4, whereas C3 does not. Conversely, C3 connects to Mi9 and Tm4, which C2 does not.

      (36) In lines 445-447, the authors address C2 and C3 neurons as columnar, whereas they previously showed in Figure 3 that they are multicolumnar.

      Here, we refer to the nomenclature of Nern et al., who use the term “columnar” whenever something is present in each column. We specifically define this by saying “only 15 cells are truly columnar in the sense that they are present once per column and present in each column”. In the results section, we instead talk about “functionally multicolumnar”, and we changed a sentence in the discussion to say “The spatial receptive fields of C2 and C3 are consistent with the multicolumnar branching of their projections in the medulla” to avoid any such confusion.

      (37) In line 448, "thus" is repetitive, and the extracted view in line 449 does not contribute to the essence of the study.

      Fixed.

      (38) In line 459, the authors refer to inhibition inheritance; this term should be used frequently in the text in case the neural correlates between C2 & C3 and T4 & T5 are not deciphered.

      We think this point is very clear throughout the manuscript now. As one prominent example, we added a sentence to the first paragraph of the discussion saying “Given the widespread connectivity of C2 and C3 to neurons upstream of T4/T5, this effect [on DS tuning] is likely inherited from upstream neurons of T4/T5.”

      (39) In line 521, the transition between sentences is problematic.

      Corrected

      (40) For Supplementary Figure 1, why were the ON-motion deficits not addressed with the antibody approach used for Supplementary Figure 1a?

      The approach using anti-GABA stainings turned out to be largely redundant with the intersectional strategy. Furthermore, the intersectional strategy provided the full morphology of the cell and, hence, led to easier identification of the cell types involved.

      (41) In line 1169, C2 is mentioned, whereas C3 is annotated in the figure.

      Corrected

      (42) A general comment is that Tm1 inputs could be a good candidate for assessing T5 inputs, as performed for Mi1-T4 in Fig.4. Such experiments would enhance the understanding of inhibitory inheritance to T5 responses.

      We fully agree.

      (42) Do the authors have any indication or experiments done regarding the C2&C3 role in T4&T5 velocity tuning? This would be complementary to the direction of this study.

      This is a good idea that we had tried. However, we did not see a difference between control and C2 silencing for the temporal frequency tuning of T4/T5. As velocity is closely related to temporal frequency tuning, we would not expect to see a difference there either.

      While it would have been nice to be able to draw such a link, we would also state that our behavioral data are a bit different: We did not look at temporal frequency tuning per se, and overall, it is not well understood how responses in T4/T5 relate to behavior, as they, for example, have different frequency tunings (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      (43) As a suggestion, Figure 7 would be better positioned as Figure 4, right after the ON-selectivity finding of C2 neurons.

      We preferred to keep the current order.

      Reviewer #3 (Recommendations for the authors):

      Main recommendation:

      It would be useful to propose a neural circuit model that connects the various observations. One can draw here on the many circuit models for motion vision in the prior literature.

      (1) How might the extended response in upstream neurons Mi1 lead to the inappropriate null-direction responses in T4/T5?

      This is a good question and we can only speculate. Mi1 responses are enhanced upon C2 silencing, and T4 responses to full-field flashes are also enhanced. Likely, these motion-independent responses are also seen when the edge travels in the non-preferred direction, whereas they would be masked by the motion response in the preferred direction. The phenotype seen in T5 is likely inherited from medulla neurons, e.g. Tm1, to which C2 connects. We do not know how the delay of the Mi1 response upon C2 silencing may specifically affect ND responses.

      (2) How is the loss of DS in T4/T5 compatible with the continued sensitivity to motion in the turning response? Perhaps the signal from 180-degree oppositely tuned T-cells gets subtracted, so as to remove the baseline activity?

      This is a great question that we cannot answer. Overall, perturbations that affect T4/T5 physiology do not necessarily manifest in equivalent phenotypes when looking at behavioral turning responses. Prominent examples come from silencing core neurons of motion-detection circuits, such as Mi1 and Tm3 (see Figure 4, Strother et al. 2017).

      (3) How do the altered dynamics in upstream neurons relate to the loss of high-frequency discrimination in the behavior? One would want to explain why the normal fly has a pronounced decay in the response even though the motion is still ongoing (Figure 7b left, starting at 0.4 s). That decay is missing in the mutant response.

      That is an excellent question that we unfortunately do not have an answer for. Please note that our visual stimulus is a single edge sweeping across the eye, which might not elicit equally strong responses at each position of the eye or at each time during the stimulus presentation.

      In terms of linking the dynamics of upstream neurons to behavior, we already pointed out above that it is not well understood how responses in T4/T5 relate to behavior, as they, for example, have different frequency tuning, with T4/T5 neurons being tuned to lower temporal frequencies than the turning behavior of a fly walking on a ball (T4/T5 physiology: Maisak et al., 2013, Arenz et al., 2017; optomotor behaviour: Strother et al., 2017, Clark et al., 2013).

      Other recommendations:

      (1) Abstract line 37 "At the behavioral level, feedback inhibition temporally sharpens responses to ON stimuli, enhancing the fly's ability to discriminate visual stimuli that occur in quick succession." It may be worth specifying *moving* stimuli.

      Done as suggested

      (2) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood." This seems overly negative. Subsequent text mentions a number of such instances that are understood, and one could add more from the retina.

      We agree. We rephrased to say ‘motion vision’ and added more examples of known roles of feedback inhibition.

      (3) Line 69: "inhibitory feedback signals from horizontal cells and amacrine cells to photoreceptors and bipolar cells, respectively, are involved in multiple mechanisms of retinal processing, including global light adaptation, spatial frequency tuning, or the center-surround organization (Diamond 2017)." Maybe add the proven role in temporal sharpening of responses, which is of relevance to the present report.

      We added temporal sharpening to that introduction point.

      (4) Figure 1: The text for this figure talks about behavioral motion detection deficits in various lines. Maybe add an example of the behavioral effects to this figure.

      We added a summary plot of the behavioral screen data to Supplementary Figure 1.

      (5) Line 325: "the timing of the ON peak tended to be slower for C3 compared to C2 for both the vertical and the horizontal STRF": It's hard to see evidence for that in the data.

      Based on your next comment we reanalysed the kernels of C2 and C3. This resulted in a significant difference in peak timing between C2 and C3. 

      (6) When presenting kernels as in Figure 3d and Figure 4b, extend the time axis to positive times until the kernel goes to zero. This "prediction of future stimuli" allows the reader to see the degree of correlation within the stimulus, which affects how one interprets the shape of the kernel. Also, plotting the entire peak gives a better assessment of whether there are any shape differences between conditions. An alternative is to compute the kernel via deconvolution, which gets closer to the actual causal kernel, but that procedure tends to highlight high-frequency noise in the measurement.

      We replotted the kernels in Figure 3d and 4b to show positive times. The kernels of C2 and C3 stayed at a positive level. Going back through the data, we found a severe decrease in GCaMP signal in the first 2 seconds of the recording. We therefore reanalyzed the kernels, ignoring these first seconds. All kernels now go back to zero. The shape of the kernels did not change, but we now find a significant difference in peak timing between C2 and C3. Thank you for pointing this out.
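
      To make the procedure concrete, the following is a minimal sketch of a reverse-correlation kernel that discards the start of the recording and extends to positive lags. It is an illustration under stated assumptions, not the authors' pipeline: the stimulus and response are taken to be sampled at a common rate `fs`, the kernel is a plain cross-correlation rather than a deconvolution, and the function name, lag window, and the 2 s default for `skip_s` are hypothetical parameter choices.

```python
import numpy as np


def reverse_correlation_kernel(stimulus, response, fs, lags_s=(-2.0, 1.0), skip_s=2.0):
    """Cross-correlate a response with the stimulus over a window of lags.

    Negative lags relate the response to past stimulus values; positive lags
    relate it to future stimulus values. Response samples from the first
    `skip_s` seconds are ignored (for example, to avoid an initial signal decay).
    Returns (lags in seconds, kernel values).
    """
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    n = len(response)

    lag_min = int(round(lags_s[0] * fs))
    lag_max = int(round(lags_s[1] * fs))
    lags = np.arange(lag_min, lag_max + 1)

    # Keep only response samples after the skipped period for which every
    # lagged stimulus index still falls inside the recording.
    t_start = max(int(round(skip_s * fs)), -lag_min, 0)
    t_stop = min(n, n - lag_max)
    t = np.arange(t_start, t_stop)

    r = response[t] - response[t].mean()
    s = stimulus - stimulus.mean()
    kernel = np.array([np.mean(r * s[t + lag]) for lag in lags])
    return lags / fs, kernel
```

      With a temporally uncorrelated stimulus, the kernel at positive lags should fluctuate around zero; structure at positive lags therefore indicates correlations within the stimulus itself, which is the point raised by the reviewer.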

      (7) Line 280 "simultaneously blocked C2 and C3 using Kir2.1": First use of that acronym. Please explain what the method is.

      We now explain “we simultaneously blocked C2 and C3 by overexpression of the inward-rectifying potassium channel Kir2.1”

      (8) Line 350 "temporal dynamics for C2 silencing": suggests "dynamics of silencing"; maybe better "response dynamics during C2 silencing".

      Edited as suggested

      (9) Figure 7: Explain the details of the stimulus containing two subsequent on edges. What happens between one edge and the next? Does the screen switch back to black? Or does the second edge ride on top of the final level of the first edge? This matters for interpreting the response.

      Yes, the screen turns dark between subsequent edge presentations. We added a sentence to the methods to clarify that. 

      (10) Line 402 "novel, critical components of motion computation.": This seems exaggerated. At the behavioral level, motion computation is mostly unaffected, except for some details of time resolution. Whether those matter for the fly's life is unclear.

      We deleted the word ‘critical.’

      (11) Line 413 "GABAergic inhibition required for motion detection is mediated by C2 and C3": Again, this seems exaggerated. Motion *detection* appears to work fine, but the *discrimination* of two closely successive motion stimuli is affected. The rest of the text does properly distinguish "discrimination" from "detection".

      We changed the title to say: ‘GABAergic inhibition in motion detection is mediated by C2 and C3.’

      (12) Line 489 "Whereas the role of C2 and C3 for the OFF pathway may be more generally to suppress neuronal activity,": Unclear to what this refers. The present report emphasizes that there is no effect on OFF activity (Figure 5).

      We did not see an effect on T5 responses to OFF flashes, as shown in Figure 5, but we found a significant reduction of DS when silencing C2, as well as slightly increased overall responses to all directions for C2 and C3 silencing, which was significant for null directions when silencing C2. This is shown in Figure 6.

      Typos:

      (1) Line 521.

      Fixed

      (2) Line 1170: context of the citation unclear.

      Fixed

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study detected multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which is also changed in the IF treatment of this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The context lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot. 

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs: Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses that suggest IF might affect energy utilization, metabolic flexibility, while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study. 

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses:

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we have restructured the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We have also added targeted analyses and clarified how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstep their conclusions and imply an overreached mechanism.

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses:

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The editor suggested addressing points regarding the young age at diet onset, use of males only, and justification for the choice of tissues analyzed without requiring new data generation.

      We agree that these are important points for context. We have now added a dedicated paragraph to the Discussion section (page 22) to explicitly acknowledge and discuss these as limitations of our study. We justify our initial experimental design choices in the context of the existing literature while acknowledging the valuable insights that studies in females and with different diet onset timings would provide.

      The editor and reviewers recommended a more integrative analysis, suggesting the use of freely available tools, and a deeper discussion to frame the work against the existing literature.

      We thank the editor for this excellent suggestion. In response to this and the detailed points from Reviewer #2, we have performed a new, integrated multi-omics analysis using a latent variable approach (DIABLO) implemented in the mixOmics R package (version 6.28.0), a state-of-the-art, freely available tool for integrative multi-omics analysis. This new analysis, presented in a new Figure 4 and described in the Results section (pages 20-23), identifies the key sources of variation across tissues and omics layers, directly addressing the request for a true integrative approach. Furthermore, we have thoroughly revised the Results and Discussion to more sharply frame our findings and highlight the new insights gleaned from our study.

      The editor requested clarification on whether mice were fasted at euthanasia and to rephrase the statement on page 12 regarding mitochondrial pathways.

      - We have clarified in the Methods section (page 4) that mice were euthanized at the end of their fasting period, precisely detailing the stage of the IF cycle.

      - We thank the editor for this critical correction. We have rephrased the statement on page 12 to more accurately reflect that we observed a lower abundance of proteins involved in mitochondrial oxidative pathways, and we now carefully discuss the important distinction between protein abundance and functional activity in this context.

      The editor noted that the introduction is missing key citations and should acknowledge foundational work.

      We apologize for this oversight. We have now revised the Introduction to include several key foundational citations that were previously missing, ensuring proper credit to the important work of our colleagues.

      Reviewer #2 (Recommendations for the authors):

      We thank the reviewer for their exceptionally detailed and helpful technical suggestions, which have greatly improved the analytical rigor of our manuscript.

      (1) & (4) 3D PCA and Integrated Multi-Omics Analysis:

      We agree with the reviewer that a more sophisticated integrative analysis was needed. As detailed in our response to the editor, we have replaced the original side-by-side analysis with a proper integrated multi-omics analysis using a latent variable approach (DIABLO) implemented in the mixOmics R package (version 6.28.0). This new analysis simultaneously models the proteomic and transcriptomic data from all three organs, identifying shared and tissue-specific sources of variation. This directly and more powerfully validates our claim of "conserved and tissue-specific responses." The results of this analysis are now central to our revised Results section, Figure 4, and the supplementary figures (PCA analysis).

      (2) Concordance/Discordance Analysis:

      This is an excellent point. We have now performed a comprehensive analysis of transcript-protein concordance for the differentially expressed molecules in each tissue. A new Figure 4 summarizes these findings, and we discuss the biological implications of both concordant and discordant pairs in the Results section.
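
      For readers unfamiliar with this type of analysis, a transcript-protein concordance call can be sketched as below. This is a generic illustration, not the analysis performed in the manuscript: it assumes that each gene has a log2 fold change (IF versus control) at both the transcript and the protein level, and that concordance simply means both changes point in the same direction. The function names, the `threshold` parameter, and the gene labels in the usage note are hypothetical.

```python
from collections import Counter


def _direction(log2fc, threshold):
    """Return +1 (up), -1 (down), or 0 (within the threshold) for a log2 fold change."""
    if log2fc > threshold:
        return 1
    if log2fc < -threshold:
        return -1
    return 0


def classify_concordance(rna_log2fc, protein_log2fc, threshold=0.0):
    """Label a transcript-protein pair by the agreement of its fold-change directions."""
    rna_dir = _direction(rna_log2fc, threshold)
    protein_dir = _direction(protein_log2fc, threshold)
    if rna_dir == 0 or protein_dir == 0:
        return "unchanged in one layer"
    return "concordant" if rna_dir == protein_dir else "discordant"


def summarize_concordance(pairs, threshold=0.0):
    """Count labels over a mapping {gene: (rna_log2fc, protein_log2fc)}."""
    return Counter(
        classify_concordance(rna, protein, threshold) for rna, protein in pairs.values()
    )
```

      Calling `summarize_concordance` on a mapping such as `{"geneA": (1.2, 0.8), "geneB": (-0.9, 0.7)}` (made-up values) would report one concordant and one discordant pair.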

      (3) Organ-Specific Functional Remodeling:

      We have taken this advice to heart. The new analysis inherently addresses whether the functional remodeling is shared or tissue-specific. 

      (5) Missing Citations:

      We have thoroughly reviewed the literature and added key citations throughout the manuscript, particularly in the Introduction and Discussion, to properly situate our work within the field.

      (6) Starting Results with Supplementary Data:

      As the study design, including the timing of experimental interventions and blood and tissue collections, is summarized in the supplementary figures, the Results and Discussion section begins with those figures. However, we have now renamed the figures according to the eLife style, in which supplementary figures are linked to the main figures. This ensures a more logical and coherent flow.

      (7) Figure Presentation and Explanation:

      We have completely revised all figures to improve their clarity, consistency, and professional appearance. We have also carefully gone through the manuscript to ensure that every panel in every figure is explicitly mentioned and explained in the main text.

      Reviewer #3 (Recommendations for the authors):

      We thank the reviewer for their important comments regarding the model system.

      (1) Sex Differences and Limitations:

      We fully agree that studying sex differences is a critical and profound aspect of dietary interventions. As noted in our response to the editor, we have added a paragraph to the Discussion to explicitly acknowledge this as a key limitation of our current study. We discuss the existing evidence for sex-specific responses to IF and state that this is an essential direction for future research.

      (2) Early Diet Onset and Developmental Programs:

      This is a valuable point. We have added text to the Discussion acknowledging that starting IF at 6 weeks of age could potentially interact with developmental programs. We discuss this as a consideration for interpreting our data and for the design of future studies.

      We believe that our revised manuscript is substantially stronger as a result of addressing these comments. We are grateful for the opportunity to improve our work and hope that you and the reviewers find these responses and revisions satisfactory.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in the pre-immune makes the findings mostly observational.

      Thank you for your assessment of our study. Respectfully, we do not agree that our study shows a lack of benefit of using the EABR approach. For the monovalent boosters, the S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the regular S mRNA booster, which is consistent with the findings from our prior study in naïve mice. In addition, the bivalent S-EABR booster consistently elicited the highest neutralizing titers against all tested variants, including significantly higher titers against BA.5 and BQ.1.1 than the monovalent S booster. The bivalent S-EABR booster also induced detectable neutralization activity in a larger number of mice than all other boosters.

      Consistent with this analysis, please note that reviewers 1 and 2 commented that “the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant” (reviewer 1) and “the authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting” (reviewer 2).

      We agree with the reviewers’ assessment that the EABR booster-mediated improvements were mostly modest, in particular against the BQ.1.1 and XBB.1 strains. We also acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field.

      Finally, we also wish to point out that we did include experiments that addressed potential mechanistic differences between booster groups. For example, we conducted deep mutational scanning studies to determine polyclonal antibody epitope mapping profiles, showing that bivalent S-EABR boosters induced more balanced targeting of multiple RBD epitopes, which likely contributed to the observed improvements in neutralization. Our work also included cryo-EM studies demonstrating that bivalent S mRNA boosters promote heterotrimer formation, which could potentially drive preferential stimulation of cross-reactive B cells via intra-spike crosslinking. This represents a potential mechanism explaining how bivalent boosters outperformed monovalent boosters in our and many prior studies, which warrants further investigation. Finally, we also performed serum depletion assays, showing that the BA.5 neutralizing activity elicited by the bivalent Wu1/BA.5 S and S-EABR mRNA boosters was primarily driven by cross-neutralizing Abs induced by the primary vaccination series.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      We thank the reviewer for their accurate summary of our study. Please see our comments to the reviewer’s individual points below, as well as our responses to the editor’s assessment above.

      Strengths:

      Evaluating a novel SARS-CoV-2 vaccine that was substantially superior in naive mice in pre-immune mice as a model for its potential in the pre-immune population.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      The reviewer raises an important point, and we agree that including additional groups receiving three immunizations with the bivalent spike and/or spike-EABR mRNA vaccines would have improved the experimental design. However, we believe that several prior studies have already demonstrated that Omicron S immunogens are not inherently poorly immunogenic compared to the ancestral S; e.g., Scheaffer et al., Nat Med (2022); Ying et al., Cell (2022); Muik et al., Sci Immunol (2022). Based on these prior reports, we conclude that the lower neutralizing titers against Omicron variants in our study are most likely driven by immune imprinting as a result of the initial vaccination series with the ancestral S immunogen.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      We acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field. We added a “Limitations of the study” section at the end of the discussion to address all of these points in detail (lines 570-598 in the revised version).

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

      As we pointed out in our response to the editor’s assessment above, the monovalent S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the conventional monovalent S mRNA booster, which is largely consistent with the findings from our prior study in naïve mice. Although the bivalent S-EABR mRNA booster consistently elicited higher neutralizing titers than the conventional bivalent S mRNA booster, we agree with the reviewer that these improvements were modest and not statistically significant. Overall, neutralizing activity against later Omicron variants, such as BQ.1.1 and XBB.1, was low. We attributed this finding to immune imprinting (see response to point (1) above) and acknowledged that the EABR approach was not able to effectively overcome this effect (see discussion section of the paper, lines 537-558; and “Limitations of the study” section, lines 570-598 in the revised version).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format will have improved binding/neutralization antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) and bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens yielded higher antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have potential further usage as an antigen.

      We thank the reviewer for their support and for the accurate summary and evaluation of our study.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

      We thank the reviewer for bringing up this important point. We agree that the variants used for this study are now outdated, and it would have been informative to evaluate conventional and EABR boosters against contemporaneous strains. However, as the reviewer correctly pointed out, this type of study requires a substantial amount of time to conduct and will therefore likely always be outdated by the time the data are analyzed and prepared for publication. To accurately assess immune responses against recent or current strains in mice, multiple boosters would have been needed to mimic the pre-existing immune context in the human population in 2025. Assuming intervals of 6-7 months between boosters (as used in this study to mimic booster intervals in the human population as closely as possible), this type of study would have been challenging to conduct, especially given the limited lifespan of mice. Thus, we performed this proof-of-concept study using outdated variants to assess the potential of EABR-modified boosters. We greatly appreciate the reviewer’s understanding and acknowledge this limitation of our study, which is highlighted in the added “Limitations of the study” section in the revised version of the manuscript (lines 570-598).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym RBD in the title should be spelled out.

      We thank the reviewer for raising this point. We made this change in the revised version of the paper.

      (2) Lines 167-168 describe no differences between the cohorts at day 244. It should also be stated that for all timepoints, there are no significant differences.

      We modified the revised manuscript according to the reviewer’s suggestion (line 170).

      Reviewer #2 (Recommendations for the authors):

      (1) Given the focus on developing broad vaccines for future coronavirus outbreaks, it would be particularly informative to test whether the EABR antigens elicit broadened/heightened responses against other (beta)coronaviruses. If enough serum is left, it would seem straightforward to conduct neutralization assays against non-SARS-CoV-2 coronaviruses.

      We thank the reviewer for this valid suggestion. Unfortunately, the extensive analysis of the serum samples, including spike and RBD ELISAs and neutralization assays against multiple variants, deep mutational scanning, and depletion assays, used up the serum samples for most mice. We agree that it would be interesting to investigate whether bivalent EABR boosters elicit pan-sarbecovirus responses in future studies.

      (2) In the bar plots for antibody titer changes, shown as log10 fold change, it is quite hard to interpret the difference between bars (e.g., what is the fold change difference between each bar in the same time point?). A table of mean ± SD values would be helpful.

      That’s a great suggestion. We added a table (Table S1) presenting all the geometric mean neutralization titers for all timepoints and variants in the revised version of the manuscript.
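
      To illustrate how geometric mean titers relate to bars plotted on a log10 scale, here is a minimal Python sketch. It is not the authors' analysis: the titer values, the group names, and the helper `geometric_mean` are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
import math


def geometric_mean(titers):
    """Geometric mean of positive titers, computed as 10 ** mean(log10(titer))."""
    logs = [math.log10(t) for t in titers]
    return 10 ** (sum(logs) / len(logs))


# Hypothetical neutralization titers for two booster groups (illustrative only).
group_a = [100.0, 1000.0, 10000.0]
group_b = [10.0, 100.0, 1000.0]

gmt_a = geometric_mean(group_a)
gmt_b = geometric_mean(group_b)

# Ratio of geometric means: the fold difference between the two groups.
fold_difference = gmt_a / gmt_b

# The same comparison expressed as a gap between bars on a log10 axis.
log10_difference = math.log10(gmt_a) - math.log10(gmt_b)

print(f"GMT A = {gmt_a:.1f}, GMT B = {gmt_b:.1f}")
print(f"fold difference = {fold_difference:.1f}")
print(f"log10 difference = {log10_difference:.2f}")
```

      With these made-up values, a gap of one unit between bars on a log10 axis corresponds to a 10-fold difference in geometric mean titer, which is why a table of geometric means can be easier to read than the bars alone.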

      (3) The development of heterotrimers as potential antigens is very interesting, but it seems out of place in the current manuscript. This should likely be in a separate, standalone manuscript.

      We thank the reviewer for commenting on the heterotrimer part of our manuscript. The presented work was not intended to advance the development of heterotrimers as potential antigens. Instead, our findings demonstrate that bivalent spike mRNA vaccines readily generate heterotrimers, which could promote intra-spike crosslinking and potentially impact antibody epitope targeting profiles as suggested by the deep mutational scanning data for the bivalent S-EABR mRNA booster (Fig. 4; Fig. S7-8). We think this is an important consideration that warrants further investigation with regards to the development of future bivalent or multivalent vaccines.

      (4) As a minor note, the sequences of the variants used or accession numbers should be provided in the Methods, since different groups have used different mutations for variants.

      We added the accession numbers for the vaccine strains used in this study (lines 604-605).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study delineates a highly specific role for the pPVT in unconditioned defensive responses. The authors use a novel, combined SEFL and SEFR paradigm to test both conditioned and unconditioned responses in the same animal. Next, a c-fos mapping experiment showed enhanced PVT activity in the stress group when exposed to the novel tone. No other regions showed differences. Fiber photometry measurements in pPVT showed enhancement in response to the novel tone in the stressed but not non-stressed groups. Importantly, there were also no effects when calcium measurements were taken during conditioning. Using DREADDs to bidirectionally manipulate global pPVT activity, inhibition of the PVT reduced tone freezing in stressed mice while stimulation increased tone freezing in non-stressed mice.

      Strengths:

      A major strength of this research is the use of a multi-dimensional behavioral assay that delineates behavior related to both learned and non-learned defensive responses. The research also incorporates high-resolution approaches to measure neuronal activity and provide causal evidence for a role for PVT in a very narrow band of defensive behavior. The data are compelling, and the manuscript is well-written overall.

      Weaknesses:

      Figure 1 shows a small, but seemingly statistically significant, increase in freezing in response to the novel tone in the no-stress group relative to baseline freezing. This observation was also noticed in Figures 2 and 7. The tone presented is relatively high frequency (9 kHz) and high dB (90), making it a high-intensity stimulus. Is it possible that this stimulus is acting as an unconditioned stimulus?

      We thank the reviewer for this insightful comment. In our view, the freezing behavior elicited by the tone reflects an unconditioned response; accordingly, the tone functions as an unconditioned stimulus. Indeed, in our data we found a modest increase in freezing in the no-stress group during the tone presentation relative to baseline (Figures 1, 2, and 7). This effect, however, was considerably smaller in magnitude than the robust freezing observed in stressed mice. We conclude that prior footshock stress enhances the unconditioned tone response.

      In addition, in the final experiment, the tone intensity was increased to 115 dB, and the freezing % in the non-stressed group was nearly identical (~20%) to the non-stressed groups in Figures 1-2 and Figure 7. It seems this manipulation was meant as a startle assay (Pantoni et al., 2020).

      We appreciate the opportunity to clarify this aspect of the model. In Figure 7, the rationale for selecting a tone amplitude of 115 dB was not to conduct a startle assay. Instead, we sought to determine whether chemogenetic inhibition of the pPVT influenced tone-elicited unconditioned fear in stress-naïve mice. Given our prior experiments demonstrating that a 90 dB tone elicits relatively low levels of freezing in non-stressed groups, we increased the tone amplitude to 115 dB in an attempt to elicit a more robust freezing response that would be sufficient to detect meaningful group differences (i.e., prevent a floor effect). As noted by the reviewer, the 115 dB tone yielded moderate levels of freezing behavior. Although freezing levels were not very high, we believe they were sufficient to avoid a floor effect. There was no effect of pPVT inhibition in this version of the task, which suggests that pPVT is preferentially engaged after stress. Future studies that identify tone parameters capable of eliciting high levels of freezing will be necessary to further strengthen this finding.

      Because the auditory perception of mice is better at high frequencies (best at ~16 kHz), would the effect seen be evident at a lower dB (50-55) at 9 kHz? If the tone was indeed perceived as “neutral,” there should be no freezing in response to the tone. This complicates the interpretation of the results somewhat because while the authors do admit the stimulus is loud, would a less loud stimulus result in the same effect? Could the interaction observed in this set of studies require not a novel tone, but rather a high-intensity tone that elicits an unconditioned response?

      Within our framework, it is important to emphasize that tone intensity (amplitude and frequency), rather than the perceived novelty of the stimulus, is the primary determinant of unconditioned freezing behavior. Moreover, numerous studies have demonstrated that auditory stimuli have the capacity to elicit unconditioned fear responses, as in the case of pseudoconditioning. Accordingly, we agree with the reviewer that decreasing the tone amplitude from 90 dB to 50 dB would diminish the unconditioned freezing response. For example, Kamprath and Wotjak (2004) demonstrated that stress-naïve mice exposed to a 95 dB tone exhibited significantly greater levels of freezing compared to those exposed to an 80 dB tone. This graded effect of tone amplitude on unconditioned freezing was also observed in mice previously exposed to footshock stress. Notably, the authors also reported a plateau effect, such that increases in tone amplitude beyond 95 dB did not further elevate freezing levels. As it relates to our findings, this plateau effect may explain the rather modest changes in freezing behavior that we observed between the 90 dB and 115 dB tone.

      Along these same lines, it appears there may be an elevation in c-fos in the PVT in the non-stress tone test group versus the no-stress home cage control, and overall it appears that tone increases c-fos relative to homecage. Could PVT be sensitive to the tone outside of stress? Would there be the same results with a less intense stimulus?

      Indeed, as the reviewer noted, we observed an increase in PVT c-Fos expression in non-stressed animals exposed to the SEFR tone test relative to homecage controls. This finding is consistent with previous reports demonstrating that PVT neurons are robustly activated by salient stimuli and regulate properties of arousal (Penzo and Gau, 2022). Moreover, the PVT has been shown to exhibit neuronal activity responses that are scaled to stimulus intensity. For example, PVT neurons display increased firing rates in response to a tail shock compared to an air puff (Zhu, 2018). Thus, it is conceivable that a less intense stimulus would evoke a diminished level of c-Fos expression.

      I would also be curious to know what mice in the non-stressed group were doing upon presentation of the tone besides freezing. Were any startle or orienting responses noticed?

      We thank the reviewer for raising this important question. Regarding startle responses, we have found that our standard 90 dB, 9 kHz tone parameter elicits similar degrees of startle between stressed and non-stressed mice (data unpublished). However, Golub et al. (2009) observed effects of prior footshock stress on acoustic startle. Further investigation of behavioral responses expressed during the tone is certainly warranted.

      Reviewer #2 (Public review):

      Summary:

      Nishimura and colleagues present findings of a behavioral and neurobiological dissociation of associative and nonassociative components of Stress Enhanced Fear Responding (SEFR).

      Strengths:

      This is a strong paper that identifies the PVT as a critical brain region for SEFR responses using a variety of approaches, including immunohistochemistry, fiber photometry, and bidirectional chemogenetics. In addition, there is a great deal of conceptual innovation. The authors identify a dissociable behavior to distinguish the effects of PVT function (among other brain regions).

      Weaknesses:

      (1) The authors find a lack of difference between the Stress and No Stress groups in pPVT activity during SEFL conditioning with fiber photometry but an increase in freezing with Gq DREADD stimulation. How do authors reconcile this difference in activity vs function?

      The reviewer points out a curious dissociation. Fiber photometry showed no effect of prior stress on the PVT response during single-shock contextual fear conditioning; however, Gq DREADD stimulation of PVT led to increased postshock freezing during this session. We don’t have a definitive explanation for this dissociation, but we wish to emphasize two relevant points. The first is that in our experience, post-shock freezing during the one-shock contextual fear conditioning session is modest, variable, and an unreliable predictor of long-term contextual fear. Thus, we are hesitant to draw firm conclusions from these data. Second, we did not observe differences in freezing during the SEFL context test, indicating that stimulation of pPVT during conditioning is not sufficient to elicit long-term enhancement of conditioned fear (i.e., SEFL). This suggests that the acute freezing response following shock exposure is mechanistically distinct from expression of conditioned contextual fear. Clearly, further research will be needed to clarify the conditions under which PVT activity regulates / does not regulate freezing.

      (2) Because the PVT plays a role in defensive behaviors, it would be beneficial to show fiber photometry data during freezing bouts vs exclusively presented during tone and shock cue presentations.

      We appreciate the reviewer's suggestion. Unfortunately, freezing data are not available for the fiber photometry experiment because the fiber optic patch cable interfered with mouse activity. We now acknowledge this as a limitation in the paper (line #202).

      (3) Similar to the above point, were other defensive behaviors expressed as a result of footshock stress or PVT manipulations?

      In addition to freezing behavior and locomotor activity in the open field, we examined the time and distance spent in the center of the open field arena. Consistent with our previous report (Hassien, 2020), we did not observe significant group differences between stress conditions, nor did we detect differences across the various experiential manipulations. We did not examine other defensive behaviors in this study. Ongoing research in the lab is examining a broader range of defensive behaviors in this paradigm.

      (4) Tone attenuation in Figure 8 seems to be largely a result of minimal freezing to a 115-dB tone. While not a major point of the paper, a more robust fear response would be convincing.

      Although our data indicate that DREADD-mediated inhibition of the pPVT did not attenuate freezing in non-stressed mice, we agree with the reviewer’s assessment that the 115 dB tone elicited only minimal freezing. Therefore, we remain open to the possibility that higher baseline levels of freezing might reveal a significant behavioral effect. We found it challenging to identify a decibel range that reliably evokes robust freezing in non-stressed mice. Future studies could explore varying tone frequencies to achieve a stronger freezing response.

      (5) In the open field test, the authors measure total distance. It would be beneficial to also show defensive behavioral (escape, freezing, etc) bouts expressed.

      We agree this would be valuable information, and we have noted it as a future direction in the discussion.

      (6) The authors, along with others, show a behavioral and neural dissociation of footshock stress on nonassociative vs associative components of stress; however, the nonassociative components as a direct consequence of the stress seem to be necessary for enhancement of associative aspects of fear. Can authors elaborate on how these systems converge to enhance or potentiate fear?

      We appreciate the reviewer for recognizing this important point regarding the mechanistic relationship between nonassociative fear sensitization and associative fear learning that occurs following footshock stress. At present, the majority of research on this topic has been conducted using the SEFL paradigm.

      At the behavioral level, previous studies indicate that manipulations that interfere with or attenuate associative fear memory of the footshock stress event fail to block nonassociative fear sensitization. For example, both SEFL and SEFR persist in animals that have successfully undergone fear extinction training in the footshock stress context (Rau et al., 2005; Hassien et al., 2020). Furthermore, reports also find that infantile or pharmacological amnesia of the footshock stress memory does not occlude the emergence of SEFL (Rau et al., 2005; Poulos et al., 2014). Taken together, associative fear memory of the footshock stress event does not appear to be necessary for fear sensitization.

      If and how the associative and nonassociative mechanisms interact is an interesting question that we are currently investigating. PVT has direct projections to the central and basolateral amygdala, regions well known to mediate conditioned fear acquisition and expression (Penzo et al., 2015). Why PVT activity does not modulate conditioned fear in our hands is intriguing. PVT is a heterogeneous structure with a variety of projections (e.g., Shima et al., 2023), and it is possible that the PVT-Amygdala projections are not hyperactive in our paradigm. As we alluded to above, further research will be needed to understand why stress-induced PVT hyperactivity affects some forms of fear and not others.

      (7) In the discussion, authors should elaborate on/clarify the cell population heterogeneity of the PVT since authors later describe PVT neurons as exclusively glutamatergic.

      The reviewer is correct that additional explanation of PVT cellular heterogeneity is warranted. We now provide clarity on this point in the discussion.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Nishimura et al. examines the behavioural and neural mechanisms of stress-enhanced fear responding (SEFR) and stress-enhanced fear learning (SEFL). Groups of stressed (4 x shock exposure in a context) vs non-stressed (context exposure only) animals are compared for their fear of an unconditioned tone, and context, as well as their learning of new context fear associations. Shock of higher intensity led to higher levels of unlearned stress-enhanced fear expression. Immediate early gene analysis uncovered the PVT as a critical neural locus, and this was confirmed using fiber photometry, with stressed animals showing an elevated neural signal to an unconditioned tone. Using a gain and loss of function DREADDs methodology, the authors provide convincing evidence for a causal role of the PVT in SEFR.

      Strengths:

      (1) The manuscript uses critical behavioural controls (no stress vs stress) and behavioural parameters (0.25mA, 0.5mA, 1mA shock). Findings are replicated across experiments.

      (2) Dissociating the SEFR and SEFL is a critical distinction that has not been made previously. Moreover, this dissociation is essential in understanding the behavioural (and neural) processes that can go awry in fear.

      (3) Neural methods use a multifaceted approach to convincingly link the PVT to SEFR: from Fos, fiber photometry, gain and loss of function using DREADDs.

      Weaknesses:

      No weaknesses were identified by this reviewer; however, I have the following comments:

      A closer examination of the Test data across time would help determine if differences may be present early or later in the session that could otherwise be washed out when the data are averaged across time. If none are seen, then it may be worth noting this in the manuscript.

      Given the sex/gender differences in PTSD in the human population, having the male and female data points distinguished in the figures would be helpful. I assume sex was run as a variable in the statistics, and nothing came as significant. Noting this would also be of value to other readers who may wonder about the presence of sex differences in the data.

      We appreciate the reviewer’s thoughtful feedback and have addressed these points as follows: In the methods section, we clarify that pre-tone and post-tone freezing behavior was averaged because we did not detect a significant effect of time across all experiments (line #474). With regards to sex differences, we clarify in the methods section that we did not detect sex as a statistically significant variable across tests (line #443). In addition, we have revised the figures to denote male and female subjects separately.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Following discussion, the reviewers and editors agreed that the strength of the evidence could be updated to compelling, provided the comments were adequately addressed.

      Reviewer #1 (Recommendations for the authors):

      (1) In the discussion around line 333, there is also data indicating a time-dependent role for PVT in conditioned fear (Quinones-Laracuente 2021; Do-Monte 2015).

      We agree with the reviewer’s assessment and have revised the discussion accordingly (line #364).

      (2) The 129S6/SvEvTac mouse exhibits impaired fear extinction but intact discrimination (Temme, 2014). Was there any rationale for using this line of mice?

      The reviewer is correct that additional explanation is warranted. We have amended the manuscript to include additional rationale for using the 129S6/SvEvTac mouse strain as well as address the findings of Temme, 2014 as they relate to our study (line #94).

      (3) Was there any reason why there were no c-fos results in the PAG and IPBM? You discuss those brain regions and their importance in the circuit in the discussion.

      In the current manuscript, we do show c-fos results for the lPAG, dlPAG, and lPBN (Figure 3). We highlight in the discussion the relevance of these regions in the fear circuit.

      (4) Take a look at Sillivan et al., 2018 for an additional reference in the introduction (around line 61).

      We thank the reviewer for their suggestion and have included the reference in the introduction (line #63).

      (5) Can the authors show the c-fos data for aPVT and pPVT separately? The authors focus on pPVT for later manipulations, but the c-fos data is collapsed. Along these same lines, were there any corrections for multiple comparisons across the brain regions? While the subsequent experiments firmly support a role for pPVT in unlearned stress-induced fear response, a proper correction for multiple comparisons is warranted.

      We have revised Figure 3 to include c-fos expression for both the anterior and posterior PVT separately. To correct for multiple comparisons, we conducted a two-way ANOVA (Brain Region X Group) with Tukey-corrected post hoc tests, as detailed in the methods section (line #577).

      (6) Do the authors provide rationale for why they began to focus specifically on pPVT versus aPVT?

      We agree that additional clarity is warranted. We have provided additional rationale for selecting pPVT as our primary focus in the results section (line #197).

      (7) Lines 298-337 of the discussion could be shortened. This long preamble is a summary of the results.

      We agree with the reviewer’s assessment and have revised the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional analyses for fiber photometry and open field data to probe for PVT-related changes in defensive behaviors beyond freezing.

      As stated above, we agree with the reviewer that additional behavioral analyses would be valuable. Unfortunately, such measures are not available for the current experiment.

      Reviewer #3 (Recommendations for the authors):

      As mentioned in the weaknesses, just checking for differences across time on the Tests, highlighting the M vs. F datapoints in the figures, and reporting if there are sex differences in any of the analyses.

      In the revised manuscript, we have included separate male and female data points for each figure. In addition, we provided clarity in the methods section reporting a lack of statistically significant sex differences across each experiment (line #443).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It is well established that many potyvirids (viruses in the Potyviridae family), particularly potyviruses (viruses in the Potyvirus genus), recruit (selectively) either eIF4E or eIF(iso)4E, while some others can use both of them to ensure a successful infection. CBSD caused by two potyvirids, i.e., ipomoviruses CBSV and UCBSV, severely impedes cassava production in West Africa. In a previous study (PBI, 2019), Gomez and Lin (co-first authors), et al. reported that cassava encodes five eIF4E proteins, including eIF4E, eIF(iso)4E-1, eIF(iso)4E-2, nCBP-1 and nCBP-2, and CBSV VPg interacts with all of them (Co-IP data). Simultaneous CRISPR/Cas9-mediated editing of nCBP-1 and -2 in cassava significantly mitigates CBSD symptoms and incidence. In this study, Lin et al. further generated all five eIF4E family single mutants as well as both eIF(iso)4E-1/-2 and nCBP-1/-2 double mutants in a farmer-preferred cassava cultivar. They found that both eIF(iso)4E and nCBP double mutants show reduced symptom severity, and the latter performs better. Analysis of mutant sequences revealed one important point mutation, L51F of nCBP-2, that may be essential for the interaction with VPg. The authors suggest that the introduction of the L51F mutation into all five eIF4E family proteins may lead to strong resistance. Overall, I believe this is an important study enriching knowledge about eIF4E as a host factor/susceptibility factor of potyvirids and proposing new information for the development of high CBSD resistance in cassava. I suggest the following two major comments for authors to consider for improvement:

      (1) As eIF(iso)4E-1/-2 or nCBP-1/-2 double mutants show resistance, why not try to generate a quadruple mutant? I believe it is technically possible through conventional breeding.

      (2) I agree that L51F mutation may be important. But more evidence is needed to support this idea. For example, the authors may conduct a quantitative Y2H assay on the binding of VPg to each of the eIF4E (L51F) mutants. Such data may add as additional evidence to support your claim.

      We thank the reviewer for their overall assessment. Regarding investigating a quadruple mutant, we agree that this is a logical next step to investigate. A conventional breeding approach with existing mutant lines, however, is problematic for two reasons: 1) cassava does not flower where this work was conducted, and 2) cassava is subject to inbreeding depression, resulting in both low seed set and considerable heterogeneity among progeny that do arise. Editing existing double mutants is possible, but would require a significant, multi-year investment to produce embryogenic tissue from existing lines and generate the new lines. Cassava has practical limits as a non-model plant. Given these constraints, we conclude that investigating a quadruple mutant is beyond the scope of the current work.

      For investigating the HPL to HPF mutation in other cassava eIF4E-family proteins and their interaction with VPg in yeast, we have now completed this experiment and included the data in the paper. Notably, we find that generating this mutant for eIF(iso)4E-2 attenuates VPg interaction without impairing eIF(iso)4E-2 accumulation, while similarly mutating nCBP-1 and eIF(iso)4E-1 results in total loss of and reduced protein accumulation, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors generated single and double knockout mutants for the eIF4E family members eIF4E, iso4E1, iso4E2, nCBP1, and nCBP2 in cassava. While a single knockout of these eIF4E genes did not abolish viral infection, the nCBP1/nCBP2 double knockout mutant displayed the weakest symptoms and viral infection. Through yeast two-hybrid screening, the nCBP-2 L51F mutant was identified, and the mutant was unable to interact with VPg, yet the nCBP-2 L51F mutant could complement the eIF4E yeast mutant. This L51F is a potentially important editing site for eIF4E.

      Strengths:

      This study systematically generated single and double knockout mutants for the eIF4E family members and investigated their antiviral activity. It also identified an L51F site as a potentially important antiviral editing site in eIF4E; however, its antiviral genetic evidence remains to be validated.

      Weaknesses:

      (1) The symptoms of the iso4E1 & iso4E2 double-knockout mutant are slightly alleviated, and those of the nCBP1 & nCBP2 double-knockout mutant are alleviated the most. If the iso4E1 & iso4E2 and nCBP1 & nCBP2 mutants are crossed to obtain quadruple-knockout mutant plants, whether the quadruple mutant shows stronger resistance should be further investigated.

      (2) Although the yeast two-hybrid identified the nCBP-2 L51F mutant, there is no direct biological evidence demonstrating its antiviral function. While the 6-amino acid deletion mutant (including L51F) showed attenuated symptoms, this deletion might be sufficient to cause loss-of-function of nCBP-2. These indirect observations cannot definitively establish that the L51F mutation specifically confers antiviral activity.

      (3) Given that nCBP-2 can rescue yeast eIF4E mutants, introducing wild type and L51F nCBP2 into the Arabidopsis iso4e mutant, or viral infectious clones into yeast systems, could clarify whether the L51F mutation (and the same mutations in eIF4E, iso4E1, iso4E2) abrogates their roles as viral susceptibility factors - critical genetic evidence currently missing.

      We sincerely thank the reviewer for their constructive feedback.

      With regard to investigating a quadruple eIF4E mutant, please see our response to reviewer 1.

      The reviewer makes a salient point regarding the nCBP-2 L51F and K45_L51del mutations. Ideally, complementation of the ncbp double mutant with nCBP-2 L51F, followed by viral challenge, would address this question. However, the practical limitations, as noted in our response to reviewer 1, make this difficult within the context of this manuscript. We acknowledge that this is a limitation of our study and have been cautious in not overstating our conclusions.

      Reviewer #3 (Public review):

      In the manuscript, the authors generated several mutant plants defective in the eIF4E family proteins and examined cassava brown streak virus (CBSV) infection in these mutant plants. They found that CBSVs induced significantly lower disease scores and virus accumulation in the double mutant plants. Furthermore, they identified an important conserved amino acid for the interaction between eIF4E protein and the VPg of CBSVs by yeast two-hybrid screening. The experiments are well designed; however, some points need to be clarified:

      (1) The authors reported that the ncbp1 ncbp2 double mutant plants were less sensitive to CBSV infection in their previous study, and that all the eIF4E family proteins interact with VPg. In order to identify the redundant function of eIF4E family proteins, they generated mutants for all eIF4E family genes. However, these mutants are defective in different eIF4E genes, and apart from several double mutant plants the authors did not generate multiple mutants (such as triple or quadruple mutants), so it is hard to identify the redundant function of eIF4E family genes.

      (2) The authors identified some key amino acids for the interaction between eIF4E and VPg, such as L51. It would be interesting to complement ncbp1 ncbp2 double mutant plants with the L51F form of eIF4E and double-check the infection by CBSVs.

      We thank the reviewer for their assessment and feedback.

      Regarding analysis of higher-order mutants, please see our response to Reviewer #1’s public review.

      For investigation of nCBP-2 L51F in planta, please see our response to Reviewer #2’s public review.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Since nCBP2 can complement a yeast mutant, it indicates that nCBP2 can also complement Arabidopsis. Wild-type nCBP2 should be introduced into the Arabidopsis iso4e mutant to determine whether it can complement Arabidopsis iso4e and whether the virus can re-establish the infection. The nCBP2 L51F mutant should also be introduced into the Arabidopsis iso4e mutant to see if this mutant fails to re-establish the virus infection. Similarly, eIF4E, iso4E1, iso4E2, nCBP1, etc., should be introduced into the Arabidopsis iso4e mutant to determine whether they can truly complement the virus-infected mutant Arabidopsis, while the L51F mutants cannot.

      Arabidopsis encodes multiple eIF4E proteins, an nCBP protein, and an eIF(iso)4E protein, and knocking out the eIF(iso)4e gene specifically confers resistance to TuMV. Introducing cassava nCBP-2 into Arabidopsis eif(iso)4e mutants is unlikely to restore TuMV susceptibility. Because TuMV belongs to a different genus than CBSV, we used the TuMV VPg interaction with Arabidopsis eIF(iso)4E to test the generality of the effect of mutating the eIF4E HPL motif to HPF on potyvirid VPg-eIF4E interaction. However, since this mutation disrupts Arabidopsis eIF(iso)4E’s endogenous translation initiation activity in yeast, this mutant protein is not worth pursuing further. In contrast, cassava eIF(iso)4E-2 L27F retains translation initiation activity and has reduced interaction with CBSV VPg by quantitative yeast two-hybrid. It would be interesting to see if this particular mutant protein could interact with TuMV VPg, and if not, it would then be worth testing for the ability to restore TuMV susceptibility in Arabidopsis eif(iso)4e. Unfortunately, we are unable to pursue these experiments at this time.

      (2) Given that nCBP-2 can complement yeast eIF4E mutants, the authors may introduce viral infectious clones into yeast systems expressing nCBP-2 variants to determine whether nCBP-2 supports viral translation. This approach could further clarify whether the L51F mutation (and mutations in eIF4E, iso4E1, iso4E2) abolishes their roles as viral susceptibility factors.

      This is an intriguing suggestion, but challenging for a few reasons. First, an infectious clone of CBSV Naliendele isolate does not exist, although we have tried to construct one, without success. There is also no guarantee such a clone could infect yeast. We are aware of yeast being used as a surrogate host for a few plant viruses, such as Tomato bushy stunt virus and Brome mosaic virus, but are unaware of a similar system for any potyvirid. Developing such a system would undoubtedly require a significant investment beyond the scope of this manuscript.

      (3) Phenotypes of all mutant lines with and without virus inoculation in Table 1 should be presented.

      Photos of un-challenged mutants are included in supplemental figures. Representative storage root symptoms for all lines have now been included in the supplemental figures as well.

      (4) In Figure 1c, the results of viral accumulation assays should be presented for additional mutant lines beyond ncbp-1, ncbp-2, ncbp-1 nCBP-2 K45_L51del, and ncbp-1 ncbp-2, particularly eif(iso)4e-1 & eif(iso)4e-2#172 and eif(iso)4e-1 & eif(iso)4e-2#92.

      We have previously found that subtle reductions in visible disease do not always translate to clear differences in viral titer when analyzed by qPCR (Gomez et al., 2018). As such, we focused on lines with the strongest phenotypes in viral titer experiments.

      (5) Inconsistently, the ncbp-1 nCBP-2 K45_L51del line showed reduced symptoms compared to wild-type in Figures 1a and 1b, yet viral accumulation levels were comparable to wild-type in Figure 1c. The explanations for this discrepancy are required.

      Please see our response to (4).

      (6) Root phenotypic data for all mutant lines shown in Figure 1d should be presented.

      Please see our response to (3).

      (7) In Figure 2b, GST control pulldowns showed detectable proteins. This background signal requires explanation.

      It is not uncommon to see weak signal in bead or tag-only negative control pulldown and IP reactions. Importantly, we see strong enrichment of VPg relative to these controls in our experimental samples.

      (8) Contrary to the abstract's implication, Figure 5c indicates that the L51F mutation impacts yeast growth, suggesting potential pleiotropic effects of this mutant.

      We interpret the results to mean that nCBP2 L51F does not fully complement the yeast eif4e mutation, rather than that nCBP2 L51F impacts yeast growth.

      (9) In vivo protein-protein interaction assays (e.g., co-immunoprecipitation) should be performed to complement the in vitro GST pull-down data in Figure 6.

      We appreciate the desire for these experiments and agree that they would bolster our Y2H and pulldown data. Unfortunately, we are not able to complete these experiments at this time, so we have been careful not to overinterpret the data.

      (10) Since the AteIF(iso)4E L28F mutant fails to complement yeast, the authors should test whether introducing the L51F mutation into other family members (eIF4E, iso4E1, iso4E2, nCBP1) preserves their yeast complementation capacity.

      This has now been done for additional cassava eIF4E-family proteins.

      (11) Indicate molecular weight sizes in all Western blots.

      This was done. As differences in buffer formulations between gel types can affect the mobility and thus apparent molecular weight of markers, we have provided in the methods section the SDS-PAGE gel chemistries and specific protein ladders used in this study. Importantly, we note that, in our experience, the apparent size of certain markers relative to proteins of interest can vary by up to 15 kDa between gel chemistries.

      (12) Figures 4d,e are not provided in the paper. Based on the content of the paper, the description in the paper likely corresponds to Figures 5c, d.

      Thank you for catching this error; this has now been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a potentially valuable modeling study on sequence generation in the hippocampus in a variety of behavioral contexts. While the scope of the model is ambitious, its presentation is incomplete and would benefit from substantially more methodological clarity and better biological justification. The work will interest the broad community of researchers studying cortical-hippocampal interactions and sequences.

      Thank you very much for your comments. We are very encouraged by your positive feedback. We have revised our manuscript to clarify our model, strengthen its biological justification, and make it more accessible to a broader audience.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ito and Toyoizumi proposes a new model for biologically plausible learning of context-dependent sequence generation, which aims to overcome the predefined contextual time horizon of previous proposals. The model includes two interacting modules: an Amari-Hopfield network that infers context based on sensory cues, with new contexts stored whenever sensory predictions (generated by a second hippocampal module) deviate substantially from actual sensory experience, which then leads to hippocampal remapping. The hippocampal predictions themselves are context-dependent and sequential, relying on two functionally distinct neural subpopulations. On top of this state representation, a simple Rescorla-Wagner-type rule is used to generate predictions for expected reward and to guide actions. A collection of different Hebbian learning rules at different synaptic subsets of this circuit (some reward-modulated, some purely associative, with occasional additional homeostatic competitive heterosynaptic plasticity) enables this circuit to learn state representations in a set of simple tasks known to elicit context-dependent effects.

      We appreciate the reviewer's careful reading of the manuscript and recognition of the novelty and significance of our work.

      Strengths:

      The idea of developing a circuit-level model of model-based reinforcement learning, even if only for simple scenarios, is definitely of interest to the community. The model is novel and aims to explain a range of context-dependent effects in the remapping of hippocampal activity.

      Weaknesses:

      The link to model-based RL is formally imprecise, and the circuit-level description of the process is too algorithmic (and sometimes discrepant with known properties of hippocampus responses), so the model ends up falling in between in a way that does not fully satisfy either the computational or the biological promise. Some of the problems stem from the lack of detail and biological justification in the writing, but the loose link to biology is likely not fully addressable within the scope of the current results. The attempt at linking poor functioning of the context circuit to disease is particularly tenuous.

      We thank the reviewer for the insightful comments.

      To better characterize our model, we added formal descriptions of each task setting and explicitly specified the sources of uncertainty. We revised the schematic figures in Figure 1 to more clearly illustrate our model. An important revision is that we now distinguish between stimulus prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping. SPE-driven remapping is triggered by mismatches between actual sensory stimuli and those predicted from past history and serves to update the current contextual state or to create a new one. In contrast, RPE-facilitated remapping is more likely to occur when executing an action planning sequence associated with recent negative reward prediction errors, possibly due to environmental changes, and promotes exploration of alternative planning sequences.

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)–driven remapping and reward prediction error (RPE)–facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e. the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E).”

      In addition, we added Figure 2C-E to clarify the neural representations of external stimuli and contextual states in the X module, as well as the neural representations within the H module. We also clarified the purpose of each model component and discussed plausible biological implementations to justify our modeling choices. Furthermore, we added a schematic illustration of our results related to psychiatric disorders in Figure 5B and revised the corresponding section of the manuscript to explicitly frame these results as a computational hypothesis. We also expanded the discussion to relate our findings to existing computational psychiatry models (see point-by-point responses below).

      We believe that these revisions have improved the clarity of our model and broadened its accessibility to a wider audience.

      Reviewer #2 (Public review):

      Summary:

      Ito and Toyoizumi present a computational model of context-dependent action selection. They propose a "hippocampus" network that learns sequences based on which the agent chooses actions. The hippocampus network receives both stimulus and context information from an attractor network that learns new contexts based on experience. The model is consistent with a variety of experiments, both from the rodent and the human literature, such as splitter cells, lap cells, and the dependence of sequence expression on behavioral statistics. Moreover, the authors suggest that psychiatric disorders can be interpreted in terms of over-/under-representation of context information.

      We appreciate the reviewer's careful reading of the manuscript and recognition of the novelty and significance of our work.

      Strengths:

      This ambitious work links diverse physiological and behavioral findings into a self-organizing neural network framework. All functional aspects of the network arise from plastic synaptic connections: Sequences, contexts, and action selection. The model also nicely links ideas from reinforcement learning to neuronally interpretable mechanisms, e.g., learning a value function from hippocampal activity.

      Weaknesses:

      The presentation, particularly of the methodological aspects, needs to be majorly improved. Judgment of generality and plausibility of the results is hampered, but is essential, particularly for the conclusions related to psychiatric disorders. In its present form, it is unclear whether the claims and conclusions made are justified. Also, the lack of clarity strongly reduces the impact of the work in the larger field.

      We appreciate the reviewer’s valuable feedback. In the revised manuscript, we have improved the presentation of the methodological aspects by providing a more intuitive and general explanation of the model framework and training procedure. We also rewrote the section on psychiatric implications to more clearly explain how dysfunction in contextual inference occurs in our model. These revisions enhance both the clarity and plausibility of our conclusions.

      More specifically:

      (1) The methods section is impenetrable. The specific adaptations of the model to the individual use cases of the model, as well as the post hoc analyses of the simulations, did not become clear. Important concepts are only defined in passing and used before they are introduced. The authors may consider a more rigorous mathematical reporting style. They also may consider making the methods part self-contained and moving it in front of the results part.

      Thank you for raising this important point.

      To improve readability, we have updated Figure 1 to more clearly illustrate the main model structure and its adaptation to individual use cases. Additionally, we have moved the previous Figure 6 (now Figure S1) to an earlier point in the Results to facilitate understanding of the methodological flow. The Methods section has also been revised to explain the algorithmic structure indicated in Figure S1. These revisions make the methods more self-contained and easier to follow.

      In the revised manuscript, we have clarified that our model is qualitatively related to the Bayes-adaptive reinforcement learning framework (Guez et al., 2013) as follows.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or due to abrupt environmental changes. Once a context selector X infers the hidden state, the sequence composer H generates episodic sequences that correspond to trajectories in a search tree, each branch representing possible action–outcome sequences. Just as Monte Carlo tree search explores potential future paths to evaluate expected rewards, H produces hippocampal sequences that simulate future states and rewards based on its learned connectivity. In this way, X defines the context that anchors the root of the tree, while H expands the tree through replay or planning; our model thereby provides a simplified algorithmic implementation of model-based reinforcement learning via tree search planning.”

      (2) The description of results in the main text remains on a very abstract level. The authors may consider showing more simulated neural activity. It remains vague how the different stimuli and contexts are represented in the network. Particularly, the simulations and related statistical analyses underlying the paradigms in Figure 4 are incompletely described.

      Thank you for pointing this out.

      In the revised manuscript, we have added explicit examples of simulated neural activity. Specifically, we added new panels in Figure 2C–E showing representative activity patterns from both the Context selector (X) and the Sequence composer (H). We also clarified the distinction between activity in the stimulus domain (externally driven) and the context domain (internally inferred states).

      “Figure 2C illustrates an example of both the environmental state transition and the corresponding contextual state transition of an agent. The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states … are represented in the stimulus domain and the contextual states … are represented in the context domain. … In the example transition shown in Figure 2C, the agent selected an environmental state transition from S2 to S4 in the 2nd, 5th, and 8th trials, which corresponds to a contextual state transition from X2β to X4β in the X module. However, because this transition was not rewarded, no synaptic potentiation occurred among hippocampal neurons. Subsequently, in the 11th trial, the agent attempted an environmental state transition from S2 to S5, corresponding to the transition from X2β to X5β in the contextual states.

      The agent received a reward at S5, and the corresponding hippocampal sequence was strengthened, enabling the agent to acquire the alternation task in the following trials (Figure 2E).”

      (see point-by-point responses below).

      We also added a detailed explanation of our results in Figure 4 as follows.

      “We consider a simplified environment of a probabilistic cueing paradigm (Ekman et al., 2022). In this study, two auditory contextual cues probabilistically predicted distinct visual motion sequences, and fMRI decoding was used to examine the frequency of hippocampal replay. We simplified this task as shown in Figure 4A.”

      “... This result replicates Ekman et al. (2022), who showed that the probability of the contextual cues is reflected in the statistically significant differences in hippocampal replay probability in humans (Figure 4F).”

      “F, Our model behavior is similar to the human fMRI result of the cue-probability-dependent hippocampal replay (Ekman et al., 2022). Paired sample t-test. **P<0.01.”

      We believe that these revisions make the model description and simulation results more concrete and easier to interpret.

      (3) The literature review can be improved (laid out in the specific recommendations).

      Thank you for pointing this out. We revised the literature review to the best of our ability.

      (4) Given the large range of experimental phenomenology addressed by the manuscript, it would be helpful to add a Discussion paragraph on how much the results from mice and humans can be integrated, particularly regarding the nature of the context selection network.

      Thank you for your suggestion.

      In the revised manuscript, we added a new paragraph in the Discussion explicitly addressing how results from mice and humans can be integrated.

      “Our model is a functionally modular account of the cortical regions and hippocampus, enabling it to capture experimental findings across species. While hippocampal activity in rodents has been extensively characterized in terms of spatial coding, human hippocampal representations are more often non-spatial and episodic-like (Bellmund et al., 2018; Eichenbaum, 2017). For episodic memory to support flexible behavior, it would be beneficial to retrieve each episode in a context-dependent manner. The episodic contents may vary across species and individuals, yet the fundamental computations—estimating the current context from external stimuli and their history, and flexibly updating this estimate via prediction errors—are likely conserved. Holding context information until the contextual prediction error is detected is analogous to the belief state in model-based reinforcement learning, which is known to improve performance under partially observable conditions (POMDPs) (Kaelbling et al., 1998). Our model provides a simple algorithmic implementation of this principle.”

      (5) As a minor point, the hippocampus is pretty much treated as a premotor network. Also, a Discussion paragraph would be helpful.

      Thank you for pointing this out.

      We define action as a transition from one environmental state to another, and transition-coding hippocampal neurons are used for action-planning. Because our model does not incorporate errors in transitions (actions), the generated hippocampal sequences are perfectly correlated with the executed transitions (actions). However, we acknowledge that computations in the brain are more complex, with contributions from other regions such as the premotor network and the basal ganglia. To clarify this, we added formal representations of state transitions (action) in each task and the following sentences to the manuscript.

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons (Materials and Methods). Note that in the real brain, not only the hippocampus but also the premotor cortex and the basal ganglia contribute to action planning and execution (Hikosaka et al., 2002). Here, however, we focus on how simplified planning sequences are learned and composed in a context-dependent manner.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state without errors in action.”

      Reviewer #3 (Public review):

      Summary:

      This paper develops a model to account for flexible and context-dependent behaviors, such as where the same input must generate different responses or representations depending on context. The approach is anchored in the hippocampal place cell literature. The model consists of a module X, which represents context, and a module H (hippocampus), which generates "sequences". X is a binary attractor RNN, and H appears to be a discrete binary network, which is called recurrent but seems to operate primarily in a feedforward mode. H has two types of units (those that are directly activated by context, and transition/sequence units). An input from X drives a winner-take-all activation of a single H_context unit, which can trigger a sequence in the H_transition units. When a new/unpredicted context arises, a new stable context in X is generated, which in turn can trigger a new sequence in H. The authors use this model to account for some experimental findings, and on a more speculative note, propose to capture key aspects of contextual processing associated with schizophrenia and autism.

      We thank the reviewer for this summary of our model.

      We would like to clarify that the hippocampal Sequence composer (H) is a recurrent network that iteratively composes the next state and the associated sensory stimuli in the sequence based on the current contextual state.

      Strengths:

      Context-dependency is an important problem. And for this reason, there are many papers that address context-dependency - some of this work is cited. To the best of my knowledge, the approach of using an attractor network to represent and detect changes in context is novel and potentially valuable.

      Weaknesses:

      The paper would be stronger, however, if it were implemented in a more biologically plausible manner - e.g., in continuous rather than discrete time. Additionally, not enough information is provided to properly evaluate the paper, and most of the time, the network is treated as a black box, and we are not shown how the computations are actually being performed.

      We thank the reviewer for suggesting an important direction for future work. The goal of this research is to develop a minimal, functionally modular neural circuit model that provides general insights into how context-dependent behavior can be realized across species, including humans. To simplify our model, we only considered discrete-time environmental states, where the exact length of the time step depends on each environment. Extending the model to a more biologically plausible, continuous-time framework is a promising direction for future work, such as using continuous-time modern Hopfield networks and synfire chains. We modified the Discussion section to clearly point out this direction.

      “... the resolution at which our model should distinguish different contextual states, including the stimulus resolution and time resolution, is hand-tuned in this work. While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, … In realistic, continuously changing environments, such resolutions should be adjusted autonomously. Introducing continuous and hierarchical representations with multiple levels of spatial and temporal resolution would facilitate such adjustments, potentially through mechanisms such as modern Hopfield networks (Krotov and Hopfield, 2020) or synfire-chain–based hippocampal sequence generation (Abeles, 1982; Diesmann et al., 1999; Shimizu and Toyoizumi, 2025; Toyoizumi, 2012), but this is beyond the focus of the current study.”

      Also, we would like to emphasize that our model is not treated as a black box. To improve clarity, we have substantially revised Figures 1 and 2 to include additional details illustrating the neural activity and the internal computational mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments and suggestions for improvement:

      (1) Formal link to model based RL is unclear: a core feature of inference is the role of uncertainty in modulating computation and corresponding circuit dynamics, in particular defining expected and unexpected degree of errors; as far as I understand the degree of tolerable errors within a context is defined by the size of the basin of attraction of the context module (which is dependent on number of items and the structure of correlations across patterns) and in no obvious way affected by sensory uncertainty (unless the inputs from H serve that purpose in a more indirect way). Similarly, most experiments are deemed to have deterministic (unambiguous) maps between sensory inputs and world state (although how the agent's state relates to environmental state is more complex and not completely clear based on the existing text).

      Thank you for raising this important point. Our model bears conceptual similarities to model-based RL frameworks, for example, the optimal-inference formulation that underlies Monte Carlo Tree Search (Guez et al., 2013), as we now clarify in the revised manuscript. These similarities, however, are qualitative rather than quantitative. In particular, the error thresholds that separate expected from unexpected outcomes are manually specified in our model, but their exact values do not appreciably influence the simulation results.

      Concretely, the heuristic threshold for SPE-driven remapping (𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub>) is set to 5 bits, which tolerates small misconvergence during recall in the Amari–Hopfield model. For RPE-facilitated remapping, the threshold is set to 𝜃<sub>𝑁𝐺</sub> = 0.7, making the agent sufficiently sensitive to abrupt environmental changes and enabling it to explore some candidate contexts after RPE-facilitated remapping. This simple thresholding scheme is adequate for our largely deterministic simulation setting, where contextual switches are rare and occur abruptly in an otherwise stable and unambiguous environment.
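
      The SPE threshold described above can be sketched as a simple comparison between two binary patterns. This is only an illustrative sketch, not the implementation used in the manuscript: it assumes that the mismatch is measured as a Hamming distance between the predicted and the observed pattern, and the function names `hamming_distance` and `spe_triggers_remapping` are hypothetical. Only the value of 5 bits and the "exceeds a threshold" criterion come from the text.

```python
from typing import Sequence

# Threshold quoted in the response (in bits). How the mismatch is computed
# in the actual model is an assumption here (Hamming distance).
THETA_REMAP_BITS = 5


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum(x != y for x, y in zip(a, b))


def spe_triggers_remapping(
    predicted: Sequence[int],
    observed: Sequence[int],
    threshold: int = THETA_REMAP_BITS,
) -> bool:
    """Return True when the mismatch exceeds the threshold (hypothetical helper)."""
    return hamming_distance(predicted, observed) > threshold
```

      Under this assumption, a recalled pattern that differs from the sensory input by up to 5 bits is treated as an acceptable recall, whereas a larger mismatch would count as a sensory prediction error.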

      Importantly, our goal in this work was not to achieve Bayesian optimality. Mice and likely humans in certain settings often deviate from optimal inference. Instead, we focus on the qualitative remapping-related processes that support goal-directed planning following epistemic errors. We have clarified this scope in the revised manuscript.

      “In the framework of reinforcement learning, our model can be mapped onto a Bayesian-adaptive model-based architecture in which contextual state serves as the root of Monte Carlo tree search (Guez et al., 2013) in a simple, largely stable environment with noiseless and unambiguous sensory stimuli, and only occasional abrupt changes. In this setup, prediction errors arise from the agent’s lack of experience or due to abrupt environmental changes. … However, these conceptual similarities are qualitative rather than quantitative. The goal of this work is not to achieve Bayesian optimality, but rather to show qualitative remapping-related processes that support goal-directed planning following epistemic errors.”

      “Note that we set the remapping threshold 𝜃<sub>𝑟𝑒𝑚𝑎𝑝</sub> = 5 bits to allow for small misconvergence during recall in the Amari–Hopfield model.”

      “Note that we set 𝜃<sub>𝑁𝐺</sub> to 0.7 to make the agents sufficiently sensitive to abrupt environmental changes and to enable exploration of some candidate contexts after RPE-facilitated remapping.”

      (2) Improvement: start describing each task specification in explicit model-based RL terms, then explain how the environmental specification translates into agent operations. Be explicit about what about the process is inferential, in particular, sources of uncertainty.

      Thank you for this important suggestion. Following your recommendation, we revised the manuscript to describe each task explicitly in model-based RL terms. For each task, we now identify the relevant sources of uncertainty, which arise either from imperfections in the agent’s internal model of the environment or from occasional abrupt switches in task rules. We also explain how the agent infers the hidden state from experience to construct an appropriate context representation, enabling the model to perform the task successfully.

      (3) A lot of seemingly arbitrary model choices need additional computational and biological justification; the description of the process is fundamentally an algorithmic one, which includes a lot of if-then type of operations: the dynamics of different elements of the circuit switch between "initialization to landmark/other", "error detected/not", different forms of plasticity on/off etc and it is not discussed in way how this kind of global coordination of different processes is supposed to be orchestrated biologically; e.g. as far as I understand the sequential structure in H activity is largely hardcoded rather than an emergent property of the learning+neural dynamics.

      Thank you for this important suggestion. We have made a concerted effort to clearly describe the biological context and the relevant literature motivating each of our algorithmic assumptions. Notably, as highlighted in Fig. 1F, we emphasize that the sequential structure in H activity emerges as a consequence of the agent’s exploration and learning. We also explain how the two remapping mechanisms concatenate sequence segments to support long-term planning and to predict both stimuli and rewards.

      About Fig. 1F

      “At the beginning of learning, hippocampal segments are not connected, and H yields only short sequences that generate immediate actions and short-term predictions. As learning continues, the three-factor Hebbian plasticity rule concatenates these segments, thereby creating longer sequences that reflect the task structure (Figure 1F).”

      About “initialization to landmark/other,”

      “While the history-based initialization was introduced to select contextual state based on the history input from H (episodic), the landmark-based initialization was introduced to terminate the episodes that would otherwise continue indefinitely. Biologically, the landmark-based initialization corresponds to the operation of anchoring a contextual state to salient environmental landmarks - such as an animal’s nest - that serve as clear reference points.”

      About “error detected/not,”

      “Based on the source of prediction errors, we consider two types of remapping: sensory prediction error (SPE)-driven remapping and reward prediction error (RPE)-facilitated remapping (Figure 1C). SPE-driven remapping is triggered when the mismatch between the predictive inputs from H to X and externally driven sensory inputs exceeds a threshold (see Materials and Methods), causing X to either transition to a different contextual state or form a new one (Figure 1D). RPE-facilitated remapping is more likely to be triggered when the agents execute an action plan following a hippocampal sequence marked by a no-good indicator. The no-good indicator indicates that the action plan, i.e., the hippocampal sequence, has recently been associated with negative reward prediction errors, possibly due to environmental changes (see Materials and Methods). It then facilitates the exploration of alternative hippocampal sequences (Figure 1E).”

      About “different forms of plasticity on/off”

      “We used different learning rules for the intra-hippocampal synaptic weights depending on within-episodic and between-episodic segments.”

      “Within-episodic connections, i.e., state-coding to transition-coding synapses, are constantly updated in a reward-independent manner … This modeling is inspired by behavioral time scale plasticity in the hippocampus (Bittner et al., 2017), in which synaptic potentiation occurs for events that are close in time regardless of reward, and such plasticity is believed to support the formation of place cells, etc.”

      “Between-episodic connections, i.e., transition-coding to state-coding synapses, are constantly updated in a reward-dependent manner … This is supported by the finding that dopaminergic neuromodulation gates LTP, enabling preferential consolidation of reward-associated experiences (Lisman and Grace, 2005; Takeuchi et al., 2016).”

      (4) Improvement: Justify individual design choices by biology whenever possible; in the absence of such justification, provide at least a computational rationale for each such model choice. Additional justification for the neural substrate of different prediction errors.

      Thank you for pointing this out. Following the advice, we have added the computational objectives behind each algorithmic component in addition to the biological motivations described above. In particular, we have completely updated Fig. 1 to help readers better understand the key remapping mechanisms in our algorithm: SPE-driven and RPE-facilitated remapping.

      About the Amari-Hopfield model

      “We employ the Amari–Hopfield model because it allows multiple contexts to be stably maintained and selected in response to stimuli and can be trained via Hebbian plasticity. We assume that similar computations are carried out in prefrontal and entorhinal cortical circuits in the brain.” “As one possible biological implementation, we consider Context selection in X as the brain-wide evoked potential during which bottom-up information may be integrated with top-down signals to select the current context (Mohanty et al., 2025). In this case, it takes several hundred milliseconds for the contextual states in X to settle (Massimini et al., 2005).”

      About the default matrix

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      About state-coding neurons and transition-coding neurons

      “The state-coding neurons receive input from X and represent the current contextual state, while the transition-coding neurons send output to X and predict the next contextual state after an action ... One possible biological grounding for this functional separation is that the entorhinal cortex provides contextual inputs to CA3, and CA3 and CA1 generate predictions of the next state through their recurrent architecture (Chen et al., 2024).”

      About the no-good indicator

      “The no-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping (see RPE-facilitated remapping section) that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      (5) In particular, the temporal scale at which processes unfold with reference to behavioral time scale actions is fundamentally unclear: what determines the time scale of a sequential element? What stitches them together? What is the temporal relationship between H and X operations? At what time scale do actions happen in terms of those operating scales? How does this align with what is known about hippocampal dynamics during behavior?

      (6) Improvement: make the time scales of different aspects of the process explicit in the text, potentially with additional graphic support.

      Thank you for the questions and suggestions. In this work, we model the agent’s behavior in an abstract grid-world environment with discrete time steps, as is common in classical RL. At each time step, the agent observes a sensory stimulus, makes a plan, and executes an action based on it. The action induces a state transition in the environment. Accordingly, the model includes a single fundamental timescale: the environmental (behavioral) time step.

      The modeled brain dynamics in both X and H are similarly locked to this environmental clock. As clarified in Fig. 1F, each sequence segment corresponds to one behavioral time step. These segments are then chunked based on reward events, enabling longer-horizon planning and prediction.

      The agent’s cognitive operations at each behavioral time step are summarized in Fig. S1. Briefly, the agent infers the contextual state X from the current stimulus and its stimulus history, generates a sequential action plan H with predictions using chunked sequence segments, and then follows the plan when it is sufficiently promising. In addition, when sensory or reward prediction errors occur, the agent reorganizes the synaptic-weight parameters of the context selector and sequence composer. Once the agent becomes familiar with the environment, H typically generates an extended action sequence along with predictions of future stimuli and the resulting reward. The agent then executes this sequential plan, bypassing step-by-step context estimation by X, until a prediction error triggers remapping.

      The revised manuscript includes the following additions.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli. The model operation relies on the environmental (behavioral) time step. At each time step, the agents perform contextual state estimation by Context selector and activate a corresponding hippocampal neuron. Then, this hippocampal neuron initiates sequential activity based on hippocampal synaptic connectivity. Each hippocampal sequence represents a planned course of action and is used to predict a series of external stimuli. … The hippocampal sequence from which actions are generated is updated upon a reward. After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”

      (7) As far as I understand it, the existence of splitter cells is directly inherited from the task specification, and to some extent the same can be said about the lap cells; please explain what can be understood from the model simulations that goes beyond what was put into the inputs/reward function for each experiment. Emphasize numerical results that are counterintuitive or where additional predictions about the dynamics come directly from simulating the model but would have been less obvious beforehand.

      The existence of splitter cells in our model is not inherited from the task specification. Instead, it emerges directly from the hippocampal module retaining sensory history (namely, whether the agent approached from the left or right arm), independent of reward structure or other task details. When sensory history is removed from the sequence composer (and, consequently, from the context selector), splitter-cell representations disappear.

      To develop lap-cell representations, immediate sensory history alone is not sufficient. The sequence composer must chunk episodic segments based on rewards to support sufficiently long action plans (i.e., history dependence) that span the multiple laps required by the task. The planning horizon - the length of action sequences - typically increases as animals learn a task. This progressive development of hippocampal sequences and their dependence on reward yields experimentally testable predictions. Notably, as we clarified in Fig. S2, the required sensory history length must also be learned adaptively: if it is too short, the agent cannot solve the task, whereas if it is too long, learning becomes unnecessarily slow.

      In the revised manuscript, we explicitly described the emergent process of splitter cells and lap cells as follows.

      About splitter cells

      “A second contextual state at S2, X2β, was generated through SPE-driven remapping at the second visit of S2 (second trial) due to history mismatch… In our model, the transition-coding neurons exhibit right/left turn-specific firing at S2 after learning is complete (Figure 2E, I), replicating the emergence of splitter cells.”

      About lap cells

      “the task environment changes again and the agents are rewarded for two laps, …. Either the shortest transition, ..., or the one-lap transition, …, is no longer rewarded, which triggers another RPE-facilitated remapping and exploration. During exploration, a history mismatch occurs …, and the contextual states for the second lap … are generated. Finally, the rewarded transition of contextual states and corresponding sequence… is reinforced (Figure 3B).”

      “This task can also be solved by simply preparing temporal contexts with three steps of sensory history (n=3), which is the minimal number to solve this task. (see Materials and Methods for Model-free learning). However, it takes much longer to find the correct transition for solving the 1-lap task than our model because it involves an excessive number of states (Figure S2).”

      “As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by Context selector becomes less frequent.”

      (8) The partitioning of H subpopulation into current input vs predictive subpopulations seems to fundamentally deviate from known CA1 properties like theta phase processing, where the same neurons encode information about recent past, present, and future at different moments in time within a theta cycle. The existence of such populations (especially since they come with distinct plasticity mechanisms and projection patterns) seems like a strong avenue for validating the model experimentally.

      (9) Improvement: biologically justify the two subpopulations, discuss neural signatures of this distinction that could be used to identify such neurons in experiments

      We thank the reviewer for bridging our model with biological circuits.

      First, we would like to clarify that we do not claim that our H module corresponds to CA1 specifically.

      Rather, we assume that within the broader hippocampal loop (EC–DG–CA3–CA1–EC), subpopulations emerge that preferentially encode the current contextual states and the transitions to the next contextual states. This assumption reflects our hypothesis that the hippocampus implements a mechanism for predicting the next context given the current one. Importantly, this functional separation does not contradict known theta-phase coding in which the same neurons can represent past, present, and future information at different phases of the theta cycle.

      As a possible biological grounding, we particularly emphasize the CA3–CA1 projection. Recent studies have shown that CA1 representations exhibit a temporal delay relative to CA3 activity (Chen et al., 2024), suggesting a circuit-level mechanism by which predictions of upcoming contextual states may be computed based on the current context. In this framework, state-coding and transition-coding functions could be assigned to CA3 and CA1, or dynamically expressed through their interactions. Based on our model, we make testable experimental predictions. Specifically, we predict that neural representations in CA3 and CA1 should precede contextual switching in tasks such as alternation or multiple-lap tasks, and that perturbing CA3–CA1 computations would impair task performance.

      Please note, however, that our model does not characterize the sequence composer’s activity at such fine-grained neuronal timescales. Instead, we model the computation it performs in abstract time steps corresponding to the grid states (e.g., while the animal is at a corner of the maze).

      We have added these points to the Discussion to clarify the biological interpretation and to suggest potential experimental validations of the proposed subpopulation distinction as follows.

      “Our model posits that the Sequence composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state. Consistent with this idea, the temporal lag in CA3→CA1 transmission suggests a functional gradient in which CA3 represents present-oriented information while CA1 carries more future-oriented predictions (Chen et al., 2024), and neurons in both CA3 and CA1 exhibit action-driven remapping and encode action-planning signals (Green et al., 2022). Our framework, therefore, predicts that changes in CA3→CA1 population activity precede behavioral switching in context-dependent alternation in Figure 2 or multi-lap tasks in Figure 3, and perturbation of this input will degrade the behavioral performance.”

      “While we used an abstract, grid-like state space with discrete time, an important direction for future work is to model its activity at finer-grained neural timescales, such as theta cycles (Foster and Wilson, 2007; Wikenheiser and Redish, 2015).”

      (10) The flexibility of the new solution in terms of learning contexts with variable temporal horizons seems an important feature of the model, but one poorly demonstrated in the existing numerical experiments. Could more concrete model predictions be generated by designing an experiment targeted specifically for such scenarios?

      Thank you for raising this point.

      As we showed in Figure S2, in environments with variable temporal horizons, our model performs better than model-free learning (Q-learning) that incorporates temporal context.

      To further demonstrate this point, we added a new task in Figures 3G and H, in which the 1-lap task and the 2+ lap task are alternated. Our model exhibits rapid switching between these tasks, regardless of differences in sequence length or temporal horizon. We added the following text.

      “To demonstrate the advantage of our model in a rapidly switching task that requires different history lengths, we show that an agent trained on both the 1-lap and 2-lap tasks can flexibly alternate between them in a reward-dependent manner (Figure 3G), selectively engaging hippocampal sequences of different lengths according to the current task context (Figure 3H). Together, these results illustrate how hippocampal lap-like representations emerge through learning and enable flexible context switching across tasks with distinct temporal demands.”

      In such a scenario, a subjective representation of laps in the hippocampus is the key to solving the task. As noted in our responses to points (8) and (9), neural representations, especially in CA1, are expected to bifurcate between the 1-lap and 2-lap conditions, and this bifurcation would precede and critically govern the animal’s behavior.

      (11) I found figures confusing/uninformative, specifically in making it explicit what is external task structure and what is the agent's internal representation of it; as a result it is not clear what of the results is trivially inherited from the task specification and what is an emergent property of the model; e.g. Figure 2A described external transition specification according to world model but it is unclear to me if Figure 2B shows the ideal agent state representation across context or a graphical summary of what the agent actually learned from the sensory experience described in A; from the text. Figure 2F is supposed to describe a property of the emergent representation, but what is shown is another cartoon... etc.

      We appreciate the reviewer’s insightful comments regarding the clarity of our figures.

      To clarify the neural representation of the agent and how it links to the action, we have revised Figure 2 and the descriptions in the main text.

      First, Figure 2A schematically depicts the external stimulus as being determined solely by the task. In this task, animals must keep track of the immediately preceding state (S1 or S3) to correctly choose between S4 and S5 upon reaching S2. Without such a memory of prior states, an agent would have no basis for distinguishing which action is appropriate, and therefore cannot selectively move to S4 or S5. Therefore, any reinforcement learning model that does not incorporate at least a one-step state history cannot solve the task.

      To solve the task, S2 must be represented as two distinct contextual states depending on the previous state. Figure 2B therefore illustrates an example of internal representation that separates S2 into X2α and X2β: transitions from S1 to S2 are internally represented as X1 → X2α, whereas transitions from S3 to S2 are represented as X3 → X2β. Although the sensory inputs provided to the model correspond only to the task-defined states in Figure 2A, the combination of the sensory input with contextual states in Context selector successfully achieves this contextual representation of X2α and X2β (see Figure 2C, D). Also, the hippocampal neurons in Sequence composer indicate the next contextual states given the current contextual states, i.e., X2α→X4 and X2β→X5 (see Figure 2E). Thus, combining Context selector and Sequence composer successfully achieves the task requirement indicated in Figure 2B.
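
      The splitting of S2 into X2α and X2β can be illustrated with a toy lookup keyed on the previous and current environmental state. This sketch is an assumption-laden illustration only: in the manuscript the contextual states are selected by the Amari–Hopfield-based Context selector using history input from H, whereas the class `HistoryContextTable` and its method `select` below are hypothetical and simply create a new label whenever a state is reached with an unseen one-step history.

```python
from typing import Dict, Optional, Tuple

SUFFIXES = "αβγδ"  # labels for successive contexts of the same state


class HistoryContextTable:
    """Toy one-step-history lookup (not the Hopfield mechanism of the model)."""

    def __init__(self) -> None:
        self._labels: Dict[Tuple[Optional[str], str], str] = {}
        self._count: Dict[str, int] = {}

    def select(self, previous: Optional[str], current: str) -> str:
        """Return the context label for `current`, given the previous state."""
        key = (previous, current)
        if key not in self._labels:
            # Unseen history for this state: allocate a new contextual state.
            n = self._count.get(current, 0)
            self._count[current] = n + 1
            suffix = SUFFIXES[n] if n < len(SUFFIXES) else f"_{n}"
            self._labels[key] = f"X{current[1:]}{suffix}"
        return self._labels[key]


table = HistoryContextTable()
print(table.select("S1", "S2"))  # first history seen for S2
print(table.select("S3", "S2"))  # different history, new context for S2
print(table.select("S1", "S2"))  # same history as the first call
```

      In this toy version, the same stimulus S2 receives two different labels depending on whether it was reached from S1 or from S3, which is the minimal property needed to choose between S4 and S5.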

      Regarding the reviewer’s concern that Figure 2F (now Figure 2I) appeared to be another cartoon, we have revised the panel to clearly display our result. These results demonstrate that some hippocampal neurons in our model encode the transition from X2α→X4 and X2β→X5. The updated figure clarifies that our hippocampal neurons functionally work similarly to the splitter cells in Wood et al., 2000.

      (12) Improvement: use visuals and captions. Make it clear what is a cartoon, what is a model specification, and what is an actual result. Replace/complement algorithmic cartoons in Figure 1 with a description of the actual result.

      Thank you for raising this point.

      As we explained in the previous point (11), we added Figure 2D and Figure 2E to display the actual neural activity, together with the corresponding annotations in the manuscript, e.g., X2α. Also, we revised the cartoons of our model description in Figure 1 to better describe our model structure.

      (13) Map between model and experimental results is poorly justified: in particular the nature of sensory inputs is not clearly specified, and how the experimental manipulations (e.g. MEC input disruption) map into model manipulations is not intuitive and no justification is provided for the choices beyond that the model ends up matching the experiment by some metric. Also, not clear why a tradeoff of neural resources as implemented in the model makes sense for the clinical case and how this hypothesis deviates from alternative Bayesian accounts invoking imperfections in inference (e.g. relative strength of priors vs likelihood as reported by e.g. P.Series's group, or issues with hierarchical inference more generally along R.Jardri's work).

      Thank you for raising this important point. We have revised the manuscript to clarify the mapping between model components, sensory inputs, and the experimental manipulations, and to further justify the clinical interpretation.

      About sensory inputs

      First, each environmental state in our model is represented as a binary (0/1) pattern. We have added Figure 2D to explicitly illustrate these sensory stimuli and how they are provided to the context-selection module.

      About mapping between model components and brain circuits

      Functionally, we speculate that Context selector (X) corresponds to computations carried out in the prefrontal cortex (PFC) and entorhinal cortex (EC), and Sequence composer (H) corresponds to the hippocampus. Inputs from the PFC are thought to reach the hippocampus via the EC. Therefore, suppression of MEC→hippocampus inputs in Sun et al. (2020) naturally maps onto blocking a subset of the inputs from X to H in our model.

      We clarified this correspondence in the revised manuscript and now explicitly justify why this manipulation matches the biological experiment.

      Relation to Bayesian theories

      We agree that Bayesian accounts have provided influential explanations of psychiatric symptoms by invoking imperfections in inference, such as imbalances between priors and likelihoods (e.g., work by P. Series and colleagues) or disruptions in hierarchical inference (e.g., work by Jardri and others). Our model complements these frameworks by explicitly incorporating sequential structure and context remapping. Rather than treating priors as static or fixed-weight quantities, our model allows contextual representations to be dynamically reorganized based on prediction errors over time. In the SZ-like condition, we assume that an excessively expanded context domain increases the influence of internally generated contextual predictions, causing them to override sensory inputs and resulting in maladaptive behavior with hallucination-like percepts. Importantly, this effect reflects not only stronger priors but also excessive generation and competition of contextual states, leading to unstable and non-reproducible remapping. In contrast, in the ASD-like condition, sensory-weighted context representations limit the ability to flexibly incorporate newly introduced contexts, causing the model to perseverate on an initially learned context and thereby reproduce inflexible behavior. We added a schematic illustration in Figure 5B and expanded the Discussion to clarify this point.

      “When the stimulus domain is relatively underrepresented, the reconstruction of contextual state in the Amari-Hopfield network tends to infer contextual states based on the context domain rather than the stimulus domain. Consequently, it converges to an incorrect attractor that is not assigned to the current environmental state, thereby increasing perceptual error for external stimuli (hallucination-like effects). Moreover, SPE-driven remapping and the corresponding synaptic plasticity occur more frequently. In contrast, when the stimulus domain is overrepresented, the Amari-Hopfield network rarely assigns multiple contextual states to a given environmental state, leading to an overuse of default contextual states (see Figure 5B and Materials and Methods).”

      “Our model also provides an algorithmic-level account of psychiatric symptoms by changing the relative weighting of sensory-encoding versus context-coding neurons. This implementation is analogous to Bayesian theories linking priors to psychiatric symptoms. In SZ, hallucinations and delusions have been modeled as arising from overly strong top-down priors (Powers et al., 2016) or circular inference, which leads to erroneous belief formation (Jardri et al., 2017; Jardri and Denève, 2013). In our model, we used an underrepresented stimulus domain to increase the relative influence of internally generated context representation in context selection. Crucially, this implementation does not simply strengthen priors but induces excessive generation and competition of contextual states, leading to frequent yet non-reproducible remapping of hippocampal contextual activity and a failure of learning to converge despite repeated experience. In ASD, it has been argued that abnormally high sensory precision reduces the updating of expectations (Karvelis et al., 2018) or leads to sensory-dominant perception, which has been interpreted as weak priors (Angeletos Chrysaitis and Seriès, 2023; Lawson et al., 2014; Pellicano and Burr, 2012). In our framework, we used an overrepresented stimulus domain to increase the relative influence of external stimulus representations in context selection. Importantly, our model captures not only sensory-dominant processing emphasized in previous studies, but also a distinctive impairment in flexibly utilizing newly introduced contexts, reflecting a failure of context reconstruction and resulting in persistent inflexible behavior. Thus, our conjunctive modeling of sensory and context processing complements Bayesian accounts of psychiatric symptoms and provides a mechanistic explanation for the role of sensory processing in maladaptive, inflexible behavior.”

      (14) Improvement: justify choices, explain in more detail relationships with computational psychiatry literature.

      Thank you for pointing this out. As explained in our response to point (13), we have justified our model choice in the revised manuscript.

      Minor comments:

      (1) Typos: "algorism" (pg2), duplicate Sun reference.

      Thank you for finding the typo and the duplicate reference. We have revised the manuscript accordingly.

      (2) Unclear statements from Methods:

      • "preparing temporal context with three histories" not sure what is meant by this.

      • "... state estimation by the context-selection module becomes less frequent." (Methods/Overview): what is the mechanism?

      • "default pattern" and failure to converge: What is the biological basis for them?

      • Why is the converter function used on some occasions but not others?

      • "new contextual state is prepared": What does that mean?

      We thank the reviewer for pointing out several unclear statements in the Methods section.

      • “preparing temporal context with three histories”

      We now explicitly state the formal description of three histories in the Methods as follows.

      “the state is defined by the recent n-step transition history of task state (i.e. 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> =(𝑆<sub>𝑘</sub>,𝑆<sub>𝑘−1</sub>, ⋯,𝑆<sub>𝑘−𝑛</sub>)<sup>𝑇</sup> , where 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> is the temporal context state, and 𝑆<sub>𝑘</sub> is the environmental state at time 𝑘). We changed n from 0 to 3.”

      • “state estimation by the context-selection module becomes less frequent”

      In our model, context selection is performed every time the agents execute an action sequence generated by the Sequence composer. As learning progresses, the Sequence composer comes to predict distant future states and executes coherent action sequences based on these predictions. When no unexpected errors are encountered during execution, context estimation is suppressed, resulting in less frequent context selection. We modified the manuscript as follows.

      “After the action execution, the agents repeat the process by selecting the current contextual state. As the agents become familiar with the environment, hippocampal sequences that enable future predictions become longer, and contextual state estimation by the Context selector becomes less frequent. The algorithmic flow chart of our model is described in Figure S1.”

      • “default pattern”

      In biological systems, it is reported that the frontal cortex shows sensory modality-specific representation without prior learning (Manita et al., 2015). We refer to these innate modality-specific sensory representations as the default pattern. In the early stages of learning, we assume that no stable contextual representations have yet been formed in the brain, and therefore, a default pattern uniquely driven by external stimuli is used as the context representation. Even during intermediate stages of learning, the context selector may fail to converge to a specific state. In such context-uncertain environments, it has been reported that agents often rely on previously learned or habitual action choices (psychological inertia), which is evident in ASD patients.

      “This contextual state is set as a default context, ensuring that the X module assigns a unique contextual state to each environmental state. Biologically, one possible interpretation is that this default context corresponds to modality-specific innate representations in prefrontal regions (Manita et al., 2015).”

      “This default implementation is analogous to psychological inertia, particularly under uncertainty (Ip and Nei, 2025; Sautua, 2017), which has been reported to be more pronounced in ASD patients (Joyce et al., 2017).”

      • Why is the converter function used only in some cases?

      The converter function A(stim → context) was introduced to compose the default pattern (one-to-one mappings between stimuli and contexts), as described above. In all other cases, the Hopfield dynamics were used to select contextual states; therefore, the converter function was not needed.

      • “new contextual state is prepared”

      Thank you for pointing this out.

      The term “prepared” was inaccurate. We revised it to “generated”.

      In the case of remapping, we assumed that X generates a new random neural activity pattern in its contextual domain and stores it as a new contextual state. We described this process as “a new contextual state is generated”.

      (3) Please explain the mapping between hippocampal sequences to actions in more detail for each task.

      • Why 9 attempts before rejection?

      • Why all the variations on Hebb?

      We appreciate the reviewer’s request for clarification. Below, we provide additional explanations point by point.

      Mapping between hippocampal sequences and actions

      In this research, we defined action as the transition from one environmental state to another environmental state. The hippocampal sequences predict the transition of environmental states; therefore, they correspond to a set of action plans from the current environmental state. In the revised manuscript, we added the formal definition of environmental states and actions in each task.

      • Why 9 attempts before rejection?

      These repetitions ensure adequate exploration of the contextual states in X and the episodic sequence in H before committing to an action. Increasing the number of attempts excessively causes the reward value function to be dominated by a single highest-scoring sequence, thereby causing excessive exploitation and narrowing behavioral variability. While the exact number 9 is not critical (the qualitative results are robust to moderate changes), we selected this value because it provides a good balance between exploration and exploitation and produces the clearest visualizations in our figures. We have clarified this in the Methods as follows.

      “We set the number of attempts before rejection to nine, providing a balance between exploration and exploitation and serving as a good compromise for visualization.”

      • Why all the variations on Hebbian learning?

      We consider three loci of plasticity in our model: the X module, the H module, and their reciprocal connections. Within the H module, synaptic connections that link episodic segments—specifically from transition-coding neurons to state-coding neurons—are assumed to follow a reward prediction error–dependent, supervised form of Hebbian learning. This choice reflects the need to selectively reinforce transitions that lead to successful outcomes. In contrast, all other synaptic updates in the model are assumed to follow reward-independent, activity-based Hebbian learning. These learning rules support the unsupervised formation and stabilization of contextual representations and action execution.

      In addition to the basic Hebbian rule, we introduced biologically motivated constraints, such as upper and lower bounds on synaptic weights and heterosynaptic depression, which weakens non-potentiated synapses. Importantly, these mechanisms do not alter the fundamental nature of Hebbian learning but increase the stability of our model.

      (4) For Q learning: please clarify "the state is defined by the recent transition history of task state".

      As you suggested, we clarified the statement by adding the following sentence to the Methods. “To highlight the advantage of our model, we compared it to Q-learning with temporal contexts, namely, the state is defined by the recent n-step transition history of task states (i.e. 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> =(𝑆<sub>𝑘</sub>,𝑆<sub>𝑘−1</sub>, ⋯,𝑆<sub>𝑘−𝑛</sub>)<sup>𝑇</sup> , where 𝑠<sub>𝑘</sub><sup>(𝑛)</sup> is the temporal context state, and 𝑆<sub>𝑘</sub> is the environmental state at time 𝑘).”
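
      As an illustration of this history-based state definition, the following is a minimal Python sketch, assuming a standard tabular Q-learning update. The function names, the discount factor, and the toy state labels are hypothetical and are not taken from the manuscript; the default learning rate of 0.4 follows the value quoted for the control model elsewhere in this response.

```python
from collections import defaultdict


def make_history_state(history, n):
    """Return the temporal-context state (S_k, S_{k-1}, ..., S_{k-n}) as a tuple.

    `history` lists environmental states in visiting order (oldest first).
    If fewer than n + 1 states have been visited, the tuple is simply shorter.
    """
    recent = list(history)[-(n + 1):]
    return tuple(reversed(recent))


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.4, gamma=0.9):
    """One tabular Q-learning step; gamma is an assumed discount factor."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return td_error


# Toy usage with hypothetical state and action labels.
history = ["S1", "S2", "S3", "S4"]
state_n0 = make_history_state(history, 0)   # current state only
state_n2 = make_history_state(history, 2)   # current state plus two past states

q_table = defaultdict(float)
td = q_update(q_table, state_n2, "go", 1.0, ("S5", "S4", "S3"), ["go", "stay"])
```

      With n = 0 the agent sees only the current environmental state, so the same Q-table code covers both the history-free and the history-augmented cases.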

      (5) What is the purpose and biological justification for the NG addition to RW?

      Thank you for raising this point. The prediction-error–based update of each sequence’s value function 𝑅 alone cannot distinguish between two fundamentally different cases:

      (a) the value of a sequence has genuinely decreased, or

      (b) the sequence remains useful, but it is just not appropriate in the current context.

      This distinction is essential for modeling context-dependent switching of behavioral strategies. To address this, we introduced the No-good (NG) indicator. NG allows the agent to temporarily mark certain sequences as unsuitable without altering their long-term value, thereby facilitating short-term exploration of alternative sequences. In other words, NG provides a mechanism for transiently suppressing a previously valid sequence in case of contextual changes, while preserving the underlying value learned in past experiences.
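
      The separation between long-term value and transient suppression can be sketched as follows. This is a hedged illustration only: the class, the function names, the rule for clearing flags when every candidate is flagged, and the default learning rate are assumptions made for the example and do not reproduce the manuscript's implementation.

```python
from dataclasses import dataclass


@dataclass
class CandidateSequence:
    name: str
    value: float = 0.0      # long-term value R, updated by prediction error
    no_good: bool = False   # transient NG flag, independent of `value`


def update_value(seq, reward, learning_rate=0.15):
    """Prediction-error update of R (learning rate is an assumed default)."""
    seq.value += learning_rate * (reward - seq.value)


def select_sequence(sequences):
    """Pick the highest-valued sequence that is not flagged NG.

    Assumption: if every sequence is flagged, all flags are cleared and
    selection falls back to the full set.
    """
    candidates = [s for s in sequences if not s.no_good]
    if not candidates:
        for s in sequences:
            s.no_good = False
        candidates = list(sequences)
    return max(candidates, key=lambda s: s.value)


# Toy usage: flagging A switches behavior to B without devaluing A.
seq_a = CandidateSequence("A", value=1.0)
seq_b = CandidateSequence("B", value=0.5)
first_choice = select_sequence([seq_a, seq_b])
seq_a.no_good = True
second_choice = select_sequence([seq_a, seq_b])
```

      Because the flag and the value are stored separately, clearing the flag restores the earlier preference immediately, which is the behavior case (b) calls for.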

      This mechanism is consistent with several lines of biological evidence. First, extinction learning after fear conditioning does not erase the original fear memory but instead forms a new memory trace, known to be stored in the medial PFC (Milad & Quirk, 2002). This suggests that animals may switch to a different contextual representation rather than simply downgrading the value of the conditioned stimulus, supporting the idea of temporarily suppressing a sequence without modifying its intrinsic value.

      Second, recent studies in the ventral hippocampus show that dopamine D2–expressing neurons in the ventral subiculum promote exploration specifically under anxiogenic contexts (Godino et al., 2025). This finding is consistent with the short-term exploratory behavior enabled by our NG mechanism. Thus, we added the following statement to the manuscript:

      “No-good indicator is introduced to transiently suppress previously established sequences that have not been recently rewarded, without devaluing them. This no-good indicator facilitates RPE-facilitated remapping … that leads to exploration of different contextual states in X and sequences in H. The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025).”

      Together, these biological findings provide a conceptual basis for modeling NG as a context-sensitive, transient modulation that encourages exploration without overwriting previously learned sequence values.

      (6) Missing details about H network size

      Thank you for pointing it out.

      We used 300 neurons for H. We now state this explicitly as follows.

      “We model the hippocampus with an N = 300 binary recurrent neural network.”

      (7) S1 figure: learning is slower even in the early, easy phases of learning when the temporal dependence should not matter; how are learning rates calibrated across models?

      Thank you for raising this point. In our model, the learning rate was fixed at 0.15, whereas the control model (now shown in Figure S2) uses a higher learning rate of 0.4, independent of temporal context.

      Learning appears slower even in the early, easy phases because the state space expands as the number of temporal contexts increases. A larger state space makes it more time-consuming to identify and reinforce the appropriate state transitions. The effect is especially evident in the easy phases because the control model is given more temporal contexts than the task requires.
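
      A rough sense of this expansion: with a history of length n, each temporal-context state is a tuple of n + 1 environmental states, so the number of distinct tuples is bounded by the number of environmental states raised to the power n + 1. The short Python sketch below computes this upper bound; the count of four environmental states is a hypothetical value chosen for illustration, and many of these tuples may be unreachable in a given task.

```python
def max_history_states(num_env_states, n):
    """Upper bound on distinct (S_k, ..., S_{k-n}) tuples."""
    return num_env_states ** (n + 1)


# Hypothetical task with four environmental states, n varied from 0 to 3.
bounds = [max_history_states(4, n) for n in range(4)]
```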

      Importantly, unlike the control model, which postulated a fixed number of temporal contexts, our model gradually increases the number of temporal contexts depending on prediction error. This adaptive mechanism allows the model to achieve fast learning during early, easy phases while still enabling more complex learning in later phases.

      Reviewer #2 (Recommendations for the authors):

      (1) "Hippocampal neurons show sequential activity...." The authors should include more classical references for hippocampal sequential activity at this point, too.

      Thank you for your suggestion. We added the following citations:

      Skaggs and McNaughton, 1996; Wilson and McNaughton, 1993

      (2) "...called remapping" also here, please reference classic work (Bostock, Muller, ...)

      As suggested, we added the following citations:

      Bostock et al., 1991; Muller and Kubie, 1987

      (3) "Several theoretical models..." What I miss here are models that explain remapping by inputs from the grid cell population, and/or the LEC (see Latuske 2017 for review), still widely considered the standard mechanism. Also, the models by Stachenfeld et al. 2017, Mattar and Daw 2019, and Leibold 2020 specifically address context dependence. Accordingly, "A comprehensive model that can explain the formation of context-dependent hippocampal sequences of various lengths through remapping, while relying on a biologically plausible learning process,..." somewhat overstates the novelty of the current paper.

      Thank you for pointing this out and for suggesting relevant citations. We agree with the reviewer that inputs from MEC and LEC to the hippocampus constitute a fundamental mechanism underlying remapping. However, in our view, a key open question in the remapping field is how MEC and LEC estimate the current context and convey this information to the hippocampus in a manner that supports goal-directed behavior. While previous studies have addressed remapping at the representational level and hippocampal sequences in the context of planning, the overall relationship between remapping, reinforcement learning, and planning has not yet been explained within a single unified model. In this work, we propose a simple and biologically plausible model that integrates an Amari–Hopfield network for context selection with hippocampal sequences, providing an account of their coordination during goal-directed behavior. To more accurately position the novelty of our contribution, we have revised the manuscript as follows.

      “While previous works have explored hippocampal sequential activity for planning (Jensen et al., 2024; Mattar and Daw, 2018; Pettersen et al., 2024; Stachenfeld et al., 2017) and hippocampal remapping for contextual inference (Low et al., 2023) separately, they have yet to elucidate how these two aspects jointly enable flexible behavior. A simple biologically plausible model-based reinforcement learning model that uses the Amari-Hopfield model for context selection and hippocampal sequences of various lengths as a state-transition model for long-horizon planning, relying on remapping driven by prediction errors to form state representation, would thus provide valuable insights into the neural mechanisms underpinning context-dependent flexible behavior.”

      (4) Please properly introduce nomenclature "C2α, C2β, S2,...." S is sometimes used for stimulus, sometimes for location (state?), or even action?

      Thank you for pointing this out. We acknowledge that the notation Cn (e.g., C1, C2…) was not straightforward. Therefore, we changed the notation to Xn (e.g., X1, X2, …) to indicate the contextual state of X.

      We define Sn (e.g., S1, S2…) as the external input given by the environment and represented in the stimulus domain of X, while Xn (e.g., X1, X2…) is the subjective contextual state generated by the agent and represented in the context domain of X. As a reference, we added the neural representation of X in Figure 2D and added the following text.

      “The neural activity of X at each contextual state is shown in Figure 2D, where the environmental states (e.g., S1, S2…) are represented in the stimulus domain, and the contextual states (e.g., X1, X2α…) are represented in the context domain.”

      (5) "Our model replicates this result by blocking the synaptic transmission from most of the neurons in the context domain of X to H (Figure 3F).". Does this mean the X module is hypothesized to be in the EC?

      Thank you for the thoughtful question. In our model, the X module is intended as a functional abstraction that combines the roles of several brain regions known to contribute to contextual representation, including the prefrontal cortex (PFC) and the entorhinal cortex (EC). Although X is not necessarily meant to correspond to a single anatomical region, we consider it likely that the contextual information represented in X would reach the hippocampus (H) (CA3 and CA1) primarily through the EC. Thus, the experimental manipulation shown in Figure 3F (suppression of medial EC axons in the hippocampus) is interpreted in our framework as weakening the input from X to H.

      We added the following text to the Discussion section.

      “We speculate that Context selector is implemented across multiple brain regions with varying degrees of resolution, including a part of the entorhinal cortex and prefrontal cortex.”

      “Our model posits that the Sequence Composer corresponds to computations within the hippocampus. As a biologically plausible projection, we consider the CA3–CA1 circuit, where contextual inputs from regions such as the PFC and EC provide the current contextual state to CA3, enabling the recurrent CA3–CA1 architecture to generate predictions of the next contextual state.”

      (6) Discussion "model-based reinforcement learning": Please detail where the model is here. In my understanding, the naive agent does not have a model (this would be model-free then?).

      Thank you for asking.

      Unlike model-free reinforcement learning, where each action is evaluated step by step, we use hippocampal sequences for multiple-step prediction and action planning. This is the “model” in our research. As you mentioned, animals initially do not have a “model”, but the Sequence composer gradually chunks the episodic segments to compose longer sequences.

      (7) "...can change the attractor dynamics in the hippocampus (34)": What is (34)? I also would doubt that one can make such absolute statements about the human hippocampus.

      Thank you for pointing out the missing citation. We corrected it accordingly.

      Rolls E. 2021. Attractor cortical neurodynamics, schizophrenia, and depression. Transl Psychiatry 11. doi:10.1038/s41398-021-01333-7

      (8) "To the best of our knowledge, this is the first model that describes the formation of context-dependent hippocampal activity through remapping and its contribution to flexible behavior." See "Several theoretical models...".

      Thank you for pointing this out. We admit that it was an overstatement. We corrected it accordingly.

      “To the best of our knowledge, this is the first model that uses associative memory for describing the formation and switching of context-dependent hippocampal activity through remapping and its contribution to flexible behavior.”

      (9) "We speculate that the context-selection module is implemented across multiple brain regions..." How would an attractor network be implemented over "multiple brain regions"?

      We thank the reviewer for raising this important conceptual question. Context information in realistic environments is likely to have a hierarchical structure. We therefore speculate that multiple brain regions may jointly support context selection by maintaining different levels or components of this hierarchy. In particular, the prefrontal cortex (PFC), medial entorhinal cortex (MEC), and lateral entorhinal cortex (LEC) have all been implicated in representing contextual or task-state information at different levels of abstraction. These regions are known to exhibit attractor-like dynamics and to provide inputs to the hippocampus. Thus, an attractor network spanning multiple regions could arise, with different areas stabilizing distinct components of the contextual representation, depending on the timescale of memory, task demands, or sensory features.

      We used the Amari–Hopfield network as a functional abstraction to explain such multi-regional interactions underlying context representation, rather than to provide a one-to-one mapping onto a specific brain region. How region-specific attractor dynamics jointly contribute to maintaining global contextual information and enabling context switches in response to prediction errors remains an important direction for future research.

      Methods:

      (10) "... agents move through discrete environmental states characterized by distinct external stimuli.": How is this exactly implemented? What is the neural representation of these states, xi? What is the difference to a "landmark"?

      We appreciate the reviewer’s thoughtful question regarding the implementation and neural representation of environmental states. In our model, each environmental state is represented as a binary stimulus pattern provided to the stimulus-domain neurons in Context Selector. Specifically, for each state, we constructed a pattern in which half of the neurons are set to 1 and the other half to 0. We chose this design because, in the Amari–Hopfield model, memory performance is maximized when stored patterns contain approximately equal proportions of 0 and 1. For clarity, we have added an illustration of these stimulus patterns in the revised Figure 2D.
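
      For concreteness, a balanced binary stimulus pattern of this kind could be generated as in the Python sketch below. The function name and the domain size of 600 neurons are hypothetical choices for illustration, since the size of the stimulus domain is not stated here.

```python
import numpy as np


def balanced_binary_pattern(num_neurons, rng):
    """Return a 0/1 vector with exactly half of its entries set to 1."""
    if num_neurons % 2 != 0:
        raise ValueError("num_neurons must be even for an exact 50/50 split")
    pattern = np.zeros(num_neurons, dtype=np.int8)
    pattern[: num_neurons // 2] = 1
    rng.shuffle(pattern)  # randomize which neurons are active
    return pattern


rng = np.random.default_rng(0)
# One pattern per hypothetical environmental state S0..S3.
stimulus_patterns = {f"S{i}": balanced_binary_pattern(600, rng) for i in range(4)}
```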

      Regarding the reviewer’s question about landmarks: in our framework, a landmark denotes an environmental state for which the contextual state is uniquely determined, regardless of the preceding transition history. For simplicity in this study, we designated the initial environmental state in each task (S0 or S1) as the landmark. Importantly, in our implementation, landmarks do not differ from other states in terms of their stimulus pattern; their special role arises solely from the task structure, not from additional sensory properties.

      In real environments, what constitutes a landmark likely varies depending on stimulus saliency and the agent’s prior experience. Determining how landmarks should be optimally defined or learned is an interesting direction for future work.

      (11) How are different contexts represented for the same stimulus xi^stim?

      We added an example of neural activity in X in Figure 2D, illustrating the distinction between the stimulus domain and the context domain. While the activity in the stimulus domain depends on the external stimulus, the contextual domain consists of uncorrelated random neural states. We exploit a key property of the Amari–Hopfield network to associate each contextual state with a given external stimulus.

      (12) "...and its stimulus domain ??stim becomes identical to ??xistim ." Does that mean every stimulus is an attractor in the context net? How can that work with only 1200 neurons? Is that realistic for real-life environments? Neuron numbers would need to increase dramatically.

      As you mentioned, we assigned each stimulus to a corresponding attractor in the Context selector (X). An Amari–Hopfield network with 1,200 neurons can store approximately 10–20 attractors, which is sufficient to solve the tasks considered in this study. We adopted the Amari–Hopfield network for its simplicity and conceptual clarity; however, in biological neural systems, it is not necessary to construct such rigid attractors for every stimulus. For example, modality-specific neural projections exist in the brain and are sometimes sufficient to form loose attractor states across different stimuli. In addition, the prefrontal cortex is known to support working memory, which may also serve as a form of contextual representation incorporating recent history. Thus, we propose that multiple brain regions cooperate to implement the Context selector.
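
      To make the attractor picture concrete, the following Python sketch stores a small number of random binary patterns in a classical Hopfield-type network with Hebbian weights and recovers one of them from a corrupted cue. It is a simplified illustration under stated assumptions: the synchronous update, the number of stored patterns (15), the 10% corruption level, and all function names are choices made for this example, and the network is not split into stimulus and context domains as in our model.

```python
import numpy as np


def store_patterns(patterns01):
    """Hebbian outer-product weights for 0/1 patterns (mapped to -1/+1)."""
    bipolar = 2.0 * patterns01.astype(np.float64) - 1.0
    num_neurons = bipolar.shape[1]
    weights = bipolar.T @ bipolar / num_neurons
    np.fill_diagonal(weights, 0.0)  # no self-connections
    return weights


def recall(weights, cue01, max_iters=20):
    """Iterate synchronous sign updates until the state stops changing."""
    state = 2.0 * cue01.astype(np.float64) - 1.0
    for _ in range(max_iters):
        new_state = np.where(weights @ state >= 0, 1.0, -1.0)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return ((state + 1) / 2).astype(np.int8)


rng = np.random.default_rng(0)
num_neurons, num_patterns = 1200, 15
patterns = rng.integers(0, 2, size=(num_patterns, num_neurons), dtype=np.int8)
weights = store_patterns(patterns)

# Corrupt 10% of the bits of the first pattern and let the network settle.
cue = patterns[0].copy()
flipped = rng.choice(num_neurons, size=120, replace=False)
cue[flipped] = 1 - cue[flipped]
recovered = recall(weights, cue)
```

      At this low memory load the corrupted cue lies well inside the basin of the stored pattern, so the dynamics settle within a few updates.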

      (13) How are WHX and WHH initialized?

      Thank you for pointing this out.

      We initialized all synaptic weights W to 0. We added the following text to the Methods section.

      “Note that the initial synaptic weights of 𝑊<sup>𝐻𝑋</sup> and 𝑊<sup>𝑋𝐻</sup> are all 0.”

      (14) It is unclear why the hippocampus separates into state and transition neurons. Why cannot one pattern serve both purposes?

      Thank you for asking about this important point.

      We use two kinds of hippocampal neurons because they serve different roles: state-coding neurons represent the current contextual state, and transition-coding neurons predict the following contextual state given the current one. This separation enables the model to predict multiple scenarios from the current contextual state and to choose the sequence most suitable for the environment.

      We rewrote the following sentences in the manuscript.

      In the Results section:

      “In Sequence composer, there exist two types of neurons: state-coding neurons, which represent each contextual state, and transition-coding neurons, which encode transitions to successive contextual states given the contextual state indicated by the state-coding neurons”

      In the Methods section:

      “The state-coding neurons receive input from 𝑋 and represent the current contextual state, while the transition-coding neurons send output to 𝑋 and predict the next contextual state after an action i.e., T(𝑋<sub>𝑘+1</sub>|𝑋<sub>𝑘</sub>,𝑎<sub>𝑘,𝑘+1</sub>).”
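
      The role of the transition model can be illustrated with a deliberately simplified Python sketch in which T is a deterministic lookup table from (contextual state, action) to the next contextual state, and a multi-step sequence is composed by chaining lookups. The table contents, the labels, and the function name are hypothetical; the neural implementation in H is not reproduced here.

```python
def compose_sequence(transitions, start_state, actions):
    """Chain one-step predictions X_k -> X_{k+1} into a multi-step sequence.

    `transitions` maps (state, action) to the predicted next state.
    Composition stops early if a transition has not been learned yet.
    """
    sequence = [start_state]
    state = start_state
    for action in actions:
        key = (state, action)
        if key not in transitions:
            break
        state = transitions[key]
        sequence.append(state)
    return sequence


# Hypothetical learned transitions between contextual states.
transitions = {
    ("X1", "a12"): "X2",
    ("X2", "a23"): "X3",
}
planned = compose_sequence(transitions, "X1", ["a12", "a23"])
partial = compose_sequence(transitions, "X1", ["a12", "a24"])
```

      Keeping the current state and the predicted successor as separate entries is what allows several alternative continuations to be held for the same current state.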

      (15) "the agents execute actions according to this sequence." How are the actions defined? Are they part of the state?

      We thank the reviewer for raising this important point. In our model, an action is defined as the transition from a given environmental state to the next environmental state. To avoid ambiguity, we have added a formal mathematical definition of actions for each task in the revised manuscript. In our framework, the transition-coding neurons in Sequence Composer (H) predict the upcoming environmental state, and thus the hippocampal sequence intrinsically contains the representation of an action. Consequently, the sequence generated before actions functions as the agent’s internal action planning process.

      (16) "Because the input source for the state-coding neuron and the transition coding neuron differ (the former is selected from ??, while the latter is selected from ??), the same hippocampal neuron could occasionally be used for both state-coding and transition-coding across different contextual states. This is evident when an excessive number of contextual states are prepared, especially in the SZ condition. This phenomenon degrades state estimation at X (eq.3)." I have no idea what you want to convey here, .... and how is state estimation related to Equation 3?

      We appreciate the reviewer’s feedback and agree that our original explanation was unclear. Our intention was to clarify why context estimation deteriorates specifically in the SZ condition.

      In our model, state-coding neurons in the hippocampus represent the current contextual state, and transition-coding neurons predict the next contextual state given the current contextual state. Under normal conditions, these two sets of neurons remain sufficiently distinct, allowing accurate prediction of the upcoming contextual state, which is conveyed to X. However, when an excessively large number of contextual states are stored in the SZ condition, representations in the hippocampus begin to overlap. As a result, some hippocampal neurons are inadvertently recruited for both state-coding and transition-coding across different contextual states. This overlap disrupts H’s ability to accurately predict the next contextual state.

      This degraded prediction directly affects the state-estimation process in X (Eq.3), because Eq.3 relies on receiving an accurate predicted next state from H. When this signal becomes ambiguous, X may converge to an incorrect contextual state, potentially mimicking hallucination-like inference errors.

      We have rewritten the relevant passage in the manuscript to clarify this mechanism as follows.

      “When the number of contextual states increases - particularly in the SZ condition - representational overlap arises between hippocampal state-coding and transition-coding neurons.

      This overlap makes the prediction of the next contextual state by the transition-coding neurons unreliable. The degraded prediction from H, in turn, corrupts the initial condition for context selection in X (Eq. 3), leading to hallucination-like behavior.”

      (17) The figures hardly show simulated activity. Consider displaying more neuronal simulations to help the reader grasp the workings of the model.

      Thank you for your suggestion. We now show the neural activity of X and H in Figures 2D and 2E, respectively, to give an overview of our model.

      (18) Figure 5: What is the "Hopfield count"?

      Thank you for pointing this out. The definition of the Hopfield count was ambiguous. We added an explicit explanation of “context selection” and its possible outcomes (correct association, hallucination-like, and default contexts) in Fig. S1. To clarify our claim, we replaced the count-based measure with the probability of selecting hallucination-like and default contexts during context selection. Accordingly, we removed the term “Hopfield count” and revised the caption of Figure 5 as follows.

      “The result of context selection (see Figure S1). The probability of wrong stimulus reconstruction (hallucination-like effects) is plotted in red, and the probability of default context usage due to failures in context reconstruction (see Materials and Methods) is plotted in blue.”

      (19) Figure 6: Consider moving this upfront.

      Thank you for the suggestion. We moved Fig.6 to Fig.S1 and introduced it earlier in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      I was a bit confused about the implementation, which may not be autonomous, meaning there are numerous stages that require intervention from outside the X-H network (see Figure 6). It seems that the X network might wait to converge before providing input to H, rather than having the entire network evolve in parallel. There are also aspects to the implementation that seem rather ad hoc, such as the "no-good indicator".

      Thank you for the thoughtful comments. We would like to clarify several points regarding the implementation and its biological motivation.

      First, regarding the concern that the X–H interaction may not be fully autonomous:

      In our framework, the convergence time of the X module under external sensory input is assumed to be on the order of several hundred milliseconds, consistent with the timescale of stimulus-evoked cortical population dynamics observed in biological systems. Especially when hippocampal input is present, X does not need to explore the full attractor landscape. Instead, it quickly settles into an attractor located near the hippocampal cue, which substantially shortens the convergence time.

      Second, although our current implementation proceeds in an algorithmically sequential manner for clarity, we do not intend to imply that the brain performs these steps sequentially. Biologically, the states of X and H are expected to co-evolve and mutually constrain each other through recurrent interactions. The sequential algorithm in the model is therefore a practical choice for implementation, not a theoretical claim about strict temporal ordering in the neural system.

      Finally, the “no-good indicator” is introduced to suppress hippocampal sequences transiently and thereby accelerate switching behavior. Our no-good indicator is most consistent with the biological findings on D2-expressing neurons in the hippocampus. We added the following text.

      About the no-good indicator

      “The no-good indicator is inspired by recent findings in the ventral hippocampus, where dopamine D2-expressing neurons of the ventral subiculum selectively promote exploration under anxiogenic contexts (Godino et al., 2025)”

      Besides the hippocampus, similar mechanisms (temporary suppression of recently visited or low-value attractor states) have been proposed in the biological decision-making and working-memory literature, providing conceptual support for the no-good indicator in our model.

      After exposure to a new context, a new memory/context is stored in the X network. As the storage of a new memory requires synaptic plasticity, this step would presumably take a significant amount of time in an animal.

      Thank you for raising this important point. We agree that the formation of a new memory or context requires synaptic changes, and it is well established that processes such as tagging during wakefulness and consolidation during sleep take considerable time. However, once a context has been learned, switching between contexts can be achieved just by moving between attractors in the X network. This mechanism allows for rapid, context-dependent behavior without requiring new synaptic modifications each time. Our study focuses on this aspect of fast context-dependent switching rather than the initial memory formation.

      My understanding is that the Amari-Hopfield network should be evolving in continuous time and not be binary. But there were no time constants mentioned, and the equations were not provided, and it seems that the elements of X were binary units, rather than analog. This should be clarified.

      Thank you for the comment.

      Although there are models with continuous firing rates and continuous time (Ramsauer et al., 2021), the original Amari-Hopfield model uses binary neurons operating in discrete time steps. As in our responses to comments (5) and (6) from Reviewer 1, we considered only a discretely time-stepped environment whose timescale is arbitrary. At each environmental state where the current contextual state is selected, the Amari-Hopfield network typically takes about ten iterations to converge.
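      The binary, discrete-time update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the network size (64 units), the two stored patterns, the asynchronous update order, and the function names `train_hebbian` and `recall` are all assumptions chosen for the example.

```python
import random


def train_hebbian(patterns):
    """Build a Hebbian weight matrix from +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, state, max_sweeps=20, seed=0):
    """Update binary units one at a time until a full sweep changes nothing.

    Returns the final state and the number of sweeps that were run.
    """
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for i in rng.sample(range(n), n):  # random asynchronous order
            field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            return state, sweep
    return state, max_sweeps


# Hypothetical example: two orthogonal stored "contexts" of 64 binary units.
pattern_a = [1] * 32 + [-1] * 32
pattern_b = [1, -1] * 32
weights = train_hebbian([pattern_a, pattern_b])

# Noisy cue: pattern_a with its first 8 units flipped.
cue = [-v for v in pattern_a[:8]] + pattern_a[8:]
recalled, sweeps = recall(weights, cue)
print(recalled == pattern_a, sweeps)
```

      In this sketch one "sweep" updates every unit once, and the number of sweeps needed to settle depends on the number and similarity of the stored patterns, so it should not be read as a reproduction of the roughly ten iterations reported above.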

      In the text, we added the following text.

      “For simplicity, the environment is defined in discrete time, and agents move through environmental states characterized by distinct external stimuli.”

      Figure 3 is aimed at replicating the lap cell finding of Sun et al, 2020. In panel E, a comparison is made between the data and the model. Are the cells in the model the entire population of H neurons (state and transition), or just a subset? Does the absence of the "ghosts" (the weaker off diagonal responses seen in the experimental data) imply that the network is not encoding that it is in the same location, but a different lap? Why is there not any true sequentiality (i.e., why do all H units go on at once)?

      Thank you for your insightful comments. Throughout this study, we used 300 neurons for the Sequence composer (H); however, for simplicity, we constrained the model such that only a single H neuron was active at each time point. As a result, most other neurons remained silent. Accordingly, in Fig. 3E, we display only neurons with firing activity, and silent neurons are not shown.

      As you correctly inferred, hippocampal neurons in our model encode lap identity rather than the same physical location across laps. This design choice reflects our focus on hippocampal neurons representing contextual states, rather than place-coding neurons, as only the former contributes directly to contextual behavior in our framework. As shown in Fig. 3E, hippocampal neurons exhibit clear sequential activity with “episode-like” representations corresponding to individual laps. Nevertheless, we believe that incorporating a mixture of context-coding neurons and place-coding neurons is an important direction for future work, as illustrated in Fig. S3.

      We revised the caption of Fig. 3E as follows.

      “E, The comparison of (Left) lap cells in the hippocampus in the 4-lap task (Sun et al., 2020) and (Right) our results of active neurons in the H module.”

      Typo "but also makeS predictions".

      Thank you for pointing this out. We have corrected it.

    1. Author Response:

      We appreciate the reviewers’ thoughtful assessments and constructive feedback on our manuscript. The central goal of our study was to propose a simple and biologically inspired model-based reinforcement learning (MBRL) framework that draws on mechanisms observed in episodic memory systems. Unlike model-free approaches that require processing at each state transition, our model uses sequential activity (= transition model) to predict environmental changes in the long term by leveraging episode-like representations.

      While many prior studies have focused on optimizing task performance in MBRL, our primary aim is to explore how flexible, context-dependent behavior—reminiscent of that observed in biological systems—can be instantiated using simple, neurally plausible mechanisms. In particular, we emphasize the use of an Amari-Hopfield network for the context selection module. This network, governed by Hebbian learning, forms attractors that can correct for sensory noise and facilitate associative recall, allowing dynamic separation of prediction errors due to sensory noise versus those due to contextual mismatches. However, we acknowledge that our explanation of these mechanisms, especially in relation to sensory noise, was not sufficiently developed in the current manuscript. We plan to revise the text to clarify this limitation and to expand on the implications of these mechanisms in the context of psychiatric disorder-like behaviors, as illustrated in Figure 5. Several reviewers raised concerns about the clarity of our model. Our implementation is intentionally algorithmic rather than formal, designed to provide an accessible proof-of-concept model. We will revise the manuscript to better describe the core logic of the model—namely, the bidirectional interaction between the Hopfield network (X) and the hippocampal sequence module (H), where X sends the information on estimated current context to H, and H returns a future prediction based on the episode to X. This interaction forms a loop enabling the current context estimation and its reselection.

      The key advantage of this architecture is its ability to flexibly adjust the temporal span of episodes used for inference and control, providing a potential solution to the challenge of credit assignment over variable time scales in MBRL. Because our model forms and stores the variable length of episodes depending on the context, it can handle both short-horizon and long-horizon tasks simultaneously. Moreover, because each episode is organized by context, reselecting contexts enables rapid switching between these variable timescales. This flexibility addresses a challenge in MBRL—the assignment of credit across variable time scales—without requiring explicit optimization. To better illustrate this important feature, we plan to include additional experiments in the revised manuscript that demonstrate how context-dependent modulation of episode length enhances behavioral flexibility and task performance.

      Finally, we will address the comments on the presentation and the biological grounding of our model. To improve clarity and biological relevance, we will revise the Methods section to explicitly describe how the model is grounded in mechanisms observed in real neural systems. Also, we will clarify which parts of our figures represent computational results versus schematic illustrations and more clearly explain how each model component relates to known neural mechanisms. These revisions aim to improve both clarity and accessibility for a broad audience, while reinforcing the biological relevance of our approach.

      We thank the reviewers again for their insightful comments, which will help us substantially improve the manuscript. We look forward to submitting a revised version that more clearly conveys the contributions and implications of our work.

    1. Author response:

      We thank the reviewers for their excellent and thoughtful comments and suggestions, along with their strong support of the work. We agree with the general feedback that there is opportunity for further mechanistic dissection of the data from a variety of interesting angles. This was a fascinating project to work on because of all of the possible directions, and we attempted to highlight a diversity of compelling findings. We wish we had time to devote to answering more of the open mechanistic questions, but, given competing priorities, we are unfortunately unable to do them justice at this time. At the suggestion of a reviewer, we have made results available through MaveDB (accession numbers urn:mavedb:00001270-a and urn:mavedb:00001271-a) as a way to empower others to explore more.

    1. Author response:

      We thank the editors and reviewers for their careful reading of our manuscript and for their insightful comments. We appreciate the opportunity to clarify several aspects of the derivations and experimental design, and we will revise the manuscript accordingly. Below we provide responses to the major weaknesses raised by the reviewers.

      The derivation of the main error term misses some important steps, which complicates peer review at this stage. In particular, factorisation of the covariance into noise and the inverse of the observation covariance matrix needs a more thorough justification. The cited sources do not contain the derivation for a noise term with full covariance, which is essential for deriving this error term.

      Thank you for pointing this out. We agree that the derivation of the main error term should be presented more explicitly to facilitate peer review. In the revised manuscript, we will explicitly cite the relevant equation numbers from the references to make each step of the argument easier to follow. We will also revise the text to more clearly discuss the assumption on the noise covariance matrix.

      The practical recommendation at the end of the paper also requires clearer guidance on how the design perturbations are constructed, and how many times and for how long the system is stimulated in each iteration of the experiment.

      Thank you for this helpful suggestion. We agree that the practical implementation of the experimental design should be explained more clearly. In the revised manuscript, we will provide a more explicit description of how the input perturbations are constructed in each iteration. To more clearly explain how many times and for how long the system is stimulated, we will clarify the stopping criterion used in the iterative procedure and the time length of the external inputs. As shown in Eq. (8), the estimation error scales approximately as 1/T, so longer measurements improve accuracy. For clearer guidance, we will add additional explanations on the relation between the stimulation time and estimation accuracy, as well as on the role of iterative input design.
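      The 1/T scaling mentioned above can be illustrated with a scalar toy model. This is a sketch under simplifying assumptions (a single state variable, a known input gain $b$, independent Gaussian noise of variance $\sigma^2$, least-squares estimation of $a$); it is not a restatement of Eq. (8), and the symbols $a$, $b$, $u_t$ and $\xi_t$ are introduced here for illustration only.

```latex
\begin{align}
x_{t+1} &= a\,x_t + b\,u_t + \xi_t, \qquad \xi_t \sim \mathcal{N}(0,\sigma^2), \\
\hat{a} &= \frac{\sum_{t=1}^{T} x_t\,(x_{t+1} - b\,u_t)}{\sum_{t=1}^{T} x_t^2}
         = a + \frac{\sum_{t=1}^{T} x_t\,\xi_t}{\sum_{t=1}^{T} x_t^2}, \\
\mathbb{E}\!\left[(\hat{a}-a)^2\right] &\approx \frac{\sigma^2}{T\,\overline{x^2}},
\qquad \overline{x^2} = \frac{1}{T}\sum_{t=1}^{T} x_t^2 .
\end{align}
```

      In this toy case the mean squared error falls as 1/T for a fixed input, and for a fixed T it also falls when the input drives larger state excursions (larger $\overline{x^2}$), which gives some intuition for why both the stimulation time and the input design matter.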

      Finally, there is no analysis of model mis-specification. In particular, the true dynamics are unlikely to be linear; the noise is unlikely to be either Gaussian or uncorrelated across time; and the B matrix is unlikely to be known perfectly. We're not suggesting that the authors consider a more complex model, but it's important to know how sensitive their method is to model mismatch. If nothing can be done analytically, then simulations would at least provide some kind of guide.

      We thank the reviewer for raising this important point. We agree that it is important to understand how sensitive the proposed method is to model mismatch. While our current theoretical analysis assumes linear dynamics with Gaussian noise for analytical tractability, real systems may deviate from these assumptions in several ways, including nonlinear dynamics, temporally correlated noise, or imperfect knowledge of the input matrix B. To address this concern, we will add simulation experiments to examine the robustness of our method under several types of model misspecification. These simulations will provide practical guidance on how deviations from the assumed model affect estimation performance. We will include these results and discuss their implications in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the neural activity from the pupil-modulated portion, the cross-stimulus stability of the NC was preserved (Fig. S8D).
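      The two quantities used in this analysis, the fraction of variance explained by a regressor and the activity left after removing it, can be sketched for a single regressor as follows. This is a simplified illustration, not the authors' analysis pipeline: the function names and the example traces are hypothetical, and the actual analysis compared several regressors (stimulus, surrounding population, pupil).

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regress_out(x, y):
    """Return y with the part linearly predicted by x removed."""
    slope, intercept = fit_line(x, y)
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


def variance_explained(x, y):
    """Fraction of the variance of y captured by a linear fit on x (R squared)."""
    mean_y = sum(y) / len(y)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum(r ** 2 for r in regress_out(x, y))
    return 1.0 - ss_resid / ss_total


# Hypothetical traces: pupil size and one neuron's activity over six time bins.
pupil = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.0, 1.2, 0.7, 1.1, 0.9, 1.3]
print(variance_explained(pupil, activity))
cleaned = regress_out(pupil, activity)  # activity with the pupil-predicted part removed
```

      Noise correlations computed on `cleaned` traces rather than raw traces would then reflect only the variability that the regressor does not linearly predict.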

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, whereas mice were in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution from the top-down input was increased to more than 1/3 of the contribution from the stimulus could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision; please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem addressed (but it is addressed indirectly, I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at all any time. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual responses explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavior studies are outside the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found the pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NCs as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus over those that are not responding (see Lin et al, Neuron 2015 for a similar point). As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.
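      The argument in (1a) can be illustrated with a toy calculation in which a shared multiplicative gain is the only source of trial-to-trial variability. This is a deliberately idealized sketch, not an analysis from the manuscript: the tuning values, gain values, neuron labels and function names are all hypothetical, and there is no independent noise, so the correlations come out at their extreme values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def noise_residuals(responses, stimuli):
    """Subtract each stimulus's mean response, leaving trial-to-trial fluctuations."""
    means = {}
    for stim in set(stimuli):
        values = [r for r, s in zip(responses, stimuli) if s == stim]
        means[stim] = sum(values) / len(values)
    return [r - means[s] for r, s in zip(responses, stimuli)]


def noise_correlation(resp_a, resp_b, stimuli):
    """Correlation of stimulus-mean-subtracted responses, pooled over stimuli."""
    return pearson(noise_residuals(resp_a, stimuli), noise_residuals(resp_b, stimuli))


# Hypothetical setup: 4 trials each of stimuli "A" and "B", one shared gain per trial.
stimuli = ["A"] * 4 + ["B"] * 4
gain = [0.8, 1.2, 0.9, 1.1, 1.1, 0.7, 1.3, 0.9]
tuning = {
    "n1": {"A": 10.0, "B": 0.0},  # responds to A only
    "n2": {"A": 8.0, "B": 0.0},   # co-tuned with n1
    "n3": {"A": 0.0, "B": 5.0},   # responds to B only
}
responses = {
    name: [g * curve[s] for g, s in zip(gain, stimuli)]
    for name, curve in tuning.items()
}

nc_cotuned = noise_correlation(responses["n1"], responses["n2"], stimuli)
nc_different = noise_correlation(responses["n1"], responses["n3"], stimuli)
print(nc_cotuned, nc_different)
```

      In this idealized case the co-tuned pair shares all of its variability while the differently tuned pair shares none, even though no connectivity between neurons is modeled.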

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the author's main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. Following the authors' claims, it would be important to establish whether belonging to the same group, rather than tuning similarity, is the defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realize that using the term “discrete” may be confusing. In the revised text we use the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class.

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion, we now clearly state the study was carried out in mice in the Abstract and Introduction.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has a SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs per that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, and thus it is consistent with the model on the right side. This is not circular because it is possible to have a SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 
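
      The Fig. 4C quantity described above (the share of high NC pairs that falls in each SFTF group for one V1-HVA pair) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function name, the tuple layout of `pairs`, and the `threshold` used to call an NC "high" are assumptions.

```python
from collections import Counter

def high_nc_fraction_by_group(pairs, threshold):
    """Share of in-group high-NC pairs per tuning group.

    pairs: iterable of (group_a, group_b, nc) for one V1-HVA pair
           (hypothetical layout; group labels are SFTF group names).
    threshold: NC value above which a pair counts as "high" (assumed).
    """
    counts = Counter()
    for group_a, group_b, nc in pairs:
        # Count only pairs whose two neurons share a tuning group.
        if group_a == group_b and nc > threshold:
            counts[group_a] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize by the number of high-NC pairs summed over all groups.
    return {group: n / total for group, n in counts.items()}
```

      Because the value is a count ratio, it is insensitive to how large the individual NCs are above the threshold, which is why the mean NC per group (Fig. 4D) is examined separately.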

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message more clear.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility:

      "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependency of correlations on similarity of tuning is not an artifact of how the data was acquired is in my mind missing and if that is the case it is crucial that this be addressed.

      At each step of data analysis, we performed control analyses to assess the fidelity of the conclusions, for example for the spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). Corresponding classes across different GMM fits were identified by matching their centers.
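
      The matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the rule used here (each center from one fit is paired with the most correlated center from the other fit) are assumptions, and the real analysis operates on 45-dimensional centers rather than the short vectors one would use to try this out.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def match_classes(centers_a, centers_b):
    """Pair each class center of fit A with its best match in fit B.

    centers_a, centers_b: lists of center vectors from two GMM fits
    on different random subsets of neurons (hypothetical inputs).
    Returns a list of (index_in_b, correlation), one per center in A.
    """
    matches = []
    for center_a in centers_a:
        corrs = [pearson(center_a, center_b) for center_b in centers_b]
        best = max(range(len(corrs)), key=corrs.__getitem__)
        matches.append((best, corrs[best]))
    return matches
```

      A consistently high correlation for every matched pair across many random subsets is what would indicate stable class centers. Note that this greedy rule does not enforce a one-to-one assignment, so a stricter matching might be preferable when classes are close together.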

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections' which I presume mean correlations is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our Methods section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
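
      For readers unfamiliar with the two quantities, a minimal Python sketch of one common convention follows. It is not the authors' code: the function names, the nested-list data layout, and the choice to pool mean-subtracted residuals across stimuli are assumptions made for illustration, and the definition in the Methods may differ in detail.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def signal_and_noise_correlation(resp_a, resp_b):
    """Return (signal correlation, noise correlation) for two neurons.

    resp_a[s][t] is the response of neuron A to stimulus s on trial t;
    resp_b has the same shape and comes from the same (simultaneous) trials.
    """
    # Signal correlation: correlation of trial-averaged tuning curves.
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    sc = pearson(mean_a, mean_b)
    # Noise correlation: correlation of trial-to-trial residuals after
    # removing each stimulus mean, pooled over stimuli (assumed convention).
    resid_a = [r - m for trials, m in zip(resp_a, mean_a) for r in trials]
    resid_b = [r - m for trials, m in zip(resp_b, mean_b) for r in trials]
    nc = pearson(resid_a, resid_b)
    return sc, nc
```

      Under this convention two neurons can share identical tuning (signal correlation of 1) while their trial-to-trial fluctuations are anticorrelated, which is why the two measures carry separate information.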

      The number of visual stimulation trials is not stated in Methods; it is only stated in a figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify the visual cortex.

      It is true that areas can be mapped using the GCaMP signals. That is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade. It is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure no matter what reporters the mice might be expressing (and in what pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised Figure 1E (now Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the role of transcriptional and translational controls of gene expression in dorsal root ganglia and lumbar spinal cord in neuropathic pain in mice. Using ribosome profiling (Ribo-seq) and translating ribosome affinity purification (TRAP), they show changes in transcriptomic and translational gene expression at the peripheral and central levels rapidly after nerve injury. While translational changes in gene expression remained elevated for more than two months in both DRGs and the spinal cord, transcriptomic regulation was absent in the spinal cord long after the onset of neuropathy. Disrupting mRNA translation in dorsal horn neurons using antisense oligonucleotides reduced mechanical withdrawal threshold and facial expression of pain. Using fluorescent noncanonical amino acid tagging (FUNCAT), the authors further show that de novo protein expression primarily occurs in inhibitory neurons in the superficial dorsal horn after nerve injury. Accordingly, a selective increase in translational control of gene expression in spinal inhibitory neurons, or a subset of mainly inhibitory neurons expressing parvalbumin (PV), using transgenic mice, led to a decrease in the excitability of PV neurons and mechanical allodynia. In contrast, decreasing the translational control of spinal PV neurons prevented the alteration of the electrophysiological properties of the PV cells induced by nerve injury.

      Strengths:

      This is a well-written article that uncovers a previously unappreciated role of gene expression control in PV neurons, which seems to play an important part in the loss of inhibitory control of spinal circuits typically seen after peripheral nerve injury. The conclusions are generally well supported by the data.

      Weaknesses:

      The study would benefit from further clarifications in the methods section and a deeper analysis of gene expression changes in mRNA expression and ribosomal footprint observed after nerve injury.

      We have improved the description of the methods and clarified the rationale underlying the presentation of gene expression changes. We have also added lists of the top differentially expressed genes at both the translational and transcriptional levels to Figure 1, and improved the description of the datasets in the Supplementary Materials.

      Antisense oligonucleotides used to reduce translation by disrupting eIF4E expression were administered i.c.v. It is unknown if the authors controlled for locomotor deficits, which might add confounds in the interpretation of behavioral results. A more local route should have been preferable to avoid targeting brain regions, which could potentially affect behavior.

      Thank you for raising this important point. We used i.c.v. administration to specifically target the central nervous system (CNS) without affecting the peripheral nervous system, as this is the recommended approach for selectively targeting the CNS using ASOs. Intraspinal administration of ASOs (into the spinal cord parenchyma) at an effective dose for long-term effects is not feasible. Intrathecal administration is possible but would result in exposure of the DRGs to the injected ASO and therefore would not be specific to the CNS.

      To rule out potential locomotor deficits, we have now subjected mice to the rotarod and open field tests to assess motor function. We found no differences between eIF4E-ASO– and control-ASO– injected mice (Fig. 2J, K).

      In the revised version of the manuscript, we now better explain the rationale for i.c.v. injection. Moreover, we discuss the potential supraspinal effects of eIF4E-ASO in the Limitations section, while also describing the lack of motor phenotypes in the rotarod/open field tests.

      Only female mice were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology, but both sexes were used for behavior experiments.

      Our manuscript involves various complicated techniques and analyses. Due to limited resources, we therefore opted to use only females for expensive and labor-intensive experiments, such as Ribo-Seq, TRAP, FUNCAT, and electrophysiology, while using both sexes for behavioral studies.

      We now clearly acknowledge this limitation in the revised manuscript.

      The conditional KO of 4E-BP1 using transgenic animals should be total in the targeted cells. However, only a partial reduction is reported in Figure S2 in GAD2, PV, Vglut2, or Tac1 cells. Again, proper methods for quantification of fluorescence in these experiments are lacking.

      We apologize for the oversight; we have now updated the description of the methods for IHC signal quantification. Although genetic ablation is indeed expected to result in a complete loss of signal, in practice, previous studies employing IHC, but not Western blotting, for 4E-BP1 have also shown only a partial reduction in signal. This is likely because the 4E-BP1 antibody partially detects other epitopes. Using the same antibody, we and others have shown complete elimination of the band corresponding to 4E-BP1 in spinal cord and DRG tissue (e.g., PMID: 26678009).

      The elegant knockdown of eIF4E using AAV-mediated shRNAmir shows a recovery of the electrophysiological intrinsic properties of PV neurons after injury. It is unclear if such manipulation would be sufficient to reverse mechanical allodynia in vivo.

      Thank you for this concern, which was also raised by other reviewers. We have now performed two additional experiments, which revealed that suppressing the mTORC1–eIF4E axis in spinal PV neurons (using AAVs expressing eIF4E-shRNA in spinal PV neurons [Fig. 6A] and transgenic mice expressing non-phosphorylatable 4E-BP1 in PV neurons [Fig. 6B]) is not sufficient to alleviate neuropathic pain. These new findings need to be reconciled with our other results showing that eIF4E downregulation in PV neurons prevents the SNI-induced reduction in their excitability, and that ASO-mediated suppression of eIF4E, which affects all cell types, alleviates neuropathic pain.

      Together, these results suggest that targeting translational control in PV neurons is sufficient to reverse SNI-induced reduction in PV neuron excitability, but is not sufficient to prevent behavioral phenotypes, which likely require changes in other cell types and/or additional pathways, as well as other alterations within PV neurons. We have now included these new results in the revised manuscript (Fig. 6A and Fig. 6B) and revised the text accordingly. These changes include toning down the role of translational control in PV neurons after SNI in driving behavioral hypersensitivity.

      Reviewer #2 (Public review):

      Summary:

      I reviewed the manuscript titled "Translational Control in the Spinal Cord Regulates Gene Expression and Pain Hypersensitivity in the Chronic Phase of Neuropathic Pain." This manuscript compares transcription and translation in the spinal cord during the acute and chronic phases of neuropathic pain induced by surgical nerve injury. The authors chose to focus their investigation on translation in the chronic phase due to its greater impact on gene expression in the spinal cord compared to transcription.

      (1) The study is significant because the molecular mechanisms underlying chronic pain remain elusive. The role of translational regulation in the spinal cord has not been investigated in neuroplasticity and chronic pain mouse models. The manuscript is innovative and technically robust. The authors employed several cutting-edge techniques such as Ribo-seq, TRAP-seq, slice electrophysiology, and viral approaches. Despite the technical complexity, the manuscript is well-written. The authors demonstrated that inhibition of eIF4E alleviates pain hypersensitivity, that de novo protein synthesis is more pronounced in inhibitory interneurons, and that manipulating mTOR-eIF4E pathways alters mechanical sensitivity and neuroplasticity.

      Strengths:

      Innovation (conceptual and technical levels), data support the conclusions.

      Weakness:

      Confusion about the sex of the animals. It is unclear whether eIF4E ASO affects translation and which cells. It is not determined that modulating translation in PV<sup>+</sup> neurons impacts neuropathic pain behaviors.

      We thank the reviewer for their thoughtful comments. In the revised version of the manuscript, we better explain that both sexes were used for behavioral experiments, whereas only females were used for Ribo-Seq, TRAP, FUNCAT, and electrophysiology experiments.

      ASOs are not known to be intrinsically cell-type-specific; therefore, we do not expect differential effects on excitatory versus inhibitory neurons. We demonstrated that eIF4E-ASO reduces the levels of eIF4E, a key translation initiation factor that is rate-limiting for cap-dependent translation.

      Moreover, in the revised manuscript we included two additional experiments (Fig. 6A and Fig. 6B) showing that decreased eIF4E-dependent translation in PV neurons is not sufficient to alleviate neuropathic pain, despite its effects on excitability measures. We have updated the manuscript to reflect these important new findings.

      Reviewer #3 (Public review):

      Summary:

      This study provides evidence for translational changes in inhibitory spinal dorsal horn neurons following chronic nerve injury. Gene expression changes have been widely studied in the context of pain induction and provided key insights into the adaptation of the nervous system in the early phases of chronic pain. Whereas this is interesting biologically, most patients will arrive in the clinic beyond the acute phase of their injury, thus limiting the translational relevance of these studies. Recent studies have extended this work to highlight the difference between acute and chronic pain states, potentially explaining the cascading factors leading to chronic pain, and hopefully how to prevent this in vulnerable populations. The present study suggests that translational changes within spinal inhibitory populations could underlie long-term chronic pain, leading to decreased inhibition and heightened pain thresholds.

      Strengths:

      The approaches used and the broad outcomes of the manuscript are interesting and could be an exciting development in the field. The authors are using approaches more common in molecular biology and extending these into neuroscientific research, getting into the detail of how pathology could impact gene expression differentially across the course of an injury. This could open up new areas of research to selectively target not only defined populations but additionally help alleviate pain symptoms once an injury has already reached the maintenance phase. There is an opportunity to delve into what must be a very large data set and learn more about what genes are differentially translated and how this could affect circuit function.

      Weaknesses:

      Whereas the authors approach a key question in pain chronicity, the manuscript falls a little short of providing any conclusive data. The manuscript was in some areas very difficult to follow. Terminology was not always consistent or clear, and the flow of the manuscript could use some attention to highlight key areas. Whereas the overall message is clear in the summary, this would not necessarily be the case when reading the manuscript alone.

      To improve the clarity and flow of the manuscript, we made changes to the text, including the addition of intermediate summaries and further explanations of terms and experiments.

      The study claims to show that translational control mechanisms in the spinal cord play a role in mediating neuropathic pain hypersensitivity, but the studies presented do not fully support this statement. The authors instead provide some correlation between translation and behavioural reflex excitability (namely vfh and Hargreaves).

      It is difficult to fully interpret the work, as there are a number of inconsistencies, namely the range of timings pre- and post-injury, lack of controls for manipulations, the use of shmiRNA versus lineage deletions, and lack of detailed somatosensory testing. It is not completely clear how this work could be translatable as is, without a deeper understanding of how translational control affects circuit function and whether all of this is necessarily bad for the system, or whether this is a positive homeostatic adaptation to the hyperexcitability of the circuit following injury.

      A large portion of the work is focussed on showing an inhibitory-selective change in translation following chronic nerve injury. The evidence for this is however lacking. Statistics to show that translational effects are restricted to inhibitory subpopulations are inadequate. The author's choice of transgenic lines is not clear and seems to rely on availability rather than hypothesis.

      Although we agree with some of the criticism, we have reservations regarding other points raised by the reviewer. To address several of the concerns, we added new experiments (Fig. 2J, 2K, 6A, and 6B). We also made changes to the text to improve readability and to better explain the rationale for the study and our focus on inhibitory neurons.

      For example, we clarify that we do not state that changes in mRNA translation in the spinal cord during the chronic phase of neuropathic pain occur exclusively in inhibitory neurons. Although we observe changes in general protein synthesis, assessed using FUNCAT, in inhibitory but not excitatory neurons after SNI, alterations in the translation of specific transcripts, assessed using the TRAP approach, are observed in both excitatory and inhibitory neurons.

      The second part of the paper focuses on inhibitory neurons because these neurons demonstrate larger translational changes. We now clearly indicate that alterations in excitatory neurons are also likely important during the chronic phase of SNI. This conclusion is further supported by newly added results (Fig. 6A and Fig. 6B), showing that targeting eIF4E-dependent translation in spinal PV neurons using two different approaches is not sufficient to reverse pain hypersensitivity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Analysis of gene expression in Figure 1 lacks clarity, and the data do not effectively guide the reader toward their intended purpose. A list of the most dysregulated genes at the transcriptional level, the translational level, or both, would help the reader fully appreciate the outcome of this analysis. Similarly, what is the message conveyed by Figures 4 D-G?

      As requested, we have now included the top 10 upregulated and top 10 downregulated genes at both the translational and transcriptional levels in Figure 1. We also expanded the main text and figure legends to clarify that Supplementary Figure 1 includes volcano plots for all conditions, and that Supplementary Table 1 contains the complete datasets. In addition, we expanded the figure legends to explain the organization of the data in Supplementary Table 1. Finally, we provide pathway analyses of translationally regulated genes in the spinal cord, as this condition is the primary focus of the study.

      Figure 4D–G shows the top 15 translationally upregulated and downregulated genes in inhibitory neurons at days 4 (D) and 60 (E), and in Tac1<sup>+</sup> excitatory neurons at days 4 (F) and 60 (G) (four conditions in total) after SNI. These panels convey that translational regulation of specific transcripts occurs in both inhibitory and excitatory neurons. Panel 4H further demonstrates that, although translational changes are observed in both neuronal populations, a greater number of genes are altered in inhibitory neurons. We have improved the readability and flow of this section to better convey this message.

      Details about how AHA was quantified in Figure 3 are missing. It is unclear how and where the cells were selected for quantification. Objective criteria for expression/no expression of AHA in the cells are not indicated. Additionally, the signal seems to have somehow been normalized over images from the contralateral side. It is difficult to understand what the bar graphs actually represent in panel C. One would interpret them as percentages of excitatory/inhibitory cells expressing AHA.

      We apologize for the lack of clarity. We have now expanded the description of the analyses in the figure legend and in the Methods to better explain the results shown in Fig. 3. The imaged cells were selected based on specific criteria, such as lamina location and cell type. In panel C (the anisomycin experiment), values were normalized to the control group. In all other panels, no normalization was applied, and the values represent the AHA integrated density on maximum-intensity projection images (averaged per mouse). We also describe the number of sections and cells per mouse, as well as other technical details, as requested.

      In addition, a few minor changes should be made:

      (1) Rephrase Introduction: "Peripheral nerve injury can cause neuropathic pain, a chronic pain condition [...]." Neuropathic pain is not necessarily chronic.

      This sentence was reworded to read “Peripheral nerve injury may result in neuropathic pain, a debilitating condition with limited effective treatment options”.

      (2) Host species for secondary anti-mouse antibodies are provided but not for the anti-rabbit (donkey?). Also, check for consistency in the methods section. The method mentions P21 two secondary antibodies and an apparent third antibody named "anti-HRP-conjugated antibody." Please provide information about this antibody, or remove it.

      Thank you for flagging this; the inadvertent repetition of “anti-HRP-conjugated antibody” was removed.

      (3) Provide primary antibody hosts on page 22.

      The hosts of all primary and secondary antibodies are now provided.

      (4) Define PBST on page 21 and PBS-T on page 22.

      We defined PBST in the revised manuscript (0.2% Triton-X100 in PBS).

      (5) Specify the filter sets used for fluorescent microscopy.

      We specified the filter sets used for fluorescent microscopy.

      (6) Change the legend to 50% withdrawal threshold for vF behavior tests.

      We addressed this by making the requested change in all relevant legends.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) The authors need to show that eIF4E ASO (Figure 2) reduces translation in both inhibitory and excitatory neurons.

      ASOs are not intrinsically cell-type specific, as they do not contain promoters or regulatory elements and act wherever they enter cells and engage RNase H1. However, differences in ASO effects across cell types can arise from variability in uptake, intracellular trafficking, RNase H activity, or target mRNA expression levels.

      In our study, we used eIF4E-ASO as a general approach to demonstrate that eIF4E-dependent translation contributes to SNI-induced hypersensitivity, particularly at the chronic phase. We show a marked reduction in eIF4E levels in the spinal cord of eIF4E-ASO–injected mice compared with controls. We do not claim that the effects of eIF4E-ASO are mediated by a specific cell type; rather, they may involve excitatory neurons, inhibitory neurons, and non-neuronal cells, such as microglia and astrocytes, among others.

      Notably, while eIF4E can promote general translation during development, in adult mice it predominantly regulates cap-dependent translation of specific mRNAs without having a major effect on overall protein synthesis. In our case, the partial reduction in eIF4E is unlikely to substantially affect general translation, as assessed by AHA incorporation, and would instead require TRAP or Ribo-Seq to detect transcript-specific translational changes. We now better explain the rationale for the eIF4E-ASO experiment and clearly state that the effects observed cannot be attributed to a specific cell type.

      In addition, our new results showing that inhibition of eIF4E-dependent translation in PV neurons is not sufficient to alleviate SNI-induced mechanical hypersensitivity suggest that translational changes in other neuronal and/or non-neuronal cell types contribute to hypersensitivity. This important point is now more clearly explained in the revised manuscript, and the role of PV neurons is toned down throughout the paper.

      (2) In Figure 5, it is necessary to show the effect of eIF4E-shRNA in PV+ neurons on neuropathic behaviors (von Frey and MGS).

      To address this important concern, we performed two new experiments, both of which showed that inhibiting the mTORC1–eIF4E axis in parvalbumin neurons is not sufficient to alleviate neuropathic pain. First, we injected PV-Cre mice with AAV-eIF4E-shRNAmir and a scrambled control. We found that downregulating eIF4E in spinal PV neurons has no effect on SNI-induced mechanical hypersensitivity. We used a second, complementary approach to validate this finding. Specifically, we generated transgenic mice in which a non-phosphorylatable form of 4E-BP1 is expressed in PV neurons. Because non-phosphorylatable 4E-BP1 acts as a translational suppressor of eIF4E, this approach is functionally similar to eIF4E deletion.

      Altogether, our findings indicate that cell-type–non-specific suppression of eIF4E using ASOs is sufficient to alleviate neuropathic pain, particularly at the chronic phase. In contrast, while activation of eIF4E-dependent translation in PV neurons (via 4E-BP1 deletion) induces pain hypersensitivity, suppression of eIF4E-dependent translation in PV neurons inhibits the SNI-induced decrease in PV neuron excitability but does not alleviate pain hypersensitivity. Thus, increased eIF4E-dependent translation in PV neurons is sufficient to induce pain hypersensitivity, but targeting this pathway in PV neurons alone is not sufficient to reverse neuropathic pain.

      Potential explanations for these findings include: (1) the presence of other important mechanisms in PV neurons (e.g., changes in synaptic transmission) that are translation independent; (2) the insufficiency of correcting reduced PV neuron excitability to alleviate hypersensitivity; and (3) an essential role for mRNA translation in other neuronal and/or non-neuronal cell types in neuropathic pain. We have updated the manuscript to include these potential explanations in the Discussion section.

      Moderate:

      (1) In Figure 2, MGS should be performed at earlier time points as well.

      We performed MGS when von Frey testing, which is less noisy and less labor intensive in our hands, suggested altered phenotypes.

      (2) In Figure 4B, the gene markers are different in Gad2+ and Tac1+ cells. Please show the 12 markers for both cell types.

      We now better explain the selection of the markers.

      (3) In Figure 5, MGS should be performed to test if the effect is limited to mechanical sensation/reactivity or extends to nociception. Additionally, do these mice exhibit altered locomotion and grip strength?

      As described above, we added experiments involving downregulation of eIF4E and expression of a mutant non-phosphorylatable 4E-BP1 in PV neurons. We performed von Frey testing, which showed no effect of suppressing the mTORC1–eIF4E axis on mechanical hypersensitivity under these conditions. Given these negative results, we did not proceed with mouse grimace scale (MGS) analysis.

      (4) In Figure S2E, the reduction of eIF4E does not appear to be specific to GFP+ cells.

      We have now replaced the representative images in this figure.

      (5) Can chronic neuropathic pain be reduced by enhancing 4E-BP1 specifically in PV+ neurons?

      We added the experiment proposed by the reviewer in Fig. 6B. We found that enhancing 4E-BP1 activity, by expressing a non-phosphorylatable form of 4E-BP1 in PV neurons, is not sufficient to alleviate neuropathic pain hypersensitivity.

      (6) Why did the authors not use PainFace for the MGS?

      We began using manual, blinded MGS scoring, as originally described by Mogil and colleagues in 2010 (PMID: 20453868), for this project before PainFace became available around 2019 (e.g., Tuttle and Zylka) and in later versions (e.g., PMID: 39024163). For consistency, we therefore continued using the same approach throughout the experiments.

      (7) In Figures 2A-C, the labeling of the bar graphs seems incorrect: is it 4E-BP1 or eIF4E immunoreactivity?

      Thank you very much for noticing this; we have corrected the mistake.

      (8) In Figure 1, present the data by sex.

      We performed sequencing analyses only in females. This decision was based on the large number of mice and experimental conditions required for both Ribo-Seq (n = 15 mice per replicate, 3 replicates per condition, and 2 time points for SNI/Sham, ~180 mice total) and TRAP (n = 3 mice per replicate, 3 replicates per condition, 2 time points, and 2 genotypes [Tac1 and GAD2] for SNI/Sham), as well as the high cost of sequencing. Behavioral experiments were performed in both sexes. This information is clearly indicated in the Methods section, and we have now also included it in the Limitations section of the paper.
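
      The animal numbers quoted above follow from multiplying the design factors. The short sketch below reproduces that arithmetic; the function and variable names are illustrative, and the TRAP total is derived here from the stated factors rather than quoted from the response.

```python
# Animal counts implied by the sequencing design described above.
# Names are illustrative; the factors are taken from the response text.

def animals_needed(mice_per_replicate, replicates, time_points, conditions, genotypes=1):
    """Total mice = product of all design factors."""
    return mice_per_replicate * replicates * time_points * conditions * genotypes

# Ribo-Seq: 15 mice per replicate, 3 replicates, 2 time points, SNI vs Sham
ribo_seq_total = animals_needed(15, 3, 2, 2)

# TRAP: 3 mice per replicate, 3 replicates, 2 time points, SNI vs Sham,
# 2 genotypes (Tac1 and GAD2)
trap_total = animals_needed(3, 3, 2, 2, genotypes=2)

print(ribo_seq_total, trap_total)  # 180 72
```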

      (9) While the methods state that all behavioral testing was done with equal numbers of male and female mice, it seems that several experiments were done only in females. In the absence of a strong justification, all experiments should be conducted in both sexes.

      As explained above, due to the very large number of mice required for some experiments and the high cost of sample processing and sequencing, only behavioral experiments were performed in both sexes. We now clearly describe the sex of the animals used in each experiment in the figure legends.

      Minor:

      (1) In Figure 3, the legend is confusing and lacks labels.

      We expanded the Fig. 3 legends and added labels, as requested.

      Reviewer #3 (Recommendations for the authors):

      Overall, the manuscript needs to be made clearer and more specific. As it stands, the logic and flow are difficult to follow. Figure legends are not always indicative of the figure and are inconsistent.

      Regarding timelines:

      The logic of the different timelines is not clear. Either explain why different times post-injury were chosen between experiments or keep them consistent. It seems a key message here is that the timing is important. It therefore follows that the authors should be strict about this in their own experiments. Figure 1: 4 and 63 days. Figure 2: Day 3 and weeks 8 and 12. Figure 3: Days 4 and 60. Figure 4: Days 4 and 60. Figure 5: 6 weeks. Figure S1: 4 and 60. Clarifying why these timings were used in each case and showing at the transcript level that these are most appropriate would be needed.

      We thank the reviewer for carefully reviewing our manuscript. We focused on early versus late time points. For the sequencing experiments, we performed Ribo-seq at day 4 for the early time point and day 63 for the late time point, whereas TRAP analyses (and FUNCAT) were performed at day 4 for the early time point and day 60 for the late time point. These differences (day 60 versus day 63) were due to logistical issues related to sample collection. In our view, there are no major biological differences between day 60 and day 63 for the late time points, particularly because we do not perform direct comparisons across different experiments.

      In other experiments, we used several time points (e.g., day 3, as well as 6, 8, and 12 weeks) either to follow the development of phenotypes or based on previous publications regarding the timing of specific effects. We now acknowledge the potential limitation of using slightly different time points in the Limitations section of the paper.

      Regarding the use of inhibitory and excitatory markers: The comparisons they made between subpopulations seem a little random; for one, the number of Tac1-positive cells in the dorsal horn is not equal to that of PV, and so the comparison seems inappropriate.

      The number of cells from each subpopulation should not affect the number of DEGs. Because these analyses were performed on bulk mRNA rather than at the single-cell level, the comparisons are made between SNI and control groups within each subpopulation. Thus, the number of differentially translated genes is determined per cell type, not per individual cell.

      The lack of any semblance of variability or statistics with regard to gene changes makes it difficult to assess whether these comparisons were justified experimentally. Pax2 is a developmentally regulated transcription factor, with reduced levels in the adult. Using Pax2- NeuN+ to label excitatory interneurons is therefore not appropriate for comparison. A more appropriate comparison would be to use vGluT2 and GAD67. Similarly, the use of the GAD2Cre seems a poor choice. This is a restricted population of interneurons that have been suggested to have specific roles in presynaptic inhibition. If the authors were interested in this subpopulation for that reason, then they should state so.

      Pax2 is commonly used as a marker of inhibitory neurons in the spinal cord (e.g., PMID: 36323322) because, in the adult dorsal horn, Pax2 protein remains expressed in nearly all inhibitory neurons, including both GABAergic (GAD65/67<sup>+</sup>) and glycinergic (GlyT2<sup>+</sup>) neurons. VGluT2 marks terminals of IB4-binding peripheral sensory neurons as well as those of spinal cord excitatory interneurons in lamina II of the dorsal horn, complicating the analyses. We attempted to use Lmx1b for excitatory neurons (Pax2 for inhibitory and Lmx1b for excitatory) but could not obtain a specific and robust signal using different commercial antibodies (we have no access to non-commercial Pax2 antibody).

      Regarding Cre lines, Gad2-Cre has been extensively used to target GABAergic neurons in the spinal cord. Although it is not expressed in purely glycinergic neurons, it is expressed in GABAergic and mixed GABA/glycine interneurons. Gad2-Cre is more restricted to superficial dorsal laminae I–III, which are relevant to pain processing, than Gad1-Cre, which may also capture low-level GABAergic neurons in deep laminae and ventral horn inhibitory neurons. The two lines also differ in their developmental profiles: whereas Gad1-Cre is expressed earlier, at embryonic stages during inhibitory neuron development, GAD2 is expressed later, in post-mitotic and mature inhibitory neurons. Because of these considerations (higher specificity to the dorsal horn and later developmental expression), we used the Gad2-Cre mouse line in our experiments.

      Regarding cKO experiments:

      It is unclear whether the deletion of Eif4ebp (which is not "ablation" as stated in the manuscript) has had any effect on the PV/GAD2 cells themselves seeing as this deletion would be a lineage deletion. One would imagine that altering transcription in such a population from early development would affect a host of neuronal and circuit properties, such as connectivity, dendritic branching, etc. The authors should show that the circuit properties were not broadly changed, not least as PV is expressed throughout the nervous system and in muscles. This could in itself explain the hypersensitivity described in their results. Experimenters should repeat the AAV shRNAmir experiments in non-injured animals, and not just control animals with the scrambled sh.

      We agree with the concerns related to potential developmental effects. Although it is nearly impossible to reliably and comprehensively demonstrate that circuit properties were not altered in our cKO mice, our manuscript presents several lines of evidence supporting a role for translational control in specific cell types in the regulation of gene expression and nociception independent of developmental effects. First, our translational gene expression analyses were performed in adult WT mice and reflect SNI-induced changes in gene expression at the translational level, assessed using complementary approaches. In addition, the effects of eIF4E ASO delivered to adult animals support a role for translational control in the regulation of SNI-induced pain hypersensitivity at later stages.

      Moreover, downregulation of eIF4E in PV neurons using an AAV-based approach in adult mice affects their SNI-induced excitability, further supporting a role for translational mechanisms in regulating PV neuron plasticity after peripheral nerve injury in adulthood. To acknowledge the potential developmental effects associated with 4E-BP1 deletion using Tac1-Cre, Gad2-Cre, and PV-Cre mouse lines (with PV-Cre beginning expression postnatally), we have included an explicit limitation statement in the Discussion of the revised manuscript.

      We also thank the reviewer for highlighting the distinction between deletion and ablation, and we have corrected this terminology in the revised manuscript.

      Regarding pain:

      A large sticking point within the study is the lack of clarity of the populations they are targeting. Many of the populations mentioned are not expressed solely in the dorsal somatosensory horn and instead are also expressed in the ventral motor horn. This is particularly important with regard to the sensory tests they are performing, which rely on reflex responses. It seems these results, although interesting, are not proof of a pain effect, but rather showing changes in vfh-behaviour. To show this is a pain-specific event, and not just correlative or reflexive, the authors should perform further behavioural tests beyond vfh, Hargreaves, and the grimace scale, such as low threshold touch, rotarod, etc. How much of this effect is due to changes in reflex excitability? Would the authors expect similar results for all neuropathic models but not for chronic inflammatory states for example? Western Blot analysis at the moment is for the whole cord, which could imply changes in the ventral or intermediate horn, it could help strengthen the study to show that these changes are selective to the dorsal cord.

      We have now added a new experiment showing that eIF4E-ASO has no effect on motor function in the rotarod and open field tests (Fig. 2J, K). In addition, the eIF4E-ASO experiment included in the original submission reflects supraspinal behavior, as assessed by MGS. Overall, our study includes numerous experiments and datasets. While we agree with some of the reviewer’s concerns, the extensive additional work requested, including additional neuropathic and inflammatory pain models, further assays of supraspinal behavior, Western blot analyses restricted to the dorsal horn, additional Cre lines and markers, and other analyses, is not feasible within the scope of the current manuscript.

      Notably, in the revised manuscript, we have added new experiments (Fig. 2J, 2K, 6A, 6B) that we believe address the most critical concerns raised by the reviewers, and we have revised the text to more clearly acknowledge the limitations of the study.

      Regarding patch clamp studies:

      An increase in rheobase alone in the PV cells would not in itself account for the changes seen in behaviour, seeing as the authors are suggesting this is a selective effect for von Frey and not radiant heat, for example. The authors should therefore show a change in mechanically-evoked firing of PV/GAD2 cells either by dorsal root stimulation in slice, or by cfos or equivalent marker of activation following sensory stimulation. The title of this figure is also misleading- it is not clear how there is any proof of promotion of plasticity in the experiments shown.

      In the original submission, in addition to an increase in rheobase, we also demonstrated decreased spiking activity in response to a range of stimulating currents (Fig. 4). We agree that assessing mechanically evoked responses of PV neurons would be informative; however, such studies are beyond the scope of the current manuscript.

      To address the final concern, we modified the title of Fig. 5 and the related text. Moreover, the newly added data showing that inhibition of translation in PV neurons does not alleviate SNI-induced hypersensitivity prompted us to tone down, throughout the manuscript, the link between translational changes in PV neurons and pain hypersensitivity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful and constructive comments on our manuscript. We have carefully addressed all points raised, and believe the manuscript is substantially improved as a result. In particular, we have performed:

      - Comprehensive spatial analysis of stable mutants. Following Recommendations for the authors comment #1, we performed spatial analysis by binning the anterior-posterior axis into 200 µm strata. This analysis validates our initial conclusions and reveals striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants.

      - Substantially enhanced the statistical rigour of the screen analysis. We have implemented stratified Kolmogorov-Smirnov tests (within-experiment testing, then combined via Fisher's method) alongside linear mixed models to control for batch effects. In the revised manuscript, we now focus on three hypertrophy genes – foxp1b, txnipa and mmp14b – which are robustly validated by both methods.

      - Normalisation of adipose area to body size. To address concerns about developmental delay (Recommendations for the authors #2), we now normalise adipose area to standard length. With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity (updated from our original analysis), while the hypertrophic LD morphology remains highly significant - demonstrating the phenotype is independent of body size and not a developmental delay.

      - Revised title. As suggested by Recommendations for the authors comment #6, we have changed the title to: "A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish"

      - Extensive code and analysis availability. We now provide all code and extensive analysis pipelines in interactive HTML documents at https://github.com/jeminchin/zebrafish_adipose_morphology_screen
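
      As an illustration of the stratified testing described in the list above, the sketch below combines per-experiment p-values with Fisher's method using only the Python standard library. It is a minimal sketch, not the authors' pipeline: the input p-values are hypothetical, and in practice each would come from a within-experiment two-sample Kolmogorov-Smirnov test (for example `scipy.stats.ks_2samp`), with `scipy.stats.combine_pvalues(..., method="fisher")` as a library alternative to the hand-written function.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

# Hypothetical p-values from three independent experiments (illustration only).
per_experiment_p = [0.04, 0.11, 0.02]
combined_p = fisher_combine(per_experiment_p)
print(f"combined p = {combined_p:.4g}")
```

      Testing within each experiment before combining keeps between-experiment batch differences from inflating the apparent effect, which is the stated motivation for the stratified design.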

      Joint Public Review:

      We thank the reviewers for their thoughtful assessment of our work and their recognition of the rigorous experimental design, statistical approaches, and the utility of both the identified genes and screening pipeline for the field. We address their concerns below.

      Weakness:

      Distinguishing developmental patterning from adipose tissue plasticity

      We appreciate this important distinction and agree that separating developmental from adaptive effects is a key challenge in the field. We would like to make several points in response:

      First, we acknowledge this limitation in our discussion and have now expanded this section to more explicitly address the interpretive boundaries of our approach. Our screening platform was intentionally designed to capture the outcome of genetic perturbation across development and early adaptation, as these processes are inherently intertwined during the establishment of adipose tissue.

      Second, regarding the suggested analysis of lipid droplet size along the AP axis in response to HFD: we have now performed this analysis and include it as new Fig. 6 and new Supplemental Fig. 8 & 9. These data validate our initial conclusions and reveal striking spatiotemporal dynamics, including profoundly blunted HFD responses in foxp1b mutants (68% reduction) and loss of spatial gradients in foxp1a mutants. Further, these data provide additional resolution on regional responses to dietary challenge.

      Third, we note that our stable mutant validation experiments (Figure 6) do begin to disentangle these effects by examining both baseline and HFD-challenged conditions in animals with constitutive genetic loss. However, we agree that definitive separation would require temporally controlled genetic manipulation, which we now acknowledge as an important future direction.

      Lack of tissue-specific manipulations

      We agree that tissue-specific approaches would strengthen mechanistic conclusions and have acknowledged this limitation in our revised discussion. The current study was designed as a discovery-focused screen to identify candidate regulators, with the understanding that mechanistic dissection would require follow-up studies employing tissue-specific tools.

      We note that adipocyte-specific Cre/lox or Gal4-UAS approaches in zebrafish are feasible and represent an important next phase of investigation for the most promising candidates identified here, rather than a requirement for the current screening study. We have added text explicitly framing our findings as establishing genetic associations that warrant future tissue-autonomous investigation.

      Recommendations for the authors: 

      (1) Analysis: In Figure 6, the authors state that foxp1b mutants "fail to undergo further hypertrophic remodeling in response to a high-fat diet (HFD)." Foxp1b mutant juveniles are already hypertrophic before the high-fat diet. After a high-fat diet, these mutants reach mean lipid droplet diameters similar to WT, approximately 65 µm, which the authors state earlier in the manuscript are "a potential upper limit of LD growth at this developmental stage." The authors should perform additional analysis of their existing data. Specifically, determine lipid droplet size by binning the AP axis as shown in Figure 3. The rationale is that lipid droplet size differences in response to HFD may be more evident when not considering the anterior populations of lipid droplets that have already reached maximum steady state size for this juvenile stage. This would not require any new experiments, just reanalyzing data similar to how they did in Figure 3.

      We thank the reviewer for this excellent suggestion. We have performed the requested spatial analysis by binning the AP axis into 200 µm strata (Figure 3 approach). These data can be found in new Fig. 6H-M, and new Supplemental Figs 8 & 9. This new analysis verifies our initial conclusions, and also reveals several very interesting spatiotemporal dynamics:

      (i) Baseline hypertrophy in foxp1b mutants across AP strata

      In support of our initial conclusion that foxp1b mutants have larger LDs at baseline, the spatial analysis confirms that on a control diet (baseline), foxp1b mutants have significantly larger LDs than WT across strata 1-5 (new Fig. 6I), ranging from +22.2 µm larger in stratum 1 to +17.8 µm larger in stratum 5 (all FDR-adjusted p < 0.05, linear mixed effects model). Extended analysis across all 15 strata is shown in Supplemental Figs. 8 & 9. By contrast, and also in support of our initial conclusion, foxp1a mutants showed no baseline hypertrophy on control diet (all strata p > 0.10, Supplemental Fig. 8).
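
      The binning step behind this per-stratum comparison can be sketched as follows. This is a minimal illustration under stated assumptions: positions are measured in µm from the anterior edge of the tissue, strata are numbered from 1, and the droplet records are made-up values, not data from the study.

```python
from collections import defaultdict

STRATUM_WIDTH_UM = 200.0  # bin width stated in the response

def stratum_index(ap_position_um, width_um=STRATUM_WIDTH_UM):
    """Return the 1-based stratum for a position along the AP axis.

    Assumes position 0 is the anterior edge (an assumption of this sketch).
    """
    return int(ap_position_um // width_um) + 1

def mean_diameter_by_stratum(droplets):
    """Average lipid droplet diameter per stratum.

    droplets: iterable of (ap_position_um, diameter_um) pairs.
    """
    grouped = defaultdict(list)
    for position, diameter in droplets:
        grouped[stratum_index(position)].append(diameter)
    return {s: sum(d) / len(d) for s, d in sorted(grouped.items())}

# Hypothetical droplets: (AP position in µm, diameter in µm)
example = [(50.0, 60.0), (150.0, 64.0), (250.0, 55.0), (2999.0, 30.0)]
means = mean_diameter_by_stratum(example)
print(means)  # {1: 62.0, 2: 55.0, 15: 30.0}
```

      With 200 µm bins, the 15 strata mentioned in the response span 3,000 µm of the AP axis; per-stratum means like these would then feed the mixed-effects comparison between genotypes.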

      (ii) foxp1b mutants show a profoundly blunted hypertrophic response to HFD

      Using paired analysis (same fish on both control diet and after 14 days of high-fat diet) with a linear mixed effects model, we quantified the effect of HFD across all strata:

      (A) Anterior/oldest strata (1-6): WT + HFD increases LD diameter by +25.1-28.1 µm (+52-58%, p < 0.0001), whereas foxp1b mutants + HFD increase LD diameter by only +7.5-11.7 µm (+12-19%, p < 0.003). Therefore, in the oldest/most anterior regions, containing the largest LDs, the hypertrophic response of foxp1b mutants to HFD is ~57% weaker than that of WT.

      (B) Posterior/newer strata (7-15): WT + HFD undergo significant increases in LD diameter of +17.7-23.7 µm (p < 0.024). However, in foxp1b mutants there is no significant hypertrophic response at all (p > 0.068), and hypertrophic effect sizes decline from +6.8 µm (stratum 7) to +0.4 µm (stratum 15).

      (C) Overall effect: Averaged across all strata, WT + HFD LDs show +24.4 µm increase (p < 0.0001), whereas foxp1b mutant LDs only show a +7.7 µm increase with HFD (p = 0.020). Therefore, foxp1b mutants show a 68% reduction in hypertrophic growth in response to HFD compared to WT (Fig. 6K).
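
      The 68% figure follows directly from the two averaged effect sizes quoted in (C); the short check below reproduces it (variable names are illustrative).

```python
# Averaged HFD-induced increases in LD diameter, as quoted in (C).
wt_increase_um = 24.4
mutant_increase_um = 7.7

# Relative reduction of the mutant response compared with WT.
relative_reduction = 1 - mutant_increase_um / wt_increase_um
print(f"{relative_reduction:.0%}")  # 68%
```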

      The consequence of these spatial dynamics is that WT SAT LDs - which start 22 µm smaller than foxp1b mutants on a control diet - undergo massive hypertrophy across all regions/strata in response to a HFD. Meanwhile, foxp1b mutants - starting larger than WT - show only a modest, spatially restricted response. This results in a convergence in LD size in early/anterior strata, but WT LDs actually surpass foxp1b mutant sizes in late/posterior strata (strata 14-15: WT +14.7 µm larger on HFD, p = 0.028; Supplemental Figs. 8 & 9).

      By contrast, foxp1a mutants retain the capacity for HFD-induced hypertrophy but show a ~35% weaker response than WT (p = 0.023) – significantly less severe than the 68% reduction in foxp1b mutants. Interestingly, foxp1a mutants after HFD show a reduction in the AP gradation of LD size observed in WT and foxp1b mutants (uniform +14.4 µm across all strata versus WT range of +26.4 µm anteriorly to +16.6 µm posteriorly), suggesting that foxp1a may regulate spatial heterogeneity in adaptive responses to HFD (Fig. 6L-M).

      (iii) Developmental ceiling or impaired adaptive capacity?

      The reviewer raises an important question about whether anterior adipose LDs have reached a "developmental ceiling." After conducting the spatial analysis suggested by the Reviewer, we now believe several lines of evidence support an intrinsic defect in HFD-induced hypertrophy in foxp1b mutants, rather than reaching a developmentally determined limit:

      First, foxp1b mutants show reduced responses across ALL strata, not just anterior regions. The attenuation extends throughout the entire AP axis (57% reduction in strata 1-6, complete loss of response in strata 7-15). If anterior adipocytes had simply reached a size ceiling, we would expect normal responses in posterior regions where cells are smaller - but we don't observe this.

      Second, in posterior/newer regions of SAT (strata 14-15) the hypertrophic response to HFD in foxp1b mutants is so limited that WT LDs actually become larger than foxp1b mutant LDs (+14.7 µm larger, p = 0.028; Supplemental Fig. 9). This demonstrates that these LD sizes are not developmentally limiting and argues for intrinsic hypertrophic defects in response to HFD.

      Third, foxp1a mutants provide an important control. These mutants show no baseline hypertrophy (all strata p > 0.10) yet still exhibit blunted hypertrophic responses to HFD (~35% reduction, p = 0.023), proving that reduced HFD responses can occur independently of baseline hypertrophy.

      We have updated the Results and Discussion to reflect these new conclusions. Methods have been updated to include the spatial analysis approach.
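
      The stratification step used in the analysis above (200 µm strata along the AP axis) can be sketched as follows. This is an illustrative sketch only: it assumes positions are measured in µm from the anterior end, that strata are numbered from 1, and that bins are half-open; the function names are hypothetical and do not come from the Methods.

```python
from collections import defaultdict

STRATUM_WIDTH_UM = 200.0  # stratum width stated in the response


def stratum_index(ap_position_um, width_um=STRATUM_WIDTH_UM):
    """Return the 1-based stratum for a position along the AP axis.

    Assumes the position is in µm from the anterior end and that each
    stratum covers the half-open interval [k * width, (k + 1) * width).
    """
    if ap_position_um < 0:
        raise ValueError("position must be non-negative")
    return int(ap_position_um // width_um) + 1


def mean_diameter_by_stratum(droplets):
    """Average LD diameter per stratum.

    droplets: iterable of (ap_position_um, diameter_um) pairs.
    """
    groups = defaultdict(list)
    for position, diameter in droplets:
        groups[stratum_index(position)].append(diameter)
    return {s: sum(v) / len(v) for s, v in sorted(groups.items())}
```

      Per-stratum means of this kind would then feed a per-stratum statistical comparison such as the mixed effects model described above.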

      (2) Adipose morphogenesis in WT is a function of standard length, as shown by the authors. At juvenile stages, foxp1 mutants are both smaller and have reduced adipocyte coverage, while adults show normal body length and very subtle adipose phenotypes. Can the authors demonstrate that the observed defects in foxp1 mutant juveniles are bona fide phenotypes rather than a developmental delay?

      We thank the reviewer for this key point. We agree it is critical to distinguish true foxp1b-dependent phenotypes from potential developmental delay. Importantly, our data strongly argue against a simple developmental delay. We show that LD size scales with body size in Fig. 3G, with smaller zebrafish having smaller LDs and larger zebrafish having larger LDs. In contrast to a developmental delay, our data show that foxp1b single and foxp1a;foxp1b double mutants are smaller (reduced standard length) but have larger LDs (Fig. 6E,G). This dissociation between body size and LD size is the opposite of what would be expected from developmental delay.

      To account for the body size difference, we have now normalised adipose area to standard length (Fig. 6F). With this normalisation, foxp1b single mutants show only a non-significant trend toward decreased adiposity, whereas foxp1a;foxp1b double mutants remain significantly reduced. This represents a change from our original analysis and we have updated the text accordingly. Critically, despite normalised adipose area showing only a trend in foxp1b singles, the hypertrophic LD morphology remains highly significant (Fig. 6G), demonstrating that the morphological phenotype is robust and independent of overall body size.

      We have clarified this interpretation in the Results and Discussion.

      (3) What was the rationale for selecting one amongst paralogous genes for the screen? For example, why did the authors choose ptenb rather than ptena?

      (4) Point 3 is particularly relevant for the final six genes that resulted in adipose phenotypes. Why did the authors choose not to target both paralogs, given that multi-plexed F0 CRISPR targeting is feasible in zebrafish (PMID: 29974860).

      We answer Points 3 & 4 together here.

      We used the DIOPT (DRSC Integrative Ortholog Prediction Tool) orthology tool to identify the zebrafish paralogue with the highest orthology score to each human gene. This tool integrates predictions from 20 orthology databases to generate a composite score. We selected the paralogue with the highest DIOPT score for each gene. For example, we selected ptenb over ptena because it showed a higher predicted orthology to human PTEN.

      We acknowledge this approach has important limitations, including orthology scores not necessarily predicting functional equivalence (ie, the "most orthologous" paralogue may not be the one with the most relevant adipose tissue function in zebrafish). We acknowledge that this may mean we have missed genuine hits - testing only one paralogue means we could fail to identify genes where the "less orthologous" paralogue has the relevant adipose function.

      Our findings with Foxp1 paralogues both validate this approach and reveal its limitations. The higher-scoring paralogue foxp1b (DIOPT score = 13/19) showed the more severe phenotype, validating our prioritisation. However, the lower-scoring paralogue foxp1a (DIOPT score = 5/19), which we tested subsequently, showed a distinct but significant phenotype (altered spatial patterning) – a finding that would have been missed had we not pursued secondary validation.

      For future screens where comprehensive hit identification is the goal, multiplexed targeting of all paralogues would be valuable, though this may complicate interpretation of paralogue-specific phenotypes. We have discussed this in the Discussion.
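
      The selection rule described above (keep the paralogue with the highest DIOPT score) amounts to a simple maximum. The sketch below uses only the two foxp1 scores quoted in this response; the function name and the decision to raise an error on ties are assumptions made for illustration.

```python
def pick_top_paralogue(scores):
    """Return the paralogue with the highest orthology score.

    scores: dict mapping paralogue name -> DIOPT score.
    Ties raise an error, since the response does not say how they were handled.
    """
    if not scores:
        raise ValueError("no paralogues supplied")
    best = max(scores.values())
    winners = sorted(name for name, s in scores.items() if s == best)
    if len(winners) > 1:
        raise ValueError(f"tied paralogues: {winners}")
    return winners[0]


# Scores out of 19, as quoted in the response.
foxp1_scores = {"foxp1a": 5, "foxp1b": 13}
print(pick_top_paralogue(foxp1_scores))  # prints foxp1b
```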

      (5) General framework and limitations: The analysis platform presented in the manuscript cannot separate the developmental effects from adipose tissue plasticity/remodeling. Potential approaches that may help address this concern include: (a) establishing a baseline model to illustrate how WT fish respond to high-fat diet (HFD); (b) showing how mutants with hyperplasticity (opposite effects of foxp1 mutants) respond to HFD; (c) examining whether foxp1 gene expression level changes in response to HFD. However, these approaches (especially a and b) would require extensive experimental work and may be beyond the scope of this study. Without further evidence or data support of adipose tissue plasticity and remodeling, the author may want to emphasize in the background and discussion sections how adipose tissue development may affect plasticity and adaptation, and soften the tone of how genes may directly regulate adipose tissue plasticity and adaptation.

      We thank the reviewer for this comment about the relationship between adipose development and plasticity/remodelling. We agree this is an important issue as we are looking in juvenile fish that are still growing. Therefore, when we feed them HFD and see LDs get bigger – is this diet-induced remodelling or just accelerated normal development (ie, growth that would happen anyway, but occurring faster due to more nutrients)?

      To address the reviewer's specific suggestions:

      (A) Baseline model of WT HFD response: We have now performed detailed spatial analysis of WT responses to HFD (new Fig 6H-M, Supplemental Figs. 8 & 9). This analysis establishes a comprehensive baseline for hypertrophic responses to HFD in developing adipose tissue. In summary, WT fish show robust, statistically significant and spatially-graded hypertrophic responses to HFD across the entire AP axis, with responses ranging from +28.1 µm anteriorly to +17.7 µm posteriorly.

      We agree with the Reviewer that separating developmental from adaptive processes in growing juvenile fish is challenging. Importantly, we believe foxp1a mutants provide compelling genetic evidence that we are studying adaptive responses rather than purely developmental processes. foxp1a mutants have normal baseline LD sizes on control diet (demonstrating foxp1a is not required for developmental adipose expansion), yet when challenged with HFD show significantly reduced hypertrophic expansion and reduction of spatial gradient. This genetic dissociation strongly argues we are observing adaptive capacity rather than developmental growth rate.

      (B) Hyperplastic mutants:

      We agree that analysis of hyperplastic mutants would provide valuable complementary information about tissue remodelling capacity. However, as the reviewer anticipated, this would require: (1) generating stable lines of the appropriate hyperplastic mutants, (2) conducting paired HFD feeding studies, (3) performing spatial morphometric analysis comparable to our foxp1 studies, and (4) potentially distinguishing hyperplastic vs hypertrophic contributions to expansion. We agree this constitutes substantial additional experimental work beyond the scope of the current manuscript, though it represents an important direction for future studies.

      (C) foxp1 expression changes in HFD:

      Unfortunately, we do not have SAT samples from HFD-treated fish preserved for RNA analysis, and therefore cannot assess whether foxp1 expression levels change in response to dietary challenge. This would be valuable for future studies to determine whether foxp1 genes are dynamically regulated during metabolic adaptation or function as constitutive regulators of adaptive capacity.

      Following the Reviewer's guidance, we have revised throughout the manuscript to more carefully distinguish developmental patterning from metabolic adaptation.

      (6) Title: In the absence of experimental results that can distinguish between developmental effects from adipose tissue plasticity/remodeling, such as those mentioned above, the manuscript title is not accurate and should therefore be revised to be something like "hyperplastic and hypertrophic adipose morphology."

      We have now altered the title as the Reviewer suggested to “A quantitative in vivo CRISPR-imaging platform identifies regulators of hyperplastic and hypertrophic adipose morphology in zebrafish”

      Minor:

      (7) In mice studies, deleting foxp1b in adipose tissue protects mice from diet-induced obesity, while overexpressing foxp1b in adipose tissue promotes diet-induced obesity (Liu et al., Nature Communication, 2019). These overall phenotypes and foxp1b-mediated effects appear to be contradictory to what is observed in the zebrafish model. Can the authors also provide more evidence/discussion on why such a difference occurs comparing zebrafish and mice models?

      We thank the reviewer for this important comparison. We believe the apparent contradictions reflect (1) differences in adipose tissue thermogenic capacity - between species possibly, but also between functionally distinct depots and (2) whole-organism versus tissue-specific experimental approaches.

      (1) Different adipose tissue biology: browning-prone vs browning-resistant adipose

      Liu et al. (2019, PMID: 31699980) demonstrated that adipose-specific deletion of Foxp1 in mice increases thermogenesis and browning of SAT, with protection from diet-induced obesity (DIO) and improved insulin sensitivity. Conversely, Foxp1 overexpression impaired adaptive thermogenesis and promoted DIO. Mechanistically, Foxp1 directly represses β3-adrenergic receptor transcription, thereby inhibiting the thermogenic program. Strikingly, mouse Foxp1-deleted adipocytes displayed smaller, multilocular lipid droplets characteristic of brown/beige adipocytes.

      These morphological outcomes initially appear opposite to our zebrafish findings: mouse Foxp1 mutants have smaller adipocytes (due to browning), while zebrafish foxp1b mutants have larger lipid droplets (hypertrophy). We believe this fundamental difference may reflect the propensity of adipose tissue to undergo adaptive thermogenesis.

      While it was recently discovered that zebrafish possess thermogenic epicardial adipose tissue (PMID: 38507414), in general zebrafish adipose is not considered thermogenic, and zebrafish as ectotherms are thought to lack adaptive thermogenesis for thermoregulation. The exact thermogenic potential of zebrafish adipose remains to be fully characterised, but potential differences in thermogenic capacity between mouse and zebrafish adipose may help explain the distinct phenotypic outcomes.

      Importantly, Liu et al. studied mouse inguinal subcutaneous WAT - the depot most prone to browning in rodents. It remains unclear what role Foxp1 plays in browning-resistant mammalian WAT depots, where thermogenic conversion does not readily occur. In such depots, Foxp1 loss might produce phenotypes more similar to our zebrafish findings - dysregulated white adipose function without browning.

      The above hypothesis suggests that browning responses may mask other roles for Foxp1 in WAT. Interestingly, although not quantified in the paper, Liu et al.’s Foxp1 overexpression model (Ap2-Foxp1) appeared to reduce adipocyte size despite suppressing Ucp1 expression and reducing lipolysis. These data suggest more complex roles and indicate that Foxp1’s control of adipocyte size might extend beyond simply regulating thermogenesis and may involve coordinating the balance between hyperplastic versus hypertrophic expansion.

      Furthermore, human subcutaneous WAT is not as prone to browning as mouse inguinal WAT. Human browning occurs primarily in specialised depots (e.g. supraclavicular, deep neck), while the majority of human adipose tissue represents constitutive white adipose with limited thermogenic capacity. Therefore, it remains an open question whether FOXP1's primary physiological role in humans relates to thermogenesis regulation (in specialised depots) or white adipose metabolic control (in the majority of adipose tissue). Zebrafish findings examining constitutive WAT function (admittedly the lack of adaptive thermogenesis in zebrafish is presumed at this stage) may be more relevant to human adipose than they initially appear.

      (2) Whole-organism vs tissue-specific effects on metabolic health

      A second apparent contradiction concerns metabolic outcomes: mouse adipose-specific Foxp1 deletion improves metabolic health (Liu et al.), whereas our zebrafish whole-organism foxp1b mutants display metabolic dysfunction (baseline hypertrophy, impaired HFD response, hyperglycaemia and fatty liver). We believe this discrepancy reflects comparison of whole-animal mutants (zebrafish) to tissue-specific deletions (mouse), rather than opposite adipose tissue functions.

      Critically, Foxp1 has established roles in hepatic glucose metabolism. Zou et al. (PMID: 26504089) demonstrated that hepatic Foxp1 inhibits expression of gluconeogenesis genes and decreases hepatic glucose production and fasting blood glucose by competing with Foxo1 for binding of insulin responsive gluconeogenic genes. In line with these observations, we observe fatty liver and hyperglycaemia in foxp1a;foxp1b double mutant zebrafish (data not shown), suggesting that the metabolic dysfunction in our whole-animal mutants may be driven primarily by hepatic Foxp1 loss rather than adipose-specific effects.

      We have expanded on the points raised here in the Discussion.

      (8) Line 522-524: "The major phenotype in foxp1a mutants was impaired adipose expansion following HFD, suggesting failure to respond to diet-induced stress signals". In the presented Figure 6j, foxp1a mutant expands adipose LD size following HFD, similar to the control, which is contradictory to the statement above. Please clarify.

      We thank the reviewer for highlighting this apparent inconsistency and apologise for imprecise wording. These measurements are actually consistent but refer to different scales of analysis.

      Tissue level (Supplementary Fig. 7): foxp1a mutants show significantly reduced total adipose expansion (based on whole-animal Nile Red images) compared to wild-type fish on HFD—this is what we refer to as "impaired adipose expansion."

      Cellular level (Fig. 6L-M): At the individual adipocyte level, foxp1a mutants show statistically significant increases in LD diameter following HFD. However, the magnitude is reduced by ~35% compared to wild-type (mutants: +14.4 µm; WT: +22.2 µm; p = 0.023).

      We have revised the text to more precisely state "reduced adipose expansion" rather than "impaired expansion" to avoid implying complete failure to respond.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations, including PARPi (rucaparib) and the fatty acid synthase inhibitor (TVB-2640). Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports their findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive, thus, the current work does not attempt to elucidate the mechanism of their drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on the OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

      We appreciate your positive remarks on the use of NetBox, GSEA, and human curation for predicting anti-resistance effects of second drugs. Regarding the weaknesses you identified:

      Mechanistic Insight: We agree that our current work interprets findings using prior published knowledge and does not attempt to infer detailed mechanisms of drug resistance of the nominated drug combinations. Our primary goal with this study was to establish a robust, unbiased proteomic and computational pipeline for proposing anti-resistance drug combinations, rather than to fully characterize the downstream molecular effects for each combination or to prove causation. To get closer to mechanistic insight, meaning detailed hypotheses of causative interactions, one would need to investigate anti-resistance effects in other pre-clinical materials as a crucial next step for the most promising combinations identified. This was out of scope for us. We assume the proposed combinations are useful for focussed follow-up in the community.

      Discovery Phase on a Single Cell Line: Our discovery phase was focused solely on the OVSAHO cell line due to its resemblance to surgical ovarian cancer samples. Including additional cell lines in the initial proteomic-response discovery phase plausibly would have enhanced the generalizability. But this was not done due to resource constraints. However, we did perform more extensive validation of the effect of drug combinations on proliferation in several cell lines to explore broader applicability.

      2D Culture Limitations: We are fully aware of the limitations of 2D cell culture models, especially in the context of ovarian cancer, where in clinical reality interactions with the microenvironment and other effects can have significant roles in therapeutic resistance. We also recognize that in lab experiments 2D culture does not fully recapitulate the complexities of 3D tumors, PDX models, or primary patient tumors. We have added citations to the relevant literature (including the reference you provided), and have emphasized in the Discussion that our findings serve as a strong foundation for future experimental tests (validation) in more physiologically relevant experimental model systems.

      Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data were then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat at the proteomics level analysis that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation.

      The data also shows how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly), the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for the combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript.

      Thank you for your summary and positive comments. Regarding the weaknesses you identified:

      Initial Analysis in a Single Cell Line: We concur with your assessment that performing the initial analysis in a single HGSC cell line (OVSAHO) is a limitation. As mentioned in our response to Reviewer #1, resource limitations caused this decision, and we acknowledge that a broader initial screen would have strengthened generalizability. We added this limitation in the discussion section, emphasizing use of diverse cell lines in the initial protein response profiling as an area for future work.

      Challenges in Predicting Drug Combinations and Variability: We appreciate the observation regarding the challenges in predicting the effect of drug combinations and the variability of antiproliferative effects observed in different HGSC cell lines (Table 2). As with any predictive method, our computational-experimental pipeline is not guaranteed to identify additive or synergistic interactions with absolute certainty, but generates data-informed hypotheses to be considered in the presence of other available observations. We now emphasize in the Discussion that while our computational pipeline provides plausible anti-resistance candidates, the precise results (extent of additivity or synergy) differ in different cell lines. This underscores that experimental validation across diverse physiological models, such as PDXs or organoids (not just additional cell lines), is an essential criterion of validity of the generated hypotheses. We also underscore the (obvious) challenge of the ultimate translation of pre-clinical experiments to therapeutic effects in humans.

      In revision, we have clarified in detail the expectation of predicted synergy implied by the reviewer’s comment, “the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect”. This reflects a misunderstanding of our goals. The predictions are for drug effects that are anti-resistant, such that the proteomic response to one drug is counteracted by the second drug. The predicted effect is not synergy. Indeed, a useful anti-resistance effect does not require synergy - additivity is sufficient: if cells are resistant to the original drug, the second drug plausibly still has an antiproliferative effect, as it targets the cellular processes that are increased in activity (upregulated) in response to the first drug. So we deleted the red synergy color in Table 2 to avoid the potential conclusion from our results that without synergy, there is no benefit to a drug combination. In fact, additive drug combination effects are in themselves beneficial. For clarity on this point, we added coloring in Table 2 to highlight the small number of combinations that did not work well in that the combination was clearly antagonistic, using a combination index CI >= 2.0 cutoff; we clarify this point in the Discussion.
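
      The flagging rule stated above (CI >= 2.0 marks a clearly antagonistic combination) can be expressed as a short filter. This is a sketch under stated assumptions: the drug names and CI values in the example are hypothetical placeholders, not entries from Table 2, and the function names are illustrative.

```python
ANTAGONISM_CUTOFF = 2.0  # cutoff stated in the response


def is_clearly_antagonistic(ci, cutoff=ANTAGONISM_CUTOFF):
    """True when the combination index meets or exceeds the cutoff."""
    return ci >= cutoff


def flag_combinations(ci_by_combination):
    """Return the names of combinations that would be highlighted."""
    return sorted(
        name
        for name, ci in ci_by_combination.items()
        if is_clearly_antagonistic(ci)
    )


# Hypothetical values for illustration only; not taken from Table 2.
example = {"drugA+drugB": 0.6, "drugA+drugC": 1.1, "drugB+drugC": 2.3}
print(flag_combinations(example))  # prints ['drugB+drugC']
```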

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2b. This figure would be more impactful if presented as an upset plot with the same Venn diagram embedded. I am not sure Figure 2C accurately supports the statement : "Frequently affected proteins generally had expression level changes in the same direction across all drug perturbations (Figure 2c), indicating a potential general stress response. ". It would be beneficial if the authors could present the data in a way that shows the number of genes with similar directional groupings. Likewise, the color scheme for this figure is hard to interpret as grey is the most negative value and values are preselected for absolute fold-change. Please consider colors with a stronger contrast.

      Authors should consider uploading MS files to the PRIDE or MASSIVE repository.

      We have addressed these very useful suggestions. We have edited Figure 2b to include the requested upset plot. It serves to illustrate the intersection of proteins responding to different perturbation conditions; due to figure space constraints, we limit the figure to entries with counts of at least 15. We have added the number of proteins with consistent directional changes in the figure 2c caption and the text.
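
      The counts shown in an upset plot of this kind can be sketched as below. The sketch assumes exclusive intersections (each protein is counted once, under the exact set of conditions in which it responds), which may differ from the mode used for Figure 2b; the function names are hypothetical, and the default of 15 mirrors the display threshold mentioned above.

```python
from collections import Counter


def exclusive_intersections(sets_by_condition):
    """Count proteins by the exact combination of conditions they respond to."""
    membership = {}
    for condition, proteins in sets_by_condition.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(condition)
    return Counter(frozenset(conds) for conds in membership.values())


def filter_min_count(counts, min_count=15):
    """Keep only intersections with at least min_count members."""
    return {combo: n for combo, n in counts.items() if n >= min_count}
```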

      For Figure 2c, we have edited the color bar legend to better reflect the colors that appear in the heatmap.

      We have added our mass-spectrometry drug-response dataset to the ProteomeXchange Consortium via PRIDE with accession number PXD066316.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al., a workflow aimed at obtaining the stabilization of a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy. The work is well presented, and the results are thorough and convincing.

      We are grateful to this reviewer for his/her thoughtful assessment and supportive feedback. In response, we have addressed each comment and incorporated the necessary revisions into the manuscript.

      Weaknesses:

      I will comment mostly on the MD results due to my expertise.

      The Methods description is quite precise, but is missing some important details:

      (1) Version of GROMACS used.

      We used GROMACS version 2023.2 (single-precision). All subsequent MD simulation procedures mentioned below have been consolidated and described in detail in the Supporting Information (SI).

      (2) The barostat used.

      Pressure coupling was applied using the C-rescale barostat (τ<sub>p</sub> = 5.0 ps, ref<sub>p</sub> = 1.0 bar).

      (3) pH at which the system is simulated.

      No explicit pH was defined during system construction. Proteins were modeled using standard protonation states as assigned by GROMACS preprocessing tools, corresponding to physiological, near-neutral pH (~ 7.0).

      (4) The pulling is quite fast (but maybe it is not a problem)

      The relatively high pulling velocity (1 nm/ns) was selected to enable efficient screening across a large number of designed proteins (211 candidates), while maintaining reasonable computational cost/time. Given the intrinsic orders-of-magnitude difference between simulation and experimental pulling rates, SMD results were used as a comparative screening tool, rather than for direct quantitative comparison with AFM data.

      (5) What was the value for the harmonic restraint potential? 1000 is mentioned for the pulling potential, but it is not clear if the same value is used for the restraint, too, during pulling.

      All positional restraints used in the simulations, including those applied during equilibration as well as the harmonic restraint on the N-terminus and the pulling umbrella restraint during SMD, employed the same force constant (k = 1000 kJ·mol<sup>–1</sup>·nm<sup>–2</sup>). We have clarified this point in the revised Methods section.

      (6) The box dimensions.

      Rectangular simulation boxes were used throughout. For equilibrium MD simulations, the box dimensions in each direction were set based on the maximum extent of the protein along that axis, with a minimum distance of 1.2 nm between the protein surface and the box boundary on all sides. For SMD simulations, the same box dimensions were applied in the x and y directions. Along the pulling (z) direction, the box length was extended to accommodate the theoretical stretching length, defined as the initial N–C terminal distance plus 0.36 nm per stretched residue, while maintaining a 1.2 nm buffer at both ends (2.4 nm total). These details have now been clarified in the revised Supporting Information.
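
      The box-sizing rule described above can be written out as a small calculation. This is a sketch based only on the numbers given in the response (0.36 nm per stretched residue, 1.2 nm buffer per side); the function names and the example protein are hypothetical.

```python
EXTENSION_PER_RESIDUE_NM = 0.36  # per stretched residue, as stated
BUFFER_PER_SIDE_NM = 1.2         # protein-to-boundary buffer, as stated


def lateral_box_length(max_extent_nm):
    """Box length along a non-pulling axis: protein extent plus both buffers."""
    return max_extent_nm + 2 * BUFFER_PER_SIDE_NM


def pulling_axis_box_length(initial_nc_distance_nm, n_stretched_residues):
    """Box length along the pulling (z) axis.

    Theoretical stretching length = initial N-C distance
    + 0.36 nm per stretched residue, plus a 1.2 nm buffer at each end.
    """
    stretch = initial_nc_distance_nm + EXTENSION_PER_RESIDUE_NM * n_stretched_residues
    return stretch + 2 * BUFFER_PER_SIDE_NM


# Hypothetical example: 100 stretched residues, 2.0 nm initial N-C distance.
print(round(pulling_axis_box_length(2.0, 100), 2))
```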

      From this last point, a possible criticism arises: Do the unfolded proteins really still stay far enough away from themselves to not influence the result?

      We analyzed the minimum atomic distance between each protein and its periodic images to assess potential artifacts from periodic boundary conditions. For all simulation stages used in screening and statistical analysis, the minimum protein–image separation remained above 1.0 nm for the majority of the simulation time, exceeding the nonbonded interaction cutoff and minimizing cross-boundary interactions. As shown in Author response image 1 for SpecAI89 (left), this separation during SMD simulations is consistently well above the threshold, indicating that the chosen box dimensions are appropriate. In the very late stages of annealing MD, highly unstable proteins may exhibit large conformational fluctuations and transient boundary proximity (right); however, these regimes are associated with large RMSD deviations and are excluded from analysis. Notably, the mechanically relevant unfolding events occur near the center of the simulation box and proceed along the pulling axis in SMD simulations, making boundary effects unlikely to influence the unfolding process or the relative mechanostability ranking.

      Author response image 1.

      Analysis of the minimum atomic distance between the protein and its periodic images under periodic boundary conditions. Left: SpecAI89 during SMD simulations, showing that the minimum protein–image distance remains above 1.0 nm for the majority of the simulation time. Right: WT during AMD simulations, where transient proximity to the periodic boundary is observed at very late stages due to large conformational fluctuations.
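
      The periodic-image check described above can be sketched for a single frame as follows. This is a brute-force illustration under stated assumptions, not the authors' analysis script: the function name `min_image_distance` is hypothetical, the box is assumed rectangular, and the molecule is assumed to be whole (not split across the boundary) in the supplied coordinates.

```python
import itertools
import math


def min_image_distance(coords, box):
    """Smallest distance between any atom and any periodic image of any atom.

    coords: list of (x, y, z) tuples for one frame.
    box: (Lx, Ly, Lz) edge lengths of a rectangular box, same units as coords.
    The 26 neighbouring images are scanned; the unshifted copy is skipped.
    """
    shifts = [s for s in itertools.product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    best = math.inf
    for atom in coords:
        for xj, yj, zj in coords:
            for sx, sy, sz in shifts:
                image = (xj + sx * box[0], yj + sy * box[1], zj + sz * box[2])
                best = min(best, math.dist(atom, image))
    return best


# Hypothetical two-atom "molecule" in a 5 x 5 x 4 box.
frame = [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
print(min_image_distance(frame, (5.0, 5.0, 4.0)))
```

      Applying such a function to every frame and comparing the result with the nonbonded cutoff gives a time series of the kind summarized in the caption. The quadratic cost is acceptable for a sketch; a production analysis would normally rely on a neighbour search.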

      Additionally, no time series are shown for the equilibration phases (e.g., RMSD evolution over time), which would empower the reader to judge the equilibration of the system before either steered MD or annealing MD is performed.

      We thank the reviewer for this suggestion. To assess equilibration, we analyzed the backbone RMSD evolution during the equilibration phase. Using SpecAI89 as a representative example (Author response image 2), the protein backbone RMSD converges rapidly and reaches a stable plateau within approximately 5 ps. The subsequent 125 ps of equilibration therefore indicates that the system is well equilibrated before both the steered MD and the annealing MD simulations.

      Author response image 2.

      The backbone RMSD of SpecAI89 over time during simulation

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure S2, only one copy (or the average of the three copies; it is not clear from the caption) is shown; it would be better to show the individual traces for each repeat. Additionally, only the plot of the forces is shown and not, as for the AMD, the RMSD plot. This could be a stylistic choice, but it reports only on how much force was applied and not on how the protein responded to the force. Moreover, horizontal lines at the maximum value reached by the force could be added in order to directly see the difference in force applied, since it is then remarked on.

      Figure S2 originally shows a representative single SMD trajectory, as the force–extension peak positions vary between independent simulations and averaging the force traces would obscure the characteristic force peaks. In the revised Supplementary Information, we have now added the force–extension traces from the other two independent SMD repeats for each construct (New Figure S2). In addition, horizontal lines indicating the maximum force reached in each trajectory have been included to facilitate direct comparison of force differences between designs.

      (2) In Figure S3, the plots have different y-axes. It could be valuable to modify the figure so that in panels b, c, and d the spectrin result is shown in the background (perhaps in gray); the y-axis would then be left unchanged to retain the information in each plot, but one could still compare directly to the spectrin result. With a 0 to 1 nm y-axis, part of the spectrin run will be hidden, but in any case, panel a can be used to see the full behavior. As for S2, the repeats (if any) could be shown.

      We have revised Figure S3 as suggested. The y-axis is now unified to 0–1.2 nm across all panels. For panels b–d, the natural spectrin trajectory is displayed in light gray in the background for direct comparison. Additionally, three independent MD replicates are now presented for each construct to demonstrate reproducibility.

      Finally, minor remarks that could nevertheless improve the paper:

      (3) In Figure S7, a bimodal distribution model for the number of events could be used to fit the data better.

      We thank the reviewer for the detailed suggestion. Following this advice, we explored a bimodal Gaussian distribution model for fitting the force-event data in Figure S7. Indeed, our analysis showed that a bimodal model fits panel f of Figure S7 better (as shown in Author response image 3). The two peaks were centered at F<sub>1</sub> = 190 ± 4 pN and F<sub>2</sub> = 380 ± 6 pN. Interestingly, the force of the first, major peak is the same as the previously fitted value. The second peak lies at twice that force, which we speculate may arise from two molecules being stretched simultaneously, for reasons that remain unclear. Given the small number of events in the second peak and the unchanged value of the first peak (190 pN), we decided not to change the unfolding force reported in the manuscript. We thank the reviewer for this insightful comment.

      Author response image 3.

      The bimodal fit of the unfolding force distribution of SpecAI88-49E102K-6H149H gives the same 190 pN unfolding force for the first peak as the previous fit.

      (4) The colors in the video are not very intuitive, as the spectrin is shown initially in light blue, but becomes grey in the variants, where light blue is reserved for the additional helix. A counter of elapsed time and/or force/temperature applied could help the readers orient. Maybe it could be useful to produce a video with spectrin and the three variants all shown together?

      We thank the reviewer for this comment. The videos have been revised accordingly to improve clarity and consistency. In all cases, the original protein scaffold is now shown in gray, while the additional helix in the designed variants is highlighted in blue. Real-time annotations have been added to aid interpretation: the instantaneous temperature is displayed during AMD simulations, and time is shown during SMD simulations. In addition, for ease of comparison, the AMD and SMD results of all four proteins are each compiled into a single combined video, allowing their behaviors to be viewed side by side.

      Reviewer #2 (Public review):

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN, filtered down by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      We are thankful for the reviewer’s diligent evaluation and positive remarks. His/her concluding remarks, which encourage our future work at the intersection of AI-protein design and AFM-SMFS, are especially appreciated. All comments have been incorporated into our revisions.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      This is an insightful comment. Indeed, a direct comparison between three-helix bundles of the same structure would be the most straightforward and would provide a clear reference point. We will take this advice and pursue it in future work.

      In our case, a substantial fraction of the hydrophobic region is relatively shallow and partially solvent-exposed in the wild-type R15 α-helical bundle. So, the added fourth helix provides a new hydrophobic packing interface, increasing core burial, packing density, and strengthening the internal load-bearing network. Consistent with this design rationale, rSASA analysis shows that the designed proteins exhibit a higher degree of hydrophobic core burial compared to the wild-type R15. Specifically, the fraction of residues with rSASA < 0.2 exceeds 30% in the designs, compared to 23% in the natural spectrin repeat.
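
      The burial metric quoted above, the fraction of residues with rSASA < 0.2, is a simple count over per-residue values. The sketch below shows that calculation only; the function name `buried_fraction` and the per-residue numbers are hypothetical placeholders and are not data from the study.

```python
def buried_fraction(rsasa_values, cutoff=0.2):
    """Fraction of residues whose relative SASA is below the cutoff."""
    if not rsasa_values:
        raise ValueError("rsasa_values must not be empty")
    n_buried = sum(1 for value in rsasa_values if value < cutoff)
    return n_buried / len(rsasa_values)


# Placeholder per-residue rSASA values, for illustration only.
example_rsasa = [0.05, 0.10, 0.30, 0.60, 0.19, 0.45, 0.80, 0.02, 0.25, 0.15]
print(f"buried fraction: {buried_fraction(example_rsasa):.0%}")
```

      Because the result depends on the cutoff and on which conformations are used to compute rSASA, both should be reported alongside any percentage compared between proteins.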

      While the authors have shown experimentally that stage II constructs have increased the mechanical stability by AFM, they did not show that these same constructs have increased the thermal and chemical stabilities. Since the effects of salt bridges on stability are highly context dependent (orientation, local environment, exposed vs buried, etc.), it is difficult to assess the magnitude of the effect that this change had on other types of stabilities.

      We agree that the effects of salt bridges are highly context-dependent and that different dimensions of stability do not always correlate. Following your suggestion, we evaluated the thermal and chemical stabilities of the Stage II constructs. The experimental results (now added as Figure S9) show that the Stage II designs maintain high thermal stability and resistance to chemical denaturation, although to different extents. The thermal stability remains as high as that of the Stage I designs, whereas the resistance to chemical denaturation is slightly reduced. We have added this result to the manuscript accordingly.

      The three constructs chosen are 60-70% identical to each other, either suggesting overconstrained optimization of the sequence or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore these possible design principles further.

      Yes, the observed sequence convergence likely arises from a combination of intrinsic physical constraints of the protein architecture and the applied design and screening criteria. In particular, the tightly packed hydrophobic core imposes strong constraints on side-chain size, packing complementarity, and the alignment of heptad-like motifs reminiscent of coiled-coil organization, which collectively reduce the accessible sequence space. In addition, the strong selection pressure imposed by foldability and stability filters further promotes convergence toward similar solutions. We agree with the reviewer that this represents an important direction for future work.

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.

      Yes, steered MD can become computationally expensive, particularly as the number of designs increases or as protein size grows. Considering the vast pool created by AI, SMD in this work was applied to a relatively small, high-confidence subset of candidates after multiple rounds of rapid prescreening, keeping the overall computational cost manageable. In future applications, this step could be further accelerated by integrating machine-learning–based predictors to improve scalability.

      Reviewer #2 (Recommendations for the authors):

      I am not convinced that the difference in rSASA between the designs and the natural spectrin repeat is meaningful. It would be helpful to report confidence intervals for the rSASA values of the designs to clarify whether any differences are statistically robust. Even if such differences prove statistically significant, it is not clear that they are large enough to be practically meaningful.

      In our analysis, rSASA values were calculated from equilibrated MD conformations and were consistently higher for all designed proteins that passed the simulation-based screening compared to the wild-type spectrin repeat. However, rSASA was used only as a supportive structural descriptor to indicate a trend toward a more compact and better-buried hydrophobic core, rather than as a standalone or decisive metric of stability.

      Protein stability is indeed influenced by multiple factors, including hydrogen bonding, salt bridges, metal coordination, and topology-dependent load-bearing interactions, none of which are captured by rSASA alone. Therefore, we agree with the reviewer that differences in rSASA alone should not be overinterpreted as a quantitative measure of protein stability. For this reason, rSASA was not used as a ranking criterion or a predictor of stability, but only as complementary evidence consistent with the overall design rationale and with the experimentally observed stability enhancements.

      The claim "The strong agreement between computational rankings and experimental measurements validates this approach for prioritizing designs based on relative mechanostability, offering a practical pipeline to bridge the gap between in silico design and experimental validation." should be substantiated by a citation or a figure. Since the authors have the experimental AFM data and steered MD data, I suggest adding a Spearman correlation plot of the two.

      Following this comment, we examined the Spearman rank correlation between SMD-derived unfolding forces and experimentally measured AFM forces (Author response image 4). The resulting correlation was modest (ρ = 0.4, p = 0.6), which is not unexpected given (i) the large difference in force and timescales between high-speed SMD simulations and single-molecule AFM experiments, and (ii) the limited number of designs and simulation repeats available.

      Nevertheless, the qualitative difference between the first point, from wild-type spectrin, and the three SpecAI designs is clear. Given the large computational cost, we performed only three simulations for each design to balance accuracy against cost and time. To avoid overinterpretation, we therefore did not include the correlation analysis in the main text and revised the manuscript to soften claims of strong agreement, emphasizing instead the qualitative and comparative role of SMD in the design pipeline.

      Author response image 4.

      Spearman correlation between SMD and AFM unfolding forces for natural spectrin and SpecAI designs. SMD force (x-axis) versus experimental AFM force (y-axis); each point represents one protein.
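
      For readers who want to see how a Spearman rank correlation of this kind is obtained, a minimal pure-Python sketch follows. It is not the authors' analysis code: the function names are hypothetical, and the force values are invented placeholders chosen only to show the mechanics (they are not the SMD or AFM values from the study).

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Placeholder forces in pN for four proteins (illustration only).
smd_forces = [610.0, 880.0, 940.0, 1020.0]
afm_forces = [55.0, 170.0, 140.0, 210.0]
print(f"rho = {spearman_rho(smd_forces, afm_forces):.2f}")
```

      With only four proteins there are just 24 possible rank orderings, so any p-value attached to such a coefficient is necessarily coarse, which is in line with the cautious interpretation given in the response.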

      Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design an α-helical protein with enhanced thermal, chemical, and mechanical stability. Strategic modifications, namely an additional α-helix, site-specific salt bridges, and metal coordination, further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physicochemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability.

      The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      We appreciate the positive assessment of our manuscript from this reviewer and his/her support. We have answered all the comments as follows and modified the manuscript accordingly.

      Weaknesses:

      (1) The authors report that appending an additional helix increases the overall stability of the α-helical protein. Could the authors provide a more detailed structural explanation for this? Why does the mechanical stability increase as the number of helices increases? Is there a reported correlation between the number of helices (or the extent of the hydrophobic core) and the stability?

      In multi-helix bundle proteins, tight interhelical packing leads to the formation of a dense hydrophobic core, which substantially enhances overall structural stability. The introduction of an additional helix does not merely increase helix count, but expands the buried hydrophobic interface, improving packing density and cooperative side-chain interactions in the core. This, in turn, strengthens the internal load-bearing network that resists force-induced unfolding.

      From a mechanical perspective, adding a helix also increases topological interlocking among secondary-structure elements, which raises the energetic barrier for unfolding and shifts the unfolding pathway toward more cooperative rupture events, thereby increasing the unfolding force threshold. Consistent with this design principle, pioneering studies have reported a positive correlation between the number of helices (or the extent of the hydrophobic core) in helix bundles and their stability (Lim et al., Structure, 2008, 16:449; Minin et al., J. Am. Chem. Soc., 2017, 139, 16168; Bergues-Pupo et al., Phys. Chem. Chem. Phys., 2018, 20, 29105). Inspired by these works, our AI-protein design study uses the appended helix to reinforce the hydrophobic core rather than simply increasing secondary-structure content.

      (2) The authors analyzed both thermal stability and mechanical stability. It would be helpful for the authors to discuss the relationship between these two parameters in the context of their design, since thermal melting probes equilibrium stability (ΔG), while mechanical stability probes the unfolding energy barriers along the pulling coordinate.

      We agree this is a crucial distinction. Thermal and chemical stabilities report on the equilibrium free energy (ΔG), while mechanical stability probes the kinetic unfolding barrier (ΔG‡) along a force-dependent pathway. Their inherent difference makes concurrent improvement in all parameters a non-trivial task, which highlights the importance and success of our integrative design approach.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (k<sub>f</sub>) and unfolding (k<sub>u</sub>) rates. It remains unclear whether the observed ultrastability is primarily driven by a drastic decrease in the unfolding rate (k<sub>u</sub>) or whether the design also maintains or improves the folding rate (k<sub>f</sub>).

      We agree with the reviewer that thermodynamic stability is determined by both the folding rate (k<sub>f</sub>) and the unfolding rate (k<sub>u</sub>). In the present study, we did not directly measure folding kinetics, and therefore cannot quantitatively deconvolute the respective contributions of k<sub>f</sub> and k<sub>u</sub> to the observed ultrastability. Based on the design strategy and the experimental observations, we propose that the enhanced stability primarily originates from a substantial reduction in the unfolding rate (k<sub>u</sub>), corresponding to an increased unfolding energy barrier. The reinforcement of the hydrophobic core, the introduction of stabilizing interactions such as salt bridges and metal coordination, and the additional helix that increases topological and packing constraints all raise the energetic cost of disrupting key interactions in the folded state.

      This interpretation is consistent with the high mechanical unfolding forces observed in both AFM experiments and SMD simulations. In contrast, these stabilizing features are not necessarily expected to accelerate folding and may even modestly increase folding complexity. Addressing folding kinetics explicitly would require dedicated kinetic experiments or simulations, which are beyond the scope of the present work but represent an interesting direction for future studies.

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (k<sub>f</sub> ~10<sup>5</sup> s<sup>–1</sup>) nearly three orders of magnitude faster than those of its homologues such as R17 (Scott et al., Journal of Molecular Biology 344.1 (2004): 195-205). Does the newly designed protein, with its additional fourth helix and site-specific chemical modifications, retain the exceptionally high folding rate of the parent R15?

      We did not directly measure the folding kinetics of the newly designed proteins, and therefore cannot determine whether they retain the exceptionally fast folding rate reported for the parent spectrin repeat R15. While R15 is known for its ultrafast folding behavior, the introduction of an additional fourth helix and site-specific chemical modifications, although beneficial for enhancing stability, may increase the complexity of the folding landscape and does not guarantee that the folding rate (k<sub>f</sub>) remains comparable to that of R15.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify the Gaussian function used to fit the unfolding force distribution (Figures 3-4). In Figure S8, the Bell-Evans model is used to analyze the unfolding force. The authors should explain the choice of fitting methods and ensure consistency.

      The Gaussian fitting used in Figures 3–4 is intended as a descriptive statistical analysis to summarize the unfolding force distributions and to facilitate direct comparison between different designs. This approach provides a robust estimate of the most probable unfolding force and the distribution width, without invoking a specific physical unfolding model, and is commonly used in single-molecule force spectroscopy for comparative purposes.

      In contrast, the Bell-Evans model applied in Figure S8 is a kinetic framework that explicitly accounts for force-loading-rate dependence and is used to extract mechanistic insights into the unfolding process. Therefore, the two fitting approaches serve complementary roles: Gaussian fitting for quantitative comparison and ranking of mechanostability, and Bell-Evans analysis for mechanistic interpretation. We have clarified this distinction and the rationale for using both methods in the revised Supplementary Information to ensure consistency and transparency.
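
      To make the loading-rate dependence concrete, the standard Bell-Evans expression for the most probable unfolding force, F* = (k<sub>B</sub>T/Δx) ln(rΔx / (k<sub>0</sub>k<sub>B</sub>T)), can be evaluated directly. The sketch below assumes this textbook form; the response does not spell out the exact fitting procedure used for Figure S8, and the function name and all parameter values (Δx, k<sub>0</sub>, loading rates) are hypothetical.

```python
import math

KB_PN_NM_PER_K = 0.01380649  # Boltzmann constant in pN*nm/K


def bell_evans_force(loading_rate_pn_per_s, dx_nm, k0_per_s, temperature_k=298.0):
    """Most probable unfolding force (pN) in the standard Bell-Evans model.

    loading_rate_pn_per_s: loading rate r
    dx_nm: distance to the transition state
    k0_per_s: spontaneous (zero-force) unfolding rate
    """
    kbt = KB_PN_NM_PER_K * temperature_k  # thermal energy in pN*nm
    return (kbt / dx_nm) * math.log(loading_rate_pn_per_s * dx_nm / (k0_per_s * kbt))


# Hypothetical parameters, for illustration only.
for rate in (1e3, 1e4, 1e5):
    force = bell_evans_force(rate, dx_nm=0.5, k0_per_s=0.01)
    print(f"r = {rate:.0e} pN/s -> F* = {force:.1f} pN")
```

      In this model each tenfold increase in loading rate raises F* by a constant increment, (k<sub>B</sub>T/Δx) ln 10, which is why fitting force against the logarithm of loading rate yields Δx and k<sub>0</sub>, whereas a Gaussian fit at a single pulling speed only summarizes the distribution.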

      (2) The authors utilized steered MD simulation to analyze the mechanical properties via ForceGen (Ni et al., 2024, Sci. Adv. 10, eadl4000). However, the significant discrepancy between the predicted unfolding force (~600 pN) and the experimental value (~50 pN for spectrin, line 376) requires further justification (line 376). Please clarify how the accuracy of these predictions can be established. Specifically, do the MD simulations successfully capture the relative ranking or trends in stability across the different designed variants?

      We agree with the reviewer that there is a substantial discrepancy between the absolute unfolding forces predicted by SMD simulations (~ 600 pN) and those measured experimentally by AFM (~ 50 pN for spectrin). This difference primarily arises from the orders-of-magnitude mismatch in loading rates between simulations and experiments. In our SMD simulations, the pulling velocity (~10<sup>9</sup> nm/s) is several orders of magnitude higher than that used in AFM experiments (~10<sup>3</sup> nm/s), which is expected to systematically elevate the apparent unfolding force. In addition to loading-rate effects, limitations in force-field accuracy, finite system size, and restricted conformational sampling further contribute to deviations in absolute force values. As a result, the unfolding forces obtained from SMD are not intended to provide quantitative agreement with experimental measurements or absolute mechanical stability.

      Instead, SMD is employed here as a comparative screening tool to assess relative mechanostability across different designed variants under identical simulation conditions. Despite the limited number of repeats imposed by computational cost, the simulations consistently distinguish candidates with markedly different mechanical responses. Importantly, the variants identified by SMD as more mechanically stable were subsequently confirmed experimentally to exhibit enhanced mechanostability relative to the wild-type spectrin repeat. Therefore, while SMD does not yield quantitatively accurate unfolding forces, it successfully captures relative stability trends and provides a practical and effective means for prioritizing designs prior to experimental validation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive and precise comments, which have helped us improve the consistency and clarity of our manuscript. Below, we provide a point-by-point response to each comment. In summary, the main changes introduced in the revised version are as follows:

      (1) We replaced all the statistical analyses with their non-parametric equivalents to ensure compliance with test assumptions and consistency of the results;

      (2) We compared the participants’ reaction times before and during connected practice, revealing a significant reduction in reaction times of both partners when connected;

      (3) We added, in the supplementary materials, a table reporting the vigor scores of each participant in each experimental condition, facilitating the assessment of individual and dyadic behaviors;

      (4) We have reviewed and refined the terminology throughout the manuscript and reduced the number of abbreviations to improve clarity.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      We thank the reviewer for their in-depth analysis and constructive assessment of our manuscript.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      We understand and appreciate the reviewer’s concern regarding the effective sample size at the dyad level (n=10). While our primary analyses focus on dyad-specific interactions, we note that the reported effects are consistent across multiple dynamic conditions and are associated with large effect sizes. To provide a conservative assessment, the Cohen’s d values reported correspond to the smallest effect size observed across the relevant statistical tests, thereby limiting the risk of false positives or overinterpretation. In addition, to ensure robustness given the sample size and distribution properties of the data, we have replaced all parametric tests with their non-parametric counterparts, as some analyses violated ANOVA assumptions. Friedman and Kruskal-Wallis tests are now used for paired and unpaired main effects, respectively, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons, respectively. Note that these changes did not alter the conclusions of the study.

      (a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      We agree with this comment. However, we note that the distribution of vigor scores within a population is typically centered around 1, with large deviations observed only for the fastest and slowest participants [1]. As a result, the distribution of ∆-vigor is inherently skewed. Correcting for this skewness would (i) require pairing participants based on their vigor, which is logistically difficult, and (ii) lead to an atypical sampling of dyads, with an overrepresentation of pairs exhibiting very large vigor differences. The distributions of vigor scores for the fast and slow groups before and after the interaction are reported in Supplementary Fig. S21. In addition, as suggested by the reviewer, we have now included Table S.1 in the supplementary materials, listing the values VigorFast, VigorSlow, and VigorCombined for each of the 10 dyads. This table provides a complete view of the evolution of participants’ vigor throughout the experiment.

      (b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      We initially expected the magnitude of difference in individual vigor within a dyad to play a significant role. However, our analysis did not reveal any systematic effect of ∆-vigor on either the interaction force or the resulting dyadic vigor, as shown by the LMM analysis. Importantly, the interactive adaptation hypothesis does not per se imply that the magnitude of vigor differences between the two partners should matter, only that their respective roles in selecting the adapted behavior are different. Although the model includes several free parameters, we did not attempt to fit it to individual dyads, as would in principle be possible. Instead, we performed a sensitivity analysis to assess how variations in the difference in vigor between the partners influence model predictions. For this purpose, we simulated increasing values of µ and variations in the fast partner’s cost of time. In addition, we demonstrated that uncertainty in the estimated behavior of the slow partner, which is a priori specific to each individual, has a substantial impact on the optimal movement duration of the dyad. Overall, this analysis shows that the model captures the full range of qualitative trends observed in the experimental data. When applied to predict the behavior of the average dyad, the resulting movement time prediction error remains small, as detailed in the Results section.

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.

      We thank the reviewer for this interesting question, which prompted us to extend our analysis of reaction times to the connected conditions. This additional analysis revealed a significant main effect of the condition on the reaction time for both the fast and slow groups (in both cases: W<sub>2</sub> > 0.39, p < 0.02). Post-hoc comparisons showed a significant reduction in reaction time between the initial null-field block (NF1) and the KH condition for the slow group (p = 0.03, D = 1.46), and a similar trend for the fast group (p = 0.06, D = 1.03). However, the reaction times remained comparable between the two groups, with no significant difference between them. We have incorporated these observations in the Results section (p.4, l.100–109) and expanded the Discussion (p.11, l.341–348) to address their implications for interactive adaptation in human-human and human-robot physical interactions.

      Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner’s vigor rather than by the faster partner’s, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner’s vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      We thank the reviewer for their thorough analysis of our manuscript and their constructive feedback.

      Weaknesses:

      (1) A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner’s vigor, the model formulation appears to emphasize the faster partner’s time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner’s control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner’s control architecture.

      We have modified our phrasing to clarify the principles according to which the computational framework was designed (p.7, l.226–231 and p.9, l.260–264). As stated in the Results section, the model is indeed asymmetric by design, which corresponds to the different roles of the fast and slow partners exhibited in the data. In that context, the uncertainty-related term associated with the slow partner should be understood as an overarching constraint that conditions the strategy of the dyad, while the fast partner’s cost of time acts as a contributor to the expected dyad strategy. Conceptually and numerically, as reported in the sensitivity analysis, this asymmetry corresponds to the role of the slow partners in setting the vigor ranking among the dyads and the role of the fast partners in setting the average dyadic behavior.

      (2) A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

      We agree with the reviewer that this terminology required clarification. In this paper, the term “motor plan” refers to the time series of control inputs planned by the CNS, rather than solely to kinematic descriptors such as speed or duration. These planned control signals are a direct consequence of the underlying optimization structure and cost functions that govern trajectory generation. We have clarified this definition in the Introduction (p.1, l.23–24).

      Reviewer #3 (Public review):

      Strengths:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Thank you for this analysis and the insightful feedback.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners’ individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      We thank the reviewer for this thoughtful comment. We deliberately used different statistical approaches throughout the paper in order to address different types of questions. Note that the statistical tests were converted to their non-parametric equivalents for consistency (see answer to Reviewer 1).

      - Friedman tests were used in a limited number of cases to assess population- or group-level effects, such as differences in movement time, smoothness, or accuracy across the solo, connected, and after-effects conditions. Such tests provide a straightforward framework for these descriptive, condition-level comparisons.

      - The stability of individual and dyadic vigor scores across conditions was assessed using Pearson correlations across all condition pairs, which we consider the most direct and interpretable approach for evaluating consistency across sessions.

      - LMMs were employed to examine how dyadic vigor relates to the partners’ individual vigor measured in the solo conditions, which revealed the critical contribution of the slow partner.

      Rather than applying a single statistical framework throughout, we selected the method best suited to each question. While LMMs are well suited for modeling participant-specific variability when linking individual and dyadic measures, their systematic use in all analyses would be less intuitive and would not directly address several of the population-level comparisons central to this study.

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by interindividual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      We thank the reviewer for this interesting question. We first note that the notion of reward in motor control is quite broad. Although our task did not include explicit external (e.g. monetary) rewards, we assumed that participants attribute an implicit value to completing the task in accordance with the experimenter’s instructions. This assumption has been shown to be appropriate for characterising baseline behavior in previous studies [2–5].

      As discussed in the Introduction, vigor is generally understood to emerge from a tradeoff between effort, accuracy, and time. The reviewer is correct in noting that inter-individual differences in vigor may reflect differences in reward sensitivity or in its discounting [3,6], given that time and reward are intrinsically coupled. Differences in vigor may also arise from inter-individual variability in sensitivity to effort or perceived task difficulty. Because these factors are intertwined (for example, increasing accuracy through co-contraction typically incurs greater effort [7]), it is challenging to disentangle their respective contributions based solely on behavioral data.
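
      To make the notion of a tradeoff concrete, here is a deliberately simplified toy sketch; it is not the model used in the manuscript. It assumes, purely for illustration, an effort term that scales as 1/T^3 for a fixed movement amplitude (as a squared-acceleration cost would) and a cost of time that grows linearly with duration T. The function names and weights are hypothetical.

```python
# Toy illustration of an effort-time tradeoff (NOT the manuscript's model).
# Assumptions: effort ~ effort_weight / T**3, cost of time ~ time_weight * T.

def total_cost(duration, effort_weight, time_weight):
    """Sum of an effort term that falls with duration and a time term that rises."""
    return effort_weight / duration**3 + time_weight * duration


def optimal_duration(effort_weight, time_weight, step=0.001, t_max=5.0):
    """Grid search for the duration that minimizes the total cost."""
    candidates = [step * i for i in range(1, int(t_max / step) + 1)]
    return min(candidates, key=lambda t: total_cost(t, effort_weight, time_weight))


# A higher sensitivity to elapsed time yields a shorter optimal duration,
# i.e. a more vigorous movement under these assumptions.
patient = optimal_duration(effort_weight=1.0, time_weight=3.0)
impatient = optimal_duration(effort_weight=1.0, time_weight=48.0)
print(patient, impatient)
```

      Under these assumptions the minimum has the closed form T* = (3a/c)^(1/4), where a is the effort weight and c the time weight, so the two cases above give durations of about 1.0 and 0.5. The same optimum can be reached by lowering the effort weight instead, which illustrates why effort and time sensitivities are hard to separate from movement durations alone.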

      In the present study, our inverse optimal control procedure to identify the cost of time (and thus predict individuals’ vigor) relies on a predefined effort-accuracy tradeoff under fixed final time across multiple movement amplitudes [8]. As a result, the model does not allow us to independently estimate individual sensitivities to effort, accuracy, and time. Such characterization of computational "phenotypes" would likely require experimental paradigms in which each of these factors is systematically manipulated while the others are held constant, which is beyond the scope of the current dataset. In practice, the main value of behavioral modeling lies in revealing the relative weighting of these criteria by the CNS during motor planning [5]. We have expanded the Discussion to clarify these limitations and considerations (see Discussion p.12, l.396–401 & l.407–412).

      Finally, we chose not to emphasize these broader issues in the present manuscript because (i) they are peripheral to our primary research question on how individual vigor influences human-human interaction, and (ii) although we do not yet have definitive and consensual answers, they have been addressed in multiple studies reviewed elsewhere [9,10].

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      We also find that the preservation of accuracy and smoothness despite changes in vigor is an interesting result, and we therefore chose to report these measures in the Supplementary Materials. However, we believe it is preferable not to include them in the main figures for the following reasons:

      - We avoid framing our results in terms of a speed-accuracy trade-off, as Fitts’ work was initially designed to study fast movements [11], whereas our work focuses on self-paced movements. As outlined in the Introduction, vigor is more appropriately interpreted as reflecting a tradeoff between effort (related to movement speed), accuracy, and time. From this perspective, the reported changes of vigor already capture a shift in the underlying trade-off selected by the CNS, using a framework better suited to our experimental paradigm.

      - The manuscript is technically dense and reports multiple analyses that are essential to establish (i) the existence and definition of dyadic vigor, and (ii) how it emerges from interaction between partners. Although the observed preservation of accuracy and improvements in smoothness are informative, they are not central to these two primary questions and would risk diverting attention from the core contributions of the paper. In addition, accuracy is not a feature predicted by our deterministic modeling, and extensions would be needed to capture this aspect. Here we only attempted to replicate average behaviors.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

      We thank the reviewer for this comment, which prompted us to verify the assumptions underlying our ANOVAs. We found that a few distributions in the original analysis, as well as in some of the new tests, did not meet these assumptions. To ensure consistency, all statistical analyses have now been replaced with non-parametric tests: Friedman and Kruskal-Wallis tests for paired and unpaired main effects, Wilcoxon and Mann-Whitney tests for paired and unpaired post-hocs. The updated results do not change any of the conclusions. The only minor change concerns accuracy, which appeared slightly improved in a restricted number of connected conditions and now appears mostly unaffected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) Lines 146-147. The authors state, "Whereas the fast partners maintained a similar duration". Figures S6H,I suggest that fast partners made slower movements during the paired task relative to the solo task, not movements with a similar duration.

      We agree that Fig. S.6H,I suggest slightly slower movements for the fast partners, though not significantly so. We have modified the sentence to be less assertive than in the previous version (see p.6, l.155).

      (2) In the Discussion (Lines 318-319), the authors state that their findings confirm and extend the "benefits of dyadic control in collaborative actions". What benefits are they referring to here, relative to individual control? It would be helpful if the authors would elaborate on this claim.

      We have modified this sentence to clarify that the benefits of dyadic control refer to previously reported advantages over individual control, namely reduced movement time [12] and improved tracking accuracy [13,14] (see p.11, l.336–337).

      (3) On Lines 87-89, the authors reference a decomposition of variance of vigor scores across the NF1, VL, and VH conditions; however, I did not see an explanation of how this decomposition was performed. The method used to estimate variance explained by inter-individual vs intra-individual differences in vigor should be outlined for the reader.

      Thank you for pointing out this missing information. We now explain in the statistical analysis section (see p.14, l.504–507) that the percentage of inter-individual variability in vigor is estimated using sums of squares as estimates of inter- and intra-individual variability.
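
      As a hedged illustration of this kind of estimate, the sketch below computes the share of variance attributable to inter-individual differences as the ratio of the between-participant sum of squares to the total sum of squares. It assumes a simple one-way decomposition, which may differ in detail from the procedure described in the manuscript; the function name and vigor scores are hypothetical.

```python
# Illustrative sketch: share of vigor variance due to inter-individual differences,
# estimated from sums of squares. Assumes a one-way decomposition; data are made up.

def inter_individual_share(scores_by_participant):
    """Return SS_between / (SS_between + SS_within) for {participant: [scores]}."""
    all_scores = [s for scores in scores_by_participant.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0  # spread of participant means around the grand mean
    ss_within = 0.0   # spread of each participant's scores around their own mean
    for scores in scores_by_participant.values():
        mean_i = sum(scores) / len(scores)
        ss_between += len(scores) * (mean_i - grand_mean) ** 2
        ss_within += sum((s - mean_i) ** 2 for s in scores)
    return ss_between / (ss_between + ss_within)


# Hypothetical vigor scores of two participants across three conditions.
vigor = {
    "P01": [1.20, 1.10, 1.00],
    "P02": [0.90, 0.80, 0.70],
}
print(inter_individual_share(vigor))
```

      A value close to 1 indicates that most of the variability comes from stable differences between participants rather than from changes within participants across conditions.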

      (4) How was the absolute interaction torque for a paired movement calculated? Was it an integral of the temporal profile of torque for some portion of the combined movement? The method for calculating the absolute interaction torque needs to be specified.

      We have now clarified in the Methods (see p.14, l.490–491) that the reported average interaction effort was computed by taking the absolute value of the interaction torque at each time point and averaging it over the entire movement.
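
      A minimal sketch of this computation is given below, assuming the interaction torque is sampled at a uniform rate over one movement. The function name and the torque samples are hypothetical.

```python
# Illustrative sketch: average interaction effort for one movement, taken as the
# time average of |interaction torque|. Assumes uniformly spaced samples.

def mean_absolute_torque(torque_samples):
    """Average of the absolute torque over the samples of one movement."""
    return sum(abs(t) for t in torque_samples) / len(torque_samples)


# Hypothetical interaction torque samples (N.m) for a single movement.
torque = [0.0, 0.2, -0.4, 0.1, -0.1]
print(mean_absolute_torque(torque))   # average of the magnitudes
print(sum(torque) / len(torque))      # signed mean, where opposite signs cancel
```

      Taking the absolute value before averaging prevents torques of opposite sign from cancelling out, which a signed average would allow.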

      (5) Lines 123-124: "... interaction torque showed no significant correlation with differences in individual vigor within dyads." This statement should be supported by appropriate statistical measures.

      This result is now supported by reporting the corresponding Pearson correlation analyses. No significant correlations were found between interaction torque and differences in individual vigor within dyads (KL conditions: |r| < 0.43, p > 0.22; KH conditions: |r| < 0.18, p > 0.61, see p.5, l.132–133).
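
      For readers who wish to relate the correlation coefficients to the sample size, the following sketch computes a Pearson correlation and the associated t statistic with n-2 degrees of freedom. It is illustrative only: the function names and the data are hypothetical, and the conversion of t into a p-value is left to a statistics library.

```python
# Illustrative sketch: Pearson correlation and its t statistic (df = n - 2).
# Data are made up; converting t to a p-value is left to a statistics library.
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical delta-vigor and mean absolute interaction torque for 5 dyads.
delta_vigor = [0.05, 0.10, 0.15, 0.30, 0.45]
interaction_torque = [0.21, 0.18, 0.25, 0.19, 0.24]
r = pearson_r(delta_vigor, interaction_torque)
print(r, t_statistic(r, len(delta_vigor)))
```

      With 10 dyads, a correlation of magnitude 0.43 corresponds to a t statistic of roughly 1.35 on 8 degrees of freedom, which shows how moderate correlations can remain non-significant at this sample size.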

      (6) For the analysis, presented in Figure 3C, and specified on lines 116-123, the text mentions the main effects of both condition and target. There doesn’t appear to be much of an effect of the target for the KH data. Should these results not be reported as an interaction effect between the two factors instead?

      We agree with the reviewer and have corrected our presentation of these results (see p.4, l.126–128). Consistent with the reviewer’s observation, no significant effect of the target is found in the KH condition.

      (7) Figures 3E and S6B. What is the purpose of including the averaged data for each pair in addition to both individuals’ data from each pair? It would be useful to distinguish the individual data from the average data for each pair. Frankly, the number of data points shown on this sub-figure is excessive.

      There may have been a misunderstanding. Because the partners of a dyad are connected by a virtual elastic band (rather than a rigid bar), they do not execute identical movements. Therefore Figs. 3E,S6B display the movement time of all individual participants, together with the corresponding 20 individual regression lines, like in Fig. 2B. The solid black line represents the average across all individuals, and the averaged behaviors of dyads are not included. We have clarified this point by revising the caption of Fig. 3E (see p.5).

      Noted mis-spellings:

      Figure S.3A caption: "trials towards this target."

      Page 10 Line 313: "Importantly, these findings show ...".

      These mis-spellings have been corrected at supplementary p.2 and main text p.11, l.331. Thank you!

      Reviewer #2 (Recommendations for the authors):

      (1) To illustrate the contribution of the three components used to calibrate the overall cost function, it would be informative to include simulation analyses in which each component is selectively removed (i.e., ablation analyses).

      We did not perform ablation analyses, as selectively removing components of the model can lead to instability or ill-suited control inputs, making the resulting simulations difficult to interpret. Instead, we conducted a sensitivity analysis of the key parameters shaping the overall cost function, including the estimated mean and deviation of the slow partner’s movement duration, the weight associated with uncertain torque minimization (Figs. S.18,S.19), and the fast partner’s cost of time (Fig. S20). This analysis reveals the predominant roles of the estimated slow partner movement patterns in determining the model predictions, in agreement with our experimental observations.

      (2) Although the authors refer to the motor-off condition as "passive," participants actively generated the movements in the absence of external forces. Thus, this condition corresponds to active, unassisted movement. A different term may therefore reduce potential confusion for readers.

      We agree that the term “passive” was not well chosen given the context of the paper, and we have therefore replaced it with “null-field”. Consequently, the P1 and P2 blocks are now referred to as NF1 and NF2.

      (3) Please clarify the instructions given to participants. Were they informed in advance that their movements would physically interact with those of their partner?

      Thank you for pointing out this missing clarification. We have now specified in the Methods (p.14, l.465–469) that participants were not informed prior to any condition that they would interact with a human partner; they were only told that the robot would provide assistance. When debriefed at the end of the experiment, only one out of the 20 participants reported having realized that they were connected to another human. Most participants believed they were interacting either with a version of themselves or with a robot with some randomness.

      (4) Line 475. Should "Fig. 2D" be "Fig. 2B"?

      Thank you for catching this error. The reference has been corrected to Fig. 2B (see p.15, l.522).

      Reviewer #3 (Recommendations for the authors):

      (1) The analysis of reaction times shows no difference between groups in the passive block, which challenges the assumption that movement vigor covaries with decision speed or action initiation speed. It may be worth discussing this in the context of recent literature.

      We agree that the initial analysis and discussion of reaction times were too superficial. In the revised manuscript, we now report that dyadic interaction leads to significantly shorter reaction times (p.4, l.100–109), concomitantly with improved movement velocity. We have also expanded the Discussion on the relationship between decision and action speeds/durations (p.11, l.340–348).

      (2) Many abbreviations are unusual for a non-expert. I would recommend using the full terms instead. At least initially, I found it difficult to follow the results because the abbreviations were not immediately clear (at least to me).

      We agree that the paper had too many abbreviations. Therefore, we have removed the abbreviated names of the models and, where possible without impacting readability, used the full names of the conditions.

      (3) Relatedly, the notation in Figure 1 may be confusing. The labels "S" and "F" (slow and fast) correspond to different concepts than "F" and "L" (follower and leader), so the same participant could be labeled "F" as fast but not "F" as a leader.

      Thank you for pointing out this potential source of confusion. We have therefore modified Fig. 1A (p.2) to avoid any potential confusion by using the full model names rather than abbreviations. In the remainder of the manuscript, "S" and "F" exclusively denote the slower and faster partners within a dyad, and we do not use abbreviations for "leader" or "follower" in the text.

      (4) In figures like 2.C and 3.I, keeping the same scales on the x and y axes and adding a diagonal reference line would make it easier to see shifts across conditions.

      As explained in the Methods, vigor scores in the low- and high-viscosity conditions were computed using the average movement durations from the NF1 condition as a reference. Consequently, because movements are slower in these conditions, the corresponding vigor values are lower than those in NF1. For this reason, using identical scales on the x- and y-axes and adding a 45° reference line could mislead the reader into thinking that the vigor scores are expected to be identical, and would reduce the readability of the figure.
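
      The effect of using NF1 as the reference can be illustrated with a toy sketch. It assumes, for illustration only, that a vigor score is the ratio of a reference duration to the individual's duration; the actual definition is the one given in the Methods, and the function name and durations below are hypothetical.

```python
# Toy illustration (assumed definition, not necessarily the one in the Methods):
# vigor score = reference duration / individual duration.

def vigor_score(reference_duration, individual_duration):
    """Score above 1 means faster than the reference, below 1 means slower."""
    return reference_duration / individual_duration


nf1_reference = 0.80     # hypothetical population-average duration in NF1 (s)
nf1_duration = 0.70      # hypothetical duration of one participant in NF1 (s)
viscous_duration = 0.95  # same participant in a viscous condition (s)

score_nf1 = vigor_score(nf1_reference, nf1_duration)
score_viscous = vigor_score(nf1_reference, viscous_duration)
print(score_nf1, score_viscous)
```

      Because the viscous condition is scored against the NF1 reference, the same participant falls below 1 there while remaining above 1 in NF1, so points are not expected to lie on the identity line.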

      (5) Multiple hypotheses about dyadic regulation of vigor are nicely explained; it could help to indicate if any of these were a priori favored based on prior literature.

      Previous literature provides mixed evidence regarding how vigor might be regulated in dyadic interaction. For instance, Takagi et al. (2016) [15] reported that mechanically connected partners may rely on independent motor plans, which corresponds to the co-activity hypothesis considered here. However, in that study, movement duration was prescribed. We therefore expected that removing this constraint on movement duration could allow coordination strategies to emerge, particularly in view of findings on haptic communication during tracking of random targets while connected via an elastic band [13,14].

      At the same time, a large body of work on human–human and human–robot interaction has interpreted coordination through a leader–follower framework. In our context, vigor is understood as the outcome of a tradeoff between effort and elapsed time, with time being associated with a decaying reward. Based on this framework, we hypothesized a priori that a leader–follower scheme would emerge, in which the fast partner—being more sensitive to time costs and/or less sensitive to effort—would tend to drive the interaction, even at the expense of increased effort. For these reasons, the leader–follower hypothesis was formulated as the expected outcome throughout the manuscript.

      (6) In the introduction, statements such as "relative vigor of an individual is remarkably stable" appear true only in the solo condition. The same is true in the discussion, where it is said that vigor is a stable trait. The whole study shows that an individual can shift his/her vigor to match that of another individual, so in such conditions it appears adaptable rather than stable to me.

      Let us first clarify that when we describe vigor as “remarkably stable”, we do not imply that individuals do not adjust their movement timing in response to changes in external dynamics. For example, movement durations increase in visco-resistive conditions even during solo performance; nevertheless, individuals who move faster in the absence of resistance will remain faster relative to others when resistance is introduced. In this sense, stability refers to the preservation of relative rankings across conditions, rather than invariance of absolute movement timing. Because interaction with another individual constitutes a substantial change in task dynamics, an effect on individual pace is therefore expected.

      That said, and as pointed out by the reviewer, (i) dyadic interactions lead to the emergence of a dyadic vigor characterized by average movement durations close to those of the fast partners, while the ranking across dyads is largely imposed by the slow partners; and (ii) these adaptations persist after the interaction phase. Importantly, the observed vigor adaptations appear to last longer in our physical interaction task than in previous attempts to manipulate vigor using visual feedback [16]. To account for this adaptability of vigor, we have (i) clarified claims in the Introduction regarding the stability of vigor (see p.1, l.18–20), and (ii) expanded the Discussion to more explicitly address vigor adaptability and the possible resulting consequences for the concept of vigor (see p.12, l.407–412).

      References

      (1) O. Labaune, T. Deroche, C. Teulier, and B. Berret, “Vigor of reaching, walking, and gazing movements: on the consistency of interindividual differences,” Journal of Neurophysiology, vol. 123, pp. 234–242, Jan. 2020.

      (2) L. Rigoux and E. Guigon, “A model of reward-and effort-based optimal decision making and motor control,” PLoS Computational Biology, vol. 8, pp. 1–13, Jan. 2012.

      (3) R. Shadmehr, J. J. O. de Xivry, M. Xu-Wilson, and T.-Y. Shih, “Temporal discounting of reward and the cost of time in motor control,” Journal of Neuroscience, vol. 30, pp. 10507–10516, aug 2010.

      (4) B. Berret and G. Baud-Bovy, “Evidence for a cost of time in the invigoration of isometric reaching movements,” Journal of Neurophysiology, vol. 127, pp. 689–701, feb 2022.

      (5) D. Verdel, O. Bruneau, G. Sahm, N. Vignais, and B. Berret, “The value of time in the invigoration of human movements when interacting with a robotic exoskeleton,” Science Advances, vol. 9, sep 2023.

      (6) K. Jimura, J. Myerson, J. Hilgard, T. S. Braver, and L. Green, “Are people really more patient than other animals? evidence from human discounting of real liquid rewards,” Psychonomic Bulletin & Review, vol. 16, pp. 1071–1075, dec 2009.

      (7) P. L. Gribble, L. I. Mullin, N. Cothros, and A. Mattar, “Role of cocontraction in arm movement accuracy,” Journal of Neurophysiology, vol. 89, pp. 2396–2405, may 2003.

      (8) B. Berret and F. Jean, “Why Don’t We Move Slower? The Value of Time in the Neural Control of Action,” Journal of Neuroscience, vol. 36, pp. 1056–1070, Jan. 2016.

      (9) R. Shadmehr and A. A. Ahmed, Vigor : neuroeconomics of movement control. The MIT Press, 2020.

      (10) D. Thura, A. M. Haith, G. Derosiere, and J. Duque, “The integrated control of decision and movement vigor,” Trends in Cognitive Sciences, vol. 29, pp. 1146–1157, Dec. 2025.

      (11) P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” Journal of Experimental Psychology, vol. 47, pp. 381–391, June 1954.

      (12) K. B. Reed and M. A. Peshkin, “Physical collaboration of human-human and human-robot teams,” IEEE Transactions on Haptics, vol. 1, pp. 108–120, July 2008.

      (13) G. Gowrishankar, A. Takagi, R. Osu, T. Yoshioka, M. Kawato, and E. Burdet, “Two is better than one: physical interactions improve motor performance in humans,” Scientific Reports, vol. 4, Jan. 2014.

      (14) A. Takagi, G. Ganesh, T. Yoshioka, M. Kawato, and E. Burdet, “Physically interacting individuals estimate the partner’s goal to enhance their movements,” Nature Human Behaviour, vol. 1, pp. 1–6, Mar. 2017.

      (15) A. Takagi, N. Beckers, and E. Burdet, “Motion plan changes predictably in dyadic reaching,” PLOS ONE, vol. 11, p. e0167314, Dec. 2016.

      (16) P. Mazzoni, B. Shabbott, and J. C. Cortes, “Motor control abnormalities in Parkinson’s disease,” Cold Spring Harbor Perspectives in Medicine, vol. 2, pp. a009282–a009282, Mar. 2012.

    1. Author response:

      Common responses:

      We thank the editors for considering our paper and the reviewers for their thoughtful and detailed feedback. Based on the comments, we will revise our manuscript to better describe how our approach differs from modeling strategies that are common in the field. We also aim to elaborate on the advantages of fastFMM and what scientific questions it is designed to answer. Finally, we will provide more background on our example analyses and the interpretation of the results.

      Within this response, “within-trial timepoints”, “time-varying predictors/behaviors”, and “signal magnitude” are used as specific examples of the general concepts of “functional domain”, “functional covariates”, and “functional outcome”, respectively. To make statements or examples more concrete, we may use the former neuroscience-specific terms when making general claims about functional models.

      - ncFLMM, cFLMM: non-concurrent or concurrent functional linear mixed models.

      - FUI: fast univariate inference. An approximation strategy to perform FLMM (Cui et al., 2022).

      - fastFMM: the R package that implements FUI.

      - CI: confidence interval.

      Before specific line-by-line responses, we provide a brief comparison between cFLMM and fixed effects encoding models. All three reviewers suggested that fixed effects models could be an existing alternative to cFLMM (Reviewer 1 (1B), Reviewer 2 (2C), Reviewer 3 (3A)). Their shared comments highlight that our revision should articulate the advantages and applications of cFLMM relative to existing analysis strategies.

      Functional regression methods like cFLMM produce functional coefficient estimates that quantify how the magnitude of predictor–signal associations evolves across an ordered functional domain such as within-trial timepoints. Standard scalar outcome regression methods, like the GLMs specified in Engelhard et al. (2019), model these associations and their corresponding coefficients as fixed across the functional domain. While GLM encoding models may include time-varying predictors, these analysis strategies do not model the predictor–signal association as changing over the functional domain.
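
      To illustrate the difference between a coefficient that is fixed across the functional domain and one that varies with it, the sketch below simulates a signal whose association with a time-varying predictor ramps up within the trial. It is only a toy under stated assumptions: the data-generating process, the helper `ols_slope`, and all variable names are hypothetical, ordinary least squares stands in for a mixed model, and none of this is fastFMM code (random effects, smoothing, and joint inference are omitted).

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_time = 400, 50
s = np.linspace(0.0, 1.0, n_time)      # within-trial timepoints

# Assumed ground truth: the predictor-signal association ramps from 0 to 2.
beta_true = 2.0 * s

X = rng.normal(size=(n_trials, n_time))                    # time-varying predictor
Y = X * beta_true + rng.normal(scale=0.5, size=X.shape)    # simulated signal

def ols_slope(x, y):
    """Slope of y on x (with an intercept) via least squares."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# (a) A single coefficient shared by every timepoint.
beta_pooled = ols_slope(X.ravel(), Y.ravel())

# (b) One coefficient per within-trial timepoint.
beta_pointwise = np.array([ols_slope(X[:, j], Y[:, j]) for j in range(n_time)])

print("pooled coefficient:", round(float(beta_pooled), 2))
print("pointwise, first vs last timepoint:",
      round(float(beta_pointwise[0]), 2), round(float(beta_pointwise[-1]), 2))
```

      In this simulation the pooled fit returns roughly the average association, while the pointwise fits trace the ramp; the latter is the kind of quantity a functional coefficient is meant to describe.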

      Moreover, encoding models are less suited to hypothesis testing in clustered or longitudinal settings (e.g., repeated-measures datasets) and yield regression coefficient estimates that are only interpretable with respect to the units of the basis functions. In contrast, cFLMM provides time-varying coefficient estimates that are interpretable as statistical contrasts in terms of the original variables and produces hypothesis tests in clustered settings. cFLMM can be applied to datasets that define covariates in terms of the same flexible representations of covariates used in encoding models; this is a modeling choice rather than a methodological characteristic.

      The remainder of this provisional author response will respond to reviewers’ concerns line-by-line, approximately in the order they appear.

      Reviewer #1 (Public review):

      We thank Reviewer 1 for their comments, especially their efforts to provide first-hand experience with loading and applying fastFMM. We hope that recent improvements to fastFMM’s public release and vignettes address Reviewer 1’s concerns about ease-of-use.

      (1A) Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      We believe the reviewer may have experimented with an old version of fastFMM, so their experience may not reflect recent rewrites and improvements. fastFMM v1.0.0+ is now stable, validated on CRAN, and contains new example data and step-by-step tutorials. We designed fastFMM’s model-fitting code to be similar to common GLM packages in R to reduce the learning curve for new users.

      (1B) …a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help.

      We will provide a clearer description of existing methods in the revised manuscript. Briefly, inference with fastFMM can accommodate large datasets that contain clustered data, repeated measures, or complex hierarchical effects, e.g., experiments with multiple animals and multiple trials per animal. When encoding models are fit to each cluster (e.g., animal, neuron) separately, we are not aware of a principled method to pool these cluster-specific models together to quantify uncertainty or yield an appropriate global hypothesis test.

      Reviewer #2 (Public review):

      Reviewer 2’s thoughtful feedback helped structure our points in the common response above, which we will refer to when applicable. In our response, we aim to clarify the problems that cFLMM solves and characterize the advantages in interpretability.

      (2A) The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      We hope that the common response addresses these concerns. We were motivated to provide a concurrent extension of fastFMM based on our experience with statistical consulting in neuroscience research. Questions that benefit from a functional approach are common and often not adequately modeled with a non-concurrent approach, such as the variable trial length analysis we describe below.

      (2B) It is less clear that this approach makes sense for variables that change within trial…This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modeled, and a within-trial component whose effect on the signal is not, is artificial in many experiment designs, and may yield hard to interpret results.

      We thank Reviewer 2 for highlighting a point that we did not adequately explain and that we will address further in the revision. The pointwise and joint CIs estimated by fastFMM account for uncertainty in the coefficient estimates due to variation in the predictors across within-trial timepoints. cFLMM targets a statistical quantity, or estimand, that is defined by trial timepoint specific effects, so the first step of our estimation strategy fits separate pointwise mixed models. However, models from every within-trial timepoint are then combined to calculate uncertainty and smooth the coefficient estimates. Thus, the widths of the pointwise and joint CIs depend on the estimated between-timepoint covariance and a smoothing penalty. Loewinger et al. (2025a) provides further details in Appendices 2 and 3, describing the covariance structure and detailing the power improvements of FUI compared to multiple-comparisons corrections.
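
      The role of between-timepoint covariance in the width of joint CIs can be sketched with a generic simulation-based critical value. This is one common way to build simultaneous bands and is offered as an illustration only; it is not claimed to be the exact procedure in fastFMM, and the AR(1)-style covariance, the standard error, and all names are assumptions.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n_time = 40
idx = np.arange(n_time)

# Assumed covariance of the coefficient estimates across timepoints:
# a common standard error with AR(1)-like correlation between neighbours.
se = 0.1
corr = 0.9 ** np.abs(idx[:, None] - idx[None, :])
cov = se**2 * corr

# Simulate estimation error, standardize, and record the largest |z| per draw.
draws = rng.multivariate_normal(np.zeros(n_time), cov, size=20000)
max_abs_z = np.max(np.abs(draws) / se, axis=1)

alpha = 0.05
z_pointwise = NormalDist().inv_cdf(1 - alpha / 2)               # ignores multiplicity
z_joint = float(np.quantile(max_abs_z, 1 - alpha))              # uses the covariance
z_bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * n_time))   # ignores correlation

print(f"pointwise {z_pointwise:.2f} < joint {z_joint:.2f} < Bonferroni {z_bonferroni:.2f}")
```

      Because neighbouring timepoints are correlated in this example, the joint critical value falls between the pointwise value and the Bonferroni value, which is the intuition behind the power gain relative to multiple-comparisons corrections.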

      Other functional regression estimation strategies jointly fit the entire model with a single regression, e.g., functional generalized estimating equations (Loewinger et al., 2025b). However, these methods use basis expansions of the coefficients. In contrast, the encoding models mentioned in 2C below and Reviewer 3 (3A) apply basis expansions of the covariates, and the resulting model does not capture how signal–covariate associations evolve across some functional domain. Although the first stage in the fastFMM approach fits pointwise linear models, this is only one of three steps in the estimation strategy. fastFMM yields coefficient estimates comparable to those that would be obtained from functional regression estimation strategies that jointly estimate the functional coefficients in a single regression. We mention this to distinguish between the target statistical quantity (functional coefficients) and the estimation strategy (pointwise vs. joint).

      (2C) …an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.

      Our understanding is that the suggested approach aims to quantify the association between the outcome and within-trial patterns in covariates. This is a great question and we will incorporate a discussion of this into the revision. However, temporal basis functions convolved with the covariate time series cannot directly characterize these relationships. Encoding models can detect the contribution of predictors to neural signals while remaining agnostic to the precise relationship, but this flexibility can come at the cost of interpretability. The coefficients of the convolutions may not be translatable into a clear statistical contrast in terms of the original covariates.

      In our paper, we provide examples of cFLMM models with simple signal-covariate relationships. The coefficient estimates quantify the expected change in signal given a one unit change in the original predictors. Let 𝑌(𝑠) be the outcome and 𝑋(𝑠) be some covariate at within-trial timepoint 𝑠. For brevity, we will suppress subject/trial indices and random effects in the following notation. The coefficient at time point 𝑠 can be captured by the generic mean model

      𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 1] − 𝔼[𝑌(𝑠) ∣ 𝑋(𝑠) = 0].

      In contrast, the change in signal associated with patterns in within-trial covariates can be written as

      𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 1] − 𝔼[𝑌 (𝑠<sub>1</sub>) ∣ 𝑋(𝑠<sub>2</sub>) = 0]

      for all pairs of timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. While simple lagged or offset outcome-predictor associations can be incorporated as covariates in cFLMM, the approach does not capture all within-trial timepoints 𝑠<sub>1</sub>, 𝑠<sub>2</sub>. Encoding models also do not target the above estimand. Instead, a full function-on-function regression could estimate the above. This topic can be incorporated into our revision and may be a future line of inquiry.
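
      The two estimands above can be compared numerically. The sketch below is a toy under stated assumptions: a hypothetical binary covariate that is independent across timepoints, and a simulated signal that responds to the covariate at the same timepoint (effect 1.0) and at the previous timepoint (effect 0.5). The function `contrast` and all other names are illustrative. It computes the sample analogue of the cross-timepoint contrast for every pair (𝑠<sub>1</sub>, 𝑠<sub>2</sub>); the diagonal of the resulting matrix is the concurrent contrast, and the first sub-diagonal is a one-step lag.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_time = 5000, 6

# Hypothetical binary covariate, drawn independently at each timepoint.
X = rng.integers(0, 2, size=(n_trials, n_time))

# Assumed data-generating process: concurrent effect 1.0, one-step lag effect 0.5.
X_prev = np.zeros_like(X)
X_prev[:, 1:] = X[:, :-1]
Y = 1.0 * X + 0.5 * X_prev + rng.normal(scale=0.3, size=X.shape)

def contrast(s1, s2):
    """Sample analogue of E[Y(s1) | X(s2) = 1] - E[Y(s1) | X(s2) = 0]."""
    on = X[:, s2] == 1
    return Y[on, s1].mean() - Y[~on, s1].mean()

# Rows index the outcome timepoint s1, columns index the covariate timepoint s2.
surface = np.array([[contrast(s1, s2) for s2 in range(n_time)]
                    for s1 in range(n_time)])

concurrent = np.diag(surface)         # s1 == s2
lagged = np.diag(surface, k=-1)       # s1 == s2 + 1

print(np.round(surface, 2))
```

      A concurrent model targets only the diagonal (plus any lags explicitly added as covariates), whereas a full function-on-function regression would target the whole matrix.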

      (2D) In the Machen et al. data…From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials.

      In this experiment, mice waited in a trigger zone, ran through a linear corridor, then received a reward of either water or strawberry milkshake in the reward delivery zone (Machen et al., 2026). Mice received different rewards between sessions but the same reward within all trials of a given session. This design complicated the analysis, as the reward type produced prominent differences in average latency (water: 3.3 seconds, milkshake: 2.0 seconds). The authors wanted to disentangle whether mean differences in the signal across reward types reflected differences in motivation to obtain the reward or differences in reaction to reward receipt.

      We agree that performing a reward-aligned analysis would be an intuitive approach to visualize the differences in average signal for mice that received milkshake compared to water. In fact, we provide an ncFLMM reward-aligned analysis in Figure S1 of Machen et al. (2025). We will add this analysis to the revision and thank the reviewer for the suggestion. We emphasize, however, that this method answers a different question. It does not identify how the signal change associated with receiving the milkshake evolves with respect to latency, especially if the relationship is non-linear. Time warping faces similar obstacles in this setting, especially since sufficiently flexible curve registration can induce similarity due purely to noise. Generally, time warping does not lend itself to hypothesis testing, as it is unclear how to propagate uncertainty from the time warping model into final hypothesis tests.

      We believe cFLMM is an appropriate choice for the specific question, and we will revise the manuscript to better reflect its advantages. The functional coefficient estimates in Figures 3C-iii and 3C-iv provide insights that are not possible to derive from the proposed alternatives. For example, we can infer that for short latencies, we do not see a significant difference in signal magnitude between mice receiving water and mice receiving the milkshake. However, for latencies longer than around 2 seconds, receiving the milkshake is associated with an additional positive change in signal. We agree that we should make Figure 3C and the accompanying discussion clearer and thank Reviewer 2 for their feedback on interpretation.

      Reviewer 3 (Public review):

      (3A) …it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other…

      We assume Reviewer 3 is referencing “Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons” (Engelhard et al., 2019). We hope that the Common responses sufficiently contrast the settings where each approach can be applied. Because these models have different goals and assumptions, they are appropriate for answering different questions.

      (3B) In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary “reward zone vs corridor” (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      Thank you for pointing out that we were not clear. This was mentioned by multiple reviewers and highlights the need to elaborate on our motivation in the revision. In this example, we wanted to investigate the change in signal–reward association as a function of within-trial timepoints, not the association between instantaneous velocity and the signal. “Slow” or “fast” means “mouse with above or below average latency”, respectively. Please refer to Reviewer 2 (2D), where we discuss why event alignment is an insufficient correction.

      The functional coefficient estimates in Figure 3C are interpreted as contrasts because the fixed effect coefficients capture the difference in expected signal between strawberry milkshake and water along the functional domain. An advantage of cFLMM is that it is easy to specify models in which the coefficients correspond to interpretable contrasts of the signal across conditions. The coefficient estimate shown in Figure 3B-ii also corresponds to a contrast because the estimates capture the difference in mean signal from strawberry milkshake and water. Equations (7) and (8) in the section “Materials and methods” and sub-section “Variable trial length analysis” provide additional details on the fixed effect coefficients. Based on this confusion, we will convert the two 1 x 4 sub-plots of 3B and 3C into two 2 x 2 sub-plots to avoid unintended direct comparisons.

      To contextualize how we “acknowledge the interpretational difficulties of [our] analysis”, we stated that a non-concurrent FLMM attempting to control for a time-based covariate is difficult to interpret. The concurrent FLMM provides a straightforward interpretation directly related to the question of interest, which we discuss above in Reviewer 2 (2D).

      (3C) Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      Thank you for this suggestion. All three reviewers raised this topic (see Reviewer 1 (1B), Reviewer 2 (2C), and the Common responses), and we will incorporate our response in the revision.

      (3D) From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      This is an important point that we mentioned implicitly. In our cFLMM specification of the Jeong et al. (2022) model, “we incorporated trial-specific covariates for trial number and session, modeling these as increasing numerical values rather than identical categorical variables”, which are also plotted in Appendix 3. In Box 1, “if the functional covariate of interest is a scalar constant across the domain, the models fit by the concurrent and non-concurrent procedure are identical”. We will explicitly point out that cFLMM can perform inference on combinations of functional and constant covariates.
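
      As a sketch of how functional and trial-constant covariates can sit side by side, the example below stores covariates in a regressor × trial × timepoint array and broadcasts a trial-level covariate across timepoints. Everything here is an assumption made for illustration (the array layout, the simulated effects, and the helper `fit_at`); ordinary least squares at each timepoint stands in for the pointwise mixed models, and no fastFMM function is called.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_time = 300, 20

x_functional = rng.normal(size=(n_trials, n_time))   # varies within each trial
z_trial = rng.normal(size=n_trials)                   # one value per trial

# Broadcast the trial-level covariate across timepoints, then stack both
# covariates into shape (n_covariates, n_trials, n_time).
z_broadcast = np.broadcast_to(z_trial[:, None], (n_trials, n_time))
covariates = np.stack([x_functional, z_broadcast])

# Assumed ground truth: effect 1.5 for the functional covariate,
# effect -0.7 for the trial-constant covariate.
Y = (1.5 * covariates[0] - 0.7 * covariates[1]
     + rng.normal(scale=0.2, size=(n_trials, n_time)))

def fit_at(j):
    """Least-squares fit at timepoint j: intercept, functional, constant."""
    design = np.column_stack([np.ones(n_trials), covariates[:, :, j].T])
    coef, *_ = np.linalg.lstsq(design, Y[:, j], rcond=None)
    return coef

coefs = np.array([fit_at(j) for j in range(n_time)])   # shape (n_time, 3)
print(np.round(coefs.mean(axis=0), 2))
```

      The broadcast column is identical at every timepoint, so for that covariate the pointwise design matches what a non-concurrent model would use, consistent with the Box 1 statement quoted above.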

      (3E) The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      Prior to our work described in this Research Advance, it was not obvious that the existing approximation approach in fastFMM could be generalized to cFLMM. During the writing of the article, a fastFMM user reached out for help with producing pseudo-concurrent FLMMs by duplicating rows in a non-concurrent model, which underscores both the unmet need for cFLMMs and the difficulty of fitting them with available tools.

      The “under-the-hood” differences are described in Appendix 4. Concurrent FLMM with fast univariate inference was theoretically possible as early as Cui et al. (2022). The univariate step was straightforward, but guaranteeing “fast” and “inference” was not. We needed to verify, for example, that the method-of-moments estimation of the random effects covariance matrix generalized to cFLMM, which is not a trivial step. Characterizing whether the method achieved asymptotic coverage required extensive simulation studies (Figure 4, Appendix 2). Future work may focus on fully characterizing the asymptotic convergence in high noise or high complexity regimes.

      (3F) This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

      We hope that the Common responses clarify how cFLMM compares to existing approaches and fills a gap in the data analysis landscape for neuroscience. The fastFMM R package vignettes contain example analyses, and we intend for these files to work in tandem with the manuscript. To provide more guidance for interested analysts, we can explicitly reference these tutorials within the revision.

      Planned revisions

      The following summary is not exhaustive.

      Writing additions:

      Per 1B, 2C and 3A, the Common responses will be incorporated in the revision.

      Per 2B, we will discuss function-on-function regression and explore how to estimate statistical contrasts for complex within-trial relationships. Relatedly, we will clarify that the CIs in fastFMM are constructed using an estimate of the within-trial covariance of the predictors, and clarify the definition of pointwise and joint CIs.

      Per 3D, we will explicitly state that concurrent FLMMs can include covariates that are constant over within-trial timepoints.

      Though we cannot prescribe a universally correct model selection procedure, we will mention that AIC, BIC, and other summary statistics can inform the specification of the random effects.

      Analysis modifications:

      Parts of Appendix 3 may be included in Figure 2 to directly address the question investigated by Jeong et al. (2022) and Loewinger et al. (2024).

      When discussing Machen et al. (2025) data, the supplementary analysis with reward-aligned ncFLMM models might be added to clarify the ncFLMM/cFLMM difference.

      Per Reviewer 2’s comment on encoding models, the additional analysis aimed at disentangling latency and reward in Machen et al.’s variable trial length data may be incorporated as an additional sub-figure in Figure 3.

      Aesthetic changes:

      Figure 3 will be reorganized to avoid unintended direct comparisons between the coefficients of the non-concurrent and concurrent model.

      Citations for Machen et al. (2026) will be updated to reflect publication of the preprint.

      The version number for fastFMM will be updated.

      References

      Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics. 2022; 31(1):219–230. https://doi.org/10.1080/10618600.2021.1950006, doi: 10.1080/10618600.2021.1950006, PMID: 35712524.

      Engelhard B, Finkelstein J, Cox J, Fleming W, Jang HJ, Ornelas S, Koay SA, Thiberge SY, Daw ND, Tank DW, Witten IB. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature. 2019 Jun; 570(7762):509–513. https://www.nature.com/articles/s41586-019-1261-9, doi: 10.1038/s41586-019-1261-9.

      Jeong H, Taylor A, Floeder JR, Lohmann M, Mihalas S, Wu B, Zhou M, Burke DA, Namboodiri VMK. Mesolimbic dopamine release conveys causal associations. Science. 2022; 378(6626):eabq6740. https://www.science.org/doi/abs/10.1126/science.abq6740, doi: 10.1126/science.abq6740.

      Loewinger G, Cui E, Lovinger D, Pereira F. A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments. eLife. 2025 Mar; 13:RP95802. doi: 10.7554/eLife.95802.

      Loewinger G, Levis AW, Cui E, Pereira F. Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets. ArXiv. 2025 Jun; p. arXiv:2506.20437v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC12306803/.

      Machen B, Miller SN, Xin A, Lampert C, Assaf L, Tucker J, Herrell S, Pereira F, Loewinger G, Beas S. The encoding of interoceptive-based predictions by the paraventricular nucleus of the thalamus D2R+ neurons. iScience. 2026 Jan; 29(1):114390. doi: 10.1016/j.isci.2025.114390.

    1. Author response:

      Reviewer #1:

      We appreciate the reviewer’s suggestions. In the revision, we will clarify which results are new and better position this work relative to our earlier publication. We will also expand the discussion of the functional implications of polymerase clustering and its cell-cycle dynamics.

      Regarding the condensate interpretation, we agree that the current evidence is suggestive but not definitive. In the revised manuscript, we will clarify how our measurements relate to commonly used criteria for condensate assemblies and revise the text to avoid overstating this interpretation. We will also add quantification to additional figures and revise the model diagram to more accurately reflect the conclusions supported by the data.

      Reviewer #2:

      We thank the reviewer for the positive assessment of the imaging quality. We agree that the manuscript would benefit from a broader discussion of possible models for the observed polymerase foci. In the revision, we will expand the discussion to include alternative interpretations, such as scaffolded assemblies, as suggested by Reviewer #3, and further clarify the properties of the RNA Pol II and RNA Pol III foci.

      Reviewer #3:

      We thank the reviewer for the positive evaluation of the study and the helpful suggestions. We agree that the current evidence is indicative but not sufficient to definitively demonstrate condensate formation. In the revision, we will revise the language and discuss alternative interpretations, including scaffolded assemblies. We will also provide additional quantifications for the relevant figures.

      Overall, we appreciate the reviewers’ suggestions and believe that the planned revisions will improve the clarity and impact of the manuscript.

    1. Author response:

      Reviewer #1:

      We appreciate the reviewer’s insightful suggestions. In the revised manuscript, we will provide quantitative analysis of Western blot data throughout the study to improve data robustness and reproducibility. In addition, we will expand the “Discussion” section to address the following points raised by Reviewer #1: (1) Potential mechanisms underlying the regulation of LAMP1 transcript levels by NINJ2; (2) Whether Ninjurin1 may play a similar role in regulating lysosomal membrane permeabilization (LMP); (3) The potential clinical implications of our findings, particularly in relation to cancer progression and therapeutic targeting.

      Reviewer #2:

      We thank the reviewer for the insightful and constructive suggestions, which would further deepen the mechanistic understanding of the NINJ2-LAMP1 pathway and its role in ferroptosis regulation. To address the reviewer’s concerns, we will clarify the interpretation of our findings, add quantitative analyses where appropriate, and expand the Discussion to acknowledge these important mechanistic questions and future research directions. Specifically, we will revise the Statistical Analysis section to clearly describe the statistical methods used, including whether corrections for multiple comparisons were applied where appropriate. We will further discuss the potential interaction domain(s) between NINJ2 and LAMP1. We will also discuss the potential role of NCOA4, a central mediator of ferritinophagy, in the NINJ2-FTH1-LAMP1 pathway. Finally, we will include a schematic model summarizing the proposed NINJ2-LAMP1-iron-ferroptosis axis to better illustrate the working model of our study.

    1. Author response:

      Reviewer 1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size (HLA allele calls from 366 individuals, 102 unrelated) and 219 individuals with HLA data, whole genome SNP data, and involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      We appreciate the reviewer's concern, but would like to clarify two important misunderstandings in this assessment.

      First, the reviewer suggests that our SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, and that IBD inference may therefore be biased toward haplotypes common between the Himba and Yoruba. This is not the case. Our SNP genotype data were generated from the H3Africa and MEGAex genotyping arrays, which incorporated diverse reference variation to minimize ascertainment bias in non-European ancestries. No read mapping to a Yoruba reference genome was involved in SNP discovery or genotyping. The Yoruba 1000 Genomes data were used solely to provide an ancestry-matched recombination map for phasing and IBD calling, which would not bias IBD inference toward common Yoruba haplotypes. The reviewer's concern about imputation-driven inflation of IBD sharing for common haplotypes should not be relevant in our case.

      Second, regarding HLA haplotype resolution: we trained a bespoke HIBAG model directly on the Himba SNP array genotype data paired with ground-truth HLA allele calls from our own targeted HLA capture sequencing. This Himba-specific model was then used to impute HLA alleles from pseudo-homozygous genotypes derived by extracting phased SNP-based haplotypes across the HLA region for the same individuals. In this way we resolved the phase of the HLA allele calls. To our knowledge, this paired-data approach to individual-level HLA haplotype resolution is novel; existing HLA haplotype resolution tools generally provide only population-level haplotype frequency estimates rather than individual-level phase assignments. We are confident in the reliability of the haplotypes we report. Resolved haplotypes were required to match the known targeted-sequencing HLA allele calls at a minimum of the first field for at least one allele, and both haplotypes could not be assigned to the same allele unless the individual's HLA allele calls were homozygous. Of 722 total haplotypes, 698 were successfully resolved under these criteria. We report results only on these confidently resolved haplotypes.
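
      The two acceptance criteria for resolved haplotypes can be sketched as a small check. This is a minimal illustration, not the authors' pipeline: the function names, the allele string format, and the choice to reject both haplotypes when they collide on one allele in a heterozygous individual are all assumptions made here.

```python
def first_field(allele: str) -> str:
    """Return locus plus first field, e.g. 'A*02:01:01' -> 'A*02'."""
    return allele.split(":")[0]


def haplotypes_resolved(imputed_pair, sequenced_pair):
    """Apply the two acceptance criteria for one individual at one locus.

    imputed_pair:   alleles imputed for the two phased haplotypes (hypothetical input).
    sequenced_pair: the two allele calls from targeted sequencing.
    Returns a list of two booleans, one per haplotype.
    """
    sequenced_fields = {first_field(a) for a in sequenced_pair}
    sequenced_homozygous = sequenced_pair[0] == sequenced_pair[1]

    # Criterion 2: both haplotypes may carry the same allele only if the
    # sequenced calls are homozygous. Rejecting both is an assumption here.
    if imputed_pair[0] == imputed_pair[1] and not sequenced_homozygous:
        return [False, False]

    # Criterion 1: each haplotype's allele must match a sequenced call
    # at the first field or better.
    return [first_field(a) in sequenced_fields for a in imputed_pair]
```

      Counting the `True` values across all individuals and loci would give the number of haplotypes retained for downstream analysis.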

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      We thank the reviewer for highlighting the difficulty in modeling selection at the HLA - a problem that deserves considerable attention. We acknowledge that demographic processes such as the documented Himba population bottleneck can result in elevated IBD sharing (Swinford et al. 2023, PNAS). However, our comparison of HLA IBD sharing rates against a genome-wide baseline is designed to address this: demographic processes affect all regions of the genome, so if the HLA region maintains elevated IBD sharing significantly above the genome-wide threshold, this provides meaningful evidence for a locus-specific effect beyond demographic history alone.
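
      The logic of comparing a region against a genome-wide baseline can be sketched as follows. This is an illustrative outline only: the windowing scheme, the use of an empirical quantile as the threshold, and all names are assumptions, since the response does not specify how the genome-wide threshold was computed.

```python
def empirical_quantile(values, quantile):
    """Return the value at the given quantile (0 to 1) of the sorted values."""
    ordered = sorted(values)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def windows_above_baseline(ibd_rate_by_window, region_windows, quantile):
    """List the windows in a region whose IBD sharing rate exceeds the
    genome-wide empirical quantile.

    ibd_rate_by_window: dict mapping window id -> IBD sharing rate, genome-wide.
    region_windows:     window ids belonging to the region of interest.
    quantile:           genome-wide quantile used as threshold (an assumption).
    """
    threshold = empirical_quantile(list(ibd_rate_by_window.values()), quantile)
    return [w for w in region_windows if ibd_rate_by_window[w] > threshold]
```

      Because demographic history shifts the whole genome-wide distribution, a window that still exceeds a high quantile of that distribution is a candidate for a locus-specific effect. Low local recombination can produce the same pattern, which is the reviewer's point.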

      We agree with the reviewer that the recombination landscape of the HLA region is complex, but this complexity itself is consistent with the region being a frequent target of selection. Previous HLA analyses have found that at the allele level, frequencies are consistent with balancing selection, while multi-locus haplotype frequencies are consistent with purifying selection and positive frequency-dependent selection (Alter et al., 2017), patterns that contribute to the complex recombination rate heterogeneity observed in the region. Recombination rate can be both a cause of extended haplotypes and a consequence of selection against combinations of alleles.

      As Alter et al. note, the high levels of linkage disequilibrium observed among HLA alleles serve to limit the amount of diversity within HLA haplotypes, but balancing selection at the allelic level maintains multiple HLA haplotypes at high frequency across populations over long periods of time — so-called "conserved extended haplotypes" as we observe (Supplementary Figures 1 and 9). Regarding the specific selective mechanism, our results are not equally consistent with all forms of balancing selection. Albrechtsen et al. (2010) explicitly modeled overdominant balancing selection and demonstrated that equilibrium overdominance does not produce elevated IBD sharing as we observe — our results are therefore inconsistent with this mechanism. Instead, Albrechtsen et al. conclude that allele frequency change is required to generate elevated IBD, consistent with bouts of directional selection such as negative frequency-dependent or fluctuating positive selection. We will make explicit that while our findings do not support overdominance, they are consistent with these temporally dynamic forms of selection driving periodic allele frequency change at the HLA locus. We will also incorporate local recombination rate into Figure 4 to provide a comparison of local recombination rate across chromosome 6 with the observed areas of elevated IBD sharing.

      Alter, I., Gragert, L., Fingerson, S., Maiers, M., & Louzoun, Y. (2017). HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes. PLoS computational biology, 13(8), e1005693.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

      We will clarify the presentation of partnership counts and sample sizes throughout the manuscript and improve the scaling and annotation of the flagged figures. Regarding DRB copy number variation, we will add explicit discussion of our analytical choices and their potential limitations. As described in our responses to the main concerns above, we will also provide more nuanced framing of the selective mechanisms consistent with our IBD results, avoiding conclusions that go beyond what our analyses directly support.

      Reviewer #2 (Public review):

      Summary:

      Evidence for the influence of MHC on mate choice in humans is challenging, as social structures and norms often confound the power of studying populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine if MHC diversity is increased when choice drives mate choice. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding is that there is no heterozygous dissimilarity difference between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven the MHC evolution, but also mate choice.

      We will improve the framing of our project within the broader non-human MHC mate choice literature in our discussion.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

      We would like to clarify that we did assess the unique pathogen peptides bound across all HLA class I and class II genes by each population's common haplotypes (Figures S12–S13). We acknowledge the reviewer's point that non-pathogenic peptides are also important — for example, binding with self-produced proteins. However, binding with self-produced proteins is more relevant to autoimmune risk, and the selective pressures involved are outside the scope of our current work, which focuses on pathogen-induced fluctuating directional selection and heterozygote advantage. Furthermore, selection on non-pathogenic peptide binding repertoires likely operates in the opposite direction to pathogen repertoire; whereas broader pathogen peptide binding is advantageous, broader self-peptide binding risks excessive immune activation.

      Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      We thank the reviewer for this important clarification. Our claim was intended to be more specific: to our knowledge, this is the first study to investigate HLA-based mate preferences in a non-European small-scale society while explicitly controlling for genome-wide relatedness. Hedrick and Black (1997) did not include genome-wide relatedness controls, which is a critical distinction given that ancestry-assortative mating can produce spurious patterns of HLA similarity or dissimilarity in the absence of such correction. We will make this qualification explicit in the revised manuscript.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, 78), see the paper from Croy et al. 2020, which used the largest sample to date in this research area.

      We thank the reviewer for this reference. In our revision, we will incorporate Croy et al. (2020) into our discussion and use it as a reference for comparing the Himba’s probability of highly homozygous offspring given population allele frequencies. This comparison will help support our claim that background HLA diversity in the Himba is sufficiently high so that any unrelated partner is already likely to yield adequately dissimilar offspring—a scenario that would reduce the selective benefit of active HLA-based mate choice and could mask any such preference even if it exists.
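
      As a minimal sketch of the quantity behind this argument: under random mating at a single locus, the probability that an offspring is homozygous is the sum of squared allele frequencies. The function name and the example frequencies are hypothetical, and extending this to several loci would require further assumptions (for example about linkage) that are not made here.

```python
def expected_homozygosity(allele_freqs):
    """Probability that an offspring of two random, unrelated parents is
    homozygous at one locus, given population allele frequencies.

    allele_freqs: dict mapping allele name -> frequency (normalised here).
    """
    total = sum(allele_freqs.values())
    return sum((f / total) ** 2 for f in allele_freqs.values())


# Hypothetical example: many alleles at similar frequency give low homozygosity.
even_locus = {f"allele_{i}": 0.1 for i in range(10)}
skewed_locus = {"allele_a": 0.9, "allele_b": 0.1}
print(expected_homozygosity(even_locus), expected_homozygosity(skewed_locus))
```

      The more even and numerous the alleles, the lower this value, which is the sense in which high background diversity reduces the benefit of actively choosing a dissimilar partner.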

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      The reviewer is correct that individuals appear multiple times in the dataset: some individuals are members of multiple known partnerships, and all individuals are additionally included many times across the full set of possible random heterosexual pairings that meet our age and relatedness criteria. This non-independence is explicitly addressed in our dyadic linear mixed models by including female ID and male ID as random effects, which account for each individual's unique contribution to their similarity scores across all pairings, both real and random. We explain this in the Statistical Models section (n) of the Methods.

      Regarding discovered partnerships: we grouped these with reported informal partnerships in the current analyses due to modest sample sizes. We agree this is worth examining more carefully and will test, in our revision, whether treating discovered partnerships as a separate category, or excluding them entirely, meaningfully affects our results. We will report these analyses as a sensitivity check.

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      This information is reported in the Statistical Models section (n) of the Methods. No pairs were found to be closer than 3rd degree relatives. No arranged marriages were related at 3rd degree or closer; 1 love match marriage and 2 informal partnerships discovered through pedigree analysis were found to be 3rd degree relatives.

      Regarding the difference in relatedness thresholds: we used a 4th degree cutoff to define the unrelated set of individuals for allele and haplotype frequency analyses (n=102), as even 3rd degree relatives would inflate allele frequency estimates. In contrast, we permitted 3rd degree relatives in the background distribution for the partnership analyses to reflect the stated cultural preference for cousin marriages in arranged unions—excluding them would have made the background distribution less representative of the actual mating pool. We explain both decisions in Methods sections (d) and (n).

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      While HIV prevalence is indeed high in Namibia generally, the Himba are a relatively isolated population and, based on personal communication with Dr. Ashley Hazel—who has extensive field experience studying sexually transmitted infections in the Himba (see references 36, 52, 53, and 54)—there is no evidence of HIV transmission within this population. Dr. Hazel's expertise on this question was the basis for our exclusion of HIV from the pathogen list.

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      We will clarify this in our revision, but we restricted random couples to have an age gap within the range observed in actual, known partnerships (the woman is at most 16 years older than the man and at most 53 years younger than the man). We included this criterion to make sure random couples represented the best approximation of realistic background partners. Our age gap criterion was quite permissive due to the large range observed in our actual pairs, and we do not expect it to have significantly affected our results.
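
      The age-gap restriction on random pairs can be sketched as a simple filter. The bounds come from the range stated above; the data structures and names are hypothetical, and the relatedness criterion that the authors also apply is omitted for brevity.

```python
from itertools import product

MAX_YEARS_WOMAN_OLDER = 16    # woman at most 16 years older than the man
MAX_YEARS_WOMAN_YOUNGER = 53  # woman at most 53 years younger than the man


def background_pairs(women, men):
    """Enumerate opposite-sex pairs whose age gap lies in the observed range.

    women, men: dicts mapping individual id -> age in years (hypothetical input).
    Returns a list of (woman_id, man_id) tuples.
    """
    pairs = []
    for (woman_id, woman_age), (man_id, man_age) in product(women.items(), men.items()):
        gap = man_age - woman_age  # positive when the man is older
        if -MAX_YEARS_WOMAN_OLDER <= gap <= MAX_YEARS_WOMAN_YOUNGER:
            pairs.append((woman_id, man_id))
    return pairs
```

      Rerunning the models on pairs generated without this filter would be one direct way to answer the reviewer's question about whether the age constraint matters.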

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      We would like to clarify that for each analysis we explicitly report both the effects of chosen and arranged partnerships relative to the background distribution intercept, and the pairwise contrast between chosen and arranged partnerships. The intercept of each model is derived from the full background distribution of random opposite-sex pairings meeting our age and relatedness criteria, providing a null expectation under random mating. A non-significant effect for both partnership types therefore indicates that neither arranged nor chosen partnerships differ from random mating with respect to the metric in question. We describe this explicitly in the Statistical Models section of the Methods, but we will ensure this interpretation is stated more prominently in the Results section of the revised manuscript to avoid any confusion.
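
      One way to write the model structure described above, offered as a sketch rather than the authors' exact specification: for a pair formed by woman f and man m with similarity score y, let C and A be indicators for chosen and arranged partnerships (both zero for random background pairs). The symbols and the normality assumptions are introduced here for illustration, and covariates such as genome-wide relatedness are omitted.

$$
y_{fm} = \beta_0 + \beta_{\text{chosen}}\,C_{fm} + \beta_{\text{arranged}}\,A_{fm} + u_f + v_m + \varepsilon_{fm},
\qquad
u_f \sim \mathcal{N}(0, \sigma_u^2),\quad
v_m \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{fm} \sim \mathcal{N}(0, \sigma^2)
$$

      In this notation, the intercept is the expected score for a random background pair, each coefficient measures how one partnership type departs from random mating, and the difference between the two coefficients is the chosen-versus-arranged contrast. The random effects for woman and man absorb the repeated appearance of each individual across pairs.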

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      We can incorporate separate HLA similarity/log odds of homozygous offspring analyses for class I and class II in our revision.

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      We will expand our discussion in the revision to provide a more detailed comparison with previous studies, including Croy et al. (2020), and will add an explicit limitations section incorporating suggestions from multiple reviewers on more careful framing of optimality and specific selective mechanisms. Regarding sample size, we acknowledge this as a genuine limitation given the extensive polymorphism of the MHC region. However, the unrelated sample used for our allelic diversity estimates is comparable in size to those of previous studies in African populations (Figure 1), and our dataset is uniquely comprehensive in combining HLA class I, class II, genome-wide SNP data, and partnership data within the same individuals, a combination that enables the genome-wide relatedness correction that distinguishes our study from much of the prior literature.

      References

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Weaknesses:

      One weakness that should be easily addressed is that no data is provided to directly assess how accurate the estimated head gaze is based on calibrations of the animals, for example, when they are looking at discrete locations like faces or video on a monitor. This would be useful to get an upper bound on how accurate the 3D gaze vector is estimated to be, for planned use in other studies. Although the accuracy appears sufficient for the current results, it would be difficult to know if it could be applied in other contexts where more precision might be necessary.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic with how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      There are a few technical points in need of clarification, both in terms of the robustness of the gaze estimate, and possible confounds by gaze to non-face targets which may have relevance but are not discussed. These are relatively minor, and more suggestions than anything else.

      Please see our detailed responses to the reviewer comments below.

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) It appears that the accuracy of the estimated gaze angle must be well under the size of the gaze cone (+/- 10 degrees), but I can't find any direct estimate of the accuracy even if it is just a ballpark figure. On Lines 219-233 is where performance is described for viewing images and video on a monitor, where it should be possible to reconstruct the point of gaze on the monitor while images and video are shown, in order to evaluate the accuracy of the system for where the marmoset is looking? Would you see eye position traces that would show fixation clusters around those images or videos with stationary points on the monitor much like that seen for head-fixed animals looking at faces on a screen (Mitchell et al, 2014)? If so, what is the typical spread of those clusters during fixations on an image, both in terms of the precision by RMS error during a fixation epoch and the spread around the images at different locations (accuracy of projection)? For example, if gaze clusters were always above the displayed images one would have an idea that the face plane is slightly offset above the true gaze direction. It is not completely clear how well the face plane and corresponding gaze cone do in describing gaze direction in space, but the monitor stimuli could be used as an initial validation of it.

      We thank the reviewer for this important suggestion regarding the quantitative validation of gaze accuracy. We agree that, when animals view stimuli presented on a monitor, the estimated gaze direction can be evaluated by examining the spatial distribution of gaze–monitor intersection points relative to stimulus locations.

      To address this, we generated a new figure (Fig. S2A) analyzing gaze behavior following the onset of video stimuli presented at different locations on the monitor. Specifically, we selected video clips in which human annotators verified that the marmosets were looking at the monitor. Consistent with prior work in head-fixed marmosets (Mitchell et al., 2014), we observe clustering of gaze–monitor intersection centers within and around the corresponding stimulus locations after stimulus onset. These clusters provide an empirical validation that the estimated gaze direction aligns with stimulus position in space.

      Importantly, unlike the head-fixed preparation used in Mitchell et al. (2014), marmosets in our study were freely moving. As a result, they do not exhibit prolonged, stationary fixations on the monitor, and fixation clusters are therefore more diffuse. This increased spread reflects natural head and body motion rather than limitations of the gaze estimation method itself. Despite this, gaze intersection points remain spatially localized to the vicinity of the presented stimuli across different monitor locations.

      We did observe small offsets in some gaze clusters relative to stimulus centers; however, these offsets were not systematic across stimulus locations or animals. Crucially, there was no consistent bias (e.g., clusters appearing uniformly above or below stimuli) that would indicate a systematic misalignment of the face plane or gaze cone relative to true gaze direction. Together, these observations support the conclusion that the face-plane-based gaze cone provides an accurate estimate of gaze direction in space, with precision well within the ±10° aperture of the gaze cone.

      While the freely moving component of the behavior precludes direct estimation of fixation RMS error comparable to head-fixed paradigms, the observed stimulus-locked clustering serves as an initial validation of both the accuracy and practical utility of our approach under naturalistic conditions.

      (2) A second major comment is about clarity in the writing of the results and discussion. At the end of the manuscript, a major takeaway is the difference between familiar and unfamiliar dyads, that males show more interest in viewing females including unfamiliar females, but for familiar females, this distinction is also associated with being likely to look at them if they look at the male, and then to engage in joint gaze with them after looking at them, which indicates more of a social interaction than simply monitoring them when they are unfamiliar. Those aspects of the results could be emphasized more in the topic sentences of paragraphs presenting data to support those features of the gaze data (at present is buried at the ends of results paragraphs and back in the discussion).

      We thank the reviewer for this insightful suggestion. We have restructured the Results and Discussion sections to lead with the primary social takeaways rather than technical descriptions (Tracked changes in Word). Specifically, we now emphasize the distinction between "social monitoring" (characteristic of unfamiliar dyads) and "active social coordination" (characteristic of familiar dyads).

      (1) Topic Sentences: We revised the topic sentences of all Results paragraphs to immediately highlight the findings regarding male interest and the influence of familiarity on reciprocation.

      (2) Conceptual Framework: We added a conceptual distinction in the Discussion, explaining that while unfamiliar marmosets maintain high social attention through "peripheral monitoring" and proximity-dependent joint gaze, familiar pairs exhibit sophisticated, distance-independent coordination and gaze reciprocation.

      (3) Clarification of Male Interest: We explicitly stated that while male interest in females is high regardless of familiarity, it manifests as persistent monitoring in unfamiliar pairs versus a more aware, reciprocal state in familiar pairs.

      Minor comments:

      (1) Methods:

      a) Lines 522-539: Are the 200 continuous frames used for validation of the model containing two marmosets sufficient to test how well it generalizes to other animals outside the training set? Does the reported RMSE vary for animals inside vs outside the training set? To what extent does the RMSE, in image pixels, translate into accuracy in estimating the gaze direction, for example, as assessed by estimating error when marmosets look at images or video on the monitor?

      To address the reviewer’s concern regarding generalization and the translation of pixel RMSE to angular accuracy, we emphasize that the six facial features selected are prominent, high-contrast features across the species. Consequently, we observed that the RMSE remained consistent for marmosets both inside and outside the training set. To quantify how pixel-level tracking error translates into gaze estimation accuracy, we performed a sensitivity analysis. We simulated landmark (i.e., feature) jitter by sampling perturbations from circular distributions based on our empirical data (2.4 pixels for eyes; 2.1 pixels for the central blaze). Our results, illustrated in Author response image 1, show that 90% of the resulting head gaze deviations fall within 10°, which is consistent with the angular threshold used for our gaze cone model. This confirms that the reported RMSE provides sufficient precision for reliable gaze estimation.

      Author response image 1.

      Probability distribution of gaze angular deviation under circular perturbation. The histogram (blue) represents the change in reconstructed gaze angle (degrees) following stochastic perturbation of facial features. To simulate real-world variance, noise was sampled from circular distributions with radii of 2.4 pixels (eyes) and 2.1 pixels (central blaze). The red curve represents an exponential fit to the empirical data (y=ae<sup>bx</sup>, a=0.9591, b=0.1813). Approximately 90% of the reconstructed gaze deviations remain below 10°, indicating the model’s localised stability under pixel-level coordinate jitter.
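
      The perturbation procedure described above can be illustrated with a minimal Monte Carlo sketch. It is not the authors' code: the landmark coordinates are hypothetical, "circular distribution" is assumed to mean uniform sampling inside a disc, and the jitter is applied in the image plane only rather than being propagated through a full multi-camera triangulation. The resulting fraction therefore should not be expected to reproduce the reported 90% figure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_disc(radius, size):
    """Uniform samples inside a disc of the given radius (image-plane jitter only)."""
    r = radius * np.sqrt(rng.uniform(size=size))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=size)
    return np.stack([r * np.cos(phi), r * np.sin(phi), np.zeros(size)], axis=-1)

def unit_normal(p0, p1, p2):
    """Unit normal of the plane through three points (works on single points or batches)."""
    n = np.cross(p1 - p0, p2 - p0)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# Hypothetical landmark positions in pixel-like units; NOT taken from the study.
left_eye = np.array([-15.0, 0.0, 0.0])
right_eye = np.array([15.0, 0.0, 10.0])
blaze = np.array([0.0, 20.0, 5.0])

n_trials = 10_000
ref = unit_normal(left_eye, right_eye, blaze)
jittered = unit_normal(
    left_eye + sample_disc(2.4, n_trials),   # 2.4 px radius for the eyes (from the text)
    right_eye + sample_disc(2.4, n_trials),
    blaze + sample_disc(2.1, n_trials),      # 2.1 px radius for the central blaze (from the text)
)

# Angle between each jittered normal and the unperturbed normal.
cosines = np.clip(jittered @ ref, -1.0, 1.0)
deviation_deg = np.degrees(np.arccos(cosines))

# Fraction of trials whose angular deviation stays below 10 degrees.
print(np.mean(deviation_deg < 10.0))
```

      The printed fraction depends strongly on the assumed landmark spacing, which is why the geometry is flagged as hypothetical.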

      b) Line 542-43: Is there any difference between a rigid model fit to the six facial points, versus using the plane defined by the two eyes and central blaze in terms of direction accuracy (in the ground truth validation)? How does the "semi-rigid" set of six points (mentioned also in lines 201-203) constrain the fit of the three points (two eyes and central blaze) that define the normal plane for the gaze cone?

      We thank the reviewer for the opportunity to clarify our geometric model. The plane used to define the gaze cone's origin was indeed determined by the two eyes and the central blaze. However, a plane defined by only three points was insufficient to determine a unique gaze direction, as the normal vector was ambiguous (it could point forward through the face or backward through the head).

      To resolve this, we utilized the relative positions of the two ear tufts. Because the tufts are anatomically situated behind the eyes and blaze, these additional points provide the necessary spatial context to orient the gaze vector correctly. In our validation, we found that the mouth does not alter the angular accuracy compared to a 3-point fit, supporting that the facial features are correctly identified.

      We use the term 'semi-rigid' to describe the six-point constellation because their relative spatial configurations remain stable across individuals and expressions, imposing a biological constraint on the model. This prevents unphysical warping of the face frame during 3D reconstruction and ensures the gaze cone remains anchored to the animal's true midline.
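
      The polarity step described above (a three-point plane normal whose sign is fixed by the ear tufts) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name and all coordinates are hypothetical, and the tufts are assumed to lie behind the eye-blaze plane as the response states.

```python
import numpy as np

def face_normal(left_eye, right_eye, blaze, left_tuft, right_tuft):
    """Unit normal of the eye-eye-blaze plane, oriented away from the ear tufts."""
    left_eye, right_eye, blaze, left_tuft, right_tuft = (
        np.asarray(p, dtype=float)
        for p in (left_eye, right_eye, blaze, left_tuft, right_tuft)
    )
    # Three points define the plane, but the sign of the normal is ambiguous.
    normal = np.cross(right_eye - left_eye, blaze - left_eye)
    normal = normal / np.linalg.norm(normal)

    face_centre = (left_eye + right_eye + blaze) / 3.0
    tuft_centre = (left_tuft + right_tuft) / 2.0
    # The tufts sit behind the face, so "forward" points from the tufts to the face.
    if np.dot(normal, face_centre - tuft_centre) < 0:
        normal = -normal
    return normal

# Hypothetical coordinates (arbitrary units) for a face looking along +z.
eyes_and_blaze = ([-1, 0, 0], [1, 0, 0], [0, 1, 0])
tufts = ([-2, 0, -1], [2, 0, -1])
forward = face_normal(*eyes_and_blaze, *tufts)

# Swapping the eye order flips the raw cross product, but not the final result.
swapped = face_normal([1, 0, 0], [-1, 0, 0], [0, 1, 0], *tufts)
print(forward, swapped)
```

      Both calls return the same forward-pointing vector, which is the property the ear tufts are used to guarantee.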

      (2) Results:

      a) Lines 203-205: What is the distinction between gaze orientation (defined by facial plane, 3D vector) and gaze direction (defined by ear tufts) ... is gaze direction in the 2D x-y plane? Why are two measures needed or different? It does not appear gaze orientation is used further in the manuscript and perhaps could be omitted.

      We appreciate the reviewer’s comment regarding the terminology. We have replaced all instances of ‘gaze orientation’ with ‘gaze direction’ to ensure consistency throughout the manuscript.

      To clarify, both terms referred to the same 3D unit vector. The ear tufts were not used to define a separate 2D measure; rather, they served as posterior anatomical anchors to resolve the 3D polarity of the normal vector (ensuring the vector points 'forward' from the face rather than 'backward'). Gaze direction was calculated in 3D space and was not restricted to a 2D x-y plane. We have clarified this in the revised Methods section (Lines 203–205) to avoid further ambiguity.

      b) Line 215-216: why is head-gaze velocity put in normalized units instead of degrees visual angle per second? How was the normalization performed (lines 549-557)? It would be simpler to see velocity as an angular speed (degrees angle per second) rather than a change in norms.

      We thank the reviewer for this suggestion. We agree that the expression is misleading.

      (1) We have replaced "face norm" with "face normal vector" (N) throughout the manuscript to clarify that we are referring to the 3D unit vector perpendicular to the facial plane.

      (2) Lines 224-225 and the corresponding Methods section (Lines 599-609) have been updated to reflect this change in units and terminology.

      We chose to use the change in the face normal vector in normalized units for our primary calculations because it allows for efficient spatiotemporal smoothing and is computationally robust at the very low thresholds required for our stability analysis. However, to address the reviewer's concern regarding interpretability, we have verified that our threshold of 0.05 normalized units corresponds to an angular velocity of 2.87 degrees/frame duration [33ms]. Since we are operating at very small angular changes, the Euclidean distance between unit vectors is a near-linear proxy for the angular displacement in radians.
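
      The stated correspondence between 0.05 normalized units and 2.87° per frame follows from the chord-length relation between two unit vectors, ‖n1 − n2‖ = 2 sin(θ/2). A minimal sketch of that conversion is given below; the function name is illustrative, and the threshold and frame duration are the values quoted in the text.

```python
import math

def chord_to_angle_deg(chord: float) -> float:
    """Angle in degrees between two unit vectors whose difference has norm `chord`."""
    # For unit vectors, ||n1 - n2|| = 2 * sin(theta / 2).
    return math.degrees(2.0 * math.asin(chord / 2.0))

threshold = 0.05   # normalized units per frame (from the text)
frame_s = 0.033    # frame duration in seconds (from the text)

angle_per_frame = chord_to_angle_deg(threshold)
small_angle = math.degrees(threshold)   # small-angle approximation: theta ~ chord

print(round(angle_per_frame, 2))        # 2.87
print(angle_per_frame / frame_s)        # equivalent angular speed in degrees per second
```

      At this threshold the exact and small-angle values differ by well under a thousandth of a degree, consistent with the near-linear proxy described above.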

      c) Lines 215-216: How do raw gaze traces appear over time ... are there gaze saccades and then stable fixations, or does it vary continuously? A plot of the gaze trace might be useful besides just showing velocity with a threshold, to evaluate to what extent stable fixation vs shifts are distinct.

      Author response image 2.

      Time course of gaze angular velocity and stability thresholding. The plot illustrates the temporal dynamics of the face normal vector velocity used to define stable gaze states. The blue trace represents the raw gaze velocity calculated in normalised units. The red dashed line denotes the empirical cutoff threshold of 0.05 units per frame.

      To clarify the temporal dynamics of marmoset head movements, we have provided a representative time course of head gaze velocity as shown in Author response image 2. The data clearly show a "saccade-and-fixate" pattern: large, distinct spikes in velocity (representing rapid head redirections) are separated by periods of relative stability.

      While minor high-frequency fluctuations in the raw trace (blue) may be attributed to facial feature detection noise, they remain significantly below our stability threshold (red dashed line). By applying this threshold, we successfully isolated biologically relevant "stable fixations" from "head saccades," ensuring that our subsequent social gaze analysis is based on periods of intentional head gaze direction.

      d) Lines 237-286: The writing in this section does not emphasize the main results. There seem to be three takeaway points that could be emphasized better in the topic sentences of each of the paragraphs: i) Marmosets tended to spend most of their time on either end of the elongated box, not in the middle, ii) Males spent more time near the front of the box near the other animal than females, iii) Familiar pairs spent more time closer to each other.

      To address this comment, we have reorganized this section to lead with the three key behavioral findings:

      (1) We now state clearly in the topic sentence that marmosets preferred the ends of the arena over the middle.

      (2) We have highlighted the finding that males spend significantly more time near the inner edge (closer to the partner) than females, irrespective of familiarity.

      (3) We emphasized that familiar pairs maintain closer and more dynamic social distances over time, whereas unfamiliar pairs tend to move further apart as a session progresses.

      e) Line 303: It would be useful to see time traces of head velocity of each member of the pair and categorization over time of the gaze event types. A stable epoch must be brief on the order of 100-200ms. It is unclear how distinct the stable fixation epochs are from the moments when the gaze is shifting. Also, the state transition analysis treats each stable epoch like one event, and then following a gaze movement by either of the pair, the state is defined again, is that correct?

      We defined stable epochs as continuous periods where the face normal vector velocity remained below 0.05 normalized units for both animals. This ensures that a "gaze state" is only categorized when both marmosets have relatively fixed head orientations. As shown in the provided time traces (Author response image 2), the velocity profile is characterized by sharp peaks (head saccades) and clearly defined troughs (fixations). Further, we generated a probability histogram of stable head-gaze epoch durations (Author response image 3). The median duration of these stable epochs is 200ms, which aligns with biological expectations for fixation durations in primates and confirms that these states are distinct from the high-velocity shifts.

      The reviewer’s interpretation is correct. Our Markov chain model treats each stable epoch as a single event. A transition occurs when at least one animal moves (exceeding the velocity threshold), resulting in a new stable epoch where the relative gaze state is re-evaluated. This approach allows us to model the sequence of social interactions as a series of discrete behavioral decisions.

      Author response image 3.

      Temporal characteristics of stable head-gaze epochs. The histogram illustrates the probability distribution of the duration (ms) of stable gaze behaviour epochs. A minimum duration threshold of 100 ms was applied to exclude transient, non-purposeful head gazes.
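
      The epoch definition and the event-based Markov model described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names are hypothetical, gaze states are assumed to be pre-labelled as integers, and the 100 ms minimum is approximated as three frames at the quoted 33 ms frame duration.

```python
import numpy as np

def stable_epochs(vel_a, vel_b, threshold=0.05, min_frames=3):
    """Return (start, end) frame indices (end exclusive) of epochs in which BOTH
    animals' velocities stay below `threshold` for at least `min_frames` frames."""
    stable = (np.asarray(vel_a) < threshold) & (np.asarray(vel_b) < threshold)
    epochs, start = [], None
    for i, is_stable in enumerate(stable):
        if is_stable and start is None:
            start = i                          # epoch begins
        elif not is_stable and start is not None:
            if i - start >= min_frames:        # keep only sufficiently long epochs
                epochs.append((start, i))
            start = None
    if start is not None and len(stable) - start >= min_frames:
        epochs.append((start, len(stable)))    # epoch running to the end of the trace
    return epochs

def transition_matrix(states, n_states):
    """Row-normalised first-order transition probabilities between successive epochs."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy velocity traces (normalized units per frame); values are made up.
vel_a = [0.01, 0.01, 0.01, 0.20, 0.01, 0.01, 0.01, 0.01, 0.30, 0.01]
vel_b = [0.01] * 10
epochs = stable_epochs(vel_a, vel_b)

# Toy sequence of per-epoch gaze states (integer labels are hypothetical).
probs = transition_matrix([0, 2, 0, 2, 1], n_states=3)
print(epochs)
print(probs)
```

      Each stable epoch contributes exactly one state to the sequence, so transition probabilities describe changes between successive epochs rather than between frames.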

      f) Lines 316-326: Some general summarizing statements to lead this paragraph would be useful. It seems that familiar pairs are more likely to participate in joint gaze, especially when close to each other, and perhaps, that males tended to gaze at females more than the reverse. Is there any notion that males were following the gaze of females?

      We thank the reviewer for these suggestions. We have revised the topic sentences of this section to lead with a summary of the social takeaways, specifically highlighting the higher level of male interest and the shift toward reciprocal coordination in familiar pairs.

      The reviewer correctly identified an important dynamic. Our transition analysis (Fig. 4D) confirms that males in both familiar and unfamiliar dyads frequently follow the female's gaze. This is evidenced by a robust transition probability (~17%) from "Male-to-Female Partner Gaze" (blue node) to "Joint Gaze" (green node). We found that this gaze-following behavior was a general feature of the dyads and did not differ significantly by familiarity, which is why it was not previously emphasized. However, we have now added a statement to the Results (Lines 358-365) to explicitly describe this male-led gaze-following behavior.

      g) Lines 328-337: Can these findings in this paragraph be summarized more generally? It seems males view unfamiliar females longer, whereas for familiar females they are more likely to reciprocate viewing if being viewed by them and then to join in joint gaze with them. Would that event, viewing a female and then a transition to joint gaze, not be categorized as a gaze-following event?

      We have now summarized the paragraph to emphasize the transition from vigilant monitoring in unfamiliar pairs to reciprocal awareness in familiar pairs.

      Regarding "longer" viewing: We have clarified the text to specify that males' interest in unfamiliar females is persistent and robust rather than simply "longer" in a single duration. The high recurrence probability signifies that males consistently re-orient their gaze back to the unfamiliar female even if the interaction is briefly interrupted by movement.

      Regarding gaze following and joint gaze: The reviewer asks if the transition from viewing a female to joint gaze constitutes gaze following. We agree that a transition from "male-to-female gaze" to "joint gaze" is indeed a gaze-following event (as noted in our previous response regarding Fig. 4D). However, the specific transition discussed in this paragraph (female-to-male gaze to male-to-female gaze) is different: it describes a "reciprocal" event where the male responded to being looked at by looking back at the female, while the female simultaneously shifted her gaze away. Since the two gaze cones did not intersect on an external object or on each other's faces simultaneously at the end of this transition, it was not categorized as joint gaze or gaze following.

      h) Lines 339-351: It is not clear why gazing at the region surrounding a female's face (as opposed to the face itself) reflects "gaze monitoring tied to increased social attention" (Dal Monte et al., 2022). This hypothesis could be expanded to make the prediction clear in this paragraph.

      We thank the reviewer for identifying the need to clarify the hypothesis regarding the region surrounding the face. We have expanded this paragraph to explain why gazing at the peripheral facial region reflects social monitoring.

      In many primate species, direct and sustained eye contact is often interpreted as a threat or a challenge, particularly between unfamiliar individuals. Peripheral monitoring (looking at the area immediately surrounding the face) can strategically allow an animal to stay highly attentive to the partner's head orientation, gaze direction, and facial expressions (all critical for anticipating future actions) while minimizing the risk of social conflict. By demonstrating that unfamiliar marmosets utilize this peripheral strategy significantly more than familiar ones, we provide evidence that social attention in novel dyads is characterized by a social monitoring strategy that balances the need for information with social caution.

      i) Lines 354-373: This section seems to suggest again that in a familiar male/female pair, the male is more likely to follow the female gaze and establish a joint gaze, and this occurs less with the unfamiliar pair only when closer in distance. Some summary sentences to begin the paragraph could help frame what to expect from the results.

      We have added summarizing topic sentences to this section to clarify the relationship between familiarity and the spatial distribution of joint gaze.

      (3) Discussion:

      Lines 380-463: This section reads more clearly than most of the results, where it is often hard to connect the data plots to their significance for behavior. Overall, I believe the manuscript could be improved by setting up a hypothesis before presenting results in the paragraphs demonstrating the data. Some of the main findings appear in text from lines 413-419 (somewhat hidden even in discussion).

      We sincerely appreciate the reviewer’s positive feedback on the clarity of the latter sections of our Discussion. We have taken the suggestion to heart and have performed a comprehensive restructuring of the Results and Discussion sections.

      (1) We have moved the key takeaways, specifically the distinction between vigilant monitoring in unfamiliar pairs and reciprocal coordination in familiar pairs, from the end of the Discussion to the topic sentences of the relevant Results paragraphs.

      (2) We established a unified framework throughout the manuscript that connects pixel-level tracking stability to the biological "saccade-and-fixate" movement pattern, and ultimately to the social dimensions of sex and familiarity.

      (4) A couple of additional questions to address in the discussion:

      a) Can you speculate why in this behavioral context the marmosets do not engage in reciprocal gaze where both are simultaneously looking at each other (lines 297-301)? How low is the incidence of this event, numerically, in comparison to the other events (1 in 1000 events, etc)?

      We appreciate the reviewer’s interest in the lack of reciprocal gaze (mutual eye contact).

      Numerically, reciprocal gaze events occurred with a frequency of approximately 1 in 500 social gaze events (comprising less than 0.2% of our social dataset). Given this extreme scarcity, we felt that any statistical comparisons across sex or familiarity would be underpowered and potentially misleading, leading to our decision to focus on partner and joint gaze states.

      We speculate that the rarity of reciprocal gaze is primarily due to our task-free experimental setup. Unlike directed cooperation tasks where animals must look at each other to coordinate actions for a reward (e.g., Miss & Burkart, 2018), our study focused on task-free interactions. In a free-moving context without a common goal, marmosets may prioritize monitoring the environment or the partner’s actions (joint or partner gaze) over direct, sustained mutual eye contact, which can sometimes be perceived as a confrontational or high-arousal signal in primate social hierarchies.

      b) Does a transition from a marmoset viewing their partner, to a joint gaze, count as a gaze-following event? It appears the authors are reluctant to use that terminology. What are the potential concerns in that terminology? Is there a concern that both animals orient to the same object that is salient to them without it being due to their gaze?

      A transition from a partner-directed gaze to a joint gaze is indeed a gaze-following event. We distinguish these events from a transition between partner-directed gazes (e.g., male-to-female to female-to-male). In these "reciprocation" cases, once the second animal looked at the first, the first animal shifted their gaze away. Because the two gaze cones did not intersect on a common object at the end of the transition, we classified such events as a social exchange of attention rather than a coordinated gaze-following event.

      Reviewer #2 (Recommendations for the authors):

      I do have a few questions/points for clarification:

      (1) While your approach appears to be able to track head orientation when the face is occluded or turned away from the primary cameras, how was the accuracy of this validated? Since you have multiple cameras, it should be possible to make the estimate using the occluded cameras and then validate using the non-occluded ones.

      We appreciate the reviewer's comment regarding the validation of our tracking during partial occlusions.

      We wish to clarify that our system does not utilize "primary" vs "auxiliary" cameras. Rather, any two or more cameras that capture facial features with high confidence are used to triangulate the points into 3D space. Thus, the "primary" cameras are dynamically determined frame-by-frame based on the animal's orientation.

      To validate the accuracy of our 3D reconstruction during occlusions, we utilized a "projection-validation" approach. As demonstrated in Figure 2B (left panel), when the face is turned away from a specific camera, leaving only the back of the head visible, we used the facial features triangulated from the other non-occluded cameras and projected them onto the image plane of the occluded camera. The fact that these projected points aligned precisely with the expected (but hidden) anatomical landmarks confirms the global accuracy of our 3D model.

      We previously benchmarked this approach using a three-camera system where we triangulated coordinates via two cameras and successfully projected them onto the third camera's image plane with high accuracy. This ensures that even when a camera is "blind" to the face, the 3D position estimated by the rest of the array remains robust.
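
      The projection-validation idea (triangulate with some cameras, then project onto a held-out camera and compare) can be sketched with an ideal pinhole model. This is a minimal illustration under stated assumptions: the intrinsics `K`, pose `R`, `t`, and all point coordinates are hypothetical, and lens distortion is ignored.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates with an ideal pinhole camera."""
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                    # apply intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def reprojection_rmse(points_3d, observed_px, K, R, t):
    """Root-mean-square pixel distance between projected and observed landmarks."""
    diff = project(points_3d, K, R, t) - observed_px
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Hypothetical held-out camera: 800 px focal length, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

# Hypothetical triangulated landmarks (metres, in the camera's frame here).
landmarks = np.array([[0.0, 0.0, 2.0],
                      [0.1, -0.05, 2.0]])
pixels = project(landmarks, K, R, t)

# Pretend the annotated positions are offset by (3, 4) px from the projections.
error = reprojection_rmse(landmarks, pixels + np.array([3.0, 4.0]), K, R, t)
print(pixels, error)
```

      A small reprojection error on the held-out view indicates that the 3D estimate from the remaining cameras is geometrically consistent.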

      (2) Marmosets, like other non-human primates, also look at other body postures for their social communication, though admittedly marmosets are far more likely to look others in the face than larger primates. The tail-raised genital displays come to mind. While the paper primarily focuses on shared vs deviant gaze, and I believe tracks not only the angle of viewing towards the target but also the distance from the face (please clarify if I am wrong), it would also be useful to know how often marmosets are looking at each other beyond just the face. This is particularly interesting if the gaze towards the partner varies depending on whether that partner was generally oriented towards the gazer, or not. For the joint gaze, were there conditions in which the two were looking at the same target, but had body postures that were not oriented toward one another (i.e. looking at a distant target beyond one of the animals, like looking over someone else's shoulder)?

      We thank the reviewer for highlighting the importance of body postures and non-facial social signals (e.g., genital displays) in marmoset communication.

      At the inception of this project, we explored tracking multiple body parts. However, due to the marmoset's dense fur and the lack of distinct skeletal markers under naturalistic lighting, human annotators and early automated tools struggled to achieve the precision required for high-resolution 3D kinematics. While recent advances in whole-body tracking now make these questions approachable, we chose to focus on the face normal vector because it provided the most robust and high-confidence signal for social orientation in our current dataset.

      Regarding the "looking over the shoulder" scenario, we utilized a hierarchical classification system to prevent miscategorization. Intersection with the partner’s face always took priority. If one animal’s gaze cone contained the other’s face, the state was classified as "Partner Gaze", even if the two gaze cones happened to intersect at a distant point in space. This ensures that "Joint Gaze" specifically captures instances where both animals ignore one another’s face regions to focus on a shared external target.

      We agree that the relationship between body posture and head gaze is a fascinating area for future research. In our current setup, while "Joint Gaze" requires the head-gaze cones to intersect, the animals' bodies could indeed be oriented in different directions (e.g., looking at a distant target behind the partner). We have added a note to the Discussion acknowledging that incorporating whole-body gestures would further deepen the understanding of marmoset social ethology.
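
      The hierarchical classification described above can be sketched as a simple priority rule. This is an illustration rather than the authors' code: the function names and the "no social gaze" label are hypothetical, the 10° value is assumed to be the cone half-angle, and whether the two cones intersect is taken as a precomputed input.

```python
import numpy as np

def in_cone(origin, direction, target, half_angle_deg=10.0):
    """True if `target` lies inside the gaze cone from `origin` along `direction`."""
    to_target = np.asarray(target, dtype=float) - np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    cos_angle = np.dot(to_target, direction) / (
        np.linalg.norm(to_target) * np.linalg.norm(direction)
    )
    return bool(cos_angle >= np.cos(np.radians(half_angle_deg)))

def classify(a_sees_b, b_sees_a, cones_intersect):
    """Face intersection takes priority over any intersection of the two cones."""
    if a_sees_b and b_sees_a:
        return "reciprocal gaze"
    if a_sees_b or b_sees_a:
        return "partner gaze"
    if cones_intersect:
        return "joint gaze"
    return "no social gaze"

# Animal A at the origin looks along +z; B's face is straight ahead of A.
a_sees_b = in_cone([0, 0, 0], [0, 0, 1], [0, 0, 5])
# B looks along -x from (0, 0, 5), so A's face is outside B's cone.
b_sees_a = in_cone([0, 0, 5], [-1, 0, 0], [0, 0, 0])

# Even if the cones also cross somewhere in space, the face hit wins.
print(classify(a_sees_b, b_sees_a, cones_intersect=True))   # partner gaze
```

      Checking the face first means a distant cone intersection behind the partner cannot be mislabelled as joint gaze when one animal is actually looking at the other.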

      (3) In the introduction, (line 70), you raise the question of ecological relevance, using rhesus in laboratory settings. This could use a little more expansion/explanation of the limitations of current/past approaches.

      We thank the reviewer for the suggestion to expand upon the ecological limitations of traditional laboratory paradigms.

      We have substantially revised the Introduction (Lines 70–82) to provide a more detailed critique of past approaches. Specifically, we now highlight how traditional head-fixed or screen-based paradigms decouple eye movements from natural head-body dynamics and lack the reciprocal, multi-agent complexity found in real-world social environments (e.g., Land, 2006; Shepherd, 2010). By contrasting these constraints with the spatially and socially embedded nature of marmoset interactions, we clarify why a more naturalistic, quantitative approach is necessary to understand the true dynamics of social gaze. These additions provide a stronger theoretical foundation for our move toward a free-moving experimental model.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

      Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

      Thanks to the referees' input and more work, we think our revised manuscript now meets the high standard of eLife.

      Recommendations for the authors:

      The importance of the circular swimming chirality for the observed phenomenon could be further emphasized by actually using the word "chiral" or "chirality" in the text. Also indicating what would change if swimming were counterclockwise rather than clockwise would help the reader understand the key significance of chirality.

      We thank the reviewer for this insightful suggestion. We agree that the chirality of the surface interaction is central to the observed phenomenon and should be explicitly highlighted to improve the reader's understanding.

      In response, we have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming. We clarify that in such a case, the hydrodynamic interaction would cause cells to veer left, resulting in up-gradient accumulation along the left sidewall rather than the right. We believe these additions significantly improve the clarity of the underlying physical mechanism.

      Reviewer #1 (Recommendations for the authors):

      I still have several comments that the authors may want to consider for the last version.

      - The run and tumble behavior of the cells at the surface remains puzzling and would need some more explanation in the text. Tumbles with no significant reorientation angle amount largely to smooth swimmers. How can a model based on run-and-tumbles be used to explain the difference between LSW and RSW?

      We apologize for the lack of clarity regarding the surface run-and-tumble behavior. While it is true that surface tumbles often result in smaller reorientation angles compared to bulk swimming, they are not negligible and play a critical role in the observed asymmetry. As shown in the tumble angle distributions (Fig. 2E and 2F), the probability of a tumble angle exceeding π/2 is approximately 9% for sidewall trajectories and 30% for the middle area. This tumbling behavior leads to differences between the left sidewall (LSW) and right sidewall (RSW) in two key ways:

      First, as detailed in our geometric analysis (Fig. 6), running cells following stable clockwise circular paths are geometrically favored to reach the RSW. Because cells moving up-gradient (towards the RSW) experience suppressed tumbling, they maintain these stable circular trajectories and accumulate effectively. Conversely, cells moving down-gradient (towards the LSW) experience enhanced tumbling. These frequent interruptions distort the circular trajectories required to reach the LSW, resulting in fewer bacteria entering the LSW compared to the RSW.

      Second, once at the wall, the difference in tumbling frequency dictates retention. The majority of LSW cells are swimming down-gradient (LSW-DG) and thus tumble more frequently, increasing their probability of escaping the wall. The majority of RSW cells are swimming up-gradient (RSW-UG), suppressing tumbles and increasing their residence time at the wall.

      The relevant clarifications have been included in the last paragraph of “Results” in the manuscript.

      - Figure 5B would need more explanation. I still don't understand the different behaviors for the right and left side walls at small widths. Is it noise really or a more complex behavior? Since most of these calculations are based precisely on the shape of these curves it would be useful to discuss them in more detail.

      We apologize for the lack of clarity. The behavior observed at small widths in Figure 5B is not noise; rather, it reflects the idealized nature of our simulation model.

      In the simulation, bacteria were modeled as active particles without explicit steric exclusion for the flagella and cell body. Consequently, simulated cells retain the ability to reorient and turn freely even in very narrow lanes (w ≤ 6 μm), allowing the geometric sorting mechanism (which favors the RSW) to function efficiently even at small widths. This is why the simulation shows a distinct difference between LSW and RSW proportions in this regime.

      In the experimental reality, however, the finite size of the bacterial body and flagella creates steric hindrance. In narrow channels, this physical constraint restricts the cells' ability to turn, thereby disrupting the circular swimming mechanism required to sort cells into the RSW. As a result, experimental data shows that the proportions of LSW and RSW cells tend to equalize in narrow channels (e.g., w = 6 μm in Fig. 4B), leading to a lower chemotactic drift velocity than predicted by the simulation.

      We have added a discussion regarding these steric effects and the deviation at narrow widths to the Results section (the penultimate paragraph of subsection "Simulation of E. coli chemotaxis within lane confinement") in the revised manuscript.

      - The importance of the chirality of the circular trajectories, although essential, remains insufficiently mentioned in the text.

      We have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming.

      - It would be useful to color-code the trajectories of Figure 1B and alike with time.

      Thank you for the suggestion. The trajectories in Fig. 1B have now been redrawn. Distinct colors denote individual trajectories, with color intensity darkening to indicate time progression.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Lenz and colleagues describes a detailed examination of the epigenetic changes and alterations in subnuclear arrangement associated with the activation of a unique var gene associated with placental malaria in the human malaria parasite Plasmodium falciparum. The var gene family has been heavily studied over the last couple of decades due to its importance in the pathogenesis of malaria, its role in immune avoidance, and the unique transcriptional regulation that it displays. Aspects of how mutually exclusive expression is regulated have been described by several groups and are now known to include histone modifications, subnuclear chromosomal arrangement, and in the case of var2csa, regulation at the level of translation. Here the authors apply several methods to confirm previous observations and to consider a possible role for DNA methylation. They demonstrate that the histone mark H3K9me3 is found at the promoters of silent genes, var2csa moves away from other var gene clusters when activated, and while DNA methylation is detectable at var genes, it does not seem to correlate with transcriptional activation/silencing. Overall, the data and approach appear sound.

      Strengths:

      The authors employ the latest methods for epigenetic analysis of histone marks, transcriptomic analysis, DNA methylation, and chromosome conformation. They also use strong selection pressure to be able to examine the gene var2csa in its active and silent state. This is likely the only paper that has used all these methods in parallel to examine var gene regulation. Thus, the paper provides readers with confidence in the interpretation of independent methods that address a similar subject.

      We thank the reviewer for this positive assessment. We appreciate the recognition that our study combines complementary approaches including histone mark profiling, transcriptomic analysis, DNA methylation mapping, and chromosome conformation capture in parallel to the use of strong population selection that enables a controlled comparison of var2csa in active versus silent states. We agree that the convergence of independent methods strengthens confidence in the interpretation.

      Weaknesses:

      The primary weakness of the paper is that none of the conclusions are novel and the overall conclusions do not shed much new light on the topic of var gene regulation or antigenic variation in malaria parasites. The paper is largely confirmatory. The roles of H3K9me3 and subnuclear localization in var gene regulation are well established by many groups (including for var2csa), albeit in some cases using alternative methods. The only truly unique aspect of the manuscript is the description of 5mC at var2csa when the gene is transcriptionally active or silent. Here the authors demonstrate that the mark has no clear role in transcriptional activation or silencing, however, this will not be surprising to many in the field who have previously cast doubt on a regulatory role for this modification.

      While we agree that some individual features of var gene regulation, including H3K9me3 enrichment, have been described previously, our study integrates, for the first time, several layers of gene regulation at the clinically important var2csa locus using phenotypically homogeneous placental-binding parasite populations. As expected, var2csa activation coincided with a loss of H3K9me3 at the locus. However, using high-resolution chromatin conformation capture (to our knowledge, this experiment had never been applied to phenotypically homogeneous parasite populations), we quantified the repositioning of var2csa relative to heterochromatic telomeric clusters. We further assessed DNA methylation in this framework and show that 5-methylcytosine is broadly present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Reviewer #2 (Public Review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa. Finally, they provide preliminary evidence potentially implicating 5mC in the transcriptional regulation of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      As stated in our response to Reviewer 1, our study combines, for the first time, complementary approaches, including transcriptomic analysis, histone mark profiling, DNA methylation mapping, and chromosome conformation capture, together with strong population selection to enable a controlled comparison of var2csa in active versus silent states.

      Weaknesses:

      No major new finding is reported. The strength of the evidence presented is mostly solid, although certain elements, e.g., the role of 5mC in transcriptional regulation of var2csa, appear preliminary and incomplete.

      While we agree that no major new finding is reported, we were able to use, for the first time, a high-resolution chromatin conformation capture method to quantify the repositioning of var2csa relative to heterochromatic telomeric clusters. We further show that 5-methylcytosine is present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate, for the first time, transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) In the second paragraph of the introduction, the authors state "....such as the shielding of the parasite antigens expressed on pRBC surfaces by other cells and the evasion of splenic clearance (8)." What does "other cells" mean here?

      We thank the reviewer for this comment. We have clarified the cell type in the text.

      (2) In their interpretation of the Hi-C data, the authors conclude that the var2csa expressing parasites display "tighter heterochromatin control of var gene regions" and "interactions around other silent var genes were increased" and "an overall compaction of telomere ends and var gene-containing intrachromosomal regions". While the data appear to show that this is true when they compare the two parasite populations, I am concerned that the authors might be misinterpreting the data. It is important to note that the NF54CSAh line is heavily selected to be nearly entirely homogeneous for var gene expression while the NF54 line is exceptionally heterogeneous. This is shown in Figure 1G. Thus, any chromosomal arrangement specific for var gene expression in the unselected NF54 population will be similarly heterogeneous and therefore could appear less tight. In other words, interactions around silent var genes and overall compaction of telomere ends might be identical between individual parasites within these populations, but appear tighter or more compact in the var2csa expressing line simply because it is a homogeneous population. Perhaps this is what the authors meant to convey, however as currently written, it seems that they conclude the expression of var2csa results in a unique change in chromosome organization. A better comparison would be two populations homogeneously expressing different var genes, one expressing var2csa and one expressing an alternative var gene. Such lines can be generated through clonal isolation or selection for binding to a different host receptor.

      We thank the reviewer for this comment. The reviewer is correct, and we have revised the Discussion section of the manuscript to clarify this issue.

      (3) The title of the last section of the Results is "Distribution of DNA methylation influences gene expression overall but does not mediate transcriptional activation and switching in antigenic variation". This is an overstatement. The authors show that DNA methylation is absent at var gene promoter regions and enriched in coding regions, but there they provide no evidence that it "influences gene expression overall". This is speculation. Lastly, when the authors examined 5mC occupancy across genes, did they normalize for GC content of the DNA sequences? GC content is known to increase dramatically in coding regions (particularly in var genes) and thus could explain the distribution of this mark. If the authors corrected for this, they should directly state this in the results section. If they did not, they should explain why they don't think this property of the P. falciparum genome explains the distribution of 5mC.

      There is often a misconception in the field that DNA methylation is primarily confined to CpG islands in promoter regions and functions mainly as a repressor of transcription. However, in contrast to promoter methylation, methylation within gene bodies is generally associated with higher levels of gene expression, suggesting a role in facilitating transcription elongation. Gene-body methylation can also repress internal promoters, thereby preventing spurious transcription initiation within the gene. In addition, it has been shown to influence alternative splicing by affecting RNA polymerase II elongation kinetics.

      We propose that, in Plasmodium, DNA methylation may be associated with priming genes for transcriptional activity rather than repressing transcription. Specifically, higher methylation levels may facilitate recruitment of the RNA polymerase II transcriptional machinery to enable transcription. In Figure 4B, we observe higher levels of DNA methylation in the first exon of highly expressed genes in both the NF54 and NF54CSAh lines. Interestingly, we also detect high levels of methylation across most introns of the var genes, introns that must be transcribed, cannot be degraded, and are essential for var gene regulation, suggesting a possible sequence-recognition function. We have edited the manuscript to improve clarity.

      (4) In the legend to Figure 3D, the authors state that the centromeres are shown in blue, however in the figure they appear to be grey while var2csa is blue.

      We have revised the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):

      I recommend using the term "transcription" rather than "expression" when discussing events at the gene level.

      We have revised the manuscript accordingly.

      I also recommend using the term "adhesion" to describe the physical interaction between infected erythrocytes and adhesion receptors rather than "adherence", which should be reserved to describe non-physical affinity (e.g., beliefs, faith).

      We have revised the manuscript accordingly.

      Important new evidence regarding transcriptional regulation of var genes in general and var2csa in particular should be discussed and cited.

      We have revised the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The "number sense" refers to an imprecise and noisy representation of number. Many researchers propose that the number sense confers a fixed (exogenous) subjective representation of number that adheres to scalar variability, whereby the variance of the representation of number is linear in the number.

      This manuscript investigates whether the representation of number is fixed, as usually assumed in the literature, or whether it is endogenous. The two dimensions on which the authors investigate this endogeneity are the subject's prior beliefs about stimuli values and the task objective. Using two experimental tasks, the authors collect data that are shown to violate scalar variability and are instead consistent with a model of optimal encoding and decoding, where the encoding phase depends endogenously on prior and task objectives. I believe the paper asks a critically important question. The literature in cognitive science, psychology, and increasingly in economics, has provided growing empirical evidence of decision-making consistent with efficient coding. However, the precise model mechanics can differ substantially across studies. This point was made forcefully in a paper by Ma and Woodford (2020, Behavioral & Brain Sciences), who argue that different researchers make different assumptions about the objective function and resource constraints across efficient coding models, leading to a proliferation of different models with ad-hoc assumptions. Thus, the possibility that optimal coding depends endogenously on the prior and the objective of the task opens the door to a more parsimonious framework in which assumptions of the model can be constrained by environmental features. Along these lines, one of the authors' conclusions is that the degree of variability in subjective responses increases sublinearly in the width of the prior. Importantly, the degree of this sublinearity differs across the two tasks, in a manner that is consistent with a unified efficient coding model.

      Comments on revisions:

      The authors have done an excellent job addressing my main concerns from the previous round. The new analyses that address the alternative model of "no cognitive noise and only motor noise" are compelling and provide quantitative evidence that bolsters the paper's overall contribution. The authors also went above and beyond by reanalyzing the Frydman and Jin (2022) dataset to provide new and very interesting analyses that provide an additional out of sample test of the model proposed in the current paper.

      Reviewer #2 (Public review):

      Summary:

      This paper provides an ingenious experimental test of an efficient coding objective based on the optimization of task success. The key idea is that different tasks (estimation vs discrimination) will, under the proposed model, lead to a different scaling between the encoding precision and the width of the prior distribution. Empirical evidence in two tasks involving number perception supports this idea.

      Strengths:

      - The paper provides an elegant test of a prediction made by a certain class of efficient coding models previously investigated theoretically by the authors. The results in experiments and modeling suggest that competing efficient coding models, optimizing mutual information alone, may be incomplete by missing the role of the task.

      - The paper carefully considers how the novel predictions of the model interact with the Weber/Fechner law.

      Weaknesses:

      The claims would be even more strongly validated if data were present at more than two widths in the discrimination experiment (also noted in Discussion).

      Reviewer #3 (Public review):

      Summary:

      This work investigates whether human imprecision in numeric perception is a fixed structural constraint or an endogenous property that adapts to environmental statistics and task objectives. By measuring behavioral variability across different uniform prior distributions in both estimation and discrimination tasks, the authors show that perceptual imprecision increases sublinearly with prior width. They demonstrate that the specific exponents of this scaling (1/2 for estimation and 3/4 for discrimination) can be derived from an efficient-coding model, wherein decision-makers optimally balance task-specific expected rewards against the metabolic costs of neural coding. The revised manuscript expands this framework to accommodate logarithmic representations and validates the core model against an independent dataset of risky choices.
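
      As a rough illustration of what these exponents imply, the sketch below assumes that imprecision follows a pure power law in the prior width, σ(w) ∝ w^α, with the two exponents quoted above. The function name and the choice of a doubling of the width are hypothetical and are not taken from the manuscript. The case α = 1 is included only as a reference point for linear scaling.

```python
def imprecision_ratio(width_ratio: float, alpha: float) -> float:
    """Factor by which imprecision grows when the prior width is multiplied
    by width_ratio, assuming imprecision is proportional to width**alpha."""
    return width_ratio ** alpha


# Exponents quoted in the review; alpha = 1 is a linear reference case.
exponents = {
    "estimation (alpha = 1/2)": 0.5,
    "discrimination (alpha = 3/4)": 0.75,
    "linear reference (alpha = 1)": 1.0,
}

for label, alpha in exponents.items():
    ratio = imprecision_ratio(2.0, alpha)
    print(f"{label}: doubling the prior width multiplies imprecision by {ratio:.2f}")
```

      Under this assumption, doubling the prior width raises imprecision by a factor of about 1.41 in the estimation task and about 1.68 in the discrimination task, both below the factor of 2 that linear scaling would give, which is what "sublinear" means here.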

      Strengths:

      The authors have effectively addressed my previous concerns with rigorous additions:

      (1) The mathematical formulation has been revised into a discrete signal accumulation framework, making the objective function and resource trade-offs much more transparent and mathematically tractable.

      (2) The incorporation of the logarithmic representation resolves prior ambiguities regarding structural constraints.

      (3) The new split-half analysis effectively addresses the temporal dynamics of adaptation. The stability of the sublinear scaling across the experiment provides solid evidence that human subjects utilize rapid, top-down modulation to adjust their encoding strategy when explicitly informed about the environment.

      (4) Validating the derived scaling exponents on an independent risky-choice dataset robustly supports the generalizability of the theoretical framework beyond a single cognitive domain.

      Weaknesses:

      The methodological and theoretical issues raised in the first round have been thoroughly resolved, and the evidence supporting the claims regarding response variance is convincing.

      There is one remaining theoretical point that warrants discussion to provide a complete picture of the proposed generative model. The manuscript exquisitely models and predicts response variance (imprecision), but it remains largely silent on the closed-form predictions for the mean estimation (i.e., bias). Under the assumption of optimal Bayesian decoding combined with specific encoding schemes (e.g., linear vs. logarithmic), the model implicitly generates mathematical predictions for the subjects' mean estimates. Specifically, varying the scaling exponent (α) and the prior width (w) should systematically alter the predicted bias in different conditions.

      While fitting or explicitly explaining this mean bias is not strictly necessary for the core claims regarding variance scaling, acknowledging what the optimal decoder analytically predicts for the mean estimation, and how it aligns or contrasts with typical empirical observations, would strengthen the theoretical transparency of the paper.

      We thank the reviewers for their attention to our revised manuscript. We are very glad that the reviewers seem satisfied with how we have addressed their concerns. The paper is now stronger than in its first iteration.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no further requests for the authors, I congratulate the authors on a great paper.

      Reviewer #2 (Recommendations for the authors):

      No further suggestions.

      Reviewer #3 (Recommendations for the authors):

      In the Figure 2b caption, the phrase "from which the numbers of dots are sampled" appears to be a typo carried over from the estimation task. It should likely read "from which the numbers are sampled", as the discrimination task uses Arabic numerals rather than dot arrays.

      Reviewer #3 points out that we have focused on the subjects’ response variability, and we did not report the mean estimates. We agree that the reader could reasonably expect to see this. We now include this in Figure 6.

      The subjects exhibit the typical patterns observed in numerosity-estimation tasks (most notably, the ‘central tendency of judgment’). The dotted line shows the predictions of the best-fitting model (with 𝛼 = 1/2) with the logarithmic encoding, which reproduces the subjects’ main behavioral patterns.

      We have slightly revised the manuscript. The revised version includes this figure in the Methods (p. 28). We have modified the text of the Methods accordingly (bottom of p. 27), and we now refer to this analysis in the main text (line 6 of p. 5). We have also corrected the typo noted by Reviewer #3 (caption of Fig. 2b).