  1. May 2026
    1. Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues here apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair correlation function spectroscopy) to address the nucleo-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and represents one of a handful of model morphogen gradients produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measurement of GFP-tagged Dl protein, either in wild-type embryos, or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The study uses raster image correlation spectroscopy approaches to measure biophysical components of the Dl gradient in Drosophila embryos. It convincingly demonstrates a positive correlation between Toll pathway activity and the fraction of bound Dl in the nucleus. RICS methodology has widespread potential applications in cell and developmental biology, and this study will contribute to its adoption.

      Weaknesses:

      The study seeks to test a hypothesis for how the Toll pathway may limit Dl DNA binding in the nucleus. This experiment, while producing initial support for a role of nuclear Cactus, is confounded by co-expression of wild-type Dl, thus limiting the interpretation of the experimental results.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al. use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying Dl gradient formation have been extensively studied, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. But the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors separated GFP-tagged Dl into mobile and immobile pools. Interestingly, the fraction of immobile Dl is position-dependent, revealing more binding to DNA in ventral than in dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding DNA. Using specific dl alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      The main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A). While it is interesting that a small fraction of immobile Dl significantly increases in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent.

      Another weakness of the study is that experiments are performed in the presence of a wild-type GFP-tagged Dl (unfortunately, the Dl gradient does not form without it; Supplemental Figure 4). This is an unfortunate technical limitation, because it does not allow testing how important Cact binding is for determining the amount of Dl that could bind DNA in more biologically relevant locations of the embryo (e.g., in lateral regions).

      Overall, I feel that the manuscript exemplifies how FCS methods and analysis can be used to estimate biophysical parameters and test biological hypotheses, even at very low concentrations (such as Dl in dorsal-most nuclei). However, due to technical limitations, it falls short of offering a real quantitative understanding of the proposed mechanisms. The authors did not report in Figure 5 what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text, and could benefit from a more clear presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

      Thank you for your thoughtful feedback. Regarding the discrepancy between experiment and theory in relation to Michaelis-Menten kinetics, we recognize that our initial explanation may not have been explicit enough. Our intent was to illustrate that if DNA binding is a saturable process, then while the absolute concentration of Dl bound to DNA will increase with total Dl levels, the fraction of Dl bound to DNA will decrease. We used Michaelis-Menten kinetics only as a familiar example to convey this concept but did not intend to suggest that the system strictly follows Michaelis-Menten behavior. To clarify this point, we removed mention of Michaelis-Menten as an illustrative analogy and stuck specifically with discussing the system as “saturating.” This primarily affected text in the paragraph starting on Line 204, but also Lines 323-325.
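
      The distinction between bound concentration and bound fraction can be illustrated with a minimal numerical sketch. It assumes a simple hyperbolic (saturating) dependence of bound Dl on total Dl; the function names and the parameters `b_max` and `k_half` are hypothetical and are not taken from the manuscript, and the units are arbitrary.

```python
# Illustrative saturating-binding sketch (assumed hyperbolic form, arbitrary units).
# b_max and k_half are hypothetical parameters, not values from the manuscript.

def bound_concentration(total, b_max=1.0, k_half=1.0):
    """Bound amount rises with total but levels off at b_max."""
    return b_max * total / (k_half + total)

def bound_fraction(total, b_max=1.0, k_half=1.0):
    """Fraction of the total that is bound; equals b_max / (k_half + total)."""
    return bound_concentration(total, b_max, k_half) / total

totals = [0.01, 0.1, 1.0, 10.0, 100.0]
bound = [bound_concentration(c) for c in totals]
fraction = [bound_fraction(c) for c in totals]

for c, b, f in zip(totals, bound, fraction):
    print(f"total={c:7.2f}  bound={b:.4f}  fraction={f:.4f}")
```

      Under these assumptions the bound amount increases monotonically with the total, while the bound fraction falls; far below saturation (total much smaller than `k_half`) the fraction is nearly flat, but it does not rise.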

      Regarding the concern about potential confounding effects due to the presence of wildtype GFP-tagged Dorsal (Dl[wt]-GFP): we understand the importance of addressing this point more directly. Therefore, we have imaged the Dorsal-GFP gradient in embryos expressing the UAS-dl[S280P]-GFP or the UAS-dl[S317A]-GFP constructs in the absence of the BAC-recombineered Dl-GFP construct. In both cases, the dl mutants by themselves were not able to recapitulate enough of the Dl gradient to test our hypotheses. We have added this analysis to Supplemental Figure 4 and mentioned this figure on Lines 333-336 and 354-358. Furthermore, we explicitly mention that it is possible the reason why we failed to reject the null hypothesis in the Toll phosphorylation mutant case may be due to the additional copy of Dl[wt]-GFP (the BAC recombineered construct), with text added to Lines 343-345, 365-369 (Results) and 408-418 (Discussion).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent. As can be seen in Figure 3F, the fraction of immobile Dl is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling either, since the fraction is very low anyway.

      We thank the reviewer for pointing out the places where we could strengthen our explanations. Here we first address the criticism, also raised by the other reviewer, that the fraction of immobile Dl increases only a small amount (Fig. 5A). [In our reply to the next comment, we address the question of biological implications.] We attempted to explain this small effect size in the manuscript; however, we understand that we could clarify further and, given the fact that eLife has no restraints on space, we added more explanation in the main text.

      In essence, even though the effect was statistically significant, the effect size was small because the mutation was “diluted” by the presence of a wildtype Dl protein tagged with GFP. We were willing to deal with this dilution because the alternative was that, according to previous literature, without any wildtype Dl, no Dl gradient would be present in the reduced Toll phosphorylation mutants, and only a very weak Dl gradient (weakened on both ends) would be present in mutants that reduced Cact binding. We were confident that, with our quantitative approaches, we would be able to detect the diluted effect.

      However, because both reviewers have criticized this diluted effect, in this resubmission, we have included analysis of GFP-tagged mutants without the presence of wildtype Dl protein. Unfortunately, these embryos lack a discernible Dl gradient and cannot be analyzed in such a way as to test the hypotheses that the mutants were generated for.

      Even so, the effect of the Cact-binding mutant was strong enough that we were able to statistically distinguish it from embryos expressing only wildtype Dl-GFP, even with the dilution effect. On the other hand, we have also included a caveat that our failure to statistically distinguish Toll phosphorylation mutants from wildtype may be due to the dilution effect. We now also explicitly state the concerns about a lack of a discernible Dl gradient and have included figures of full mutants in the supplement. See also our discussion of Reviewer 1’s similar comment.
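
      The "dilution" argument above can be made concrete with a small arithmetic sketch. It assumes, as a simplification, that the measured immobile fraction is a plain weighted average over the GFP-tagged wildtype and mutant Dl populations; the function name and all numbers are hypothetical and chosen only for illustration, not measured values.

```python
# Dilution sketch: assumes the observed immobile fraction is a simple weighted
# average of the wildtype and mutant populations. All numbers are hypothetical.

def observed_immobile_fraction(frac_wt, frac_mut, mut_share):
    """Mix of wildtype and mutant immobile fractions.

    mut_share is the (assumed) share of GFP-tagged Dl that is the mutant form.
    """
    return (1.0 - mut_share) * frac_wt + mut_share * frac_mut

frac_wt = 0.05   # hypothetical immobile fraction of wildtype Dl-GFP
frac_mut = 0.20  # hypothetical immobile fraction of the mutant Dl-GFP

for mut_share in (1.0, 0.5, 0.25):
    observed = observed_immobile_fraction(frac_wt, frac_mut, mut_share)
    shift = observed - frac_wt
    print(f"mutant share={mut_share:.2f}  observed={observed:.4f}  shift vs wt={shift:.4f}")
```

      In this toy model the shift relative to wildtype scales linearly with the mutant share, so a co-expressed wildtype copy shrinks the detectable effect without removing it.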

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is ok, but I think that the authors should discuss further what the biological implications of these findings are, other than the contribution to uncovering the biophysical parameters.

      Here we underscore the biological implications of our discovery that Cact is present in the nucleus on the dorsal side. The reviewer mentioned that Cact in the nucleus on the dorsal side appears to have little overall effect, because this is the location of the embryo where there is very little Dl in the first place, which raises the question of whether this discovery is impactful.

      While we previously used the final paragraph of the discussion to touch on the implications of this discovery, we acknowledge that we could have spent more time on the explanation. As such, we have expanded this final paragraph into two paragraphs. In the first of the two, we discuss in more detail the implications specifically of the Dl/Cact interactions in the dorsal-most nuclei, as understood by the results of this paper. In brief, knowing that Dl in the dorsal-most nuclei is bound by Cact results in an updated understanding of the Dl gradient, with increased dynamic range, robustness, and precision (but unknown shape).

      In the second of the two paragraphs, we discuss this result in light of our recent work on imaging Cact in live embryos, in which we have shown that Cact is present in all nuclei at roughly uniform levels. Taken together, we suggest that it is possible that Cact is bound to Dl in all nuclei (not just the dorsal-most), which would allow us to estimate the shape of the overall Dl gradient by subtracting off the fluorescence that stems from Dl/Cact complex.

      For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What is then the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should report in Figure 5 also what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      We appreciate the reviewer’s suggestion that the rejection of the hypothesis that phosphorylation of Dl by Toll impacts Dl/DNA binding could be expanded upon further. For the role of Dl phosphorylation by Toll: we previously mentioned that this phosphorylation is known to enhance the nuclear import or retention of Dl, and that mutation of serine 317 to an alanine abolishes Toll-mediated phosphorylation of Dl, which results in embryos with no Dl gradient. We had also mentioned that phosphorylation of Dl is not known to affect its DNA binding, which is the hypothesis we sought to test by creating the dl[S317A]-GFP mutants. We did not image any mutants, or the UAS-dl[wt]-GFP control, in the lateral regions, for two reasons. First, this region is easily the smallest of the three regions, in terms of the percentage of the DV axis (see Fig. 1A). Second, because of the dilution effect, we knew the effect size would be small, and as such, we imaged only on the extreme ends of the gradient so that the most clear conclusion could be drawn about the effect that Toll phosphorylation might have on DNA binding of Dl.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.

      We agree that there is some distortion of the relative spatial extents of the Dorsal gradient when NCR is used as an independent variable on a plot. However, we prefer the NCR on the horizontal axis because it is closer to the functional variable (Dl concentration, rather than spatial location) for the properties we studied.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I really enjoyed the first part of this paper and have only minor suggestions for improvement of the presentation. I am confused about the experimental approach for the final figure, distinguishing phosphorylation and cactus-dependent effects. I'll divide my comments between "First Part/General Suggestions", "Last Part", and finish with some minor typo observations.

      The gist of the issues with the last part of the paper could boil down to insufficient detail/explanation of the section. The discrepancy with expectation with Michaelis-Menten kinetics is presented in a total of three sentences and is not necessarily obvious to the general readership of eLife. The mutants chosen to distinguish the phosphorylation and cactus mechanisms could be described more (why these? aren't other residues phosphorylated?) and possibly why also having wild-type GFP-Dl in the measurements isn't confounding. Since there is unlimited space in this journal, it may be advisable to use this space to fill out these rationales and ideas.

      First part/General Suggestions:

      (1) For the RICS data, (Figures 1 and 2) there is a nice correlation between WT NC ratio and the selected low/med/hi Dl activity mutants. More-or-less the median values in, say, Figure 1E-G are reflected in Figure 1H. However, with the ccRICS data (Figure 3), it looks like there is less correspondence between the range of fraction bound estimates in, for instance, "ventral" in Figure 3D and '10b' in Figure 3E. Can the authors comment on this? Should the reader be able to make this kind of comparison, or does something about data collection for the wt/NCR measurements preclude direct comparison of magnitudes with the panel of mutants? (imaging setup, laser power, etc)?

      The reviewer is correct that there seems to be a discrepancy in the values of ψ between the wt embryos (ventral side) and the Toll10B embryos. It should be noted that the Toll10B embryos are not “ventral-like” in every way, in part because they have unknown activated Toll levels that might be above or below what is seen at the ventral midline in wildtype embryos, and in part because there is no DV gradient, and thus no shuttling in these embryos that would accumulate total Dorsal on the ventral midline. As such, comparisons between Toll10B embryos and the ventral side of wildtype embryos are not exactly one-to-one, and we are more confident in comparing among the mutants in an allelic series. To address this question, we have added a sentence to the end of the second paragraph of the “Dorsal/DNA binding exhibits a spatial gradient” subsection of the Results (Lines 233-235).

      (2) Materials and methods: Mounting and imaging of Drosophila embryos: the authors cite the "488 nm laser intensity ranged from 0.5% to 3.0%..." The values presented here are not useful for the general reader or an individual looking to replicate these conditions, as emission power produced from such values will vary from instrument to instrument. It is standard in these cases to report an estimated laser power (measured in watts) for each laser line, and a clear description of how such measurements were made (stationary beam, under scanning conditions, with what detector, etc). These measurements are valuable and the authors are strongly encouraged to report such measurements for their setup.

      We appreciate the reviewer’s suggestion and understand the importance of providing absolute laser power values for reproducibility. We have now included the laser power (in watts) for the laser lines on both microscopes used in this study. The revised text can be found in the Materials and Methods section, on Lines 535-536 and 540.

      (3) The presentation of the data in Figure 4 is difficult to understand. Are the kymographs (A lower) representing the entire length of the big white arrow in A upper? Or do the dashed lines indicate the x-axis limits of the kymograph? It is difficult to tell from the figure legend, where the dashed lines are described as "areas where Dl-GFP movement is measured out of the nucleus." I believe that the authors can make these measurements and that Figure 4B reflects properties of "movement" of Dl out of the nucleus, but how they get there from these data is not clear to this reader. Perhaps a cartoon explaining the green lines and the orange lines in the kymograph or tightening the legend would help.

      We thank the reviewer for their feedback and understand the need for greater clarity in the text of the pCF section and in Figure 4. The widths of the kymographs in the lower panels correspond to the full widths of the images in the upper panels. The pCF measurements were taken at the y-coordinates at the level of the white arrows. The dashed vertical lines connecting the upper and lower panels illustrate two cases of locations along the x-axis of the image where Dl is crossing from inside a nucleus to outside. In the two illustrated cases, these crossings are accompanied by either zero Dl molecules being observed to cross the nuclear barrier (ventral image/kymograph on left) or delayed crossing of Dl molecules (dorsal image/kymograph on right). To address this concern, we have added more detail to the Fig. 4 legend and greatly expanded on a discussion of what pCF does in the text (the second and third paragraph of the section). We have also updated Fig. 4 to align with new explanations from the text: namely, describing the y-axis of the kymographs as Δt (instead of log(time)) and explicitly showing that the pair correlation is for pairs of pixels that are Δx = 6 pixels apart. Further details were also added to the relevant Methods section.

      (4) DV position in the wild-type imaging experiments is operationally determined through measurement of the Dorsal NC ratio. This makes sense, but the strategy is buried in the first paragraph of the results, and not discussed in the M & M. For readers unfamiliar with imaging the fly embryo or the nuances of the Dl gradient, perhaps a sentence or two explaining that embryos were oriented randomly along the DV axis, and DV positions of the imaging region were estimated by measuring the Dl NC ratio.

      We thank the reviewer for this helpful suggestion. To improve clarity, we have added a description of how DV position was determined to the Materials & Methods section (paragraph starting on Line 520). Specifically, we now state that embryos were randomly oriented along the DV axis and that we used the Dorsal NC ratio of intensity as a proxy for measuring the DV position in imaging experiments. Additionally, we have added a statement to the Results section to ensure that this strategy is more clearly introduced (Lines 143-144). We appreciate this recommendation, as it will help readers unfamiliar with fly embryo imaging better understand our approach.

      (5) It would be nice to report the corresponding NC-ratio values for Dl in each of the mutant conditions, perhaps as a supplement to Figure 1. Currently, Figure 1H relies on the (admittedly well-established) properties of the three mutants, but it feels that an additional nice quantitative link in the data can be drawn out here. Do the authors see the strict correlation between the wt and mutant diffusivity measurements at specific NC-ratios?

      We are hesitant to try to draw direct comparisons between the mutants and the behavior of the wildtype embryo at the corresponding NCR. This is because, in the context of these uniform mutants, the NCR is determined by a combination of at least three factors that we cannot measure or control for: the unknown strength of Toll signaling, the unknown capacity of Toll signaling (i.e., the potential saturation of the cytoplasmic enzymes controlled by Toll signaling), and, most importantly, the lack of a shuttling mechanism that concentrates Dl on the ventral side of the embryo. As such, the NCR does not represent a continuous variable that transforms the behavior of one mutant into another (or from mutants into wt DV coordinates), as it does along the DV axis in wildtype embryos. This is why the mutant studies are presented as boxplots. At best, we were comfortable only in using the uniform mutants as an allelic series to produce gross trends. We have added a brief statement describing the shuttling caveat to the Results section (Lines 173-177).

      (6) In the section related to Dl nuclear export, the language used to describe Dl kinetics is ambiguous. The term "movement" is used seemingly as a catch-all for nuclear import-export as distinguished from diffusion. However, diffusion is also a form of movement. Could this section be reworked to explicitly distinguish nuclear import-export and diffusive movements?

      We appreciate the reviewer’s suggestion and agree that the language used to describe Dl kinetics could be more precise. By way of explanation, the pCF analysis calculates the time scale on which Dl can exit the nucleus. pCF only gives a signal if it sees the same Dl molecule twice, at two different locations after some Δt amount of time has passed. Because of this, if a given Dl molecule in a ventral nucleus is being tracked, then that molecule has some probability that it is bound to DNA initially, which means it will take, on average, longer to exit the nucleus than a Dl molecule not initially bound to DNA. Therefore, on the ventral side, the time scale on which Dl exits the nucleus is longer than on the dorsal side (where DNA binding is not happening). This can be true even if the nuclear export rate constants are the same on the ventral side vs the dorsal side. As such, we were careful to choose language that did not imply that we were talking about a nuclear export rate constant. We have added this discussion to the end of the relevant Results section (Lines 308-315).

      We have also revised this section to explicitly distinguish between the mobility associated with exiting the nucleus and diffusive movement, while still trying to distinguish between the time scale of exiting the nucleus vs the nuclear export rate. Specifically, we now refer to ‘time scale of nuclear export’ when discussing transport across the nuclear envelope and reserve the term ‘diffusion’ for passive intracellular movement. Furthermore, we have edited a sentence in this section (Lines 291-293) to describe the distinction we are making between the time scale measured by pCF and the time scale commonly associated with nuclear export (that is, the reciprocal of the rate constant). We hope this clarification improves readability and conceptual clarity.
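
      The idea that pCF reports a delay between two locations can be sketched with a toy calculation. The code below is not the authors' analysis pipeline; it assumes one common pCF normalization, G(Δt, Δx) = ⟨F(t, x)·F(t+Δt, x+Δx)⟩ / (⟨F(t, x)⟩⟨F(t, x+Δx)⟩) − 1, applied to a synthetic line-scan "carpet". The function name, the synthetic data, and the imposed delay are all hypothetical; only the pixel separation Δx = 6 echoes the figure description above.

```python
import random

def pair_correlation(carpet, x, dx, lags):
    """Pair correlation between pixel x and pixel x + dx of a line-scan carpet.

    carpet[t][x] is the intensity at time index t and pixel x.
    Returns one value per time lag in `lags` (assumed normalization, see lead-in).
    """
    n_t = len(carpet)
    a = [row[x] for row in carpet]
    b = [row[x + dx] for row in carpet]
    mean_a = sum(a) / n_t
    mean_b = sum(b) / n_t
    result = []
    for lag in lags:
        n = n_t - lag
        cross = sum(a[t] * b[t + lag] for t in range(n)) / n
        result.append(cross / (mean_a * mean_b) - 1.0)
    return result

# Synthetic carpet: the signal at pixel 0 reappears at pixel dx after `delay` steps.
random.seed(0)
n_t, dx, delay = 2000, 6, 5
source = [random.expovariate(1.0) for _ in range(n_t + delay)]
carpet = []
for t in range(n_t):
    row = [random.expovariate(1.0) for _ in range(dx + 1)]  # uncorrelated background
    row[0] = source[t + delay]   # what pixel 0 shows at time t
    row[dx] = source[t]          # pixel dx shows the same value `delay` steps later
    carpet.append(row)

lags = list(range(20))
pcf = pair_correlation(carpet, x=0, dx=dx, lags=lags)
peak_lag = lags[pcf.index(max(pcf))]
print("lag with maximal pair correlation:", peak_lag)
```

      In this toy example the pair correlation peaks at the imposed delay and is near zero elsewhere. A longer average time to leave the nucleus (for instance, because some molecules start out bound to DNA) would shift such a peak to larger Δt even if the export rate constant itself were unchanged.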

      Last Part:

      (1) There is an undersold argument centered on Michaelis-Menten kinetics that needs to be explicitly presented, especially since it motivates the final experiments of the paper, which are challenging. In the two sections describing how the data do not adhere to expectations based on Michaelis-Menten kinetics, the assertion that "the fraction of immobile Dl is expected to decrease with increasing nuclear total Dl concentration" is only intuitively true if the system is saturated. Is the system demonstrably saturated? Another interpretation of this would be that these results demonstrate that the system is likely not saturated. In any case, the authors need to devote some space in the introduction and/or results and/or discussion to fully motivate this point.

      We agree that the reviewer has raised an important point: if the system is very far from saturation, then the fraction of immobile Dl is not expected to decrease with increasing nuclear total Dl concentration. But neither would it increase; it would instead stay flat. To correct this mistake, we have edited the sentences in question to acknowledge the far-from-saturation scenario, saying “at best, [the fraction bound] remain[s] constant” (Line 209). As such, our original point, which is that in no case would the fraction immobile increase [unless something else is going on besides affinity-based binding to DNA], is still valid.

      (2) Wouldn't any argument on the basis of Michaelis-Menten need to rely on the assumption that the system is at steady-state? Reeves 2012 concludes that during the times measured here, Dl does not reach a steady state. It would be good, in the context of the point above, for the authors to clarify how this impacts the expectations of saturation and the application of M/M kinetics.

      We thank the reviewer for raising this important point. We apologize for not being clear on our points about M/M kinetics and would like to stress again that we are not claiming the system is has M/M kinetics. We appealed to M/M kinetics only as a simple, intuitive example of a saturating system to point out the difference between bound concentration vs bound fraction as functions of total concentration. We did this because previous feedback on our manuscript suggested that the difference between these two variables needed to be made clearer. Because this point seemed controversial with both reviewers, we removed all mention of M/M kinetics and simply refer to the system as “saturating.” For further explanation, see the first paragraph of our response to Reviewer 1’s “weaknesses” in the public review.

      (3) It is not clear to me how the inclusion of wild-type, GFP-tagged dorsal in the experimental setup for Figure 5 is not confounding. For the S317 (phospho-) mutant, GFP-tagged alleles of both phospho- and wild-type Dl are expressed. The reasoning is that not enough phospho-mutant Dl gets into the nucleus, and this makes it difficult to distinguish the dorsal from the ventral side of the embryo, so in a dl mutant background, there is expression of wt GFP-dl from a BAC, and nos>Gal4 driven expression of a GFP-tagged S317A mutant dl. The measurements show that on the ventral side of the embryo, there is no difference in the fraction of bound Dl. Couldn't this be predominantly binding of wildtype GFP-Dl? How is this interpretable? Wouldn't it be easier to perform these measurements in a Tl 10b background (or to cross in UAS>Tl[10b]) and for the only GFP-tagged dl to be S317A? The same goes for the S234 mutant (could be done in the pelle mutant background).

      We thank the reviewer for raising the point that the confounding effect of wildtype Dl makes it difficult to interpret the results from the 317A mutant. Under the circumstances of the experimental design, we can best conclude that, if the null hypothesis is incorrect, the effect size was too small to detect with our sample size. As such, we have modified our discussion of the results of this experiment to carefully explain this caveat (rather than confidently saying that Toll phosphorylation has no effect). For further explanation, see the second paragraph of our response to Reviewer 1’s “weaknesses” in the public review, as well as our response to the related question raised by Reviewer 2 in the public review.

      Minor issues/typo stuff:

      (1) This reviewer notes that the submitted materials contain neither line numbers nor page numbers.

      We appreciate the reviewer’s feedback. We have now included line numbers and page numbers in the revised manuscript for easier reference.

      (2) First paragraph of results: "We imaged small regions of the embryo..." The parenthetical statement only cites pixel size and directs the reader to the methods. Without the total number of pixels, the pixel size value does not clarify how "small" the imaged region is. Consider including the xy area, pixel dimensions, and pixel size here to assert the smallness of the imaged area.

      We have added the requested information.

      (3) Second paragraph, Introduction: "Dorsal, one of three (Drosophila) homologs to mammalian NF-kB" (Add Drosophila). Also, aren't these orthologs?

      We have made these changes.

      (4) Last sentence of last paragraph in the introduction: Kind of a throw-away sentence. Consider revising.

      We thank the reviewer for making this point; the sentence was originally constructed to state that our quantitative measurements resulted in a biologically significant discovery. However, because Reviewer 2 also mentioned the question of biological significance, we have changed this final sentence to explicitly state what the biological significance is: namely, an understanding of the Dl gradient that has superior dynamic range, spatial range, robustness, and precision.

      (5) Where is the median line in the S317A boxplot in Fig 5C?

      The median line is at ψ = 0. We have added an explanation of this to the Figure legend.

      (6) Materials & Methods: Fly transformation, typo: "Drosophila embryos were injected with 0.5 µl of each pUAST construct..." The volume of an entire Drosophila embryo is less than 0.5 µl; please revise the units to reflect the value injected. Most likely an absolute volume unit was stated when a concentration of an injection solution, delivered at significantly smaller volumes, was intended.

      We thank the reviewer for catching this typo. It was intended to indicate a concentration of 0.5 ng/μL, and we have made the appropriate changes.

      Reviewer #2 (Recommendations for the authors):

      (1) Perhaps this has been described in a prior publication (if this is the case, please simply state this somewhere in the Methods section where Dl-GFP embryos are described), but since Dl-GFP embryos have one copy of endogenous dl and one copy of Dl-GFP, how do potential differences in tagged vs. non-tagged Dl interactions with DNA or Cact affect their findings?

      The reviewer brings up a good point, and we acknowledge that any time a protein is tagged with GFP, the behavior of the protein may be affected. We have now explicitly added this caveat to our discussion in a new paragraph on Lines 420-429.

      (2) In the Discussion section, the authors argue that a major implication of their findings is that, if Cact binds Dl in the nuclei, the true (active) Dl gradient may be unknown unless the unbound Dl is separated from the Dl/Cact (inactive) form. While this is an interesting point, this idea is not supported by the findings of Figure 5B, where there is no effect on the fraction of Dl bound to DNA in the reduced Cactus binding mutants. The authors should report what happens in lateral regions in Figure 5 because perhaps there is an effect there (see comment on this in the Public Review).

      We thank the reviewer for the insight, as we did not directly discuss the implications of the middle column of Fig. 5B on our hypothesis. Indeed, our hypothesis is not supported by Fig. 5B; it is instead inconclusive (failure to reject H0). This is why we designed the second experiment (Fig. 5C) to test the Cactus hypothesis, because the effect size would be greater on the dorsal side.

      Furthermore, as pointed out by both reviewers, the presence of wildtype Dl-GFP in these experiments is confounding. We have discussed this elsewhere in our rebuttal, but briefly, this problem resulted in needing larger effect sizes to detect a statistically significant difference between wt and the mutant populations. This was a necessary evil that we were willing to deal with in order to ensure the Dl gradient could be established so that the dorsal vs ventral sides would be distinguishable. We have added a fuller discussion of these issues to the relevant Results section (Lines 333-336, 343-345, 354-359, 365-369) and also the Discussion section (Lines 412-418), including underscoring the fact that, from a falsification standpoint, the results in Fig. 5B do not allow us to reject either null hypothesis, possibly due to the confounding effect of wildtype Dl. We appreciate the reviewer’s point about this, and believe the changes suggested by the reviewer have improved the manuscript.

      On the other hand, we respectfully disagree with the reviewer that investigating either mutant in the lateral regions of the embryo would bear fruit. To the first approximation, it would be the average between the behaviors on the ventral vs. dorsal sides. For the S317A mutant, neither the ventral nor the dorsal side was conclusive in regards to our hypotheses. (Although we admit here that further investigation into why the S317A column in Fig. 5C was statistically different from wildtype, in the opposite direction from the S234P mutant, may be interesting in future work.) For the S234P mutant, the data were more conclusive on the side of the embryo where the effect size was expected to be large enough to detect a difference. In the lateral regions, the expectation would be that the effect size would be intermediate, which would make the interpretation of the results more difficult (i.e., more likely to be inconclusive). In contrast, as Fig. 5C is already conclusive, we are not confident there would be more information gained by imaging the lateral regions.

      (3) Is Figure 5A a wild-type embryo? If so, I think that the labels are misleading or unclear. Also, is it the same image as in Figure 1A? If so, I suggest replacing this with a schematic since it does not add any new data.

      We have eliminated the labels for the mutants and have added the following comment to the figure 5 legend “Same embryo as in Fig. 1A”.

      (4) Also in Figure 5, I suggest using labels to indicate the schematics instead of simply using their location. You could use 5A', 5A' and 5A', for example.

      We have made the suggested changes.

      (5) The use of some technical labels makes some figures difficult to read. I suggest using more simple labels for mutants in Figure 3F (replace R063C) or Figure 5B, C (replace S234P and S317A).

      We have made changes to Fig. 3F, Fig. 5B,C, and the corresponding places in the figure legends. We have labeled R063C as ↓DNA, S317A as ↓Toll, and S234P as ↓Cact.

      (6) I suggest reporting p-values consistently. For example, in Figure 4B, they use one or two asterisks to denote p-values less than 0.07 and 0.05, respectively, which is somewhat arbitrary and unconventional. Why not report the actual values as in Figure 5C, for example? (By the way, I would report in Figure 5B the actual p-values as well, since a nonsignificant value is also reported in Figure 5C. Also in Figure 5C, report values in the same notation (decimal or scientific), i.e., either put 0.005 as 5x10^-3 or 10^-3 as 0.001).

      We have made the suggested changes.

    1. eLife Assessment

      This study provides important insights regarding the temporal dynamics of dopamine across sleep/wake transitions in several brain areas. Using multi-site fiber photometry combined with EEG/EMG recordings, the study revealed heterogeneous dynamics across both cortical and several subcortical areas. Although the evidence for these observations is solid, evidence for the proposed mechanisms driving DA dynamics is incomplete. Overall, the study may have a substantial impact on several fields working on the neurobiology of DA signaling.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen, Tu, and Lu focused on how brain-wide dopamine release dynamically changes during sleep/wake state transitions. Using multi-site fiber photometry to monitor DA release, alongside simultaneous EEG and EMG recordings, the authors show distinct DA dynamics during transitions from NREM to WAKE, REM to WAKE, WAKE to NREM, and NREM to REM. Next, they analyze temporal coordination between regions using cross-correlation analysis. Finally, chemogenetic activation of VTA or DRN but not SNc dopamine neurons is shown to promote wakefulness.

      Strengths:

      The manuscript addresses an interesting question: how brainwide dopamine activity evolves across sleep/wake transitions. The combination of multi-site DA recordings with simultaneous EEG/EMG monitoring is technically sophisticated. The experimental logic is generally clear, and the dataset is rich. The results include several interesting observations.

      Weaknesses:

      The authors used the GRAB-DA2m sensor to monitor dopamine release. Although DA2m exhibits higher affinity for dopamine compared to NE (around 15-fold difference in EC50 in HEK cell assays), it is still possible that NE contributes to the recorded signals, particularly during sleep/wake transitions when locus coeruleus activity is strongly modulated. Given the widespread and state-dependent dynamics of NE, this potentially needs to be addressed.

      Similarly, the chemogenetic experiments rely on CNO to activate hM3Dq-expressing dopamine neurons. However, it is well established that CNO can be converted to clozapine in rodents, and clozapine itself is known to influence sleep/wake. Although the authors included non-hM3Dq-expressing mice as controls, the potential confounding effects of clozapine on sleep regulation remain a concern.

      Midbrain dopamine neurons exhibit both tonic and phasic firing patterns. In Figure 1, most reported dopamine transitions appear relatively slow. However, some faster, phasic-like components are observable. For example, in NAc-L during REM-to-WAKE transitions, there are 2 phasic-like decreases between −20 and 0 s. The authors used laser-evoked stimulation experiments in the VTA and DRN and showed that 2 s versus 10 s stimulation produces distinct dopamine kinetics, suggesting that different firing patterns generate distinct DA dynamics. Moreover, the temporal profiles vary not only across regions but also across transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid decrease, whereas REM-to-WAKE displays a much slower decline. Similarly, some regions (e.g., NAc-L NREM-to-WAKE, DRN REM-to-WAKE) show faster changes, while others (e.g., mPFC WAKE-to-NREM, VTA NREM-to-WAKE) show slower kinetics. These observations argue against a simple region-specific explanation and instead suggest that distinct firing modes may differentially contribute depending on transition type.

      While cross-correlation analysis provides insight into the temporal coordination of DA signals across regions, several limitations should be considered. Sleep/wake transitions are inherently non-stationary events, whereas cross-correlation assumes relatively stable signal properties within the analysis window. This mismatch may bias lag estimates and obscure transient lead-lag relationships. Moreover, the temporal resolution of fiber photometry and the kinetics of genetically encoded DA sensors limit the precision with which timing relationships can be interpreted, particularly for sub-second lags.

      In the Introduction, the authors state that they aim to address 'which dopaminergic populations causally drive these patterns.' However, the chemogenetic approach used operates on a relatively slow timescale: CNO-induced activation takes 15-30 minutes to produce effects, and the induced changes are long-lasting. In contrast, the dopamine transitions described in Figure 1 occur on a much faster timescale compared to CNO manipulation. Thus, while chemogenetic activation demonstrates that stimulating VTA or DRN dopamine neurons promotes wakefulness, it does not directly establish that these populations causally drive the rapid transition-related DA dynamics observed in the photometry recordings.

    3. Reviewer #2 (Public review):

      In "Brainwide dopamine dynamics across sleep-wake transitions", Chen et al. provide a thorough description of how dopamine dynamics fluctuate across sleep-wake transitions and in transitions between sleep states. To achieve this, the authors used multi-channel fiber photometry and a genetically encoded fluorescent dopamine reporter to simultaneously measure dopamine dynamics in 8 brain regions. They also used EEG measurements to precisely quantify and time transitions between sleep states and wakefulness. Finally, the authors used channelrhodopsin to examine dopamine dynamics following subregion stimulation and chemogenetics to test the causal relationship between activation of distinct dopamine neuron populations and their effects on sleep state.

      The conclusions made by the authors in this study are modest and appropriate given the largely observational nature of the principal findings. The use of optogenetics to probe regional dopamine signaling following activation of distinct nuclei is interesting, but not entirely novel and constrained in interpretability. Similarly, the chemogenetics experiment largely confirms previous studies, which the authors correctly cited in the text.

      The principal findings of this study are based on strong methodological and analytical methods. Implanting 8 optical fibers in a single mouse, along with EEG/EMG electrodes, is technically challenging, providing valuable, simultaneous measurements of dopamine fluctuations across the brain. This enables the strong correlational and time-locked analyses performed by the authors in Figure 2. What's more, the use of EEG/EMG electrodes provides time-locked descriptions of sleep states, enabling precise comparisons between the dopamine signal and sleep state transitions.

      The paper has some weaknesses that the authors could address. The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states. The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results. Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen, Tu, and Lu focused on how brain-wide dopamine release dynamically changes during sleep/wake state transitions. Using multi-site fiber photometry to monitor DA release, alongside simultaneous EEG and EMG recordings, the authors show distinct DA dynamics during transitions from NREM to WAKE, REM to WAKE, WAKE to NREM, and NREM to REM. Next, they analyze temporal coordination between regions using cross-correlation analysis. Finally, chemogenetic activation of VTA or DRN but not SNc dopamine neurons is shown to promote wakefulness.

      Strengths:

      The manuscript addresses an interesting question: how brainwide dopamine activity evolves across sleep/wake transitions. The combination of multi-site DA recordings with simultaneous EEG/EMG monitoring is technically sophisticated. The experimental logic is generally clear, and the dataset is rich. The results include several interesting observations.

      Weaknesses:

      The authors used the GRAB-DA2m sensor to monitor dopamine release. Although DA2m exhibits higher affinity for dopamine compared to NE (around 15-fold difference in EC50 in HEK cell assays), it is still possible that NE contributes to the recorded signals, particularly during sleep/wake transitions when locus coeruleus activity is strongly modulated. Given the widespread and state-dependent dynamics of NE, this potentially needs to be addressed.

      We thank the reviewer for raising this important methodological consideration. While we acknowledge that a minor contribution from norepinephrine (NE) to the DA2m signal cannot be categorically excluded, several convergent lines of evidence give us confidence that the signals we recorded primarily reflect dopamine release.

      First, DA2m has substantially lower affinity for NE than for dopamine. The reported EC<sub>50</sub> for NE is ~1200 nM [1], which is ~15-fold higher than for dopamine. In contrast, extracellular NE levels in the prefrontal cortex are typically in the low nanomolar range (generally <5 nM under basal conditions) [2,3]. Because physiological NE concentrations are orders of magnitude below the sensor’s EC<sub>50</sub> for NE, NE is highly unlikely to drive significant DA2m activation in vivo.
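
      The size of this gap can be estimated with a rough sketch. It assumes a Hill-type dose-response curve with a Hill coefficient of 1, which is an assumption made here for illustration and is not stated in the response; the function name is hypothetical, and the dopamine EC50 is only the value implied by the ~15-fold ratio quoted above.

```python
def fractional_activation(conc_nm, ec50_nm, hill=1.0):
    """Fraction of maximal sensor response under an assumed Hill model."""
    return conc_nm**hill / (conc_nm**hill + ec50_nm**hill)


EC50_NE_NM = 1200.0               # NE EC50 quoted in the response
NE_BASAL_NM = 5.0                 # upper bound on basal cortical NE quoted in the response
EC50_DA_NM = EC50_NE_NM / 15.0    # implied by the ~15-fold ratio, not a measured value

ne_fraction = fractional_activation(NE_BASAL_NM, EC50_NE_NM)
da_fraction_same_conc = fractional_activation(NE_BASAL_NM, EC50_DA_NM)

print(f"NE at {NE_BASAL_NM} nM: {ne_fraction:.2%} of maximal response")
print(f"DA at {NE_BASAL_NM} nM: {da_fraction_same_conc:.2%} of maximal response")
```

      With these numbers, basal NE would account for well under 1% of the maximal sensor response. The estimate depends on the assumed Hill coefficient and on the HEK-cell EC50 values carrying over to in vivo conditions.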

      Second, our optogenetic experiments provide direct functional validation. The targeted stimulation of midbrain dopaminergic neurons elicited robust DA2m signal responses across both cortical and subcortical brain areas. This confirms that the sensor reliably captures evoked dopamine release within our specific experimental paradigm.

      Finally, the spontaneous DA2m signal dynamics we observed across sleep-wake states diverge from previously reported patterns of cortical NE release [4]. For example, in Figure 1C, our DA2m recordings in the mPFC revealed high activity during wakefulness, alongside pronounced, sharp changes during NREM-to-WAKE transitions. In contrast, a prior study [4] shows that NE exhibits comparatively mild fluctuations during wakefulness and around NREM transitions. This temporal and kinetic divergence further supports the interpretation that our recorded signals reflect region-specific dopaminergic dynamics rather than generalized NE arousal activity.

      Taken together, these physiological, functional, and kinetic distinctions indicate that while a negligible contribution from NE cannot be entirely ruled out, it is highly unlikely to account for a substantial portion of the DA2m signals observed during sleep-wake transitions in our study.

      Similarly, the chemogenetic experiments rely on CNO to activate hM3Dq-expressing dopamine neurons. However, it is well established that CNO can be converted to clozapine in rodents, and clozapine itself is known to influence sleep/wake. Although the authors included non-hM3Dq-expressing mice as controls, the potential confounding effects of clozapine on sleep regulation remain a concern.

      We appreciate the reviewer raising this important point regarding the metabolism of CNO. We are aware of the evidence suggesting that CNO can undergo back-metabolism to clozapine in rodents, which could potentially exert independent effects on sleep-wake architecture. To mitigate this concern, we employed the following experimental safeguards:

      (A) Non-hM3Dq Control Group: As noted by the reviewer, we included a cohort of mice that did not express the hM3Dq receptor but received the same dosage of CNO (1 mg/kg). In these animals, we observed no significant alterations in sleep-wake states compared to saline baseline (Figure S3), suggesting that at this dosage, any clozapine produced was below the threshold for behavioral modulation of sleep.

      (B) Dosage Selection: We utilized a relatively low dose of CNO (1 mg/kg), which is widely reported in the literature to minimize the accumulation of clozapine to levels that would interfere with EEG-defined sleep states in rodents [5]. Furthermore, studies have demonstrated that while higher doses of CNO (e.g., 5–10 mg/kg) can produce clozapine-like effects on sleep architecture, lower doses around 1 mg/kg do not yield significant alterations in cortical EEG power distribution or sleep-wake amounts in control animals [6,7].

      Midbrain dopamine neurons exhibit both tonic and phasic firing patterns. In Figure 1, most reported dopamine transitions appear relatively slow. However, some faster, phasic-like components are observable. For example, in NAc-L during REM-to-WAKE transitions, there are 2 phasic-like decreases between −20 and 0 s. The authors used laser-evoked stimulation experiments in the VTA and DRN and showed that 2 s versus 10 s stimulation produces distinct dopamine kinetics, suggesting that different firing patterns generate distinct DA dynamics. Moreover, the temporal profiles vary not only across regions but also across transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid decrease, whereas REM-to-WAKE displays a much slower decline. Similarly, some regions (e.g., NAc-L NREM-to-WAKE, DRN REM-to-WAKE) show faster changes, while others (e.g., mPFC WAKE-to-NREM, VTA NREM-to-WAKE) show slower kinetics. These observations argue against a simple region-specific explanation and instead suggest that distinct firing modes may differentially contribute depending on transition type.

      We thank the reviewer for this insightful comment. We agree that midbrain dopamine neurons exhibit both tonic and phasic action-potential firing patterns. As summarized by Grace et al., dopamine neurons recorded using in vivo electrophysiology can display a slow, irregular, single-spike “tonic” firing pattern, typically around 2–10 Hz, as well as burst-like “phasic” firing patterns [8].

      However, our recordings were performed using GRAB-DA2m fiber photometry. Therefore, our measurements reflect extracellular dopamine dynamics in the recorded target regions rather than the action-potential firing patterns of midbrain dopamine neurons. GRAB-DA2m has subsecond sensor kinetics and is suitable for detecting extracellular dopamine transients occurring over hundreds of milliseconds to seconds, as well as slower dynamics occurring over seconds to tens of seconds [1], which matches the timescale of the sleep–wake transition-related dynamics observed in previous studies [9,10]. Nevertheless, GRAB-DA2m fiber photometry in our study does not directly resolve dopamine neuron spike timing or distinguish tonic from phasic firing modes. Accordingly, we interpret our signals as extracellular dopamine concentration dynamics rather than as direct measurements of tonic or phasic neuronal firing.

      Therefore, the transition-aligned dopamine signals shown in Figure 1 should be interpreted as dopamine dynamics occurring over seconds-to-tens-of-seconds around sleep–wake transitions, rather than as dopamine neuron firing patterns. In addition, these traces represent GRAB-DA2m signals averaged across sessions and mice within a ±30 s window centered on each sleep/wake transition. Thus, they do not necessarily represent individual dopamine transient patterns on single transitions. We also acknowledge the reviewer’s observation that faster phasic-like components are visible in some traces, including the decreases in the NAc-L preceding REM-to-WAKE transitions. Direct electrophysiological recordings of dopamine neuron firing during sleep–wake transitions would be useful in future studies to determine how tonic and phasic firing modes contribute to the observed dopamine dynamics.
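
      The averaging described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function name, the flat list representation of the signal, and the edge-handling rule are all assumptions.

```python
def transition_aligned_average(signal, transition_idx, fs_hz, half_window_s=30.0):
    """Average signal segments in a +/- half_window_s window around each transition.

    signal: list of samples; transition_idx: sample indices of transitions;
    fs_hz: sampling rate. Transitions whose window would run past either end
    of the recording are skipped (an assumed rule, for illustration).
    """
    half = int(round(half_window_s * fs_hz))
    segments = []
    for idx in transition_idx:
        start, stop = idx - half, idx + half + 1
        if start < 0 or stop > len(signal):
            continue  # window does not fit inside the recording
        segments.append(signal[start:stop])
    if not segments:
        return []
    n = len(segments)
    # Point-by-point mean across all retained transitions.
    return [sum(column) / n for column in zip(*segments)]


# Toy example: a ramp sampled at 1 Hz with three hypothetical transitions.
toy_signal = list(range(200))
average_trace = transition_aligned_average(toy_signal, [50, 100, 190], fs_hz=1.0)
print(len(average_trace), average_trace[30])
```

      Because each point of the output is a mean across events, a brief transient that occurs at slightly different times on individual transitions is smeared or attenuated in the averaged trace, which is why single-transition patterns cannot be read directly from it.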

      Regarding the laser-evoked stimulation experiments shown in Figure 3, we thank the reviewer for the thoughtful interpretation. The results indicate that different stimulation durations can produce distinct dopamine release dynamics in downstream projection regions. Moreover, prolonged optogenetic stimulation was associated with more sustained dopamine responses, suggesting that the temporal profile of extracellular dopamine dynamics depends, at least in part, on the duration and region of dopaminergic input [1]. We also agree with the reviewer that the temporal profiles of the GRAB-DA2m signals vary not only across regions, but also across sleep/wake transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid dopamine decrease, whereas the REM-to-WAKE transition displays a slower decline.

      Similarly, faster dopamine changes are observed in some region/transition combinations, such as NAc-L during NREM-to-WAKE and DRN during REM-to-WAKE, whereas slower kinetics are observed in others, such as mPFC during WAKE-to-NREM and VTA during NREM-to-WAKE. Together, these effects reflect both region-specific mechanisms and transition-dependent differences in dopaminergic activity.

      While cross-correlation analysis provides insight into the temporal coordination of DA signals across regions, several limitations should be considered. Sleep/wake transitions are inherently non-stationary events, whereas cross-correlation assumes relatively stable signal properties within the analysis window. This mismatch may bias lag estimates and obscure transient lead-lag relationships. Moreover, the temporal resolution of fiber photometry and the kinetics of genetically encoded DA sensors limit the precision with which timing relationships can be interpreted, particularly for sub-second lags.

      We thank the reviewer for raising these important considerations. The temporal relationships between regional dopamine signals were assessed using cross-covariance analysis. We agree that cross-covariance analysis has limitations when applied to sleep/wake transitions, because these transitions are inherently non-stationary events. Although cross-covariance centers the signals by subtracting their means and is therefore less sensitive to baseline offsets than raw cross-correlation, it still summarizes the lag-dependent covariance between two signals over the selected analysis window. Therefore, the inferred lag should be interpreted as a transition-level measure of temporal coordination rather than a precise estimate of instantaneous lead–lag timing.
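
      The mean-centering step and the lag readout can be sketched in a few lines. This is a generic, biased cross-covariance estimator written for illustration; the function names, the sign convention for the lag, and the toy signals are assumptions and do not reproduce the authors' pipeline.

```python
def cross_covariance(x, y, max_lag):
    """Biased cross-covariance of two equal-length signals for lags in [-max_lag, max_lag].

    Convention (assumed here): a positive lag means y follows x by that many samples.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [v - mean_x for v in x]  # centering removes baseline offsets
    yc = [v - mean_y for v in y]
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t + lag  # pair x[t] with y[t + lag]
            if 0 <= u < n:
                total += xc[t] * yc[u]
        result[lag] = total / n
    return result


def peak_lag(xcov):
    """Lag (in samples) at which the cross-covariance is largest."""
    return max(xcov, key=xcov.get)


# Toy signals: y is x delayed by 3 samples and shifted by a constant baseline.
x = [0.0] * 50
x[19], x[20], x[21] = 0.5, 1.0, 0.5
y = [v + 5.0 for v in ([0.0] * 3 + x[:-3])]

print(peak_lag(cross_covariance(x, y, max_lag=10)))
```

      The single peak lag summarizes the whole window, so if the relationship between two regions changes partway through a transition, the estimate blends the two regimes. That is the sense in which the lag is a transition-level measure.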

      To minimize the influence of brief or unstable state fluctuations, we only included transitions in which both the preceding and following sleep/wake epochs lasted at least 30 s, and excluded epochs shorter than 30 s [4]. This criterion helped ensure that the analyzed events represented well-defined transitions between sustained behavioral states rather than transient or fragmented episodes. We acknowledge that dopamine signals may still change dynamically within the transition window, and that the temporal resolution of fiber photometry and the kinetics of genetically encoded GRAB-DA2m sensors limit the precision with which fine-scale timing relationships can be interpreted. Nevertheless, dopamine signals were relatively stable within each behavioral state, as shown in Fig. 1B and reported previously [1,9,10]. Thus, we believe that cross-covariance analysis provides useful information about the temporal coordination of dopamine dynamics across regions.
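
      The 30 s inclusion criterion can be expressed as a short filter. The sketch below assumes the scored hypnogram is available as a time-ordered list of (state, start, end) tuples in seconds; that representation and the function name are hypothetical.

```python
def select_stable_transitions(epochs, min_duration_s=30.0):
    """Keep transitions whose preceding and following epochs both last >= min_duration_s.

    epochs: time-ordered list of (state, start_s, end_s) tuples.
    Returns a list of (from_state, to_state, transition_time_s).
    """
    kept = []
    for previous, following in zip(epochs, epochs[1:]):
        prev_state, prev_start, prev_end = previous
        next_state, next_start, next_end = following
        if prev_state == next_state:
            continue  # not a state change
        long_enough_before = (prev_end - prev_start) >= min_duration_s
        long_enough_after = (next_end - next_start) >= min_duration_s
        if long_enough_before and long_enough_after:
            kept.append((prev_state, next_state, next_start))
    return kept


# Toy hypnogram with one brief 10 s awakening that should be excluded.
toy_epochs = [
    ("NREM", 0, 120),
    ("WAKE", 120, 130),
    ("NREM", 130, 400),
    ("REM", 400, 460),
    ("WAKE", 460, 600),
]
print(select_stable_transitions(toy_epochs))
```

      In this toy hypnogram the 10 s awakening removes both transitions that border it, so only the NREM-to-REM and REM-to-WAKE events remain.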

      In the Introduction, the authors state that they aim to address 'which dopaminergic populations causally drive these patterns.' However, the chemogenetic approach used operates on a relatively slow timescale: CNO-induced activation takes 15-30 minutes to produce effects, and the induced changes are long-lasting. In contrast, the dopamine transitions described in Figure 1 occur on a much faster timescale compared to CNO manipulation. Thus, while chemogenetic activation demonstrates that stimulating VTA or DRN dopamine neurons promotes wakefulness, it does not directly establish that these populations causally drive the rapid transition-related DA dynamics observed in the photometry recordings.

      We thank the reviewer for this thoughtful comment. We agree that chemogenetic manipulation operates on a much slower timescale than the rapid dopamine transients observed during sleep–wake transitions, and therefore does not directly recapitulate these fast dynamics. In particular, CNO-induced activation unfolds over minutes and produces sustained changes in neuronal activity, whereas the DA signals we report fluctuate on a sub-second to second timescale. Our intention with the chemogenetic experiments was not to mimic the precise temporal profile of endogenous DA signals, but rather to test whether increasing the activity of specific dopaminergic populations is sufficient to influence behavioral state.

      In this context, our results show that activation of VTA or DRN dopaminergic neurons robustly promotes wakefulness, supporting a causal role for these populations in sleep–wake regulation at the circuit level. However, we agree that these data do not by themselves establish that these neurons directly generate the rapid transition-related DA dynamics observed in the photometry recordings.

      Reviewer #2 (Public review):

      In "Brainwide dopamine dynamics across sleep-wake transitions", Chen et al. provide a thorough description of how dopamine dynamics fluctuate across sleep-wake transitions and in transitions between sleep states. To achieve this, the authors used multi-channel fiber photometry and a genetically encoded fluorescent dopamine reporter to simultaneously measure dopamine dynamics in 8 brain regions. They also used EEG measurements to precisely quantify and time transitions between sleep states and wakefulness. Finally, the authors used channelrhodopsin to examine dopamine dynamics following subregion stimulation and chemogenetics to test the causal relationship between activation of distinct dopamine neuron populations and their effects on sleep state.

      The conclusions made by the authors in this study are modest and appropriate given the largely observational nature of the principal findings. The use of optogenetics to probe regional dopamine signaling following activation of distinct nuclei is interesting, but not entirely novel and constrained in interpretability. Similarly, the chemogenetics experiment largely confirms previous studies, which the authors correctly cited in the text.

      The principal findings of this study are based on strong methodological and analytical methods. Implanting 8 optical fibers in a single mouse, along with EEG/EMG electrodes, is technically challenging, providing valuable, simultaneous measurements of dopamine fluctuations across the brain. This enables the strong correlational and time-locked analyses performed by the authors in Figure 2. What's more, the use of EEG/EMG electrodes provides time-locked descriptions of sleep states, enabling precise comparisons between the dopamine signal and sleep state transitions.

      The paper has some weaknesses that the authors could address. The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states. The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results. Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states.

      We appreciate the reviewer’s thoughtful suggestion. We agree that the directionality and kinetics of dopamine changes during sleep/wake transitions may provide important information beyond state-level dopamine quantification.

      In this study, mice were recorded for 4–5 h during each sleep session. Across the recording period, mice frequently transitioned from NREM to WAKE, WAKE to NREM, NREM to REM, and REM to WAKE. Transitions from WAKE to REM were rarely observed and therefore were not included in the transition analysis. Accordingly, we focused our analysis on the four major transition types: NREM-to-WAKE, WAKE-to-NREM, NREM-to-REM, and REM-to-WAKE [4,9,11].

      For each transition type, dopamine dynamics were analyzed separately by aligning the z-scored GRAB-DA2m signal to the transition onset and averaging across all epochs of the same transition type. To minimize the influence of brief or unstable state fluctuations, we excluded transitions in which either the preceding or following sleep/wake epoch lasted less than 30 s. The resulting transition-triggered dopamine traces were then averaged across sessions and mice for each transition type independently.
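
      As an illustration only, the alignment and exclusion steps described above could be sketched as follows. This is not the analysis code used in the study: the function name `transition_triggered_average`, the sampling-rate argument `fs`, and the ±10 s default window are hypothetical choices made for the example. Only the 30 s minimum epoch duration comes from the text.

```python
from statistics import fmean


def transition_triggered_average(signal, states, fs, from_state, to_state,
                                 min_epoch_s=30.0, window_s=10.0):
    """Average `signal` around each from_state -> to_state transition.

    signal : list of z-scored samples
    states : list of state labels, one per sample (same length as signal)
    fs     : sampling rate in Hz (hypothetical parameter)
    Returns (mean_trace, number_of_transitions_used).
    """
    if len(signal) != len(states):
        raise ValueError("signal and states must have the same length")

    # Run-length encode the state sequence into (label, start, end) epochs.
    epochs = []
    start = 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            epochs.append((states[start], start, i))
            start = i

    min_len = int(round(min_epoch_s * fs))   # minimum epoch length in samples
    half = int(round(window_s * fs))         # half-window length in samples

    snippets = []
    for (a, a0, a1), (b, b0, b1) in zip(epochs, epochs[1:]):
        if a != from_state or b != to_state:
            continue
        # Exclude transitions flanked by a brief epoch on either side.
        if (a1 - a0) < min_len or (b1 - b0) < min_len:
            continue
        onset = b0
        # Skip transitions too close to the edges of the recording.
        if onset - half < 0 or onset + half > len(signal):
            continue
        snippets.append(signal[onset - half:onset + half])

    if not snippets:
        return [], 0
    # Average sample-by-sample across all retained transitions.
    mean_trace = [fmean(column) for column in zip(*snippets)]
    return mean_trace, len(snippets)
```

      Calling the function once per transition type (for example, from "NREM" to "WAKE") keeps the four directions separate, in line with the description above; averaging across sessions and mice would be a further step applied to the returned traces.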

      Thus, the transition analysis preserves the directionality of state changes rather than pooling all sleep/wake transitions together. Because dopamine signals differ across behavioral states, transitions between neighboring states produce distinct temporal profiles when aligned to the transition point [4,9-11]. For example, REM-to-WAKE transitions may show a rapid increase in dopamine in the mPFC, whereas WAKE-to-NREM or NREM-to-REM transitions may show slower and more modest decreases. These transition-specific kinetics may reflect distinct underlying mechanisms, including changes in dopamine neuron firing or local terminal modulation.

      The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results.

      We agree with the reviewer that precise histological validation is essential for the correct interpretation of our optogenetic and chemogenetic findings.

      Regarding the chemogenetic experiments, as noted, we provide examples of virus expression in the VTA, DRN, and SNc in Figure 4. By demonstrating the consistency and restriction of our targeting across the entire cohort (VTA, SNc, and DRN), we confirmed that our observed sleep effects were regionally specific. Our data only included mice with accurate targeting and no substantial virus "leakage" into adjacent nuclei.

      We thank the reviewer for this insightful observation regarding the regional dopamine (DA) responses following SNc stimulation. While the SNc is traditionally associated with the dorsal striatum (DLS), several studies have demonstrated that SNc dopaminergic neurons also project to the nucleus accumbens, particularly the lateral shell [12,13]. Furthermore, recent work characterizing the functional heterogeneity of midbrain DA neurons suggests that SNc subpopulations can drive significant DA release in ventral striatal subregions [14]. We appreciate the reviewer’s caution regarding potential off-target effects. While our histological criteria for post-recording validation were stringent, we acknowledge that in any midbrain manipulation, the close anatomical proximity of the VTA and SNc makes it technically challenging to guarantee zero involvement of neighboring VTA neurons. However, by using only mice with the most restricted virus expression and fiber targeting, we have minimized this potential confound as far as is technically feasible with current viral and optogenetic methods.

      Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      We thank the reviewer for this comment. The experiments in Figure 3 were designed to systematically map the sources of dopaminergic inputs to key brain regions examined in this study [15], including the mPFC, DLS, NAc, and CeA. Establishing these input–output relationships is important for interpreting the photometry signals observed during sleep–wake transitions.

      Specifically, we found that optogenetic activation of VTA dopaminergic neurons elicits DA responses in all four regions, whereas activation of DRN dopaminergic neurons induces responses in the mPFC, DLS, and CeA, and activation of SNc dopaminergic neurons induces responses in the mPFC, NAc, and DLS. These results reveal partially overlapping but distinct projection patterns across dopaminergic populations.

      Taken together, these data provide a circuit-level framework suggesting that VTA, SNc, and DRN dopaminergic neurons may contribute differentially and with distinct weights to the DA signals observed in these regions during sleep–wake transitions.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      We appreciate the reviewer’s assessment that our study convincingly demonstrates how dopamine fluctuates across sleep states. We agree that the primary contribution of this work is descriptive and foundational. At the same time, we respectfully emphasize that rigorous, comprehensive descriptive studies are essential, particularly when addressing phenomena that have not been systematically characterized. Prior to this work, dopamine dynamics during natural sleep–wake transitions had not been measured simultaneously across multiple brain regions.

      Our multi-site photometry approach advances the field in several important ways. Technically, the combination of simultaneous eight-region fiber photometry with EEG/EMG recordings represents a substantial methodological advance, enabling brainwide, network-level analysis of dopamine dynamics during natural state transitions. This approach reveals emergent features, such as temporal coordination and inter-regional lead–lag relationships, that cannot be captured using single-site recordings. Moreover, integrating brain-wide measurements with region-specific manipulations allows circuit-level insights that would not be accessible from either approach alone.

      Conceptually, our findings reveal region-specific, transition-type-specific, and bidirectional dopamine dynamics, in contrast to the prevailing view of dopamine as a uniform arousal signal: dopamine decreases in certain limbic regions, such as the central amygdala and nucleus accumbens lateral shell, during arousal transitions, while increasing in cortical and other striatal regions. These results refine simplified models of dopaminergic regulation of arousal. In addition, our data reveal differential circuit contributions, with the VTA and DRN (but not the SNc) promoting wakefulness, highlighting functional specialization within the dopamine system.

      We acknowledge that some aspects of our study, including the optogenetic mapping and chemogenetic experiments, build on established methodologies and in part confirm prior findings. However, these experiments also provide several new insights. First, whereas individual dopamine sources have often been studied in isolation, our systematic comparison across VTA, SNc, and DRN using consistent methods reveals distinct brainwide functional contributions that were not previously established. Second, our optogenetic mapping does not simply recapitulate known projection patterns, but instead uncovers quantitative differences in dopamine release kinetics and magnitude across source–target pairs, which inform the heterogeneity of the transition dynamics. Finally, our findings provide a crucial anatomical and temporal framework for future research on the specific mechanisms driving these dynamics and their precise functional consequences.

      References:

      (1) Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 17, 1156-1166, doi:10.1038/s41592-020-00981-9 (2020).

      (2) Ihalainen, J. A., Riekkinen, P., Jr. & Feenstra, M. G. Comparison of dopamine and noradrenaline release in mouse prefrontal cortex, striatum and hippocampus using microdialysis. Neurosci Lett 277, 71-74, doi:10.1016/s0304-3940(99)00840-x (1999).

      (3) Berridge, C. W. & Abercrombie, E. D. Relationship between locus coeruleus discharge rates and rates of norepinephrine release within neocortex as assessed by in vivo microdialysis. Neuroscience 93, 1263-1270, doi:10.1016/s0306-4522(99)00276-6 (1999).

      (4) Silverman, D. et al. Activation of locus coeruleus noradrenergic neurons rapidly drives homeostatic sleep pressure. Sci Adv 11, eadq0651, doi:10.1126/sciadv.adq0651 (2025).

      (5) Anaclet, C. et al. The GABAergic parafacial zone is a medullary slow wave sleep-promoting center (vol 17, pg 1217, 2014). Nat Neurosci 17, 1841-1841, doi:10.1038/nn1214-1841d (2014).

      (6) Ma, C. Y. et al. Microglia regulate sleep through calcium-dependent modulation of norepinephrine transmission. Nat Neurosci 27, 249-258, doi:10.1038/s41593-023-01548-5 (2024).

      (7) Traut, J. et al. Effects of clozapine-N-oxide and compound 21 on sleep in laboratory mice. Elife 12, doi:10.7554/eLife.84740 (2023).

      (8) Grace, A. A., Floresco, S. B., Goto, Y. & Lodge, D. J. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci 30, 220-227, doi:10.1016/j.tins.2007.03.003 (2007).

      (9) Darmohray, D. et al. Brainstem circuit for sickness-induced sleep. Sci Adv 11, eady0245, doi:10.1126/sciadv.ady0245 (2025).

      (10) Hasegawa, E. et al. Rapid eye movement sleep is initiated by basolateral amygdala dopamine signaling in mice. Science 375, 994-+, doi:10.1126/science.abl6618 (2022).

      (11) Ding, X. et al. Neuroendocrine circuit for sleep-dependent growth hormone release. Cell 188, 4968-4979 e4912, doi:10.1016/j.cell.2025.05.039 (2025).

      (12) Poulin, J. F. et al. Mapping projections of molecularly defined dopamine neuron subtypes using intersectional genetic approaches. Nat Neurosci 21, 1260-1271, doi:10.1038/s41593-018-0203-4 (2018).

      (13) Lerner, T. N. et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell 162, 635-647, doi:10.1016/j.cell.2015.07.014 (2015).

      (14) Azcorra, M. et al. Unique functional responses differentially map onto genetic subtypes of dopamine neurons. Nat Neurosci 26, 1762-1774, doi:10.1038/s41593-023-01401-9 (2023).

      (15) Eban-Rothschild, A., Rothschild, G., Giardino, W. J., Jones, J. R. & de Lecea, L. VTA dopaminergic neurons regulate ethologically relevant sleep-wake behaviors. Nat Neurosci 19, 1356-1366, doi:10.1038/nn.4377 (2016).

    1. eLife Assessment

      The study presents important findings revealing previously unresolved conformational dynamics of the heterodimeric type IV ABC transporter TmrAB using single-molecule FRET. The evidence presented is solid, integrating careful experimental design with computational approaches to uncover states that are typically masked and difficult to detect. The work will be of interest to scientists studying the molecular mechanisms of primary active transport processes.

    2. Reviewer #1 (Public review):

      Summary:

      Pecak et al have deciphered the conformational dynamics of a heterodimeric model ABC transporter, TmrAB, a functional homolog of the human antigen transporter TAP, using single-molecule Förster resonance energy transfer (smFRET) with fluorophores attached to residues at either the nucleotide-binding domains or the periplasmic gate. The analysis not only differentiated ATP-free and bound states but also enabled the real-time monitoring of protein conformational changes, precisely dissecting transport cycles and resolving transient intermediates. This study is absolutely significant in providing and establishing a general pipeline delineating the conformational dynamics in heterodimeric ABC transporters.

      Strengths:

      The scientific study is very well documented for experimental design, results, and conclusions supported by the experimental data. The authors have determined the conformational dynamics of TmrAB across different ATP concentrations, including physiological ones, and resolved an outward open state and other conformational states consistent with previous cryoEM and DEER studies.

      Weaknesses:

      The scientific study needs a bit of in-depth analysis with respect to consistency in Kd and its implications on the mechanism.

    3. Reviewer #2 (Public review):

      In their manuscript entitled 'ATP-driven conformational dynamics reveal hidden intermediates in a heterodimeric ABC transporter', Pečak et al. use elegant single-molecule FRET experiments in detergent to investigate the heterodimeric ABC transporter TmrAB. By combining simulations of the transporter's accessible volume with elegant trapping strategies, the authors identify an unresolved outward-facing open state and conclude that it is usually obscured by a rapidly interconverting ATP-bound ensemble. Overall, the study demonstrates that smFRET can resolve the short-lived intermediate states of TmrAB and potentially other ABC transporters that are obscured in ensemble measurements.

      It is a very interesting study that highlights the power of combining high-resolution structural information with spectroscopic approaches. I have three major points and a few minor criticisms.

      Major points:

      (1) The main weakness is that the authors base their conclusions on a very limited set of FRET pairs. While TmrAB has been extensively studied in terms of its structure, the authors should at least acknowledge this limitation more clearly.

      (2) Most smFRET distributions were fitted with one, two, or three Gaussians. However, in several cases, additional populations with noticeable amplitudes appear to be present (e.g., Figure 3c at 0.1 mM and 3 mM ATP; Figure 4a, apo; Figure 4c, 0.3 mM R9L). Could the authors clarify why these populations were not included in the analysis?

      (3) Figure 3c (3 mM ATP): Is it truly possible to distinguish the two states in this distribution?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pecak et al have deciphered the conformational dynamics of a heterodimeric model ABC transporter, TmrAB, a functional homolog of the human antigen transporter TAP, using single-molecule Förster resonance energy transfer (smFRET) with fluorophores attached to residues at either the nucleotide-binding domains or the periplasmic gate. The analysis not only differentiated ATP-free and bound states but also enabled the real-time monitoring of protein conformational changes, precisely dissecting transport cycles and resolving transient intermediates. This study is absolutely significant in providing and establishing a general pipeline delineating the conformational dynamics in heterodimeric ABC transporters.

      We thank the reviewer for this accurate and thoughtful summary of our work and its broader significance. We agree that the combination of single-molecule FRET with orthogonal validation approaches enables mechanistic resolution of conformational states and transitions that are not accessible by ensemble measurements. In particular, this framework allows direct discrimination of ATP-free and ATP-bound conformations, real-time tracking of transport cycle progression, and identification of transient intermediates in the heterodimeric ABC transporter TmrAB. We further agree that these capabilities support a generalizable strategy for dissecting conformational dynamics in related ABC transporters.

      Strengths:

      The scientific study is very well documented for experimental design, results, and conclusions supported by the experimental data. The authors have determined the conformational dynamics of TmrAB across different ATP concentrations, including physiological ones, and resolved an outward open state and other conformational states consistent with previous cryoEM and DEER studies.

      Weaknesses:

      The scientific study needs a bit of in-depth analysis with respect to consistency in K<sub>d</sub> and its implications on the mechanism.

      The apparent K<sub>d,ATP</sub> values were determined using two complementary approaches that report on different aspects of the system. Ensemble FRET measurements yielded values of 51 ± 38 µM (TmrAB<sup>NBD</sup>), 68 ± 25 µM (TmrAB<sup>PG</sup>), and 95 ± 26 µM (TmrAB<sup>PG_EQ</sup>), which are in good agreement with previously reported biochemical estimates (~100 µM for TmrAB<sup>EQ</sup>) (Stefan et al, 2020). The slightly elevated value observed for the E→Q variant may reflect modest perturbation of nucleotide handling in this slow-turnover background. Notably, the close agreement between labeled and unlabeled variants indicates that fluorophore attachment does not measurably affect ATP binding.

      In contrast, smFRET-derived K<sub>d,ATP</sub> values (13 ± 1 µM for TmrAB<sup>NBD</sup> and 2 ± 1 µM for TmrAB<sup>PG</sup>) are systematically lower. This difference likely arises from the difficulty of deconvoluting overlapping FRET populations at sub-K<sub>d,ATP</sub> concentrations, particularly for TmrAB<sup>PG</sup>, where state assignment is less well separated. Despite this quantitative offset, both approaches consistently indicate ATP saturation well below physiological concentrations and therefore support the same mechanistic conclusion that ATP binding drives conformational switching in TmrAB.
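
      To make the saturation argument concrete, the minimal sketch below evaluates fractional occupancy for each apparent K<sub>d,ATP</sub> quoted above. It assumes a simple one-site binding isotherm, which is an assumption made here for illustration because the binding model is not stated in this response. The function name `fractional_occupancy` is hypothetical, and 3 mM ATP is used only because it is the highest concentration mentioned for Figure 3c.

```python
def fractional_occupancy(atp_um, kd_um):
    """Fraction of ATP-bound transporter under an assumed one-site model.

    occupancy = [ATP] / (Kd + [ATP]), with both quantities in µM.
    """
    if atp_um < 0 or kd_um <= 0:
        raise ValueError("ATP must be non-negative and Kd must be positive")
    return atp_um / (kd_um + atp_um)


# Apparent Kd,ATP values (µM) as quoted in the response above.
KD_VALUES_UM = {
    "ensemble TmrAB-NBD": 51.0,
    "ensemble TmrAB-PG": 68.0,
    "ensemble TmrAB-PG_EQ": 95.0,
    "smFRET TmrAB-NBD": 13.0,
    "smFRET TmrAB-PG": 2.0,
}

ATP_UM = 3000.0  # 3 mM ATP, chosen for illustration

for label, kd in KD_VALUES_UM.items():
    print(f"{label}: occupancy = {fractional_occupancy(ATP_UM, kd):.3f}")
```

      Under this assumed model, every K<sub>d,ATP</sub> listed gives an occupancy above 96% at 3 mM ATP, so the quantitative offset between the ensemble and smFRET estimates does not change the qualitative picture of saturation.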

      Reviewer #2 (Public review):

      In their manuscript entitled 'ATP-driven conformational dynamics reveal hidden intermediates in a heterodimeric ABC transporter', Pečak et al. use elegant single-molecule FRET experiments in detergent to investigate the heterodimeric ABC transporter TmrAB. By combining simulations of the transporter's accessible volume with elegant trapping strategies, the authors identify an unresolved outward-facing open state and conclude that it is usually obscured by a rapidly interconverting ATP-bound ensemble. Overall, the study demonstrates that smFRET can resolve the short-lived intermediate states of TmrAB and potentially other ABC transporters that are obscured in ensemble measurements.

      It is a very interesting study that highlights the power of combining high-resolution structural information with spectroscopic approaches. I have three major points and a few minor criticisms.

      We thank the reviewer for the thoughtful and constructive evaluation of our manuscript and for highlighting the strength of combining structural and single-molecule approaches. We have addressed all major and minor points in detail below and revised the manuscript where appropriate to clarify limitations, justify analysis choices, and improve transparency.

      Major points:

      (1) The main weakness is that the authors base their conclusions on a very limited set of FRET pairs. While TmrAB has been extensively studied in terms of its structure, the authors should at least acknowledge this limitation more clearly.

      We agree that our conclusions are based on a limited number of FRET reporter pairs, and we now explicitly state this limitation in the revised manuscript. The chosen labeling positions were selected to probe two functionally critical regions—the nucleotide-binding domains and the periplasmic gate—based on prior structural and spectroscopic evidence. While this represents sparse sampling of the full conformational space, it is consistent with typical smFRET studies of membrane transporters, where experimental constraints generally limit the number of simultaneously accessible labeling positions (Asher et al, 2021; Asher et al, 2022; Levring et al, 2023; Wang et al, 2020).

      Importantly, both independent reporter variants yield consistent ATP-dependent population shifts, supporting the robustness of the observed trends. We further clarify that additional labeling sites could, in principle, resolve finer structural sub-states; however, given the already limited population separation in the current variants, such extensions would likely provide diminishing returns in state resolvability under the present experimental conditions. This trade-off is now explicitly discussed.

      (2) Most smFRET distributions were fitted with one, two, or three Gaussians. However, in several cases, additional populations with noticeable amplitudes appear to be present (e.g., Figure 3c at 0.1 mM and 3 mM ATP; Figure 4a, apo; Figure 4c, 0.3 mM R9L). Could the authors clarify why these populations were not included in the analysis?

      We thank the reviewer for this careful observation. Low-amplitude subpopulations are occasionally detected in individual histograms; however, they were not included in the quantitative model because they do not meet criteria for reproducibility, amplitude robustness, or structural assignability. Specifically, these features vary between replicates, contribute minimally to total population, and cannot be mapped to structurally or biochemically defined states based on available cryo-EM (Hofmann et al, 2019), DEER/PELDOR (Barth et al, 2018; Barth et al, 2020), or accessible-volume simulations.

      Similar minor subpopulations have been reported in smFRET studies and often attributed to photophysical or labeling heterogeneity effects (Asher et al, 2022; Husada et al, 2018). To avoid over-parameterization, we therefore restricted analysis to reproducible, structurally supported states. This rationale is now clarified in the revised manuscript.

      (3) Figure 3c (3 mM ATP): Is it truly possible to distinguish the two states in this distribution?

      We agree that state separation in the TmrAB<sup>PG</sup> variant is limited (ΔE = 0.11), and we now explicitly acknowledge this constraint in the manuscript. To improve robustness under these conditions, we used a constrained fitting strategy in which the apo-state distribution was fixed from nucleotide-free measurement, reducing parameter degeneracy during fitting of ATP-bound datasets.
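
      The idea of fixing one component can be illustrated with a short sketch. The fitting algorithm is not specified in this response, so expectation-maximization on per-molecule FRET efficiencies is swapped in here purely as an example. The names `normal_pdf` and `fit_bound_state`, the starting values, and the iteration count are all hypothetical.

```python
import math
from statistics import fmean, pstdev


def normal_pdf(x, mu, sd):
    """Gaussian probability density at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_bound_state(efficiencies, apo_mu, apo_sd, n_iter=200):
    """Two-Gaussian mixture fit with the apo component held fixed.

    Only the second (ATP-bound) component's weight, mean, and width are
    estimated; the apo mean and width stay at the supplied values.
    Returns (bound_fraction, bound_mu, bound_sd).
    """
    n = len(efficiencies)
    # Hypothetical starting point: a broad free component centred on the data.
    w = 0.5
    mu = fmean(efficiencies)
    sd = max(pstdev(efficiencies), 1e-3)

    for _ in range(n_iter):
        # E-step: responsibility of the free (bound-state) component.
        resp = []
        for x in efficiencies:
            p_free = w * normal_pdf(x, mu, sd)
            p_apo = (1.0 - w) * normal_pdf(x, apo_mu, apo_sd)
            total = p_free + p_apo
            resp.append(p_free / total if total > 0.0 else 0.5)

        # M-step: update only the free component; apo parameters are untouched.
        r_sum = sum(resp)
        if r_sum <= 0.0:
            break
        w = r_sum / n
        mu = sum(r * x for r, x in zip(resp, efficiencies)) / r_sum
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, efficiencies)) / r_sum
        sd = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component

    return w, mu, sd
```

      Holding the apo mean and width fixed leaves three free parameters instead of five, which is the reduction in parameter degeneracy referred to above. With a separation as small as ΔE = 0.11 the remaining estimates can still be strongly correlated, so a sketch like this would need to be checked against replicates.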

      While single-molecule trajectory-based approaches such as Hidden Markov Modeling would be ideal for resolving dynamic interconversion, this was not feasible due to the low fraction of dynamic traces at the available temporal resolution. We therefore rely on population-level analysis, which remains consistent across replicates and reporter variants.

      Notably, independent measurements from two reporter positions (TmrAB<sup>NBD</sup> and TmrAB<sup>PG</sup>) yield similar ATP-bound population fractions at saturating ATP concentrations (~77% vs. ~80%), supporting the robustness of the inferred state distribution despite partial overlap.

      References

      Asher WB, Geggier P, Holsey MD, Gilmore GT, Pati AK, Meszaros J, Terry DS, Mathiasen S, Kaliszewski MJ, McCauley MD, Govindaraju A, Zhou Z, Harikumar KG, Jaqaman K, Miller LJ, Smith AW, Blanchard SC, Javitch JA (2021) Single-molecule FRET imaging of GPCR dimers in living cells. Nat Methods 18: 397–405. doi:10.1038/s41592-021-01081-y

      Asher WB, Terry DS, Gregorio GGA, Kahsai AW, Borgia A, Xie B, Modak A, Zhu Y, Jang W, Govindaraju A, Huang LY, Inoue A, Lambert NA, Gurevich VV, Shi L, Lefkowitz RJ, Blanchard SC, Javitch JA (2022) GPCR-mediated beta-arrestin activation deconvoluted with single-molecule precision. Cell 185: 1661–1675 e1616. doi:10.1016/j.cell.2022.03.042

      Barth K, Hank S, Spindler PE, Prisner TF, Tampé R, Joseph B (2018) Conformational coupling and trans-inhibition in the human antigen transporter ortholog TmrAB resolved with dipolar EPR spectroscopy. J Am Chem Soc 140: 4527–4533. doi:10.1021/jacs.7b12409

      Barth K, Rudolph M, Diederichs T, Prisner TF, Tampé R, Joseph B (2020) Thermodynamic basis for conformational coupling in an ATP-binding cassette exporter. J Phys Chem Lett 11: 7946–7953. doi:10.1021/acs.jpclett.0c01876

      Hofmann S, Januliene D, Mehdipour AR, Thomas C, Stefan E, Brüchert S, Kuhn BT, Geertsma ER, Hummer G, Tampé R, Moeller A (2019) Conformation space of a heterodimeric ABC exporter under turnover conditions. Nature 571: 580–583. doi:10.1038/s41586-019-1391-0

      Husada F, Bountra K, Tassis K, de Boer M, Romano M, Rebuffat S, Beis K, Cordes T (2018) Conformational dynamics of the ABC transporter McjD seen by single-molecule FRET. EMBO J 37: e100056. doi:10.15252/embj.2018100056

      Levring J, Terry DS, Kilic Z, Fitzgerald G, Blanchard SC, Chen J (2023) CFTR function, pathology and pharmacology at single-molecule resolution. Nature 616: 606–614. doi:10.1038/s41586-023-05854-7

      Nocker C, Pečak M, Nocker T, Fahim A, Sušac L, Tampé R (2026) Single-molecule dynamics reveal ATP binding alone powers substrate translocation by an ABC transporter. Nat Commun 17 doi:10.1038/s41467-026-70021-1

      Nöll A, Thomas C, Herbring V, Zollmann T, Barth K, Mehdipour AR, Tomasiak TM, Bruchert S, Joseph B, Abele R, Olieric V, Wang M, Diederichs K, Hummer G, Stroud RM, Pos KM, Tampé R (2017) Crystal structure and mechanistic basis of a functional homolog of the antigen transporter TAP. Proc Natl Acad Sci U S A 114: E438–E447. doi:10.1073/pnas.1620009114

      Stefan E, Hofmann S, Tampé R (2020) A single power stroke by ATP binding drives substrate translocation in a heterodimeric ABC transporter. eLife 9: e55943. doi:10.7554/eLife.55943

      Wang L, Johnson ZL, Wasserman MR, Levring J, Chen J, Liu S (2020) Characterization of the kinetic cycle of an ABC transporter by single-molecule and cryo-EM analyses. eLife 9: e56451. doi:10.7554/eLife.56451

    1. Reviewer #2 (Public review):

      Summary:

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation (HD-tDCS) targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They show that HD-tDCS reduces task aversiveness and increases task-execution willingness on real-world tasks as quantified by intensive experience sampling methods, providing causal evidence for the role of DLPFC in modulating contextual features relevant to delaying or completing one's goals.

      Strengths:

      • This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The intensive experience sampling approach which probes and assesses self-relevant task goals is innovative and aims to address an important question regarding the specific role of DLPFC in modulating specific features of chronic procrastination behavior (e.g., task-execution willingness, task aversiveness).

      • The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Weaknesses:

      • While the finding that neurostimulation reduces procrastination behavior is compelling, there remain several alternative interpretations for these effects. For example, it could be that task-execution willingness isn't increased per se, but rather that goal completion becomes more valuable as participants learn from feedback or become more aware of their successful attainment of or failure to complete task goals. It is unclear whether the effects could be driven by improved working memory or attention to the reported tasks (and this limitation is addressed by the authors). It is also difficult to examine the temporal dynamics of how these goals are selected across time.

      • It is unclear whether the current evidence supports long-term retention of this neurostimulation intervention. The study includes one 6-month timepoint after the study to examine the long-term retention of the neural stimulation effect. Future studies that evaluate the long-term effects across multiple time points would strengthen the evidence for the robustness of this intervention.

    2. eLife Assessment

      This valuable cross-sectional longitudinal study leverages high-definition transcranial direct current stimulation to the left dorsolateral prefrontal cortex to examine its effect on procrastination behavior over an extended time span. The cross-sectional longitudinal study provided evidence for how stimulating DLPFC impacts real-world procrastination behavior. Support for the conclusions is incomplete owing to missing information about the analyses and results, as well as some potential alternative interpretations.

    3. Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Comments on revisions:

      Overall, I think the authors made many improvements to their manuscript. There are, however, still a number of concerns that first need to be addressed, since it is still not currently possible to fully evaluate the analyses, results, and conclusions presented in the paper. I list these points below:

      (1) The authors still use causal language where they must not use causal language. This is true for many places in the manuscript; I am highlighting here just a few places, but the authors nevertheless have to go carefully through the whole manuscript to change these instances.

      Some examples:

      (a) In response to my comment (1) in the previous round, where the authors adjusted their text, the authors still use causal language in their last sentence "... procrastination behavior has been observed to impair general health..." Unless the cited study truly allowed causal conclusions, the causal language should be removed here as well.

      (b) The authors still make (causal) claims about the involvement of self-control in their observed results. To reiterate from the previous round of revisions: The authors cannot make any strong claims about the role of self-control processes because they neither directly measure nor directly manipulate self-control, and their design does not rule out alternative mechanisms other than self-control. Therefore, their claims about self-control have to be toned down. It is laudable that the authors have added a statement towards the end of their discussion about not being able to make strong conclusions about the role of self-control. But the authors need to use similarly careful wording not just at the end of the discussion but throughout the manuscript.

      (i) In the abstract, the authors use the formulation "...conceptualized roles of self-control on procrastination..." -- this wording is still too strong, suggesting that you actually studied self-control.

      (ii) In the introduction (page 4, lines 162-169), the way the authors formulate these sentences suggests that they directly measured self-control. Again, the authors need to make it explicit that they are not directly measuring self-control but its hypothesized downstream consequences on valuations/behavior.

      (iii) In the discussion, for example, on page 11, lines 555 and following, the authors write:

      "One major contribution this study has made is to disentangle the neurocognitive mechanism of procrastination by demonstrating that self-control could increase task-outcome value so as to reduce procrastination."

      Again, please be aware that you are NOT demonstrating that self-control does anything, since you only measure procrastination rates, outcome values, and task aversiveness. It is possible that mechanisms other than self-control might be relevant for this. Perhaps neuromodulation directly increases outcome values, without involvement of self-control processes. You simply cannot know that and therefore you cannot make those claims in the form that you are making them. You can write that the observed results are consistent with the idea that neuromodulation might have had an effect on self-control and this in turn might have affected outcome values. But you also need to make it explicit that, to substantiate these claims, you would need more direct evidence that indeed self-control was involved. These more careful formulations would not at all reduce the value of your work, but indeed they would rather demonstrate your carefulness in interpreting the results you obtained.

      (2) I am still puzzled by the power analysis. In the text, you write that a sample size of 18 participants (i.e., 9 per group) would be sufficient to achieve 80% power. I still feel this seems far too optimistic and hard to believe, but that is not my point here. While in the text, you write that you need 18 participants, the G*power output seems to suggest a sample size of 34, not 18. Why this contradiction? Or is it not contradictory? If it is not, then please explain it more fully.

      (3) I have several comments about the mixed-effects analysis.

      First of all, I want to thank the authors for adding more details; things have become much clearer now. However, I still have a few questions and comments related to these analyses:

      (a) The variable Emotions was within-subjects, as far as I understood. Accordingly, Emotions should most likely be modelled with random slopes varying over participants (in addition to being modelled as a fixed effect).

      (b) The analyses still cannot fully be evaluated as I cannot access the scripts and data. The authors mention that the scripts and data should be available via a link they provide (https://doi.org/10.57760/sciencedb.35140). However, when I try to access these materials via this link, no page opens; it seems the link is dead?

      (c) What are the results and conclusions if you do not include the covariates of no interest? I.e., please re-run your main models without age, gender, SES, Emotions.

      (d) The authors mention that they use GLMMs, which would suggest generalized mixed-effects models, but they do not describe what family/distribution they used. Since they mention lmerTest and seem to report F-tests, my guess is that they used Gaussian models. However, both their DVs (procrastination rates and their ratings) are bounded variables and at least procrastination rates hit the lower boundary. That can mean that their analyses suffer from inflated Type 1 and/or Type 2 error rates. Therefore, please repeat the analyses with an appropriate generalized mixed-effects model (perhaps a beta regression type of model?).

      (e) When reporting the results of the mixed-effects models, the authors report the regression coefficient, standard error, DFs and p value, but not the actual test statistic. Please add the information about the test statistic and report all degrees of freedom (in case of F tests that would be the degrees of freedom of the test and the residual degrees of freedom).

      (f) Thank you for adding the analysis where you remove the last two sessions. But currently you present it in the manuscript without explaining/motivating why you conducted it. Please add this motivation, as otherwise it will be puzzling for the reader why you conduct this analysis.

      (4) Mediation analysis

      In your manuscript, you present some mediation analyses. Please be aware that such mediation analyses cannot establish causality and they suffer from extremely high Type 1 error rates (see, e.g., https://datacolada.org/103).

      My suggestion would be to completely remove all mediation analyses. However, if you want to keep them, then you need to be extremely careful in how you present the results. You need to explicitly mention that you cannot derive any causal conclusions from them and that simulation studies have shown that such mediation analyses suffer from extremely high Type 1 errors.

      As an example (but the mediation results are mentioned in several places, for example, also in the abstract):

      On page 10, lines 501-503: What you can causally conclude is that neuromodulation affects your measured variables (outcome values, procrastination rates, task aversiveness), but you cannot conclude that the effect of neuromodulation on procrastination rates causally operates via outcome values. Thus, please adjust the formulation accordingly. The same applies to the mediation section that follows right afterwards (page 10, lines 505-522).

      (5) In the introduction, the authors introduce several theoretical procrastination frameworks (TMT, mood repair, TDM). Do the results of the current paper help to decide which framework might be the most appropriate, at least for the authors' data set? It might be of interest to address this explicitly.

      (6) The language is sometimes hard to understand and seems grammatically incorrect in quite a few places. Thus, I think the paper would profit very much from thorough English proofreading.

    1. eLife Assessment

      This study provides direct and compelling evidence that lamellipodial protrusions dynamically adjust Arp2/3 complex incorporation in response to mechanical counterforces, while also modulating cellular responsiveness to upstream signals like Rac GTPase. By combining endogenous labeling, live-cell imaging, and optogenetic signaling activation, the work demonstrates how adhesion state and physicochemical perturbations reproducibly alter branched actin organization, offering a fundamental advance over previous works. The findings deliver significant insights that will resonate broadly with cell biologists and biochemists studying actin dynamics and mechanotransduction.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study describing intensity changes of lamellipodial Arp2/3 complex incorporation dependent on the substratum the cells are spreading on (PLL vs fibronectin), but also on manipulation of either contractility or osmotic pressure or even external mechanical load exerted onto cells, e.g., by increasing medium viscosity. The authors use quite fancy cell systems for their studies, first of all, a CRISPR-engineered fibroblast cell line in which both endogenous loci of the Arp2/3 complex subunit Arpc2 are tagged with mScarlet, but at the same time, conditionally removable using tamoxifen. These lines, optionally also harboring Pxn-GFP and Lifeact-miRFP670, have previously been described by the authors (Chandra et al, 2022, PMID: 34861242). In addition, they use cells allowing local photoactivation of Rac signalling through a Tiam1 activation module combined with Halo-tagged Arpc2, apparently stably co-expressed in tamoxifen-treated Arpc2-KO fibroblasts. These cells may or may not have been published previously.

      Overall, the study provides convincing evidence that Arp2/3 complex accumulation in the lamellipodium negatively correlates with its width and perhaps the mechanical load these actin networks are exposed to at the leading edge membrane, shown initially through allowing cells to spread on substrates in which the formation of integrin-based adhesions is poor (PLL) or stimulated (through fibronectin). In the latter case, lamellipodia are comparably narrow, perhaps reasonably well clutched, and thus feel sufficient counter-force at the leading edge membrane to build a dense, Arp2/3-dependent actin network. Albeit interesting and important to show as the authors did, these results are not entirely surprising given the literature published on actin remodeling in cells in conditions similar to those used by the authors (i.e., on PLL). Thus, the results should be better embedded into the context of this previous literature to more precisely reveal which aspects are new and interesting and which ones are more or less intuitive and expected.

      However, the authors also show yet another result, which is quite spectacular indeed, revealing dramatic local protrusion of a Rac-dependent lamellipodium on PLL only in the presence of methylcellulose, but not on PLL alone. Although the authors cannot fully explain the mechanisms causing these results, they are thought-provoking and will certainly stimulate future, relevant research on this topic and new insights. Altogether, I think this is an interesting study that can be shared rapidly, given that the authors provide more experimental detail and transparency concerning their used cell model systems. Aside from a few other suggestions for amendments and corrections, I would also recommend citing classical literature that has provided the basis for the interpretation of the results shown here, as specified below.

      Specific criticism and comments:

      (1) I feel the paper is interesting for actin remodeling and Arp2/3 complex aficionados, but quite difficult to read and to understand in places for non-experts in the field, so I think the text requires more detailed explanation of specific terms, model systems used, and overall correction of either grammatical or semantic errors, or colloquial language.

      (2) In general, I think the characterization of Arp2/3 complex incorporation into the lamellipodia of cells spreading on PLL versus FN is interesting, as it has not been done previously in such a systematic fashion to my knowledge. However, I think the authors could emphasize better how this relates to previously established structural features of actin filament networks, published on PLL. More than three decades ago, Hotchin & Hall published clear evidence that starved fibroblasts can only form focal complexes or adhesions downstream of PDGF or LPA-stimulation if seeded on FN, but not on PLL (see Figure 1 in PMID: 8557752). Around the same time, Flinn and Ridley showed this virtual absence of classical, Rac-dependent focal complexes to be accompanied by the formation of beautiful, broad lamellipodia (see Fig. 1A in PMID: 8743960), which only formed in the absence of excess RhoA activity and thus contractility, by the way (see also below). A few years later, Small et al summarized all these phenotypes in a comprehensive review and also showed that cells on PLL (similar to the rapidly migrating keratocytes) combined large, flat lamellipodia with tiny, nascent adhesions scattered throughout these structures (see Figure 2 in PMID: 10047522). These authors also noted that the sole inhibitor-mediated reduction of contractility could switch FN-phenotypes with narrow, ruffling lamellipodia and peripheral focal complexes back to a PLL-type phenotype of broad lamellipodia (see Figure 1 in PMID: 10047522).

      In the following decade, different labs (Verkhovsky, Bershadsky, Vavylonis, Watanabe et al) showed beautiful phase contrast or fluorescence movies illustrating that the broad lamellipodial phenotype of cells plated on PLL was accompanied by low-frequency membrane ruffling and instead a rapid, continuous rearward flow of continuously assembling actin filament networks, partly also directly shown with actin networks labeled with both LifeAct and Arp2/3 complex subunits (see e.g. PMIDs 18800171 and 22500749). In Alexandrova et al, 2008 (PMID 18800171), the authors showed that the formation of adhesions in spreading cells triggers the transition from fast to slow flow (which is of course relevant to the current study and conclusions), whereas Ryan et al, 2012 (PMID 22500749) already established the broad incorporation of actin and Arp2/3 complex into the very broad lamellipodia formed on PLL by Xenopus fibroblasts and the rapid flow of both components from distal to proximal lamellipodial regions. None of these seminal studies has been cited, although they are highly relevant for the interpretation and conclusions of the results presented. I would strongly recommend specifically referring to these studies, as this will actually support the conclusions and interpretations drawn.

      (3) On the subject of literature, on the second page of the intro, end of 2nd paragraph, the authors describe Rac signaling to Arp2/3 complex through WRC, which is considered essential for Arp2/3-mediated actin assembly at lamellipodial leading edges, but aside from one of their own papers, they cite none of the seminal studies from the Insall, Scita, Stradal, Rottner, and Bogdan labs that established key aspects of this pathway.

      Considering the rapid F-actin flow in lamellipodia, obviously accompanied by admittedly sparse but continuous Arp2/3 complex incorporation, it is not so surprising that the latter will be obligatory here, and also the accumulation of its prominent activator WRC, as well as the branch stabilizer cortactin. Thus, the data described on page 3 of the Results section could also be framed in the context of all this previously published knowledge, providing a more comprehensive and realistic view of the relevance and novelty of the described data.

      (4) In the abstract, the authors state in the context of the force-feedback mechanism established in vitro for the formation of Arp2/3 complex-dependent actin networks that "this phenomenon has not been explored through the examination of real-time responses of endogenous actin networks in cells". In my view, this is not correct, as in their prominent Cell paper, the Sixt laboratory has done exactly that (Mueller et al, 2017, PMID: 28867286). Although Mueller et al did not look at Arp2/3 complex dynamics as far as I recall, they still connected the extent and hence intensity of actin networks at the leading edges of keratocyte lamellipodia with the forces exerted onto them, including direct experimental manipulation of those forces. Although the study has been cited in an independent context, this point should be made clear, and the corresponding sentence in the abstract should be amended.

      (5) One point that struck me a little bit was the authors' detailed description of cell spreading on PLL and the quite strong variability of Arp2/3 incorporation dependent on the timing after spreading (as for instance the very strong and quite narrow Arp2/3 leading edge intensity at 2 hours post-seeding in Figure 3S2D). In the authors' view, they have worked with a very clean system, as they emphasized that they even eliminated the FN locus in their cells, excluding the secretion of endogenous FN (PMID: 34861242), but what about ECM components potentially present in serum, such as, for instance, vitronectin? Indeed, it looks like the authors have done all experiments in the presence of 10% serum as far as I can see, although most of the classical PLL-experiments mentioned above have been performed with starved cells in the absence of serum. I think it would generate a more complete picture of the phenotypes and results as compared to the literature if the authors performed a subset of the key experiments on PLL without serum. I don't think the starving of cells as such is important, and it could be counteracted by simply adding lamellipodia-inducing growth factors to the spreading medium, traditionally perhaps PDGF or EGF (dependent on the receptor distribution of those fibroblasts). The absence of serum would, however, have two advantages: it would not only exclude any potential impact of serum-containing ECM components, but also alleviate the hyperstimulation of the Rho-pathway through LPA-bound BSA, the major serum-protein, which has previously been shown to counteract the "undisturbed" formation of PLL-type lamellipodia (see Figure 1B in Flinn & Ridley, PMID: 8743960).

      (6) Regarding the scanning EM-images shown in the Supplement, currently called Figure 3S2A and -B (in the text erroneously termed Figures 3S1A and -B, see above), I am not sure how representative these individual EM-images of the cell plated on PLL are, given the data of rapid rearward flow of actin and Arp2/3 complex subunits, at least at early stages of spreading. Again, the classical literature on PLL-type lamellipodia and, in particular, previously published movies of such lamellipodia suggest broad lamellipodia with few ruffles, and the opposite with cells plated on FN. So in this context, the scanning EM-data shown on both PLL and FN fit neither the authors' own data nor the literature very well, and I would recommend making sure that the individual cells selected were (i) correctly annotated and (ii) representative of a specific time point of spreading actually fitting the previously described data.

      (7) It also surprised me to see that the authors describe the spreading process on PLL to actually be much slower than on FN (see Figure 3S2C - in the text Figure 3S1C). It is tempting to speculate that this might change if plating the cells in serum-free medium, as traditionally, full spreading and lamellipodia formation downstream of PDGF-stimulation (at least in 3T3 fibroblasts) is described to occur in the range of 10-30 minutes at maximum, and not several hours as shown here. This point could also be considered, or at least discussed.

      (8) The movies are of very high quality and beautiful to look at, but it would help the reader to get a bit more information in the legends, like the meaning of the time-stamps (which will display elapsed time in minutes:seconds, I assume, but this info is missing from the legends as far as I can see). Also, it would help the reader to better mark in the movies when a specific treatment kicks in. For instance, in movie 10, the legend states treatment starts at 10:00 (minutes:seconds?), but it would help very much if the authors could paste the term "blebbistatin" directly into the movie, beginning with the frame of treatment start.

    3. Reviewer #2 (Public review):

      The authors work with endogenously labeled Arp2/3 complexes in mouse fibroblast cell lines plated on surfaces coated with fibronectin or poly-L-lysine. They observe increased retrograde flow, but decreased actin and Arp2/3 densities, in the absence of integrin-based adhesions. Interestingly, they further find that an increase in branching density can be achieved in the absence of adhesion by a diverse set of perturbations, including blebbistatin, physical compression under agarose, and methylcellulose-mediated increases in extracellular viscosity. Although all of these conditions are likely to have pleiotropic effects on cell physiology and signaling, one plausible common denominator is that they promote cell spreading and may thereby increase membrane tension.

      This study addresses a question of broad interest. The relationship between protrusive actin assembly, resisting forces, and membrane tension has received considerable attention in recent years (for a recent overview, see PMID: 38991476). Earlier work established that branched actin networks can respond to force by increasing network density in vitro (PMID: 26771487; PMID: 35748355), and pioneering work from the Sixt laboratory showed that keratocyte lamellipodia adapt to resisting forces by increasing actin density in cells (PMID: 28867286). Against that background, the manuscript contains novel and insightful observations. At the same time, the current version would be strengthened by a more rigorous mechanistic analysis and by clearer reporting of experimental systems and statistics.

      Major points:

      (1) Engagement with prior work on membrane tension and protrusion.

      The relationship between protrusive actin assembly and membrane tension is a subject of major current interest (PMID: 38991476), and it is unfortunate that the authors do not engage more fully with seminal prior work on this subject. In particular, work from the Weiner laboratory showed that membrane tension can act as an inhibitor of cell protrusion and branched actin assembly, at least in some cell types (PMID: 22265410; PMID: 37311454). In addition, a membrane-tension-sensitive signaling pathway involving PLD2 and mTORC2 has been proposed to mediate this negative feedback (PMID: 27280401). These findings appear, at least at first glance, to contrast with the model advanced here, in which elevated membrane tension is associated with increased branching density. A more explicit discussion of these findings and of the apparent differences between systems would be essential. Testing the relevance of some of the proposed negative-feedback regulators, for example, mTORC2 or PLD2, under at least some conditions expected to increase membrane tension would substantially strengthen the manuscript.

      (2) The central assumption regarding membrane tension should be tested directly.

      Part of the model put forward by the authors rests on the assumption that most of the perturbations used to promote cell spreading, with the exception of hyperosmotic treatment, also increase membrane tension. This is a testable hypothesis. Multiple mechanical and optical methods have been established for this purpose, including tether pulling, micropipette aspiration, and fluorescent membrane-tension probes. Directly measuring membrane tension under at least a subset of the key perturbations would significantly strengthen the manuscript.

      (3) WAVE and cortactin localization should be quantified.

      The claim that WAVE and cortactin localization are independent of fibronectin-integrin engagement (Figure 2A-B) deserves to be established quantitatively. I appreciate that some variability is expected because these experiments use exogenous fluorescently tagged constructs, but the current presentation relies too heavily on representative kymographs. Quantitative analysis would make this conclusion more convincing.

      (4) The interpretation of the increased-viscosity experiments needs stronger physical justification.

      I am aware of the recent high-profile work showing that elevated extracellular viscosity can promote migration (PMID: 36323783), and the present manuscript clearly supports this. However, the physical basis for this perturbation is neither well reasoned nor explained clearly enough here. The authors use 0.6% methylcellulose of the 1500 cP grade (the relevant viscosity of the final medium should be stated explicitly btw!). Estimating the added viscosity at 7 cP = 0.007 Pa·s (up from 1 to 8 cP), one can formulate the rough back-of-the-envelope calculation for the added viscous stress:

      Δτ = Δη · v / h

      where τ = viscous stress (Pa = pN/µm²), η = viscosity, v = protrusion speed, h = characteristic shear length scale. For cells protruding at 1 µm/min, this resistance will be 0.00001-0.001 Pa. Even if the cells protruded 100 times faster, the resistance would not exceed one pascal! Hence, the added bulk viscous stress opposing protrusion at this viscosity appears negligible relative to the known force-generating capacity of lamellipodia. This does not invalidate the biological phenotype, but it does suggest that the interpretation should be much more careful.
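
      The estimate above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope formula: the review does not state which shear length scales h were used, so the range of 0.1 to 10 µm below is an assumption chosen because it brackets the quoted 0.00001-0.001 Pa; the function name is likewise illustrative.

```python
def added_viscous_stress(delta_eta_pa_s, speed_um_per_min, shear_length_um):
    """Return the added viscous stress, delta_eta * v / h, in pascals."""
    speed_m_per_s = speed_um_per_min * 1e-6 / 60.0  # um/min -> m/s
    shear_length_m = shear_length_um * 1e-6         # um -> m
    return delta_eta_pa_s * speed_m_per_s / shear_length_m


# Added viscosity from the review: 7 cP = 0.007 Pa*s (medium goes from 1 to 8 cP).
DELTA_ETA_PA_S = 0.007

# ASSUMPTION: plausible shear length scales in micrometres; not given in the review.
ASSUMED_SHEAR_LENGTHS_UM = (0.1, 10.0)

if __name__ == "__main__":
    # 1 um/min is the speed used in the review; 100 um/min is the "100 times faster" case.
    for speed_um_per_min in (1.0, 100.0):
        for h_um in ASSUMED_SHEAR_LENGTHS_UM:
            tau = added_viscous_stress(DELTA_ETA_PA_S, speed_um_per_min, h_um)
            print(f"v = {speed_um_per_min:g} um/min, h = {h_um:g} um: "
                  f"added stress = {tau:.2e} Pa")
```

      Under these assumed length scales, every combination stays below one pascal, which is the point of the argument: the conclusion does not hinge on the exact choice of h.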

      (5) Cell lines and experimental systems are insufficiently described.

      Most biological experiments in this manuscript appear to have been performed in engineered mouse fibroblast lines, but the Methods do not provide sufficient clarity about which specific cell lines were used in which experiments. More concerning, the manuscript refers inconsistently to the base model as both a mouse dermal fibroblast line and MEFs, while the only clearly distinct named line appears to be JR20 fibroblasts used for traction-force microscopy. Along similar lines, the Arp2/3 knockout cells in Figure 2 are not adequately explained in the Results, Methods, or figure legends, regarding how these cells were generated or how the knockout was validated. The authors only later note in the Discussion that these conditional knockouts were described in an earlier paper. In general, the manuscript would benefit from much more explicit reporting of which cell line or derivative was used in each experiment.

      (6) Some experiments and quantifications appear to suffer from limited replication.

      For example, the optogenetic Rac activation experiment in Figure 2E appears to have been performed on only a single cell per condition, since the raw intensity traces are shown without clear indicators of variability. If that reading is correct, this is below the standard typically expected for mechanistic support and seriously reduces confidence in the strength of this particular conclusion.

      (7) Statistical reporting needs clarification.

      Although the Methods state that the graphs show 95% confidence intervals, the manuscript does not clearly define the underlying statistical unit for many quantified datasets. In several figures, sample sizes are reported as numbers of cells pooled across only two or three independent experiments, but it is not clear whether the authors performed statistical analyses on pooled single-cell measurements or on experiment-level means. The authors should explicitly state for each quantified panel what n represents, what the error bars denote, which statistical test was used, and whether the analyses were performed on per-cell values or on independent experimental replicates.
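
      To illustrate why the choice of statistical unit matters, the following sketch contrasts pooling single-cell values with summarizing experiment-level means. All numbers are hypothetical and are not taken from the manuscript; the function names are illustrative only.

```python
from statistics import mean, stdev

# HYPOTHETICAL per-cell measurements (arbitrary units), grouped by independent experiment.
cells_by_experiment = {
    "experiment_1": [1.0, 1.2, 1.1, 0.9],
    "experiment_2": [1.6, 1.5, 1.7],
    "experiment_3": [1.3, 1.2, 1.4, 1.3, 1.3],
}


def pooled_summary(groups):
    """Treat every cell as an independent observation (n = number of cells)."""
    values = [value for cells in groups.values() for value in cells]
    return len(values), mean(values), stdev(values)


def experiment_level_summary(groups):
    """Treat each experiment's mean as one observation (n = number of experiments)."""
    experiment_means = [mean(cells) for cells in groups.values()]
    return len(experiment_means), mean(experiment_means), stdev(experiment_means)


if __name__ == "__main__":
    for label, summary in (
        ("pooled cells", pooled_summary(cells_by_experiment)),
        ("experiment means", experiment_level_summary(cells_by_experiment)),
    ):
        n, centre, spread = summary
        print(f"{label}: n = {n}, mean = {centre:.3f}, SD = {spread:.3f}")
```

      The two summaries use different values of n, so any confidence interval or test built on them will differ as well; this is why each panel should state which of the two was used.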

      (8) The Discussion is rather expansive relative to the amount of experimental evidence presented.

      Parts of the Discussion feel more speculative and interpretive than necessary, and the manuscript would be strengthened by focusing the Discussion more tightly on the principal findings, limitations, and immediate implications of the work.

    4. Reviewer #3 (Public review):

      Summary:

      Butler et al. investigated how different force mechanisms influence Arp2/3-related branched actin networks at the leading edge of lamellipodial protrusions in mouse dermal fibroblasts. In particular, their study aimed at characterizing the specific contribution and interplay between load force and adhesion signaling on the regulation of branched actin networks in live-cell experiments using one endogenously labeled Arp2/3 subunit. A key finding of their work is that by plating fibroblasts on two different substrates that either do or do not support integrin engagement, they observe striking differences in branched network architectures that cannot be explained solely by integrin signaling. Instead, several of their results point to mechanical feedback resulting from changes in membrane tension during spreading, regulating the density of branched actin networks. Finally, by modifying the extracellular viscosity, the authors suggest that the stress generated at the actin-membrane interface would play a key role in regulating branched actin density in protrusions.

      Major Strengths:

      (1) The combination of methods used in this paper (endogenous labeling of Arp2/3, Arp2/3 genetic knockout, optogenetic activation of Rac) provides a unique opportunity to monitor spatial and temporal reorganization of endogenous branched networks generated by Arp2/3 in live cells in response to different biochemical and mechanical manipulations.

      (2) The authors provide a deep characterization of the actin-network organization and dynamics observed when plating cells on different substrates that either do or do not engage integrins (Figure 1 and associated supplementary: intensity and width of the signal in protrusions, retrograde flow, incorporation of actin to the edge, nascent focal adhesions), which serves as a strong basis to build the rest of the paper. They also offer a comprehensive analysis of the different parameters that could explain the lack of dense branched actin network at the leading edge of fibroblasts grown on PLL-coated surfaces (in Figure 2, they exclude contributions from reduced branch nucleation by NPF, insufficient branch stabilization, and insufficient integrin-mediated signaling activating NPF).

      (3) After having ruled out the influence of adhesion signaling in the regulation of branched actin-network density at the leading edge of the cells, the authors demonstrate that the enrichment of Arp2/3 at the leading edge is evolving together with cell spreading, suggesting a possible role of membrane tension in the process (Figure 3 and associated supplementary). To prove their point, they tested numerous methods to promote adhesion-independent cell spreading (Figures 4 to 6), while describing well the limitations of each of these techniques. These methods included promoting rapid spreading on PLL-coated substrate using blebbistatin or physical compression under agarose, and finally increasing extracellular viscosity by treating cells with methylcellulose. All of these treatments led to very consistent results upon the increase in membrane tension, supporting the idea of membrane tension controlling the branched actin organization of cells. This conclusion was further supported by an experiment (Figure 4 S1) in which a hyper-osmotic shock was performed, increasing the actin-membrane interface stress while keeping the spreading area of cells, which led to a drastic increase in Arp2/3 density at the protrusions.

      (4) By activating Rac optogenetically in cells plated on PLL treated with methylcellulose (Figure 8), the authors observe the formation of robust protrusions enriched in Arp2/3, showing that increased extracellular viscosity can bypass the requirement for ECM proteins to activate protrusion driven by signaling.

      Weaknesses:

      (1) Although the lamellipodial architecture in cells plated on PLL appears very different from the one developed by cells grown on fibronectin (Figure 1, wider and less homogenous), the branched network is still present, and one may wonder how these differences can affect the functionality of the lamellipodia (for example, by measuring the impact on migration in 2D and 3D systems).

      (2) To explain the differences observed in the branched actin networks developed by cells on PLL and FN, the authors envision several hypotheses, among which signaling factors or branch-promoting factors would be decreased in the absence of integrin adhesions. They could have, in addition, assessed actin network dynamics and turnover (we could imagine that competition between Arp2/3- and non-Arp2/3-driven structures could be different in the presence or absence of adhesions, the competition being nicely visible from Figure 2B and 2C, where, in the absence of Arp2/3, cells form prominent filopodia).

      (3) All of the methods used to apply physical forces on barbed ends have their own caveats and alter more than membrane tension alone, although these limitations are discussed in the paper. The paper might have benefited from micropatterning the cells to either restrict or force their spreading in a controlled fashion. In addition, the conclusions on levels of interface stress between the plasma membrane and the barbed ends of lamellipodial actin networks rely on an estimate of the effect of perturbations rather than on actual measurements of these stress levels.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      Although the finding that branched actin networks respond to the application of physical force by increasing their density was already known from previous in vitro studies, this paper offers a detailed and compelling characterization of the reorganization of endogenously labelled branched actin networks upon different mechanical perturbations. In addition to showing the effect of increased extracellular viscosity on promoting branched actin network densification in the absence of ECM, this paper sheds new light on the interplay between signaling and mechanics in regulating protrusion and spreading. While the authors show that both signaling and mechanical feedback are important regulators of branched actin regulation and cell spreading, they demonstrate that optogenetic Rac activation is not sufficient to trigger branch network formation in the absence of sufficient mechanical support. They thus propose that biochemical signaling would act at a different level than mechanics by promoting protrusion persistence and coherence. This work will therefore impact the field of cell biology in offering a new perspective to understand the interplay between mechanical and biochemical feedback in 2D and 3D migration. It may also have broader implications as the formation of branched actin networks under the regulation by mechanical loads has been shown to be involved in other processes such as endocytosis.

    1. eLife Assessment

      In this study, Yuan and colleagues perform transcriptomic and epigenomic experiments to study open chromatin regions and transcripts that change upon larval settlement in the sponge Amphimedon. The authors present compelling evidence to show that sponge larvae prepare for receiving an environmental cue (sunset) by extensively modifying their chromatin accessibility in the vicinity of genes that are going to be regulated during metamorphosis. The study represents a fundamental advance in understanding the fine genetic control of larval settlement and has significance beyond the immediate field of sponge larval biology.

    2. Reviewer #1 (Public review):

      Summary:

      Yuan and colleagues present a thorough study of gene activation before and during metamorphosis in sponge larvae, combining in-depth analyses of staged transcriptomes and chromatin accessibility profiling (ATACseq). Amongst several very interesting findings, the study reveals that the acquisition of settlement competence, which arises in response to decreasing light at sunset, is characterized by changes in chromatin accessibility that anticipate strong transcriptional shifts occurring as metamorphosis starts. Another notable finding is a set of transcription factors amongst the genes strongly up-regulated at the onset of metamorphosis. In addition, larvae exposed to constant light, a condition that stalls metamorphosis, were found to activate metabolic pathways that are not normally expressed in swimming larvae. Together, the findings provide a rare level of understanding into how environmental conditions can promote deployment of alternative developmental programs in planktonic larvae.

      Strengths:

      This is a very comprehensive, well-documented and rigorous study of a phenomenon of wide interest. It will inspire researchers working on other species to look for similar, environmentally-driven "anticipatory" epigenetic mechanisms. It also provides a wealth of detailed information on genes, notably transcription factors, that are candidates for involvement in regulating specific metamorphosis transitions - and beyond. The data presented here are thus undoubtedly a rich and valuable resource.

      Weaknesses:

      I see no significant weaknesses; however, the documentation of the data is very compressed, with all the findings contained in 4 multi-panel figures with succinct legends. It is not always straightforward to connect the conclusion statements in the text to the figures. Although the relevant data is available in supplementary files, I would appreciate more help in navigating the data to assess the support for key conclusions, if possible, illustrating each text conclusion explicitly in the main figures.

    3. Reviewer #2 (Public review):

      Summary:

      It is demonstrated that sponge larvae prepare for receiving the environmental cue (sunset) by extensively modifying their chromatin accessibility in the vicinity of genes that are going to be regulated during metamorphosis, in the absence of large gene expression changes. This program can be offset by modifying the cue (making light constant), leading to a novel molecular state.

      Strengths:

      This is a top-notch study of a key lifecycle transition in an organism of great phylogenetic importance, involving concurrent gene expression and chromatin accessibility profiling (to the best of my knowledge, this has never been done in non-bilaterians and likely anywhere outside Vertebrata). The result is highly non-trivial. There is also an additional experiment modifying the key environmental cue (constant light), adding additional insight.

      Weaknesses:

      I have only a couple of suggestions.

      (1) Not all new pre-emptively opened OCR regions are associated with genes that are going to be regulated during metamorphosis. Is their association with such genes statistically significant? (Fisher's exact test?)

      (2) Re: extended discussion on possible reasons for activation of specific transcription factor families. I feel it is not terribly useful since it is hardly more than guesswork. The authors should consider condensing this part to better emphasize the major (and most unexpected) large-scale regulation patterns.

      (3) Re: enrichment analysis based on significant genes (Figure 1H): Even though it is a common practice, there is nuance: as we all know very well, many genes pass a significance threshold not because they are highly differentially regulated (i.e., show large fold-change), but because they are more abundantly expressed overall and so the statistical power for them is greater. A good example is ribosomes - before we realized what was happening, they would show up as enriched in almost every experiment of ours, which was not very useful since their fold-change was quite trivial. I see the authors have ribosome enrichment too, and I suspect there are a few more functional groups that made it because they tend to express highly on average. Ideally, we want to see what is enriched among highly regulated genes, not among abundantly expressed genes. Because of this we moved to compute enrichment based only on fold-change, using the GO_MWU package (https://github.com/z0on/GO_MWU). I suggest authors give it a shot, to see if the enrichment results become more interpretable. GO_MWU is also very powerful to analyze enrichment in WGCNA modules, in case the authors want to try that.
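
      Suggestions (1) and (3) above can be made concrete with a minimal sketch. The first function computes a one-sided Fisher's exact test on a 2x2 table of genes (near a newly opened OCR or not, later regulated or not). The second computes a Mann-Whitney U statistic comparing a per-gene measure such as log2 fold-change between one functional category and all other genes. It illustrates the rank-based idea only and does not reproduce the GO_MWU package. The function names and all counts and values are hypothetical, and the p-value uses a normal approximation without tie correction.

```python
from math import comb, erfc, sqrt


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: genes near a newly opened OCR that are later regulated
    b: genes near a newly opened OCR that are not later regulated
    c: genes without such an OCR that are later regulated
    d: genes without such an OCR that are not later regulated
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


def mann_whitney_u(group, background):
    """Mann-Whitney U for `group` versus `background`.

    Tied values receive midranks. The two-sided p-value comes from a
    normal approximation with no tie correction (an assumption made
    here for brevity).
    """
    pooled = sorted([(v, 0) for v in group] + [(v, 1) for v in background])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        rank_sum += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(group), len(background)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = erfc(abs(u - mean) / sd / sqrt(2))
    return u, p


if __name__ == "__main__":
    # Hypothetical counts and fold-changes, not taken from the manuscript.
    print("OCR/regulated-gene association p =",
          fisher_exact_greater(120, 380, 400, 4100))
    category_lfc = [1.8, 2.4, 0.9, 3.0]
    other_lfc = [0.1, -0.3, 0.4, 0.2, -0.1, 0.6, 0.0, 0.3]
    print("category vs. background (U, p) =",
          mann_whitney_u(category_lfc, other_lfc))
```

      Because the rank-based comparison uses the fold-change of every gene rather than a significance cutoff, highly expressed categories with trivial fold-changes should not be favored simply because they have more statistical power.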

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Huifang Yan and colleagues perform RNA-seq (CEL-seq) and ATAC-seq experiments to profile the transcriptome and chromatin accessibility of sponge larvae across larval competence, settlement and early postlarval development. Amphimedon, the sponge species that they use, is amenable to lab experiments and can therefore be a convenient model for experimenting with these otherwise difficult-to-assay ecological parameters and cues. They had previously observed that light conditions (diminished light) at sunset are critical for larvae to enter a pre-settlement stage and prime them for settlement and metamorphosis. In this paper, they report that these conditions induce a gain of accessibility in many genes, including transcription factors, and that altering these conditions by providing continuous light at sunset affects this reprogramming event.

      Strengths:

      The above is a very interesting observation, one that the authors speculate could have a broader significance and be a theme in many more larvae. I agree with the authors that this is an important finding, and I think that the paper will be interesting for a broad readership. If this is the case, the authors open up a new theme of chromatin regulation, extensively studied in mammalian contexts, but severely understudied in pretty much every other context.

      Weaknesses:

      I think, however, that their paper often reports the data in a difficult-to-follow way, and that other sorts of analyses would have made the results more accessible for a broad readership. Here, I present some suggestions that the authors might want to take into account to improve their results.

    1. eLife Assessment

      This large-scale comparative study of odorant receptor (OR) genes across more than 100 insect species, combining sequence- and structure-based approaches, aims to explore the evolution of this large gene family involved in the detection of odorant signals by olfactory neurons. This useful work uncovers a structural feature unique to the odorant receptor co-receptor Orco that reduces ligand binding affinity. However, the strength of evidence is incomplete: the pipeline for in silico identification of odorant receptor genes lacks validation through comparison with known odorant receptor repertoires from previously studied species, and claims regarding odor response spectra, evolutionary, and ecological interpretations are not fully supported by the analyses.

    2. Reviewer #1 (Public review):

      Objectives of the study and impact of the work:

      The authors of this article primarily aim to reconstruct the evolutionary history of the insect odorant receptor (OR) family, which is responsible for the detection of odorant signals by olfactory neurons. Due to the lack of phylogenetic signal present in the sequences of this multigene family, which evolves very rapidly, phylogenetic analyses have so far never made it possible to precisely retrace how ORs diversified prior to the appearance of present-day insect orders, and what the drivers of this diversification were. For example, one may suspect that the adaptation of ORs to odors emitted by plants constituted a critical step in insect evolution during the "angiosperm terrestrial revolution," which occurred at the end of the Cretaceous, but nothing currently allows this to be asserted.

      There are very nice examples, notably in Drosophilids, derived from comparisons between closely related species and documenting mechanisms of OR adaptation to certain signals. However, what the authors attempt to do in this work is to produce a macroevolutionary analysis at the scale of insects as a whole, based almost exclusively on bioinformatic analyses. To do this, they annotated OR genes in about one hundred insect species and developed pipelines for analyzing sequence similarity, structural similarity, and functional similarity, the latter being estimated through a molecular docking approach. An important feature in the evolution of insect ORs is the emergence of a unique co-receptor, called Orco, which appears to be an OR that has lost the ability to bind odorants. In addition to the large-scale bioinformatic analysis, the authors also aim to explore more specifically the factors that favored the emergence of Orco and the selective advantage conferred by the existence of OR-Orco complexes.

      Given the importance of odorant receptors in insect biology and in their adaptation to different environments and lifestyles, retracing their evolutionary history is indeed a major question in evolutionary biology. In principle, this type of work therefore has the potential to become a reference in the field and to provide a basis for significant scientific advances.

      Major strengths and weaknesses:

      The sampling chosen for collecting OR sequences is very impressive, with more than 100 insect families represented, covering most of the major orders. This sampling appears appropriate for the question being addressed. The analysis pipeline used to collect the sequences makes sense, relying on homology-based annotation tools coupled with a structure-based filter. Nevertheless, one can note aberrant numbers of ORs for certain species (much lower than reality), which indicates that the pipeline probably did not function correctly for all genomes. In the absence of a validation step comparing the results with already known OR repertoires, it is difficult to estimate the overall quality of the data. The authors chose to apply a fairly stringent filter on sequence quality (based on predicted 3D structure), which reduces the number from 14,000 to 9,000. This choice seems logical given the subsequent use of these data, but it inevitably leads to data loss. The fact that some OR genes may be missing and that the total number may not be exact for each species is not prohibitive for studying the evolution of the family at a broad scale; however, it calls into question certain results that rely on this total number, such as the correlation between the number of ORs and genome size, lifestyle, and diet.

      From the dataset collected, the authors attempted to categorize ORs in several ways, starting with the reconstruction of sequence similarity networks. The approach is interesting, but in the end, the results do not seem to be sufficiently exploited, and it is not obvious what the advantage of this approach is compared with the "classical" phylogenetic approach, which generally fails to reveal homology relationships between ORs from species belonging to different insect orders. Here again, the majority of the clusters identified are "order-specific," and when this is not the case, the authors did not attempt to exploit the results. For example, clusters SeqC26 or SeqC28, which appear to be shared by many insects, are potentially very interesting. It might have been relevant to combine this similarity-based clustering approach with phylogenetic reconstructions within each shared cluster.

      The clustering based on structure also leads to the identification of a majority of "order-specific" clusters, but once again, the clusters shared by several orders are not truly exploited, which does not provide new insight into the evolution of ORs. However, the authors highlight a group of ORs in flies that appear to possess an unusual intracellular region. This is interesting, although it is a result more relevant to OR structure than to their evolution. The function of these ORs in Drosophila melanogaster, if it is known, is not discussed.

      The analysis of structural diversity then leads the authors to focus on the Orco co-receptors, which are characterized by modifications of the binding pocket and the emergence of an extracellular loop that could explain the loss of the ability to bind odorant molecules. This part, which relies on in vitro experiments, is interesting and constitutes the most striking result of the study, which could in itself have been the subject of a separate manuscript. However, the molecular dynamics modelling does not add anything in the way it is conducted (5 ns is too short).

      The rest of the manuscript is based on the prediction of OR response spectra using molecular docking. The work that has been carried out is extremely substantial, and the objective of linking clusters based on sequence similarity or 3D structural similarity with functional categories is entirely relevant. Nevertheless, I see two major problems with this in silico functional analysis:

      (1) The docking score threshold used was chosen thoughtfully, which is very good, and according to the calculation performed, should ensure a true positive rate of more than 20%, which is excellent in such a docking analysis. But in the absence of functional validation, this 20% true positive rate is not sufficient to extrapolate OR function as the authors do in the remainder of the manuscript. The risk of error remains too high to compare in such detail the function of ORs from insects with different lifestyles or diets.

      (2) The six functional clusters identified are only slightly different from one another, with similar detection of all chemical families except acids and amines (which was expected, given that these families are a priori detected by IRs rather than ORs). This shows that even though the approach is relevant and deserves to be tested, it cannot be used to establish a link between groups/lineages of ORs and response spectra at the scale of insects as a whole. This is reflected in the final analysis by the fact that there is no visible link between sequence or structural clusters and functional clusters. Given the uncertainty surrounding the docking results, the entire subsequent analysis of the relationship between the Binding Breadth Index and ecological variables is highly questionable.

      Finally, the evolutionary analysis proposed to conclude the work suffers from an incorrect interpretation: ORs of non-holometabolous insects cannot be considered equivalent to those of species that existed before the Permian-Triassic extinction. The fact that a locust or a cockroach has more narrowly tuned ORs than holometabolous insects does not mean that this was also the case for ancestral insects. To advance this type of conclusion, it would be necessary to conduct a phylogenetic analysis and reconstruct ancestral states, which is not the case here.

      In summary, despite the large number of analyses performed, the authors do not succeed in achieving the stated objective of reconstructing the evolutionary history of insect ORs, and the results obtained do not sufficiently support the conclusions regarding the links between OR repertoires and environment or lifestyle.

    3. Reviewer #2 (Public review):

      The remarkable evolvability of the olfactory system enables animals to rapidly adapt to dynamic and chemically complex environments. Over the past two decades, substantial effort has been devoted to uncovering the evolutionary principles that drive the diversification of odorant receptors (ORs), yielding key insights into the forces shaping their striking variability in both vertebrates and insects. In this manuscript, Zhang and colleagues analyze the OR repertoires of over 100 insect species, leveraging sequence and structural similarity to infer patterns of gene family evolution within this diverse and ecologically important clade. By integrating sequence-based and structure-based comparisons, their study builds on a compelling and recently emerging line of research made possible by the advent of AlphaFold, which has previously clarified the phylogenetic relationship between insect ORs and the gustatory receptor gene family and revealed the unexpectedly deep evolutionary origins of this ancient structural fold.

      Applying this approach to a large set of ORs derived from species throughout the insect phylogeny, the authors confirm many previously reported patterns of OR evolution. Unfortunately, the way these results are presented lacks clarity in what is already known from previous work in the field versus what is a novel finding based on the analysis of this dataset.

      It is unclear how complete the odorant receptor sets are. I recommend benchmarking the pipeline by comparing its output to a gold standard and a frequently vetted complete OR set, such as that of Robertson and Wanner 2006 or similar.
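
      The benchmarking recommended above could be sketched as a simple set comparison between the pipeline output for one species and a curated reference repertoire. This is a minimal illustration, assuming that predicted gene models have already been mapped to reference gene identifiers (in practice that mapping would itself require sequence comparison). The function name and the gene identifiers are hypothetical.

```python
def benchmark_repertoire(predicted, reference):
    """Compare a predicted OR set with a curated reference set.

    Both arguments are iterables of gene identifiers that are assumed
    to share one naming scheme. Returns the recall, the fraction of
    predictions that match the reference, and the unmatched genes.
    """
    predicted, reference = set(predicted), set(reference)
    recovered = predicted & reference
    return {
        "recall": len(recovered) / len(reference) if reference else float("nan"),
        "matched_fraction": len(recovered) / len(predicted) if predicted else float("nan"),
        "missed": sorted(reference - predicted),
        "extra": sorted(predicted - reference),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    pipeline_output = ["OrA", "OrB", "OrX"]
    curated_set = ["OrA", "OrB", "OrC", "OrD"]
    print(benchmark_repertoire(pipeline_output, curated_set))
```

      Genes listed under "extra" are not necessarily errors, since they may be genuine ORs absent from the reference, whereas a low recall would point to the incomplete repertoires that concern me here.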

      Using their structural clustering approach, the authors identify a structural feature mostly unique to the OR co-receptor ORco, a beta-sheet in EL2, which they functionally show reduces odorant binding affinity - a key aspect of ORco, which does not bind ligands in the ancestral ligand-binding site. This is a particularly strong part of the manuscript, since the authors support their in silico-derived hypothesis with functional data.

      Lastly, in an attempt to assess the relationship between sequence identity and structure on one hand and function on the other, the authors perform an in silico structure prediction and chemical docking analysis. As it stands, this part is on the more speculative side since the docking approach has not been verified with available functional datasets.

    1. eLife Assessment

      The study presents useful findings on the behavioral effects of nicotine exposure, suggesting the Drosophila larva as a potential model organism for studying underlying neural circuits. However, the evidence supporting the claims of the authors is incomplete and would benefit from more rigorous analysis and explanations. The study falls short of identifying the neural mechanisms and is therefore of interest to those with an interest in pharmacology and behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Dancausse et al. investigate behavioral responses to nicotine exposure in Drosophila larvae. They discover that high concentrations of nicotine lead to less movement and twitching, which recover slowly after several hours. Exposure to lower concentrations, however, increases locomotion and leads to hyperactive behavior. The authors also perform pharmacological and genetic manipulations to address the role of dopamine in these behavioral changes. Additionally, they test the role of MB intrinsic neurons by genetic silencing. Both dopamine and MB manipulations affect responses to nicotine exposure. Finally, they investigate how larvae respond to repeated exposures to nicotine and find that they do not habituate. Additionally, repeated exposure to nicotine leads to a preference towards higher concentrations in a gradient assay.

      Strengths:

      The authors use rigorous behavioral analysis and discover interesting concentration and experience-dependent effects of nicotine exposure on locomotion in fly larvae, which will be worth investigating in the future to decipher the underlying neural mechanism.

      Weaknesses:

      As the manuscript currently stands, the results of genetic manipulations are hard to interpret and rather inconclusive. The genetic manipulations have been performed using broadly expressing genetic driver lines, which weakens the conclusions drawn by the authors. Thus, no specific neural populations or brain regions have been discovered, and there is little insight into the underlying neural mechanism.

      Based on gradient experiments, the authors suggest that fly larvae could serve as a model organism for addiction. This claim is quite strong, but no control experiments are shown for shorter exposure or a single exposure with a longer resting period before the gradient test. To compare this to addiction-like behaviors, more control experiments should be performed.

      The authors should clarify better how experiments were performed in Materials and Methods. Generally, the authors perform novel behavioral analysis, which is not explained in enough detail. Is the nicotine concentration used for most experiments a relevant concentration, comparable to that used in other studies? This information would be useful to put the results into context with other findings.

    3. Reviewer #2 (Public review):

      Summary:

      CNS function relies on a balance of excitatory and inhibitory activity. Use of addictive stimulants such as nicotine results in a chronic imbalance of these activities, and often this activity acts through dopamine pathways. To address how stimulants cause dysfunctional signaling in the DA neurotransmitter system and how this impacts neural circuit activity and behavior, the authors of this study begin to establish Drosophila larvae as a model for studying nicotine exposure.

      They focus on three questions:

      (1) In what ways does nicotine-driven hyperactivation modulate behavior?

      (2) What roles do neural circuits play in these responses?

      (3) What are the mechanisms of drug dependence and addiction-like plasticity?

      To this end, the authors use high-resolution behavioral, genetic, and pharmacological methods.

      The authors show that exposure to nicotine alters the behavioral repertoire of larval Drosophila, with effects that are long-lasting (hours) and dose-dependent. Most of the study uses a 5-minute exposure to "moderate" levels of nicotine because this dosage produces the greatest potentiation of larval crawling speed. Concomitant with increases in crawling speed, they find alterations in other behavioral parameters: crawl "efficiency" and turn rate are reduced, whereas head swings are faster and more likely to be accepted. They find that reducing the activity of dopaminergic neurons reverses the valence of behavioral change upon exposure to nicotine. For example, crawling speed is decreased upon nicotine exposure in a Ple>Kir2.1 manipulation in comparison to controls. Moreover, they demonstrate that the effect of nicotine on the quantified set of behaviors depends on dopamine signaling. Beyond implicating dopamine signaling, they implicate the mushroom body, and particularly the gamma-neurons, in mediating the response to nicotine exposure.

      The authors further probe how nicotine exposure alters larval behavior. First, they determine what happens to crawling speed with multiple exposures, finding sustained higher crawling speeds relative to controls. Second, as a model for addiction-like behavior, they examine larval behavior on a nicotine gradient after repeated nicotine exposure. The data in Figure 7D are particularly compelling, showing that after nicotine exposure, larvae prefer high concentrations of nicotine.

      Strengths:

      In a concise set of experiments, the authors demonstrate a nicotine-induced behavioral change, its interaction with a neurotransmitter system, and a locus of action within the CNS. Thus, the authors set the stage for the use of Drosophila larvae as a model to better understand addiction-related behaviors.

      Weaknesses:

      This is a clear advance for the field of larval neurogenetics, but the extent to which it changes the way we think about nicotine exposure more generally is less clear. Nonetheless, the authors clearly achieved the goal they set out to attain.

    4. Reviewer #3 (Public review):

      Summary:

      Dancausse et al. examine behavioral responses to nicotine administration in larvae. The study first distinguishes the spasms and extreme hyperexcitability elicited at high doses from a hyperactivity state triggered at lower (~1 mM feeding) doses. They then focus on the hyperactivity state and examine if dopaminergic neuron function is involved (via transgenic and pharmacological manipulations). Next, the role of the mushroom body, a site of integration in the larval brain, is interrogated. In these studies, the authors use multiple approaches to draw complementary conclusions. The last section examines the effect of repeated nicotine exposure and of nicotine preference following repeated exposure. The findings are foundational for future studies looking to use Drosophila larvae as a system to study nicotine addiction.

      Strengths:

      Overall, I think the study is of broad importance. The neurogenetics community gets valuable insight into how ACh excitation interplays with DA signaling to regulate movement. For the addiction community, the work describes a valuable system to further interrogate genetic and environmental factors potentially driving addiction under well-controlled conditions. The quantitative analysis is generally well done, and the use of multiple experimental strategies to buttress conclusions is commendable.

      Weaknesses:

      (1) Conceptual point. Insects use ACh as the primary excitatory neurotransmitter, with nAChRs broadly expressed, while vertebrates use Glutamate in this role. (Arguably, nicotine expression in tobacco plants evolved as an insecticide, broadly disrupting the central excitatory neurotransmitter.) In vertebrates, central ACh neurons are relatively sparse - primarily originating from the basal forebrain.

      Based on these distinctions, it is important to consider/contrast nicotine-driven hyperexcitation from other methods to produce broad hyperexcitation (e.g., inhibition of GABA, high K+, elevated temperature, etc). Many of these methods to induce hyperexcitability would also modulate DA circuitry.

      A discussion of the role of ACh in insect vs. vertebrate brains is necessary to interpret the experimental design and findings with regard to addiction. These points should be addressed in the intro and discussion.

      (2) (Figure 1) Relatedly, how do the behaviors elicited in Figure 1B (30 or 60 mM) compare to the convulsions described following electroshock stimulation to induce a seizure? My suspicion is that you're essentially triggering a seizure (or seizures) in these larvae.

      (3) (Figure 4) Is a statistical analysis between the CS, Ple>Kir, Ple, and Kir locomotion at baseline done? Presumably, these manipulations would alter the intrinsic activity levels of the larvae?

      (4) (General quantitative question) How do the parameters co-vary across individuals following nicotine admin? Crawl speed and peristalsis frequency are analyzed. Turning doesn't seem to be considered. Do individuals that show large increases in velocity also show the largest reductions in turn rate? Are these relations preserved following the DA metabolism and MB function interventions?

      (5) (Discussion / general question) Beyond DA, other monoamines are involved in regulating larval locomotion - OA and TA are a clear example from Fox et al. (2006). Could the authors comment on whether they would expect similar findings in other neurotransmitter systems or if these neurotransmitter systems are involved in the ACh -> DA interplay studied here?

      (6) (Discussion) Following the establishment of nicotine preference, do larvae exhibit signs of 'withdrawal' or changes in baseline behavior when deprived of nicotine? For example, in Figure 6, does the speed following nic administration ever 'go below' the H2O line?
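
      The covariation asked about in point (4) could be quantified per individual with a simple correlation between the nicotine-induced change in crawl speed and the change in turn rate. The sketch below computes a Pearson correlation coefficient from scratch. The function name and the per-larva values are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired per-individual measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-larva changes after nicotine (post minus pre).
    delta_speed = [0.12, 0.30, 0.05, 0.22, 0.41]      # crawl speed change
    delta_turn_rate = [-0.8, -2.1, -0.2, -1.5, -2.9]  # turn rate change
    print("r =", pearson_r(delta_speed, delta_turn_rate))
```

      A strongly negative coefficient would indicate that the individuals with the largest speed increases also show the largest reductions in turn rate. Repeating the calculation for each genotype or drug condition would show whether that relation is preserved.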

    5. Author response:

      We appreciate the extremely helpful feedback from the reviewers and editors for our manuscript. We are happy that the reviewers have appreciated what we are doing here, performing the initial work that should set the stage with Drosophila larva as a model for hyperactive stimulant response. Every comment is certainly addressable within a reasonably short time period and we look forward to improving our paper in an upcoming revision.

      We have some confusion about the “fundamental issue” of using nicotine, as we see the excitation as the fundamental effect we are studying, but we can continue to discuss and clarify this.

      We plan to make significant edits to our introduction and background sections to better frame the goals of the work, and will clarify and expand on our methods, and more carefully make any claims about neural mechanisms.

    1. eLife Assessment

      This work provides an important modeling-based framework for understanding the processes of temporal integration in the claustrum. These mechanisms could support a broader range of integrative brain function. The manuscript presents solid evidence for how the claustrum may integrate temporally disparate signals via a novel computational phenomenon with neural dynamics evolving along neural trajectories as opposed to settling into fixed-point attractor states.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate how the anterior claustrum may integrate temporally separated task-relevant signals to guide behavior in a delayed escape paradigm. Because in vivo neural recordings from claustrum during this task are extremely limited, comprising single-trial data with small neuronal samples, the authors adopt a modeling-driven approach. They train recurrent neural networks (RNNs) using only behavioral data (escape latency) to reproduce task performance and then analyze the internal dynamics of the trained networks. Within these networks, they identify a subset of units whose activity exhibits persistent responses and strong correlations with behavior, which the authors label as "claustrum-like." Using dimensionality reduction, decoding, and information-theoretic analyses, they argue that these units dynamically integrate conditioned stimulus (CS) and door-opening signals via nonlinear, trajectory-based population dynamics rather than fixed-point attractor states.

      To bridge model predictions and biology, the authors complement the modeling with in vitro slice experiments demonstrating recurrent excitatory connectivity and prolonged activity in the anterior claustrum that depends on glutamatergic transmission. They further compare latent neural trajectories derived from previously published in vivo claustrum recordings to those observed in the RNN, reporting qualitative similarities. Based on these results, the authors propose that the claustrum implements temporal signal integration through recurrent excitatory circuitry and dynamic population trajectories, potentially supporting broader theories of integrative brain function.

      Strengths:

      This study addresses an important and challenging problem: how to infer population-level computation in a brain structure for which in vivo data are sparse and experimentally constrained. The authors are commendably transparent about these limitations and seek to overcome them through a principled modeling framework. The integration of behavioral modeling, RNN analysis, and slice electrophysiology is ambitious and technically sophisticated.

      Several aspects stand out as strengths. First, the behavioral RNN is carefully trained and interrogated using a rich set of modern analytical tools, including cross-temporal decoding, trajectory analysis, and partial information decomposition, providing multiple complementary views of network dynamics. Second, the slice experiments convincingly demonstrate recurrent excitatory connectivity in anterior claustrum, lending biological plausibility to the model's reliance on recurrent dynamics. Third, the manuscript is clearly written, logically organized, and conceptually engaging, and it offers a coherent mechanistic hypothesis that could guide future large-scale recording experiments.

      Importantly, the work has significant heuristic value: rather than merely fitting data, it attempts to generate testable computational ideas about claustral function in a regime where direct empirical access is currently limited.

      Weaknesses:

      Despite these strengths, the manuscript suffers from a recurring and substantial conceptual issue: systematic over-interpretation of model-data correspondence. While the modeling results are potentially insightful, the extent to which they are presented as recapitulating real claustral neural mechanisms goes beyond what the available data can support.

      A fundamental limitation is that the RNN is trained solely on behavioral output, without being constrained by neural data at either single-unit or population levels. As a result, the internal network dynamics are underdetermined and non-unique. Many distinct internal solutions could plausibly generate identical behavior. However, the manuscript frequently treats the specific internal solution discovered in the RNN as if it were a close approximation of the actual claustrum circuit.
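
      The non-uniqueness argument can be illustrated with a toy case. The sketch below is not the manuscript's model: it assumes a small linear RNN with hypothetical sizes and random weights, and shows that an invertible change of basis yields different hidden-state trajectories while leaving the output unchanged. Nonlinear, behavior-trained networks are not related by such a simple transform, but the same underdetermination applies.

```python
import numpy as np

def run_linear_rnn(a, b, c, inputs):
    """Simulate h[t+1] = a @ h[t] + b @ x[t], y[t] = c @ h[t+1], starting from h = 0."""
    h = np.zeros(a.shape[0])
    hidden, outputs = [], []
    for x in inputs:
        h = a @ h + b @ x
        hidden.append(h.copy())
        outputs.append(c @ h)
    return np.array(hidden), np.array(outputs)

rng = np.random.default_rng(0)
n_units, n_in, n_out, n_steps = 5, 2, 1, 30   # hypothetical sizes

a = 0.3 * rng.normal(size=(n_units, n_units))  # recurrent weights
b = rng.normal(size=(n_units, n_in))           # input weights
c = rng.normal(size=(n_out, n_units))          # readout weights
inputs = rng.normal(size=(n_steps, n_in))

# Invertible change of basis: an orthogonal matrix times a positive diagonal.
q, _ = np.linalg.qr(rng.normal(size=(n_units, n_units)))
p = q @ np.diag(np.arange(1.0, n_units + 1.0))
p_inv = np.linalg.inv(p)

hidden_1, out_1 = run_linear_rnn(a, b, c, inputs)
hidden_2, out_2 = run_linear_rnn(p @ a @ p_inv, p @ b, c @ p_inv, inputs)

same_output = np.allclose(out_1, out_2)        # identical "behavior"
same_hidden = np.allclose(hidden_1, hidden_2)  # different internal dynamics
```

      Here the second network's hidden state is the first network's state multiplied by `p`, so any analysis of internal trajectories would differ between the two even though their outputs match.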

      This issue is compounded by the sparse nature of the in vivo data used for comparison. The GPFA-based trajectory analyses rely on pseudo-populations and single-trial recordings, yet are interpreted as evidence for robust population-level dynamics. Because neurons were not recorded simultaneously, the inferred trajectories necessarily lack true population covariance and shared trial-to-trial variability, limiting their interpretability as genuine population dynamics. Similarly, conclusions about trajectory-based versus attractor-based computation are drawn almost exclusively from model analyses and then generalized to the biological system.
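
      The loss of shared variability in pseudo-populations can be made concrete with simulated data. The sketch below assumes hypothetical neurons whose trial-to-trial fluctuations contain a common component, then builds a pseudo-population by permuting each neuron's trials independently, which mimics combining neurons recorded in separate sessions. All sizes and noise levels are illustrative assumptions, not values from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_neurons = 2000, 4   # hypothetical sizes

# Simultaneous recording: every neuron shares one noise source per trial.
shared = rng.normal(size=(n_trials, 1))
private = rng.normal(size=(n_trials, n_neurons))
simultaneous = shared + private   # pairwise correlation is 0.5 by construction

# Pseudo-population: trial order is shuffled independently for each neuron,
# so trial k of neuron i no longer matches trial k of neuron j.
pseudo = np.column_stack(
    [rng.permutation(simultaneous[:, j]) for j in range(n_neurons)]
)

def mean_offdiag_corr(x):
    """Average pairwise correlation between columns (neurons) of x."""
    corr = np.corrcoef(x, rowvar=False)
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return corr[mask].mean()

corr_simultaneous = mean_offdiag_corr(simultaneous)
corr_pseudo = mean_offdiag_corr(pseudo)
```

      Each neuron's own response distribution is unchanged by the shuffle, but the across-neuron covariance of trial-to-trial fluctuations is destroyed, which is the structure the reviewer notes cannot be recovered from non-simultaneous recordings.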

      Overall, while the modeling framework is appropriate as a hypothesis-generating tool, the manuscript repeatedly crosses the line from proposing plausible mechanisms to asserting explanatory or even causal equivalence between the model and the brain. This undermines the otherwise strong contributions of the work.

      Below are several specific points that warrant further clarification or revision:

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      This manuscript presents an interesting and potentially valuable modeling-based framework for thinking about temporal integration in the claustrum, supported by solid slice physiology. However, in its current form, it overstates the degree to which the proposed RNN dynamics reflect actual claustral neural mechanisms. With substantial revision, especially a more cautious interpretation of model-data similarity and a clearer articulation of modeling limitations, the study could make a meaningful contribution as a hypothesis-generating work rather than a definitive mechanistic account.

      Comments on revisions:

      The authors have carefully addressed the concerns raised in the initial review. In particular, the manuscript has been substantially improved in terms of tone, conceptual clarity, and the interpretation of the modeling results. The revised version now presents a well-balanced and appropriately framed account of the work.

      The study offers a compelling and useful hypothesis-generating framework for understanding temporal integration in the claustrum, and I support its publication. As a minor point, given the acknowledged limitations of pseudo-population and single-trial data, it would be preferable to slightly soften a few remaining statements that describe trajectory structure as directly "reflecting" population-level dynamics (e.g., using "consistent with" instead).

    3. Reviewer #2 (Public review):

      This manuscript reports the behavior of a computational model of rat claustral neurons during the performance of a behavioral task known as the delayed escape task (in this reviewer's understanding, this behavioral task was created and implemented by this group only). These authors have argued in a prior manuscript (Han et al.) that a group of neurons located "rostral to striatum" are part of the claustrum. The group names the region the "rostral to striatum claustrum." Additionally, in the Han et al. paper, the authors argue that these cells are responsible for maintaining a signal that lasts through the delay period.

      The main findings of the current paper are:

      (1) The authors have built a model network that was trained to show firing similar to what was reported for rats in their prior paper.

      (2) The authors' analysis of model behavior is used to suggest that the model network recapitulates biological activity, including the existence of a cluster of cells mainly responsible for the delay period firing.

      (3) The authors offer evidence from patch clamp recordings for excitatory interconnections among claustral neurons that are an essential feature of the model network.

      A major value of the computational network is that "trials" of the network can be performed. In experiments on animals, only single trials can be used.

      Concerns:

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g. in figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), the equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions intended to "test" the model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which to this reviewer's knowledge was the first to demonstrate such connectivity, including the long duration events and impact of planes of section.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

      Comments on revisions:

      The authors have adequately addressed the concerns that were raised in response to the first version of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for their constructive and insightful comments and agree with the importance of the points raised. We recognize that aspects of our original presentation may have been unclear or overly strong in their interpretation. We have therefore revised the manuscript to clarify our intended scope, moderate our claims, and strengthen the analysis. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript. Our detailed responses are provided below.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s comment. The expressions noted by the reviewer (e.g., closely mimicked, nearly identical, recapitulate) will be replaced with alternative wording that conveys a more moderate meaning (Line 16-17, 65-66, 83, 96, 120, 212).

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer pointed out, behaviorally trained RNNs can admit multiple internal solutions that produce the same behavioral output, and we acknowledge the non-uniqueness of such internal solutions. However, we do not interpret the fact that only a subset of trained RNNs exhibit dynamics similar to those observed in the claustrum as evidence that this solution is fragile. Notably, the claustrum-like dynamics emerged spontaneously during training and were not explicitly enforced. Furthermore, our finding suggests that the emergence of this particular dynamical regime depends on relatively specific structural constraints.

      Our criterion for selecting RNNs that could inform the computational principles of the claustrum was their ability to reproduce the behavioral and physiological observations obtained in the delayed escape experiments. RNNs that were excluded may reflect information-processing strategies used by other brain regions or may rely on artificial logical structures. The computational demand of the task, which integrates temporally separated signals, naturally drives convergence toward networks with recurrent excitatory connectivity capable of maintaining persistent activity. Indeed, all networks that exhibited a claustrum-like cluster shared a common structural feature: strong recurrent excitatory connectivity within Cluster 1. This property is consistent with biological characteristics observed in the slice experiments shown in Fig 2.

      Importantly, the computational principles derived from this RNN were found to be quantitatively consistent with in vivo single-neuron activity patterns. Specifically, analysis using an eigenvalue-based metric (λ<sub>3</sub>/Σλ) revealed the same directional effect in both the RNN and the claustrum neuron data. In addition, a leave-one-neuron-out analysis showed that this pattern was broadly distributed across in vivo claustral neurons rather than being driven by a small subset (see Fig. 4).

      Taken together, these convergent lines of evidence suggest that the computational model is not simply one arbitrary solution among many possible alternatives, but rather implements a computational principle that may underlie claustral functions.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      As the reviewer pointed out, the GPFA trajectory comparison presented in the original manuscript remained largely qualitative, and we agree that this alone was insufficient to establish robustness or provide convincing evidence for population-level structure. In the revised manuscript, we have therefore added the requested quantitative analysis (see Fig. 4).

      Before describing the analysis, we would like to clarify several methodological limitations associated with pseudopopulation and single-trial data. GPFA estimates latent trajectories based on assumptions about covariance structure among neurons and temporal smoothness. In pseudopopulation datasets, the true simultaneously recorded covariance structure cannot be fully reconstructed, which is an inherent limitation. Because our dataset is based on single trials, the analysis does not directly exploit trial-to-trial variability. Nevertheless, the estimation of the latent space still depends on the covariance structure among real claustral neurons, suggesting that the inferred trajectories remain tied to biologically meaningful population dynamics.

      Accordingly, the quantitative metric we introduce is not entirely independent of the GPFA estimation step. Rather, it is intended to evaluate the geometric structure of the single-trial latent trajectories estimated by GPFA. We acknowledged this limitation in the revised manuscript.

      Specifically, for the biological data, we reanalyzed the GPFA-derived latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). For each of the 20 time bins, we applied a sliding window of 10 bins and calculated the covariance matrix within that window. The eigenvalues of PC1, PC2, and PC3 were then obtained, and the third eigenvalue (λ<sub>3</sub>) was normalized by the total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the degree to which the trajectory locally deviates from a planar structure that can be explained by two dominant axes. An increase in λ<sub>3</sub>/Σλ indicates that the population-state trajectory forms a higher-dimensional geometric structure beyond a simple two-dimensional combination.
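
      The metric described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name is hypothetical, the input is assumed to be a three-dimensional latent trajectory with one row per time bin, and only full windows are evaluated (the response does not specify how window edges were handled).

```python
import numpy as np

def lambda3_ratio(trajectory, window=10):
    """Sliding-window lambda_3 / (lambda_1 + lambda_2 + lambda_3).

    trajectory: array of shape (n_bins, 3), a latent trajectory in PC space.
    Returns one value per full window position.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    n_positions = trajectory.shape[0] - window + 1
    ratios = np.empty(n_positions)
    for start in range(n_positions):
        segment = trajectory[start:start + window]
        cov = np.cov(segment, rowvar=False)               # 3 x 3 covariance
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
        eigvals = np.clip(eigvals, 0.0, None)             # guard tiny negatives
        total = eigvals.sum()
        ratios[start] = eigvals[2] / total if total > 0 else 0.0
    return ratios

# A trajectory confined to a plane should give values near zero.
t = np.linspace(0.0, 2.0 * np.pi, 20)
planar = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
planar_ratios = lambda3_ratio(planar)
```

      Because the third eigenvalue is the smallest of the three, the ratio is bounded between 0 (locally planar) and 1/3 (variance spread equally across the three axes).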

      For the RNN data, in contrast, the activity of all units can be observed simultaneously and sufficient trial repetitions are available. Therefore, GPFA was not applied; instead, PCA was performed directly on the population activity for each trial. We then computed an average trajectory across trials and applied the same λ<sub>3</sub>/Σλ metric. Thus, although the initial dimensionality reduction steps differ between the two systems, the definition and calculation of the final quantitative metric are identical. The focus of the comparison is therefore not the dimensionality reduction technique itself, but the geometric dimensional structure of the population trajectories evolving over time.
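
      One way to obtain a trial-averaged trajectory in PC space is sketched below. It makes an assumption the response does not state: a single PCA basis is fitted to the pooled activity of all trials so that projections can be averaged without sign or rotation mismatches between trials. The function name and array layout are hypothetical.

```python
import numpy as np

def mean_pc_trajectory(activity, n_components=3):
    """Project unit activity onto shared principal axes and average over trials.

    activity: array of shape (n_trials, n_bins, n_units).
    Returns an array of shape (n_bins, n_components).
    """
    activity = np.asarray(activity, dtype=float)
    n_trials, n_bins, n_units = activity.shape
    pooled = activity.reshape(-1, n_units)
    mean = pooled.mean(axis=0)
    # Rows of vt are the principal axes of the centered pooled data.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    axes = vt[:n_components]
    projected = (activity - mean) @ axes.T   # (n_trials, n_bins, n_components)
    return projected.mean(axis=0)

rng = np.random.default_rng(2)
example_activity = rng.normal(size=(5, 20, 8))   # hypothetical trials x bins x units
example_trajectory = mean_pc_trajectory(example_activity)
```

      The resulting (n_bins, 3) array has the same layout as a GPFA-derived trajectory re-expressed in PCA space, so the same downstream metric can be applied to both.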

      Importantly, within the biological dataset, the GPFA estimation procedure, preprocessing steps, pseudopopulation construction, subsampling strategy, temporal alignment criteria, and smoothing parameters were applied identically across conditions. Likewise, the same analysis pipeline was used for all conditions in the RNN. If structural biases had been introduced during covariance estimation or dimensionality reduction, they would be expected to affect all conditions within each system similarly. Nevertheless, the λ<sub>3</sub>/Σλ value was consistently and significantly higher in the CS condition than in the Neutral condition, and this directional pattern was observed in both the RNN and the claustral neuron data. This suggests that the effect reflects condition-specific differences in population dynamical structure rather than artifacts arising from a particular dimensionality reduction method.

      To further test whether the observed effect might be driven by a small subset of neurons or specific neuron combinations, we performed a leave-one-neuron-out analysis on the claustrum dataset. Recomputing λ<sub>3</sub>/Σλ while removing one neuron at a time showed that, in the CS group, most neurons contributed relatively evenly to this metric, whereas the Neutral group did not show such a distributed contribution pattern. This indicates that the observed three-dimensional structure is not driven by a few outlier neurons or incidental covariance patterns, but rather reflects an organized population-level phenomenon.
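
      The leave-one-neuron-out logic can be sketched as below. This is a simplified stand-in, not the analysis in the manuscript: it uses the top three covariance eigenvalues of the raw rate matrix (equivalent to PCA variances) in place of re-estimating GPFA, and it computes a single ratio over the whole trajectory rather than a sliding window. Function names and the data layout are hypothetical.

```python
import numpy as np

def top3_eig_ratio(rates):
    """lambda_3 / (lambda_1 + lambda_2 + lambda_3) from the unit covariance.

    rates: array of shape (n_bins, n_neurons) with n_neurons >= 3.
    """
    cov = np.cov(rates, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1][:3]  # three largest
    eigvals = np.clip(eigvals, 0.0, None)
    total = eigvals.sum()
    return eigvals[2] / total if total > 0 else 0.0

def leave_one_neuron_out(rates):
    """Return the full-population ratio and the change when each neuron is removed."""
    rates = np.asarray(rates, dtype=float)
    full = top3_eig_ratio(rates)
    deltas = np.array([
        top3_eig_ratio(np.delete(rates, i, axis=1)) - full
        for i in range(rates.shape[1])
    ])
    return full, deltas

rng = np.random.default_rng(3)
example_rates = rng.normal(size=(20, 6))   # hypothetical bins x neurons
full_ratio, deltas = leave_one_neuron_out(example_rates)
```

      A roughly uniform set of `deltas` would indicate that no single neuron dominates the metric, whereas one or two large values would point to outlier-driven structure.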

      If the result were primarily due to structural artifacts introduced by the pseudopopulation construction or dimensionality reduction procedures, it would be unlikely for consistent selective differences to repeatedly emerge between conditions under identical analysis pipelines. The consistently higher λ<sub>3</sub>/Σλ values observed in the CS condition therefore provide indirect support that this pattern reflects condition-specific population dynamics rather than estimation bias.

      Taken together, these results suggest that the observed three-dimensional structure reflects condition-specific population dynamics rather than analysis artifacts. The fact that the same quantitative metric yields consistent effects in both the RNN and claustral data further strengthens the correspondence between the two systems.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer and now state explicitly that references to these theories are speculative. We have also substantially reduced both their emphasis and prominence in the manuscript (Line 444-446, 451).

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      We agree with the reviewer’s concern. First, we referred to the delayed escape behavioral task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals.” (Line 7-8). We also removed references to the term inference throughout the manuscript (Line 46, 51, 67, 397).

      Reviewer #2 (Public review):

      We sincerely thank the reviewer for their constructive and insightful comments. Through the revision process, the manuscript has been substantially improved, with increased reproducibility, more appropriate acknowledgment of prior work, and a clearer and more logical presentation of the study.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We agree with the reviewer that this distinction should be made clearer. In the original manuscript, we indicated in the Figure 1 legend that panels A, D, E, F, and L (left) were reproduced from Han et al. (2024). To further clarify this point, we explicitly noted this distinction again in the main text (Line 74, 85). In addition, we described the behavioral experiments and in vivo electrophysiological recordings performed in Han et al. (2024) in the Methods section and included the appropriate citation (Line 463-530).

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      We agree with the reviewer’s comment and have revised the manuscript to provide a more detailed description of the model training procedure, weight initialization, and parameter selection.

      We expanded the explanation of the model training procedure and weight initialization. Specifically, the recurrent (W<sub>rec</sub>) and output (W<sub>out</sub>) weight matrices were initialized using a Glorot normal distribution with a standard deviation of to ensure stable signal propagation during early training. In addition, we now explicitly describe the training algorithm and optimization procedure. The network was trained using the Adam optimizer implemented in TensorFlow (v2.1.0) with a batch size of 256 for 1.2 million training iterations, minimizing the per-trial loss function defined in the manuscript. We also explicitly stated how Dale’s principle was maintained throughout training: rows in W<sub>out</sub> corresponding to inhibitory units were zeroed out, and recurrent weights were continuously constrained so that excitatory and inhibitory neurons preserved their respective positive and negative synaptic projections. To illustrate how the weight structure evolved during training, we explicitly reference Figure 2A, which visualizes the final mean inter-cluster synaptic weights and highlights the strong recurrent connectivity that emerged within Cluster 1. Regarding Equations 2 and 3 and their constants, we clarified that the target escape times used to anchor the network were based on experimentally measured behavioral latencies (48.7 s for the CS-present condition and 111.3 s for the CS-absent condition). Furthermore, the regularization coefficients (λ = 0.01 and λ<sub>FR</sub> = 0.95) were selected through a grid search procedure to maintain biologically plausible firing rates while preventing overfitting.
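
      The initialization and sign constraints described above can be sketched in NumPy. Several details here are assumptions rather than statements from the response: the standard deviation uses the textbook Glorot formula sqrt(2 / (fan_in + fan_out)), the draw is an untruncated normal, rows are taken to index the presynaptic unit, signs are enforced by taking absolute values, and the network size and inhibitory fraction are hypothetical.

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) matrix with std = sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def apply_dale(w_rec, w_out, is_inhibitory):
    """Enforce Dale's principle, with rows indexing the presynaptic unit.

    Excitatory rows of w_rec become non-negative, inhibitory rows non-positive,
    and rows of w_out belonging to inhibitory units are set to zero.
    """
    sign = np.where(is_inhibitory, -1.0, 1.0)[:, None]
    w_rec = np.abs(w_rec) * sign
    w_out = w_out.copy()
    w_out[is_inhibitory] = 0.0
    return w_rec, w_out

rng = np.random.default_rng(4)
n_units, n_outputs = 10, 1                      # hypothetical sizes
is_inhibitory = np.zeros(n_units, dtype=bool)
is_inhibitory[-2:] = True                       # hypothetical: last two units inhibitory

w_rec = glorot_normal(n_units, n_units, rng)
w_out = glorot_normal(n_units, n_outputs, rng)
w_rec, w_out = apply_dale(w_rec, w_out, is_inhibitory)
```

      In a training loop, a projection like `apply_dale` would be reapplied after each optimizer step so that gradient updates cannot flip the sign of a unit's outgoing weights.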

      We detailed the surgical procedures that were previously omitted. This includes the specific anesthesia protocol (sodium pentobarbital, 50 mg/kg, i.p.), stereotaxic mounting, and the exact coordinates for the rsCla (AP +2.95, ML ±1.95, DV -3.85 mm). To define "sparse expression," we specified that the AAV was diluted 1:4 in sterile saline. Finally, we included the precise injection parameters: delivery at 20 nL/min via a pressure injection system, with the pipette left in place for 10 minutes post-infusion to ensure adequate diffusion. These details have been added to the Methods section (Line 635, 636-639, 641-643).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We agree with the reviewer’s comment and have reorganized the figures to focus on the key results. Specifically, we separated the original figures so that they correspond to (1) presentation of an RNN model consistent with the results of actual claustral recordings, (2) identification of dimensionality-reduced population activity patterns in the model, (3) comparison of these patterns with population activity patterns derived from recorded claustral neurons, (4) proposal of a nonlinear integration mechanism, and (5) the suggestion that such integration may be implemented through dynamic coding. Using this figure organization, we first identify RNN models trained on behavioral metrics whose dynamics are consistent with experimental claustral recordings. We then compare the dimensionality-reduced population activity patterns of these models with those derived from recorded claustral neurons to evaluate their biological plausibility. After selecting the models that satisfy this criterion, we perform further analyses that would be difficult to achieve using real neural recordings alone. These analyses ultimately allow us to propose dynamic coding exhibiting nonlinear integration as a plausible computational mechanism.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We agree with the reviewer’s suggestion and will include a reference to Orman (2015). We have clarified that neuronal activity can persist for extended periods and that such persistent activity has been observed in claustral slices prepared at a specific slicing angle (Line 144).

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We agree with the reviewer’s comment and have revised the relevant sentences in the Introduction section. We also acknowledged the contributions of previous studies by the Mathur group and the Citri group by adding references to their work (Line 36, 429).

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

      Regarding the E–E metric, we obtained the following result. When including recordings in which the whole-cell recording could not be completed, optogenetically evoked responses were observed in 38 out of 43 patched cells. This suggests that approximately 88% of the cells receive intra-claustral excitatory input. However, the current dataset does not allow us to quantify the connection probability or the strength of these connections.
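
      To make the arithmetic behind this proportion explicit, the short sketch below computes the observed fraction (38/43) together with a 95% Wilson score interval. The interval is not an analysis reported in the response. It is included only as an illustration of how much uncertainty a sample of 43 cells carries, and the choice of the Wilson method and of z = 1.96 are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


responding, patched = 38, 43          # counts stated in the response
p_hat = responding / patched          # observed fraction of responding cells
low, high = wilson_interval(responding, patched)

print(f"observed fraction: {p_hat:.3f}")
print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```

      The Wilson interval is used here rather than the simpler normal approximation because it behaves better for proportions near 0 or 1 with modest sample sizes.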

      As the reviewer pointed out, the RNN developed in this study is specifically designed for the delayed escape task, and we do not intend to claim direct generalization to other proposed functions of the claustrum, such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism of the claustrum observed in this specific task. We have included this in the Discussion section. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript.

    1. eLife Assessment

      This important advancement in the field of neurotransmission delivers a novel toolkit for in vivo visualization of vesicular transporters for ACh, GABA, glutamate and monoamines in C. elegans. With the application of newly developed neuron-specific knockout methods for these vesicular transporters, the results convincingly demonstrate that over 10% of the neurons studied show transporter co-expression that may be correlated with co-transmission. These findings and toolkit will be of interest towards the study of neural circuit function.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Through the use of this toolkit, the authors uncover molecular heterogeneity at individual synapses, revealing co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      (2) The article does not fully explore the potential implications or the underlying mechanisms governing this process, while the discovery of co-transmission in over 10% of neurons is an intriguing finding. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons - perhaps through case study examples - would strengthen the study's impact.

      Comments on revisions:

      I have no further questions regarding this work. I would like to congratulate the authors on the forthcoming publication of their manuscript. This study presents a versatile methodological framework with strong potential to advance the field of neuroscience, particularly in dissecting neural circuit function and neurotransmission dynamics in vivo.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters, and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters; they established the method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter - resulting in fluorescence - only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Comments on revisions:

      The authors addressed my questions in the revised manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the co-expression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Another strength is the comparison to previously published single cell sequencing data and other previously published data.

      Weaknesses:

      Co-expression of these transporters is not in and of itself sufficient to establish neurotransmitter co-release, but this caveat is acknowledged by the authors.

      Comments on revisions:

      The authors have addressed all of my previous concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Through the use of this toolkit, the authors uncover molecular heterogeneity at individual synapses, revealing co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      The reviewer is correct that this study does not introduce fundamentally new molecular or imaging techniques. Rather, the goal of this work is to establish a generalizable and experimentally validated framework for investigating neurotransmission in vivo at single-cell resolution. To achieve this, we deliberately integrate robust and well-established approaches, including CRISPR-based genome engineering, endogenous tagging, intersectional labeling strategies, and behavioral genetics, into a unified toolkit that enables questions that were previously difficult to address in intact animals.

      The novelty of the work therefore lies not in the invention of individual technologies, but in their systematic integration, functional validation, and deployment to reveal new biological insights, such as the prevalence and spatial organization of co-transmission in vivo.

      (2) The article does not fully explore the potential implications or the underlying mechanisms governing this process, while the discovery of co-transmission in over 10% of neurons is an intriguing finding. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons - perhaps through case study examples - would strengthen the study's impact.

      We agree with the reviewer that this study does not exhaustively explore the functional implications or mechanisms of co-transmission. The primary goal of this work is to introduce and share a validated set of strains that enable monitoring and cell-specific disruption of the major neurotransmitter systems in C. elegans, using molecular components that are broadly conserved across species. By establishing this toolkit, we aim to enable the mechanistic, single-cell analyses of co-transmitting neurons that extend beyond the scope of the present study but represent important next steps for the field.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters; they established the method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter - resulting in fluorescence - only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Weaknesses:

      The authors evaluated the positions of fluorescent puncta by visually comparing their positions with the positions of synapses indicated by EM reconstruction. It would provide stronger supportive evidence if the authors also examined co-localization of these reporters with well-established synaptic reporters previously published by their lab, such as reporters that label presynaptic sites of AIY interneurons.

      We have now included images of the synaptic vesicle marker RAB-3 in neurons such as ASE (new Figure S2) and RIB (new Figure S4D). We mention in the text that the patterns observed with VGLUT/EAT-4 (Figure 2E) and VGAT/UNC-47 (Figure 3D) resemble those observed in the RAB-3 images (Figures S2 and S4D, now discussed in lines 180-182 and line 244, respectively), supporting labeling of presynaptic vesicles.

      Additionally, we now show that, in the ADF neuron, a mutant for the conserved presynaptic kinesin KIF1A/UNC-104 shows accumulation of VAChT/UNC-17 and VMAT/CAT-1 in the cell soma and loss of the signal from the ADF axon (new Figure 7D-D’). These results are consistent with the idea that the labeled transporters localize to synaptic vesicles, which fail to be transported into the axon in the absence of functional KIF1A/UNC-104 protein (lines 408-411).

      This toolkit will likely be widely used by the C. elegans community. To facilitate the adoption of the approach and method by worm labs, the authors should include their plan for the dissemination of all of the reagents included in the kit, along with all of the associated information, including construct sequences and the protocols for their use.

      We thank the reviewer for this suggestion. In response, we have now (1) deposited all strains developed in this study at the Caenorhabditis Genetics Center, (2) created a public website with sequences and genotyping information for each allele developed (https://www.intralab.app/research-papers/cuentas-condori_etal-2026), and (3) named the toolkit SynaptoTagMe and included the name in the title and in the text. We also added the public website information to the main text (lines 140-142) and the Methods section (lines 540-542).

      Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the co-expression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Weaknesses:

      A weakness of the study is a lack of comparison to previously published single-cell sequencing data. These tools are alternatively described in the manuscript as superior to the sequencing data and as validation of the sequencing data, but neither claim can be assessed without knowing how they compare and contrast to that data. It is thus not clear to what extent the conclusions of this paper are an advance over what could be determined from the sequencing data on its own. Finally, some technical considerations should be discussed as potential caveats to the robustness of their intersectional strategy for concluding that certain genes are indeed co-expressed. Overall, claims about co-transmission should be tempered by the caveats presented in the discussion, suggesting that co-expression of these transporters is not in and of itself sufficient for neurotransmitter release.

      To clarify, we do not claim that our tools are superior to single-cell sequencing data. Rather, we view the characterization of neurotransmitter identity as an iterative process of discovery and validation across complementary approaches. Moreover, while this study provides an additional lens through which to examine neurotransmitter identity, its primary advance is not in redefining transmitter identity per se, but in establishing a toolkit that enables direct, in vivo monitoring and manipulation of neurotransmitter use at single-cell resolution.

      We do agree on the importance of explicitly comparing our findings with prior studies. In the revised manuscript we have therefore strengthened this integration by:

      (1) Revising Figure S9 and its legend to indicate the source of information for each neuron;

      (2) Adding a new Table 3 summarizing neurons consistently reported to have co-transmission potential;

      (3) Adding a new Table 4 listing neurons previously suggested to be co-transmitter neurons but not consistently supported across datasets;

      (4) Revising the Results to clarify these comparisons (lines 372-374 and 381-383); and

      (5) Incorporating this discussion into the main text (lines 482–488).

      In the Discussion we also now acknowledge technical caveats of the intersectional strategy, emphasizing that co-expression of vesicular transporters indicates co-transmission potential but is not, on its own, sufficient evidence of functional co-release (lines 482–488).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The design of different recombination sites for the transporters is a key strength of this paper. While the authors have provided justification and validation for the chosen sites, it would be valuable to know whether alternative insertion sites were tested as controls. A comparative analysis of multiple sites would provide important insights, especially for the design of similar sites in other proteins or in mammalian systems.

      Our paper lists all the sites tested for labeling each synaptic vesicle transporter. To summarize this information, we have added Table 5 in the Methods section (line 591).

      (2) Given the endogenous nature of the transporter design, it would be interesting to know if the authors have observed dynamic vesicle trafficking to explain the partial overlap shown in Figure 7. A dynamic approach could better capture the potential synergism and heterogeneity of co-transmission. I recommend that the authors try time-lapse imaging to explore this dynamic process further.

      We agree that dynamic imaging approaches, including time-lapse analysis of vesicle trafficking, represent an exciting avenue to further investigate the spatial and temporal organization of co-transmission. Such experiments are part of ongoing work in our laboratory and will be the focus of future studies aimed at dissecting the dynamic regulation of transmitter-specific vesicle populations in vivo.

      (3) The paper identifies co-transmission across a significant proportion of neurons, but the functional implications and interactions of neurotransmitters released from individual co-transmitting neurons are not fully explored. A case study focusing on the uniqueness and interactions of neurotransmitter release in these neurons would provide further clarity on the biological relevance of co-transmission.

      We agree with the reviewer on the importance of dissecting the functional implications of co-transmission and understanding how different neurotransmitters interact within individual co-transmitting neurons in vivo. The primary goal of this study is to establish and share tools that enable such investigations, and we anticipate that future work using these reagents will examine the functional roles of co-transmission on a neuron-by-neuron basis.

      (4) Minor Comments:

      (a) Figure S1D: The label "eat-4" in the eat-4::GFP image appears in italics.

      We have corrected this.

      (b) Figure 2C: The figure legend is missing the statistical significance notation (*** p).

      We have corrected this.

      (c) Figure 2D: The scale bar should be labeled as 10 μm.

      We have added the label.

      (d) Figure S4B: The image quality could be improved for better clarity.

      We have replaced the image.

      (e) Figure S8: The figure legend formatting needs attention, and the scale bar is missing in Figure S8C.

      We have added panel labels and the scale bar.

      Reviewer #3 (Recommendations for the authors):

      (1) A comparison of the results generated in this paper to the Cengen data (or other previously published data) would greatly strengthen the paper. Figure S7 seems to be a compilation of several different data sets, but this is very unclear if so, and there is no indication of which neurons are from which data, and whether there is any conflicting evidence (or what cutoffs were used to determine co-expression from Cengen). If there are indeed conflicting results, the ramifications should be discussed. Finally, given the caveat introduced in the discussion regarding the I2 neuron not expressing GABA synthesis or reuptake machinery, a more thorough analysis of which neurons identified here do or don't express other relevant genes may be warranted.

      In the revised version, we have added Tables 3 and 4 to explicitly compare our findings with CeNGEN and prior studies. Table 3 lists neurons consistently reported across independent datasets to have co-transmission potential, while Table 4 highlights neurons that have been suggested, but not consistently supported, across studies. We now also provide explicit references for each neuron in these tables and have clarified data sources and annotations in the legend to Figure S7 (now Figure S9). These additions are intended to make points of agreement and discrepancy across datasets transparent and to better contextualize our findings within existing resources.

      (2) The intersectional strategy used to identify co-expression of different transporters has some caveats that should be discussed. Specifically, removing the entire open reading frame of the eat-4 gene (as opposed to employing a T2A strategy) could potentially also remove some negative regulatory elements (for example, located within introns), leading to the inappropriate expression of the fluorescent reporter. This should at least be mentioned as a potential caveat.

      We have added this caveat to the Discussion section (lines 511-513).
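
      The logic of the intersectional strategy, and of the caveat raised in this comment, can be illustrated with a toy model. In the sketch below, a cell is predicted to fluoresce only if it expresses both the Flippase driver and the conditional reporter. Ectopic reporter activity, for example from the loss of a negative regulatory element, would then appear as a false positive. All cell labels and sets are hypothetical placeholders and do not correspond to data from the study.

```python
def intersectional_reporter(flp_driver_cells, conditional_reporter_cells):
    """Cells predicted to fluoresce: Flippase is expressed AND the reporter locus is active."""
    return set(flp_driver_cells) & set(conditional_reporter_cells)


# Hypothetical expression patterns (placeholder labels, not real neuron classes).
expresses_transporter_a = {"cell_1", "cell_2", "cell_3"}   # drives Flippase
expresses_transporter_b = {"cell_2", "cell_3", "cell_4"}   # carries the conditional reporter

# Hypothetical ectopic reporter activity, e.g. after removal of an intronic
# negative regulatory element together with the open reading frame.
ectopic_reporter_activity = {"cell_1"}

ideal = intersectional_reporter(expresses_transporter_a, expresses_transporter_b)
observed = intersectional_reporter(
    expresses_transporter_a, expresses_transporter_b | ectopic_reporter_activity
)
false_positives = observed - ideal

print(sorted(ideal))
print(sorted(observed))
print(sorted(false_positives))
```

      In this toy case the false positive arises only in a cell that already expresses the Flippase driver, which is why an independent line of evidence for the second transporter would be needed to rule out such cells.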

      (3) The colocalization experiments performed in Figure 7 seem to rely on the use of a transgenic allele (syb7882) that was not previously validated for functionality. This is only a problem because another allele with a constitutive mRuby in the same position (ot907) did not seem to be fully functional in the thrashing assays (Figure S4F), and thus it is at least conceivable that the differences in localization are due to the non-functional transporters being relegated to compartments destined for degradation. Validating this strain (after pan-neuronal Flippase expression) in the thrashing assay would dispel this concern.

      We have performed thrashing assays with allele syb7882 (UNC-17::mRuby3 FLP-on) (new Figure S6), in which we find that labeling UNC-17 with C. elegans-optimized mRuby3 (driven by pan-cellular Flippase) results in animals whose thrashing behavior is indistinguishable from that of wild-type animals. This result is consistent with the idea that the distinct subsynaptic localizations observed between VMAT/CAT-1 and VAChT/UNC-17 in ADF neurons arise from endogenous cellular subsynaptic organization programs.

      We additionally note that allele ot907 labels UNC-17 with mKate2, not mRuby3, and that this allele differs from wild-type animals in a thrashing assay (Figure S5F). The syb7882 allele that we generated labels UNC-17 with mRuby3 and does not differ from wild type in a thrashing assay. We are unsure what causes these distinct phenotypes between ot907 and syb7882, but note that, in addition to using different fluorescent proteins, each allele also employs a distinct linker sequence between UNC-17 and the fluorescent protein (new Figure S6). We now explain this difference in the figure legend of Figure S5 (lines 1184-1189).

      Minor comments:

      (1) Is there a difference between the strains imaged in Figures 3D and S3D? If so, this is not clear. If not, why are they shown twice, and why do they look so different from each other?

      We have replaced panel S3D with an endogenous RAB-3::mScarlet marker in RIB neurons to show that the localization of this synaptic vesicle marker parallels the punctate pattern of UNC-47::gfp11x3 reconstituted specifically in RIB neurons. See new panel S4D and line 244.

      To explain the original discrepancy: GFP1-10 is expressed from an extrachromosomal array, which produces variable expression and can account for the difference between the images.

      (2) Strains are alternatively denoted by their effect in the main figures, and by their allele names in the supplementary figures. This can be confusing when trying to compare data between the two figures (e.g., Figures 4C and S4F). Perhaps adding the allele names as parentheticals in the main figure might help.

      We have modified the paper to include the name of the alleles used in the panels of the main figures. Additionally, we now mention the specific alleles used for the functional assays in the figure legends.

      (3) To better understand the ramifications and efficiency of the cat-1 FLP-mediated removal (Figure 5E), it would be interesting to compare it directly to the ADF-specific removal of tph-1 referenced in the text.

      We agree that a direct comparison between the FLP-mediated removal of cat-1 and ADF-specific removal of tph-1 would be informative for assessing the efficiency and functional consequences of these manipulations. These experiments represent an interesting direction for future work, and we plan to pursue such comparisons in subsequent studies.

      (4) ADF seems to express very low levels of cho-1 (reuptake transporter), based on the images in Figure S8. Does it express higher levels of cha-1 (synthesis)?

      We have not directly compared the relative expression levels of cho-1 and cha-1 in ADF neurons in this study. Such quantitative comparisons of synthesis and reuptake machinery represent an interesting direction for future work but fall beyond the scope of the present manuscript.

    1. eLife Assessment

      In this important study, the authors used a zebrafish model and scRNAseq analysis to show that a subset of keratinocytes within melanoma microenvironment highly up-regulate Twist and undergo Epithelial-Mesenchymal Transition (EMT). Surprisingly, when overexpressing Twist in keratinocytes, the resulting alteration in keratinocytes is inhibitory for melanoma invasion in both zebrafish and human cell culture models. The results are supported by convincing experimental data that provide new insights into the interactions between melanoma cells and their environment.

    2. Reviewer #1 (Public review):

      Summary:

      Ma et al. show that melanoma cells induce an EMT-like state in nearby keratinocytes and that when this state is induced experimentally by Twist-overexpression the resulting alteration in keratinocytes is inhibitory for melanoma invasion. These conclusions are based on experiments in vivo with zebrafish and, in vitro, with human cells. The work is carefully done and provides new insights into the interactions between melanoma cells and their environment.

      Strengths:

      Use of both zebrafish and human cells adds confidence that findings are relevant to human melanomas while also further demonstrating utility of the zebrafish system for discovering important new features of melanoma biology that could ultimately have clinical impacts. The work also combines a nice suite of approaches including different models for induced melanomagenesis in zebrafish, single cell RNA-sequencing, and more. Some of the final observations are intriguing as well, especially the possibility of EMT-induced melanocyte-keratinocyte interactions via Jam3 expression; it will be interesting to see if this is indeed a mechanism for restraining melanoma invasion. The paper is clearly written and the inferences appropriate for the results obtained. Overall, the work makes a solid contribution to our understanding of important, but too often neglected, roles of the tumor microenvironment in promoting or inhibiting tumor progression and outcome.

      Weaknesses:

      No critical weaknesses noted.

      Comments on revisions:

      The authors have adequately addressed my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Ma et al. utilizes a zebrafish melanoma model, single-cell RNA sequencing (scRNA-seq), a mammalian in vitro co-culture system, and quantitative PCR (Q-PCR) gene expression analysis to investigate the role keratinocytes might play within the melanoma microenvironment. Convincing evidence is presented from scRNA-seq analysis showing that a small cluster of melanoma-associated keratinocytes upregulate the master EMT regulator, transcription factor, Twist1a. To investigate how Twist-expressing keratinocytes might influence melanoma development, the authors use an in vivo zebrafish model to induce melanoma initiation while overexpressing Twist in keratinocytes through somatic transgene expression. This approach reveals that Twist overexpression in keratinocytes suppresses invasive melanoma growth. Using a complementary in vitro human cell line co-culture model, the authors demonstrate reduced migration of melanoma cells into the keratinocyte monolayer when keratinocytes overexpress Twist. Further scRNA-seq analysis of zebrafish melanoma tissues reveals that, in the presence of Twist-expressing keratinocytes, subpopulations of melanoma cells show altered gene expression, with one unique melanoma cell cluster appearing more terminally differentiated. The authors use computational methods to predict putative receptor-ligand pairs that might mediate the interaction between Twist-expressing keratinocytes and melanoma cells. Finally, the authors established that similar keratinocyte phenotypic changes also occur in human melanoma tissues, setting the scene for future clinically relevant studies.

      Strengths:

      The scRNA-seq approach reveals a small proportion of keratinocytes undergoing EMT within melanoma tissue. The use of a zebrafish somatic transgenic model to study melanoma initiation and progression provides an opportunity to manipulate host cells within the melanoma microenvironment and evaluate their impact on tumour progression. Solid data demonstrate that Twist-expressing keratinocytes can constrain melanoma invasive development in vivo and reduce melanoma cell migration in vitro, establishing that Twist-overexpressing keratinocytes can suppress at least one aspect of tumour progression. Using the GeoMX spatial transcriptomics platform to interrogate a series of early melanoma precursor lesions enabled the authors to demonstrate that a similar EMT phenotype in keratinocytes also occurs in humans.

      Weaknesses:

      Due to limitations of the current model, no EMT marker gene expression was examined in melanoma tissue sections to determine the proportion and localization of Twist+ve keratinocytes within the melanoma microenvironment. However, the authors compensated for this by using a spatial transcriptomics platform to interrogate a series of early melanoma precursor lesions in humans.

      Due to technical limitations, it remains to be determined whether blocking EMT through down-regulation of Twist in keratinocytes may influence melanoma development.

      Due to technical limitations, none of the gene expression changes detected through Q-PCR or scRNA-seq were examined using immunostaining or in situ hybridization, hence cellular resolution spatial information is lacking.

      Overall, the data presented in this report draw attention to a less-studied host cell type within the tumour microenvironment, the keratinocytes, which, similar to well-studied immune cells and fibroblasts, could play important roles in either promoting or constraining melanoma development. Counterintuitively, the authors show that Twist-expressing EMT keratinocytes can constrain melanoma progression. While the detailed mechanisms remain to be uncovered, this is an exciting new line of research that warrants future studies.

      Comments on revisions:

      The authors have provided additional evidence to support their original conclusions, and the inclusion of spatial transcriptomic analysis using human samples strengthens the study. I did not identify any further issues that require attention.

    4. Reviewer #3 (Public review):

      Summary:

      In this study the authors use the zebrafish model and in vitro co-cultures with human cell lines, to study how keratinocytes modulate the early stages of melanoma development/migration. The authors demonstrate that keratinocytes undergo an EMT-like transformation in the presence of melanoma cells which leads to a reduction in melanoma cell migration. This EMT transformation occurs via Twist; and resulted in an improvement in OS in zebrafish melanoma models. Authors suggest that the limitation of melanoma cell migration by Twist-overexpressing keratinocytes was through altered cell-cell interactions (Jam3b) that caused a physical blockage of melanoma cell migration.

      Strengths:

      Authors describe a new cross-talk between melanoma and its major initial microenvironment: the keratinocytes and how instructed by melanoma cells keratinocytes undergo an EMT transformation, which then controls melanoma migration. Overall, the paper is very well written, and the results are clearly organized and presented.

      Weaknesses:

      (1) To really show their last point it would be important to CRISPR KO Jam3b in melanoma with twist OE keratinocytes, in vivo or in vitro.

      (2) Use of patient biopsies from early-stage melanomas vs healthy tissue to assess if there is a similar alteration of morphology of adjacent keratinocytes and increase in vimentin in human samples would strengthen the authors' findings.

      (3) Characterise better the cell-cell junctions and borders between cells (melanoma/ keratinocytes) with cellular and sub-cellular resolution. Since melanocytes can "touch" with their dendrites ~40 keratinocytes - can authors expand and explain better their model? Can this explain that in some images we cannot observe a direct interface between the cells?

      Comments on revisions:

      The authors answered most of the concerns raised.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma et al. show that melanoma cells induce an EMT-like state in nearby keratinocytes and that when this state is induced experimentally by Twist-overexpression the resulting alteration in keratinocytes is inhibitory for melanoma invasion. These conclusions are based on experiments in vivo with zebrafish and, in vitro, with human cells. The work is carefully done and provides new insights into the interactions between melanoma cells and their environment.

      We appreciate your support for our overall conclusions.

      Strengths:

      The use of both zebrafish and human cells adds confidence that findings are relevant to human melanomas while also further demonstrating the utility of the zebrafish system for discovering important new features of melanoma biology that could ultimately have clinical impacts. The work also combines a nice suite of approaches including different models for induced melanomagenesis in zebrafish, single-cell RNA-sequencing, and more. Some of the final observations are intriguing as well, especially the possibility of EMT-induced melanocyte-keratinocyte interactions via Jam3 expression; it will be interesting to see if this is indeed a mechanism for restraining melanoma invasion. The paper is clearly written and the inferences are appropriate for the results obtained. Overall the work makes a solid contribution to our understanding of important, but too often neglected, roles of the tumor microenvironment in promoting or inhibiting tumor progression and outcome.

      Weaknesses:

      No critical weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Ma et al. utilizes a zebrafish melanoma model, single-cell RNA sequencing (scRNA-seq), a mammalian in vitro co-culture system, and quantitative PCR (Q-PCR) gene expression analysis to investigate the role keratinocytes might play within the melanoma microenvironment. Convincing evidence is presented from scRNA-seq analysis showing that a small cluster of melanoma-associated keratinocytes upregulates the master EMT regulator, transcription factor, Twist1a. To investigate how Twist-expressing keratinocytes might influence melanoma development, the authors use an in vivo zebrafish model to induce melanoma initiation while overexpressing Twist in keratinocytes through somatic transgene expression. This approach reveals that Twist overexpression in keratinocytes suppresses invasive melanoma growth. Using a complementary in vitro human cell line co-culture model, the authors demonstrate reduced migration of melanoma cells into the keratinocyte monolayer when keratinocytes overexpress Twist. Further scRNA-seq analysis of zebrafish melanoma tissues reveals that in the presence of Twist-expressing keratinocytes, subpopulations of melanoma cells show altered gene expression, with one unique melanoma cell cluster appearing more terminally differentiated. Finally, the authors use computational methods to predict putative receptor-ligand pairs that might mediate the interaction between Twist-expressing keratinocytes and melanoma cells.

      Strengths:

      The scRNA-seq approach reveals a small proportion of keratinocytes undergoing EMT within melanoma tissue. The use of a zebrafish somatic transgenic model to study melanoma initiation and progression provides an opportunity to manipulate host cells within the melanoma microenvironment and evaluate their impact on tumour progression. Solid data demonstrate that Twist-expressing keratinocytes can constrain melanoma invasive development in vivo and reduce melanoma cell migration in vitro, establishing that Twist-overexpressing keratinocytes can suppress at least one aspect of tumour progression.

      Weaknesses:

      While the scRNA-seq analysis of melanoma tissue and RT-PCR analysis of EMT gene expression in isolated keratinocytes provide evidence that a subpopulation of host keratinocytes upregulates Twist and other EMT marker genes and potentially undergoes EMT, the in vivo evidence for keratinocyte EMT within the melanoma microenvironment is based on cell morphology in a single image without detailed characterization and quantification. No EMT marker gene expression was examined in melanoma tissue sections to determine the proportion and localization of Twist+ve keratinocytes within the melanoma microenvironment.

      We agree this needed better support. To address this, we have collaborated with the Sorger lab, which has performed spatial transcriptomics on early human melanoma samples (n=8 samples). The advantage of this method is that they can dissect microregions of interest (MRs) for RNA-seq to discern keratinocytes vs. melanocytes. We queried regions that had higher or lower numbers of atypical melanocytes in these biopsies with our TAK or TWIST signature. While the normal sample had no enrichment, we found that a subset of the human samples had evidence of these signatures in the keratinocytes, particularly the ones which had a higher proportion of atypical melanocytes. These data support our model that early melanomas enact an EMT-like program in a subset of nearby keratinocytes.

      The scRNA-seq UMAP suggests the proportion of EMT keratinocytes within the melanoma microenvironment is very small, raising questions about their precise location and significance within the tumour microenvironment. Although both in vivo and in vitro evidence demonstrates that Twist-expressing keratinocytes can suppress melanoma progression, the conditions modelled by the authors involve over-expression of Twist in all keratinocytes, which does not naturally occur within the melanoma microenvironment and, therefore, might not be relevant to naturally occurring melanoma progression. The authors did not test whether blocking EMT through down-regulation of Twist in keratinocytes may influence melanoma development, which would establish the role of Twist-expressing keratinocytes in the melanoma microenvironment.

      We entirely agree, and ideally would do the exact experiment you suggested, which is to knock out TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      To address the potential mechanism by which Twist-expressing keratinocytes suppress melanoma progression, a second scRNA-seq analysis was conducted. However, this analysis is not adequately presented to provide strong evidence for proposed mechanisms for how Twist-expressing keratinocytes suppress melanoma cell invasion. CellChat analysis was used to attempt to identify receptor-ligand pairs that might mediate keratinocyte-melanoma cell interaction, but the interactions between tumour-associated keratinocytes (TAK) and melanoma cells were not included in the analysis. Furthermore, although genetic reporters were used to label both keratinocytes and melanoma cells, no images showing the detailed distribution and positional information of these cells within melanoma tissue are presented in the report. None of the gene expression changes detected through Q-PCR or scRNA-seq were validated using immunostaining or in situ hybridization.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      Overall, the data presented in this report draw attention to a less-studied host cell type within the tumour microenvironment, the keratinocytes, which, similar to well-studied immune cells and fibroblasts, could play important roles in either promoting or constraining melanoma development.

      Counterintuitively, the authors show that Twist-expressing EMT keratinocytes can constrain melanoma progression. While the detailed mechanisms remain to be uncovered, this is an interesting observation.

      Reviewer #3 (Public review):

      Summary:

      In this study the authors use the zebrafish model and in vitro co-cultures with human cell lines, to study how keratinocytes modulate the early stages of melanoma development/migration. The authors demonstrate that keratinocytes undergo an EMT-like transformation in the presence of melanoma cells which leads to a reduction in melanoma cell migration. This EMT transformation occurs via Twist; and resulted in an improvement in OS in zebrafish melanoma models. Authors suggest that the limitation of melanoma cell migration by Twist-overexpressing keratinocytes was through altered cell-cell interactions (Jam3b) that caused a physical blockage of melanoma cell migration.

      Strengths:

      The authors describe a new cross-talk between melanoma and its major initial microenvironment: the keratinocytes and how instructed by melanoma cells keratinocytes undergo an EMT transformation, which then controls melanoma migration. Overall, the paper is very well written, and the results are clearly organized and presented.

      Weaknesses:

      (1) To really show their last point it would be important to CRISPR KO Jam3b in melanoma with twist OE keratinocytes, in vivo or in vitro.

      The CellChat data suggest that Jam3b is likely important in melanoma development, as it has been shown to be important in melanocyte development (Eom, Dev Biol 2021). Studying this specifically in melanoma progression is an area of ongoing study in our lab, and we have begun to generate the Jam3b knockouts as you suggested. Since this set of experiments is quite extensive, we feel this set of data deserves a separate manuscript, which we hope to complete in the near future.

      (2) The use of patient biopsies from early-stage melanomas vs healthy tissue to assess if there is a similar alteration of morphology of adjacent keratinocytes and an increase in vimentin in human samples would strengthen the authors' findings.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      (3) The cell-cell junctions and borders between cells (melanoma/ keratinocytes) should be characterized better, with cellular and sub-cellular resolution. Since melanocytes can "touch" with their dendrites ~40 keratinocytes - can authors expand and explain better their model? Can this explain that in some images we cannot observe a direct interface between the cells?

      We have now added higher-resolution images of these junctions. Our overall hypothesis, related to point (1) above, is that Jam3b mediates these junctions between melanoma cells and keratinocytes, which is why we are now pursuing this in a follow-up study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please say a little more about any phenotypes that might have been evident in Twist-overexpression fish in the absence of melanomas, and clarify in the text that these were mosaic animals, as a first (incorrect) reading left the impression that stable lines had been made.

      In these experiments, we co-injected the melanoma plasmids along with the krt4-TWIST plasmids, creating mosaic animals. Because of this, we did not have a way of specifically looking at the effect of TWIST in the absence of melanoma. We agree this needs better clarification and have added this to the Results.

      (2) Violin plot colors in main and Supplementary Figures tend to obscure data points. Colors for keratinocyte clusters are not discernible in Figure 4C.

      We have remade the plots in a different color scheme to try and make these stand out more easily.

      (3) Clarify that N-cadherin = cdh2 in Figure 1

      We have fixed this in the legend for Figure 1.

      (4) Clarify the relationship between keratinocytes highlighted in Figure 2B and used for Hallmark expression in Figure 2B, and those analyzed for expression of candidate genes in Figure 2E. The last shows many NKC whereas even the larger group circled in Figure 2B as keratinocytes seems to have far fewer cells, unless massively overplotted. Is the rest of that cluster in Fig. 2B keratinocytes as well?

      In the analysis in Figure 2E, we first calculated genes differentially expressed in the TAK vs. NKCs (found in Figure 2B). We used those genes as input into GSEA analysis, which showed enrichment for EMT programs specifically in the TAKs. We recognize that the number of TAKs is relatively small (compared to all of the other cells in the single-cell UMAP) but that is the most we were able to get from this particular scRNA run, because the melanoma cells naturally make up the vast majority of the cells in the 10X run. This is why we performed downstream mechanistic analysis (in the rest of the paper) to ensure this result was not an artifact of a small number of TAKs.

      (5) Define "NES" in the Figure 2 legend.

      NES indicates “Normalized Enrichment Score”, a standard output of GSEA. This has been added to the legend.

      (6) Indicate how many control vs. Twist+ fish were found to have invasive vs non-invasive tumors upon histological examination. Were tumors in the latter fish always contained within the epidermis proper, or did some extend deeper if given enough time?

      In the histology analysis, we used n=3 control fish and n=3 TWIST overexpressing fish. Main Figure 3 shows n=1 of these fish from each group, and the other n=2 from each are shown in Supplemental Figure 1. In this cohort (taken at 26 weeks), all of the TWIST tumors were contained within the epidermis, but we did not let them grow longer to see if (given enough time) they could have invaded below this. Around 26 weeks, survival decreased, which made this experiment unfeasible at later time points. We have added a statement about this to the Results section.

      Reviewer #2 (Recommendations for the authors):

      Going through the data presented in the figures, here are my comments:

      (1) Figure 1: To strengthen the evidence that keratinocytes in the melanoma microenvironment undergo EMT, it would be beneficial to provide immunostaining or in situ data for EMT marker genes within melanoma tissue sections co-stained with a keratinocyte marker (such as an anti-GFP antibody).

      We agree this type of analysis is an important validation of our findings. Doing this in zebrafish tumors is difficult, as human/mouse antibodies for EMT marker genes typically do not work in fish. In addition, we felt that validating our results in human melanomas would make our findings more generalizable. Therefore, we established a collaboration with Peter Sorger’s lab, who have been performing high-resolution spatial transcriptomics on early melanoma samples from humans. While these are difficult to attain (since most early lesions are processed for clinical diagnosis) they have a collection of n=8 samples that they subjected to GeoMX spatial analysis. In this method, the samples are first stained with antibodies to definitively mark keratinocytes (PANCK) vs. melanoma cells (SOX10) and all samples are reviewed by expert pathologists. From this, microregions (MRs) of interest are selected to then undergo RNA-seq. After control analysis to ensure both keratinocytes and melanocytes were present in the samples, they then used our TAK or TWIST signatures as a query. Both signatures were enriched in the keratinocytes adjacent to early melanomas, but not in normal skin samples or in samples with few atypical melanocytes. This provides further evidence that the altered keratinocytes we see in our fish are present and enriched in human biopsy specimens.

      (2) Figure 2: In panel B, the UMAP shows the separation of single cells, and keratinocytes are circled. However, there are two clusters of keratinocytes, and the graph does not indicate which cluster represents tumour-associated keratinocytes (TAKs) versus normal keratinocytes (NKCs). The two clusters also appear to differ in abundance, so it would be helpful to report the proportion of keratinocytes that are TAKs undergoing EMT, according to the individual dots in Figure 2E. In Figure 2E, TAKs seem to have very few cells compared to the other clusters. Given the relatively small number of EMT-TAKs detected in the single-cell RNA-seq data, I wonder how much direct influence these cells could exert on the bulk of melanoma cells in vivo. The evidence would be strengthened if an IHC analysis could show the location of Twist-expressing keratinocytes within the melanoma microenvironment and whether they are associated with certain melanoma cell markers but not others (i.e., markers indicating different differentiation states of melanoma cells). To further support the role of Twist-expressing keratinocytes in the melanoma microenvironment, it would be beneficial to perform a knockout (KO) of Twist in keratinocytes within the melanoma microenvironment.

      In Figure 2B, we agree that the color scheme made it difficult to discern TAKs vs. NKCs.

      We have changed the color scheme to make this more clear.

      The number of TAKs undergoing EMT is relatively small, and this is why we performed the overexpression studies of TWIST in order to expand the field of keratinocytes undergoing EMT. To get at the question of whether these are really important in tumor initiation and progression, we ideally would do the exact experiment you suggested, which is to knockout TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      (3) Figure 4: Co-culture results show that melanoma cells migrate further on a control HaCaT cell monolayer compared to a TWIST-overexpressing HaCaT cell monolayer. While this phenotype might support the conclusion that TWIST-expressing keratinocytes reduce melanoma cell invasion, it should be interpreted with caution. The data can be interpreted as TWIST-HaCaT cells inhibiting melanoma cell migration; however, an alternative explanation cannot be ruled out. For example, wild-type HaCaT cells might provide a suitable substrate for melanoma cells to migrate, whereas TWIST-HaCaT cells lack this property. To address this, the baseline melanoma cell migration should be established in this assay by coating the plate with cells from the same melanoma cell line and allowing melanoma cells from the flipped cover slip to migrate out.

      We have performed the experiment you suggested using Hs.294T and SKMEL2 cells and provided this as a new Supplemental Figure 2. This demonstrated that the melanoma cells in this context could indeed migrate out of the coverslip at baseline. Thus, it is possible, as you indicated, that the phenotype we have observed might be due to something lacking in the TWIST keratinocytes that promotes migration. Since we cannot differentiate between these two possibilities (i.e. that TWIST KCs actively inhibit migration vs. lacking something that promotes migration), we have modified the text to indicate both of these possible mechanisms could be at play.

      (4) In the representative images shown in the figure, it appears that both HaCaT cells and melanoma cells in the upper and lower panels are at very different densities. "Contact inhibition" and "cell sorting" are well-known phenomena in tissue-cultured cells, so when cells are seeded at different densities, their ability to move away from the initial location could vary. From the Materials and Methods section, it is unclear why cell densities are drastically different in the images presented. Images in the upper panel show both melanoma cells and keratinocytes at lower densities, and in the TWIST group, melanoma cells under the cover slip appear to aggregate into clusters with TWIST-expressing keratinocytes surrounding each aggregated cluster. This suggests that cell sorting might be occurring, potentially mediated by cadherins or Eph-ephrins.

      We recognized this discrepancy as well. In the setup of the experiment, we seeded the exact same number of cells for both the Hs.294T (Figure 4E) and SKMEL2 (Figure 4G) experiments. But when we took the images after 20 hours of co-culture, it was clear that the HaCaT densities were different, as seen in the figures. We suspect this might be because these two melanoma cell lines may secrete different factors (i.e. growth factors) that affect HaCaT proliferation, adhesion or cell sorting. Despite this, in terms of the ability of the melanoma cells to migrate into the HaCaT cells, we saw similar results across both experiments, suggesting that it is not HaCaT density alone that explains the results. But we agree we need to clarify this point about cell density in the manuscript, and we have amended the Discussion to indicate the above points.

      (5) Figure 5: Single-cell RNA-seq analysis comparing cells from control melanomas with cells from melanomas developed in a Twist-expressing keratinocyte background could provide valuable information on how melanoma cells alter their phenotype and how Twist-expressing keratinocytes respond to melanoma development. However, the information presented in the manuscript is not persuasive in this regard (appears to be minimal).

      (a) In Figure 5C, the differences between melanoma cells in a control background versus those in a Twist-expressing keratinocyte background include cells from more than one unique cluster, but most of the different clusters are not discussed, except for one prominent cluster indicated by an arrow.

      The reason we pointed out that one cluster is that it was the major difference between the control melanomas and the TWIST melanomas. To better clarify this point, we have made a new Supplemental Figure 3 comparing the clusters in each situation: 7 in the control melanomas vs. 8 in the TWIST melanomas (Supp. Figure 3d). To then better understand the nature of the TWIST melanomas, we performed Gene Set Enrichment Analysis (GSEA) compared to the control melanomas. Interestingly, this revealed a striking enrichment for pathways related to oxidative phosphorylation using both GO and Hallmark terms. Because we had previously shown that melanoma cells with high ox-phos are typically in the more melanocytic and less invasive state (Lumaquin-Yin, Nature Communications 2023), we therefore analyzed our TWIST melanomas by comparing this unique cluster to the well-annotated melanoma cell state signatures from Tsoi et al. (Cancer Cell, 2018). This showed that most of the TAKs and TWIST-KCs were in the melanocytic/transitory cluster, which are thought to be the least invasive of all the melanoma cell states. Thus, it seems likely that high levels of TWIST in the keratinocytes induce a low invasion state in the melanoma cells. We have added this data and interpretation to the Results and Discussion sections of the manuscript.

      (b) In Figure 5D, it is unclear whether TAKs include both wild-type keratinocytes and Twist-expressing keratinocytes. 

      We oversimplified this plot for the sake of visualization, but realize that in doing so we obscured some important details. In the plot, we separate normal keratinocytes (NKCs) vs. tumor associated keratinocytes (TAKs). TAKs are, by definition, TWIST<sup>hi</sup>/EMT<sup>hi</sup> and represent upregulation of endogenous TWIST. In contrast, when we force overexpression of TWIST in the keratinocytes, then we see an entirely new cluster appear, as expected. 

      (c) In Figure 5F, TAKs are interacting with melanoma cells so it is unclear why the CellChat analysis did not include TAKs. 

      This was an oversight on our part, and the Figure has now been corrected to include this. TAKs in both the control and TWIST melanomas have numerous interaction partners, whereas the TWIST-KCs have relatively fewer and more specific interactions.

      (d) Finally, Figure 5G needs clearer labelling; it is currently unclear which gene is expressed by the sender and which by the receiver.

      This has been clarified in Figure 5F with specific indicators of “sender” vs. “receiver”.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1E - in this figure, it is possible to observe the altered morphology of keratinocytes, but these cells are not in the vicinity of the melanoma cells. Can the authors please provide a zoom-in of the region of the interface and quantify the distance between cells? At least in the image shown, the cells that are most deformed appear to be far away from the melanoma, but perhaps this is just this example; please clarify. Or are there patches of keratinocytes that go through EMT and others that maintain their epithelial structure?

      We have now added zoom-in images of the interface (Figure 1E). In nearly all sections examined, some keratinocytes maintain their hexagonal normal epithelial structure, but the majority of the cells appear altered. We have attempted to quantify this effect, along with the distance between cells with this EMT-like morphology, but have not found a reliable method given the heterogeneity across samples. That is why we instead chose to quantify the EMT-like keratinocytes (what we refer to as TAKs) using single-cell RNA seq, which showed that 32% of the population had the TAK signature, whereas 68% resembled normal keratinocytes. We feel this is more quantitative than imaging alone.

      This data has been added to the Results section.

      (2) Figure 3B - could not find the number of fish analyzed.

      This was an oversight on our part. We studied n=135 control melanomas vs. n=118 TWIST melanomas. This data has now been added to Figure 3B.

      (3) Figure 3D - missing a graph with quantification and zoom images in the tail keratinocytes/ melanoma interface.

      In this particular cohort of animals, we unfortunately did not specifically track body vs. fin melanomas, so we are not able to quantify this.

      (4) Figure 4 - it would be nice again to have a zoom-in to observe the interface of cells- maybe use a phalloidin staining to visualize better how cells are touching each other.

      We have added a zoom-in image of the interface to the image (Figure 4E). We have very much wanted to do immunohistochemistry (not just for phalloidin, but for other markers as well) on these coverslip co-cultures and have tried, but we have not been successful. This is likely because the assay requires plastic plates, which are incompatible with doing this, but we agree that getting this to work would be an important area for future development.

      (5) I believe the paper deserves a last figure - with the model.

      We agree and this has now been added as Figure 7.

    1. eLife Assessment

      This important work advances our understanding of the single neuron coding types in the mouse gustatory cortex and the functional roles of these neurons for perceptual decision-making. The conclusions are based on compelling evidence from rigorous behavioral experiments, high-density electrophysiology, sophisticated data analysis, and neural network modeling with in silico perturbations of functionally-identified units. This work will be of broad interest to systems neuroscientists.

    2. Reviewer #1 (Public review):

      The manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One minor weakness is the mismatch between the neural analyses and behavioral data. Neural analyses (i.e., population activity trajectories) indicate a separation of the neural activity associated with each mixture. Given this analysis, one might expect the psychometric curve to have a significantly steeper slope. One potential explanation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding. Given the difficulty of iso-intensity matching concentrations, this concern is not paramount.

    3. Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I initially had some suggestions for further analyses to clarify the contribution of constrained and unconstrained units. In the revised version, the authors have performed all the suggested analyses, further strengthening their conclusions.

    4. Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision making tasks, reflecting both sensory and decision variables. In the present study, Lang et al., set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods with regard to the behavioral task and electrophysiological recordings/data analysis are straightforward, solid and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced choice task that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

      We thank the Reviewer for the insightful comments and thoughtful suggestions. Our electrophysiological recordings show that GC dynamically encodes stimulus concentration of mixture elements, dominant perceptual quality, and decisions of directional lick. With regard to the encoding of mixtures, the clear separation of activity associated with each mixture (Figure 3) is present at a trial-averaged pseudo-population level, and average activities associated with more similar, intermediate mixtures are closer to each other in this space. At the single-trial level, activities evoked by similar, intermediate mixtures are much harder to separate. This increased similarity can lead to behavioral errors resulting either from incorrect encoding of the stimulus or from the inability to interpret the stimulus to guide the correct decision. The psychometric function, which shows that more distinct stimuli (100/0 vs 0/100) lead to fewer mistakes than more ambiguous, intermediate mixtures (55/45 vs 45/55), is consistent with the increased ambiguity of responses to intermediate mixtures.

      The Reviewer is correct that there could be a slight mismatch in the perceived intensity of the mixture components. This mismatch could be the reason for the slight asymmetry in our psychometric function (Figure 1B). However, it is not uncommon for mice in these 2AC tasks to also have a motor laterality bias in their responses that manifests itself for the more ambiguous stimuli. We chose not to model this bias given its subtlety and its unknown origin. Rather, we chose to model an ideal scenario in which stimuli have matched intensity and no motor bias exists. In the revised manuscript we discuss this issue.

      Reviewer #1 (Recommendations for the authors):

      (1) The apparent mismatch between neural and behavioral data. I am providing more details in this section to hopefully better illustrate my concern.

      (a) Based on the authors' psychometric curve, sucrose appears to be a more salient signal causing the behavior to be shifted (e.g., a 50/50 mixture results in a >60% predicted behavioral performance). If both sucrose and salt were intensity-matched, a 50/50 mixture should result in a behavioral performance near 50%. The increased salience of sucrose could cause the animals to have lower overall performance despite accurate neural encoding. Alternatively, certain animals could display a strong side bias, skewing the data slightly. These issues have seemingly been fixed in the model data, which displays a more balanced psychometric curve. Accordingly, the model data seemingly displays a larger shift in error trials as compared to correct trials (Figure 6A).

      The reviewer is correct in observing that the average experimental psychometric curve in Figure 1B shows a slight shift in favor of the sucrose side with a 50/50 mixture. We fit psychometric curves to each session and the mean value of P(Sucrose choice | Stimulus = 50/50) across sessions was significantly different from 0.5 (one-sample t-test, p = 0.003), with 5 probabilities below 0.5 and 18 above it.
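
      The reported split (5 sessions below 0.5, 18 above) can also be checked from the counts alone with an exact two-sided sign test. The sketch below is a complementary illustration, not the analysis described above, which was a one-sample t-test on the per-session probabilities; the function name `sign_test_two_sided` is hypothetical.

```python
from math import comb

def sign_test_two_sided(n_below: int, n_above: int) -> float:
    """Exact two-sided sign test p-value under H0: P(above) = 0.5."""
    n = n_below + n_above
    k = min(n_below, n_above)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)

# Counts taken from the response: 5 sessions below 0.5, 18 above.
p_value = sign_test_two_sided(5, 18)
print(f"sign test p = {p_value:.4f}")
```

      With these counts the sign test gives p of roughly 0.011, which points in the same direction as the reported t-test while using only the direction of each session's bias and not its size.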

      This slight bias could be attributed to a slight mismatch in the perceived intensity of the mixture components and/or lateral motor biases. In any case, it is subtle and its origins were not a focus of this study.

      Models were not trained to match the animals’ psychometric curves, but rather to choose correctly in an ideal scenario where stimuli have matched intensities. This explains why the model simulations lack the bias observed in animal behavior data.

      We do not believe that there is a mismatch between the experimental behavioral and neural data, as trial-averaged pseudo-population trajectories are farther in neural space for more discriminable stimuli and closer in neural space for more similar stimuli, consistent with behavioral performance that is high for more discriminable stimuli and low for more similar stimuli. Moreover, as the model also shows, a clear separation of trial-averaged trajectories still results in a sigmoidal performance function for trial-to-trial behavior.

      Finally, subtle behavioral biases would not necessarily be expected to appear in our dPCA analyses since we used this technique to find a single axis that best separates all stimuli conditions regardless of choice when the pseudo-population data are projected upon it. Additional modes of activity that explain less overall variance might better reflect biases.

      (b) Although I am not an expert at these analyses, I wonder whether the elevated bump (i.e., >0) in Figure 3C of the 55/45 mixture that occurs early in the stimulus presentation further supports the hypothesis mentioned above and could indicate an early signal of salience/increased intensity?

      The reviewer is correct that the 55/45 trajectory features a brief positive wave right after stimulus delivery before going negative. While this may be related to stimuli not being explicitly balanced for intensity, it could also reflect a signal related to ambiguity or balanced mixtures. We are hesitant to interpret this positive deflection as conclusive evidence of a bias in neural activity, given its short duration and the natural variability of neural signals.

      (2) The increase in step-perception neurons after the decision period is confusing (Figure 4C). The text states (line 246) "the analysis reveals a small and time-invariant proportion of step-perception neurons". However, the proportion doubles after the decision-making process, which is seemingly a significant change. Why does this occur? This observation is noticeably missing from the network data. Could it be attributed to a mislabeling of "step-choice" neurons, given the correlation between the left/right decision and sweet/salty? Either way, it is very noticeable and should be addressed.

      We cannot be sure of the reason for the increase in step-perception neurons after decisions. One possibility is that they are acting as feedback for learning, encoding the percept to compare with choice and outcome to improve performance. The model, which presumably learns the task differently from the animals, does not seem to leverage this signal for its own learning. We have modified the text, now referring to a “small but consistently present proportion” of step-perception neurons, and included this proposed explanation in the Discussion.

      (3) Optional: I think the authors are missing an opportunity to analyze the temporal aspect of this multiplex code using their network-based modeling approach. A significant proportion of neurons fall into different categories (i.e., step-perception/linear, etc.) at different time points. However, the virtual ablation experiments remove any neuron that falls into one of these categories at any time. By limiting the cell-specific virtual ablation to specific time windows, you could (I think) provide stronger evidence for the temporal sequence of the encoding of these perceptual aspects.

      This was an excellent suggestion for an additional modeling experiment, so we performed it. A new supplemental figure (Figure S8) and additional text in the revised manuscript showcase the results. In summary:

      In terms of behavioral results, ablating the linear coding units in the beginning (that is, silencing all units that are labeled linear in any bin within the first 1.2 s after stimulus onset for the entirety of the 1.2 s) significantly reduces performance, as does ablating the step-perception or step-choice coding units at the end (1.2 s prior to choice). The remaining combinations of coding type and timing of the ablation do not affect performance.

      Regarding the dynamics of coding types (compare Figure 7A), stimulus coding activity was significantly blunted only by ablating the linear coding units in the beginning, whereas choice coding activity was diminished by ablating the choice coding units at the end or by ablating the linear coding units at either the beginning or the end.
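
      The selection rule for the time-limited ablation (pick every unit that carries a given label in any bin inside the window, then silence it for the whole window) can be sketched as follows. This is a minimal illustration under assumed data layouts, not the authors' implementation: `labels`, `activity`, `bin_starts`, and both function names are hypothetical, and the masking is applied after the fact, whereas in a recurrent network the silencing would be applied during the simulation so that downstream units are affected.

```python
def units_to_ablate(labels, bin_starts, coding_type, window):
    """Return indices of units labeled `coding_type` in any bin inside `window`.

    labels[u][b] is the coding label of unit u in time bin b (hypothetical layout);
    bin_starts[b] is the bin's start time in seconds; window is (start, end).
    """
    start, end = window
    in_window = [b for b, t in enumerate(bin_starts) if start <= t < end]
    return [u for u, row in enumerate(labels)
            if any(row[b] == coding_type for b in in_window)]

def ablate(activity, units, bin_starts, window):
    """Return a copy of activity[u][b] with `units` silenced for the whole window."""
    start, end = window
    silenced = [list(row) for row in activity]
    for u in units:
        for b, t in enumerate(bin_starts):
            if start <= t < end:
                silenced[u][b] = 0.0
    return silenced

# Toy example: 3 units, 4 bins of 0.6 s; the early window is the first 1.2 s.
bin_starts = [0.0, 0.6, 1.2, 1.8]
labels = [
    ["linear", "other", "other", "step-choice"],
    ["other", "other", "linear", "other"],
    ["other", "linear", "step-perception", "other"],
]
activity = [[1.0] * 4 for _ in labels]
early = (0.0, 1.2)
targets = units_to_ablate(labels, bin_starts, "linear", early)
ablated = ablate(activity, targets, bin_starts, early)
```

      In the toy data, the second unit is labeled linear only after the early window, so it is left untouched by the early ablation.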

      Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      We thank the Reviewer for the constructive feedback. The Reviewer is correct that ablations were carried out with respect to response categories only and included both constrained and unconstrained units.

      The ratio of total units to constrained units was fixed at 5.88, thus constrained units were ~17% of the network and unconstrained units were ~83%. This value is specified in the Methods (RNN: Components and dynamics), but we have reported it in the Results of the revised manuscript for clarity.

      We have also edited the Methods because they wrongly stated that the ratio of unconstrained (rather than total) units to constrained units was 5.88.
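
      The arithmetic behind these percentages, and the difference between the two ratios that were confused in the Methods, can be written out in a few lines. Only the value 5.88 comes from the response; the variable names are illustrative.

```python
# Ratio of total units to constrained units, as stated in the response.
total_to_constrained = 5.88

constrained_fraction = 1 / total_to_constrained
unconstrained_fraction = 1 - constrained_fraction
# The corresponding ratio of unconstrained to constrained units is one less.
unconstrained_to_constrained = total_to_constrained - 1

print(f"constrained: {constrained_fraction:.1%}, "
      f"unconstrained: {unconstrained_fraction:.1%}, "
      f"unconstrained/constrained: {unconstrained_to_constrained:.2f}")
```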

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      In the revised manuscript, we have specified the fractions of constrained and unconstrained units within each response category. For convenience, they are reported here: linear = 194 constrained and 691 unconstrained units; step-perception = 147 constrained and 840 unconstrained units; step-choice = 129 constrained and 814 unconstrained units; “other” = 353 constrained and 1739 unconstrained units.
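
      For comparison with the network-wide figure of ~17% constrained units, the constrained share within each response category can be computed directly from the counts listed above. This is a small sketch; the dictionary layout is an assumption, and the counts are not summed across categories because the response does not state whether the categories overlap.

```python
# (constrained, unconstrained) unit counts per response category, from the response.
counts = {
    "linear": (194, 691),
    "step-perception": (147, 840),
    "step-choice": (129, 814),
    "other": (353, 1739),
}

constrained_share = {
    name: constrained / (constrained + unconstrained)
    for name, (constrained, unconstrained) in counts.items()
}

for name, share in constrained_share.items():
    print(f"{name}: {share:.1%} constrained")
```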

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      In the revised version we have included two additional supplemental figures (Figures S5-6) where the analyses of Figure 6 are carried out separately for constrained and unconstrained units. In short, the results for the constrained units strongly resemble those for the experimental data, while the results for the unconstrained units strongly resemble those for all model units.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

      The revised version includes a supplemental figure (Figure S7) with the results of these additional ablation simulations.

      In summary:

      In terms of behavioral performance, the prior results showing that ablating linear, step-perception, or step-choice units significantly impairs performance, while ablating “other” has no significant effect, hold even if ablation is restricted to only constrained or only unconstrained units. There is a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs performance more, most likely due to their larger population size.

      In terms of dynamics, to impair stimulus coding by ablating step-choice units, you must ablate them all; to impair stimulus coding by ablating linear or step-perception units, however, ablating just the unconstrained ones suffices. As before, ablating linear, step-perception, or step-choice units significantly impairs choice coding activity, while ablating “other” units does not; these results hold even if ablation is restricted to only constrained or only unconstrained units. Finally, there is again a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs dynamics more, most likely due to the larger population size.

      Reviewer #2 (Recommendations for the authors):

      (1) In addition to panel 5B, it would be informative to show data from individual mice and the corresponding RNNs trained on each mouse, to assess how closely they match. If available, including one representative example of a good match and one of a less accurate match would help the reader get a better sense of the data.

      Figure 5B shows the average behavioral performance of the model. Individual models were not trained directly on the psychometric curves of experimental sessions; they were trained to perform the task correctly. After successful training, model simulations were run with input noise to be able to produce a sigmoidal psychometric curve. However, although the input noise was tuned to capture the overall correct rate of the corresponding experimental session, we did not attempt to match the details of the psychometric curve. See also the next reply.
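
      Why input noise turns an otherwise perfect categorization into a sigmoidal psychometric curve can be illustrated with a toy observer. The sketch assumes Gaussian noise added to the sucrose percentage and a fixed decision boundary at 50; it is not the RNN described here, and the stimulus set, the noise level, and the function name `p_sucrose_choice` are hypothetical.

```python
from math import erf, sqrt

def p_sucrose_choice(sucrose_pct: float, noise_sd: float) -> float:
    """P(choose sucrose) when Gaussian noise is added to the stimulus before thresholding at 50."""
    z = (sucrose_pct - 50.0) / noise_sd
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical stimulus set and noise level, chosen only for illustration.
stimuli = [0, 25, 45, 55, 75, 100]
noise_sd = 20.0
curve = {s: p_sucrose_choice(s, noise_sd) for s in stimuli}
for s, p in curve.items():
    print(f"{s:3d}% sucrose -> P(sucrose choice) = {p:.2f}")
```

      In this toy setting, a larger `noise_sd` flattens the curve and lowers the overall correct rate, which is the kind of single parameter that could be tuned to a session's overall performance without fitting the shape of its curve.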

      (2) In addition to panel 5C, it would be useful to add examples of experimentally observed PSTHs and the corresponding activity trajectory for the units in the RNN trained to match them, for all the other coding patterns (step-perception and step-choice).

      We note that the PSTH in 5C is not an example of a linear coding unit as the Reviewer implies, but simply one with a good fit, and here the model’s output was produced in the absence of input noise. In order to classify step-perception and step-choice responses one needs error trials, but the model was trained without this input noise that induces errors (and produces a sigmoidal psychometric function) to match experimental PSTHs from correct trials only. Post-training simulations were then run with input noise to induce error trials, and model unit response profiles were classified based on this. However, there is no guarantee that error trials in the model match the error trials in the experiment; therefore, step-perception and step-choice units in the model may or may not be step-perception and step-choice units in the data. Despite this limitation, the revised manuscript includes additional examples, in Figure S2, of experimentally observed PSTHs and their corresponding model activity, to supplement Figure 5C and provide a better sense of the goodness-of-fit.

      (3) Electrophysiological data in Figure 2 - It would be helpful to provide statistics on how many neurons change their activity in each session.

      In the revised manuscript we have included across-session statistics for proportions of neurons that are taste-responsive and that show decision preparatory activity. We have also included tables (Tables S1 and S3) with the numbers of neurons that are taste-responsive and that show preparatory activity for each session in the experimental and model data.

      (4) Peak auROC selection - How was the peak auROC selected? Selecting only one bin for the peak could be potentially problematic and may result in the incorrect identification of an outlier that does not faithfully represent the neuron's overall activity. The peak selection could instead be based on several consecutive bins showing a consistent trend. If this approach was already implemented, the authors should explicitly describe it in the Methods section.

      Peak auROC was selected from a single bin (with an average duration of about 50 ms). While it is true that this may result in outlier neurons that transiently prefer one stimulus strongly but more consistently prefer the other, we opted for a simple criterion to sort the neurons into two categories for visualization. Adopting more stringent criteria that consider multiple bins may result in neurons that cannot be placed in either category, and we wanted a way to examine the entire pseudo-population. Also, the entire auROC trace is visualized in the heatmap, so potential outliers are not hidden and can be assessed by eye.
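
      The trade-off between the two criteria can be made concrete with a small sketch. It assumes that auROC is computed per bin from two sets of spike counts and that the peak is the bin deviating most from chance (0.5). The function names, the `min_run` parameter, and the toy trace are hypothetical, and the consecutive-bin variant is the reviewer's suggested alternative, not the method used in the manuscript.

```python
def auroc(a, b):
    """Probability that a random value from `a` exceeds one from `b` (ties count half)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))

def peak_bin(trace):
    """Index of the single bin whose auROC deviates most from chance (0.5)."""
    return max(range(len(trace)), key=lambda i: abs(trace[i] - 0.5))

def sustained_peak_bin(trace, min_run=3):
    """Peak bin among runs of >= min_run consecutive bins on one side of 0.5, else None."""
    eligible, run, run_side = [], [], 0
    for i, value in enumerate(list(trace) + [0.5]):  # sentinel at chance closes the last run
        side = (value > 0.5) - (value < 0.5)         # +1, -1, or 0 at chance
        if side != 0 and side == run_side:
            run.append(i)
            continue
        if len(run) >= min_run:
            eligible.extend(run)
        run, run_side = ([i], side) if side != 0 else ([], 0)
    if not eligible:
        return None
    return max(eligible, key=lambda i: abs(trace[i] - 0.5))

# Toy auROC trace: one transient dip below 0.5, then a sustained preference above it.
trace = [0.55, 0.10, 0.60, 0.70, 0.72, 0.65]
single = peak_bin(trace)
sustained = sustained_peak_bin(trace)
```

      For the toy trace, the single-bin rule picks the transient dip and the consecutive-bin rule picks the sustained preference, so the two rules sort this unit into opposite categories. A trace with no sufficiently long run returns `None`, which corresponds to the unclassifiable neurons mentioned above.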

      Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice design that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

      We thank the Reviewer for the positive assessment of our study.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I'm missing a clearly stated specific hypothesis and what is predicted on the basis of that hypothesis. What is the alternative?

      The null hypothesis is that single neuron activity patterns, even when clearly structured, do not matter for population activity or behavior. Alternatively, they do matter for these phenomena, and our model supports the alternative hypothesis. We have made this hypothesis clearer in the Introduction.

      (2) Discussion: Much of the text is a recap of the Introduction and Results sections. Please elaborate on the specific insights gained from the findings. The idea that tuned neurons in the sensory cortex are the basis for perception and perceptual decisions concerning the features being represented by those neurons is generally accepted. What the present study adds to this insight could be described more explicitly. On the other hand, the idea that small populations of tuned neurons are responsible for perception of taste/perceptual decisions about taste appears in contrast with previous accounts where stimulus features/decisions are reflected in correlated changes in activity across distributed populations of taste cortical neurons, including ones that are not necessarily tuned or even overtly responsive. How do the present findings relate to this idea?

      This is a very good point about reconciling these findings with past ones that have focused on coordinated changes across ensembles of neurons, i.e., metastable dynamics of internal (hidden) states. There is a brief mention of metastability toward the end of the Discussion, but we agree it deserves elaboration.

      This work does emphasize single unit activity, but in the context of, and as relevant to, population activity. We believe that the findings and frameworks of previous studies and those presented here are compatible rather than mutually exclusive. There is no reason why neurons with the coding patterns we studied here cannot coordinate with others to participate in the formation of different metastable states. The question of which—neurons with specific response profiles, or ensemble activity patterns that may involve these neurons?—is necessary and sufficient for producing perception and behavior during the mixture-based decision-making task is interesting but rather difficult to answer because of the single units’ contribution to both alternatives. One would need to utilize a manipulation that disrupts ensemble coordination without disrupting single unit activity to differentiate between them. We have made these points clearer in the Discussion.

      (3) Results: RNNs were based on data from single sessions -- how many neurons of each tuning type were observed in each session? In particular, there were 23 sessions but only 25 neurons total tuned to choice, suggesting that modelled choice neurons were based on ~1 neuron.

      The revised manuscript includes the session-by-session breakdown of response types for both experiment and model in two supplementary tables (Tables S2 and S4). We note that there are 25 neurons tuned to choice during the last 500 ms of the trial prior to decision, but 114 out of 626 neurons in total are tuned to choice in some time bin in the experimental data.

      (4) Minor: Indicate the time windows used for analysis of stimulus sampling, delay, and choice on the figures.

      The revised manuscript now includes the illustration of sampling and delay windows in Figure 2C-D, since we averaged the values over these windows for use in a 2-way ANOVA. All other figures either are associated with bin-by-bin analyses and have the first central and lateral licks (T and D) indicated, or have the time windows specified (e.g., Figure 4B, which uses [T, T + 0.5 s] and [D - 0.5 s, D]).

    1. eLife Assessment

      This study presents valuable findings on the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence. The evidence supporting the claims of the authors is solid, although a more exhaustive characterisation of how the different signals interact would have strengthened the study. The work will be of interest to cognitive and systems neuroscientists working on decision-making.

    2. Reviewer #1 (Public review):

      Summary:

      This paper characterises the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence, with a focus on the centroparietal positivity and motor beta lateralization. The main finding is that the centroparietal positivity builds up during evidence accumulation but falls back to baseline during gaps, while motor beta lateralization maintains a continuous, sustained representation throughout the gap and until response.

      Strengths:

      - Elegant combination of electroencephalography and computational modelling.

      - Innovative task design, including parametric manipulation of gap duration.

      - The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      Weaknesses:

      - A direct characterization of how the centroparietal positivity and motor beta lateralization interact is missing, which limits the novelty. In their reply to reviewers, the authors argue that the signal-to-noise ratio of EEG signals is insufficient for such analyses at the single-trial level. If so, a binned or trial-averaged approach could still be attempted.

      - An exhaustive characterisation of sensors and frequency bands is also missing. In their reply to reviewers, the authors suggest that this would detract from their hypothesis-driven focus. I disagree: the main hypothesis and figures could remain centred on the centroparietal positivity and motor beta lateralization, with a more comprehensive mapping of sensors and frequencies placed in supplementary material. Since the purpose of the paper is to examine EEG-based decision signals in a novel behavioural context, a broader characterisation of the underlying EEG landscape would seem appropriate.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity, and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments which use almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). Overall performance in this experiment was much worse than in previous experiments: Participants achieved ~85% correct following 400 ms of 33 - 45% coherent motion. In previous work, performance was ~90% correct following 240 ms of 12.8% coherent motion. A second weakness is that, while the authors present a model which describes the data based on premature decision-commitment, they do not examine the explanation from the existing literature that evidence is flexibly weighted, and do not provide any analyses which could be used to compare these descriptions. While their model can describe the data in this manuscript, it cannot explain the data from previous experiments showing a stronger weight on the second pulse.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterise the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence.

      Strengths:

      (1) Elegant combination of electroencephalography and computational modelling.

      (2) The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      (3) Innovative task design, including different gap durations.

      Weaknesses:

      (1) The authors introduce the CPP as tracking an intermediary (motor-independent) evidence integration process, and the MBL as motor preparation that maintains a sustained representation of the decision variable. It would help if the authors could more directly and quantitatively assess whether their current data are in line with this. That is, do these signals exhibit key features of evidence accumulation (slope proportional to evidence strength, terminating at a common amplitude that reflects the bound)? Additionally, plotting these signals report locked (to the button press) would help here. What do the results mean for the narrative of this paper?

      The reviewer is correct that properties such as temporal slope scaling with evidence strength and stereotyped threshold-like amplitude were key in establishing that the CPP reflects evidence accumulation in conventional continuous-stimulus tasks, and its motor independence was demonstrated in how it exhibited the same evidence-dependent dynamics in the absence of motor requirements (e.g. O'Connell et al 2012). We agree that it is of interest to check any such properties that can be feasibly tested in the current, distinct task context of intermittent evidence with delayed responses. Given the way in which participants performed our delayed-response task, sometimes terminating decisions early, it is in the CPP-P1 that conventional patterns of coherence-dependence in slope and amplitude would be expected. Indeed, we found that the CPP-P1 reached higher amplitudes (Fig. 3A, Author response image 1) and exhibited a steeper build up in high- compared to low-coherence trials (Author response image 1). The slope and amplitude profile of the CPP-P2 is complex due to the variability in baseline activity across our various delay conditions and the bounded process that participants engaged in, but it is still consistent with an accumulation process. Our simulations provide a full account of how an accumulating signal could produce the observed results.

      Author response image 1.

      Grand-averaged (± sem) CPP-P1 traces in both experiments (top). Bottom boxplot graphs indicate the average slope, computed as the slope between 0.2 s post P1 onset (when the CPP begins its buildup) and the time when peak amplitude was reached within the [0.4-0.6 s] interval, computed for each subject individually. Red crosses indicate outliers, computed as values exceeding 1.5 times the interquartile range away from the bottom or top of the box. Grey lines indicate single-subject estimates, and asterisks reflect the significance of paired t-tests for the estimated slope and amplitude effects; **p<0.01, *p<0.05. H = high coherence, L = low coherence.
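      The slope and outlier definitions in this caption can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the two-point slope estimate, and the use of NumPy percentiles for the box edges are assumptions made for illustration only.

```python
import numpy as np

def buildup_slope(times, trace, start=0.2, peak_window=(0.4, 0.6)):
    """Two-point slope (amplitude per second) from `start` to the peak
    found inside `peak_window`. Hypothetical helper, not the authors' code."""
    times = np.asarray(times, dtype=float)
    trace = np.asarray(trace, dtype=float)
    # Sample closest to the assumed onset of the buildup (0.2 s after P1 onset).
    start_idx = int(np.argmin(np.abs(times - start)))
    # Samples inside the window in which the peak is searched for.
    in_window = np.flatnonzero((times >= peak_window[0]) & (times <= peak_window[1]))
    peak_idx = in_window[np.argmax(trace[in_window])]
    return (trace[peak_idx] - trace[start_idx]) / (times[peak_idx] - times[start_idx])

def boxplot_outliers(values, whisker=1.5):
    """Flag values lying more than `whisker` x IQR below Q1 or above Q3."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - whisker * iqr) | (values > q3 + whisker * iqr)

# Toy trace: flat until 0.2 s, rises at 10 units/s until 0.5 s, then decays.
times = np.linspace(0.0, 1.0, 101)
trace = np.where(times < 0.2, 0.0,
                 np.where(times <= 0.5, 10.0 * (times - 0.2), 3.0 - 10.0 * (times - 0.5)))
slope = buildup_slope(times, trace)
flags = boxplot_outliers([1.0, 2.0, 3.0, 4.0, 100.0])
```

      Applied per subject and per coherence condition, the resulting slopes would then enter a paired comparison of the kind the caption reports.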

      Like in other delayed-response tasks (Twomey et al 2016; McCone et al 2026), we observe here that the CPP peaks and falls well before the response is cued or indeed executed (here, in fact peaking and falling for each individual pulse). Thus, its pre-response dynamics will not relate to stimulus-driven evidence accumulation in the way they do in immediate response contexts (e.g. O’Connell et al. 2012; Steinemann et al. 2018). We therefore do not analyse response-aligned CPPs in the experiment.

      As to the intermediary role we have interpreted for the CPP, in addition to the local pulse driven peak-and-fall dynamics compared to the sustained profiles of motor preparation signals, we can point to the obvious temporal delay between the signals, where evidence-dependent buildup in the CPP substantially precedes that of motor preparation, as observed in all previous studies comparing the two (e.g. Kelly & O'Connell 2013).

      (2) The novelty of this work lies partly in the aim to characterize how the CPP and MBL interact (page 5, line 3-5). However, this analysis seems to be missing. E.g., at the single-trial level, do relatively strong CPP pulses predict faster/larger MBL? The simulations in Figure 5 are interesting, but more could be done with the measured physiology.

      As exemplified in the extant EEG-decision literature, the low signal-to-noise ratio of EEG is such that attempts are seldom made to link two EEG signals on a single-trial basis, and studies instead favour testing single-trial relationships between each individual EEG signal and behaviour, or, most commonly, comparing patterns of variation in the EEG signals across experimental conditions (e.g. difficulty). Accordingly, here we show that trials with high coherence P1 evoked 1) higher CPP amplitudes (Fig. 3A,C), and 2) stronger MBL (Fig. S2 & S3). Further, we showed that particularly high CPP amplitudes following the first pulse led to stronger weights on choice for the first pulse (Fig. S11), which could only be mediated by the motor system.

      (3) The focus on CPP and MBL is hypothesis-driven but also narrow. Since we know only a little about the physiology during this "gaps" task, have the authors considered computing TFRs from different sensor groupings (perhaps in a supplementary figure?).

      While we agree that it might be interesting to explore frequency bands and sensors more broadly, we feel that such an exploration would detract from the hypothesis-driven focus on how prominent, well-characterised decision signals in the brain behave in a context where evidence is presented in an atypical, seldom-studied manner, namely in the form of temporally separate pulses. Our aim was not to explore whole-brain dynamics that might be engaged during the task, but rather to get a better understanding of the functional roles of the neural processes underlying the CPP and MBL during decision making. Providing a detailed description of whole-scalp responses is thus beyond the scope of this paper, but given that all data will be made publicly available this can be pursued in future work and by other researchers.

      (4) The idea of a potential bound crossing during P1 is elegant, albeit a little simplistic. I wonder if the authors could more directly show a physiological signature of this. For example, by focusing on the MBL or occipital alpha split by the LL, LH, HL and HH conditions, and showing this pulse- as well as report-locked. Related, a primacy effect can also be achieved by modelling (i) self-excitation of the current one-dimensional accumulator, or (ii) two competing accumulators that produce winner-take-all dynamics. Is it possible to distinguish between these models, either with formal model comparison or with diagnostic physiological signatures?

      In addition to the CPP amplitude effects we report in the main paper, the reviewer is correct that pulse-locked MBL can also provide a physiological signature of the greater number of pulse-1 bound crossings when that pulse is high-coherence. This is shown in Figure S3, where we see this coherence-dependent effect consistently across all gap durations and both experiments. Figure S2 also shows that the MBL step-change after P2 is greater in P1-low coherence trials in Experiment 1, as predicted by the bound-crossing account, and consistent with the CPP findings. We note that this effect appears absent in Experiment 2, but this is likely because the greater proportion of shorter gap durations (0, 0.12, 0.36 s) means that updates following P2 are likely to still capture P1-driven changes, due to signal-transmission delays. Please also note that Fig. S2 and S3 have been updated from the previous version, because while revising the paper we noticed a mistake whereby we were plotting alpha band power (8-13 Hz) rather than the intended beta (13-30 Hz). The results remain qualitatively unchanged. Although there isn’t sufficient single-trial signal-to-noise ratio to be able to categorise individual trials as having crossed a threshold or not, this is strong evidence in support of the coherence-dependent amplitudes of the CPP and motor updates. Analyzing beta locked to the report would not be informative in this case because of the delayed reporting structure of the task and the threshold-crossing relationship beta exhibits with response execution (O’Connell et al. 2012). That is, beta will reach the same amplitude immediately prior to the response regardless of whether or not decisions were terminated during P1. Instead, we believe that the empirical CPP-P2 traces we show provide direct evidence that the second pulse was not fully integrated in all trials, and as our modelling confirms, this is consistent with bound crossings occurring sometimes before P2. First, the fact that CPP-P2 amplitudes were overall lower than CPP-P1 amplitudes mirrors the behavioural observation that the first pulse had a stronger weight on choice than the second one. Second, we show that trials where the CPP was particularly high after the first pulse were also trials where P1 exerted a particularly strong influence on choice (see Fig. S11), further validating the idea that higher CPP amplitudes are directly related to behaviour.

      Regarding self-excitation (SE) and winner-take-all competition (WTAC), these could indeed contribute to the behavioural primacy effects, but they would not detract from our central finding that the CPP does not encode a sustained representation of a decision variable, but rather reflects two rounds of evidence accumulation feeding into a single decision process. Further, it is not immediately clear whether/how these alternative models might also account for the CPP-P1/CPP-P2 results as simply as our bounded model does. While it might be theoretically possible for SE/WTAC models to explain 1) why the CPP-P2 is generally lower than the CPP-P1 across conditions, and 2) why the maximum CPP-P2 amplitudes in P1-high trials are smaller than in P1-low trials, these patterns of results are not an immediate consequence of standard implementations. Further, while the question of whether the accumulation process is perfect integration or involves SE or WTAC is certainly of additional interest, given that this is a delayed response task and does not provide information on termination timing through RT distributions, arbitrating between these modes of integration would not be straightforward with the current data.
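      To make the bound-crossing argument concrete, the following is a minimal simulation sketch of a one-dimensional accumulator that can commit early during the first pulse. It is not the authors' fitted model: the function name, the drift and noise values, the bound height, and the simplification that the bound applies only during P1 are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_two_pulse(coh1, coh2, bound, n_trials=20000, n_steps=20,
                       drift_gain=1.0, noise_sd=1.0, dt=0.01, seed=0):
    """Return (accuracy, fraction of trials committed during pulse 1).
    Hypothetical toy model; the correct choice is the positive direction."""
    rng = np.random.default_rng(seed)

    def pulse_increments(coh):
        # Drift proportional to coherence plus Gaussian diffusion noise.
        noise = noise_sd * np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
        return coh * drift_gain * dt + noise

    # Decision variable during pulse 1, one column per time step.
    path1 = np.cumsum(pulse_increments(coh1), axis=1)
    crossed = np.abs(path1) >= bound
    early = crossed.any(axis=1)              # committed before pulse 2
    first_cross = crossed.argmax(axis=1)     # 0 when no crossing occurred
    dv_at_commit = path1[np.arange(n_trials), first_cross]

    # Trials without early commitment also integrate pulse 2 in full.
    dv_final = path1[:, -1] + pulse_increments(coh2).sum(axis=1)

    choice_dv = np.where(early, dv_at_commit, dv_final)
    return float(np.mean(choice_dv > 0)), float(early.mean())

# High-then-low (HL) versus low-then-high (LH) pulse orders, arbitrary drift units.
acc_hl, early_hl = simulate_two_pulse(3.0, 1.0, bound=0.5)
acc_lh, early_lh = simulate_two_pulse(1.0, 3.0, bound=0.5)
# With an infinite bound the model reduces to a perfect accumulator.
acc_hl_perfect, _ = simulate_two_pulse(3.0, 1.0, bound=np.inf)
acc_lh_perfect, _ = simulate_two_pulse(1.0, 3.0, bound=np.inf)
```

      Under these assumed parameters, more trials commit during a high-coherence first pulse, so the second pulse is ignored more often and accuracy is higher for HL than for LH, which is the primacy pattern described in the response. Removing the bound makes pulse order irrelevant.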

      (5) The random effects structure of the authors' mixed linear models should be specified in more detail. Now, they write: "Where possible, we included all main effects of interest as random effects to control for interindividual variability." This sounds as if they started with a model with a full random effect structure and dropped random components when the model would not converge. This might not be sufficiently principled, as random components could be dropped in many different orders, and the order would affect the results. Do all main results hold when using classical random effects statistics on subject-wise regression coefficients?

      The equations in the paper include the full details of the random effects structure we used for each model. We note that only two of our four equations did not include a full random effect structure, indeed due to convergence issues. We have now fit these models with a maximal random effects structure (i.e. including all fixed effects as random effects as well) with the ‘bobyqa’ optimiser. This resulted in singular fits for both Eq. 2 (Exp. 1 and Exp. 2) and Eq. 3 (Exp. 2 only). Following previous suggestions, we used a weakly informative Wishart prior (Chung et al. 2015) to regularise the random effects covariance matrix using the blme package (Chung et al. 2013), which resolved the singular fit problem. However, some models still produced convergence warnings. To assess these models’ robustness, we compared the fixed effect parameter estimates across multiple optimisers, as suggested by the lme4 developers (see lme4 documentation). Parameter estimates across optimisers rarely deviated by more than one decimal point across 6 optimisers (see Bates et al. 2011), and we thus concluded the model estimates were robust and the convergence warnings were false positives, a known issue in lme4. For all models in the paper, we report the parameters estimated using the “bobyqa” optimiser. All main inferential results remain unchanged (except for one interaction that was not of interest in Exp. 1), and the estimated slopes and statistical results for all models have been updated in the manuscript. We also included all these details in the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments, which use almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). The authors suggest this may be due to the staircase procedure used to calibrate the coherence of (single-pulse) dot motion stimuli for individuals at the start of the experiment. But it remains unclear why overall performance in this experiment is so bad. Participants achieved ~85% correct following 400 ms of 33 - 45% coherent motion. In previous work, performance was ~90% correct following 240 ms of 12.8% coherent motion. It seems odd that adding the 0% coherent motion in the temporal gaps would impair performance so greatly, given it was clearly colour-coded. There is a lack of detail about the stimulus presentation parameters to understand whether visual processing explains the declined performance, or if there is a more cognitive/motivational explanation.

      We thank the reviewer for highlighting this. We apologise for not providing full details about the visual display, which we have included now.

      The moving dots were presented centrally on the monitor, in a 5-degree aperture, and moved at a speed of 5 degrees/second. The monitor refresh rate was 60 Hz for 19 participants and 85 Hz for 3 participants in Experiment 1, while it was 85 Hz for 19 participants and 60 Hz for 2 participants in Experiment 2. Dot density in our task was similar to previous studies (16.7 dots/degree²/s, as in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019; Azizi et al. 2021, 2023). However, in contrast to previous studies, we did not include any feedback on a trial-by-trial basis, instead only providing feedback at the end of each block indicating the average accuracy. This would have made it harder for participants to continually assess how well they were performing and to adjust their strategies (e.g. increase their bound for better accuracy) accordingly. We agree that the inclusion of 0% coherence dots during the gap between pulses is unlikely to have caused the participants’ relatively low overall performance, especially since we did not find accuracy to be overall lower for longer 0%-coherence gaps.

      Further, as the reviewer notes, we used a staircasing procedure at the beginning of the experiment which used only single pulses of evidence. This may have encouraged participants to set a bound that can usually be reached by one pulse, and the resultant early terminations meant that they seldom used the full 400ms of evidence that were available to them. In fact, we would like to thank the reviewer for pointing out Golmohamadian et al., 2025, which used a similar variable delays task structure but with different visual stimuli. They, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, also like us, reported a stronger choice reliance on pulse-1. This suggests that these two factors may suffice to induce a primacy rather than a recency effect.

      There are other reasons why performance may have been different in our task compared to previous studies. For example, our task included a lead-in period that was longer than in previous studies and contained 0%-coherence dots, in order to minimise interfering VEP components (the lead-in period was between 700 and 1050 ms in our study, compared to 200–500 ms in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019 & Azizi et al. 2023, and 400–1000 ms in Azizi & Ebrahimpour 2021). This longer and visually explicit preparation period may have acted as a warning cue, allowing participants to fully prepare before the first pulse, and again making it easier for them to hit a bound with only that information.

      We have added a more detailed discussion about how our stimuli and the task characteristics may have resulted in a substantially different performance in our task compared to previous studies in the discussion section.

      Recommendations for the authors:

      Reviewing Editor:

      Please consider the following reviewer suggestions for how to strengthen the evidence for your central claims, which could translate into an improved assessment of the "strength of evidence".

      Apart from these useful suggestions, I had some concerns about scholarship, because the list of studies currently cited in your introduction is exclusively from your group, while one of the phenomena of interest - motor beta power lateralization (MBL) in decision-making - has been widely studied by several groups, using also other techniques.

      I was wondering why you chose not to cite the ample MEG evidence for the role of MBL in decision-making. This has been shown both in classical random dot motion tasks (Donner et al, Curr Biol, 2009; de Lange et al, J Neurosci, 2013; Pape et al, Nat Commun, 2016; Urai et al, Nat Commun, 2022) as well as in tasks involving discrete evidence samples (Wilming et al, Nat Commun, 2020; Murphy et al, Nat Neurosci, 2021). Another relevant EEG study is by Ian Gould et al, J Neurosci, 2010. There is also quite a bit of monkey LFP work (mainly by Saskia Haegens) on choice-selective beta power in the motor system of the macaque, although the link to the lateralized beta power suppression in your work and the above human E/MEG studies remains a bit elusive. I feel it would be important to provide a more balanced reflection of the existing literature on this phenomenon.

      We thank the editor for this fair comment, and we apologise for having provided a too narrow, EEG-centric view of the literature, arising from our interest in the CPP component which hasn’t yet been characterised in MEG or LFPs. We have now substantially expanded the introduction to provide a more balanced and comprehensive overview of the literature.

      Reviewer #1 (Recommendations for the authors):

      (1) The diffusion model needs to be explained in more detail. For example, it should be explicitly stated that the model was fit to only choices, as most readers would expect reaction times. Further, it needs to be stated whether the model was fit separately for each subject or in one go to the group-level data. If the former, it is important to add error bars of the between-subjects variability (in simulated and empirical data) to Figure 4A. If the latter, it would be important to determine uncertainty using bootstrapping.

      The original model was fit to grand-average data, as stated in the methods section. To assess between-subjects variability, we have re-fitted the model to each individual subject, for each experiment. The average of the individually-estimated model parameters closely recapitulated the values obtained from the fit to grand-averaged data (Fig. S12). We then simulated N = 10000 trials for each individual, and we report the grand-averaged results with error bars indicating the standard error of the mean as a supplementary figure (Fig. S13). The results replicate the ones reported in the main manuscript. We have also made it explicit that the models are fit to accuracy data but not RT.

      (2) The authors write numerous times that the MBL exhibits an "evidence-dependent" buildup. However, should this not be "choice-dependent"? In Figure 2A, one can clearly see that the sign of MBL follows choice and not objective evidence.

      We thank the reviewer for this comment. By evidence-dependent, we mean that lateralisation towards the correct response is strongest in high-coherence trials (see Fig. S2, S3). This is indeed because the sign of MBL is choice-dependent, and participants are less likely to make mistakes in high-coherence trials. We have added a clarification sentence in the text.

      (3) It would aid readability to add sub-conclusions at the end of each Results section.

      We have added clarifications where needed.

      (4) In Figure 1B, I cannot see a dashed line for the HL condition. I understand that it must lie under the LH condition, but it would be good to show it separately.

      We thank the reviewer for this comment. Since we cannot show both lines separately without additional panels, given the HL and LH lines perfectly overlap, we indicate at the end of the caption that this is the case as follows: “Note that a perfect accumulator predicts identical accuracies for the HL and LH conditions, and therefore the two lines overlap.”
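      The overlap of the HL and LH predictions follows from the commutativity of summation, as a short worked derivation shows. The notation is assumed for illustration and is not taken from the manuscript: pulse coherences $c_1$ and $c_2$, drift gain $k$, pulse duration $T$, diffusion noise $\sigma$, no bound, and a choice given by the sign of the final decision variable $x$.

```latex
% Final decision variable of an unbounded (perfect) accumulator after two pulses
x = k\,(c_1 + c_2)\,T + \sigma\sqrt{2T}\,Z, \qquad Z \sim \mathcal{N}(0, 1)

% Probability of choosing the correct (positive) direction
P(\text{correct}) = P(x > 0)
                  = \Phi\!\left(\frac{k\,(c_1 + c_2)\,\sqrt{T}}{\sigma\sqrt{2}}\right)
```

      Because the expression depends on the coherences only through the sum $c_1 + c_2$, swapping the order of a high and a low pulse leaves the predicted accuracy unchanged.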

      (5) In Figure 4B, is the horizontal dashed line important? It is confusing because the legend incorrectly states that this is "data".

      Thanks for this observation - it was only there to indicate the 50% level as a benchmark to assess how frequent early terminations are, but we agree that it was unnecessary and potentially confusing, so we have removed it from the plot.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors should more directly address how behaviour in their task differs quite substantially from previous experiments with very similar designs (including why such high coherence levels are required, over a longer duration, to reach overall worse performance). Some readers may also be interested in a broader discussion of how decision-makers may use flexible weights when integrating evidence across samples over time. While the explanation of bounded accumulation is convincing in this context, Tsetsos et al., (2012) suggest recency effects (as in Cheadle et al., 2014; Wyart et al., 2015) cannot be explained by bounded accumulation, but rather integration leak. Other factors may include stimulus consistency (Glickman et al., 2022) or even choice consistency across decisions (Bronfman et al., 2015). Golmohamadian et al., 2025 demonstrated flexibility in decision strategies across sensory modalities.

      As we described above, we have added some more detailed explanation about why it might be the case that behaviour in our study differs from previous reports using similar tasks. We agree that the reversed pulse-reliance in our study compared to others presents an opportunity to discuss flexibility in decision strategy and so we have now added a broader discussion on different patterns of integration in various task contexts. We thank the reviewer for pointing out Golmohamadian et al., 2025, as they, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, like us, reported a stronger choice reliance on pulse-1.

      (2) Another open question is how central parietal positivity reflects an accumulation signal in the case of continuous evidence, but reflects momentary evidence in the case of discrete evidence samples. If, in both cases, the parietal evidence is passed along to motor processes for bounded decision commitment, how do motor processes deal with the changes in what is represented? Can the relationship between MBL and CPP in the model-simulated data shed some light on this? Specifically, how is the 0-gap condition treated in this simulation (which shows only 1 CPP peak but with a longer time to decay) compared to non-zero gap conditions (which show 2 peaks)?

      This is a very interesting and important point, and we thank the reviewer for raising it. We believe that the CPP in our intermittent-dots task reflects dot-motion evidence integration in the same way as in conventional continuous evidence tasks, building at an evidence-dependent rate (see Author response image 1), with the only difference being that integration processes can be turned “on” or “off” depending on whether evidence is present, and can thus be temporally split into multiple “rounds” of accumulation when there is a gap.

      Our model simulations assume that evidence integration is triggered by the dots turning yellow, indicating the presence of evidence, and feeds continuously to the motor system in these periods. However, it is switched off either when 1) a bound has been hit, or 2) the dots turn blue again, at which point the CPP falls (see various rates of signal decay in Fig. S7). The reason the CPP continues longer before it peaks and falls in the zero-gap condition, by this account, is that there is no dot-colour change at the end of pulse-1 to switch it off, and thus the accumulation process continues until either a bound is hit, or the yellow dots turn blue after pulse-2. When there is a non-zero gap, despite the CPP being switched off, the decision variable itself remains encoded at the motor level so that no information is lost. This requires that the same instruction that turns off the CPP must also break or pause the flow from the CPP to the motor level and allow it to hold its current level until either a second pulse resumes a feed from a newly-triggered CPP, or response execution is cued. Thus, in our account, the accumulation process underlying the CPP in our intermittent-evidence task is identical to conventional continuous-evidence tasks, but, since it can be turned “on” and “off” as a function of whether or not evidence is clearly present or absent, it produces two “rounds” of integration in non-zero gap conditions. The motor process also receives a feed from the CPP as in conventional continuous-evidence tasks, but with this feed similarly gated by the presence of evidence.

      A slightly different and perhaps more challenging question (which the reviewer was perhaps alluding to) relates to tasks where evidence comes not in short noisy snippets, but rather as static tokens (e.g. Wyart et al. 2012, 2015; Murphy et al. 2021; Parés-Pujolràs et al. 2025). In these instances, the CPP exhibits transient evoked responses to each token, which scale with the belief updates resulting from it (Parés-Pujolràs et al. 2025). However, it remains unclear whether these transient potentials reflect a temporally-evolving integration process to compute the appropriate belief update afforded by that token in the context of a particular task, or rather reflect the output of such a process. The former account would be similar to our interpretation of the transient deflections observed in this gaps task, which we believe capture the same temporal integration processes as those commonly observed in conventional continuous noisy stimuli paradigms, only short-lived. The latter account would instead be specific to low-noise stimuli like tokens, where the computations required for belief updating may not require a temporally-extended integration process, but rely on different mechanisms to compute belief updates (e.g. prior-based modulations of sensory encoding, attention or neural gain). These questions remain open for future investigation.

      (3) From what I understand, the model suggests all-or-none integration of the second pulse: either the bound has not been reached and the pulse is perfectly integrated, or the bound has been reached and so the pulse is not integrated. The CPP amplitude at pulse 2 is therefore determined not only by the strength of the evidence at pulse 2 but also by the proportion of trials where the evidence is not ignored: CPP at pulse 2 is of lower amplitude because it is calculated as an average across trials where it is either similar to CPP at pulse 1 or otherwise completely absent. Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper). It would be nice to show the dichotomy predicted by the model in the empirical data. I'm thinking of something similar to this 'bifurcation' analysis from Sergent et al., 2021. Or more simply, estimates of CPP amplitude from single trials (perhaps an average over a short window around the peak) should be more variable at pulse 2, with some reaching similar amplitudes to pulse 1, and many close to baseline, whereas at pulse 1, there should be a more uniform cluster of amplitudes. If all CPP peak amplitudes were lower, would this motivate a model comparison where, for example, additional evidence from the second pulse was down-weighted according to certainty following the first pulse (leading to all trials down-weighting the second pulse)? This could link in nicely with some of the more nuanced analyses related to attention in the supplementary figures.

      We thank the reviewer for this insightful comment, which will help us clarify how our model works. The integration of the second pulse does not work in an all-or-none manner. In our model, the accumulation stops whenever a bound is reached at the downstream motor level. This can happen 1) at some point during the 1st pulse (no integration of pulse 2 at all), 2) during the 2nd pulse (partial integration of pulse 2, until the bound is hit), or 3) not crossed at all (full integration of pulse 2). Our model thus allows for partial integration of the second pulse rather than all-or-none. Author response image 2 shows 3 example trials that illustrate how the model works. The CPP amplitudes at pulse 2 are thus determined by two main factors: 1) whether or not accumulation of P2 is precluded by an earlier bound crossing in P1 (if it is, the CPP amplitude is assumed to equal 0), and 2) whether and when accumulation ended if it did take place. Our interpretation is that, given that trials where pulse 1 was low coherence were 1) less likely to terminate early (Fig. 4B) and 2) had achieved lower levels of accumulated evidence (Fig. 4C), the LL and LH conditions are linked to a higher proportion of trials where accumulation at pulse 2 does occur, and it lasts for a longer amount of time because the distance required to reach a bound is longer than in their pulse 1 high-coherence counterparts. We have clarified this point in the results section describing the model.
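
      The three trial types described above can be illustrated with a minimal simulation sketch. This is not the authors' fitted model: the drift rates, bound, noise level, time step and pulse length below are hypothetical values chosen only for illustration, and the decision variable is simply held constant across the gap.

```python
import random


def classify_trial(drift_p1, drift_p2, rng, bound=1.0, noise=1.0,
                   steps_per_pulse=20, dt=0.01):
    """Simulate one two-pulse trial and report when (if ever) a bound is hit.

    All parameter values are illustrative assumptions, not fitted values.
    """
    dv = 0.0  # decision variable, held at its current level during the gap
    for label, drift in (("bound_in_p1", drift_p1), ("bound_in_p2", drift_p2)):
        for _ in range(steps_per_pulse):
            # Euler step of a drift-diffusion process.
            dv += drift * dt + noise * dt ** 0.5 * rng.gauss(0.0, 1.0)
            if abs(dv) >= bound:
                # Accumulation stops as soon as either bound is reached.
                return label
    return "no_bound"  # pulse 2 was fully integrated


def trial_type_proportions(drift_p1, drift_p2, n_trials=2000, seed=0):
    """Proportion of trials falling into each of the three trial types."""
    rng = random.Random(seed)
    counts = {"bound_in_p1": 0, "bound_in_p2": 0, "no_bound": 0}
    for _ in range(n_trials):
        counts[classify_trial(drift_p1, drift_p2, rng)] += 1
    return {k: v / n_trials for k, v in counts.items()}


# Hypothetical "high" (10.0) and "low" (2.0) drift rates for pulse 1.
high_first = trial_type_proportions(10.0, 2.0)
low_first = trial_type_proportions(2.0, 2.0)
```

      Under these assumed parameters, trials with a strong first pulse terminate during pulse 1 more often than trials with a weak first pulse, which leaves fewer trials in which pulse 2 is integrated at all.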

      The reviewer notes: “Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper)”. However, our interpretation in fact predicts that the vast majority of trials should indeed exhibit smaller amplitudes. That can again be explained by the three trial types mentioned above. Unlike in CPP-P1, there would be a majority of trials where integration does not occur at all. Only trials where evidence was at least partially integrated during P2 would be predicted to have CPP-P2 amplitudes that are overall positive, and even in those instances, average amplitudes would be overall lower than CPP-P1 in trials that terminated early, because of the lower distance remaining to be covered before hitting a bound. Author response image 2 illustrates this point. Thus, the prediction regarding how CPP amplitude variance or distribution shape would compare between P1 and P2 is less straightforward than if it were all-or-none on P2, not to mention the fact that EEG noise would likely drown out distributional features like this. We therefore focus on a comparison of the means, for which our model has the clear prediction that most trials should exhibit lower CPP-P2 amplitudes. To assess whether empirical observations meet this prediction, and following the reviewer’s suggestion, we extracted the mean amplitudes around 0.45-0.55 s after P1 and P2, for each single trial. CPP-P2 data were baselined using the amplitude 100 ms before P2 onset, as in Fig. S5 - note that this is likely to introduce spurious drifts due to overlapping potentials from P1, but given that grand-averaged traces still qualitatively captured the key effects we assume it is a valid approach. We then pooled CPP-P1 and CPP-P2 amplitudes across pulses, and z-scored them for each participant separately. In both experiments, in a majority of participants (Exp. 1: 16/22, Exp. 2: 17/21) the median z-CPP-P1 amplitude was higher than that of z-CPP-P2. Author response image 3 illustrates the pooled distributions.
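
      The within-participant normalisation and median comparison described above could be sketched as follows. The function names and the example amplitudes are hypothetical, and the use of the population standard deviation for z-scoring is an assumption, since the response does not specify it.

```python
from statistics import mean, median, pstdev


def zscore_pooled(p1_amps, p2_amps):
    """Pool single-trial P1 and P2 amplitudes for one participant and z-score them."""
    pooled = list(p1_amps) + list(p2_amps)
    mu = mean(pooled)
    sd = pstdev(pooled)  # assumption: population SD of the pooled amplitudes
    z = [(x - mu) / sd for x in pooled]
    return z[:len(p1_amps)], z[len(p1_amps):]


def p1_median_is_higher(p1_amps, p2_amps):
    """True if the participant's median z-scored P1 amplitude exceeds that of P2."""
    z1, z2 = zscore_pooled(p1_amps, p2_amps)
    return median(z1) > median(z2)


# Made-up amplitudes for two hypothetical participants.
participants = [
    ([3.0, 4.0, 5.0], [0.0, 1.0, 2.0]),
    ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
]
n_higher = sum(p1_median_is_higher(p1, p2) for p1, p2 in participants)
```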

      Author response image 2.

      Decision variable simulations illustrating sample single trials (top) and CPP traces averaging data across conditions and N = 1000 trials (bottom), using model fits from Exp 2, in the long gap condition. Overlaid text indicates the percentage of trials in each subset, for each condition. The horizontal line indicates the bound; shaded areas indicate pulse presentation times. A. The bound was hit during P1, and therefore no further accumulation occurred during P2. B. The bound was hit during P2, and therefore P2 was only partially accumulated. C. No bound was hit, and therefore all evidence from P2 was accumulated.

      Author response image 3.

      Pooled distributions of CPP-P1 and CPP-P2 amplitudes [450-550 ms post-pulse], normalised within-participant, and baselined 100 ms before pulse onset. In both experiments, CPP-P2 amplitudes had a lower median (vertical line) normalised amplitude than CPP-P1.

      (4) A minor note: Full details of stimulus presentation (size, number of dots, dot size, speed, lifetime) would be appreciated.

      Thank you - we have now provided these details in the methods section (see also reply to public reviews above).

      (5) Are the authors sure they want to use this 'Gaps task' name? It seems a bit strange to introduce this name in this context, where there isn't really a 'Gap' (random dot motion fills the gap). A reader could get the impression the name was given in the Kiani et al., 2013 study (page 3, paragraph 1: "This scenario has begun to be studied using an intermittent-evidence or "gaps" task (Kiani et al., 2013) ...") but this is not true, Kiani et al. never use the term "Gaps task", nor has any other study since (as far as I know).

      We thank the reviewer for noting this oversight on our part - we have now made it clear that “gaps task” is the way we refer to the task originally developed by Kiani et al. 2013 in the introduction. We have decided to still use this name because it is a convenient proxy, in the understanding that “gap” refers to a “gap” in coherent motion as in Kiani et al (2013), albeit not a proper blank as in the original implementation.

    1. eLife Assessment

      This study provides valuable insights with convincing evidence detailing altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel and quantitative tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. The behavioral experiments are well executed and set the stage for subsequent mechanistic, causal, and computational approaches. The work is relevant to those interested in autism, cognition, and/or sensory processing.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

    4. Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be overextended given the nature of the data (solely behavioral), the reliance on repeated d′ measures may obfuscate some of the results without clearer psychometric or regression-based analyses, and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

      We thank the editors for their positive assessment of the data quality and the novelty of our behavioral task, and for pointing out the limitations inherent in behavioral studies.

      We would like to clarify one important point regarding the use of d′ measures. While d′ was included to quantify sensitivity, our conclusions are not based solely on repeated d′ measures. In addition to d′, we analyzed raw behavioral data (correct and incorrect choice rates), and categorization performance was assessed using psychometric curves fitted with logistic regression models. These complementary analyses provide converging evidence and ensure that our interpretations are supported by multiple robust measures.
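
      For readers less familiar with the sensitivity index, the following minimal sketch shows the standard signal detection computation d′ = z(hit rate) − z(false-alarm rate). It is a generic illustration, not the calculation described in the authors' Methods; the function name, the mapping of trial outcomes onto hits and false alarms, and the log-linear correction are all assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Generic signal-detection sensitivity index from trial counts.

    Assumption: a log-linear correction (add 0.5 to each count) is applied so
    that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Made-up counts: chance-level and clearly above-chance performance.
chance = d_prime(50, 50, 50, 50)
sensitive = d_prime(90, 10, 10, 90)
```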

      In the revised manuscript, we have further strengthened the analyses by including additional regression-based assessments, reporting effect sizes for subtle effects, and refining the statistical methods for clarity and transparency.

      We fully acknowledge that this work is behavioral and does not directly reveal the underlying neural mechanisms. Nonetheless, the translational framework we have developed establishes a robust foundation for future studies. This platform can be directly applied in clinical research on autism and other neuropsychiatric conditions involving sensory-cognitive interactions, and provides a solid basis for subsequent mechanistic, causal, or computational investigations to uncover the neural circuits mediating these effects.

      We greatly appreciate the editors’ and reviewers’ guidance and believe the revisions have clarified and strengthened the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      We appreciate the reviewer’s statement highlighting the importance of our study.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

      We thank the reviewer for these constructive comments. We acknowledge that aspects of the analyses were previously difficult to follow, and we have reworked the Results section to improve clarity and transparency.

      We would like to emphasize that all d′ measures are complemented by analyses of raw response rates (correct and incorrect choices), ensuring that our interpretations are not solely dependent on this metric. In addition, we applied standard psychometric analyses wherever possible. For the training phase, only two stimulus amplitudes were presented, which precluded the construction of full psychometric curves; however, for the categorization phase, psychometric analyses were feasible and are reported in Figure 3. Specifically, psychometric functions were fitted to the data using logistic regression, allowing us to estimate both categorization bias (threshold) and precision (slope) across stimulus intensities. These analyses revealed no evidence of altered categorization bias or precision in Fmr1-/y mice across stimulus strengths.
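
      As a point of reference, a two-parameter logistic psychometric function and the quantities usually read off it can be written as below. The symbols are assumptions introduced here for illustration: x is the stimulus amplitude and β0, β1 are the fitted intercept and slope coefficients; the authors' exact parameterisation may differ.

```latex
\[
P(\text{high-category choice} \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
\qquad
x_{50} = -\frac{\beta_0}{\beta_1},
\qquad
\left.\frac{dP}{dx}\right|_{x = x_{50}} = \frac{\beta_1}{4}
\]
```

      Here x50 is the amplitude at which both choices are equally likely (a measure of bias or threshold), and β1 sets the steepness of the curve at that point (a measure of precision).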

      Following the reviewer’s suggestion, we have also added general linear model analyses that account for trial history, providing a complementary perspective on decision-making dynamics. Finally, while the calculation of d′ is detailed in the Methods, we have revised the Results to clearly explain its use and appropriateness in each relevant analysis.

      These revisions aim to provide a clearer, more comprehensive picture of the data while ensuring that all conclusions are supported by multiple complementary measures.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

      We thank the reviewer for the careful reading of our manuscript and for these constructive comments. We agree that our study is purely behavioral, and we appreciate the opportunity to clarify the scope and interpretation of our findings. The primary goal of this work was to characterize behavioral patterns during tactile discrimination and categorization in a translationally relevant mouse model of autism.

      Although we did not include direct neural recordings, causal manipulations, or computational modeling, our analyses combining choice behavior, sensitivity measures from signal detection theory, psychometric curves, and regression-based models of trial history provide a detailed and robust characterization of perceptual learning, stimulus discrimination, categorization, and the interplay of cognitive processes with tactile perception. The manuscript has been revised to explicitly state that our conclusions are behavioral, emphasizing that this work establishes a foundation for future studies aimed at elucidating the neural and circuit mechanisms underlying these sensory–cognitive interactions.

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered.

      Alternative explanations for our findings, including differences in motivation, fatigue, satiety, stereotyped licking, or reward valuation, were carefully considered. As described in the Methods, only testing sessions with >70% correct performance on the training stimuli (12 µm and 26 µm) were included, excluding sessions with reduced motivation, fatigue, satiety, or stereotyped licking that could confound performance on low- or high-salience stimuli.

      Although differences in reward valuation could affect learning speed, we observed no genotype differences in training duration (Fig. 1B-D, Fig. S1C-D). Sessions with disengagement were analyzed only during epochs of active task performance (information added to the revised Methods section, lines 619-620). Reward-driven choice biases were unlikely, as no genotype differences were observed in categorization bias (Fig. 3F) and GLM analyses confirmed that previous reward outcome did not affect current choices (Fig. 4D).

      Finally, altered reward valuation could increase miss rates. Elevated miss rates in Fmr1-/y mice were restricted to the lowest-intensity stimulus (12 µm) under high cognitive load, demonstrating a salience- and context-specific effect inconsistent with generalized motivational or reward deficits. The Discussion has been updated to clarify these points and delimit the scope of our interpretations (lines 483-499).

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. References to Load Theory were meant to provide conceptual inspiration for assessing attention in high cognitive load conditions during categorization, rather than to indicate a formal test. Moreover, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced facilitation of across-category discrimination. Finally, we agree that citing Adaptive Resonance Theory, which is grounded in artificial neural network models, could be misleading, and we have revised the text accordingly.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      We thank the reviewer for this comment and agree that our study is purely behavioral and does not provide direct mechanistic evidence for top-down pathway dysfunction. In the first version of the manuscript, the term “top-down” was used at the behavioral level, referring to the influence of higher-order cognitive processes (e.g., categorization, attention, sensory and choice history integration) on tactile perception, rather than to imply specific neural circuits.

      We acknowledge that identifying the neural pathways underlying these effects would require extensive mechanistic experiments, including identifying the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself and performing pathway-specific recordings and manipulations. Such work represents a substantial mechanistic research program beyond the scope of the present study.

      To clarify that our study does not provide insights into the neural underpinnings of the studied behavioral processes, we have revised the manuscript, removing the term “top-down” or replacing it with “higher-order processes” where appropriate. We also explicitly noted that future work using neural recordings or causal manipulations will be needed to uncover the neural underpinnings of these behavioral phenomena (lines 508-510).

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      We recognize that terms such as “reduced top-down categorization influence” and “choice consistency bias” are derived from behavioral observations. However, we respectfully note that these behavioral inferences are widely used in clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021) and are not inherently speculative.

      The translational impact of our work lies in the development of a robust behavioral platform that allows precise dissection of tactile perception and cognitive influences in a manner directly comparable to clinical studies. While we agree that neural, circuit-level, or causal manipulations would provide valuable mechanistic insight, the current study establishes a foundational behavioral framework that can guide and inform future investigations into the underlying neurobiological substrates.

      To ensure clarity, we have revised the manuscript throughout to explicitly indicate that all conclusions are based on behavioral measures and do not imply mechanistic evidence.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      We chose to present both statistically significant effects and trends to ensure transparency and to highlight that commonly used aggregate measures, such as d′, can sometimes obscure meaningful underlying patterns. In the text, p-values between 0.05 and 0.1 are described as trends without over-interpreting their significance. To further support interpretation, we have now computed effect sizes (Hedges’ g) for all subtle effects. In the revised manuscript, all interpretations of non-significant effects have been reworded to avoid overstatement.
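
      For readers unfamiliar with the measure, Hedges' g can be sketched as follows. This is a minimal illustration of the standard two-group formula with the usual small-sample correction, not the authors' analysis code; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def hedges_g(a, b):
    """Bias-corrected standardized mean difference between two independent groups."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation built from the unbiased sample variances.
    pooled_sd = sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    cohens_d = (mean(a) - mean(b)) / pooled_sd
    # Common approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction

# Hypothetical per-animal miss rates, for illustration only (not data from the study).
group_a = [0.30, 0.25, 0.35, 0.28, 0.32]
group_b = [0.20, 0.22, 0.18, 0.25, 0.21]
g = hedges_g(group_a, group_b)
```

      The correction factor shrinks Cohen's d slightly, and the shrinkage is largest for small groups such as those typical of behavioral cohorts.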

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      The number of mice used per genotype is consistent with standard practices in behavioral studies of sensory processing. To complement statistical analyses and account for small sample sizes, we have calculated effect sizes (Hedges’ g) for all subtle or trend-level effects (p ≈ 0.05–0.1), providing a measure of effect magnitude independent of sample size.

      As the reviewer correctly noted, no animals were excluded as outliers, since observed variability reflects true biological differences rather than experimental or technical errors. In the revised manuscript, we re-examined all datasets for potential outliers, and when identified, analyses were performed both with and without the data point. Any results sensitive to single animals are explicitly reported. This procedure is now detailed in the Methods section (lines 675-679).

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      We thank the reviewer for highlighting this important point. To control for false positives arising from multiple comparisons, we applied the Bonferroni correction. This information has been added to the Methods section (line 682) to ensure transparency and reproducibility of all statistical tests.
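
      As a reminder of what the correction does, a minimal sketch is given below. It assumes the adjusted-p-value form of the Bonferroni procedure; the function name and the raw p-values are hypothetical and are not taken from the manuscript.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and the corresponding reject decisions."""
    m = len(p_values)
    # Each raw p-value is multiplied by the number of comparisons, capped at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

# Hypothetical raw p-values from four comparisons, for illustration only.
p_raw = [0.004, 0.020, 0.030, 0.500]
adjusted, reject = bonferroni(p_raw)
```

      With four comparisons, only a raw p-value of at most 0.0125 survives at alpha = 0.05, which illustrates how conservative the procedure becomes as the number of tests grows.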

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point, as this was not done intentionally. In the revised manuscript, miss rates for high- and low-salience stimuli were reanalyzed using a mixed-effects linear model, which appropriately accounts for repeated measurements within sessions (Fig. 5; Results section: lines 320-340). This analysis confirmed that Fmr1<sup>-/y</sup> mice exhibit increased miss rates specifically at the 12 µm amplitude, with the effect disappearing at higher low-salience amplitudes (18 µm). Post-hoc comparisons with Bonferroni correction revealed a strong trend for increased misses at 12 µm (t-test: t = -2.8437, p = 0.058, Hedges’ g = 1.23), while no significant differences were found at other amplitudes. The Methods section has been updated to detail this statistical approach for analyzing miss rates (lines 686-687).

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

      As mentioned above, our goal was not to directly test theoretical frameworks such as Adaptive Resonance Theory, Load Theory of Attention, or Weak Central Coherence, but rather to provide a context for interpreting our behavioral findings. In the revised manuscript, we have removed references to the Load Theory from the Results section and reframed the Discussion to emphasize that our results are consistent with certain predictions from these cognitive theories, without implying that the experiments directly assessed them. This clarifies that the interpretations are based on observed behavioral patterns, while still acknowledging the potential relevance of these frameworks to better understand tactile perception and cognition in autism.

      Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      We appreciate the reviewer’s positive assessment regarding our study’s translational value and the importance of our behavioral findings.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, provides additional insights into learning dynamics. In response, we have added these analyses to the revised manuscript (Fig. S1, Fig. S2), which illustrate both individual and group-level learning trajectories and trial-by-trial licking patterns.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While this is an interesting and important question, and is motivated by previous preclinical and clinical findings, it falls outside the scope of the current manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main Comments

      (1) This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism vs. WT controls. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention. The experiments seem well performed, with interesting results. I found certain aspects of the analysis not clearly explained, which made it difficult at times to understand.

      Please see specific details in the comments below.

      (2) To measure sensitivity, the authors present many comparisons of d' - sometimes between pairs of stimuli (or sometimes even for a single stimulus level).

      (a) Firstly, the calculation of d' for a single stimulus value is unclear (because the same proportion of high/low choices for a given stimulus can result from shifts in bias/criterion).

      We agree with the reviewer that calculating d′ for a single stimulus conflates sensitivity with response bias/criterion differences. For this reason, the panels showing d′ for individual stimulus amplitudes during training (Fig. 1F and 1G in the original manuscript) have been removed from the manuscript.

      In addition, we revised our d’ (Fig. 1E) and criterion calculations (Fig. 2A), treating the high amplitude stimuli as “signal” and low amplitude stimuli as “noise”, based on the Signal Detection Theory. The formulas used in the revised manuscript take into account correct responses during high amplitude stimuli and wrong responses during low amplitude stimuli to calculate the sensitivity and bias of the mice during discrimination in the training period.

      Sensitivity (d′) is now computed as:

      d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)

      and the criterion (c) as:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]
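
      The two formulas can be sketched in a few lines. This is a minimal illustration assuming z is the inverse of the standard normal cumulative distribution; the function name and the example rates are hypothetical, and the sketch does not handle rates of exactly 0 or 1, for which z is undefined.

```python
from statistics import NormalDist

def dprime_and_criterion(p_right_high, p_right_low):
    """Compute d' and criterion c from right-lick rates on high- and low-amplitude trials."""
    z = NormalDist().inv_cdf          # inverse standard normal CDF
    z_high = z(p_right_high)          # z(lick right | high amplitude stimulus)
    z_low = z(p_right_low)            # z(lick right | low amplitude stimulus)
    d_prime = z_high - z_low
    criterion = -0.5 * (z_high + z_low)
    return d_prime, criterion

# Hypothetical rates: 80% right licks on high-amplitude trials, 20% on low-amplitude trials.
d_prime, c = dprime_and_criterion(0.80, 0.20)
```

      With symmetric rates such as these, the criterion is zero (no bias toward either lickport), while d' reflects only the separation between the two stimulus categories.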

      (b) Secondly, while calculating d' makes sense for comparing two stimulus levels (like in the training condition), in the test condition (with a spread of stimuli), this becomes a little tedious - at times difficult to follow and unclear.

      I would have thought that sensitivity (at least for overall performance) would be better compared using data from all the stimuli - e.g. either using:

      (i) the sigma of the psychometric curve (although the downside of that approach is that it ignores history effects), or

      (ii) a logistic regression for the choices, given the stimuli, where the weights assigned to the stimulus magnitude indicate sensitivity (the advantage of that approach is that history effects, like the previous trials/choices can be used as regressors in the model). Accordingly, it can simultaneously also quantify the history effects. This could even be expanded to a GLMM (mixed effects for different mice).

      We thank the reviewer for this very valuable feedback. Indeed, during the testing phase, we calculated sensitivity d’ to probe the overall categorization sensitivity (Fig. 3H).

      (i) This analysis was only complementary to the psychometric curves (fitted on the rightward lick rate for each stimulus amplitude using a general linear model – Fig. 3A). As the reviewer proposes, we had calculated the sigma of the psychometric curve (Fig. 3G, slope) to assess categorization precision. Sensitivity calculations have also now been revised using the aforementioned formula (d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)).

      (ii) To incorporate history effects, we implemented generalized linear models (GLMs) with a binomial link function to predict high-salience licks (right-lick choices) based on the current stimulus, trial history, genotype, and their interactions. A main-effects model included current stimulus, previous stimulus, previous outcome, previous choice, and genotype, followed by interaction terms to assess genotype-specific modulation of history effects. These analyses are now presented in the new Figure 6.

      The resulting coefficients are shown in Fig. 6A. As expected, decisions were primarily driven by current stimulus amplitude (Fig. 6A, B). Both genotypes displayed a tendency to repeat previous choices (Fig. 6A, C), while previous reward outcomes did not influence current choice (Fig. 6A, D). Notably, stimulus amplitude history showed genotype-specific effects: WT mice were negatively influenced by the previous stimulus, whereas Fmr1<sup>-/y</sup> mice remained unaffected (Fig. 6A, E).

      To clearly visualize these findings, we plotted psychometric curves and marginal effects accounting for current stimulus, previous choice, previous outcome, and previous stimulus (Fig. 6B-E). These analyses are now fully integrated into the Methods (lines 688-702), Results (Fig. 6, lines 341-369), and Discussion (lines 469-479) sections of the revised manuscript.
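
      To make the structure of such a history model concrete, the sketch below builds one row of regressors per trial from the preceding trial. It is an illustration only: the trial encoding (1 = right lick, 0 = left lick; 1 = rewarded, 0 = not rewarded), the function name, and the example session are assumptions, and the rows would still need to be passed to a binomial GLM routine for fitting.

```python
def history_rows(stimuli_um, choices, rewarded):
    """Pair each trial with the stimulus, choice, and outcome of the trial before it."""
    rows = []
    # The first trial has no history, so rows start at the second trial.
    for t in range(1, len(stimuli_um)):
        rows.append({
            "stimulus": stimuli_um[t],            # current stimulus amplitude
            "prev_stimulus": stimuli_um[t - 1],   # stimulus history
            "prev_choice": choices[t - 1],        # choice history
            "prev_outcome": rewarded[t - 1],      # reward history
            "choice": choices[t],                 # response variable to be predicted
        })
    return rows

# Hypothetical five-trial session, for illustration only.
stimuli_um = [12, 26, 14, 22, 18]
choices = [0, 1, 0, 1, 1]
rewarded = [1, 1, 1, 1, 0]
rows = history_rows(stimuli_um, choices, rewarded)
```

      Adding genotype and genotype-by-regressor interaction columns to these rows would give the kind of design described above, in which a nonzero interaction weight indicates genotype-specific history effects.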

      (3) I find some of the terminology used confusing/misleading:

      (a) The term "Categorization thresholds" can be misleading - in psychometric curves, "thresholds" often refer to the sigma (SD) of the fitted curve used to measure sensitivity (inversely related). Here, I think that the meaning is in terms of the PSE/criterion. Perhaps the terminology can be improved to prevent confusion on this matter. E.g., I think that here the authors mean a measure of bias/criterion/PSE or similar. Correct? Not really a perceptual "threshold".

      We thank the reviewer for pointing this out. In our analysis, the term “threshold” referred to the inflection point (i.e., the midpoint parameter μ) of the fitted logistic psychometric function used to categorize high- versus low-amplitude stimuli. We termed it “threshold” in the categorization of high and low amplitude stimuli. We agree with the reviewer that we could also use the term “Categorization bias”. We originally opted to avoid this term, not to confuse the readers when referring to the criterion (signal detection theory) as “response bias”. However, seeing as the term “threshold” may be confusing as well, we adopted the term “Categorization bias” in the updated version of the manuscript (lines 282, 284, 637-638, 785, Fig. 3F).

      (b) Similarly, I think that "Categorization accuracy" can be misleading when describing the slope of the psychometric curve. Performance could have a steep slope but still be quite inaccurate (e.g., if there is a big bias). Perhaps "precision" is a better description of the slope?

      We thank the reviewer for this suggestion. The slope of the psychometric curve is often referred to as “sensitivity” in the literature (Carandini and Churchland, 2014), but in our original manuscript we used the term “accuracy” to avoid confusion with the d′ measure from signal detection theory. We have revised the manuscript and Figures with the term “precision” as the reviewer suggested (lines 282, 284, 637-638, 786, Fig. 3G).
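
      The distinction between the midpoint ("categorization bias") and the slope ("precision") can be illustrated with a logistic psychometric function. The parameterization below, with midpoint mu and a scale parameter, is an assumption chosen for illustration; the manuscript's exact fitting function may differ, and the parameter values are hypothetical.

```python
from math import exp

def psychometric(amplitude_um, mu, scale):
    """Probability of a right (high-amplitude) lick under a logistic psychometric function."""
    return 1.0 / (1.0 + exp(-(amplitude_um - mu) / scale))

def slope_at_midpoint(scale):
    """Slope of the logistic curve at its inflection point (amplitude == mu)."""
    return 1.0 / (4.0 * scale)

# Hypothetical parameters: midpoint at 19 um, scale of 2 um.
p_low = psychometric(12.0, mu=19.0, scale=2.0)
p_mid = psychometric(19.0, mu=19.0, scale=2.0)
p_high = psychometric(26.0, mu=19.0, scale=2.0)
```

      Shifting mu moves the curve left or right without changing its steepness, whereas changing the scale alters the steepness without moving the midpoint, which is why the two quantities are reported separately.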

      Minor Comments

      (1) Abstract: "determines how autistic individuals engage" - there are other factors too. So, I think that "determines" is a little strong. Perhaps "influences" is more appropriate.

      We have incorporated the reviewer’s suggestion (line 7).

      (2) Figure 1 F, G. On the one hand, d' is defined as "sensitivity (d') in discriminating between high- and low-salience stimuli" - that seems to make sense. But then d' is also calculated and presented for each salience level on its own. How was this done? Namely, percent correct (or proportion of choices high/low salience) could be affected by criterion shifts as well as sensitivity. This makes calculating the d' for a single (low or high) salience stimulus ambiguous. So, how do these authors make this conclusion?

      We agree that calculating d′ for a single stimulus amplitude is ambiguous, because the resulting value conflates true stimulus sensitivity with shifts in response bias or criterion. Consequently, all analyses and figures reporting d′ for individual high- or low-salience stimuli (e.g., Figures 1F and 1G) have been removed from the revised manuscript.

      In the updated analyses, d′ is calculated only across high- versus low-salience stimuli, following standard Signal Detection Theory procedures, ensuring that it reflects true discriminability between the two categories (Methods, line 631; Figure 1E).

      (3) "Our results showed comparable correct choice rates in Fmr1-/y and WT mice (Fig. 1H), for both high- and low-salience stimuli (Fig. S1C-D). In contrast, Fmr1-/y mice presented a significantly higher rate of incorrect choices (Fig. 1I)." - aren't correct choices and incorrect choices complementary (i.e., 1-x) in a 2AFC? How is this possible?

      We thank the reviewer for pointing this out. Correct and incorrect choices are complementary at the single-trial level if miss trials are excluded. However, in our analyses, correct and incorrect choice rates were calculated by normalizing the number of correct or incorrect responses to the total number of trials (including misses), which breaks this complementarity and contributes to the differences observed in Fig. 1H–I. This was clarified in the Methods section (lines 616-617). Moreover, incorrect responses were less frequent than correct ones and are thought to reflect lapses, response bias, and impulsive responding rather than sensory performance, making them more sensitive to genotype-dependent differences in behavioral control. Based on this concept, we further examined whether incorrect choices were preferentially associated with specific stimulus amplitudes and assessed response bias and prior effects.
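
      A short numerical sketch shows why the two rates are not complementary under this normalization. The outcome labels, function name, and session below are hypothetical and serve only to illustrate the arithmetic.

```python
def choice_rates(outcomes):
    """Correct, incorrect, and miss rates, each normalized to all trials including misses."""
    total = len(outcomes)
    correct = outcomes.count("correct") / total
    incorrect = outcomes.count("incorrect") / total
    miss = outcomes.count("miss") / total
    return correct, incorrect, miss

# Hypothetical session of ten trials: 6 correct, 1 incorrect, 3 misses.
session = ["correct"] * 6 + ["incorrect"] * 1 + ["miss"] * 3
correct, incorrect, miss = choice_rates(session)
```

      Here the correct and incorrect rates sum to 0.7 rather than 1, so two groups can share a similar correct rate while differing in incorrect rate whenever their miss rates differ.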

      (4) The conclusion that "they showed a strong trend toward reduced sensitivity for low-salience stimuli (Fig. 1G)" has a confound - it could be that there was a criterion shift (rather than differences in sensitivity)?

      We agree with the reviewer that the previously reported trend in sensitivity for low-salience stimuli could reflect a criterion shift rather than true differences in sensory sensitivity. Because sensitivity estimates for individual stimulus amplitudes are not well-defined in a 2AFC framework, we have removed the sensitivity calculations for high- and low-salience stimuli considered independently. Instead, we now present salience-specific differences using correct and incorrect response rates for each stimulus amplitude, which more directly capture performance differences without assuming changes in sensory sensitivity (Fig. 1G-I, S1E-F).

      (5) Figure 3D, E - I stumbled over this in comparison to Figure 3B, C. That is because (a) In D and E, the authors compare right-lick responses (reporting high salience) to stimuli of 12 μm and 14 μm amplitude (Figure 3D) and low-salience lick rates for the same (Figure 3E). I would have thought that these approaches are simply complementary (1-x) - see related minor question above/below. So, what is the advantage of presenting them both?

      We presented both panels to clarify the source of the observed differences in performance. Specifically, showing right-lick responses (reporting high-salience choices) alongside low-salience lick rates allows us to distinguish whether reduced high-salience reporting arises from an actual shift in choice (e.g., increased leftward licking) versus an increase in miss trials at the lowest amplitude (12 µm). By presenting both, we can demonstrate that the effect is primarily driven by an increase in leftward choices rather than by missed responses, providing a more precise interpretation of behavioral changes. The complementary analysis for leftward choices has now been moved to the supplemental material (Fig. S5A) and the reason for this analysis has been clarified in the Results (lines 275-276).

      (b) In B and C, the authors compare two differences in stimulus magnitude (2 and 4 μm), but in Figure 3D and E, only one difference (2 μm) from two perspectives. I was expecting a comparison with stimuli differing by 4 μm in amplitude (comparable to the high stimulus comparison of 26 μm vs. 22 μm stimuli).

      We have indeed analyzed the 12 μm versus 16 μm stimulus pair, which corresponds to a 4 μm difference and is reliably discriminated by both genotypes. In the original manuscript, we did not include this comparison because of the differences already seen at a 2 μm amplitude difference. Based on the reviewer’s suggestion, we have now included the 12 μm vs. 16 μm comparison in the revised manuscript (Results, lines 270-272; Fig. 3E) to provide a complementary perspective consistent with the high-salience comparisons (26 μm vs. 22 μm).

      (c) "Sensitivity d' for high- and low-salience stimuli was calculated based on the Correct and Incorrect choice rate for high- and low-salience stimuli respectively." How were trials for which the animal did not respond taken into account? Were these part of the denominator? Or were these excluded when calculating proportions? (related to the Q regarding Figure 3 D,E above).

      Indeed, the Miss trials were part of the denominator. This is now clarified in the Methods section (line 631).

      (d) "c = d'(high)- d'(low)." - I did not understand this fully. There were several high and several low stimuli - so how were these calculated? Pooled for high and pooled for low? Per stimulus difference?

      This was indeed calculated for pooled high and low amplitudes during testing. In the revised manuscript, criterion c has been recalculated based on the average correct high rate (for stimuli of 20-26 µm amplitude) and average incorrect low rate (for stimuli of 12-18 µm amplitude), using the same formula as in the analysis of the training dataset:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]

      Pooling across amplitudes allows us to obtain a single summary measure of response bias toward the right lickport, independent of stimulus discriminability. This approach is consistent with standard signal detection theory practices when multiple stimulus levels are present.

      If the inter-trial interval is 5-10s, how is a 5s timeout a punishment?

      The 5 s timeout serves as a punishment by temporarily delaying access to the next trial and potential reward, thereby reducing the overall reward rate. Even though the inter-trial interval (ITI) varies between 5 and 10 s, the timeout increases the effective delay before the next opportunity to earn a reward, discouraging incorrect responses. This is consistent with standard operant conditioning procedures, where brief timeouts act as negative consequences without being overly severe. Across most trials, the timeout effectively reduces expected reward rate, though its impact is minimal when the ITI is already long.

      Reviewer #2 (Recommendations for the authors):

      Task-related questions:

      (1) What evidence is there that the 40 Hz, 12 μm stimulus is "low salience" while the 40 Hz, 26 μm stimulus is "high salience"? This seems like an arbitrary distinction without showing sensitivity curves across a group of animals. Better definitions of the stimuli and the actual forces applied are necessary.

      We thank the reviewer for this comment. Based on our previous work (Semelidou et al., bioRxiv; Accepted in Advanced Science), both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli are clearly suprathreshold. In the present study, however, stimulus salience is defined in a relative and operational manner within this suprathreshold range.

      Specifically, analysis of miss trials (Fig. S3E) shows that the 40 Hz, 12 μm stimulus consistently elicited a higher proportion of missed responses compared to the 40 Hz, 26 μm stimulus across animals, indicating lower behavioral performance for the lower-amplitude stimulus. We therefore refer to the 12 μm stimulus as “low salience” and the 26 μm stimulus as “high salience” to denote relative differences in perceptual strength and attentional engagement within the suprathreshold range, rather than differences in detectability or absolute sensory sensitivity. This definition has been clarified in the Methods (lines 583-587) and Results sections (lines 115-119; lines 225-227).

      (2) Sensitivity curves/detection thresholds for each mouse should be included in the study.

      We thank the reviewer for this suggestion. Sensitivity curves and detection thresholds for low-amplitude and low-frequency vibrotactile forepaw stimulation have been systematically characterized in our previous study (Semelidou et al., bioRxiv; accepted in Advanced Science). In that work, we demonstrated that stimuli with similar amplitudes and even lower frequency (10 Hz) than those used in the present study are reliably detectable by mice, confirming that both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli fall within the suprathreshold range.

      Because the goal of the present study was not to determine absolute detection thresholds but rather to examine discrimination and categorization performance within a suprathreshold range, we did not re-establish full psychometric detection curves for each mouse.

      We have clarified this rationale in the revised manuscript (Results, lines 108-113; Methods, lines 577-579).

      (3) What force is being applied during stimulus presentations? 12 or 26 μm does not provide enough information about the stimuli applied. What are the physical parameters of the indenter? What material, what tip size?

      Vibrotactile stimuli were delivered to the forepaw via a piezoelectric actuator. A 12.7 mm stainless steel post (ThorLabs) was mounted on the actuator vertically and a 0.6 mm stainless steel rod (ThorLabs) was clamped horizontally onto this post. The horizontal rod served as the contact bar on which the animal rested its right forepaw.

      Stimuli were sinusoidal vibrations at 40 Hz with peak-to-peak displacements of 12 μm (low salience) or 26 μm (high salience). The actuator displacement was calibrated prior to experiments to ensure accurate vibration amplitudes.

      Animals were positioned in the setup to ensure stable and consistent forepaw contact with the rod delivering the vibration. Pilot experiments with an extra sensor to monitor forepaw placement confirmed that the mice did not remove their forepaws from the bar before stimulus delivery. All this information is now added in the Methods section (lines 552-555, 580-582).
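
      For concreteness, the stimulus description (a 40 Hz sinusoid specified by its peak-to-peak displacement) can be sketched as a waveform. The stimulus duration and sampling rate below are hypothetical values chosen for illustration and are not taken from the manuscript.

```python
from math import pi, sin

def vibration_um(peak_to_peak_um, freq_hz=40.0, duration_s=0.5, sample_rate_hz=8000):
    """Sampled sinusoidal displacement (in micrometers) around the resting position."""
    # A peak-to-peak displacement of X corresponds to a sine amplitude of X / 2.
    amplitude = peak_to_peak_um / 2.0
    n_samples = int(duration_s * sample_rate_hz)
    return [amplitude * sin(2 * pi * freq_hz * i / sample_rate_hz) for i in range(n_samples)]

low_salience = vibration_um(12.0)    # 12 um peak-to-peak
high_salience = vibration_um(26.0)   # 26 um peak-to-peak
```

      Note that this describes displacement only; the force delivered to the paw additionally depends on how the paw rests on the bar, which the displacement specification alone does not capture.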

      (4) Only one vibration stimulus was used (40 Hz) - this preferentially activates specific subsets of low-threshold mechanoreceptors and not others. A range of vibrotactile stimuli (with varying frequencies) would be more useful. From this limited range of stimuli, it is difficult to assess whether the findings would extrapolate to other types of stimuli.

      We agree that using a single vibration frequency limits the generalization of our findings across the full range of mechanoreceptor subtypes and vibrotactile stimulus conditions. In the present study, we deliberately focused on amplitude discrimination within the flutter range (<50 Hz), as this frequency preferentially activates subsets of low-threshold mechanoreceptors relevant for flutter perception and is commonly used in clinical studies of tactile amplitude discrimination (Puts et al., 2014, 2017; Asaridou et al., 2022). By holding frequency constant and varying only amplitude, we were able to isolate amplitude-dependent perceptual and decision-making processes while minimizing frequency-dependent variability and to facilitate direct translational comparisons with human studies using similar flutter stimuli.

      We acknowledge, however, that extending the paradigm to additional, higher frequencies would help determine whether the observed effects generalize across mechanoreceptor channels. We have now added this point as a future direction in the Discussion section (lines 510-514).

      (5) The methods indicate that during the implementation of the water-restriction protocol, mice had access to a solid water supplement in their home cage. How did they control for how much water supplement was consumed by each mouse before the testing sessions?

      We thank the reviewer for raising this point. The solid water supplement was divided into premeasured individual portions, and each mouse received its allotted amount only after the daily training/testing session. Daily body weight measurements were used to monitor hydration and ensure that all animals maintained stable body weight. If necessary, supplemental water was adjusted to maintain animals within the approved weight range. This procedure is now described in the Methods section (lines 567-571).

      (6) A control version of the test, perhaps using a different sensory modality, would be useful for making conclusions.

      We agree that testing other sensory modalities would provide a useful control for assessing the generalizability of the observed effects. However, in the present study, we intentionally focused on the tactile modality, as touch has been shown to play a critical role in autism across sexes and predict other core behavioral symptoms. This makes touch particularly relevant for investigating translational mechanisms in this model.

      By specifically targeting tactile perception, we aimed to investigate the link between sensory discrimination, decision-making, and cognitive modulation within a modality that is strongly implicated in autism. Previous studies in autistic individuals have demonstrated similar interactions between cognitive processes and perceptual decision-making in the visual domain, suggesting that such effects may not be modality-specific. Nevertheless, extending this paradigm to additional sensory systems would be valuable to directly test whether comparable cognitive influences on perception generalize across modalities. We have now incorporated this perspective as a future direction in the Discussion section (lines 514-518).

      Reviewer #3 (Recommendations for the authors):

      There are several questions:

      (1) It is important to show stimulus intensity-response curves representing tactile responses for both WT and Fmr1-/y mice.

      We thank the reviewer for this important comment. Detection sensitivity curves for low-amplitude and low-frequency vibrotactile stimulation of the forepaw have been characterized in detail in our previous study (Semelidou et al., bioRxiv; now accepted in Advanced Science). In that work, we showed that stimuli at or above 8 µm amplitude and 10 Hz frequency are reliably detected by both WT and Fmr1<sup>-/y</sup> mice.

      Based on these findings, the current study employed vibrotactile stimuli at a higher frequency (40 Hz) and amplitudes of 12 µm and above, ensuring that all stimuli were well within the suprathreshold range for both genotypes. This experimental choice was made to specifically probe discrimination, categorization, and decision-making processes, rather than basic sensory detection. As a result, the behavioral effects reported here cannot be attributed to differences in stimulus detectability.

      We have clarified this rationale in the revised manuscript to make explicit that the absence of full intensity-response curves in the current study reflects a deliberate focus on suprathreshold perceptual and cognitive processes rather than sensory threshold differences (Results, lines 108-113; Methods, lines 577-579).

      (2) There is no difference in the time it takes to learn the task between WT and Fmr1-/y mice. But how does the learning rate curve look? Is there a difference in the slope between WT and Fmr1-/y early vs late into learning?

      We thank the reviewer for this suggestion. To directly address whether learning dynamics differed between genotypes, we analyzed learning curves across training.

      We first computed the correct choice rate per day for each animal (Fig. S2A) and fit a mixed-effects model including training day, genotype, and their interaction. This analysis revealed no genotype differences in baseline performance or learning rate, with a minimal Genotype × Day interaction (Fig. S2A-top, Fig. S2C).
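
      A model of this kind can be written out as a sketch. The response names the fixed effects (training day, genotype, and their interaction) but not the random-effects structure, so the per-animal random intercept $u_i$ below is an assumption, as are the symbol names.

```latex
% y_{ij}: correct-choice rate of animal i on training day j
% u_i: per-animal random intercept (assumed; not stated in the response)
y_{ij} = \beta_0
       + \beta_1\,\mathrm{Day}_{ij}
       + \beta_2\,\mathrm{Genotype}_i
       + \beta_3\,(\mathrm{Day}_{ij} \times \mathrm{Genotype}_i)
       + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
       \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Under this coding, $\beta_2$ would capture a genotype difference in baseline performance and $\beta_3$ a genotype difference in learning rate, which are the two terms the response reports as not differing.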

      We additionally computed the slope of the learning curve for each individual, which showed no difference across genotypes (Fig. S2B). Within-animal day-to-day performance variability was also comparable across groups (Fig. S2A-bottom, S2D).
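
      The per-animal slope described above could be computed along the following lines. This is a minimal sketch, not the authors' code: it assumes the slope is an ordinary least-squares fit of daily correct-choice rate against training day, and the function name, animal labels, and numbers are hypothetical.

```python
def learning_slope(daily_correct_rate):
    """Least-squares slope of correct-choice rate versus training day.

    Days are indexed 1..n. Assumption: a straight-line fit is used; the
    response does not state how the slope was estimated.
    """
    n = len(daily_correct_rate)
    if n < 2:
        raise ValueError("need at least two training days")
    days = range(1, n + 1)
    mean_day = sum(days) / n
    mean_rate = sum(daily_correct_rate) / n
    # Covariance of day and rate, and variance of day (both unnormalised).
    sxy = sum((d - mean_day) * (r - mean_rate)
              for d, r in zip(days, daily_correct_rate))
    sxx = sum((d - mean_day) ** 2 for d in days)
    return sxy / sxx


# Hypothetical example data: one animal per genotype, five training days.
rates_by_animal = {
    "WT_1": [0.50, 0.58, 0.66, 0.74, 0.82],
    "KO_1": [0.52, 0.59, 0.67, 0.73, 0.81],
}
slopes = {animal: learning_slope(r) for animal, r in rates_by_animal.items()}
```

      The resulting per-animal slopes would then be compared between genotypes with whichever two-group test the Methods specify.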

      These analyses indicate that WT and Fmr1<sup>-/y</sup> mice exhibit similar learning trajectories during training. The learning curves are now included in Figure S2, described in the Results (lines 140–151) and detailed in the Methods (lines 644-658).

      (3) It would be useful to see raster plots of licks for different trials and the corresponding lick density plots for early vs late trials.

      We thank the reviewer for this suggestion. To visualize trial-by-trial behavior, we included example lick traces from an early 100-trial session and a late 100-trial session, alongside the corresponding raster plots of licks (Fig. S1A–B).

      (4) Consistent with the first question, examples of intermediate learning stages would help gain more insight into how both WT and Fmr1-/y mice learn.

      In line with the reviewer’s suggestion, we examined whether WT and Fmr1<sup>-/y</sup> mice showed different performance during intermediate stages of learning. To this end, we defined the middle three days of the training period of each animal as the intermediate learning phase. We compared both the mean correct-choice rate and individual learning slopes across this interval. Statistical analyses revealed no significant genotype differences in either measure, indicating comparable performance and learning dynamics during the intermediate phase of training (lines 152-156).
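
      Because training duration can differ between animals, the "middle three days" have to be located per animal. A minimal sketch of one way to do so follows; the function name is hypothetical, and the tie-breaking rule for an even number of training days (shifting the window toward the earlier days) is an assumption that the response does not specify.

```python
def intermediate_phase(daily_values, width=3):
    """Return the central `width` entries of one animal's daily values.

    Assumption: when the window cannot be centred exactly (even number
    of training days), it is shifted toward the earlier days.
    """
    n = len(daily_values)
    if n < width:
        raise ValueError("fewer training days than the window width")
    start = (n - width) // 2
    return daily_values[start:start + width]


# Hypothetical animal trained for seven days: days 3-5 form the window.
rates = [0.50, 0.55, 0.60, 0.70, 0.80, 0.85, 0.90]
middle = intermediate_phase(rates)
mean_intermediate_rate = sum(middle) / len(middle)
```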

      (5) How does the learning rate change with increased cognitive load for both WT and Fmr1-/y mice?

      We thank the reviewer for this question. While our experimental design did not include a manipulation of cognitive load during the learning phase itself, we assessed whether increased cognitive load affected performance by analyzing behavior on the first day of testing, when animals were required to categorize and discriminate among a larger set of stimuli compared to training.

      Using performance on the training stimuli during this first testing session as a proxy, we found no significant difference between WT and Fmr1<sup>-/y</sup> mice in correct choice rate (Author response image 1). This indicates that increased cognitive load did not differentially affect performance on familiar stimuli across genotypes at this stage.

      Because this analysis does not reflect learning rate per se, but rather performance under increased task demands after learning had already occurred, we did not incorporate it into the main Results section. Instead, it is presented here to directly address the reviewer’s question.

      Author response image 1.

      Correct choice rate for the 12 µm and 26 µm stimuli during the first day of testing when the cognitive load is high.

      (6) How does the learning rate change if the sensory stimuli are more challenging for both WT and Fmr1-/y to detect?

      We thank the reviewer for this question. In the present study, animals were deliberately trained using well-separated, suprathreshold low- and high-salience stimuli to ensure reliable stimulus detection and to avoid confounding learning rate with perceptual difficulty or discrimination limits.

      A recent study (Heimburg et al., 2025) has shown that learning is slower when the difference between the two training stimuli is reduced. Based on these results, we would expect that decreasing the separation between low- and high-salience stimuli would similarly increase training duration for both WT and Fmr1<sup>-/y</sup> mice, since our results do not indicate any discrimination or categorization deficits in the mouse model of autism. However, directly testing how stimulus difficulty modulates learning rate would require a dedicated manipulation of stimulus spacing during training and was beyond the scope of the current study.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals.

      These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. eLife Assessment

      Du et al. present a valuable study examining neural activation in medial prefrontal cortex (mPFC) subpopulations projecting to the basolateral amygdala (BLA) and nucleus accumbens (NAc) during behavioral tasks assessing anxiety, social preference, and social dominance. The strength of the evidence linking in vivo neural physiology to behavioral outcomes was considered solid; however, the slice electrophysiology data and their interpretation were less well received. Overall, the reviewers felt that the revised work provides insight into how distinct mPFC→BLA and mPFC→NAc pathways influence anxiety, exploration, and social behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that neurons in the medial prefrontal cortex (mPFC) are involved in higher cognitive functions such as executive planning, motivational processing and internal state mediated decision-making. These internal states often correlate with the emotional states of the brain. While several studies point to the role of mPFC in regulating behavior based on such emotional states, the diversity of information processing in its sub-populations remains a less explored territory. In this study, the authors try to address this gap by identifying and characterizing some of these sub-populations in mice using a combination of projection-specific imaging, function-based tagging of neurons, multiple behavioral assays and ex-vivo patch clamp recordings.

      Strengths:

      The authors targeted mPFC projections to the nucleus accumbens (NAc) and basolateral amygdala (BLA). Using the open field task (OFT), the authors identified four relevant behavioral states as well as neurons active while the animal was in the center region ("center-ON neurons"). By characterizing single unit activity and using dimensionality reduction, the authors show differentiated coding of behavioral events at both the projection and functional levels. They further substantiate this effect by showing higher sensitivity of mPFC-BLA center-ON neurons during time spent in the open arms of the elevated plus maze (EPM). The authors then pivoted to the three-chamber social interaction (SI) assay to show the different subsets of neurons encode preference of social stimulus over non-social. This reveals an interesting diversity in the function of these sub-populations on multiple levels. Lastly, the authors used the tube test as a manipulation of the anxiety state of mice and compared behavioral differences before/after in the OFT and social interaction tasks. This experiment revealed that "losers" of the tube test spend less time in the center of the open field while "winners" show a stronger preference for the familiar mouse over the object. Using patch-clamp experiments, the authors also found that "winners" exhibit stronger synaptic transmission in the mPFC-NAc projection while "losers" exhibit stronger synaptic transmission in the mPFC-BLA projection. Given the popularity of the tube test assay in rank determination, this provides useful insights into possible effects on anxiety levels and synaptic plasticity. Overall, the many experiments performed by the authors reveal interesting differences in mPFC neurons relative to their involvement in high or low anxiety behaviors, social preference and social rank.

      Weaknesses:

      The authors have addressed all comments.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this proposal was to understand how two separate projection neurons from the medial prefrontal cortex, those innervating the basolateral amygdala (BLA) and nucleus accumbens (NAc), contribute to the encoding of emotional behaviors. The authors record the activity of these different neuron classes across three different behavioral environments. They propose that, although both populations are involved in emotional behavior, the two populations have diverging activity patterns in certain contexts. A subset of projections to the NAc appears particularly important for social behavior. They then attempt to link these changes to the emotional state of the animal and changes in synaptic connectivity.

      Strengths:

      The behavioral data build on previous studies of these projection neurons supporting distinct roles in behavior and extend previous work by looking at the heterogeneity within different projection neurons across contexts. This is important for understanding the "neural code" within the PFC that contributes to such behaviours and how it is relayed to other brain structures.

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures and so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within BLA or NAc or the diversity of projection neuron subtypes that mediate these pathways. This is an important future direction for this work but does not detract from the main finding as reported. The electrophysiological data in Figure 7 have some experimental confounds that make their interpretation challenging.

      Comments on revisions:

      The authors have improved the manuscript somewhat by refining their description of the results. However, the normalized EPSC experiments still do not make much sense. If you have a higher light intensity or LED duration, the curve of the EPSC response will saturate earlier. Similarly, if you are in a highly or poorly labeled slice or subregion of a slice, then you will see responses emerge at different intensities based on the number of synapses labelled. There is no standardization in the way these experiments were performed, so performing some arbitrary post hoc normalisation does not correct for this. Similarly, they also place the fibre optic manually above the slice each time. This makes it much harder to determine the actual light intensity delivered to the slice on a cell-by-cell and group-by-group basis.

      I have reduced my public statement from significant experimental confounds, to some experimental confounds. But the way the experiments were performed does not allow the normalized data to really be interpretable. They still argue that normalized EPSCs are relatively larger. I don't even really understand what this means biologically.

      The subsequent rise/decay and other measures are now better described. However, they note that the decay constant is larger. This means that the kinetics are slower, not enhanced, as they describe.
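
      The link between decay constant and speed can be made explicit with a short worked relation. This sketch assumes the EPSC decay is described by a single exponential with time constant $\tau$; the review does not state the fitting function, and the symbols $I_{\mathrm{peak}}$, $\tau$, and $t_{1/2}$ are introduced here for illustration.

```latex
% Single-exponential decay from the peak (assumed functional form)
I(t) = I_{\mathrm{peak}}\, e^{-t/\tau}, \qquad t \ge 0

% Time for the current to fall to half of its peak value
I(t_{1/2}) = \tfrac{1}{2} I_{\mathrm{peak}}
\;\Longrightarrow\;
t_{1/2} = \tau \ln 2
```

      Since $t_{1/2}$ grows in proportion to $\tau$, a larger decay constant corresponds to a current that takes longer to return to baseline, that is, slower decay kinetics.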

    4. Author response:

      The following is the authors’ response to the previous reviews

      We sincerely thank the editors and reviewers for their careful evaluation and constructive feedback, which has helped us substantially improve the clarity and rigor of the manuscript. In the revised version, we have clarified the interpretation of the electrophysiological experiments, corrected the labeling of recorded signals as light-evoked EPSCs, and removed statements implying differences in absolute synaptic strength. To address concerns about the interpretation of Fig. 7, we have added quantitative analyses of EPSC kinetics and revised the text to focus on synaptic response dynamics rather than amplitude differences. We have also removed analyses that could cause confusion and expanded the Methods section to provide additional experimental details, including the optogenetic stimulation configuration in slice recordings. Together, these revisions strengthen the interpretation of the electrophysiological results and improve the overall clarity and transparency of the study.

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness:

      The authors focused primarily on female mice, limiting generalizability and leaving the readers with questions about the impact of sex differences on their results. The tube test is used as a manipulation of the "emotional state" in several of the experiments. While the authors show the changes to corticosterone levels as a consequence of win/loss in the tube test, stronger claims might be made with comparisons to other gold standard stressors such as forced social defeat or social isolation.

      We thank the reviewer for these thoughtful comments.

      First, we acknowledge that the present study was conducted primarily in female mice, which may limit the generalizability of the findings. Female mice were selected to reduce variability associated with male aggression and housing-related stress, which can complicate behavioral assays such as social interaction and dominance testing. While focusing on a single sex allowed us to maintain experimental consistency across multiple behavioral paradigms, we agree that sex differences could influence the neural circuits underlying emotional and social behaviors. We have now added a statement in the Discussion acknowledging this limitation and noting that future studies will be necessary to determine whether similar circuit mechanisms operate in male mice.

      Second, we appreciate the reviewer’s suggestion regarding the use of other stress paradigms. In this study, the tube test was used primarily to establish social dominance relationships between paired mice rather than as a classical stress-induction paradigm. Nevertheless, repeated win/loss outcomes in the tube test were associated with measurable physiological changes, including significant increases in corticosterone levels in brain lysates of loser mice, indicating that the paradigm produced responses consistent with stress-related processes. These findings suggest that repeated social competition in this context can induce transient physiological and behavioral changes associated with social hierarchy. We agree that paradigms such as chronic social defeat stress or social isolation represent well-established models for inducing sustained stress responses. We have therefore revised the manuscript to clarify that the tube test in our study serves as a model of social competition and rank establishment rather than a canonical stress paradigm, and we highlight the comparison with other stress models as an important direction for future work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      In relation to figure 7. Their response does not really clarify the issue:

      (a) They argue that they are not making claims about synapse strength. However, they still state "In the mPFC→NAc pathway, blue light stimulation evoked larger excitatory postsynaptic currents (EPSCs) in winner mice compared to losers (Fig. 7E). This suggests stronger synaptic transmission in winners' mPFC→NAc circuits." They don't show this; they just show that, normalized to some arbitrary value, the responses at the earlier durations are higher or lower, which is very hard to interpret.

      They argue in the rebuttal that the aim of this is to highlight response kinetics, but these are not quantified or discussed in any way.

      We thank the reviewer for this helpful comment. We agree that the normalized input output curves shown in the original submission did not allow conclusions about absolute synaptic strength, and we also acknowledge that response kinetics were not previously quantified despite being mentioned in the rebuttal.

      To address both concerns, we have revised Fig. 7 and added quantitative analyses of EPSC kinetics. Specifically, we measured the rise and decay slopes of light-evoked EPSCs recorded in postsynaptic neurons within the NAc and BLA of winner and loser mice. In the mPFC→BLA pathway, both the EPSC rise and decay slopes were significantly increased in loser mice compared with winners (rise slope: p = 0.0138; decay slope: p = 0.0392), suggesting enhanced synaptic responsiveness and faster charge transfer kinetics in BLA neurons of losers. In contrast, in the mPFC→NAc pathway, neither the EPSC rise slope nor the decay slope differed significantly between groups.

      These results provide a quantitative characterization of synaptic response dynamics and reveal pathway-specific differences in synaptic properties associated with social hierarchy. Importantly, this analysis does not rely on amplitude normalization and therefore allows a more interpretable comparison of synaptic response profiles between groups. We have updated Fig. 7 and the corresponding Results section to include these analyses. 

      (b) They still haven't labeled the responses correctly. The responses in figure 7 are not "voltage spikes" but light-evoked EPSCs.

      We apologize for the incorrect terminology. All instances of “voltage spikes” have been corrected to “light-evoked EPSCs” in the figure legends and text.

      (c) They argue that responses do not vary across experiments/slices because they use a constant viral injection volume targeted to the same co-ordinates and identical placement of the fiber and recording location. While I am sure they aim to do that, it is almost impossible to ensure that this was identical across experiments and that the degree of opsin labelling in their slices was the same (see, for example, Mao et al., 2011, PMID: 21982373, who pioneered the approach of using within-slice comparisons to account for this). If I understand their explanation of their strategy correctly, the authors' own rebuttal highlights this point: they seem to have needed to vary the LED duration by an order of magnitude (1-10 ms) to ensure reliable responses across experiments, even for the same projection.

      We thank the reviewer for raising this important point. We agree that it is not possible to ensure identical opsin expression or light delivery across experiments. We have revised the manuscript to explicitly acknowledge this limitation and clarify that normalization was used to mitigate, but not eliminate, inter-slice variability. We now avoid any interpretation that relies on absolute response amplitude across animals.

      Regarding “LED duration variability (1-10 ms)”, we agree that the need to adjust stimulation duration reflects variability in effective opsin activation across slices. We now clarify this point in the Methods and Results and emphasize that stimulation parameters were optimized to reliably evoke responses rather than to equate absolute light input across experiments.

      Importantly, our main conclusions do not rely on absolute EPSC amplitude comparisons. Instead, they are supported by analyses that are less sensitive to variability in opsin expression or light delivery, including EPSC kinetics (rise and decay slopes), paired-pulse ratio measurements, and AMPA/NMDA ratios. These complementary measures provide a more robust characterization of synaptic properties across conditions.

      (d) Similarly in Fig S6 it is unclear what they are showing. The Y axis is still labeled in pA, yet they claim this is an action potential? Also this analysis is rather irrelevant to the data shown in figure 7 as the pathway between PFC and BLA/NAc is not preserved.

      We thank the reviewer for pointing out the lack of clarity in Fig. S6. We agree that it does not directly inform the interpretation of Fig. 7 and may cause confusion. To improve the clarity and focus of the manuscript, we have therefore removed Fig. S6 from the revised manuscript. The removal of this supplementary figure does not affect the main conclusions of the study.

      (e) It now also seems that these experiments were performed by placing a fiber optic into the slice to elicit responses. This should be detailed in the methods.

      We thank the reviewer for noting this omission. We have added a detailed description of fiber-optic placement for optogenetic stimulation to the Methods section. Specifically, we clarify that blue light was delivered through a fiber optic positioned above the recorded slice to activate ChR2-expressing mPFC axon terminals within the BLA or NAc. The placement of the fiber relative to the recorded neurons and the stimulation parameters are now explicitly described in the revised Methods section.

    1. eLife Assessment

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide important support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells. While some of the conclusions of the authors are convincing, many aspects of the manuscript remain incomplete and could be improved, especially aspects of cell-type classification and validation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells.

      Strengths:

      High-quality datasets for mantle tissue in Crassostrea gigas and thorough comparisons with existing datasets for this species and other spiralians. Balanced discussion.

      Weaknesses:

      No major weaknesses. The analyses follow fairly standard approaches in the field that have been previously applied and developed in similar systems.

    3. Reviewer #2 (Public review):

      Summary:

      Bai et al. present in their study three single-cell RNA seq datasets derived from gastrulae, trochophores, and adults of the bivalve Crassostrea gigas. While a dataset on the oyster trochophore has already been published previously (Piovani et al. 2023), the gastrula and adult datasets have not been published yet. The authors conclude that cell types secreting the oyster shell valves use a genetic repertoire that is also used by epithelial and secretory cell types of very different spiralians, such as annelids, chaetognaths and flatworms.

      Strengths:

      The study provides new single-cell datasets from multiple developmental stages of an oyster, offering a valuable resource for the field. It takes a broad comparative approach using state-of-the-art techniques across diverse animal groups and addresses an important question regarding the origin and evolution of shell-forming cell types.

      Weaknesses & suggestions to improve the manuscript:

      (1) Validation of cell types

      Cell type identities are not convincingly validated. Although the authors cite previous studies (l. 92), the referenced marker genes are largely not used, and the cited works do not provide sufficient spatial validation. Without in situ data, the inferred locations of cell types (e.g. Figure 2A) are not supported. Spatial validation of marker genes (e.g. via HCR) is essential, particularly for a study addressing shell field evolution. In addition, the gastrula dataset is not meaningfully analyzed, and its inclusion remains unclear.

      (2) Robustness of cell type classification

      Several proposed cell types may not represent distinct entities (not individuated) but rather reflect over-clustering. Marker genes are often not specific and are shared across clusters (e.g. Sec1/Sec2), making it difficult to distinguish cell types reliably.

      (3) Comparative analysis of secretory cells

      The comparative framework is not sufficiently supported. Secretory cells are highly diverse, and without proper validation, their comparison across taxa is not meaningful. The transcription factor analysis is limited, as only a few genes are shared and many are inconsistently expressed (Figure 3E). The conclusion of a conserved regulatory program across spiralians is therefore overstated.

      (4) Clarity and interpretation of results

      Results are at times difficult to follow and remain superficial. Marker genes are insufficiently annotated (especially for Crassostrea), and comparisons across taxa lack functional interpretation. Unvalidated and heterogeneous cell types are grouped together, and transcriptional similarities are overinterpreted. Overall, key conclusions are not adequately supported by the presented data.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Bai et al. reports single-cell transcriptomics of the oyster mantle to elucidate the respective contributions of ancient conserved programmes and lineage-specific genes to the origin of the molluscan shell. The authors compare their dataset with other oyster larval datasets as well as data from other organisms (annelids, chaetognaths) and find evidence of evolutionary conservation and functional similarity with secretory cell types. They also observe that cells involved in secreting the larval skeleton express predominantly recent genes, whereas the adult skeleton-secreting programme is evolutionarily more conserved.

      Strengths:

      The manuscript is well written and clearly presented, and the results are interesting, particularly the distinction between larval and adult skeleton secretion, which is placed in a thoughtful evolutionary context.

      Weaknesses:

      (1) My main concern is that the authors rely primarily on previous studies for the experimental and functional characterisation of the identified cell types. The cited papers (Piovani, 2023 and de la Forest Divonne et al., 2025) deal with distinct stages or tissues (larvae and hemocytes, respectively), which limits their direct relevance. The authors also cite other papers for in situ expression data; it would be helpful to summarise somewhere (e.g. in a table) which genes have been experimentally characterised and what their expression domains are, or alternatively to provide HCR or in situ staining on the mantle. For instance, what is the rationale for the claim that proliferative cells give rise to the mantle? The trajectory inference approach used (Monocle) would likely yield a similar result regardless of the reference cell type, so additional justification is needed.

      (2) More broadly, I find that the functional properties of the identified cell types and their relationship to the expressed genes deserve more detailed discussion. For example, at L100, several genes are mentioned, but their functional roles are not discussed. Similarly, the basis for annotating the proliferative cells is not explained. How was gene orthology assessed? Throughout the manuscript, vertebrate-style gene names are used without explicitly establishing orthology status in oyster, which should be addressed.

      (3) More detail is needed on the methods and quality control for the single-cell data. The authors should clarify that the platform used (BMKMANU) is a droplet-based technology comparable in principle to Drop-seq. BMKMANU is not widely used in the field. How does it compare to 10x Genomics in terms of sensitivity and cell recovery? The authors appear to use the 10x Chromium cellranger pipeline for data analysis, which suggests compatibility, but this should be stated explicitly. Additionally, no information is provided on the number of sequencing runs or biological replicates, nor on how reproducible the results are across samples.

      (4) A limitation of the phylostratigraphic analysis is that it is restricted to mantle tissue, making it difficult to place the results in a whole-organism context. How do the age profiles of mantle-expressed genes compare to those of more evolutionarily conserved tissues, such as the nervous system? I appreciate the methodological and experimental constraints, but this is a genuine limitation of the study. The authors could at least discuss it explicitly, and ideally consider generating a broader single-cell atlas of the oyster to provide this comparative baseline.

      (5) Have the authors considered the potential importance of lineage-specific gene duplication? It is well established that spiralians, including oysters, have undergone extensive lineage-specific duplication of transcription factors such as homeobox genes, and many structural shell-associated proteins may similarly have been duplicated. This could be relevant to interpreting both the phylostratigraphic results and the expansion of secretory gene families.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells.

      Strengths:

      High-quality datasets for mantle tissue in Crassostrea gigas and thorough comparisons with existing datasets for this species and other spiralians. Balanced discussion.

      Weaknesses:

      No major weaknesses. The analyses follow fairly standard approaches in the field that have been previously applied and developed in similar systems.

      We thank the reviewer for the positive evaluation of our work. We are encouraged that the reviewer finds our conclusions balanced and the analyses appropriate. Although no major concerns were raised, we will incorporate clarifications and improvements prompted by the other reviewers to further strengthen the manuscript.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Validation of cell types

      Cell type identities are not convincingly validated. Although the authors cite previous studies (l. 92), the referenced marker genes are largely not used, and the cited works do not provide sufficient spatial validation. Without in situ data, the inferred locations of cell types (e.g. Figure 2A) are not supported. Spatial validation of marker genes (e.g. via HCR) is essential, particularly for a study addressing shell field evolution. In addition, the gastrula dataset is not meaningfully analyzed, and its inclusion remains unclear.

      We thank the reviewer for this important comment regarding cell type validation. In the previous version of the manuscript, we provided a detailed compilation of referenced marker genes from previous studies in Supplementary File 2. It is possible that, due to an incorrect or unclear reference in the main text, this information was not readily accessible. We will correct and clarify these citations in the revised manuscript to ensure that these resources are clearly presented.

      We agree that spatial validation would provide important support for cell type identities. In the revised version, we will strengthen this aspect by selecting more specific marker genes for each SEC cluster and performing fluorescence in situ hybridisation (FISH) to validate their spatial localization.

      Regarding the gastrula dataset, our original intention was to investigate the developmental transition of shell gland-related cell populations from gastrula to trochophore stages. However, following the reviewer’s suggestion and considering the limited interpretability of the gastrula dataset in its current form, we agree that its inclusion does not substantially strengthen the study. We therefore plan to remove the gastrula dataset from the revised manuscript, and instead focus on the trochophore stage as a representative developmental stage for larval shell formation, enabling a clearer comparison between larval and adult shell-forming cell populations. We note that this change does not affect the main conclusions of the study. In addition, we will curate a refined set of experimentally supported marker genes, and provide an updated supplementary table summarizing detailed information, including cell type annotations, literature sources, and experimental validation methods.

      (2) Robustness of cell type classification 

      Several proposed cell types may not represent distinct entities (not individuated) but rather reflect over-clustering. Marker genes are often not specific and are shared across clusters (e.g. Sec1/Sec2), making it difficult to distinguish cell types reliably.

      In the revised manuscript, we will refine marker gene selection by prioritizing genes with higher specificity and stronger discriminatory power to improve the robustness of cell type identification. To further support cell identity assignment, we will select representative marker genes for SEC clusters and perform FISH to validate their spatial localization. These revisions will lead to a more robust and conservative interpretation of cell populations.

      (3) Comparative analysis of secretory cells

      The comparative framework is not sufficiently supported. Secretory cells are highly diverse, and without proper validation, their comparison across taxa is not meaningful. The transcription factor analysis is limited, as only a few genes are shared and many are inconsistently expressed (Figure 3E). The conclusion of a conserved regulatory program across spiralians is therefore overstated.

      We agree that secretory cell types are highly diverse across spiralians and that cross-species comparisons require careful interpretation. In the revised manuscript, we will adopt a more cautious framework, highlighting partial conservation of the regulatory program alongside functional convergence in secretory processes. We will also strengthen the comparative framework by integrating functional annotations, which may provide complementary support beyond individual gene overlaps. Importantly, we will improve the reliability of oyster SEC annotations through FISH-based spatial validation, thereby increasing confidence in cross-species comparisons. These revisions will provide a more balanced and biologically grounded interpretation of secretory cell evolution across spiralians.

      (4) Clarity and interpretation of results

      Results are at times difficult to follow and remain superficial. Marker genes are insufficiently annotated (especially for Crassostrea), and comparisons across taxa lack functional interpretation. Unvalidated and heterogeneous cell types are grouped together, and transcriptional similarities are overinterpreted. Overall, key conclusions are not adequately supported by the presented data.

      In the revised manuscript, we will re-evaluate marker gene annotations to ensure support from existing experimental evidence. For SEC populations, we will validate representative markers using FISH. We will also expand the functional annotation of marker genes and strengthen cross-species comparisons. In addition, we will substantially revise the Results and Discussion sections to improve clarity and depth, reduce overinterpretation of transcriptional similarities, and ensure that all conclusions are more tightly aligned with the strength of the supporting evidence.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) My main concern is that the authors rely primarily on previous studies for the experimental and functional characterisation of the identified cell types. The cited papers (Piovani, 2023 and de la Forest Divonne et al., 2025) deal with distinct stages or tissues (larvae and hemocytes, respectively), which limits their direct relevance. The authors also cite other papers for in situ expression data; it would be helpful to summarise somewhere (e.g. in a table) which genes have been experimentally characterised and what their expression domains are, or alternatively to provide HCR or in situ staining on the mantle. For instance, what is the rationale for the claim that proliferative cells give rise to the mantle? The trajectory inference approach used (Monocle) would likely yield a similar result regardless of the reference cell type, so additional justification is needed.

      We agree that our reliance on previous studies for functional and experimental characterization requires clearer justification and integration. In the revised manuscript, we will compile a new supplementary table summarizing marker genes with available experimental validation, including their associated cell types, literature sources, and experimental methods. For SEC populations, we will select representative marker genes and perform FISH to validate their spatial localization, thereby providing independent support for cell identity.

      Regarding trajectory inference, we agree that methods such as Monocle are sensitive to assumptions. We will clarify the rationale for root cell selection, test alternative root assignments to assess robustness, and revise our interpretation to avoid strong lineage claims. Rather than stating that proliferative cells give rise to mantle cells, we will describe the observed trajectory as being consistent with a potential developmental relationship, while acknowledging that this does not constitute direct evidence of lineage progression.

      (2) More broadly, I find that the functional properties of the identified cell types and their relationship to the expressed genes deserve more detailed discussion. For example, at L100, several genes are mentioned, but their functional roles are not discussed. Similarly, the basis for annotating the proliferative cells is not explained. How was gene orthology assessed? Throughout the manuscript, vertebrate-style gene names are used without explicitly establishing orthology status in oyster, which should be addressed.

      We thank the reviewer for this important comment. In the revised manuscript, we will expand the functional interpretation of key genes by incorporating available literature and, where possible, functional annotations. We will also clarify the basis for cell type annotation and explicitly describe the criteria used, including for proliferative cell populations (e.g. cell proliferation-associated markers).

      Regarding gene annotation, gene names in oyster were assigned based on sequence similarity searches against the eggNOG database. In the revised manuscript, we will provide a comprehensive supplementary table linking gene IDs to their annotations, along with the corresponding database sources. In addition, we will clearly describe how orthology relationships were assessed, including the methods and criteria used (e.g. sequence similarity searches and orthology databases). Throughout the revised manuscript, we will ensure that the use of vertebrate-style gene names is accompanied by appropriate annotation information and does not imply unsupported one-to-one orthology relationships.

      (3) More detail is needed on the methods and quality control for the single-cell data. The authors should clarify that the platform used (BMKMANU) is a droplet-based technology comparable in principle to Drop-seq. BMKMANU is not widely used in the field. How does it compare to 10x Genomics in terms of sensitivity and cell recovery? The authors appear to use the 10x Chromium cellranger pipeline for data analysis, which suggests compatibility, but this should be stated explicitly. Additionally, no information is provided on the number of sequencing runs or biological replicates, nor on how reproducible the results are across samples.

      In the revised manuscript, we will expand the Methods section to provide a clearer and more detailed description of the experimental and analytical procedures. BMKMANU is a droplet-based single-cell RNA-seq platform, conceptually comparable to Drop-seq and similar in principle to 10x Chromium. We will also explicitly state that the data generated are compatible with the Cell Ranger pipeline, which was used for downstream processing and analysis. Although BMKMANU is less widely used than 10x Genomics platforms, it has been successfully applied in several recent studies (e.g. Li et al., 2024: https://doi.org/10.1007/s11427-023-2548-3; Li et al., 2025: https://doi.org/10.1038/s41559-025-02642-6; Wei et al., 2024: https://doi.org/10.1038/s41467-024-46780-0), demonstrating its applicability for single-cell transcriptomic analyses across different biological systems. Regarding platform performance, based on technical information provided by the manufacturer, BMKMANU shows comparable sensitivity and cell capture efficiency to 10x Genomics platforms (http://www.biomarker.com.cn/zhizao/dg1000danxibao). In this study, the mantle sample was obtained from a single individual oyster and processed in a single sequencing run, without batch effects introduced by multiple runs. We will clearly state this in the revised manuscript. In addition, we will provide detailed quality control metrics, including the number of cells retained, gene detection rates, and filtering criteria.

      (4) A limitation of the phylostratigraphic analysis is that it is restricted to mantle tissue, making it difficult to place the results in a whole-organism context. How do the age profiles of mantle-expressed genes compare to those of more evolutionarily conserved tissues, such as the nervous system? I appreciate the methodological and experimental constraints, but this is a genuine limitation of the study. The authors could at least discuss it explicitly, and ideally consider generating a broader single-cell atlas of the oyster to provide this comparative baseline.

      We agree that restricting the phylostratigraphic analysis to mantle tissue represents a limitation when attempting to place our findings in a whole-organism evolutionary context. In the revised manuscript, we will explicitly acknowledge this limitation and expand the Discussion to address how gene age profiles in mantle tissue may differ from those in more evolutionarily conserved tissues. In particular, we will clarify that the enrichment of younger, lineage-specific genes observed in shell-forming cells may reflect tissue-specific functional specialization, and therefore should not be directly generalized to other cell types.

      We acknowledge that a broader single-cell atlas spanning multiple tissues would provide an important comparative baseline for interpreting gene age patterns across the organism. While generating such a dataset is beyond the scope of the present study, we will highlight this as an important direction for future research.

      (5) Have the authors considered the potential importance of lineage-specific gene duplication? It is well established that spiralians, including oysters, have undergone extensive lineage-specific duplication of transcription factors such as homeobox genes, and many structural shell-associated proteins may similarly have been duplicated. This could be relevant to interpreting both the phylostratigraphic results and the expansion of secretory gene families.

      We thank the reviewer for this insightful suggestion. Lineage-specific gene duplication is likely to play an important role in shaping both transcription factor repertoires and shell-associated gene families in spiralians, including oysters. In the revised manuscript, we will incorporate a discussion of lineage-specific duplication, particularly in relation to transcription factors and biomineralization-related proteins. We will also, where feasible, explore its potential contribution to our observations and highlight how such duplications may drive the expansion and diversification of secretory gene families.

    1. eLife Assessment

      This study presents a valuable perspective on platelet-mediated fibrin compaction, proposing that fibrin fibers undergo "winding" or coiling, an intriguing framework with potential implications for thrombosis and clot mechanics. However, the evidence supporting an active platelet-driven winding mechanism remains incomplete, relying largely on correlative observations without direct or quantitative validation of the proposed dynamics. Overall, the work is thought-provoking and of clear interest to the field, but stronger mechanistic evidence will be required to substantiate the central claims.

    2. Reviewer #1 (Public review):

      This paper reports a previously unrecognized mechanism by which platelets compact fibrin fibers during clot retraction. Rather than simply pulling on fibers, the authors propose that platelets generate swirling motions that wind and loop fibrin into dense structures.

      While the results are intriguing, the underlying physical mechanism remains unexplained. In particular, it is unclear how platelets generate swirling motion capable of inducing fibrin coiling, especially when suspended in a 3D fibrin mesh. This raises concerns about the conclusions. Also, does fibrin have inherent chirality or structural asymmetry that could promote coiling independently of platelet activity? Furthermore, platelet retraction typically involves platelet aggregation rather than isolated cells, and it is unclear how fibrin coiling would proceed in clustered platelets.

    3. Reviewer #2 (Public review):

      Summary:

      Grichine et al. investigate platelet-mediated fibrin compaction using human donor platelets and propose a novel mechanistic model in which platelets generate contractile forces and wind fibrin fibers into compact coiled structures. Using a combination of 2D spread assays, 3D clot imaging via expansion microscopy, live-cell imaging, and computational modelling, the authors present evidence of cage-like fibrin architectures, coiled-fibre morphologies, and platelet-centred "rosette" structures present during fibre compaction. They further suggest that actomyosin-driven cytoskeletal dynamics, potentially involving rotational or swirling motion, underlie this proposed winding mechanism, analogous to DNA looping and compaction. The study addresses an important and longstanding question in thrombosis and hemostasis and offers a conceptually novel perspective on clot compaction.

      Strengths:

      The integration of multiple imaging modalities is a notable strength of this paper. In particular, the 2D fiber-retraction assay provides a useful model for understanding the spatio-temporal dynamics of platelet-mediated fibrin compaction, which can be applied to other systems and may yield detailed mechanistic insights into biological processes. The live-imaging approaches are particularly well executed and offer valuable dynamic insight.

      Weaknesses:

      The primary weakness of this paper lies in its descriptive nature and its reliance on correlative rather than causal evidence. Several interpretations are not uniquely supported by the data presented. For example, the categorisation of fibrin accumulation in 2D assays as "fiber winding" and "fibre compaction" remains descriptive without establishing winding as a mechanism. Alternative mechanisms, such as circular bundling, stacked fibers under tension, or fibrin crosslinking-induced aggregation, are neither excluded nor investigated. Although the authors present compelling live imaging, establishing winding as a dynamic phenotype would require quantitative analyses, such as measuring angular velocities and coiling rates. The use of a second fluorophore-labelled fibrin population could further strengthen evidence for rotational dynamics. Similarly, the inference of rotational contractility or actomyosin "swirling", based on chiral actin organisation and blebbistatin treatment, is not sufficiently supported to conclude that platelets actively wind or loop fibrin fibers. The mathematical model, while complementary and well-constructed, relies on multiple assumptions and lacks predictive validation.

      Appraisal:

      While the authors successfully document intriguing fibrin architectures and provide a compelling descriptive framework, they do not fully demonstrate a mechanistic model of active fibrin winding by platelets. The conclusions regarding platelet-driven winding and rotational dynamics are not sufficiently supported by direct or quantitative evidence. To substantiate these claims, the study would benefit from experiments that directly link platelet dynamics to fibrin organisation, including coordinated measurements of platelet motion and fibre rearrangement. As it stands, the results are suggestive but do not definitively support the proposed mechanism.

      Discussion and Impact:

      Despite these limitations, the study addresses an important question in thrombosis and hemostasis and introduces a potentially impactful conceptual framework for understanding clot compaction. The imaging approaches and datasets presented will be valuable to the community, particularly for researchers interested in platelet mechanics and fibrin organisation. However, the overall impact will depend on whether the proposed mechanism can be more rigorously validated. In its current form, the study presents an interesting and thought-provoking model, but would benefit from either stronger experimental support for the proposed mechanisms or a more cautious interpretation of the findings.

    4. Reviewer #3 (Public review):

      Summary:

      This work aims to understand the mechanisms that platelets use to interact with and compact fibrin fibers during clot formation. This is an important process during wound healing, and recent work has demonstrated that platelets play a critical role in generating the force required to drive the accumulation of fibrin. The authors argue that current models are insufficient to account for the observed reduction in clot volume and propose that platelets actively 'wind up' these fibers by undergoing myosin-dependent rotation. While interesting, the experiments performed by the authors do not directly test this mechanism, and further evidence is required to support their claims.

      Weaknesses:

      (1) The motivation to switch from the system used in Figures 1 and 2 to the '2D fiber-retraction assay' is not clear. While the authors state that this system has 'reduced complexity', the differences between these assays appear to disrupt the 'cage-like' organization of fibrin around platelets shown in Figures 1 and 2 (compare images in Figure 2 with those in Figure 4). An in-depth comparison of the two methods is needed to support the conclusions from the 2D system. Furthermore, the change in plasma volume (Figure 2 vs Figure 7) should also be tested: the authors state that this increases fibrin fiber formation, but this is not quantified or demonstrated in the figures. Notably, this appears to change the morphology of the fibrin fibers shown (comparing Figure 2 and Figure 7).

      (2) It is unclear how the classification of platelets as 'fiber-winding' versus 'fiber compaction' differs in Figure 2. The criteria used for these classifications should be stated. Further, it seems premature to characterize fibers as wound without having established this earlier in the manuscript.

      (3) Is the 'gearwheel' different from the 'cage' of fibrin fibers? They appear similar, but it is difficult to distinguish between them with only qualitative descriptions of these phenotypes.

      (4) The quantification of platelet extensions in Figure 9 is confusing. While those in 9A are clear, those in 9B are not. For instance, what is the difference between #7 and #8 in the middle panel of 9B? It does not seem like #8 is labeling an extension.

      (5) It is unclear what the modeling accomplishes, as there is no comparison between the results of these simulations and their experiments.

      (6) The data presented in Figure 12 provide the most direct support for their mechanism, but fall short of directly testing their claims. These experiments should be repeated with blebbistatin to test the contribution of myosin, and should include quantitative rather than qualitative comparisons.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This paper reports a previously unrecognized mechanism by which platelets compact fibrin fibers during clot retraction. Rather than simply pulling on fibers, the authors propose that platelets generate swirling motions that wind and loop fibrin into dense structures.

      While the results are intriguing, the underlying physical mechanism remains unexplained. In particular, it is unclear how platelets generate swirling motion capable of inducing fibrin coiling, especially when suspended in a 3D fibrin mesh. This raises concerns about the conclusions.

      We explained our hypothesis concerning the physical mechanism by which platelets may generate the swirling motion in lines 200-215 and in the Discussion under "ideas and speculations". We will, however, provide a more detailed explanation of this process in the revised version.

      The reviewer is right: it is difficult to imagine how platelets in a 3D fibrin mesh can accumulate fibers at the base of their extensions to form a cage-like fiber organisation around the center of the platelets. We therefore developed the 2D fiber-retraction assay, which we believe provides important insights into the coiled fiber accumulations above spread platelets in the 2D situation and also provides a framework for interpreting similar processes that may occur within a 3D clot. In response, we will place greater emphasis on clarifying and strengthening the comparison between the potential mechanistic aspects in the 2D and 3D assays, in order to better support our proposed model.

      Also, does fibrin have inherent chirality or structural asymmetry that could promote coiling independently of platelet activity?

      Yes, double-stranded fibrin protofibrils have a helical twist [1]. Furthermore, a clot formed in the absence of platelets and other cellular components shows intrinsic tensile forces [2]. However, we show that inhibition of actomyosin activity prevents fibrin fiber accumulation in the 2D fiber-retraction assay, providing evidence that platelet activity is necessary to observe the coiled fibers above spread platelets.

      Furthermore, platelet retraction typically involves platelet aggregation rather than isolated cells, and it is unclear how fibrin coiling would proceed in clustered platelets.

      Under the in vitro fiber retraction conditions used in our study (constrained or unconstrained clots, or even the 2D assay), individual platelets are homogeneously distributed within the forming clot or on the coverslip. Therefore, there are no large platelet aggregates or clusters under our experimental conditions, and the results can only demonstrate how individual platelets act on the fibrin fibers. We will emphasize this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Grichine et al. investigate platelet-mediated fibrin compaction using human donor platelets and propose a novel mechanistic model in which platelets generate contractile forces and wind fibrin fibers into compact coiled structures. Using a combination of 2D spread assays, 3D clot imaging via expansion microscopy, live-cell imaging, and computational modelling, the authors present evidence of cage-like fibrin architectures, coiled-fibre morphologies, and platelet-centred "rosette" structures present during fibre compaction. They further suggest that actomyosin-driven cytoskeletal dynamics, potentially involving rotational or swirling motion, underlie this proposed winding mechanism, analogous to DNA looping and compaction. The study addresses an important and longstanding question in thrombosis and hemostasis and offers a conceptually novel perspective on clot compaction.

      Strengths:

      The integration of multiple imaging modalities is a notable strength of this paper. In particular, the 2D fiber-retraction assay provides a useful model for understanding the spatio-temporal dynamics of platelet-mediated fibrin compaction, which can be applied to other systems and may yield detailed mechanistic insights into biological processes. The live-imaging approaches are particularly well executed and offer valuable dynamic insight.

      Weaknesses:

      The primary weakness of this paper lies in its descriptive nature and its reliance on correlative rather than causal evidence. Several interpretations are not uniquely supported by the data presented. For example, the categorisation of fibrin accumulation in 2D assays as "fiber winding" and "fibre compaction" remains descriptive without establishing winding as a mechanism.

      In the revised version, we will avoid the terms fiber winding/compaction when introducing the 2D fiber-retraction assay (figure 3) to better align with the level of evidence, since coiled fibers cannot be distinguished in this figure. However, coiled fibers above spread platelets are clearly visible in figures 4 and 8, and dynamic fiber rotations or winding are observed in figure 12 and video 9. These observations will be presented more cautiously, as indicative rather than definitive evidence of a winding mechanism.

      Alternative mechanisms, such as circular bundling, stacked fibers under tension, or fibrin crosslinking-induced aggregation, are neither excluded nor investigated.

      Fibrin fiber bundling and staggered or crosslinked protofilaments require no platelet activity, as described previously [2, 3]. Since we observed a clear difference between +/- blebbistatin conditions in the 2D fiber-retraction assay, the fiber compaction we observe depends on platelet activity. Consequently, we consider these alternative mechanisms unlikely based on our data. This will be stated explicitly in the results section.

      Although the authors present compelling live imaging, establishing winding as a dynamic phenotype would require quantitative analyses, such as measuring angular velocities and coiling rates.

      We will incorporate quantitative measurements to complement the observations obtained from live imaging. It is important to note, however, that angular velocities and coiling rates are likely influenced by the number of fiber–fiber contacts present at the time coiling occurs. Specifically, an increased number of contacts is expected to elevate tension within the network, thereby modulating the forces generated by platelets and, consequently, affecting both velocity and coiling dynamics.

      The use of a second fluorophore-labelled fibrin population could further strengthen evidence for rotational dynamics.

      These live videos are quite difficult to acquire for the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time needed to adjust parameters for image acquisition necessitate an arbitrary choice of the acquisition window, and only one acquisition (90 min) per sample preparation is possible.

      Furthermore, laser-induced illumination can perturb the observed processes. We therefore use high-spatial-resolution 3D confocal time-lapse imaging, performed in photon-counting mode with very low laser excitation.

      For these reasons, the use of additional markers would be technically challenging and could perturb the delicate equilibrium and dynamics of the process under investigation.

      Similarly, the inference of rotational contractility or actomyosin "swirling", based on chiral actin organisation and blebbistatin treatment, is not sufficiently supported to conclude that platelets actively wind or loop fibrin fibers.

      Importantly, in the 2D fiber-retraction assay, we do not propose that rotational actomyosin activity produces a platelet contractility that would allow fiber retraction. Rather, we suggest that cytoskeletal actomyosin swirling (as demonstrated for nucleated cells by Bershadsky's team) can induce rotational dragging of extracellularly bound fibrin fibers around the pseudonucleus of spread platelets, thereby promoting accumulation of fibrin fibers. Consistent with this interpretation, inhibition of myosin by blebbistatin prevents the accumulation of fibrin fibers above spread platelets in the 2D fiber-retraction assay (Fig. 3).

      The mathematical model, while complementary and well-constructed, relies on multiple assumptions and lacks predictive validation.

      We thank the reviewer for this insightful comment and acknowledge that the proposed model relies on several important assumptions. In our view, the most significant assumption is that integrin molecules undergo rotational downstream motion as a consequence of their coupling to the swirling cytoskeleton. To assess the necessity and impact of these assumptions, we will perform additional calculations and include the results in the Supplementary Information. These analyses will also provide further validation of the proposed model and underlying mechanism. At the same time, it is important to emphasize that the primary purpose of the model was to examine whether the hypothetical swirling dynamics of the cytoskeleton, together with the associated receptors, could in principle reproduce the experimentally observed fibrin organization.

      Appraisal:

      While the authors successfully document intriguing fibrin architectures and provide a compelling descriptive framework, they do not fully demonstrate a mechanistic model of active fibrin winding by platelets. The conclusions regarding platelet-driven winding and rotational dynamics are not sufficiently supported by direct or quantitative evidence. To substantiate these claims, the study would benefit from experiments that directly link platelet dynamics to fibrin organisation, including coordinated measurements of platelet motion and fibre rearrangement. As it stands, the results are suggestive but do not definitively support the proposed mechanism.

      Discussion and Impact:

      Despite these limitations, the study addresses an important question in thrombosis and hemostasis and introduces a potentially impactful conceptual framework for understanding clot compaction. The imaging approaches and datasets presented will be valuable to the community, particularly for researchers interested in platelet mechanics and fibrin organisation. However, the overall impact will depend on whether the proposed mechanism can be more rigorously validated. In its current form, the study presents an interesting and thought-provoking model, but would benefit from either stronger experimental support for the proposed mechanisms or a more cautious interpretation of the findings.

      We agree that the proposed mechanism requires further validation. In the revised manuscript, we will therefore present a more cautious and explicitly hypothesis-driven interpretation of the mechanism. We hope that the publication of our observations will be of interest to researchers in the field of thrombosis and clot mechanics who possess the specialized tools and expertise necessary to rigorously evaluate and either substantiate or refute the proposed mechanistic model.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand the mechanisms that platelets use to interact with and compact fibrin fibers during clot formation. This is an important process during wound healing, and recent work has demonstrated that platelets play a critical role in generating the force required to drive the accumulation of fibrin. The authors argue that current models are insufficient to account for the observed reduction in clot volume and propose that platelets actively 'wind up' these fibers by undergoing myosin-dependent rotation. While interesting, the experiments performed by the authors do not directly test this mechanism, and further evidence is required to support their claims.

      Weaknesses:

      (1) The motivation to switch from the system used in Figures 1 and 2 to the '2D fiber-retraction assay' is not clear. While the authors state that this system has 'reduced complexity', the differences between these assays appear to disrupt the 'cage-like' organization of fibrin around platelets shown in Figures 1 and 2 (compare images in Figure 2 with those in Figure 4). An in-depth comparison of two methods is needed to support the conclusions from the 2D system.

      We agree that the cage-like fibrin organization around platelets is disrupted in the 2D fiber-retraction assay when platelets are completely spread on the coverslip before they have encountered fibrin fibers (Fig. 4). However, some platelets form the same number of extensions as platelets in a 3D clot (Fig. 9 A, B) and are not completely spread on the glass surface. For these platelets a cage-like fibrin organisation is retained under the 2D conditions (Fig. 5 and 6). However, the fiber density at the base of the bulbs is higher in the 2D assay than under the constrained 3D clot retraction conditions (Fig. 1C and Fig. 2), probably because in the 2D condition the fibers are less constrained and readily available for compaction.

      Furthermore, the change in plasma volume (Figure 2 vs Figure 7) should also be tested - the authors state that this increases fibrin fiber formation, but this is not quantified or demonstrated in the figures. Notably, this appears to change the morphology of the fibrin fibers shown (comparing Figure 2 and Figure 7).

      We thank the reviewer for raising this point. We would like to clarify that Figure 2 and Figure 7 correspond to two distinct experimental setups: the constrained clot retraction assay (Figure 2) and the 2D fiber-retraction assay (Figure 7). As such, they are not directly comparable. We understand, however, that the reviewer is likely referring to the apparent differences between Figures 3–6 (lower plasma volume, higher fiber density) and Figures 7–8 (higher plasma volume, lower apparent fiber density).

      The reduced number of visible fibers in the latter condition is not solely a consequence of plasma volume per se, but rather results from the formation of a labile fibrin gel at higher plasma concentrations, which is lost during the fixation and aspiration steps. This effect was initially observed across samples from two donors with differing plasma fibrinogen levels. In one case, an unusually low fibrinogen concentration allowed the addition of higher plasma volumes without inducing gel formation. In contrast, in the other sample, a more typical fibrinogen level resulted in gel formation under the same conditions.

      Importantly, we performed all experiments using matched donor plasma and platelets. As a result, the precise fibrinogen concentration could not be determined prior to experimentation. Nonetheless, post hoc measurements confirmed that fibrinogen levels in most donor samples fell within the normal physiological range, which allowed us to always use the same plasma volumes for low and high plasma concentrations (4 µl/ml PBS and 7 µl/ml PBS, respectively), except for one donor, as mentioned above.

      (2) It is unclear how the classification of platelets as 'fiber-winding' versus 'fiber compaction' differs in Figure 2. The criteria used for these classifications should be stated. Further, it seems premature to characterize fibers as wound without having established this earlier in the manuscript.

      The reviewer probably refers to Figure 3 and is right: it is premature to mention fiber winding at this stage of the Results section (see our response to Reviewer #2). In the revised version, we will therefore present the criteria used to classify the different degrees of fiber accumulation without referring to fiber winding.

      (3) Is the 'gearwheel' different from the 'cage' of fibrin fibers? They appear similar, but it is difficult to distinguish between them with only qualitative descriptions of these phenotypes.

      The "gearwheel" is observed for completely spread platelets in the 2D fiber-retraction assay. A figure illustrating our hypothesis and comparing the 2D gearwheel with the 3D clot situation is presented in the Discussion under the "Ideas and Speculations" paragraph (Fig. 13). We will give a more comprehensive explanation in the revised version.

      (4) The quantification of platelet extensions in Figure 9 is confusing. While those in 9A are clear, those in 9B are not. For instance, what is the difference between #7 and #8 in the middle panel of 9B? It does not seem like #8 is labeling an extension.

      For the platelet shown in the middle panel of Figure 9B, the extensions cannot be clearly distinguished in the MIP (Maximum Intensity Projection) image because extension #8 is positioned above extension #7 and is therefore superimposed in the projection. However, the two extensions can be differentiated when examining the 3D image stack (Video 4). As indicated in the figure legend, the number of extensions was determined manually by scrolling through the z-stack image sequence. In the revised version, we will also define the abbreviation “MIP” as Maximum Intensity Projection.

      (5) It is unclear what the modeling accomplishes, as there is no comparison between the results of these simulations and their experiments.

      We thank the reviewer for this valuable concern. We chose not to combine the experimental fibrin organization and the modeling results within the same figure panel, as the resulting image would be too complex and difficult to interpret. However, we will provide a more detailed comparison between the experimental observations and the modeling results in the Results section. It is also important to emphasize that the comparison between the model and the experimental data was intended to be primarily qualitative rather than quantitative.

      (6) The data presented in Figure 12 provides the most direct support for their mechanism, but falls short of directly testing their claims. These experiments should be repeated to include blebbistatin to test the contribution of myosin and include quantitative rather than qualitative comparisons of these experiments.

      As mentioned above, these live videos are difficult to acquire for the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time required to optimize imaging parameters necessitate the selection of an arbitrary acquisition window. Consequently, only a single acquisition of approximately 90 min can be performed per sample preparation, with no guarantee that relevant platelet-fibrin interactions will be captured within that window.

      Furthermore, after blood donation, the first sample is usually ready for acquisition around 3 pm, and each acquisition takes 90 min. At least 10 successful acquisitions per condition would be required to ensure statistical robustness, but a maximum of 4 can be acquired per donor, because platelet samples start to deteriorate within twelve hours after blood donation.

      Taken together, the intrinsic heterogeneity of the platelet population, the low likelihood of capturing informative events, and the limited availability of suitable imaging resources at our institute render a robust and quantitative comparison between conditions with and without blebbistatin extremely challenging, if not impractical, within a reasonable timeframe.
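
      The donor arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, not part of the original analysis: the function name `donors_needed` and the `success_rate` parameter are hypothetical, and the only inputs taken from the text are the 10 successful acquisitions per condition, the two conditions (with and without blebbistatin), and the maximum of 4 acquisitions per donor.

```python
import math


def donors_needed(successes_per_condition: int,
                  conditions: int,
                  max_acquisitions_per_donor: int,
                  success_rate: float = 1.0) -> int:
    """Return the minimum number of donors required.

    success_rate is a hypothetical fraction of acquisitions that capture
    a usable platelet-fibrin interaction; 1.0 is the best case.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Total successful acquisitions required across all conditions.
    total_successes = successes_per_condition * conditions
    # Acquisition attempts needed to reach that many successes on average.
    attempts = math.ceil(total_successes / success_rate)
    # Each donor supports only a limited number of acquisitions.
    return math.ceil(attempts / max_acquisitions_per_donor)


# Best case: every acquisition is informative.
print(donors_needed(10, 2, 4))
# Purely illustrative case: one acquisition in four is informative.
print(donors_needed(10, 2, 4, success_rate=0.25))
```

      Even in the best case, in which every acquisition captures a usable event, the comparison would require five donors. Any lower success rate (the 25% used above is purely illustrative) scales that number up proportionally.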

    1. Author response:

      eLife Assessment

      This valuable study reports that the ALDH-abundant cells display stem cell properties and may play a key role in the endometrial epithelial development in the mouse. The data supporting the main conclusion are solid, although further improvements are needed to strengthen the conclusions. This work will be of great interest to reproductive biologists and biomedical researchers working on women's reproductive health.

      We thank the reviewers and editor for their critical reading and assessment of our manuscript. We carefully considered each of the points raised by the reviewers. In this document and in the edited manuscript and figures, we have carefully addressed each of the comments and requested modifications. In light of these changes, we expect that you will find that the manuscript has improved.

      We indicate our responses to the reviewers below in blue font and highlight the changes in the manuscript using the line numbers corresponding to the tracked version of the revised document.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Tang et al. characterizes the expression dynamics and functional roles of aldehyde dehydrogenase 1 activity in uterine physiology. Using a combination of in vivo lineage tracing and cell ablation coupled with organoid culture, the authors propose that Aldh1a1 lineage-marked cells contribute to uterine gland development and epithelial regeneration. The descriptive data will be of interest to reproductive biologists and clinicians and will build on established hypotheses in the field. The manuscript is well written and scientifically sound; however, several experimental limitations and interpretation caveats should be addressed.

      We thank the reviewer for their comments and expert assessment of our paper.

      (1) The methods surrounding the passage number and duration of culture following sorting prior to transcriptomic profiling should be clarified in the figure legends. Related to this, the representative images in Figures 1D and 1E do not appear consistent with the quantification presented in Figures 1F-H and should be reconciled.

      Thanks for this comment. We have now clarified this in the Figure 1 legend as follows:

      Lines 1026-1029: “Organoid formation assay performed immediately after luminal epithelial cell isolation and by plating equal numbers of viable ALDH<sup>LO</sup> (D) and ALDH<sup>HI</sup> (E) epithelial cells. ALDH<sup>LO</sup> and ALDH<sup>HI</sup> organoids were cultured for two weeks and passaged once prior to the organoid formation assays and transcriptomic analyses.”

      Regarding the second comment, we recognize that the images we showed may not have been the most representative of our quantification. As such, we replaced them with the organoid images below so that they better reflect the quantification outlined in Figure 1F-H.

      (2) The conclusion that ALDH1A1+ cells are enriched in populations with stem cell characteristics relies primarily on transcriptomic analysis. Protein-level co-localization should be performed to strengthen this claim.

      We thank the reviewer for this comment. Unfortunately, the antibodies for many of these stem cell markers (such as LGR5, AXIN2, and SUSD2) are not well-suited for immunostaining. Others that have been proposed in human and are amenable to immunostaining are not suitable markers for mouse endometrial stem cells (such as CDH2). We hope that by showing that ALDH1A1 is expressed in patterns that are similar to the previously published stem cell markers LGR5 and AXIN2 (i.e., throughout the epithelium in the developing uterus and subsequently enriched in the tips of the endometrial glands of adult mice), along with transcriptomic studies, we can demonstrate its utility as a marker for mouse endometrial stem cells.

      (3) The overlap of 19 genes between the data set here and AXIN2 HI data is presented as evidence of shared stemness identity, but no statistical assessment of this overlap is provided. A hypergeometric test should be performed to determine whether this overlap is greater than expected by chance.

      Thank you for this suggestion. We have performed a hypergeometric test and determined that the number of genes shared between the two datasets is greater than expected by chance. We have updated the Results section to state the following:

      Lines 133-141: "We determined that the overlap between ALDH<sup>HI</sup> and Axin2<sup>+</sup> stemness marker genes was significantly greater than expected by chance for both upregulated (21/346 genes, 1.81-fold enrichment, p = 0.0067) and downregulated (19/674 genes, 1.67-fold enrichment, p = 0.021) gene sets (hypergeometric test, universe = 23,182 genes)."
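
      For readers unfamiliar with the test, the overlap calculation can be sketched as below. This is a minimal illustration under stated assumptions rather than the analysis actually run: the function names are hypothetical, the tail probability is computed directly with the standard-library `math.comb`, and the toy numbers in the usage lines are not taken from the manuscript (the size of the Axin2<sup>+</sup> gene set is not given in this response, so the reported p-values are not reproduced here).

```python
from math import comb


def hypergeom_overlap_pvalue(universe: int, set_a: int,
                             set_b: int, overlap: int) -> float:
    """One-sided P(X >= overlap) for the overlap of two gene sets.

    Model: set_b genes are drawn without replacement from a universe
    that contains set_a "marked" genes; X counts the marked genes drawn.
    """
    largest_possible = min(set_a, set_b)
    total_draws = comb(universe, set_b)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(set_a, i) * comb(universe - set_a, set_b - i)
        for i in range(overlap, largest_possible + 1)
    )
    return tail / total_draws


def fold_enrichment(universe: int, set_a: int,
                    set_b: int, overlap: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    expected = set_a * set_b / universe
    return overlap / expected


# Toy example (hypothetical numbers): 10 genes in total, 4 in set A,
# 3 in set B, 2 shared between them.
p_value = hypergeom_overlap_pvalue(universe=10, set_a=4, set_b=3, overlap=2)
enrichment = fold_enrichment(universe=10, set_a=4, set_b=3, overlap=2)
print(f"p = {p_value:.4f}, fold enrichment = {enrichment:.2f}")
```

      To apply this to the reported comparison, the universe would be 23,182 genes, one set would be the 346 upregulated (or 674 downregulated) genes, and the other set would be the corresponding Axin2<sup>+</sup> gene list.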

      (4) The impact of tamoxifen injection on Aldh1a1 expression should be characterized in the neonatal uterus, as tamoxifen itself has known estrogenic activity that could confound interpretation of the lineage tracing results at early postnatal timepoints.

      Although we took measures to control for this possibility by using multiple time-points and models to trace the impact of Aldh1a1<sup>+</sup> cells in development and adulthood, we recognize the importance of this comment and acknowledge that this is a limitation in the design of our study. We have included the following text to the Discussion acknowledging this point:

      Lines 434-442: “Given the well-documented impacts of tamoxifen for lineage tracing studies, it is imperative to use doses of tamoxifen that will minimize estrogenic impacts that could result in off-target effects (Rios et al., 2016). This often requires administration at doses that will achieve maximal recombination of the desired gene, while ensuring that the potential deleterious impacts of tamoxifen are minimized (Chen et al., 2023; Pimeisl et al., 2013). The cre/ERT2 tamoxifen inducible model is widely used to study uterine biology where it serves as a useful tool to interrogate the spatiotemporal impact of key genes, either through inactivation or for lineage tracing. Despite its widely documented utility across many tissue types and developmental timepoints, the use of tamoxifen and its impacts on the endometrium remain a limitation of our study, which we tried to address by implementing multiple timepoints, doses, and orthogonal assays in our experimental design.”

      (4b) Related to this, while low-dose tamoxifen is shown to label individual cells within 24 hours of injection, the translation dynamics of the label following Cre-mediated recombination can require up to 72 hours. The presence of only a few labeled clones at PND8 but multiple separate clones per cross-section at later timepoints warrants discussion and may reflect labeling kinetics rather than clonal expansion.

      The reviewer raises an important point. We agree that the translation kinetics of the label following Cre-mediated recombination, which can take up to 72 hours, are a legitimate consideration for interpreting our data.

      We have addressed this by adding the following text to the Discussion:

      Lines 418-423: We hypothesized that the singly labeled cells observed from one day tracing experiments expanded in a clonal fashion during the various timepoints we measured. We note that the translation kinetics of the labeled cells following cre-mediated recombination may contribute to the limited labeling observed at PND8/PND15 and there is a potential for delayed labeling of cells between 24 and 72 hours of tamoxifen administration. However, the continuous increase in labeled cells at the subsequent timepoints favors our interpretation of clonal expansion as the primary explanation.

      (5) It would strengthen the in vivo ablation data to validate the degree of cell death following diphtheria toxin treatment directly. It is possible that a general decrease in cell number rather than specific loss of a stem cell population is responsible for the observed reduction in gland number and FOXA2 expression (Tongtong et al 2017).

      We agree that this is an important control to incorporate into our experimental design. To rule out this possibility, we performed immunohistochemistry of cleaved caspase 3 in the uterine tissues of DTR<sup>flox/flox</sup> and DTR<sup>flox/flox</sup>;Aldh1a1<sup>cre/ERT2</sup> mice 4 days after administration of diphtheria toxin. The results indicate similar levels of cleaved caspase 3 detection in both genotypes, suggesting that the decrease in FOXA2+ cells is not due to non-specific cell death, but rather is the result of the loss of ALDH1A1<sup>+</sup> cells. These data and the following text have been added to the manuscript:

      Lines 321-325: “We determined that the decrease in FOXA2<sup>+</sup> cells in the experimental mice was not the result of non-specific DT-mediated cell death, as similar levels of cleaved caspase 3-positive cells were detected in the DT-treated control ROSA26<sup>DTR/DTR</sup> and ROSA26<sup>DTR/DTR</sup>;Aldh1a1<sup>cre/ERT2/+</sup> mice 4 days post-diphtheria toxin administration (Figure S3G-H’).”

      (6) The lineage tracing data in the postpartum endometrium demonstrate that Aldh1a1-marked cells are present during regeneration, but it remains unclear whether these cells are preferentially activated or expanded in response to tissue injury. Coupling these studies with diphtheria toxin-mediated ablation during active regeneration would more directly test the proposed regenerative role of this population.

      This is a great point and one that we would be very interested in pursuing in follow-up studies. Regretfully, due to the long generation time and experimental procedures associated with these proposed studies, we are not able to include these experiments in the current manuscript. Thus, we have changed our wording and conclusions throughout the manuscript to be less definitive in terms of the role of Aldh1a1 in regeneration, since this will be the focus of future studies.

      The contribution of stromal Aldh1a1 lineage-positive cells is underexplored in the discussion, given the lineage tracing data showing stromal labeling across multiple timepoints and its potential relevance to mesenchymal-to-epithelial transition.

      Thank you for the suggestion. We have now expanded this section in the Discussion to include the following:

      Lines 497-505: We also found ALDH1A1<sup>+</sup> stromal cells were more prevalent when tracing began in adult mice. Other studies have shown that mesenchymal cells contribute to endometrial regeneration in the postpartum phase or after induced menses through a process of MET (Cousins et al., 2014; Kirkwood et al., 2022; Li et al., 2025). Similarly, lineage tracing studies have shown that MET is an active process and contributes to epithelial cell regeneration in the post-partum phase (Huang et al., 2012; Patterson et al., 2013). Although this is an area of active investigation in the field, with some contradicting reports, it is plausible to hypothesize that endometrial tissue has the capacity to undergo wound-healing and regeneration via several mechanisms (Ang et al., 2023; Ghosh et al., 2020). The process of MET in wound healing is widely documented in other organs, such as the kidney, liver and lung, where MET is associated with depletion of the resident epithelial cell pool (Bi et al., 2012; Niayesh-Mehr et al., 2024; Zeisberg et al., 2005).

      Finally, the word 'control' may overstate the functional evidence presented. 'Contribute' may be more accurate given the partial and context-dependent nature of the phenotypes observed.

      We agree with the reviewer’s point that 'control' may overstate the evidence that we provide in the manuscript. To reflect this, we have edited the manuscript title and text.

      Reviewer #2 (Public review):

      Tang et al. investigated the contribution of Aldh1a1+ cells, as putative stem/progenitor cells, to endometrial development, maintenance during the estrous cycle, and postpartum repair in mouse models. They employed in vitro organoid formation and in vivo lineage tracing models coupled with RNA-seq to test the stem-ness of Aldh1a1+ cells. They found that mouse endometrial cells with high ALDH activity (using the ALDEFLUOR assay) formed more and larger organoids and were enriched for stem/progenitor cell gene signatures. Similar results were shown using endometrial cells from a human patient sample. Epithelial ALDH1A1 expression was shown to be hormonally regulated, becoming more restricted to the glands, a putative epithelial stem cell niche, under estrogen stimulation. Using lineage-tracing initiated postnatally/prepubertally, Aldh1a1+ epithelial cells were shown to expand, contributing to both the luminal and glandular epithelium into adulthood, whereas adult initiation of labeling showed expansion of stromal Aldh1a1+ cells but not epithelial. Postnatal ablation of single-labeled Aldh1a1+ epithelial cells resulted in impaired gland development. Lastly, Aldh1a1-lineage traced cells (adult labeled) were present during postpartum endometrial repair as were epithelial/mesenchymal transitional cells.

      This study addresses an important area of research in the field of endometrial stem/progenitor cell biology. The authors are commended for their use of multiple complementary methods, including lineage tracing, DTR-mediated cell ablation, organoid assays, and RNA-seq in mouse and human models to assess the stem-like nature of Aldh1a1+ cells. The data support the stem/progenitor phenotype of Aldh1a1+ epithelial cells during endometrial development; however, there are noted discrepancies between organoid formation assays and lineage tracing experiments regarding the stemness of Aldh1a1+ epithelial cells in adults. Specifically, organoids were generated from adult cells and demonstrated in vitro stem cell activity; however, in vivo lineage-tracing of adult cells either during the estrous cycle or postpartum repair does not show expansion of Aldh1a1+ cells, suggesting they do not have stem/progenitor activity. Additionally, the stem-ness of epithelial vs stromal Aldh1a1+ cells is confounded in the study because epithelial cells were not purified for organoid experiments, epithelial cells were not exclusively lineage-traced as stromal cells were also labeled, and mesenchymal-epithelial transition was suggested to occur during postpartum repair. The following specific comments are presented to detail these concerns:

      We thank the reviewer for their critical reading of our manuscript and constructive comments.

      (1) The statement in the brief summary, "...critical for lifelong endometrial regeneration," is not supported by the data provided.

      We have edited the brief summary to exclude this statement; it now reads as follows:

      Lines 4-5: “We uncover ALDH1A1<sup>+</sup> cells as a group of hormone sensitive stem cells contributing to endometrial development and regeneration.”

      (2) ALDH1A1 is not restricted to the endometrial epithelium, and epithelial cells were not purified by flow cytometry for experiments in Figure 1. Figure 2 clearly shows the presence of mesenchymal cells, even using the described method for enriching for epithelial cells. Therefore, contaminating mesenchymal cells with high ALDH activity may confound the experimental results in Figure 1, either through promoting epithelial cell growth or through MET. The authors should provide clear evidence of epithelial purity in organoid experiments or that mesenchymal cells are not contained in the ALDHhi population. These comments also apply to the human organoid experiments in Figure 7.

      We thank the reviewer for raising this important point. Our group has been using the enzymatic method to routinely separate epithelial from stromal cell populations from the mouse uterus (see references dating back to 2015, PMID 26721398, 28324064, 34099644). In these experiments we typically obtain >98% purity in the epithelial and stromal cell compartments, respectively. We can directly observe this purity in the immunofluorescence images shown below, where mouse endometrial epithelial cells and stromal cells were enzymatically separated and immunostained with E-cadherin and vimentin antibodies to detect epithelial and mesenchymal cells in both cell preparations. The images show very few contaminating epithelial and stromal cells in either cell preparation. We have observed similar results when preparing epithelial and stromal cell preparation from the human endometrium, where the epithelial cell organoids display high purity with ~100% epithelial cell expression when we perform immunostaining.

      Author response image 1.

      Purity of mouse endometrial epithelial cells obtained via enzymatic and mechanical dissociation. A-B) Shows the epithelial (A) and stromal (B) cells plated on glass coverslips and immunostained with an epithelial cell marker (cytokeratin 8, red), a stromal cell marker (vimentin, green), and DAPI.

      Author response image 2.

      Human endometrial epithelial organoids were fixed and immunostained with cytokeratin 8 (green) and DAPI. The images are typical for our epithelial cell cultures and demonstrate that all epithelial cells are CK8-positive.

      (3) Lines 186-187: Susd2 was increased in EpSC clusters, yet this is a mesenchymal stem/progenitor marker in humans. The authors should discuss the implications of this.

      We thank the reviewer for highlighting this. We have now included the following in our Discussion to address this point:

      Lines 528-533: Clustering with this population of EpSCs were Susd2<sup>+</sup> cells, which are well-characterized mesenchymal progenitors that are enriched in the perivascular regions of the human endometrium (Darzi et al., 2016; Khanmohammadi et al., 2021). The presence of Susd2<sup>+</sup> cells, while unexpected in an epithelial stem cell niche, could indicate the presence of a transitional mesenchymal or perivascular cell that is differentiating into epithelium. Evidence for both mesenchymal and Nestin2<sup>+</sup> pericytes have been recently described in the mouse endometrial epithelium (Kirkwood et al., 2022; Li et al., 2025).

      (4) In Figure 5, RFP+ epithelial cells should be quantified as in previous figures to substantiate the statement in lines 279-280, "At PPD5, the proportion of RFP+ epithelial cells had expanded relative to PPD1 and PPD3 (Figure 5E-E')." Especially because in the low mag images (C-E), RFP+ epithelial cells appear to be most abundant at PPD1 and decrease at PPD3 and PPD5, suggesting that they may not be involved in endometrial regeneration/repair (contradicting the interpretation in line 285). Further, if there is in fact a decrease over postpartum repair, then regeneration should be removed from the title of the manuscript. RFP+ stromal cells should also be quantified.

      We appreciate this reviewer’s comment and agree that, as stated, the conclusion is not fully supported by the data. To address this comment, we have edited the Results section to state the findings clearly and remove any ambiguity:

      As requested, we quantified the number of RFP+ stromal and epithelial cells during the postpartum phase and noted that RFP+ cells were prominent in the stromal compartment of the endometrium. While RFP+ epithelial cells were also observed at these timepoints, they were less abundant than RFP+ stromal cells. Because the number of RFP+ cells did not change significantly over the postpartum phases in either the stromal or the epithelial compartment, we have modified our conclusion to state that ALDH1A1+ cells are transiently detected in the regenerating endometrium.

      Results:

      Lines 286-295: “By analyzing the uterine tissues near the placental detachment site, we observed that RFP positive cells were prominent in the endometrial stromal cells that were adjacent to the luminal epithelium (Figure 5C-C’, green arrows). RFP<sup>+</sup> cells were also observed in the stromal cells near the placental detachment sites at PPD1 and PPD3 (Figure 5D’-E’, red & blue arrows) and in limited luminal epithelial cells (Figure 5D”,E”). Quantification of RFP<sup>+</sup> cells throughout these postpartum phases indicated that ALDH1A1<sup>+</sup> stromal cells (360 ± 103, PPD1, n=3; 217 ± 107, PPD3, n=3; 254 ± 32, PPD5, n=4) were more frequent than ALDH1A1<sup>+</sup> epithelial cells in the regenerating endometrium (65 ± 65, PPD1, n=3; 20 ± 10, PPD3, n=3; 114.25 ± 39, PPD5, n=4) (Figure S4).”

      Discussion:

      Lines 513-521: “We also noted that a majority of ALDH1A1<sup>+</sup> cells were localized to the active areas of endometrial regeneration near the placental detachment sites at PPD1 with a pronounced expression in the sub-epithelial stromal cells. As regeneration progressed, we continued to observe ALDH1A1<sup>+</sup> cells in the stromal compartment within the placental detachment sites at PPD3 and PPD5, with a progressive, but not statistically significant, increase in ALDH1A1<sup>+</sup> epithelial cells. Collectively, our data demonstrate that ALDH1A1<sup>+</sup> lineage cells participate in the restoration of endometrial architecture and functional compartments in the postpartum phase, even if their direct contribution is transient. Future detailed and mechanistic studies will be necessary to fully characterize their role in this process and their long-term consequence in postpartum regeneration.”

      (5) For Figure 7F, it should be clearly stated in the main text that the results are from one patient sample and the data presented are experimental replicates, so as not to be confused with biological replicates (the same for Supplementary Figure S4). Were B and G in Figure 7 also from one patient?

      Thanks for pointing this out. We have edited the figure legends in the main text and supplemental figures to indicate this.

      Lines 337-338: “…main figures show representative results from one patient sample performed in technical replicates, with additional patient samples included in the supplement…”

      (6) Lines 425-427: "Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed a complete restriction of ALDH1A1 to the glandular crypts." In Figure 2 S' ALDH1A1+ cells are visible in the LE (the staining is lighter than in the GE but looks real), contradicting this statement.

      This is an important distinction. We have now edited this part of the manuscript to state:

      Lines 459-462: “Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed enriched ALDH1A1 in the glandular crypts with weak luminal epithelial staining, while the ovariectomized controls had strong ALDH1A1 expression throughout the luminal and glandular epithelium.”

      (7) Lines 466-467: "In cycling mice, we found sporadic cells that expressed both stromal and epithelial markers in the ALDHA1+ cells." These data are not presented.

      We apologize for the confusion; this sentence has been removed from the discussion.

      (8) These data support the role of Aldh1a1+ cells in endometrial epithelial development, but conclusions about their role in repair/regeneration should be tempered as the data are much weaker here.

      We thank the reviewer for their overall assessment. To address this point, we have thoroughly edited the appropriate areas to temper the conclusions and ensure that they are strongly supported by our data. We have also edited the manuscript’s title to reflect this.

      Reviewer #3 (Public review):

      Summary:

      Tan et al demonstrated the importance of ALDH-high cells in the epithelial development in the mouse endometrium, and these cells displayed properties of stem cells.

      We thank the reviewer for their assessment of our manuscript.

      Strengths:

      The findings are solid, supported and validated through a combination of technical methods. I appreciated this combined use of mouse and human endometrial cells to strengthen the findings. Genomic results from a single-cell sequencing dataset were informative as they depicted the different stages of the estrus cycle during the regeneration process. Verification with immunostainings with various markers made it convincing for readers to visualize the cell's location, progression, and status at different timepoints. Utilizing human endometrial cells further demonstrated that the phenomenon observed in mice can be translated to humans.

      This work will greatly advance the understanding of endometrial regeneration for reproductive biologists.

      We thank the reviewer for their expert assessment and positive comments regarding our manuscript.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      Reference

      Ang, C.J., Skokan, T.D., and McKinley, K.L. (2023). Mechanisms of Regeneration and Fibrosis in the Endometrium. Annu Rev Cell Dev Biol 39, 197-221.

      Bi, W.R., Jin, C.X., Xu, G.T., and Yang, C.Q. (2012). Bone morphogenetic protein-7 regulates Snail signaling in carbon tetrachloride-induced fibrosis in the rat liver. Exp Ther Med 4, 1022-1026.

      Chen, M.Y., Zhao, F.L., Chu, W.L., Bai, M.R., and Zhang, D.M. (2023). A review of tamoxifen administration regimen optimization for Cre/loxp system in mouse bone study. Biomed Pharmacother 165, 115045.

      Cousins, F.L., Murray, A., Esnal, A., Gibson, D.A., Critchley, H.O., and Saunders, P.T. (2014). Evidence from a mouse model that epithelial cell migration and mesenchymal-epithelial transition contribute to rapid restoration of uterine tissue integrity during menstruation. PLoS One 9, e86378.

      Cousins, F.L., Pandoy, R., Jin, S., and Gargett, C.E. (2021). The Elusive Endometrial Epithelial Stem/Progenitor Cells. Front Cell Dev Biol 9, 640319.

      Darzi, S., Werkmeister, J.A., Deane, J.A., and Gargett, C.E. (2016). Identification and Characterization of Human Endometrial Mesenchymal Stem/Stromal Cells and Their Potential for Cellular Therapy. Stem Cells Transl Med 5, 1127-1132.

      Ghosh, A., Syed, S.M., Kumar, M., Carpenter, T.J., Teixeira, J.M., Houairia, N., Negi, S., and Tanwar, P.S. (2020). In Vivo Cell Fate Tracing Provides No Evidence for Mesenchymal to Epithelial Transition in Adult Fallopian Tube and Uterus. Cell Rep 31, 107631.

      Huang, C.C., Orvis, G.D., Wang, Y., and Behringer, R.R. (2012). Stromal-to-epithelial transition during postpartum endometrial regeneration. PLoS One 7, e44285.

      Khanmohammadi, M., Mukherjee, S., Darzi, S., Paul, K., Werkmeister, J.A., Cousins, F.L., and Gargett, C.E. (2021). Identification and characterisation of maternal perivascular SUSD2(+) placental mesenchymal stem/stromal cells. Cell Tissue Res 385, 803-815.

      Kirkwood, P.M., Gibson, D.A., Shaw, I., Dobie, R., Kelepouri, O., Henderson, N.C., and Saunders, P.T.K. (2022). Single-cell RNA sequencing and lineage tracing confirm mesenchyme to epithelial transformation (MET) contributes to repair of the endometrium at menstruation. Elife 11.

      Li, S.Y., Whiteside, S., Li, B., Sun, X., and DeFalco, T. (2025). Mesenchymal-to-epithelial transition of perivascular cells contributes to endometrial re-epithelialization. Nat Commun 16, 10174.

      Niayesh-Mehr, R., Kalantar, M., Bontempi, G., Montaldo, C., Ebrahimi, S., Allameh, A., Babaei, G., Seif, F., and Strippoli, R. (2024). The role of epithelial-mesenchymal transition in pulmonary fibrosis: lessons from idiopathic pulmonary fibrosis and COVID-19. Cell Commun Signal 22, 542.

      Patterson, A.L., Zhang, L., Arango, N.A., Teixeira, J., and Pru, J.K. (2013). Mesenchymal-to-epithelial transition contributes to endometrial regeneration following natural and artificial decidualization. Stem Cells Dev 22, 964-974.

      Pimeisl, I.M., Tanriver, Y., Daza, R.A., Vauti, F., Hevner, R.F., Arnold, H.H., and Arnold, S.J. (2013). Generation and characterization of a tamoxifen-inducible Eomes(CreER) mouse line. Genesis 51, 725-733.

      Rios, A.C., Fu, N.Y., Cursons, J., Lindeman, G.J., and Visvader, J.E. (2016). The complexities and caveats of lineage tracing in the mammary gland. Breast Cancer Res 18, 116.

      Seishima, R., Leung, C., Yada, S., Murad, K.B.A., Tan, L.T., Hajamohideen, A., Tan, S.H., Itoh, H., Murakami, K., Ishida, Y., et al. (2019). Neonatal Wnt-dependent Lgr5 positive stem cells are essential for uterine gland development. Nat Commun 10, 5378.

      Zeisberg, M., Shah, A.A., and Kalluri, R. (2005). Bone morphogenic protein-7 induces mesenchymal to epithelial transition in adult renal fibroblasts and facilitates regeneration of injured kidney. J Biol Chem 280, 8094-8100.

    2. eLife Assessment

      This valuable study reports that the ALDH-abundant cells display stem cell properties and may play a key role in the endometrial epithelial development in the mouse. The data supporting the main conclusion are solid, although further improvements are needed to strengthen the conclusions. This work will be of great interest to reproductive biologists and biomedical researchers working on women's reproductive health.

    3. Reviewer #1 (Public review):

      The manuscript by Tang et al. characterizes the expression dynamics and functional roles of aldehyde dehydrogenase 1 activity in uterine physiology. Using a combination of in vivo lineage tracing and cell ablation coupled with organoid culture, the authors propose that Aldh1a1 lineage-marked cells contribute to uterine gland development and epithelial regeneration. The descriptive data will be of interest to reproductive biologists and clinicians and will build on established hypotheses in the field. The manuscript is well written and scientifically sound; however, several experimental limitations and interpretation caveats should be addressed.

      The methods surrounding the passage number and duration of culture following sorting prior to transcriptomic profiling should be clarified in the figure legends. Related to this, the representative images in Figures 1D and 1E do not appear consistent with the quantification presented in Figures 1F-H and should be reconciled.

      The conclusion that ALDH1A1+ cells are enriched in populations with stem cell characteristics relies primarily on transcriptomic analysis. Protein-level co-localization should be performed to strengthen this claim.

      The overlap of 19 genes between the data set here and AXIN2 HI data is presented as evidence of shared stemness identity, but no statistical assessment of this overlap is provided. A hypergeometric test should be performed to determine whether this overlap is greater than expected by chance.
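      The overlap test requested above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `hypergeom_overlap_pvalue` is hypothetical, and the gene-universe size and the two list sizes are placeholder assumptions, since only the overlap of 19 genes is stated here. The p-value is the upper tail of the hypergeometric distribution, i.e. the probability of seeing at least the observed overlap if the second list were drawn at random from the universe.

```python
from math import comb


def hypergeom_overlap_pvalue(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """Return P(X >= overlap) for a hypergeometric overlap test.

    universe: number of genes that could appear in either list
    set_a:    size of the first gene list
    set_b:    size of the second gene list
    overlap:  number of genes observed in both lists
    """
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Placeholder sizes (assumptions): 20,000 expressed genes, lists of 200 and 300 genes.
# Only the overlap of 19 comes from the review text.
p_value = hypergeom_overlap_pvalue(universe=20000, set_a=200, set_b=300, overlap=19)
print(f"P(overlap >= 19) = {p_value:.3g}")
```

      The choice of universe matters: restricting it to genes detected in both experiments, rather than all annotated genes, typically gives a larger and more conservative p-value.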

      The impact of tamoxifen injection on Aldh1a1 expression should be characterized in the neonatal uterus, as tamoxifen itself has known estrogenic activity that could confound interpretation of the lineage tracing results at early postnatal timepoints. Related to this, while low-dose tamoxifen is shown to label individual cells within 24 hours of injection, the translation dynamics of the label following Cre-mediated recombination can require up to 72 hours. The presence of only a few labeled clones at PND8 but multiple separate clones per cross-section at later timepoints warrants discussion and may reflect labeling kinetics rather than clonal expansion.

      It would strengthen the in vivo ablation data to validate the degree of cell death following diphtheria toxin treatment directly. It is possible that a general decrease in cell number rather than specific loss of a stem cell population is responsible for the observed reduction in gland number and FOXA2 expression (Tongtong et al 2017).

      The lineage tracing data in the postpartum endometrium demonstrate that Aldh1a1-marked cells are present during regeneration, but it remains unclear whether these cells are preferentially activated or expanded in response to tissue injury. Coupling these studies with diphtheria toxin-mediated ablation during active regeneration would more directly test the proposed regenerative role of this population.

      The contribution of stromal Aldh1a1 lineage-positive cells is underexplored in the discussion, given the lineage tracing data showing stromal labeling across multiple timepoints and its potential relevance to mesenchymal-to-epithelial transition.

      Finally, the word 'control' may overstate the functional evidence presented. 'Contribute' may be more accurate given the partial and context-dependent nature of the phenotypes observed.

    4. Reviewer #2 (Public review):

      Tang et al. investigated the contribution of Aldh1a1+ cells, as putative stem/progenitor cells, to endometrial development, maintenance during the estrous cycle, and postpartum repair in mouse models. They employed in vitro organoid formation and in vivo lineage tracing models coupled with RNA-seq to test the stem-ness of Aldh1a1+ cells. They found that mouse endometrial cells with high ALDH activity (using the ALDEFLUOR assay) formed more and larger organoids and were enriched for stem/progenitor cell gene signatures. Similar results were shown using endometrial cells from a human patient sample. Epithelial ALDH1A1 expression was shown to be hormonally regulated, becoming more restricted to the glands, a putative epithelial stem cell niche, under estrogen stimulation. Using lineage-tracing initiated postnatally/prepubertally, Aldh1a1+ epithelial cells were shown to expand, contributing to both the luminal and glandular epithelium into adulthood, whereas adult initiation of labeling showed expansion of stromal Aldh1a1+ cells but not epithelial. Postnatal ablation of single-labeled Aldh1a1+ epithelial cells resulted in impaired gland development. Lastly, Aldh1a1-lineage traced cells (adult labeled) were present during postpartum endometrial repair as were epithelial/mesenchymal transitional cells.

      This study addresses an important area of research in the field of endometrial stem/progenitor cell biology. The authors are commended for their use of multiple complementary methods, including lineage tracing, DTR-mediated cell ablation, organoid assays, and RNA-seq in mouse and human models to assess the stem-like nature of Aldh1a1+ cells. The data support the stem/progenitor phenotype of Aldh1a1+ epithelial cells during endometrial development; however, there are noted discrepancies between organoid formation assays and lineage tracing experiments regarding the stemness of Aldh1a1+ epithelial cells in adults. Specifically, organoids were generated from adult cells and demonstrated in vitro stem cell activity; however, in vivo lineage-tracing of adult cells either during the estrous cycle or postpartum repair does not show expansion of Aldh1a1+ cells, suggesting they do not have stem/progenitor activity. Additionally, the stem-ness of epithelial vs stromal Aldh1a1+ cells is confounded in the study because epithelial cells were not purified for organoid experiments, epithelial cells were not exclusively lineage-traced as stromal cells were also labeled, and mesenchymal-epithelial transition was suggested to occur during postpartum repair. The following specific comments are presented to detail these concerns:

      (1) The statement in the brief summary, "...critical for lifelong endometrial regeneration," is not supported by the data provided.

      (2) ALDH1A1 is not restricted to the endometrial epithelium, and epithelial cells were not purified by flow cytometry for experiments in Figure 1. Figure 2 clearly shows the presence of mesenchymal cells, even using the described method for enriching for epithelial cells. Therefore, contaminating mesenchymal cells with high ALDH activity may confound the experimental results in Figure 1, either through promoting epithelial cell growth or through MET. The authors should provide clear evidence of epithelial purity in organoid experiments or that mesenchymal cells are not contained in the ALDHhi population. These comments also apply to the human organoid experiments in Figure 7.

      (3) Lines 186-187: Susd2 was increased in EpSC clusters, yet this is a mesenchymal stem/progenitor marker in humans. The authors should discuss the implications of this.

      (4) In Figure 5, RFP+ epithelial cells should be quantified as in previous figures to substantiate the statement in lines 279-280, "At PPD5, the proportion of RFP+ epithelial cells had expanded relative to PPD1 and PPD3 (Figure 5E-E')." Especially because in the low mag images (C-E), RFP+ epithelial cells appear to be most abundant at PPD1 and decrease at PPD3 and PPD5, suggesting that they may not be involved in endometrial regeneration/repair (contradicting the interpretation in line 285). Further, if there is in fact a decrease over postpartum repair, then regeneration should be removed from the title of the manuscript. RFP+ stromal cells should also be quantified.

      (5) For Figure 7F, it should be clearly stated in the main text that the results are from one patient sample and the data presented are experimental replicates, so as not to be confused with biological replicates (the same for Supplementary Figure S4). Were B and G in Figure 7 also from one patient?

      (6) Lines 425-427: "Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed a complete restriction of ALDH1A1 to the glandular crypts." In Figure 2 S' ALDH1A1+ cells are visible in the LE (the staining is lighter than in the GE but looks real), contradicting this statement.

      (7) Lines 466-467: "In cycling mice, we found sporadic cells that expressed both stromal and epithelial markers in the ALDHA1+ cells." These data are not presented.

      (8) These data support the role of Aldh1a1+ cells in endometrial epithelial development, but conclusions about their role in repair/regeneration should be tempered as the data are much weaker here.

    5. Reviewer #3 (Public review):

      Summary:

      Tan et al demonstrated the importance of ALDH-high cells in the epithelial development in the mouse endometrium, and these cells displayed properties of stem cells.

      Strengths:

      The findings are solid, supported and validated through a combination of technical methods. I appreciated this combined use of mouse and human endometrial cells to strengthen the findings. Genomic results from a single-cell sequencing dataset were informative as they depicted the different stages of the estrus cycle during the regeneration process. Verification with immunostainings with various markers made it convincing for readers to visualize the cell's location, progression, and status at different timepoints. Utilizing human endometrial cells further demonstrated that the phenomenon observed in mice can be translated to humans.

      This work will greatly advance the understanding of endometrial regeneration for reproductive biologists.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

    1. eLife Assessment

      This study introduces the "Training Village," a valuable system for which solid evidence shows that it enables group-housed rodents to autonomously learn complex tasks while preserving natural social interactions. The platform is flexible, allowing animals to learn multiple tasks sequentially and supporting applications in continual learning. This approach is likely to be of broad interest to behavioral researchers using rodent models in systems and cognitive neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      The authors introduce the Training Village (TV), an open-source and modular system that allows group-housed rodents to live in enriched home cages while individually accessing a single shared operant box for automated cognitive training. The paper reported the animals' activity both in the operant box and in the home cages, which is novel.

      Strengths:

      A major strength of the work is that it moves beyond a proof-of-concept and demonstrates sustained box usage, long-term trial accumulation, and compatibility with different task designs.

      (1) The platform provided a technical contribution in rodent cognitive neuroscience: obtaining large amounts of behavioral data from complex tasks while reducing experimenter intervention and preserving social housing.

      (2) The authors demonstrate that the system can sustain prolonged task engagement (up to 12 months) and maintain efficient use of a single operant box.

      (3) The manuscript opens interesting opportunities for studying behavior outside standard session-based training. Because animals self-initiate training while remaining in a group-housed setting, the platform has the potential to illuminate relationships among motivation, spontaneous activity, and task engagement that are hard to access in conventional paradigms.

      Weaknesses:

      (1) One area that would benefit from further clarification is the manuscript's core advance relative to prior automated group-housed training systems, particularly Mouse Academy (Qiao et al., 2018). The authors listed some advantages in the Discussion section; however, these are minor engineering improvements, and the more interesting issue is which scientific questions can be asked, or which results obtained, with this system. The current study clearly presents a functional and carefully documented platform, but it would help the reader if the authors more explicitly distinguished the present system from earlier related approaches, both in terms of system design and in terms of experimental validation.

      (2) At the system level, several of the claimed advantages could be supported more directly with quantitative data. For example, if the double-detection corridor and alarm system are important distinguishing features, it would be valuable to report measures such as detection accuracy, missed detections, co-entry failures, alarm frequency, and the degree of manual intervention required in practice. Similarly, the welfare-related arguments are plausible and important, but would be strengthened by more direct evidence, such as longitudinal body weight data, water intake, or comparison with group-housed no-task controls.

      (3) At the experimental level, the manuscript would also benefit from a more detailed characterization of training performance. Although three behavioral paradigms are presented, the data currently shown provide a stronger demonstration of feasibility than of training optimization. For a study focused on automated cognitive training, it would be critical to include more information on learning speed, progression across stages, success and failure rates, and variability across animals. Along the same lines, the comparison with manual training is a useful addition, but a broader benchmark including learning curves, time to criterion, and between-animal variability would make the practical value of the system easier to assess.

      (4) The authors claimed that they conducted 3 complex cognitive tasks (3AFC, 2AFC, 2AB) in their setup. However, these 3 tasks are quite basic for rodents and have been demonstrated in many studies, especially when compared with the tasks implemented in Yu et al., eLife 2025. Therefore, the 'complex' claim should be toned down.

      (5) The authors claimed that they have successfully implemented the so-called hybrid mode, but it is only briefly described and not supported by citations or data. Since this may be one of the most broadly applicable use cases of the platform, a more detailed explanation of how the system can be integrated with recording workflows would strengthen the manuscript.

      (6) The manuscript highlights the opportunity to relate task behavior to home-cage activity and to study individualized behavioral patterns. To better support these aspects, it would be helpful to include more subject-level analyses, rather than relying predominantly on population averages, or alternatively to discuss in more concrete terms which features of the dataset may be especially informative for studying individuality. More generally, the manuscript would benefit from clarifying whether different parameter settings within this group-housed framework may be better suited for maximizing training efficiency versus preserving more naturalistic or socially modulated behavior, and what the implications of these choices may be for interpretation.

      (7) In Table S1, 'Touch screen' is task-specific and is not necessarily a metric. 'Testing outside home cage' is also not necessarily an advantage (please clarify if it is). Many other systems implemented different levels of 'Alarm system', which is not reflected in the table.

      (8) Table S3 shows important data that help the reader to evaluate the paper's work, and thus deserves to be moved to the main text.

    3. Reviewer #2 (Public review):

      Summary:

      The Training Village (TV) is an innovative autonomous system for rodent training. By integrating an operant box with a group-housed home-cage environment, this platform enables animals to learn operant behaviors while preserving their social context and interactions, which is an aspect often overlooked in the field. The flexibility and modularity of the TV system allow training across multiple cognitive tasks in a continual learning framework. Furthermore, its remote accessibility and affordability make it a compelling tool for the broader neuroscience community.

      Comments:

      (1) Social Hierarchy and Access Competition

      Previous studies on rodent social hierarchy (e.g., PMID: 21960531) have demonstrated clear dominance structures within group-housed animals. Based on this, one might expect dominant animal(s) to occupy more sessions and trials than subordinate animals by preferentially accessing the operant box. Therefore, it is somewhat surprising to observe a relatively uniform distribution of operant box occupancy across animals (Figure 2a, 2i). As a control, it would strengthen the manuscript to include an independent assessment of social hierarchy (e.g., tube test, barber assay, or similar behavioral metrics) to quantitatively characterize dominance relationships within the cohort. Correlating these rankings with chamber occupancy and trial frequency would significantly strengthen the validation of the system's equity.

      (2) Behavioral Saving Effects in Continual Learning

      The authors demonstrate that the TV platform allows for the sequential learning of multiple cognitive tasks (Figure S3e). This provides an excellent opportunity to examine a continual learning paradigm. A key hallmark of successful continual learning is the "behavior savings effect", where re-learning a previously acquired task occurs faster than initial learning. For example, if animals are trained sequentially on task A (e.g., 2AFC), then task B (e.g., 2AB), and subsequently re-trained on task A, do they exhibit accelerated re-learning? Including such an analysis would significantly strengthen the claim regarding continual learning capabilities.

      (3) Robustness of Multi-Animal Attempt Detection

      In the TV platform, only one animal can access the operant box at a time under group-housed conditions. This setup inherently introduces the possibility of "multi-animal attempts", as shown in Figure 2j-k and Figure S2c. While the authors address this using pixel-based classification, additional quantitative validation would improve confidence in this approach. For instance, presenting the distribution of pixel counts for single-animal versus multi-animal events would be informative. Moreover, given variability in body size across animals, a fixed pixel threshold may not be sufficient. It would be helpful to include analyses of classification performance (e.g., Type I and Type II error rates) across different animal pairings within the same cohort.

      (4) Protocol Flexibility and Implementation

      It would be helpful to clarify how behavioral task protocols are switched within the TV system. Specifically, are task changes applied globally to all animals sharing the operant box, or can they be assigned individually? Additionally, are task sequences pre-programmed prior to the experiment, or can they be modified dynamically during ongoing experiments?

      (5) Presentation and Readability

      To improve readability, the Discussion section could be streamlined, as it is currently somewhat lengthy and descriptive.

    4. Reviewer #3 (Public review):

      Summary:

      The Training Village (TV) is an open-source automated platform for continuous training and testing of group-housed mice and rats in cognitive tasks. Animals live in enriched multi-compartment home cages and access a single operant box individually through a sorting corridor controlled by RFID identification and real-time video analysis. A Raspberry Pi 5 runs the entire system, manages an adaptive training algorithm, monitors animal welfare, and allows remote supervision via a graphical interface and Telegram alarm system. The system is validated across 12 groups totaling 121 animals, three cognitive paradigms of varying complexity, and experiments lasting up to 12 months.

      Strengths:

      (1) The open-source implementation is probably the paper's strongest point. The authors provide not just code but 3D-printable designs, a full bill of materials with costs (~5500€ total), assembly instructions, and a dedicated website. The estimated build time of 2-7 days is credible. In the current landscape of methods papers, this level of documentation is the minimum necessary to allow other laboratories to actually adopt and propagate the system - and the authors deliver it fully. The compatibility with two operant box designs, three cognitively distinct tasks, and two species - demonstrated empirically rather than merely claimed - makes the modularity argument credible and distinguishes the TV from systems designed around a single paradigm. Finally, the combination of automatic weighing at each exit, temperature and humidity tracking, and a granular Telegram alarm system (Table S2) represents a meaningful practical contribution. For a system operating 24/7 without daily human supervision, this level of welfare monitoring is a necessity, and it seems well implemented here.

      (2) With 121 animals across 12 groups, three distinct cognitive paradigms, two species, and longitudinal data spanning up to 12 months, the validation effort is substantial. The authors acknowledge the limitations of their comparisons - notably that the TV vs. manual training comparison is not a controlled experiment. The rat dataset is limited in scope, but the authors at least demonstrate that the system can be adapted to a second species, which is a useful proof of concept. The demonstration that task engagement increases progressively over 12 months (Fig. 3g) is a novel observation at this temporal scale, with practical implications for the design of long-term experiments.

      (3) The finding that operant box usage is distributed nearly uniformly across animals (Gini < 0.15 in all groups) is carefully demonstrated and addresses a question that any laboratory considering this type of system will legitimately ask, e.g., whether dominant individuals monopolize access at the expense of subordinates. This has been shown before in comparable systems, but remains a necessary validation for each new implementation. The control condition removing temporal constraints (Figure S4) adds useful mechanistic insight into the role of the refractory interval. However, the interpretation of this result deserves more nuance than the authors provide - see Weaknesses.
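      For readers unfamiliar with the Gini coefficient used in point (3), the sketch below shows one standard way to compute it from per-animal session counts. It is an illustration under assumptions: the manuscript's exact formula is not given here, the function name `gini` is hypothetical, and the session counts are invented placeholder values rather than data from the study. A value of 0 means every animal used the box equally; values approaching 1 mean a single animal monopolized it.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal usage)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Placeholder session counts for one group of eight animals (not real data).
sessions_per_animal = [41, 38, 45, 40, 36, 43, 39, 44]
g = gini(sessions_per_animal)
print(f"Gini = {g:.3f}")
```

      With this uncorrected formula the maximum for n animals is (n - 1)/n rather than 1, so values from groups of different sizes are not strictly comparable unless a small-sample correction is applied.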

      Weaknesses:

      (1) The TV is more than an automation tool; its architecture makes the most sense if one intends to study how spontaneous home cage behavior relates to individual cognitive performance, and the introduction and discussion explicitly frame this as a key application. Yet the analysis delivers only group-level descriptive results, and the cognitive data are presented almost exclusively as group averages. The individual-level questions that the system is uniquely positioned to address (do stable home cage behavioral profiles emerge across animals, do animals learn at the same rate and using the same strategies, and do these dimensions correlate with each other) are never asked. This is particularly relevant given that enriched social environments are precisely the conditions under which stable inter-individual differences tend to emerge spontaneously, even among genetically identical animals (Freund et al., 2013, Science), and that comparable systems have already linked such profiles to cognitive and neurochemical phenotypes (Torquet et al., 2018, Nature Communications). The TV clearly has the data to begin exploring this - doing so would substantially strengthen the paper's scientific contribution beyond its methodological value.

      (2) Sustained daytime operant box usage in nocturnal animals deserves more discussion: Box occupancy during the light phase remains around 75% - only modestly below the ~85% seen at night (Fig. S5a-b). The authors conclude this reflects "sustained engagement with the task throughout the circadian cycle," but other explanations are not considered, such as residual thirst driving animals to seek sucrose water during the day, or the refractory interval mechanically redistributing sessions into the light phase. A more explicit discussion of the consequences of 24/7 unsupervised testing for data quality (for example, whether daytime sessions yield noisier behavioral data) would be useful.

      (3) The finding that all animals access the operant box in roughly equal proportions (Gini < 0.15) is practically important and carefully demonstrated. However, the authors' interpretation that animals self-organize in an egalitarian manner despite known social hierarchies deserves a note of caution. The system design itself constrains monopolization: the refractory interval imposes the same waiting time on all animals regardless of social rank, and session duration determines how often the box becomes available. The no-constraint control (Figure S4) partially addresses this but was run on already-trained animals, limiting its interpretive value. The key practical message, that all animals can access the task regularly under the proposed design, is well supported. Whether this reflects genuine social tolerance or is primarily a consequence of system constraints is a subtler question that the current data cannot fully resolve.

      (4) The rat cohort consists of a single group of 6 female Long-Evans rats, yet species comparisons are drawn across multiple dimensions (daily sessions, task engagement, performance...). Observed differences could reflect group size, sex, strain, reward calibration, or simple individual variability rather than species differences. These results should be presented for what they are: a useful proof of concept showing the system works with a second species, not a basis for comparative conclusions.

    1. eLife Assessment

      This study provides a valuable contribution to our understanding of the neural basis of perceptual decision-making by jointly modeling behavioral outcomes and EEG signals in a contrast comparison task. The methods and analyses are solid, systematically comparing standard models assuming continuous evidence accumulation with models that track evidence without temporal integration (extrema detection). The authors show that behavior and neural signals are equally consistent with both alternatives, highlighting limitations in current modeling approaches and questioning the generality of evidence accumulation mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines whether humans use protracted temporal integration in a noise-free, deferred-response contrast discrimination task, using a covert evidence-duration manipulation combined with EEG (SSVEP, CPP, Mu/Beta). The key finding is that evidence for protracted sampling is behaviorally and neurally supported, but even joint CPP + behaviour fitting cannot fully discriminate a standard integration (DDM) model from a novel "extremum-flagging" non-integration model. The paper is transparent about this outcome.

      Strengths:

      This is a well-conducted and well-written study that makes a genuine contribution to the perceptual decision-making literature by introducing a clean experimental design for probing temporal integration without participants adapting their strategy and demonstrating for the first time that a non-integration model (extremum-flagging) can replicate CPP waveform dynamics that have long been considered hallmarks of evidence accumulation. The transparent treatment of equivocal modelling outcomes is commendable.

      Weaknesses:

      My main concerns relate to statistical power and the under-specification of the extremum-flagging mechanism. Addressing these would greatly strengthen the paper.

      (1) The sample of 16 participants (15, after the exclusion of one participant) is described as "close to similar EEG studies" with no formal power analysis. Given that the paper's core claim rests on subtle quantitative differences between two model classes - differences that are, by the authors' own admission, not sufficient to declare a winner - even a modest increase in sample size might yield a more decisive outcome. At a minimum, the authors should report a sensitivity analysis or post-hoc power calculation to indicate what effect sizes the current N could reliably detect, particularly for the rmANOVA comparisons and the neural constraint fitting.

      (2) The Extremum-flagging model is the paper's most novel contribution, yet its physiological basis is underspecified. The model posits that each decision-terminating bound-crossing triggers a stereotyped, half-sine-shaped centroparietal signal, but no neural circuit or computational mechanism is proposed for how the brain could detect the first bound-crossing event in a non-accumulating evidence stream or generate a temporally precise, fixed-amplitude signal in response. Possible connections to P3b theories of context updating and response facilitation are acknowledged, but these are vague functional descriptions rather than mechanistic accounts. I think the discussion should engage more directly with potential neural substrates that could generate this flagging signal, and whether these are consistent with the known generators of the CPP/P3b. Without this, the extremum-flagging model risks being viewed as a mathematical convenience rather than a biologically plausible alternative.

      (3) The Integration model at the preferred neural weighting estimates a high-to-low contrast drift rate ratio of 8.7, whereas the empirical Mu/Beta lateralization slopes suggest a ratio of approximately 3.5. The authors attribute this discrepancy to the nonlinear contrast response function of early visual cortex and the salience of the high-contrast evidence onset, but these explanations are speculative. This discrepancy is arguably the most quantitatively damaging result for the integration model, so it deserves more than a brief discussion. I would recommend that the authors (a) estimate what range of contrast response nonlinearities would be required to close this gap, (b) test whether an alternative drift rate parameterization (e.g., scaling drift rates directly by SSVEP amplitude rather than contrast) reduces the discrepancy, or (c) be more explicit about treating this as a point against the Integration account.

      (4) The sensitivity analysis over neural constraint weightings (w = 0.1 to 1000) is thoughtful, but the paper ultimately acknowledges that the preferred weighting is w=10, chosen because it achieves "a good fit to CPP dynamics without substantively sacrificing behavioral fit" - a qualitative criterion. No principled statistical framework is used to select the optimal weighting or to compare models at a given weighting. A Bayesian model comparison could provide a more formal framework for combining behavioral and neural fit components, and would allow a clearer statement about the relative posterior probability of each model.
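
      The sensitivity analysis requested in point (1) could take roughly the following form. This is a minimal Monte Carlo sketch in Python under strong simplifying assumptions: it treats a single within-participant contrast as a two-sided one-sample t-test on N = 15 normally distributed difference scores, which is a stand-in for, not a reproduction of, the rmANOVA and model-fitting comparisons in the manuscript. The function name, the effect sizes scanned, and the simulation count are all illustrative.

```python
import math
import random

N_PARTICIPANTS = 15    # sample size after exclusion, as stated in the review
T_CRIT_DF14 = 2.1448   # two-sided 5% critical value of Student's t with 14 df


def simulated_power(effect_size, n_sims=4000, seed=1):
    """Monte Carlo power of a two-sided one-sample t-test on paired differences.

    effect_size is the standardized mean difference (Cohen's d_z).
    """
    rng = random.Random(seed)
    n = N_PARTICIPANTS
    hits = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((x - mean) ** 2 for x in diffs) / (n - 1)
        t = mean / math.sqrt(var / n)
        hits += abs(t) > T_CRIT_DF14
    return hits / n_sims


for d in (0.2, 0.5, 0.8, 1.0):
    print(f"d = {d:.1f}: power ~ {simulated_power(d):.2f}")
```

      Scanning effect sizes in this way indicates the smallest standardized difference that the current N would detect with a chosen power, which is the quantity a sensitivity analysis reports.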

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Hajimohammadi, Mohr, O'Connell and Kelly is intended to demonstrate that participants integrate evidence over time to make a decision, even in a noise-free, static decision context. This is validated by the observation that (1) participant accuracy improves with increased exposure to the stimulus; and (2) there is a correlation between participant accuracy and a neural index of evidence accumulation, as measured by centro-parietal positivity (CPP).

      Strengths:

      (1) Joint modelling of accuracy and CPP dynamics is a significant achievement, as behaviour alone often cannot distinguish between competing theories of decision-making. In the case of protracted sampling in particular, the absence of reaction times (RT) due to the delayed nature of the response makes this method highly appealing.

      (2) The experimental manipulations and the method used to extract the different neural indices are well chosen, enabling the mapping of putative cognitive processes such as evidence accumulation and motor preparation onto the recorded EEG with clarity.

      (3) The in-depth discussion of the results clearly articulates those reported by the authors and in previous works.

      Weaknesses:

      (1) One main issue to support the interpretation of the authors toward the need for protracted sampling is the timing of the evidence. By design, participants believe that the signal is present for 1.6 seconds (reinforced by the fact that easy trials were displayed for 1.6 seconds). However, the difference in stimuli is turned off either 1.4, 1.2, 0.8 or 0 seconds before the cue to respond. While this makes sense in the context of the authors' question, it also raises the possibility that participants will focus on the last samples before answering. Even if participants apply equal weighting, this still favours them delaying evidence accumulation until they are sufficiently certain that the evidence should be present (e.g. participants might start accumulating after the stimulus has disappeared in the 0.2 condition). I do not see an easy way to test these alternative explanations outside of running a study in which the evidence is always offset before the go cue.

      (2) Regarding the behavioural models, are these identifiable based on accuracy data alone? This should be addressed using a parameter recovery study, in which a set of parameters is used to generate data, and the same fitting routine used for the real data is used to estimate the parameters. This would enable us to determine what can be inferred from the model comparison presented. This is not a serious problem for the manuscript, as it specifically aims to go beyond behaviour. It is, however, worth noting that such a parameter recovery addition could be used to demonstrate the need for a joint modelling framework to answer the question of protracted sampling on delayed response times (RT).
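
      The parameter recovery procedure described in point (2) can be sketched as follows. This Python example uses a deliberately simple toy accuracy model and a grid-search fit; both are assumptions made for illustration and are not the models or fitting routine of the manuscript. The evidence durations (0.2, 0.4, 0.8 and 1.6 s) follow from the design described in point (1); the function names, trial counts and the true rate are hypothetical.

```python
import math
import random

DURATIONS = [0.2, 0.4, 0.8, 1.6]  # evidence durations in seconds


def p_correct(rate, duration):
    """Toy accuracy model (an assumption): rises from 0.5 toward 1 with duration."""
    return 1.0 - 0.5 * math.exp(-rate * duration)


def simulate(rate, n_trials, rng):
    """Number of correct trials per duration under the toy model."""
    return [sum(rng.random() < p_correct(rate, d) for _ in range(n_trials))
            for d in DURATIONS]


def neg_log_lik(rate, correct, n_trials):
    """Binomial negative log-likelihood (constant terms dropped)."""
    nll = 0.0
    for d, k in zip(DURATIONS, correct):
        p = p_correct(rate, d)
        nll -= k * math.log(p) + (n_trials - k) * math.log(1.0 - p)
    return nll


def fit(correct, n_trials):
    """Grid-search maximum likelihood, standing in for the real fitting routine."""
    grid = [0.05 * i for i in range(1, 201)]  # candidate rates 0.05 .. 10.0
    return min(grid, key=lambda r: neg_log_lik(r, correct, n_trials))


def recover(true_rate, n_trials=200, n_repeats=50, seed=0):
    """Simulate with a known rate, refit, and return the recovered estimates."""
    rng = random.Random(seed)
    return [fit(simulate(true_rate, n_trials, rng), n_trials)
            for _ in range(n_repeats)]


estimates = recover(true_rate=1.5)
mean_estimate = sum(estimates) / len(estimates)
print(f"true rate 1.5, mean recovered rate {mean_estimate:.2f}")
```

      If the recovered estimates cluster around the generating value, the parameter is identifiable from accuracy data under that model; a wide or biased spread would indicate that accuracy alone does not constrain it.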

      Minor comments:

      (1) I would advise authors to fix the D1 parameter and use it as a scaling parameter across all models. Currently, as I understand it, the models are scale-free, meaning the same fit is achieved by multiplying all parameters by two, for example. This makes the fit more complex (bounds on parameter values are required) and means that the models are less comparable in terms of their estimates. Perhaps I'm missing something, but I would have thought that fixing D1 (the common parameter across all models) would solve these issues.

      (2) Why is the snapshot model so bad despite being a good model in Stine et al 2020? Can the authors speculate in the discussion?

      (3) The meaning of the flag width is unclear. Figure 4 provides the reader with an intuitive understanding of the model that the authors have in mind. However, the tables in the appendices report values between 0.2 and 0.9. I understand that these values represent the width of the half-sine in seconds. This suggests that the actual estimated values for these flag events are much broader than those displayed in Figure 4. While this is probably fine for most models, it can be problematic for the extremum-flagging model, as it means that the rise to the peak takes between 0.1 and 0.45 seconds. While strictly speaking, this is still a 'flag' model, such a slow rise to the peak, given the usual expectation of evidence accumulation, would place this model closer to a smooth integration model than to a boundary-crossing flagging mechanism.

      (4) In the modelling section, it is not clear overall (i.e. for G² and R²) how the participant dimension is taken into account. Are these individually fitted models, and if so, how are the secondary statistics generated from the individual estimates? Or were these fitted over all participants?

      (5) On page 7, in the last sentence of the first paragraph of the section titled 'Decision-Related Neural Signals', the authors state that 'this stable contrast-difference encoding suggests that a constant (i.e. non-adapting) drift rate is a reasonable simplifying model assumption'. However, I am not sure how this is true given that SSVEP quantifies encoding, yet the drift rate can vary due to non-sensory aspects (e.g. attention).

      (6) The mu/beta lateralisation does indeed favor the integration model more, but in terms of boundary estimation and starting-point analyses, both models are pretty far apart. Providing an interpretation of this observation, e.g. regarding alternative linking functions for mu/beta, would add to the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to compare proposed models of perceptual decision making using a joint modeling approach, where they fit models to both behavioral outcomes and the CPP. Most notably, they compare a standard evidence accumulation model with models that track the evidence without integrating it over time (extrema detection). The authors report that the joint CPP-behavioral data do not discriminate between two of their proposals.

      Strengths:

      This is an interesting finding that reinforces the idea that what we believe we see based on aggregation over trials may not be what happens on every single trial. The models are creative, and the simulations are convincing, relating the models to multiple neural markers of decision formation. These include the CPP but also mu/beta power spectra.

      Weaknesses:

      The paper makes some strong points, and the work seems generally well-executed. The weaknesses that I identified are twofold:

      (1) Embedding in the literature/exposition of the main argument.

      The focus in the introduction is on the noise-free nature of the stimulus and the prolonged presentation time. However, after reading the paper, I felt these were mostly experimental design choices that enable comparison of the different models using the CPP. Perhaps my misreading of the goals of the paper stems from two other observations:

      a) The fact that the stimulus is noise-free does not entail that perception is noise-free. Thus, the argument that using a noise-free stimulus precludes the necessity of temporal integration seems not completely valid. Of course, one could argue that noise is limited in this case, but that makes a noise-free stimulus more of a design choice.

      b) The focus on prolonged stimulus presentation, but at the same time the contrast with expanded judgement, did not make sense to me. Perhaps, as a non-native speaker, I am misreading the subtle difference between "protracted sampling" and "longer sampling", but again, the longer duration seems mostly a design choice.

      More could be said about the optimality of the extrema detection methods. In particular, decades of work (centuries?) have shown that evidence integration is an optimal decision-making procedure: For example, the Sequential Probability Ratio Test is Bayes-optimal wrt mean RT (Wald, 1946); evidence accumulation together with collapsing threshold serves to maximize rewards in repeated choices (e.g., Bogacz et al., PsychRev, 2006; Boehm et al. APP, 2020). Given all this work, why would the brain have evolved to adopt a different mechanism? I realize that the paper is not about optimal decision making, but some discussion of this point seems warranted.

      (2) Modeling choices.

      The authors introduce a parameter, sampT, that represents uncertainty in the sampling onset time. It was not clear to me whether this parameter represented an offset of all trials, or a distribution (probably the latter). I wonder how exactly this parameter was integrated into the models, and in particular, if and how it interacts with the starting-point parameters. My intuition is that on a single-trial, IF early sampling occurs, you can model that with either a negative sampT and z at 0, or with sampT at 0 but a shift in z. This would suggest trade-offs between these parameters, making them hard to estimate independently. Since the paper does not depend on the identification of parameter estimates, this may not be a huge problem, but nevertheless it is good to explore the consequences.

      The way the Bounded Integration model (BIntg) is formulated seems very close to the EZ-diffusion model (Wagenmakers et al., PBR, 2007). This model states that the proportion of correct responses Pc = 1/(1+exp(-B*D/s^2)), with B and D the bound and drift rate parameters, respectively. However, filling in the numbers for the high contrast condition from Table 2, and assuming that s=2 (because the model description states that dt=2, with s undefined), I get a Pc of 80% for the 1.6H condition. This seems substantially less than what Figure 2 suggests.
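
      The calculation described above can be written out as a short Python sketch. The formula is the one quoted in this paragraph; the numerical values are placeholders chosen for illustration and are not the Table 2 estimates, which are not reproduced in this review.

```python
import math


def ez_prob_correct(bound, drift, s):
    """Pc = 1 / (1 + exp(-bound * drift / s**2)), as quoted in the review."""
    return 1.0 / (1.0 + math.exp(-bound * drift / s ** 2))


# Placeholder values only; they are NOT taken from Table 2 of the manuscript.
bound, drift, s = 2.0, 1.5, 1.0
pc = ez_prob_correct(bound, drift, s)

# Multiplying bound, drift and s by the same factor leaves Pc unchanged,
# because the exponent depends only on the ratio bound * drift / s**2.
pc_scaled = ez_prob_correct(2 * bound, 2 * drift, 2 * s)
print(round(pc, 4), round(pc_scaled, 4))
```

      Because only the ratio B*D/s^2 enters this expression, the predicted accuracy depends on which value of s is assumed, so the fixed value of s needs to be stated before such a check can be interpreted.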

      On some occasions, it is unclear to me what modeling choices are being made:

      a) It seems as if the models are fit on accuracy data alone (before introducing the neural data). This seems suboptimal given that the authors do report differences in RT.

      b) Are the models fit on all data combined, or on the data of individual participants? Fitting individual participant data is preferred, as combined or aggregated data may be distorted by individual differences.

      c) The authors seem to suggest that the diffusion coefficient s is estimated (in the section "Integration models"). Most likely, however, this is set to a fixed value. Obviously, it matters for the model comparison using AIC whether this parameter was freely estimated or not.
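
      To make the point in c) concrete, the following Python sketch shows how the parameter count enters AIC. The log-likelihood and parameter counts are hypothetical values chosen for illustration, not results from the manuscript.

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_free_params - 2 * log_likelihood


# Hypothetical maximized log-likelihood and parameter count, for illustration only.
log_lik = -512.3
k_fixed_s = 4             # s held at a fixed value
k_free_s = k_fixed_s + 1  # same fit, but s counted as a free parameter
penalty = aic(log_lik, k_free_s) - aic(log_lik, k_fixed_s)
print(penalty)
```

      For an unchanged likelihood, counting s as free raises AIC by 2, which can matter when competing models differ by only a few AIC units.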

      Not really a weakness, but I wondered about the effect of stimulus duration on RT. In particular, what hypothesis (or post hoc explanation) do the authors have for these RT effects? I could think of at least three hypotheses that are consistent with the behavioral data:

      a) H1: The shorter the evidence duration, the more likely participants are to require a double-check before response execution, reflecting their uncertainty about their decision.

      b) H2: There is a collapsing threshold that initiates at stimulus offset, leading to quicker responses on trials where there is more evidence.

      c) H3: Motor preparation is correlated with the evidence signal, which leads to faster responses on trials with more evidence.

    1. eLife Assessment

      This fundamental work significantly advances our understanding of the circuit-level implementation of predictive processing by elucidating the functional influence between putative prediction error neurons in layer 2/3 and putative internal representation neurons in layer 5. The evidence demonstrating that neither the hierarchical nor the non-hierarchical variant of predictive processing fully accounts for the presented data is convincing. Moving forward, this line of work would benefit from explicitly comparing different theories, thereby clearly articulating the points raised in this paper.

    2. Reviewer #1 (Public review):

      Vasilevskaya and Keller test different models of cortical function through the lens of predictive processing, a powerful framework for the brain to learn and predict the statistics of the world via generative internal models. The authors use a clever combination of behavioral perturbations in closed-loop and open-loop visuomotor virtual reality assays, a paradigm the Keller lab pioneered and used effectively in the past decade, in conjunction with two-photon imaging of neuronal calcium responses and targeted optogenetic perturbations of activity. They specifically put to test proposed hierarchical vs. non-hierarchical circuit implementations of predictive processing by analyzing the logic of inter-lamina interactions (superficial vs. deep; L2/3 vs. L5/6).

      The authors conclude that both versions of predictive processing architectures they analyze are likely invalid, and instead formulate an alternative novel model of cortical function based on a recently developed machine learning algorithm for self-supervised learning (joint embeddings of predictive architectures, JEPA) and its further refinements. JEPA borrows elements from predictive processing, engaging two encoder networks and training the output of one network to predict the output of the other. In their new model of cortical computations, prediction error neurons in L2/3 compare the deep layers (L5/6) activity, which is taken as a teaching signal, to a local, L2/3 prediction of this latent representation.

      Specifically, the authors build on their previous work and reports from other groups that different sets of L2/3 neurons compute positive prediction errors (fire when sensory stimuli appear unexpectedly with respect to the movements of the animal; e.g., grating onsets in the absence of locomotion) and negative prediction errors (fire when sensory stimuli are absent, while the brain expected them to be present; e.g. mice locomote but visual flow is suddenly halted - visuomotor mismatches). These L2/3 positive and negative prediction error neurons exchange messages with neurons in the deeper cortical layers that, the authors propose, build an internal representation (R) of the sensory stimuli given the animals' movements.

      In the hierarchical model, internal representation neurons (R) are supposed to act as a teaching signal for both types of prediction error neurons; the output of the positive prediction error neurons is assumed to suppress activity of R such that the error between the teaching signal and the prediction is minimized; similarly, in the non-hierarchical version, R serves as a prediction for the prediction error neurons, and in turn it receives excitatory drive from the positive prediction error neurons and negative input from the negative prediction error neurons.

      The authors find that the functional impact of L5 neurons on L2/3 neurons is not compatible with the non-hierarchical architecture they and other groups proposed, but rather in accordance with the hierarchical model. At the same time, the functional impact of L2/3 neurons (positive vs. negative prediction error neurons) on L5 neurons (internal representation) appears not compatible with the hierarchical model, but rather in accordance with the non-hierarchical implementation.

      They further hypothesize that L2/3 prediction error neurons don't use sensory input, but rather the L5 activity as a teaching signal, and test it using perturbations (halts) of optogenetic stimulation of L5 neurons coupled with locomotion (Figure 7).

      All in all, the question is topical, and the new model addresses a decades-long quest to develop a unifying model of cortical function. The findings reported here transform our understanding of cortical computations, opening new, exciting avenues for future investigation. The experimental design and execution are rigorous; the arguments are clearly laid out (in spite of ample potential for confusion given the numerous loops and sign flips). These include a discussion of why the non-hierarchical model proposed by the same group does not hold, as well as potential caveats in interpreting the results and novel testable proposed experiments emerging from the JEPA-like model.

      I have several questions about the interpretations of some of the claims and suggestions for potential additional experiments and analyses.

      (1) Some of the pieces of the puzzle remain to be identified and demonstrated: the existence of internal representation neurons in L2/3, and confirmation that the L5/6 neurons analyzed indeed function as internal representation neurons. The authors find that stimulation of L2/3 positive prediction error neurons enhances activity of L5 neurons. If L5 neurons hold a latent representation that serves as a teaching signal for L2/3 neurons (as the authors posit), wouldn't one expect the input they receive from the positive prediction error neurons to be suppressive, such that the error is further minimized?

      (2) Do the authors envision any specific differences between the representations of the two encoder networks posited to exist in L2/3 and L5 in the JEPA-like implementation? Are they synchronous/offset in their temporal representations, or any other features?

      (3) Where is the prediction coming from onto L2/3 neurons? Is it emerging locally in L2/3 from the putative internal representation neurons, or is it long-range - as work from the authors previously proposed? Or a mix of both?

      (4) What is the role of the indiscriminate L4 input that appears to enhance activity of both positive and negative prediction error neurons in L2/3?

      (5) Does Figure 7D change in a meaningful manner if the authors plot the correlation between optomotor mismatch response and visuomotor mismatch response specifically for the negative prediction error neurons in L2/3 (Adamts-2) rather than for all L2/3 cells sampled?

      (6) Do the optomotor mismatch responses in L2/3 neurons depend on how long the closed-loop coupling of optogenetic stimulation of Tlx3 L5 neurons and locomotion speed has been in place for?

    3. Reviewer #2 (Public review):

      This manuscript reveals the functional connectivity of two different classes of cortical neurons that respond in opposite ways to mismatches between sensory and top-down inputs. These data are very valuable because different theories of information processing in the cortex make different predictions on the patterns of connectivity of these neurons. Therefore, these data strongly constrain possible theories of cortical processing.

      General comments:

      (1) The methods of statistical testing are insufficiently described. I did not understand the description in lines 1105-1119. The authors should provide sufficient details so the reader can reproduce their analyses. For example, it may be helpful to provide specific details of the testing procedure for one of the comparisons (e.g. the first comparison in Table S1).

      (2) The authors should clarify how the problem of multiple comparisons was addressed for comparisons performed in multiple moments of time, where significance is indicated by a black bar (e.g. in Figure 2F).

      (3) It would be helpful to add a figure in the Discussion summarising the functional connectivity suggested by all experiments.

      (4) Throughout the manuscript, the authors use the term "teaching signals", but I am unclear what they mean by it: after reading the definition in lines 45-46, I thought that they corresponded to values (as they are compared to sensory signals). Later (428-430), the text suggests that they correspond to error neurons. But then lines 605-607 say it is not an error signal. The authors should define teaching signals very precisely or remove this term.

    4. Reviewer #3 (Public review):

      Vasilevskaya and Keller set out to experimentally distinguish between two variants of predictive processing: a hierarchical and a non-hierarchical variant. The hierarchical variant assumes a hierarchical organization in which internal representation neurons (believed to be a subset of layer 5 excitatory neurons) serve as a source of a teaching signal for local prediction error neurons as well as for the next higher level of the hierarchy, while simultaneously providing prediction signals to the preceding lower level. In contrast, the non-hierarchical variant posits that these layer 5 internal representation neurons provide local predictions to layer 2/3 prediction error neurons.

      The interaction between internal representation neurons and prediction error neurons differs fundamentally between the two variants. In the hierarchical variant, internal representation neurons excite positive prediction error neurons and inhibit negative prediction error neurons, while at the same time being inhibited by positive prediction error neurons and excited by negative prediction error neurons. In the non-hierarchical variant, this pattern of connectivity is reversed.

      This work is very exciting, timely, and carefully executed. The authors functionally, and later molecularly, identify layer 2/3 prediction error neurons in V1 and probe their interactions with genetically defined neuron types in cortical layers 5 and 6 using optogenetics. They demonstrate that the functional influence of putative prediction error neurons in layer 2/3 onto layer 5 is incompatible with the hierarchical variant, whereas the influence of layer 5 onto putative prediction error neurons in layer 2/3 is incompatible with the non-hierarchical variant. They then test an alternative hypothesis, in which layer 2/3 responses resemble prediction errors with respect to perturbations of artificial layer 5 activity patterns. To investigate this, they designed an experiment in which optogenetic activation of L5 IT neurons was closed-loop coupled to the mouse's locomotion speed in the absence of visual feedback, allowing them to probe the causal influence of L5 activity on layer 2/3 responses.

      Finally, the authors hypothesize that their data are more consistent with a joint embedding predictive architecture (JEPA) and outline experimentally testable predictions arising from this framework.

      While the work is overall convincing and significantly advances our understanding of the circuit-level implementation of predictive processing, there are a few weaknesses that should be addressed or discussed:

      (1) The authors define putative positive prediction error neurons as the 15% of neurons most responsive to grating onset and putative negative prediction error neurons as the 15% most responsive to visuomotor mismatch. While this selection would be expected to overlap with negative and positive prediction error neurons, the criterion is not sufficiently stringent (independent of the exact percentage chosen). In particular, classification of a neuron as a prediction error neuron should ideally be accompanied by evidence that it does not exhibit a significant increase in activity when the prediction matches the sensory input or teaching signal.

      (2) The authors "speculate that the prediction error responses in layer 2/3 may not be computed with respect to sensory input, but with respect to layer 5 activity as a teaching signal." However, it is unclear how this perspective differs from earlier statements in the manuscript. In the Introduction, the authors note that "these signals, typically referred to as sensory signals, we will refer to as teaching signals," and later describe the hierarchical variant as one "in which internal representation neurons act as a source of the teaching signal." Given this framing, it is difficult to identify what is conceptually novel in the updated view. Is the key distinction that layer 2/3 neurons are now proposed to generate predictions in an internal representation space rather than in sensory input space, as briefly suggested in the Discussion? Or are the authors introducing a distinction between an external (sensory) and an internal (cortical) teaching signal? If so, this distinction should be made explicit. Clarifying this point would considerably strengthen the manuscript.

      (3) The authors propose that "L2/3 neurons predict L5 activity, hence making predictions in the internal representation space rather than the input space," and further suggest that, since both deep and superficial cortical layers receive thalamic input, the cortex may function like a JEPA. This idea appears closely related to the model introduced by Nejad et al. (2025), which effectively implements a JEPA-like architecture: L5 activity serves as a target against which L2/3 predictions are compared in a self-supervised manner, with both L5 and L2/3 (via L4) receiving thalamic input. It would be helpful for the authors to clarify how their framework differs from that model, and to specify the key conceptual or mechanistic distinctions between the present proposal and the approach described by Nejad et al.

    1. eLife Assessment

      This study presents a valuable finding on the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer. The evidence supporting the claims of the authors is solid, although inclusion of a larger number of patient samples, more statistical details and sufficient comparison with existing large-scale datasets would have strengthened the study. The work will be of interest to medical biologists working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates mutations and expression patterns of zinc finger proteins in Kenyan breast cancer patients. Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues.

      Strengths:

      Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues in Kenyan breast cancer patients. The authors identified mutations in ZNF217, ZNF703, and ZNF750.

      Weaknesses:

      (1) Research scope:

      The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.

      (2) Language and Style Issues

      There are many typos and clear errors in the main text (e.g. (ref)).

      Additionally, several statements read unnaturally. For example:

      "Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."

      "The research team ..." should be rephrased as "Our team ...."

      (3) Methods and Data Analysis Details

      The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

      (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

      (b) Statistical methods for somatic mutation/SNP detection.

      (c) Details of RNA purification and RNA-seq library preparation.

      Without these details, the reproducibility of the study is limited.

      (4) Data Reporting

      This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

      a) Deposit sequencing data in a public repository.

      b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

      c) Clarify whether raw or adjusted p-values were used for DEG analysis.

      d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

      (5) Mutation Analysis

      Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

      Comments on revisions:

      The revised manuscript hasn't addressed any of these concerns. Careful proofreading is recommended, even if the authors do not intend to make further modifications to the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This work integrated the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer.

      Strengths:

      The mutational landscapes of ZNF217, ZNF703, and ZNF750 were comprehensively studied and correlated with tumor stage and HER2 status to highlight their clinical significance.

      Weaknesses:

      The current cohort is too small to support statistically significant findings, and the targeted exploration of the ZNF family, without a stated rationale or emphasis on clinical relevance, limits the overall significance of the work.

    4. Reviewer #3 (Public review):

      Summary:

      This revised study analyzes the somatic mutational profiles and transcriptomic expression of three zinc-finger genes (ZNF217, ZNF703, ZNF750) in 23 Kenyan women with breast cancer, using whole-exome sequencing and RNA-sequencing of paired tumor-normal tissues. A total of 358 somatic mutations were detected, and all three genes were significantly upregulated in tumors compared to normal tissues (ZNF217 showing the most prominent difference). Higher expression was observed in HER2-positive tumors, though mutation burden for each gene did not correlate significantly with HER2 status or cancer stage. The findings provide preliminary evidence for the identification of diagnostic/prognostic biomarkers or therapeutic targets in sub-Saharan African populations.

      Strengths:

      The study's key strengths lie in its focus on an underrepresented Kenyan cohort, addressing a critical gap in sub-Saharan African breast cancer genomic research. It integrates DNA-level mutation analysis with RNA-level expression data, leveraging standardized bioinformatics pipelines (e.g., Mutect2 for variant calling, DESeq2 for differential expression) and rigorous quality control to deliver detailed insights into mutation types, functional impacts, and amino acid changes. Additionally, it explores gene expression patterns across different cancer stages and HER2 status subgroups, generating targeted hypotheses for future validation and enhancing the reliability of its findings.

      Weaknesses:

      The author has enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations but has not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.

    1. eLife Assessment

      This valuable study characterises the activity of motor units from two of the three anatomical subdivisions ("heads") of the triceps muscle while mice walked on a treadmill at various speeds. Altogether, this is the most thorough characterisation of motor unit activity in walking mice to date, providing convincing evidence for probabilistic recruitment of motor units that differed between the two heads.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observe differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools, and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

    3. Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to characterise the firing activity of individual motor units in mice during locomotion. To achieve this, the team implanted small arrays of eight electrodes into two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Concurrently, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice across five speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors demonstrate that:

      - Their recording method and adapted spike-sorting algorithm enable robust decoding of motor unit activity during rapid movements.

      - Identified motor units tend to be recruited during a subset of strides, with recruitment probability increasing with speed.

      - Motor units within individual heads of the triceps likely receive common synaptic inputs that correlate their activity, whereas motor units from different heads exhibit distinct behaviour.

      The authors conclude that these differences arise from the distinct functional roles of the muscles and the task constraints (i.e., speed).

      Strengths:

      - The novel combination of electrode arrays for recording intramuscular electromyographic signals from a larger muscle volume, paired with an advanced spike-sorting pipeline capable of identifying motor unit populations.

      - The robustness of motor unit decoding during fast movements.

      Weaknesses:

      - The data do not clearly indicate which motor units were sampled from each pool, leaving uncertainty as to whether the sample is biased towards high-threshold motor units or representative of the entire pool.

      - The results largely confirm the classic physiological framework of motor unit recruitment and rate coding, offering limited new insights into motor unit physiology.

      Comments on previous version:

      I would like to thank the authors for their thorough and insightful revisions. I am particularly pleased with the inclusion of the new analyses demonstrating the robustness of motor unit decoding, as well as the improved transparency regarding spike-sorting yield for each muscle and animal. Additionally, the new analyses illustrating that recruitment within muscle heads is consistent with the presence of common synaptic inputs and orderly recruitment significantly strengthen the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Using Myomatrix recordings, the authors report that 1) motor units are recruited differently in the two types of muscles, 2) individual units are probabilistically recruited during locomotor strides, whereas the population bulk EMG provides a more reliable representation of the muscle, and 3) the recruitment of units is proportional to walking speed.

      Strengths:

      The new technique provides a unique dataset, and the data analysis is convincing and well-executed.

      Weaknesses:

      After the revision, I no longer see any apparent weaknesses in the study.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distribution of recruitment probabilities were not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
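
      The refractory-period criterion quoted above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the example spike times are hypothetical, and the only values taken from the text are the 1 ms refractory limit and the 2% ceiling on violating inter-spike intervals.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory limit."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        # Fewer than two spikes: no intervals, so no violations can be counted.
        return 0.0
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis)


def passes_isolation_check(spike_times_s, refractory_s=0.001, max_fraction=0.02):
    """True if no more than max_fraction of ISIs violate the refractory limit."""
    return refractory_violation_fraction(spike_times_s, refractory_s) <= max_fraction


# Hypothetical spike trains (seconds): one contains a 0.5 ms interval.
contaminated = [0.0, 0.010, 0.0105, 0.020, 0.030]
clean = [0.0, 0.010, 0.020, 0.030]
print(refractory_violation_fraction(contaminated))
print(passes_isolation_check(clean))
```

      A check of this kind flags units whose spike trains contain physiologically implausible intervals. It does not replace the manual cross-correlogram review described in the same passage.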

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
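
      One possible reading of the waveform-stability analysis quoted above is sketched below. It assumes that spike snippets have already been extracted from the largest-amplitude channel and grouped by trial or by 50 ms stride bin, and that stability is summarised as pairwise Pearson correlations between median waveforms. The grouping keys, function names, and the pairwise comparison are illustrative assumptions. Only the 30-spike minimum comes from the text.

```python
from itertools import combinations
from statistics import median


def median_waveform(snippets):
    """Sample-by-sample median across equal-length spike snippets."""
    return [median(samples) for samples in zip(*snippets)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (not flat-line safe)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def waveform_stability(snippets_by_group, min_spikes=30):
    """Pairwise correlations between median waveforms of well-populated groups.

    snippets_by_group maps a trial or stride-bin label to a list of snippets;
    groups with fewer than min_spikes snippets are skipped.
    """
    medians = {
        key: median_waveform(snips)
        for key, snips in snippets_by_group.items()
        if len(snips) >= min_spikes
    }
    return {
        (a, b): pearson(medians[a], medians[b])
        for a, b in combinations(sorted(medians), 2)
    }


# Hypothetical data: the same waveform shape at two amplitudes, plus a sparse bin.
template = [0.0, 1.0, 4.0, -3.0, -1.0, 0.0]
groups = {
    0: [template] * 30,
    1: [[0.8 * v for v in template]] * 30,
    2: [template] * 5,  # fewer than 30 spikes, so excluded
}
print(waveform_stability(groups))
```

      Because Pearson correlation is insensitive to amplitude scaling, this summary tracks changes in waveform shape rather than changes in overall amplitude.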

      We have included a supplementary figure to Figure 1 to highlight the effectiveness of our spike sorting.

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. These common inputs may be reflected in sparsely recruited motor units being active in overlapping rather than different strides. Indeed, as described in the following text added to the Results, we found that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units are correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary together across locomotor strides.”
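
      The comparison described above, between a unit's overall recruitment probability and its recruitment probability on strides where another unit from the same muscle was active, can be sketched as follows. The per-stride recruitment vectors and function names are hypothetical, and the Wilcoxon signed-rank test applied across all unit pairs in the response is not reproduced here.

```python
def recruitment_probability(recruited):
    """Fraction of strides on which a unit fired at least one spike."""
    return sum(recruited) / len(recruited)


def conditional_recruitment_probability(target, other):
    """P(target recruited | other recruited); None if other was never recruited."""
    co_strides = [t for t, o in zip(target, other) if o]
    if not co_strides:
        return None
    return sum(co_strides) / len(co_strides)


# Hypothetical per-stride recruitment flags (1 = recruited) for two units.
unit_a = [1, 1, 0, 0, 1, 0, 0, 0]
unit_b = [1, 1, 0, 0, 0, 0, 1, 0]

p_a = recruitment_probability(unit_a)
p_a_given_b = conditional_recruitment_probability(unit_a, unit_b)
print(p_a, p_a_given_b)
```

      If recruitment were independent across units, the conditional and unconditional probabilities would be similar on average. A conditional value that is consistently higher across pairs is what the response interprets as evidence of shared input.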

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated Author response image 1 to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4)    Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
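
For illustration, the two rate measures contrasted in this exchange can be sketched as follows. This is a minimal sketch, not the authors' analysis code: the function names, the kernel width `sigma`, and the time step `dt` are assumptions chosen only for the example. It shows why the inter-spike-interval measure is undefined for strides with fewer than two spikes, whereas the smoothed measure is defined for any spike count.

```python
import numpy as np

def peak_isi_rate(spike_times):
    """Peak instantaneous rate (Hz) as 1 / shortest inter-spike interval.

    Returns NaN when fewer than two spikes are present, since no ISI exists.
    """
    t = np.sort(np.asarray(spike_times, dtype=float))
    if t.size < 2:
        return float("nan")
    return float(1.0 / np.diff(t).min())

def peak_smoothed_rate(spike_times, t_start, t_stop, sigma=0.01, dt=0.001):
    """Peak of the spike train convolved with a unit-area Gaussian kernel (Hz).

    sigma and dt (seconds) are illustrative values, not taken from the paper.
    Defined for any spike count; returns 0.0 for a stride with no spikes.
    """
    t = np.asarray(spike_times, dtype=float)
    if t.size == 0:
        return 0.0
    grid = np.arange(t_start, t_stop + dt, dt)
    # One Gaussian bump per spike; each bump integrates to one spike.
    bumps = np.exp(-0.5 * ((grid[:, None] - t[None, :]) / sigma) ** 2)
    rate = bumps.sum(axis=1) / (sigma * np.sqrt(2.0 * np.pi))
    return float(rate.max())
```

With these definitions, a stride containing a single spike yields NaN from `peak_isi_rate` but a finite value from `peak_smoothed_rate`, which is the distinction the response draws.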

      (5)    Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships in unit activation with both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons’ operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate<sub>fast</sub>/rate<sub>slow</sub> of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)<sub>fast</sub>/p(recruitment)<sub>slow</sub> of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”
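
As an illustration of the bootstrapped error bars the Reviewer suggests, the sketch below resamples motor units with replacement and reports the spread of the resampled means. It is a minimal sketch under stated assumptions: the function name, the number of resamples, and the choice to resample whole units are hypothetical, and the source does not say how the reported ± SE values were computed.

```python
import numpy as np

def bootstrap_se_of_mean(values, n_boot=2000, seed=0):
    """Bootstrap standard error of the mean of per-unit values.

    values: one number per motor unit, e.g. a fast/slow ratio (assumed input).
    Units are resampled with replacement n_boot times.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of idx is one bootstrap resample of unit indices.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    return float(boot_means.std(ddof=1))
```

Applied separately to the per-unit rate ratios and the per-unit recruitment-probability ratios, this would give one error bar for each of the two fractional changes.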

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds is extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      Eq. 1:

      Eq. 2:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.”
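
The equations themselves did not survive in this copy of the response. The verbal description (a recruitment probability p, a Poisson rate λ, and zeros arising from either non-recruitment or a recruited unit that happens to produce zero) matches a standard zero-inflated Poisson, so the sketch below assumes that form: P(y = 0) = (1 − p) + p·exp(−λ) and P(y = k) = p·λ^k·exp(−λ)/k! for k ≥ 1. This is an assumption, not the authors' stated Eq. 1 and Eq. 2. The sketch also assumes y has been expressed as non-negative integers and replaces whatever optimizer the authors used with a coarse grid search; the function names and grids are hypothetical.

```python
import numpy as np
from math import lgamma

def zip_nll(p, lam, y):
    """Negative log-likelihood of an (assumed) zero-inflated Poisson model.

    p and lam may be scalars or broadcastable arrays; y is integer-valued.
    """
    y = np.asarray(y)
    pos = y[y > 0]
    n_zero = y.size - pos.size
    log_fact = sum(lgamma(k + 1.0) for k in pos)  # constant in (p, lam)
    ll = (n_zero * np.log((1.0 - p) + p * np.exp(-lam))
          + pos.size * (np.log(p) - lam)
          + pos.sum() * np.log(lam)
          - log_fact)
    return -ll

def fit_zip(y, p_grid=None, lam_grid=None):
    """Grid search for the (p, lam) pair that minimizes the NLL (illustrative)."""
    if p_grid is None:
        p_grid = np.linspace(0.01, 0.99, 99)
    if lam_grid is None:
        lam_grid = np.linspace(0.05, 20.0, 400)
    P, L = np.meshgrid(p_grid, lam_grid, indexing="ij")
    nll = zip_nll(P, L, y)
    i, j = np.unravel_index(np.argmin(nll), nll.shape)
    return float(p_grid[i]), float(lam_grid[j])
```

Under this form, the fraction of strides with zero spikes constrains (1 − p) + p·exp(−λ) jointly, while the non-zero strides pin down λ, which is how the model separates non-recruitment from a recruited unit with zero output.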

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p<sub>fast</sub>-p<sub>slow</sub> and 𝜆<sub>fast</sub>-𝜆<sub>slow</sub>). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
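
The pooling-and-redistribution procedure described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: for brevity the statistic is the plain fraction of strides with at least one spike rather than the fitted model parameters, and all function names, the seed, and the percentile bounds are assumptions made for the example.

```python
import numpy as np

def permutation_null(slow, fast, statistic, n_perm=1000, seed=0):
    """Null distribution of statistic(fast) - statistic(slow) under shuffled labels."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([slow, fast])
    n_slow = len(slow)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Redistribute pooled strides into groups of the original sizes.
        shuffled = rng.permutation(pooled)
        null[i] = statistic(shuffled[n_slow:]) - statistic(shuffled[:n_slow])
    return null

def exceeds_null_interval(slow, fast, statistic, **kwargs):
    """Check whether the observed difference lies outside the central 95% of the null."""
    observed = statistic(np.asarray(fast)) - statistic(np.asarray(slow))
    null = permutation_null(slow, fast, statistic, **kwargs)
    lo, hi = np.percentile(null, [2.5, 97.5])
    return bool(observed < lo or observed > hi), float(observed), (float(lo), float(hi))

def recruit_frac(spike_counts):
    """Fraction of strides with at least one spike (stand-in statistic)."""
    return float(np.mean(np.asarray(spike_counts) > 0))
```

The same scaffold would apply to the cross-muscle comparison by pooling units instead of strides and passing a slope estimator as `statistic`.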

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point and we agree that the claim made about the nervous system is too strong given the results. Even with Author response image 2 that the Reviewer suggested, there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the current dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), we cannot dissociate the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      "These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring units had >98% of spikes outside of 1 ms of each other. Moreover, as described above we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and across the rapid changes in limb position within individual stride cycles (roughly 250 ms).
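
One way to read the isolation criterion above is as a check on how many of a unit's spikes fall within 1 ms of a neighbouring spike from the same unit. The sketch below implements that reading; it is an assumption about the criterion, not the authors' code, and the function name and per-spike counting convention are hypothetical.

```python
import numpy as np

def fraction_isolated(spike_times, window=0.001):
    """Fraction of spikes with no other spike from the same unit within `window` s."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    if t.size < 2:
        return 1.0
    isi = np.diff(t)
    # Mark both spikes of any pair separated by less than the window.
    close = np.zeros(t.size, dtype=bool)
    close[1:] |= isi < window
    close[:-1] |= isi < window
    return float(1.0 - close.mean())
```

Under this reading, a unit would pass when `fraction_isolated(spike_times) > 0.98`.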

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis (Author response image 3), which shows bell-shaped distributions for some (but not all) units. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). Functional Significance of Cell Size in Spinal Motoneurons. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. eLife Assessment

      The authors use single-molecule imaging and in vivo loop-capture genomic approaches to investigate estrogen-mediated enhancer-target gene activation in human cancer cells. These potentially important results suggest that ER-alpha can, after a temporal delay, activate a non-target gene, TFF3, which is in proximity to the main target gene, TFF1, even though the estrogen-responsive enhancer does not loop with the TFF3 promoter. To explain these results, the authors invoke a transcriptional condensate model. The claims of a temporal delay and of effects of target gene transcription on non-target gene expression are supported by solid evidence, but there is no direct evidence of the role of a condensate in mediating this effect. The reviewers appreciate that the authors have done a lot of work to strengthen the study. This work will be of interest to those studying transcriptional gene regulation and hormone-aggravated cancers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion. But the authors have greatly improved the manuscript during the revision work.

      Comments on latest version:

      The authors have done a lot of work for the revision. The manuscript has been greatly improved.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bohra et al. measure the effects of estrogen-responsive gene expression upon induction on nearby target genes, using a TAD containing the genes TFF1 and TFF3 as a model. The authors propose that there is a sort of competition for transcriptional machinery between TFF1 (estrogen responsive) and TFF3 (not responsive) such that when TFF1 is activated and machinery is recruited, TFF3 is activated after a time delay. The authors attribute this time delay to transcriptional machinery that was being sequestered at TFF1 becoming available to the proximal TFF3 locus. The authors demonstrate through deletion that this activation is not dependent on contact with the TFF1 enhancer; instead, they conclude that it is dependent on a phase-separated condensate which can sequester transcriptional machinery. Although the manuscript reports an interesting observation that there is a dose dependence and time delay in the expression of TFF1 relative to TFF3, there is much room for improvement in the analysis and reporting of the data. Most importantly, there is no direct test of condensate formation at the locus in the context of this study: i.e., dissolution upon the enhancer deletion, decay in a temporal manner, and dependence of TFF1 expression on condensate formation. Using 1,6-hexanediol is not adequate to draw conclusions on the effect of condensates on a specific gene's activity, given current knowledge of its non-specificity and multitude of indirect effects. Thus, in my opinion, the major claim that the time-delayed expression of TFF3 depends on condensates is not supported by the current data.

      Strengths:

      The dependence of TFF1 expression on a single enhancer and the temporal delay in TFF3 expression are very interesting findings.

      The non-linear dependence of TFF1 and TFF3 expression on ER concentration is very interesting, with potentially broader implications.

      The combined use of smFISH, enhancer deletion, and 4C to build a coherent model is a good approach.

      Weaknesses:

      There is no direct observation of a condensate at the TFF1 and TFF3 locus, of how this condensate changes over time after E2 treatment or upon enhancer deletion, or of whether transcriptional machinery is indeed concentrated within it, nor direct support for the other claims on condensate function and formation made in the manuscript. The use of 1,6-hexanediol is not appropriate to test this idea given how broadly it acts.

      Comments on latest version:

      I don't think the response to Reviewer 2's comment on LLPS condensates on TFF1 is adequate, and given that this point is essential to the claims of the manuscript, it must be addressed. Namely, the data from Saravanan et al., 2020 actually suggest that condensate formation at the locus is not very predictive and barely enriched over random spots. The claims in the manuscript that the condensate is responsible for sequestering transcriptional machinery are quite strong and are the crux of the current model. To continue to make this claim (which I don't think is necessary since there are other possible models), the authors must test whether the condensate at this locus (1) shows time-dependent behavior, (2) is not present or is weakened at the locus in cells that show high TFF3 expression, and (3) is indeed enriched for transcriptional machinery when TFF1 peaks. The use of 1,6-hexanediol is not appropriate, as pointed out by Reviewer 2, and is no longer considered an appropriate experiment by many, as the whole notion of LLPS forming nuclear condensates is now under question. Such condensates can form through a variety of mechanisms, as reviewed for example by Mittag and Pappu (A conceptual framework for understanding phase separation and addressing open questions and challenges, Molecular Cell, 2022). Furthermore, given the distance between TFF1 and TFF3, it is hard to imagine how a condensate that concentrates machinery in a non-stoichiometric manner would not boost expression of both genes rather than being specific to one. There must be another mechanism, in my opinion.

      I would recommend the authors remove this aspect of their manuscript/model and simply report their interesting findings that are actually supported by data: The temporal delay of TFF3 expression, the dependence on ER concentration, and the enhancer dependence.

    4. Author response:

      The following is the authors’ response to the current reviews.

      We are pleased that Reviewer 3 appreciated our findings and found the temporal lag between the expression of TFF1 and TFF3 during signaling particularly interesting. The reviewer also advised us not to overemphasize that this lag arises from phase separation of ERα at the TFF1 locus, as the use of 1,6-hexanediol alone is not sufficient to conclusively establish whether ERα condensates undergo liquid–liquid phase separation. We agree with this assessment and have revised the manuscript accordingly. Specifically, we have modified the title to remove reference to phase separation and have updated the text throughout the manuscript to avoid claiming that the observed condensates are a result of phase separation. The revised title is: “Ligand-dependent Enhancer Activation Indirectly Modulates Non-target Promoters in a Chromatin Domain.”

      With these changes, we are proceeding with the Version of Record using the revised version of the manuscript.

      ———

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion. But I think they can be fixed with more effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the requirement of the enhancer for TFF1 expression in the revised manuscript (lines 368-385).

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling through qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig.1 should be adjusted to TFF3 expression to appreciate its expression changes.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal over the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h, which corroborates the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different. We speculate that biases in measurements that depend on PCR amplification could be greater for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay to detect such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snap shots in fixed cells, can authors comment on whether the same cell that expresses TFF1 at 1h, expresses TFF3 at 3h? Perhaps, the calculations taking total number of cells that express these genes at 1 and 3h would be useful.

      As pointed out by the reviewer, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and utilize live-cell imaging techniques. In a fixed-cell assay, as the reviewer suggests, it can be investigated whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify these fractions, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identify truly high-expressing cells by taking the mean plus one standard deviation (for single-cell-level data) at E2-1hr as the threshold for TFF1 (80 and above transcript counts) and the mean plus one standard deviation (for single-cell-level data) at E2-3hr as the threshold for TFF3 (36 and above transcript counts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We should note that if the transcript counts were normally distributed, a predetermined fraction would be expected to be above these thresholds, and comparable fractions could arise just from the underlying statistics. But in our experiments, this is unlikely to be the case given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, despite the fractions being comparable, we cannot be certain that it is the same set of cells that goes from high expression of TFF1 to high expression of TFF3, but that is certainly a possibility. We thank the reviewer for pointing out this comparison.

      Author response image 1.

      The graph represents the percent of cells that show high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was determined by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation over single-cell data were calculated. Mean plus one standard deviation was used to set the threshold for identifying high-expressing cells. For TFF1, as it maximally expresses at 1h, the threshold used was 80. For TFF3, as it maximally expresses at 3h, the threshold used was 36. The fractions of cells expressing above 80 and 36 for TFF1 and TFF3, respectively, were calculated from three different repeats. The mean of means and standard deviations from the three experiments are plotted here.
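The thresholding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the use of the sample (rather than population) standard deviation is an assumption, and the counts in the usage note are made up. The last helper gives the fraction expected above the same cutoff if counts were normally distributed (about 16% for mean plus one SD), which is the baseline the response alludes to.

```python
import statistics


def high_expression_threshold(counts):
    """Mean plus one standard deviation of per-cell transcript counts.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return statistics.mean(counts) + statistics.stdev(counts)


def percent_high(counts, threshold):
    """Percent of cells whose transcript count is at or above the threshold."""
    return 100.0 * sum(c >= threshold for c in counts) / len(counts)


def normal_tail_percent(k=1.0):
    """Percent of a normal distribution lying above mean + k standard deviations."""
    return 100.0 * (1.0 - statistics.NormalDist().cdf(k))
```

For example, with the hypothetical counts `[0, 0, 0, 0, 10]`, `high_expression_threshold` gives a cutoff of about 6.47 and `percent_high` reports that 20% of cells exceed it. Skewed, outlier-heavy distributions like this one can place a quite different share of cells above the cutoff than the normal-distribution baseline would suggest.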

      Authors conclude that TFF3 is not directly regulated by enhancer or estrogen receptor. Does ERa bind on TFF3 promoter? 

      The ERα ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERα, as shown in supplementary Fig. 1B and S1B. However, one peak upstream of the TFF1 promoter is visible, and it is lost at 3h.

      Minor comments:

      Reviewer’s comment -The figures would benefit from resizing of panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (line 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      A high level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 have potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. However, this phenomenon - which falls under the general description of non-monotonic dose response - is treated in great depth in the literature (e.g., PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the discussion section (lines 400-413). We thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Cote et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are just representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was first verified by comparison to manual counts. Even in the 2D representations, the extragenic intronic signals show up at similar thresholds to the transcription sites.

      The signal does not arise from non-specific background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we have analyzed the number of alleles firing/site of transcription at a given time in a cell under the three conditions. We counted the sites of transcription in a given cell and calculated the percentage of cells showing 1,2,3,4 or >4 sites. We see that the percent of cells showing a single site of transcription for TFF1 is very high in uninduced cells and this decreases at 1h. At 1h, the cells showing 2, 3 and 4 sites of transcription increase which again goes down at 3h (Author response image 2A). This agrees with the interpretation made from mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2,3 and 4 sites of transcription increase slightly at 3hr compared to uninduced and 1hr (Author response image 2B).  We can also see that several cells have no alleles firing at a given time as has been quantified in the graphs on right showing total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature on post-transcriptional splicing of RNA beyond our work, which suggests that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time in different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only 1 allele is firing at E2-1 hour (Author response image 2C) or in uninduced cells (Author response image 2D). Furthermore, Cote et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur in the case of posttranscriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts in cells do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, which results in splicing and release being kinetically uncoupled from each other (Coulon et al., 2014). 
Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and that this could depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, as opposed to Drosophila transcripts, which are shorter, in mammalian cells splicing of the terminal intron can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-Seq time course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to faster transcription of shorter genes, suggesting that rapid completion of transcription on shorter genes can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for the genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNA in the nucleus in our data corroborates the above observations. 

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus, and intron-containing pre-transcripts should be nuclear localized. Thus, intronic signals should remain localized to the nucleus, unlike the mature mRNA, which translocates to the cytoplasm after processing, so exonic signals can be found both in the nucleus and the cytoplasm. In keeping with this, we observe no signal in the cytoplasm for the intronic probes; it remains localized within the nucleus as expected, as can be seen in Author response image 2F, while exonic signals are observed in both compartments. This suggests to us that the signal is coming from true pre-transcripts. There is no reason for non-specific background labelling to remain restricted to the nucleus.

      • We observe that the mean intronic label counts for both TFF1 and TFF3 increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes decrease drastically in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically on induction and enhancer deletion suggests that the signal is not an artefact and arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect colocalization of intronic signal with exonic signals in the nucleus, while there can be exonic signals that do not colocalize with intronic, representing more mature mRNA. Indeed, we observe a clear colocalization between the intronic and exonic signals in the nucleus, while exonic signals can occur independent of intronic both in the nucleus and the cytoplasm. This clearly demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus, away from the site of transcription, is not an artefact. We hope the reviewer will agree with us. These analyses have now been included in the manuscript as Supplementary Figure 6 and have been added at lines 106-111, 201-204, 215-217, and 231-235. We thank the reviewer for raising this important point.

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF1 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted and cells with no alleles were not included in this. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF3 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted and cells with no alleles were not included in this. The graph in the middle shows the number of cells with 2, 3, 4, or greater than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. C. Images from single molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced for 1 hour with E2. The image shows that when a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from single molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. The image shows that when a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. Line profile through several transcripts in the nucleus show uniform and similar intensities indicating that these are true signals. F. 60X Representative images from a single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). The image shows that there is no intronic signal in the cytoplasm, while exonic signals can be found both in the nucleus and the cytoplasm. The scale bar is 5 microns. G. 60X Representative images from single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. The image shows that all intronic signals are colocalized with exonic signals, but all exonic signals are expectedly not colocalized with intronic signals, representing more mature mRNA. The scale bar is 5 microns.
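The per-cell tally of transcription sites described for panels A and B can be illustrated with a short Python sketch. It is a hypothetical reconstruction under stated assumptions, not the authors' pipeline: the function name and input format (one integer per cell giving the number of detected transcription sites) are invented for illustration, and the example values in the usage note are made up. As in the caption, percentages are computed only over cells with at least one active site, and cells with zero sites are reported separately.

```python
from collections import Counter

# Bin labels follow the caption: 1, 2, 3, 4, or more than 4 sites per cell.
BIN_LABELS = ["1", "2", "3", "4", ">4"]


def site_distribution(sites_per_cell):
    """Summarize transcription-site counts across cells.

    Returns (percent_by_bin, n_zero, n_nonzero), where percent_by_bin is
    computed only over cells with at least one active site.
    """
    active = [n for n in sites_per_cell if n > 0]
    n_zero = len(sites_per_cell) - len(active)
    bins = Counter(str(n) if n <= 4 else ">4" for n in active)
    if active:
        percent = {lab: 100.0 * bins[lab] / len(active) for lab in BIN_LABELS}
    else:
        # No active cells: report zeros rather than dividing by zero.
        percent = {lab: 0.0 for lab in BIN_LABELS}
    return percent, n_zero, len(active)
```

Calling `site_distribution([0, 1, 1, 2, 5, 0, 3, 1])` on these hypothetical counts reports two cells with no active site and six with at least one; half of the active cells fall in the single-site bin. Repeating the calculation per experiment and averaging would correspond to the "mean of means" in the caption.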

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson's group are particularly insightful (Rodriguez et al., 2019). We have now included this in the discussion (lines 106-111 and 420-424), where we discuss the differences and similarities between our findings and those of the Larson and Mancini groups (Patange et al., 2022; Stossi et al., 2020). 

      The 4C-Seq data from Oh et al. 2021 are fully consistent with our observations in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and only upon TFF1p deletion did TFF1e become engaged with TFF3p. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show an interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, indicating that when the primary promoter is not available, the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown, using an ImmunoFISH assay, that ERα forms condensates on the TFF1 chromatin region (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and random regions, clearly showing the appearance of a condensate at the TFF1 site. Further, deletion of TFF1e reduces the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and are disrupted by treatment with 1,6-hexanediol. These condensates are sub-micron in size, as mentioned by the reviewer, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment that causes several irreversible changes to chromatin, although we have tried to use it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite pattern of TFF1 vs. TFF3 expression upon 1,6-hexanediol treatment suggests that there is specificity. Further, ERα mutants (N-terminal IDR truncations) could be used to perturb condensates; however, the transcriptional response of these mutants is also altered owing to perturbed recruitment of coactivators that recognize the N-terminus of ER, which makes it difficult to distinguish ERα functions from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. eLife Assessment

      This important study provides a detailed characterization of individual sarcomeres' contractility and of their synchrony in spontaneously beating cardiomyocytes derived from human induced pluripotent stem cells. The combination of high-resolution tracking, statistical analysis and mesoscopic modeling leads to compelling evidence that sarcomeres operate as dynamically unstable units, leading to stochastic heterogeneities in their contraction-elongation cycles depending on substrate stiffness. The work will be relevant to scientists interested in muscle biophysics, nonlinear dynamics and synchronization phenomena in biological systems.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors present comprehensive experimental observations and a theoretical framework to explain the heterogeneous behaviour of sarcomeres in cardiomyocytes. They show that a stochastic component exists in their contractile activity, which may act as a feedback mechanism regulating physiological function.

      Strengths:

      Experiments and data analysis are robust and valid. The rigorous statistical analysis and unbiased methods enable the authors to draw well-supported conclusions that go beyond the existing literature. Their outcomes inform about cellular activity at the individual level and the authors explain how the transient dynamics of single sarcomeres are governed by a force-velocity relationship and lead to the complex contractile patterns. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in-vitro systems to extend our knowledge of out-of-equilibrium dynamics in cardiac cells.

      The suggestion that the interplay between intrinsic fluctuations and the dynamic instability is part of a feedback mechanism for maintaining structural and functional homeostasis is very interesting.

      The addition of the theoretical model and the new text of the manuscript improves the clarity of the study.

    3. Reviewer #2 (Public review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that a reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff substrates. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

    4. Reviewer #3 (Public review):

      The manuscript of Haertter and coworkers studied the variation of the length of a single sarcomere and the response of microfibrils made by sarcomeres of cardiomyocytes on soft gel substrates of varying stiffness.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining the labeling of the sarcomere Z-line using genetic manipulation and a sophisticated tracking program using machine learning. This single sarcomere analysis shows strong heterogeneities of the sarcomeres that can show fast oscillations not synchronized with the average behavior of the cell and what the authors call popping events, which are large amplitude oscillations. Another important result is the fact that cardiomyocyte contractility decreases with the substrate stiffness, although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are, to my knowledge, new and original. The authors also made a theoretical model where each sarcomere is described by a Langevin equation based on a non-linear coupling between force and velocity of the sarcomeres. This model accounts well for the experimental results including the observation of what the authors call popping events.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides a valuable characterization of individual sarcomeres' contractility and synchrony in spontaneously beating cardiomyocytes as a function of substrate stiffness. The authors, however, provide an incomplete explanation for the observed heterogeneous and stochastic dynamics, so that the work remains mainly descriptive. The work will be of interest to scientists working on muscle biophysics, nonlinear dynamics, and synchronization phenomena in biological systems.

      We appreciate the reviewer’s insightful comments. A detailed explanation of the described phenomena in the form of a theoretical model and simulations was not included in our manuscript, because we believed it would be most impactful to present a detailed quantitative statistical description of the experiments in one manuscript and then introduce the model, which we already had in preparation, in a separate manuscript to avoid diluting the overall message.

      However, following the reviewers’ advice, we have now included a comprehensive model into the revised manuscript. This model qualitatively and quantitatively explains the experimentally observed phenomena and introduces a novel class of coupled relaxation oscillators based on a non-monotonic force-velocity relationship of individual sarcomeres. We believe that this addition significantly strengthens the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors experimentally demonstrated the heterogeneous behavior of sarcomeres in cardiomyocytes and that a stochastic component exists in their contractile activity, which cancels out at the level of myofibrils.

      Strengths:

      The experiments and data analysis are robust and valid. With very good statistics and unbiased methods, they show cellular activity at the individual level and highlight the heterogeneity between biological networks. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in-vitro systems to extend our knowledge of physiology.

      Weaknesses:

      Compared to the current literature ([24]), the study does not show a high degree of innovation. It mainly confirms what has been established in the past. The authors complemented the published experiments by developing an in vitro setup with stem cells and by changing the stiffness of the substrate to simulate pathological conditions. However, the experiments they performed do not allow them to explain more than the study in [24], and the conclusions of their study are based on interpretation and speculation about the possible mechanism underlying the observations.

      We thank the reviewer for contextualizing our work with the literature. We appreciate the comparison to the study by Kobirumaki-Shimozawa et al. which we cite prominently. They observed stochastically varying beating patterns of individual sarcomeres on a beat-to-beat basis. They propose that this arises from a "titin-based mechanism" operating stochastically, which they interpret as being fundamentally linked to sarcomere-length-dependent effects. This interpretation differs from our model. We feel that the inclusion of our comprehensive model in the revised manuscript will emphasize the significance and novelty of our findings. Our work proposes a distinct alternative mechanistic explanation for the observed stochasticity, grounded in the force-velocity relationship and intrinsic stochasticity, and presents additional novel dynamic phenomena (such as popping and high-frequency oscillations) not reported in the literature yet. We outline the key advancements of our study below:

      (1) Physiologically Relevant Human Model System: Our study utilizes human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs). Using a human cell model provides direct relevance for understanding human cardiac physiology and pathophysiology, overcoming limitations inherent in translating findings from rodent models. The hiPSC-CMs exhibit key physiological differences from the mouse ventricular myocytes observed in [24], most notably beating at a significantly lower frequency (~1 Hz or 60 bpm) compared to mice (~5-8 Hz or 300-500 bpm). This difference in timescale is critical as it allowed us to resolve complex intra-beat dynamics that may be different and also harder to observe in mouse cardiomyocytes.

      (2) Advanced Experimental Methodology and Resolution: We developed a novel assay incorporating our SarcAsM algorithm for high-throughput tracking and analysis of individual sarcomere dynamics. This approach gave us spatial resolution better than 20 nm at significantly higher sampling rates than previous studies, including Kobirumaki-Shimozawa et al. Furthermore, our high-throughput in vitro approach made it possible to analyze vastly larger datasets than, e.g., the study by Kobirumaki-Shimozawa et al. (which reports observations from fewer than 20 myofibrils, encompassing less than 200 sarcomeres in total). While we recognize that in-vivo tissue studies present unique experimental challenges, the substantially greater statistical power of our study is crucial for reliably characterizing the complex, stochastic dynamics we report. The enhanced resolution and statistical robustness are not merely incremental; they enable the detailed identification and analysis of heterogeneous behaviors that were previously inaccessible or could not be characterized with the same level of confidence.

      (3) Novel Observed Phenomena: Our high-resolution data reveals specific dynamic behaviors, such as sarcomere "popping" and high-frequency oscillations during contraction, which, to our knowledge, have not been previously reported or characterized in cardiomyocytes. The resolution limitations and the high beating frequency in mouse models may not have permitted the observation of these subtle, but potentially important phenomena.

      (4) Distinct Mechanistic Explanation and Model: Kobirumaki-Shimozawa et al. propose a qualitative model where sarcomere motion variability primarily arises from length-dependent activation. This view is essentially a static one, based on a long history of isometric skeletal muscle experiments, where time-dependent forces are not relevant. We argue that in highly dynamic cardiomyocytes this may not be the most useful approach. While we acknowledge length dependence can play a role, our integrated experimental-theoretical work proposes a different primary mechanism. Our model demonstrates that the observed stochastic heterogeneity and beat-to-beat variations, including the oscillatory motion and popping, can be quantitatively explained by dynamic instabilities arising from a non-monotonic force-velocity relationship of individual sarcomeres in conjunction with intrinsic sarcomere-level stochastic fluctuations. The model emphasizes the active, transient nature of force generation rather than solely assuming length dependence. Our model provides an alternative explanation for the observed dynamics, and a quantitative, mechanism-based understanding.

      Reviewer #2 (Public Review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze the contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff substrates. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

      Weaknesses:

      Further insight into mechanisms could be provided by additional analyses and/or comparisons to mathematical models.

      We thank the reviewer for the feedback. We have enhanced the manuscript with a comprehensive dynamic model, which we also contrast with previously proposed models.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript of Haertter and coworkers studied the variation of length of a single sarcomere and the response of microfibrils made by sarcomeres of cardiomyocytes on soft gel substrates of varying stiffnesses.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining the labeling of the sarcomeres z line using genetic manipulation and a sophisticated tracking program using machine learning. This single sarcomere analysis shows strong heterogeneities of the sarcomeres that can show fast oscillations not synchronized with the average behavior of the cell and what the authors call popping events which are large amplitude oscillations. Another important result is the fact that cardiomyocyte contractility decreases with the substrate stiffness although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are to my knowledge new and original and deserve attention.

      Weaknesses:

      However, I find the manuscript a bit frustrating because the authors only give very qualitative explanations of the phenomena that they observe. They mention that popping could be explained by a nonlinear force-velocity relation of the sarcomere leading to a rapid detachment of all motors. However, they do not explicitly provide a theoretical description. How would the popping depend on the parameters and in particular on the substrate stiffness? Would the popping statistics be affected by the stiffness? It is also not clear to me how the dependence on the soft gel stiffness of the cardiomyocyte cell can be explained by the stochasticity of the sarcomere properties. Can any of the results found by the authors be explained by existing theories of cardiomyocytes? The only one I know is that of Safran and coworkers.

      I also found the paper very difficult to read. The authors should perhaps reorganize the structure of the presentation in order to highlight what the new and important results are.

      We are grateful for this detailed and critical feedback. The observed phenomena (stochastic heterogeneity, popping, high-frequency oscillatory motion) can indeed be explained by a non-monotonic force-velocity relation along with stochastic fluctuations of individual sarcomeres. At the time of initial submission of this manuscript, we already had a theoretical model in preparation, which both qualitatively and quantitatively explains the observed phenomena. As a result, we included certain interpretations preemptively, which caused some lack of clarity in the absence of the full model. We have now added the model to this manuscript, providing a mechanistic interpretation of our findings. The model differs from prior models in that it emphasizes time-dependent forces, which are typically disregarded in models built to understand isometric skeletal muscle experiments.

      We have shortened, streamlined and restructured our manuscript to improve the readability and accessibility of our study.

      Recommendations for the authors:

      There is a consensus among reviewers that the link between the stiffness dependence of the observed stochastic dynamics and the proposed tug-of-war mechanism is unclear. More quantitative support and discussion is required, possibly using theoretical modeling.

      We are grateful for the insightful and comprehensive feedback by both editor and reviewers. As suggested, we have now added a comprehensive model explaining the observed phenomena and presenting a new conceptual view on cardiac muscle dynamics.

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed an interesting question related to the dynamics of cardiac cells and their multiscale dynamics. They did a good job in terms of experimental design and data analysis. However, I fear that they do not contribute enough new information to the topic.

      The authors should refer to the study in [24] and explain better the difference between these two studies. Although the different approaches are quite obvious, it is not clear to me what additional insights they add to the problem. They conducted their experiments with different stiffnesses. However, the conclusions they draw from the study are based on speculation (e.g. about the behavior of myosin heads in relation to shortening and relaxation), while their data mainly confirm previous studies. They need to address more explicitly the novelty of their study.

      Novelty and Comparison with Previous Studies: We understand the concern about distinguishing our contribution from prior work, specifically Kobirumaki-Shimozawa et al., 2021.

      As detailed in our public response, these are the key advances:

      Use of a medically relevant human iPSC-CM model vs. mouse cardiomyocytes.

      Superior spatial and temporal resolution via our SarcAsM algorithm, revealing novel phenomena like popping and high-frequency oscillations not previously reported.

      Significantly greater statistical power due to our high-throughput in vitro assay.

      We added a distinct mechanistic explanation based on the dynamic force-velocity relationship and sarcomere-level stochasticity, contrasting with the static, deterministic titin/length-dependence focus of previous studies.

      Interpretation and Speculation: We acknowledge that without the explicit model, some interpretations in the initial submission appeared speculative. As noted in our public response, we had already started to develop a theoretical model explaining our observations at the time of submission, targeting a second follow-up publication. Including interpretations based on this unpublished model prematurely clearly caused confusion. We now include the full model in the revised manuscript.

      Integration of the Theoretical Model: We have now fully integrated the model into the revised manuscript. The model explicitly demonstrates how the non-monotonic force-velocity relationship of individual sarcomeres leads to dynamic instabilities around a critical force threshold. This instability along with stochasticity drives a 'tug-of-war' between coupled sarcomeres, generating complex emergent behaviors.

      Mechanistic Explanation Beyond Length-Dependence: Our model quantitatively reproduces all key experimental findings (stochastic heterogeneity, popping, oscillations) without relying on length-dependent activation effects. This strongly supports our conclusion that the active, transient dynamics of individual sarcomeres governed by the force-velocity relationship are fundamental drivers of these complex contractile patterns. We believe this provides a significant conceptual advance, highlighting a potentially underappreciated aspect of sarcomere dynamics. Previous models focused mostly on length-dependence, historically based on skeletal muscle fiber experiments that were often done under static, isometric conditions. We feel that the new model represents a substantial paradigm shift in understanding highly dynamic muscles such as heart muscle.

      We are confident that the inclusion of the model addresses the majority of the reviewer's concerns.

      Additional comments:

      The authors write of a tug-of-war competition between the sarcomeres, and I'm not sure what they mean by that. I would spend more words explaining this point, especially because it seems to be an important point to describe their results. Similarly, they talked about an all-or-nothing phenomenon when they described the elongation of sarcomeres. What do they mean by this?

      We have revised the manuscript where clarification was needed and now define the terms mentioned more explicitly.

      (1) "Tug-of-War": We used this term metaphorically to describe the mechanical competition between linearly coupled sarcomeres within a myofibril, especially when contracting against rigid external boundary conditions. While it is not a perfect analogy, the metaphor intuitively captures the inherent instability of this interaction: similar to how a team in a real tug-of-war might suddenly yield when one person tires and the rest of the team gets overloaded, rather than steadily losing ground, the dynamic instability arising from the non-monotonic force-velocity relationship (detailed in our model, lines 300ff) can cause individual sarcomeres to abruptly change state (e.g., shorten or rapidly lengthen) while under tension from their neighbors. We have removed the term from the title and now use it more sparingly within the manuscript to better reflect its role as an illustrative analogy.

      (2) "All-or-Nothing" Elongation (Popping): The term "popping" describes our experimental observation of sudden, rapid, and extensive elongation of individual sarcomeres. This typically occurs late in the contraction cycle during early relaxation, when overall force may be declining, but individual sarcomeres can still experience significant tension from their neighbors. We described this specific type of rapid elongation in the original manuscript as an "all-or-nothing" phenomenon because, typically, sarcomeres in these events yield rapidly and strongly overshoot their resting length without recovering in a given activation cycle. The speed of popping events is substantially higher than the speed of coordinated gradual shortening observed during systoles that is driven by bound myosin heads. This observation strongly suggests an instability-driven, avalanche-like unbinding of myosin heads from the actin filaments during these events.

      We agree that the term "all-or-nothing" is not precise, and we have removed it, as it is not essential for describing the observed "popping" dynamics.

      The authors claim that the popping frequency increases as a function of stiffness. However, Figure 4E does not really seem to be a common practice in terms of statistical significance. A better description could help to remove this doubt.

      We clarified the presentation of popping frequency data and its statistical interpretation.

      (1) Popping Frequency vs. Substrate Stiffness (previously Figure 4D, now Figure 3G):

      We first note that the dependence of popping frequency on substrate stiffness was presented in Figure 4D, not 4E. In the revised, shortened manuscript, it can now be found in Fig. 3G. Owing to the large number of observations (N) in our dataset, the slight upward trend in popping frequency with increasing substrate stiffness shown in Figure 4D does reach statistical significance using standard tests. For details, see the figure captions.

      (2) Popping Frequency vs. Sarcomere Resting Length (previously Figure 4E, now Figure 3H):

      Figure 4E addresses the relationship between popping frequency and the individual sarcomere's resting length. To generate this plot, we binned sarcomeres based on their measured resting length (in intervals of 0.02 µm) and calculated the mean popping frequency within each bin across all conditions. We have now clarified this in the figure caption.

      (3) Interpretation of Length Dependence:

      While Figure 3H clearly shows that longer sarcomeres are more prone to popping, we argue this is likely a modulating factor rather than the sole underlying cause. Two key observations support this interpretation:

      Even very short sarcomeres (e.g., < 1.65 µm resting length) exhibit a non-zero popping frequency (around 5-10%), indicating that popping is not exclusive to long sarcomeres.

      The distribution of resting lengths, now added to the graph, is narrower than the wide range (1.6-2.0 µm) plotted in Figure 3H. Popping still occurs stochastically within a myofibril of sarcomeres with relatively similar resting lengths.

      Therefore, while length clearly influences the probability of popping, the phenomenon itself appears to be fundamentally stochastic, occurring across a range of lengths. This is consistent with our model in which dynamic instabilities (driven by the non-linear force-velocity relationship) and stochastic fluctuations are the primary triggers, while length affects probability of occurrence.

      Changes in Manuscript:

      We have revised the text associated with Figures 3G and 3H to clarify the distinction between stiffness and length dependence.

      We have added a statement in the Methods section and figure legends (e.g., Legend for Fig 3) explaining our approach to statistical analysis and interpretation for large datasets where standard p-values may be less informative.

      We believe these clarifications directly address the reviewer's concerns about the data presentation and interpretation in Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting study, which however could and should be extended, see below. The current manuscript contains much less information than its length suggests; its figures contain partially redundant data.

      Taking into account this critical feedback, we have restructured, streamlined and shortened the manuscript to improve readability and accessibility.

      (1) How regular are the cellular contraction cycles?

      Have the authors computed a coefficient of variation of cycle durations?

      Does this regularity depend on substrate stiffness?

      We have substantially improved the detection accuracy of contraction intervals compared to our initial submission (for details, see the SarcAsM preprint, https://www.biorxiv.org/content/10.1101/2025.04.29.650605v1). We calculated the beating rate variability (defined as the standard deviation of cycle durations) and found a low variability, on average less than 0.05 s, across the tested conditions. The distribution of this variability is positively skewed, with the majority of values clustering near zero. We have added new panels showing these results to Fig. S2B.
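As a small illustration of the two regularity measures mentioned in this exchange, the sketch below computes both the standard deviation of cycle durations (the quantity reported in the response) and the coefficient of variation (the quantity the reviewer asked about). The function name and the example durations are assumptions, and whether the authors used the sample or the population standard deviation is not stated; the sample version is used here.

```python
import statistics

def beating_variability(cycle_durations_s):
    """Return (standard deviation in s, coefficient of variation) of cycle durations."""
    if len(cycle_durations_s) < 2:
        raise ValueError("need at least two contraction cycles")
    sd = statistics.stdev(cycle_durations_s)    # sample standard deviation
    mean = statistics.fmean(cycle_durations_s)  # mean cycle duration
    return sd, sd / mean

# Assumed example: four cycles of roughly 1 s each.
sd, cv = beating_variability([1.00, 1.02, 0.98, 1.00])
print(f"SD = {sd:.4f} s, CV = {cv:.4f}")
```

The coefficient of variation is dimensionless, which makes it easier to compare cells that beat at different rates.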

      (2) Which experiments could the authors perform to identify the origin of the apparent 3-Hz oscillations?

      Would these oscillations persist even if the cardiomyocytes would not beat?

      We now address these questions in the revised manuscript.

      (1) Active Nature: The ~3 Hz oscillations are clearly linked to active contraction. They are absent in quiescent, non-beating cardiomyocytes observed under identical conditions, confirming that they are not passive fluctuations or baseline cellular tremors.

      (2) Signal Fidelity: We are confident these are genuine physiological events, not artifacts. Our high temporal resolution (~15 ms frame time) and tracking accuracy (< 20 nm) allow reliable detection because events are well above system noise. This is now explained in the revised manuscript.

      (3) Can the authors augment their study by modeling?

      For example, could the experimental data be fitted by a Kuramoto-type model of the form d phi_i / dt = eps*sin( Omega - phi_i ) + lambda*sin( phi_i - phi_i+1 ) + xi_i, combining phase-locking of sarcomere oscillations with phase phi_i to intracellular calcium oscillations with phase Omega, and anti-phase synchronization between neighboring sarcomeres, as well as noise xi?

      If yes, how would the coupling strength depend on substrate stiffness?

      We now added a model. While a Kuramoto-type phase model is powerful for studying synchronization, we determined that a more mechanistic approach was required. Crucially, sarcomeres are mechanically coupled in series within a myofibril, and this direct physical linkage is not well-represented by the abstract, phase-based coupling of a Kuramoto model.

      Instead, our model comprises serially coupled sarcomeres, each governed by an underdamped Langevin equation. This framework allowed us to infer the force-velocity relation without any prior assumptions directly from our experimental data, revealing a critical non-monotonic characteristic. As we now emphasize in the revised manuscript, this behavior is mathematically equivalent to a Van-der-Pol relaxation oscillator, which reflects the instability-driven nature of the system.

      Furthermore, and in line with the reviewer's suggestion, our model incorporates a stochastic noise term which we found essential for reproducing the observed phenomena. Without this noise term, the characteristic sarcomere dynamics do not emerge (Fig. 5).

      (4) What is the maximally extended length of titin, and how does this length correspond to the maximal length of popping sarcomeres?

      The force-extension curves of titin have been measured in single-molecule experiments (and the packing density of titin is known) - can the authors use this information to infer the forces acting inside sarcomeres?

      We thank the reviewer for this thoughtful question. While sarcomere length during popping can be measured, inferring the corresponding intra-sarcomeric force is not straightforward in a living, contracting cardiomyocyte. The relationship between extension and force is complex and dynamic, involving multiple molecular components.

      Our data show elongations up to 0.5 μm during popping events. While this magnitude is plausibly within the extensibility range of titin and other mechanically relevant components (Caporizzo & Prosser, 2021; Loescher & Linke, 2023), directly inferring force from this observation is challenging. In such a multi-component system with both active and passive elements, total force comprises several factors that cannot be disentangled from a simple length measurement alone. First, the system is dominated by active, velocity-dependent force generation of cross-bridges, which our model shows is non-monotonic. Second, titin exhibits a restoring force that is strongly strain-rate dependent (Rief et al., 1997), critical during rapid elongation. Third, viscous drag forces within the sarcomere are also highly strain-rate dependent, contributing significantly during rapid length changes. Fourth, other structural elements such as microtubules and intermediate filaments contribute to viscoelastic properties, particularly at high strains (Caporizzo & Prosser, 2021). This complex interplay makes it impossible to map a given sarcomere length to a unique force value using single-molecule titin data alone.

      (5) I urge the authors to make their raw data openly available.

      We agree on the importance of data availability. While the complete raw imaging dataset is several hundred gigabytes and thus impractical to deposit, we have uploaded a comprehensive dataset to Zenodo to ensure full reproducibility. This repository includes a representative subset of raw imaging data (50 cells per condition), with corresponding sarcomere motion data provided in a readable JSON format. Crucially, the deposition also contains the complete aggregated data underlying all figures and statistical analyses presented in the manuscript. All provided data can be programmatically accessed and analyzed using our `SarcAsM` Python API. The data can be accessed at: https://doi.org/10.5281/zenodo.17564384.

      Minor

      (1) How did the authors determine the start and end of contraction cycles when analyzing their data?

      The start and end points of each contraction cycle were identified using ContractionNet, a custom convolutional neural network we developed for this purpose. This method, used for all analyses in the revised manuscript, detects contraction intervals with high accuracy directly from sarcomere dynamics time-series data and significantly outperforms the threshold-based approach used previously. The complete methodology, algorithm description, and validation of ContractionNet are detailed in our companion paper on the SarcAsM analysis software (www.biorxiv.org/content/10.1101/2025.04.29.650605v1, see Fig. S6).

      (2) What are the measurement errors in determining Delta_SL?

      The measurement error for the Z-band trajectories is approximately 17 nm. This high tracking accuracy is achieved with our deep-learning-based Z-band segmentation approach, which employs a 3D convolutional neural network (3D U-Net) to leverage both spatial and temporal context for robust Z-band segmentation in noisy, high-speed recordings. A full description of this validation is available in our SarcAsM companion paper (see Figure S3 therein).

      (3) Does popping occur while other sarcomeres are still contracting?

      This is an important point. Yes, popping frequently occurs while other sarcomeres within the same myofibril are still actively shortening. This simultaneity is clearly visualized in the newly added Movie M1, which displays a phase-space plot (velocity vs. length change relative to rest) for all tracked sarcomeres over time. In this visualization, popping events appear as trajectories moving into the top-right quadrant (rapid elongation), while concurrently, other sarcomeres are represented by points in the left quadrants (negative velocity), indicating ongoing shortening. We have included Movie M1 as supplementary material.

      (4) The authors argue that their data on popping sarcomeres is consistent with homogeneous popping probabilities.

      (5) Can the authors assess in simulations how dispersed the popping probabilities of individual sarcomeres could be before they would notice a statistically significant difference to the homogeneous case?

      This question touches on a key challenge in analyzing these complex dynamics. A direct statistical test of popping probability for each individual sarcomere is not feasible, as the number of events per sarcomere over our observation time is too low for robust single-unit analysis. Consequently, our approach relies on testing the cumulative distributions of inter-event spatial distances and temporal gaps across all sarcomeres within a given region (LOI).

      In nearly half of the analyzed LOIs, these cumulative distributions were statistically indistinguishable (p > 0.05) from the geometric distribution expected for a single, homogeneous stochastic process. This provides strong support for our primary conclusion that popping is fundamentally a random phenomenon.

      For the cases that deviate from the homogeneous model, we argue that this does not refute the underlying stochasticity of the events. Instead, we propose this is the expected statistical signature of pooling data from a population of sarcomeres that have slight, intrinsic variations in their individual popping probabilities due to factors like resting length or structural integrity. Even if each sarcomere's popping is a locally random event, a cumulative test performed on a population with varied baseline probabilities is expected to detect a deviation from a simple, homogeneous model.
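The pooling argument can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' test: it treats gaps between events as geometric on {1, 2, ...}, assumes each source contributes the same number of gaps to the pooled sample, and uses made-up probabilities. It compares the variance of the pooled gaps with the variance of a single geometric distribution that has the same mean.

```python
def pooled_mean_var(ps):
    """Mean and variance of gaps pooled from equally weighted geometric sources."""
    means = [1.0 / p for p in ps]
    variances = [(1.0 - p) / p**2 for p in ps]
    mean = sum(means) / len(ps)
    within = sum(variances) / len(ps)                      # average per-source variance
    between = sum((m - mean) ** 2 for m in means) / len(ps)  # spread of per-source means
    return mean, within + between                          # law of total variance

def matched_homogeneous_var(mean):
    """Variance of a single geometric distribution with the given mean."""
    p = 1.0 / mean
    return (1.0 - p) / p**2

for ps in ([0.1, 0.1], [0.05, 0.2]):
    mean, var = pooled_mean_var(ps)
    print(ps, "pooled variance:", var, "homogeneous variance:", matched_homogeneous_var(mean))
```

With identical probabilities the two variances coincide; with unequal probabilities the pooled sample is over-dispersed by twice the variance of the per-source means, which is the kind of deviation a cumulative test against a single geometric distribution can pick up.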

      Regarding the requested simulation study: While we agree this would be methodologically informative, the sensitivity to detect probability dispersion depends on multiple interacting factors (number of sarcomeres per LOI, observation time, event rates, and the assumed form of heterogeneity). Any single simulation scenario would therefore be highly model-dependent and of limited generality. Rather than introducing additional assumptions, we base our conclusions on the observed agreement with the homogeneous model in approximately half of LOIs and the correlation of deviations with measurable properties (Fig. 4E). A comprehensive statistical analysis would constitute a substantial methodological study beyond the scope of this mechanistically focused manuscript.

      (6) Can the authors measure sarcomere rest length and check if this rest length is correlated with the popping probability of individual sarcomeres?

      Yes, we performed this analysis. As shown in Figure 3H (previously Fig. 4E), we found a positive correlation between sarcomere resting length and popping frequency, confirming that longer sarcomeres have a higher probability of popping.

      Importantly, however, the popping probability remains non-zero even for shorter sarcomeres. As detailed in our response to Reviewer #1 regarding this figure, we interpret resting length as a significant modulating factor that influences popping probability, rather than the sole determinant of the phenomenon.

      (7) Several mathematical models of sarcomere contraction exist (e.g., crossbridge models).

      (8) Could the authors perform computer simulations of several such stochastic sarcomere models coupled in series?

      Alternatively, could the authors discuss this?

      As I understand, references 16-18 model myofibril contraction assuming static variability of sarcomeres, but do not account for stochasticity in the contractility of individual sarcomeres.

      We thank the reviewer for this excellent suggestion. We have performed such simulations, and the theoretical model is a central component of our revised manuscript (new Figures 4 and 5; manuscript lines 316ff).

      As the reviewer points out, previous models (e.g., refs 12 and 14 in our manuscript) have often relied on predefined static variability between sarcomeres to explain heterogeneous behavior. Our work takes a fundamentally different approach. We model the myofibril as a chain of serially coupled sarcomeres, where the dynamics of each unit are governed by an underdamped Langevin equation. This formulation inherently incorporates stochasticity and describes the interplay between a non-monotonic, velocity-dependent active force, a length-dependent passive force, and the mechanical coupling to its neighbors.

      Crucially, the model parameters were not assumed, but were instead inferred by fitting the model directly to our experimental data using a gradient-free optimization algorithm. This data-driven stochastic model was sufficient to quantitatively reproduce key observed phenomena, including high-frequency oscillations and popping events. Our central finding is that these complex behaviors emerge naturally from the coupled system, driven by the non-monotonic force-velocity relationship and intrinsic stochastic fluctuations. This demonstrates that predefined static heterogeneity is not required to explain the observed dynamics.

      (9) The manuscript could be shortened (e.g., lines 52-56 in the introduction provide little extra value).

      We have significantly revised the entire manuscript to improve clarity and readability. We have removed sentences in the introduction as suggested and substantially restructured major sections. One of the main reasons for this was the integration of our theoretical model, which was originally prepared as a separate manuscript. This required us to completely reframe the introduction and reorganize the figures and results.

      We are confident that these extensive changes have resulted in a stronger, more concise and impactful paper that now integrates our experimental findings with a theoretical model.

      (10) Figure 2 is overloaded with data. Several panels could be moved to the SM without compromising the key message.

      Introducing the notation in panels Figures 2A-C does not seem ideal to me; maybe add a cartoon?

      We agree that Fig. 2 was dense. We have redesigned panels A-F to improve clarity and better guide the reader. We now use a consistent color-coding scheme to link the extrema in the phase portraits (A-C) to the corresponding distributions of individual sarcomeres (E-G). We have also revised the accompanying text to make the figure's logic more transparent.

      We have considered moving panels A-C to the supplementary materials. However, we believe their placement in the main text is crucial for two reasons:

      (1) Revealing Core Dynamics: The length-velocity phase portrait is the first visualization that reveals the underlying near-oscillatory dynamics of individual sarcomeres. This was not an assumed behavior but a critical experimental observation that directly motivated our entire theoretical modeling effort. We now also provide animated versions of these plots (Movies X-Y) to further illustrate these complex dynamics.

      (2) Enabling Model-Experiment Comparison: A phase portrait is a standard tool for comparing experimental data with theoretical models. Retaining it in the main text allows us to directly compare data and model in our new Figures 4 and 5, providing a clear validation of our model.

      (11) Similarly, Figures 4F, G, and H seem dispensable to me.

      (I also wonder how clear the analogy of a coin flip is if a biased coin with probabilities p and 1-p needs to be used.)

      We agree that the previous Figure 4F, which served a purely illustrative purpose, was dispensable and have removed it. The "coin flip" analogy was potentially confusing and we have removed it.

      As part of a broader restructuring of the manuscript, the quantitative analyses from the original Figures 4G and 4H are now presented as Figures 3I and 3J. They provide important supporting evidence for the stochastic nature of the resulting popping events. We believe retaining this quantitative analysis is valuable, and we hope that by streamlining the figure and removing the analogy, we have addressed the reviewer's concerns.

      (12) Equation (1) is unnecessarily complicated. The same holds for Equation (2).

      It might make sense to separate definitions for serial and mutual correlations.

      (This would also simplify the axes labels in Figure 3C.)

      (13) The notation used in Equation (1) is not fully clear.

      I assume t denotes a unit-less time index and T is the unit-less duration of a contraction cycle, measured in multiples of a fixed time interval?

      Regarding comments (12) and (13):

      We thank the reviewer for these helpful suggestions. In response to comment (12), we have separated the definitions for the mutual (r_m) and serial (r_s) correlation coefficients, presenting them as distinct calculations rather than as special cases of a single, more complex formula. This makes their definitions more direct and explicit. The calculation for the serial correlation coefficient has also been streamlined into a concise inline definition.

      In response to comment (13), we have clarified the notation in Equation (1). In the manuscript text (lines 208f), we now explicitly state that 𝑡 represents the discrete, unitless time index (i.e., the frame number) within a time-series, and 𝑇 is the total number of frames (i.e., the total duration in frames) of a given contraction cycle.

      While Equation (1) itself is the standard definition for the uncentered correlation coefficient and cannot be algebraically simplified, we have added text to specify this and justify its use. This metric (equivalent to cosine similarity) is appropriate for our analysis as it assesses the similarity in the shape of motion patterns, independent of their mean values.
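For readers unfamiliar with the metric, the uncentered correlation coefficient (cosine similarity) of two equally long time series can be sketched as below. This is a generic illustration rather than the manuscript's Equation (1); the function name and the sample traces are assumptions. Unlike the Pearson coefficient, the series means are not subtracted before the comparison.

```python
import math

def uncentered_correlation(x, y):
    """Cosine similarity of two equally long sequences (no mean subtraction)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same number of frames")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for an all-zero sequence")
    return dot / (norm_x * norm_y)

# Assumed toy traces of sarcomere length change over T = 3 frames.
trace = [-0.10, -0.20, -0.05]
print(uncentered_correlation(trace, [2 * v for v in trace]))  # same shape, larger amplitude
print(uncentered_correlation(trace, [-v for v in trace]))     # mirrored motion
```

The value depends only on the shape of the two traces, not on their overall amplitude: rescaling either trace by a positive factor leaves the coefficient unchanged.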

      Finally, to further streamline the paper, we have removed the velocity correlation analysis and the corresponding parts of Figure 3.

      (14) The authors should make clear in all figures what is experiment and what is simulation.

      We have now clarified the nature of each graph in the figure captions.

      (15) The caption of Figure 3C could be simplified.

      We have simplified all figure captions.

      (16) I found Figure 3A hard to understand.

      We concluded that Figure 3A was confusing and did not add essential information to the manuscript. We have removed it entirely.

      Reviewer #3 (Recommendations For The Authors):

      In conclusion, I think that the manuscript would gain a lot if some more precise and more quantitative interpretation of the results were given. This might require a collaboration with theorists.

      We have integrated a novel theoretical framework into the revised manuscript (new Figures 4 and 5; manuscript lines 300ff), as described above.

      This new section introduces a data-driven, stochastic dynamical model that simulates the myofibril as a chain of serially coupled sarcomeres. Each sarcomere's motion is governed by an underdamped Langevin equation, a formulation that inherently accounts for stochasticity. Crucially, our model incorporates a non-monotonic force-velocity relationship inferred directly from our experimental data, rather than relying on predefined static variability between sarcomeres, a key distinction from previous theoretical work.

      This integrated model successfully and quantitatively reproduces all major experimental phenomena described in the paper, including high-frequency oscillations and stochastic "popping" events. It demonstrates that these complex behaviors emerge naturally as dynamic instabilities from the coupled system. This addition elevates the manuscript from a descriptive study to one that provides a predictive, mechanism-driven framework for understanding sarcomere dynamics.

    1. eLife Assessment

      This is a theoretical analysis that gives compelling evidence that length control of bundles of actin filaments undergoing assembly and disassembly emerges even in the absence of a length control mechanism at the individual filament level. Furthermore, the length distribution should exhibit a variance that grows quadratically with the average bundle length. The experimental data are compatible with these fundamental theoretical findings, but further investigations are necessary to make the work conclusive concerning the validity of the inferences for filamentous actin structures in cells.

    2. Reviewer #1 (Public review):

      Actin filaments and their kinetics have been the subject of extensive research, with several models for filament length control already existing in the literature. The work by Rosario et al. focuses instead on bundle length dynamics and how their fluctuations can inform us on the underlying kinetics. Surprisingly, the authors show that irrespective of the details, typical "balance point" models for filament kinetics give the wrong scaling of bundle length variance with mean length compared to experiments. Instead, the authors show that if one considers a bundle made of several individual filaments, length control for the bundle naturally emerges even in the absence of such a mechanism at the individual filament level. Furthermore, the authors show that the fluctuations of the bundle length display the same scaling with respect to the average as experimental measurements from different systems. This work constitutes a simple yet nuanced and powerful theoretical result that challenges our current understanding of actin filament kinetics and helps relate accessible experimental measurements such as actin bundle length fluctuations to their underlying kinetics. Finally, I found the manuscript to be very well written, with a particularly clear structure and development, which made it very accessible.

      Comments on revisions:

      I maintain my original favorable assessment of this manuscript.

      I thank the authors for considering my comments and for their thoughtful replies. It would have been helpful to see some of the comments reflected in the text and discussion. I leave this to the authors.

      I appreciate that the authors replaced the figures with higher-resolution versions, but I maintain my assessment that the graphical and aesthetic quality of the figures, especially the size of the legends (which are often tiny and difficult to read), labels, colors, etc., could be improved. Again, I leave this to the authors.

    3. Reviewer #2 (Public review):

      The authors present a theoretical study of the length dynamics of bundles of actin filaments. They first present a "balance point model" in which the bundle is described as an effective polymer. The corresponding assembly and disassembly rates can depend on bundle length. This model generates a steady-state bundle-length distribution with a variance that is proportional to the average bundle length. Numerical simulations confirm this analytic result. The authors then present an analysis of previously published length distributions of actin bundles in various contexts and argue that these distributions have variances that depend quadratically on the average length. They then consider a bundle of N independent filaments that each grow in an unregulated way. Defining the bundle length to be that of the longest filament, the resulting length distribution has a variance that does scale quadratically with the average bundle length.

      The manuscript is very well written, and the computations are nicely presented. The work gives fundamental insights into the length distribution of filamentous actin structures. The universal dependence of the variance on the mean length is of particular interest. It will be interesting to see in the future how many universality classes there are, and which features of a growth process determine to which class it belongs.

      Comments on revisions:

      I thank the authors for their detailed and thorough answers to the points that had been raised. I have no further recommendations.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a theoretical analysis that gives compelling evidence that length control of bundles of actin filaments undergoing assembly and disassembly emerges even in the absence of a length control mechanism at the individual filament level. Furthermore, the length distribution should exhibit a variance that grows quadratically with the average bundle length. The experimental data are compatible with these fundamental theoretical findings, but further investigations are necessary to make the work conclusive concerning the validity of the inferences for filamentous actin structures in cells.

      We think this is an excellent assessment of the article. We suggest adding a sentence after the first one: “The distribution of bundle lengths is not Gaussian but Gumbel, since the bundle length is the length of the longest filament in the bundle.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Actin filaments and their kinetics have been the subject of extensive research, with several models for filament length control already existing in the literature. The work by Rosario et al. focuses instead on bundle length dynamics and how their fluctuations can inform us of the underlying kinetics. Surprisingly, the authors show that irrespective of the details, typical "balance point" models for filament kinetics give the wrong scaling of bundle length variance with mean length compared to experiments. Instead, the authors show that if one considers a bundle made of several individual filaments, length control for the bundle naturally emerges even in the absence of such a mechanism at the individual filament level. Furthermore, the authors show that the fluctuations of the bundle length display the same scaling with respect to the average as experimental measurements from different systems. This work constitutes a simple yet nuanced and powerful theoretical result that challenges our current understanding of actin filament kinetics and helps relate accessible experimental measurements such as actin bundle length fluctuations to their underlying kinetics. Finally, I found the manuscript to be very well written, with a particularly clear structure and development which made it very accessible.

      We are grateful to Reviewer #1 for this very favorable assessment.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a theoretical study of the length dynamics of bundles of actin filaments. They first show a "balance point model" in which the bundle is described as an effective polymer. The corresponding assembly and disassembly rates can depend on bundle length. This model generates a steady-state bundle-length distribution with a variance that is proportional to the average bundle length. Numerical simulations confirm this analytic result. The authors then present an analysis of previously published length distributions of actin bundles in various contexts and argue that these distributions have variances that depend quadratically with the average length. They then consider a bundle of N-independent filaments that each grow in an unregulated way. Defining the bundle length to be that of the longest filament, the resulting length distribution has a variance that scales quadratically with the average bundle length.

      Strengths:

      The manuscript is very well written, and the computations are nicely presented. The work gives fundamental insights into the length distribution of filamentous actin structures. The universal dependence of the variance on the mean length is of particular interest. It will be interesting to see in the future, how many universality classes there are, and which features of a growth process determine to which class it belongs.

      Weaknesses:

      (1) You present the data in Fig. 3 as arguments against the balance point model. Although I agree that the data is compatible with your description of a bundle of filaments, I think that the range of mean lengths you can explore is too limited to conclusively argue against the balance point model. In most cases, your data extend over half an order of magnitude only. Could you provide a measure to quantify how much your model of independent filaments fits better than the balance point model?

      Indeed, we agree that the experimental data we present, each on their own, provide inconclusive evidence of the scaling predicted by our model. However, in aggregate, as presented in Fig. 3E, the data make for compelling evidence of scaling of the variance with the average length squared, as quantified by the power-law fit. Also, we think that Fig. 3E argues strongly against the Balance Point Model, because the data do not conform with simple linear scaling (indicated by the dashed line in Fig. 3E). Regardless, we agree with the referee that better data is needed to make a more convincing case, and we see this paper as a call to arms to collect such data in the future. The published data we used (other than our own data from experiments on yeast actin cables) is from experiments that were not designed with this question in mind, i.e., how do length fluctuations scale with the mean?

      (2) Concerning your bundled-filament model, why do you consider the polymerizing ends to be all aligned? Similarly to the opposite end, fluctuations should be present. Furthermore, it is not clear to me, where the presence of crosslinking proteins enters your description. Finally, linked to my first remark on this model, why is the longest filament determining the length of the bundle in all the biological examples you cite? I am thinking in particular about the actin cables in yeast.

      In the case of the yeast actin cables (which grow from the bud neck into the mother cell), we know that the formins that polymerize the actin filaments are spatially aligned at the bud neck. In the cases of stereocilia and microvilli, again the polymerizing ends of the actin filaments are well-aligned at the growing tips of these bundled actin structures, as indicated by classic EM studies from Lew Tilney and others. The alignment of polymerizing actin filament ends is more difficult to assess at the leading edge of lamellipodia, because of the undulating shape of the polymerization (membrane) surface. In fact, this could be the reason why data from the lamellipodia experiments deviate from the line in Fig. 3E, in contrast to the data from the other three structures (this is discussed in some detail in the Supplement). Regarding the actin crosslinkers, the only role they play in our model is keeping the filaments connected in the bundle. As for the question of why the longest filament in the actin cable is the one that specifies the length of the cable, this is addressed in more detail in our McInally et al., 2024 (PNAS) paper, where we measured cable length by segmenting the fluorescence signal of the cable. Therefore, the filaments in the bundle that extend the furthest define the reported length. Also, given the function of the cables for transporting vesicles, the furthest reach of the filaments in the bundle defines the area from which the vesicles are collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      An important result of the model proposed by the authors is that the relationship between bundle mean length and variance should also inform the number of filaments in the bundle (Equation 13). In the SI the authors thus predict from fitting experimental results that bundles should be made of around 173 filaments, which is larger than most values proposed in the literature (and quoted in this work), except for stereocilia. Can the authors comment on this?

      This is an interesting point that we have been thinking about. Indeed, the model does relate the number of filaments to the variance of the length, but this dependence is logarithmic and therefore insensitive to changes in the number of filaments. Consequently, the number 173 comes with very large error bars and should be thought of more like a few hundred filaments in terms of the precision with which we can extract this number from data. We make this point more clearly in the revised SI, where we now say that based on the data the best we can do is say that the number of filaments is between 80 and 400.

      Along the same lines, in their derivation of Equations 12 and 13 (a key result of the manuscript) the authors make some approximations that are only valid for large N (number of filaments in the bundle). Is this approximation valid for actin cables or filopodia, estimated to comprise only around 10 filaments?

      Indeed, even for N=10 filaments the approximate formulas have errors that are well below what can be measured. We consider the details of the approximation in deriving Equations 12 and 13 from the exact distribution (Equation 11) in the Supplemental section “Distribution of bundle lengths when individual filament lengths are exponentially distributed”. For example, the exact result involves the harmonic number, which for N=10 is 2.93, while the approximate formula ln(N) + gamma that we use yields 2.88, a fractional error that is < 2%.
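
      The size of this approximation error can be checked numerically. The following is a minimal sketch, not code from the manuscript: the function names are illustrative, and the only inputs are the harmonic number H_N and the large-N approximation ln(N) + gamma named in the response.

```python
import math

# Euler-Mascheroni constant (the "gamma" in ln(N) + gamma).
EULER_GAMMA = 0.5772156649015329


def harmonic(n):
    """Exact harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def harmonic_approx(n):
    """Large-n approximation H_n ~ ln(n) + gamma."""
    return math.log(n) + EULER_GAMMA


def fractional_error(n):
    """Relative error of the approximation with respect to the exact value."""
    exact = harmonic(n)
    return abs(exact - harmonic_approx(n)) / exact


# Filament numbers spanning the range discussed in the responses.
for n in (10, 80, 400):
    print(n, round(harmonic(n), 3), round(harmonic_approx(n), 3),
          f"{fractional_error(n):.2%}")
```

      The error shrinks as N grows, so N=10 is the least favourable case among the filament numbers considered here.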

      A key assumption of the model is that the bundle length corresponds to the maximum individual filament length inside the bundle. Couldn't bundles comprise several filaments one after another, head-to-tail? What do the authors expect then?

      Excellent point. Indeed, this is precisely the geometry of the yeast actin cable. In our previously published McInally et al., 2024 (PNAS) paper we worked out the math in that case and found that the main result about the variance holds. In this paper we presented a simpler model that retains the same features as the one presented in the PNAS paper to better accentuate the origins of the scaling of the variance with the mean length, which is simply the result of bundling and identifying the length of the bundle with the length of the longest filament (or, more precisely, furthest extending filament) in the bundle.

      The model also allows us to relate the bundle length fluctuations and average to the individual filament characteristic length (Equations 12 and 13 again). Can the authors comment on the values of 〈l〉 they would obtain for experimental data?

      It is hard to give a precise number, as we would also need to know the number of filaments in the bundle, and for that we would need better electron microscopy data (which has proven difficult for the field to obtain). Still, with typical numbers in the 10s to 100s, the expected average filament lengths are smaller than the average bundle length by a factor of roughly ln(10) to ln(100), i.e., 2-5 times smaller.

      I find the Methods section a bit underwhelming. In particular, can the authors give more details on their treatment of experimental data? Bootstrapping sampling is mentioned but there is no information on the size of the original data sets, which could affect the validity of such a method.

      Thanks for the criticism. We have added details regarding the sizes of the data sets used in the analysis in the Methods section.

      Along the same lines, is the graph in Figure 1E the result of a simulation like the ones the authors used to obtain their result or is it just a schematic? If the first, I would suggest replacing it with an actual simulated length trajectory. In general, I think this work would benefit from more detailed explanations and examples of how stochastic trajectories were computed and analysed.

      This is also a good point. We still prefer to keep the schematic in this figure since our goal here is to define the question before we commence with computations and data analysis. The stochastic trajectories were generated using the standard Gillespie algorithm, and the statistics of length were gathered once the length dynamics reached steady state. We explain this in the Methods section and give more details in the Supplement.
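
      To illustrate the kind of computation described, here is a minimal sketch of a Gillespie simulation for a single filament. The model is an assumption made for illustration and is not taken from the manuscript: subunits are added at a constant rate `k_on` and removed at a constant rate `k_off`, with the length bounded below by zero. The function names and the burn-in cutoff are likewise hypothetical.

```python
import random


def gillespie_filament(k_on, k_off, t_end, seed=0):
    """Simulate one filament (assumed model): +1 subunit at rate k_on,
    -1 subunit at rate k_off when length > 0. Returns event times and lengths."""
    rng = random.Random(seed)
    t, length = 0.0, 0
    times, lengths = [0.0], [0]
    while t < t_end:
        rate_add = k_on
        rate_remove = k_off if length > 0 else 0.0
        total = rate_add + rate_remove
        # Waiting time to the next reaction is exponential with the total rate.
        t += rng.expovariate(total)
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * total < rate_add:
            length += 1
        else:
            length -= 1
        times.append(t)
        lengths.append(length)
    return times, lengths


def time_averaged_length(times, lengths, t_burn_in):
    """Time-weighted mean length, discarding the transient before t_burn_in."""
    weighted, duration = 0.0, 0.0
    for i in range(len(times) - 1):
        start = max(times[i], t_burn_in)
        end = times[i + 1]
        if end > start:
            weighted += lengths[i] * (end - start)
            duration += end - start
    return weighted / duration


times, lengths = gillespie_filament(k_on=1.0, k_off=2.0, t_end=2000.0)
print(time_averaged_length(times, lengths, t_burn_in=200.0))
```

      For this assumed model with `k_on < k_off`, the steady-state length distribution is geometric with mean `rho / (1 - rho)`, where `rho = k_on / k_off`, which gives a simple check that statistics are being gathered after the transient has decayed.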

      Finally, while I find the writing in this manuscript to be excellent, I think the figures require some work. The schematics and drawings, which are very low resolution, the font size for the axes, and the choice of colours all make it more cumbersome than necessary to understand what is being shown.

      Thank you for pointing this out. We have made better versions of the figures.

      Reviewer #2 (Recommendations For The Authors):

      "In this case, the length distribution of the bundle derived from extreme value statistics, leads to a peaked non-Gaussian distribution, even when filaments within the bundle are unregulated and exponentially distributed."

      You mention "extreme value statistics" only once, in the introduction. I would suggest that you come back to this notion and explain how your results connect to extreme value statistics or delete it from the manuscript.

      Good point. We added a sentence to draw the reader’s attention to the fact that our result is an extreme value distribution (Equation 11 is the Gumbel distribution) used in statistics of extreme events.

      This is a follow-up of one of my major points of criticism: Fig. 3A: why do you fit (if I understand correctly) the blue and orange data points with the same power law? For (A-D), the data extend over less than an order of magnitude. Why is a power law fit appropriate? Can you quantify how much better your fits are compared to a linear dependence? Bundling the data of all structures yields a common master curve (with the exception of filopodia). This is quite remarkable, I think, and merits some more discussion than currently given in the manuscript.

      Good point. We should have been more clear. In Figures 3A-D we show individual data sets for the different bundle structures and compare the prediction of the Balance Point Model (dashed line) to the data. We also do a fit to a power law to show that the data is consistent with the Bundle model. This comparison is made much more clear in Figure 3E.
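
      One possible way to quantify such a comparison is to fit the exponent on log-log axes and compare the residuals with those of a fit whose exponent is fixed at 1 (linear scaling). The sketch below is an illustration under stated assumptions, not the authors' fitting procedure: the function names are hypothetical and the data are synthetic values generated for the example, not measurements from Fig. 3.

```python
import math


def fit_power_law(mean_lengths, variances):
    """Least-squares fit of log(var) = log(prefactor) + exponent * log(mean)."""
    xs = [math.log(m) for m in mean_lengths]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(y_bar - exponent * x_bar)
    return exponent, prefactor


def fit_fixed_exponent(mean_lengths, variances, exponent):
    """Best-fit prefactor when the exponent is held fixed (1 for linear scaling)."""
    n = len(mean_lengths)
    log_prefactor = sum(
        math.log(v) - exponent * math.log(m)
        for m, v in zip(mean_lengths, variances)
    ) / n
    return math.exp(log_prefactor)


def sum_squared_log_residuals(mean_lengths, variances, exponent, prefactor):
    """Sum of squared residuals of the fit, measured in log space."""
    return sum(
        (math.log(v) - math.log(prefactor) - exponent * math.log(m)) ** 2
        for m, v in zip(mean_lengths, variances)
    )


# Synthetic example data (made up for illustration): variance = 0.1 * mean^2.
synthetic_means = [1.0, 2.0, 4.0, 8.0]
synthetic_vars = [0.1 * m ** 2 for m in synthetic_means]

exponent, prefactor = fit_power_law(synthetic_means, synthetic_vars)
linear_prefactor = fit_fixed_exponent(synthetic_means, synthetic_vars, 1.0)
print("fitted exponent:", exponent)
print("free-exponent residual:",
      sum_squared_log_residuals(synthetic_means, synthetic_vars, exponent, prefactor))
print("linear residual:",
      sum_squared_log_residuals(synthetic_means, synthetic_vars, 1.0, linear_prefactor))
```

      With real data, the uncertainty on the fitted exponent (for example from bootstrap resampling) would indicate whether an exponent of 1 can be excluded.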

      Fig 1B, right does not show the addition and removal of subunits - Fig. 1C does. Panel C is not explained in the caption. The second appearance of (D) in the caption could be omitted.

      Good points. We fixed these issues in the new version of the Figure and caption.

      "For individual actin filaments (...)" I found this and the following paragraph slightly confusing at first reading: as long as you write about single filaments, do you have annealing in mind, where two filaments merge and form a longer filament? In case you consider a bundle, do you consider a filament that is cross-linked to other filaments and thereby added to the bundle? Similarly for removing filament segments (severing or unbundling)? Probably, my confusion is a consequence of you seemingly using filament to describe bundles as well as single actin filaments.

      Sorry for the confusion. We tried to be consistent throughout the text and use “filament” to denote a single actin filament and “bundle” a collection of parallel filaments crosslinked together. The assembly and disassembly dynamics of the filaments in the bundle are only relevant to the extent that they affect the length distribution of individual filaments. The main result is largely independent of that (as demonstrated in the Supplement by considering different single filament distributions) once we decide that the length of the bundle is given by the length of the longest filament in the bundle. This is the point of extreme value statistics where a universal, Gumbel distribution for the length of the longest filament in the bundle arises independent of the length distribution of a single filament (this result is akin to the Central Limit Theorem which predicts a Gaussian distribution of the mean of a large number of random numbers irrespective of the distribution they’re drawn from.)
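
      The extreme-value argument can be illustrated with a short simulation. This is a sketch under the assumptions stated in the response (independent, exponentially distributed filament lengths, and bundle length equal to the longest filament); the function names and parameter values are hypothetical. For the maximum of N independent exponential variables with mean l, the exact mean is l times the harmonic number H_N and the exact variance is l^2 times the sum of 1/k^2 for k = 1..N, which approaches l^2 * pi^2 / 6 for large N.

```python
import random
import statistics


def sample_bundle_lengths(n_filaments, mean_filament_length, n_bundles, seed=0):
    """Bundle length = longest of n_filaments exponentially distributed lengths."""
    rng = random.Random(seed)
    rate = 1.0 / mean_filament_length
    return [
        max(rng.expovariate(rate) for _ in range(n_filaments))
        for _ in range(n_bundles)
    ]


def exact_mean_and_variance(n_filaments, mean_filament_length):
    """Exact moments of the maximum of n iid exponential variables."""
    h1 = sum(1.0 / k for k in range(1, n_filaments + 1))
    h2 = sum(1.0 / k ** 2 for k in range(1, n_filaments + 1))
    return mean_filament_length * h1, mean_filament_length ** 2 * h2


for n in (10, 100):
    lengths = sample_bundle_lengths(n, mean_filament_length=1.0, n_bundles=2000)
    sim_mean = statistics.fmean(lengths)
    sim_var = statistics.pvariance(lengths)
    exact_mean, exact_var = exact_mean_and_variance(n, 1.0)
    print(n, round(sim_mean, 2), round(exact_mean, 2),
          round(sim_var, 2), round(exact_var, 2))
```

      Increasing the filament number tenfold shifts the mean bundle length only by about ln(10) filament lengths and barely changes the variance, while doubling the mean filament length quadruples the variance. This is the sense in which the variance scales with the square of the mean while depending only weakly on N.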

      "In Figure 4D, the variance of the filopodia lengths" Probably Figure 3D?

      Yes. Thank you. We fixed this.

      "The filopodia data seemingly has the same slope (...) but with variances higher than what is measured for other actin structures." This finding does not contradict the main statement of a nonlinear scaling of the variance with the mean length, right? I therefore find this discussion slightly peripheral and also confusing. Also, what is the reason to assume that EM might get the actual length of filopodia wrong by a factor of 2 to 3?

      The issue with filopodia is that the way the lengths are measured is by the extent to which the structure as a whole protrudes from the cell. This leaves unresolved the lengths of the actual filaments in the structure, and we suspect that they are longer as they extend into the cytoplasm. This would contribute to the shift off the common curve in the direction that is observed (larger variance associated with smaller average length). We have no way to justify that this would lead to a factor of 2-3, other than that this would be enough to collapse the data onto the common curve. Clearly, more careful experiments are needed to resolve the issue. We added some clarifying remarks to this effect into the discussion.

      Eq.(14) What is Z?

      Thanks for pointing out this omission. Z = L/<L> and we have added that in the formula where Z appears.

      LIST OF CHANGES

      Here we summarize the changes we made to the manuscript and the Supplementary material in response to the reviewers.

      (1) Fixed typo: Figure 1 legend had two parts labelled D which has been changed into a D and a C. The explanation of panel C has been added.

      (2) Fixed typo: The incorrect call to Figure 4D is now corrected to Figure 3D.

      (3) In the Supplementary material we made more precise our estimate of the number of filaments. The wording “From this we can estimate the number of filaments. We find, with a confidence interval of…” we have changed to “From this we can estimate the number of filaments to be between 80 and 400 which compares favourably to the typical number of filaments in the different actin structures that were analyzed.”

      (4) In the Methods section we added the number of measured filament lengths in the different data sets used in the analysis.

      (5) We made better (higher resolution) versions of all the Figures.

    1. eLife Assessment

      This valuable study explores changes in the Drosophila microbiome in response to environmental temperature over more than ten years. The evidence showing that temperature leads to diversification of bacterial clades is solid, but additional information would help clarify how subspecies competition impacts microbiome composition and the host. The work will interest researchers working with microbiomes, microbial ecology, and evolutionary biology.

    2. Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward as we can track the causes and consequences in the emergence of novel variants, but experimental evolution remains underutilized in host-microbiome interactions. Here, Gracia-Alvira et al. utilize a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomics and in vitro and in vivo experiments, Gracia-Alvira et al. attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as it is something that is too sensitive to seemingly innocuous variables (particularly in the fly microbiome), such that we cannot find generalities. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is one of the key open questions in host-microbiome evolution currently.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example. While Figure 4 is suggestive that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say under-developed because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum is low abundance throughout the cold regime. I felt like this observation was not explained, but it seems relevant to understanding the strain dynamics.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia. It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems like Mazzucco & Schlotterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think this data is also part of the data shown here?). Expanding on why or why not in this context is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model.

      Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated the functional divergence of these three bacterial clades phenotypically, including their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variance supporting such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given that it spans more than ten years. This collection of archived microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment can be limited due to its artificial experimental setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions, flies live in a more restricted ecosystem. In addition, environmental temperatures differ among different locations, but they also involve seasonal changes within the same region. This manuscript can be strengthened with further discussions that elaborate on these limitations.

      Moreover, the extent of host effects involved in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers on Drosophila medium under either cold or hot temperatures, which subsequently can saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion. Or they may want to consider some measurements. For example, measuring the growth rate of these bacteria on Drosophila medium under different temperatures, in addition to the current MRS culture experiments, or measuring the portion of the Lactiplantibacillus on Drosophila medium versus these stably colonizing fly guts.

    4. Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (years) be fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability of the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model, but rather they have integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries of a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example by conducting their longitudinal experiment, they honestly acknowledge failures important for interpretation of the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans) and therefore they state limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      In addition to that, the paper has a clearly minimalistic experimental approach to address functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes and the importance of their further investigation.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward as we can track the causes and consequences in the emergence of novel variants, but experimental evolution remains underutilized in host-microbiome interactions. Here, Gracia-Alvira et al. utilize a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomics and in vitro and in vivo experiments, Gracia-Alvira et al. attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as it is something that is too sensitive to seemingly innocuous variables (particularly in the fly microbiome), such that we cannot find generalities. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is one of the key open questions in host-microbiome evolution currently.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example.

      We thank the reviewer for the honest concern and take this opportunity to defend our claim of "subspecies competition" used across the manuscript. As the reviewer states, subspecies competition requires tracking the competitive outcomes between the three clades, and this is what we did by sampling and sequencing across ten years of experimental evolution (Figures 4 and S3). For this reason, we point out that the subspecies competition assessment comes from the direct observation of changes in relative abundance across the time series, and not from the follow-up experiments in vivo or in vitro.

      While Figure 4 is suggestive that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      We thank the reviewer for the alternative hypothesis that could explain the observed subspecies dynamic. We rule out that dominance of clade C in the cold occurs because the other two clades cannot grow in this regime based on three pieces of evidence:

      (1) In the time series, clades H and U decrease, but never disappear (Figures 4 and S3), even showing some peaks of abundance in specific replicate populations (Figure S3).

      (2) We isolated individuals belonging to clade H in the cold-evolved populations, as shown in Figure 2. This is direct evidence that clade H persists in the cold-evolved populations, although in low abundance.

      (3) We did grow the three taxa in fly food petri dishes incubated at both temperature regimes, observing growth in all cases.

      We will include the food growth experiment in the revised manuscript as further supporting evidence for growth in both regimes.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say under-developed because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum is low abundance throughout the cold regime. I felt like this observation was not explained, but it seems relevant to understanding the strain dynamics.

      We acknowledge that a strong fitness cost of clade C is observed in axenic D. melanogaster. In the native host, D. simulans, with reduced microbiome, we observed delayed development that could even be an advantage depending on the situation, as pointed out by reviewer 3 in the recommendations.

      Even if we assume that flies colonized with clade C are less fit in the experimental evolution, another caveat is whether the flies can actively select for the L. plantarum clade. Under this assumption, a clade that imposes a fitness cost on the fly (clade C) should be selected against over time because the flies colonized by this clade will have fewer offspring, or develop later than the rest. Alternatively, as the microbiome is shared among all the individuals in the population, the host might not be able to “purge” the pernicious clade, and L. plantarum dynamics might be controlled solely by the relative fitness between clades in the given experimental treatment. We will discuss this hypothesis in the revision as a way to explain the relationship between the abundance of each clade and the effect on the host.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      We thank the reviewer for the references. These observations will be compared to our phenotypic results and discussed in the revised version of the manuscript.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia.

      We appreciate the thorough reading of the methods by the reviewer. The competitive mapping comprised two steps. First, we discarded the reads that mapped to Drosophila, Wolbachia and additional potential contaminants from sequencing facilities (human, dog...). This step leaves the reads originating from the whole external microbiome of the flies, including L. plantarum. The second competitive mapping step recruits the reads that map to any clade of L. plantarum.

      It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems like Mazzucco & Schlotterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think this data is also part of the data shown here?). Expanding on why or why not in this context is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

      We acknowledge that the rest of the Drosophila microbiome is not addressed in this study, as we wanted to focus the storyline around the intraspecific dynamics found in L. plantarum. We consider that a complete characterization of the whole Drosophila microbiome would unnecessarily elongate the paper and thus we treat it as a constant biotic factor.

      We must point out that our dataset is not the one reported by Mazzucco & Schlötterer, which was done in D. melanogaster, rather than D. simulans. Nevertheless, both experiments share the same infrastructure, temperature regimes and fly maintenance.

      We will include a list of taxa that were isolated from the populations and report L. plantarum prevalence and abundance across the experiment, in order to give readers context on the microbiome beyond L. plantarum.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model. Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated the functional divergence of these three bacterial clades phenotypically, including their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variance supporting such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given its more than ten years of experimental time period. This collection of archived microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment can be limited due to its artificial experimental setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions, flies live in a more restricted ecosystem. In addition, environmental temperatures differ among different locations, but they also involve seasonal changes within the same region. This manuscript can be strengthened with further discussions that elaborate on these limitations.

      As the reviewer has correctly noted, our experimental setting is not exempt from limitations. Lab-reared flies are fed a defined standard diet. Furthermore, although the system is not completely closed to bacterial migration, migration is limited because replicate populations are not allowed to mix during the maintenance of the flies. For this reason, we consider our laboratory setting a compromise between observing wild populations, which undergo all biotic and abiotic stresses but cannot be manipulated, and evolving the bacteria in the absence of the host, or in gnotobiotic hosts, in which biotic interactions are not fully considered. We will expand on this in the new version of the manuscript.

      Moreover, the extent of host effects involved in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers on Drosophila medium under either cold or hot temperatures, which subsequently can saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion. Or they may want to consider some measurements. For example, measuring the growth rate of these bacteria on Drosophila medium under different temperatures, in addition to the current MRS culture experiments, or measuring the portion of the Lactiplantibacillus on Drosophila medium versus these stably colonizing fly guts.

      The reviewer's point was briefly addressed in the Results chapter: "Phenotypic differences in liquid culture".

      Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (years) be fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability of the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model, but rather they have integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries of a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example by conducting their longitudinal experiment, they honestly acknowledge failures important for interpretation of the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans) and therefore they state limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      We agree with the reviewer that the use of gnotobiotic animals is a limitation, as by "tuning" the flies' microbiome we are modifying the interactions between members, which can potentially change the phenotypic outcome. Nevertheless, we use it as a complementary approach, rather than the only inference in our study.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      We acknowledge that our study does not fully resolve the mechanism by which a different clade ends up dominating each temperature regime. The MRS liquid experiment was an attempt to answer whether differences in optimal growth temperature could explain the temperature-specific abundance of the two clades. Our experiments showed, however, that this was not the case. Beyond this point, it is hard to disentangle the role of the temperature, as it could also act indirectly on the bacteria, for example, through the host or the food.

      A second observation in our time series was that a third clade, U, was unfit in both regimes despite starting the experiment in high abundance. For this reason we also studied what made this clade less fit. Based on our analyses, we propose that the decrease of clade U was driven by the shift to a laboratory diet, shared by all experimental populations.

      In addition to that, the paper has a clearly minimalistic experimental approach to address functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes and the importance of their further investigation.

      We consider the reviewer's concern and will tone down the phrasing when reporting our findings in the revised version of the manuscript.

    1. eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.

    2. Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript and offer a few minor comments below that may help to further strengthen the study.

      Page 4 PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Fig. 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not other fully-engaged PIC structures.

      Page 8 Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function of the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3. Because the yeast strains used in Fig. 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      Page 11. Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      Comments on revisions:

      Revised ms clarified all my points, including those I previously misunderstood.

    3. Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Comments on revisions:

      The revised version with revisions to figures, text and new data has addressed all of our prior comments.

    4. Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module, and of Ser5 phosphorylation on the CTD of Pol II, is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      Weaknesses:

      The work is limited in scope and does not provide major insights into the mechanism of transcription. The main addition to current models of transcription is that tethering of Kin28 to Tfb3 may limit kinase action from occurring downstream from the initiation site.

      The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3 is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript, although the experiment apparently motivated the subsequent studies reported here.

      Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. It will be interesting to have this idea tested more thoroughly as more molecular evolutionary data becomes available.

      Comments on revisions:

      For the most part, the authors have satisfactorily addressed my previous critique. In particular, they have added to their discussion of evolutionary implications, and performed an experiment casting doubt on the assertion of a dominant negative effect, and as a consequence removed this claim from the manuscript. I also pointed out that the fusion experiments that lead off the Results section are missing the crucial control of including a Tfb3-Kin28 fusion. The authors have elected not to perform this control experiment, pointing out that even this control would be imperfect in some respects, and agreeing that this experiment is somewhat disconnected from the rest of the paper. The reason for including it, in spite of its somewhat tangential nature, is that it provides something of a rationale for the experiments that follow. I don't so much mind their retaining the experiment, as the absence of this control (and indeed, the results) does not so much impact the later results. However, I think if it is to be included, this shortcoming should be explicitly recognized, especially as a service to younger scientists who could benefit from an exposition that includes a thorough consideration of potential control experiments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about this manuscript and offer a few minor comments below that may help to further strengthen the study.

      We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.

      (1) Page 4

      PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Figure 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not in other fully-engaged PIC structures.

      Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:

      “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”

      (2) Page 8

      Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as the free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function of the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3.

      We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. A 2008 paper from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024) is usually cited, which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al., 2025). We looked at the multiple recently published structures of various TC-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryoEM structures. They tell us that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.

      Because the yeast strains used in Figure 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.

      (3) Page 11

      Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest the capture of the kinase module by Mediator might trigger the conformational changes in the linker. In the second half (where it says “Alternatively….”) we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make it clear these are two distinct models.

      Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Weaknesses:

      Additional data that should be easily obtainable and analysis of existing data would enable an additional test of the models presented and extract additional mechanistic insights.

      We thank the reviewer for the positive assessment and address their specific suggestions below.

      Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module and of Ser5 phosphorylation on the CTD of Pol II is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled, and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      We appreciate that the reviewer finds that our main conclusions are convincing.

      Weaknesses:

      (1) The work is limited in scope and does not provide any major insights into the mechanism of transcription. One indication of this limitation is that in the Discussion, published structural and functional results on transcription are used to support the interpretations of the results here more than current results inform previous models or findings.

      The story we present here is pretty simple, so in that sense we agree it is limited. However, we believe the findings do have mechanistic implications. That the Tfb3/Mat1 tether not only targets kinase activity to the 5’ end, but also somehow limits it from acting downstream seems significant. As for the Discussion, in our papers we always attempt to tie in our results and models with as much of the relevant published literature as possible. We believe this is more interesting, useful, and convincing than simply summarizing the Results section.

      (2) The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3, is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript.

      Our original motivation for the experiment in Figure 1 was to develop a system where we could plug different kinases into the CTD-proximal position. This didn’t work, so it is true that this negative result is somewhat unconnected to the rest of the paper. We chose to include it because it produced the unexpected observation that the Tfb3 C-term domain was not essential for viability, contradicting an earlier report. As for the suggested control of fusing Kin28, please see our reply to the editor’s comments below.

      (3) Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. This idea is supported by a single example from the literature (T. brucei). A more thorough evolutionary analysis could have tested this idea more rigorously.

      Please see our full reply to Point 5 in the editor’s comments. In short, T. brucei was the only primitive eukaryote for which we found an actual biochemical analysis of TFIIH. However, we now cite some papers reporting protein sequence comparisons for organisms not having a consensus CTD, which lend further support to our idea that fusion of a CDK to TFIIH co-evolved with the CTD very early in eukaryotic evolution.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Suggestions for Improvement:

      (1) Analyze existing Pol II ChIP-seq data to determine whether reduced TSS-proximal vs. gene-body occupancy observed with the split Tfb3 alleles reflects initiation defects, and whether different gene classes (high vs. low expression, stress-induced genes) show differential effects of splitting Tfb3.

      Thanks for the suggestion. The new analysis is included as Supplemental Figure S6. Several factors indicate an initiation defect rather than an elongation defect (either elongation processivity or elongation rate). First, the shape of the RNApII occupancy trace is flat in all mutants, arguing against a processivity defect, which would have led to a downward slope due to RNApII progressively dropping off from the gene. Because this effect is best seen on long genes (more than 2 kb), we generated metagene profiles on long, well-expressed genes only, which led to the same conclusion (see Sup Fig 6A). Second, the mutants lead to decreased RNApII occupancy, arguing against a strong decrease in elongation rate, which, if anything, would have led to an increase in RNApII during early transcription. While we cannot completely exclude the possibility of a mild decrease in elongation rate, such an effect doesn’t fit the patterns we observe. The overall decrease of RNApII occupancy is rather a strong indication of a decrease in early steps (PIC assembly or initiation).

      As requested, we looked at potential differences between gene classes two ways. First, we generated RNApII metagenes on RNApII occupancy quintiles (Q1-Q5). As shown in Sup Fig 6B, RNApII occupancy is similarly decreased in all mutants for all quintiles, demonstrating that the effect of Tfb3 splitting on transcription is not linked to expression level. Second, we generated RNApII occupancy metagenes for TFIID-regulated genes and coactivator redundant (CR) genes. This classification from the Hahn lab (doi:10.7554/eLife.50109) is very similar to the one developed by the Pugh lab (doi:10.1016/s1097-2765(04)00087-5). TFIID-regulated genes are enriched for housekeeping genes and are typically devoid of a TATA box, while the CR genes tend to be highly regulated and to contain a TATA box. As shown in Sup Fig 6C, the effect of the Tfb3 split mutants is similar on both gene classes.
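
      For readers who want to try a similar quintile comparison on their own data, the grouping step can be sketched as follows. This is a minimal illustration and not the pipeline used for Sup Fig 6B: the function names, the choice of wild-type occupancy to define the quintiles, and the per-gene mutant/WT ratio are all assumptions made for the example.

```python
import numpy as np


def occupancy_quintiles(occupancy):
    """Assign each gene to a quintile (1 = lowest, 5 = highest occupancy).

    Hypothetical helper: `occupancy` is one summary value per gene,
    e.g. mean wild-type RNApII signal over the gene body.
    """
    occupancy = np.asarray(occupancy, dtype=float)
    # Interior cut points at the 20th, 40th, 60th and 80th percentiles.
    edges = np.percentile(occupancy, [20, 40, 60, 80])
    # searchsorted returns 0..4; values equal to a cut point go to the lower bin.
    return np.searchsorted(edges, occupancy, side="left") + 1


def mean_ratio_by_quintile(wt, mutant):
    """Mean mutant/WT occupancy ratio within each wild-type quintile."""
    wt = np.asarray(wt, dtype=float)
    mutant = np.asarray(mutant, dtype=float)
    quintile = occupancy_quintiles(wt)
    ratio = mutant / wt
    return {k: float(ratio[quintile == k].mean()) for k in range(1, 6)}
```

      If the ratios returned for Q1 through Q5 are similar, the decrease in occupancy does not depend on expression level, which is the pattern described above.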

      (2) Determine whether Kin28 abundance in whole cell extracts is reduced by splitting Tfb3, as a factor in reducing its occupancies at gene promoters.

      We actually did test for Kin28 and Ccl1 levels in the extracts when we did the IP experiment shown in Fig 3. We ran the extracts next to the precipitated factors. Unfortunately, as can be seen on the bottom blot, our antibodies were not strong enough to detect either Kin28 or Ccl1 in extracts, even with WT Tfb3. Although we don’t include this inconclusive result in the final paper, we show it in Author response image 1 (note that extracts are labeled as “IgG input”).

      Author response image 1.

      (3) Include the key positive control construct of replacing the C-term of Tfb3 with Kin28 in the experiments of Figure 1.

      We elected not to do this experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformation change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but then would need to also express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we feel it’s not that useful to pursue it in this paper.

      (4) Provide direct evidence for the claimed dominant negative effect of the N-term-Linker construct by extending results in Figure 2C to compare growth of WT TFB3 cells expressing this construct vs. vector alone.

      We thank the reviewers for this suggestion. We tested this by transforming high copy plasmids expressing the different Tfb3 truncations into cells expressing the WT Tfb3. We did not see a clear dominant negative effect (some colonies were small, but many looked normal). Accordingly, in the absence of a reproducible effect, we removed this claim from the paper. In Fig 2C, the WT plasmid was transformed into cells already expressing the truncation on a high copy plasmid (the opposite order of our new experiment). It’s possible that phenotypes vary depending on which plasmid was there first (2 micron plasmids have variable copy number and can compete with each other for replication and passage during cell division). In any case, in the face of ambiguous results we no longer claim a dominant negative effect of the N-term-Linker protein. This was a minor side-point of the paper and does not affect any of our other conclusions.

      (5) Expand the evolutionary analysis to provide evidence beyond the case of T. brucei that the Tfb3-mediated connection between core and kinase modules is an evolutionary addition to the ancestral state.

      We note that the two papers we cited for the lack of a CAK module in T. brucei reached that conclusion based on purification of its TFIIH complex. We were unable to find similar biochemical studies in other primitive eukaryotes. Another way to expand the evolutionary comparison would be through sequence homology searches. We attempted to do this using various tools available at NCBI and EMBL. These show that Tfb3/Mat1 is found extensively throughout eukaryotes. Unfortunately, because the NTD of Tfb3 is a RING domain, homology searches in primitive eukaryotes yield a number of weak matches in the zinc binding motif, but no way of knowing if any of these are related to TFIIH. Similarly, searches with Cdk7/Kin28 or Cyclin H/Ccl1 pull up all CDKs and cyclins, with roughly equal statistical similarity to the yeast kinase/cyclin. Someone with more experience with evolutionary analysis would likely have better luck, but our efforts were inconclusive. However, we did find two papers from Guo and Stiller (2004 and 2005) that analyzed genome sequences available at the time and reached the conclusion that both the consensus CTD and the CAK module are absent in the evolutionary branch of primitive eukaryotes that contains T. brucei and Giardia lamblia. We also found papers identifying a putative Mat1/Tfb3 in Plasmodium falciparum, although this protein was not yet shown to be associated with TFIIH. We now cite these papers in the discussion of our evolutionary hypothesis.

      (6) Include Western blot analysis of the Tfb3 chimeras and truncations analyzed in Figures 1-2 to determine if poor expression contributes to any of the poor-growth phenotypes.

      The western blot of the Tfb3 fusions used in Figure 1 is shown in Sup Fig 1. The Tfb3 truncations are shown in the Input panel of Fig 3A (although some of these are TAP fusions, the growth phenotypes did not change with TAP-tagging). In general, all the fusions and truncations are detectable but possibly reduced relative to WT Tfb3. Note that the anti-Tfb3 antibody is a polyclonal made against recombinant Tfb3, and we don’t know that the reactive epitopes are distributed equally throughout the protein, so it’s difficult to be confident about relative quantitation with partial Tfb3 proteins.

      (7) Provide direct evidence that the N-terminal Tfb3 segment interacts exclusively with the core TFIIH module and not Kin28, analogous to the opposite results shown in Figure 3B and 4A-B for the C-terminal domain.

      This could be interesting, but we elected not to do this experiment due to time and manpower limitations. Since the N-term is unambiguously essential for viability, we can assume it retains at least some interactions with core TFIIH (unless the N-term has some other essential function that hasn’t been discovered).

      (8) Confirm that the Ser5P phosphorylation levels given by the different Tfb3-TAP immune complexes are all much higher than the background level observed with control complexes prepared with extracts expressing WT, untagged Tfb3.

      We should have done this control in Sup Fig 2B, especially since we did pull down the beads from the untagged strain as shown in panel A. We haven’t seen appreciable kinase activity when we’ve done this control in the past, so we feel confident the signals seen are not background. Therefore, we elected not to repeat this experiment.

      (9) Conduct an in vitro reconstitution comparing the activity of free kinase module and intact TFIIH on elongating RNA polymerase II in directing promoter-localized vs. downstream Ser5P accumulation.

      This would be a nice experiment, but would require a substantial amount of work that is beyond our resources at the time.

      (10) Revise the text to better emphasize any novel mechanistic insights afforded by the work and address all other minor comments/criticisms.

      Done, as addressed in all the other comment replies.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors suggest that their results support model 3, in which intact TFIIH restrains kinase activity outside the PIC. Directly testing this model would be a significant addition and would strengthen the proposed mechanism. An in vitro reconstitution comparing the activity of the free kinase module and intact TFIIH on elongating RNA polymerase II (or, at a minimum, purified Pol II) would directly test the mechanism underlying downstream Ser5P accumulation.

      Sup Fig 2 addresses this point to some extent, since the TAP pull-down of full-length Tfb3 precipitates at least some intact TFIIH, whereas the split C-term TAP constructs do not (as shown in Fig 4). However, this is not a very quantitative assay and we agree with the reviewer that a careful reconstitution, especially in the context of real transcription, would be far better. Unfortunately, this is currently beyond our capabilities. However, in the Discussion we do cite some published data arguing that association of the core TFIIH does have some inhibitory effect on the kinase module. First, in our 2002 MCB paper (Keogh et al., see Fig 7) using a GST-CTD kinase assay, we found that free kinase module (called TFIIK there) was strongly active even with a non-phosphorylatable mutation in the activating T-loop. In contrast, the same mutation inactivated CTD kinase activity in the intact TFIIH. Similarly, the Taatjes lab (Rimel et al., Genes Dev. 2020) found that free CAK was active on multiple substrates that were not phosphorylated by the full TFIIH complex.

      (2) Experiments from Carl Wu's laboratory (Nguyen et al., 2021) showed that there is a significant amount of apparently free Kin28 as well as free TFIIH in cells. Please reference and comment on this when discussing the model, suggesting that TFIIH is mostly sequestered at promoters.

      Good point. We added this to the discussion where we discuss the arguments against a sequestering model.

      (3) The existing ChIP-seq data could be analyzed more thoroughly to extract additional mechanistic insights. Specifically: (i) quantify TSS-proximal vs. gene body Pol II to determine if reduced occupancy reflects initiation defects (ii) analyze whether gene classes (high vs. low expression, stress-induced genes) show differential effects.

      Thanks for the suggestion. We did this and show the results as a new Supplemental Figure 6. No differences were found. Please see our response to the Editor’s comment #1 for a fuller description.

      (4) The complete loss of Kin28 ChIP signal in mutant strains (Figure 5B) could reflect kinase mislocalization or reduced protein abundance. Figure 3B examines TAP-purified material but does not address total cellular protein levels. Examining whole-cell extracts for Kin28 and Ccl1 in all strains would strengthen the interpretation of the ChIP results.

      As described in our response to Point 2 in the Editor’s comments section, we did do this control. Unfortunately, the Kin28 and Ccl1 antibodies were not strong enough to detect these proteins in extracts before precipitation.

      Reviewer #3 (Recommendations for the authors):

      (1) The experiment of Figure 1 should be repeated with a Tfb3-Kin28 positive control or dropped from the manuscript.

      This could be an interesting experiment, but please see our response to Editor comment #3 for why we decided to keep the figure as is.

      (2) Figure 2C legend doesn't mention linker C-term low copy construct.

      Thanks for catching that error. It is now fixed.

      (3) The claim that the N-term linker has a dominant negative effect (Figure 2C) requires direct comparison (growth on the same plate) of TFB3+ cells with and without expression of the N-term linker.

      As detailed in the response to the Editor’s comment #4, we did this test. The results did not support a dominant negative phenotype, so we removed this claim. Thanks for helping us avoid a mistake.

      (4) Page 7, "Supplementary Fig. S4A, B, promoters in green boxes" should read "Supplementary Fig. S5A, B, promoters in green boxes".

      Thanks for catching that error. It is now fixed.

      (5) Readers might be concerned that the ChIP-seq signal observed in Figure 5 and S5 could reflect an artifactual signal over highly transcribed regions. The different distributions of Rpb1, Ser5p, and Ser2p argue against this. This might be worth mentioning in the text.

      Thanks for raising this issue. “Hyper-ChIPpable” genes can be a problem in metagene analysis. We now include the analysis suggested by Reviewer 2 where we separately look at genes with different transcription frequencies. Seeing the same relative patterns regardless of expression level makes us confident that the results are not artifactual.

      (6) p. 12, "the Tfb3 the linker"; "In contrast, The N-term linker"; "suggest" should be "suggests"

      We appreciate the reviewer’s careful reading of the manuscript and have corrected these typos.

    1. eLife Assessment

      This manuscript addresses an important question in clinical neuroscience: the use of the theta/beta ratio as a biomarker of attention deficit hyperactivity disorder (ADHD). The study takes an exceptional "multiverse" analysis approach to show that aperiodic activity differences between healthy controls and people with ADHD are driving the apparent theta/beta ratio differences. From a neuroscientific perspective, this is a critical finding because it has a major impact on guiding research on the diagnosis and treatment of ADHD.

    2. Reviewer #1 (Public review):

      Summary:

      The authors address whether the theta/beta ratio (TBR) can be used as a clinical biomarker for ADHD.

      Strengths:

      The data were acquired independently from 2 separate datasets, and there are sufficient subjects for adequate statistical power. The authors applied up-to-date EEG data preprocessing, state-of-the-art feature extraction, and statistical analyses, using a multiverse approach. By testing and comparing all meaningful approaches, defined a priori in the previous meta-analysis, the authors convincingly demonstrate that TBR cannot be used as a clinical biomarker, and previous positive results can be explained by interactions between different factors (alpha peak frequency, aperiodic component, age).

      Weaknesses:

      There are no apparent issues with data, separate datasets, large sample sizes, and state-of-the-art data analysis.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript examines whether the theta-beta ratio as derived from EEG data relates to ADHD diagnoses. To do so, it performs a multiverse analysis across a large number of analytical choices, applied to a large EEG dataset, and corroborated in an additional validation set. The results overall show that the TBR is not a reliable indicator of ADHD diagnosis. In discussing the patterns of results across analytical choices, the authors also demonstrate some key points about what appears to be driving the ratio measures, noting that significant results appear to be driven by choices regarding aperiodic-correction and the use of individualized alpha frequencies, suggesting TBR measures can be affected by these features rather than reflecting theta and/or beta activity.

      Strengths:

      This manuscript addresses a clearly posed and important question in the literature, addressing a longstanding discussion on the relationship between TBR and ADHD, and uses a large dataset and an expansive analysis approach to provide a definitive answer. The strengths of the approach allow for a clear answer, providing a notable contribution to the field.

      Weaknesses:

      I find no notable weaknesses in the current manuscript nor any major issues that I think challenge the key findings of this manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Strzelczyk, Vetsch, and Langer tackle an incredibly important question in clinical neuroscience: the use of the theta/beta ratio as a biomarker of attention deficit hyperactivity disorder (ADHD). The theta/beta ratio is argued to be so reliable as an ADHD biomarker that, in the United States, the Food and Drug Administration has approved its use as a biomarker for ADHD diagnosis. However, there is mounting evidence that the theta/beta ratio is likely not really measuring the relative power between two oscillations - the theta rhythm and the beta rhythm - but rather reflects differences in a singular, non-oscillatory aperiodic process. In this very convincing study, Strzelczyk and colleagues take a "multiverse" analysis approach to show that aperiodic activity differences between healthy controls and people with ADHD are driving the apparent theta/beta ratio differences. While in a vacuum, where a measure is a measure and if it's related to a diagnosis it's still useful no matter what, this distinction might not seem important, from a neuroscientific perspective this is a critical distinction, because the ratio between two oscillations has fundamentally very different underlying physiological mechanisms than aperiodic differences, and this framing has a major impact on guiding research on the diagnosis and treatment of ADHD.
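
      The argument about aperiodic activity can be illustrated with a toy calculation. The sketch below builds a purely aperiodic (1/f-like) power spectrum with no oscillatory peaks at all and computes a band ratio from it. The band limits (4-8 Hz and 13-30 Hz), the two exponent values, and the function names are assumptions chosen for illustration; they are not taken from the manuscript.

```python
import numpy as np


def aperiodic_spectrum(freqs, exponent, offset=0.0):
    """Purely aperiodic power spectrum: power = 10**offset / freqs**exponent."""
    return 10.0 ** offset / freqs ** exponent


def theta_beta_ratio(freqs, power, theta=(4.0, 8.0), beta=(13.0, 30.0)):
    """Mean theta-band power divided by mean beta-band power.

    The band limits are illustrative defaults; definitions vary across studies.
    """
    theta_mask = (freqs >= theta[0]) & (freqs < theta[1])
    beta_mask = (freqs >= beta[0]) & (freqs < beta[1])
    return power[theta_mask].mean() / power[beta_mask].mean()


freqs = np.arange(1.0, 45.0, 0.5)

# Two spectra without any theta or beta oscillation, differing only in exponent.
tbr_shallow = theta_beta_ratio(freqs, aperiodic_spectrum(freqs, exponent=1.0))
tbr_steep = theta_beta_ratio(freqs, aperiodic_spectrum(freqs, exponent=1.5))
```

      In this toy case the steeper spectrum gives the larger ratio even though neither spectrum contains a theta or beta rhythm, so a group difference in the ratio need not reflect a difference in oscillations.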

      Strengths:

      While smaller studies and analyses have already hinted at similar results as shown here, the current study's multiverse analysis approach is comprehensive, convincing, and very well done. The large sample size of 1,499 participants is very impressive, as is the use of an independent validation sample of 381 participants.

      Overall, the technical and statistical aspects are very well done: the multiverse approach, the validation set, the resampling methods, and even the shiny apps. The authors should be applauded for being so thorough and making their data and analyses publicly accessible.

      Weaknesses:

      To be clear, I see no breaking weaknesses in the theoretical foundations, methods, statistical analyses, or interpretations. All of my recommendations below are for the sake of clarity, which I believe is especially important because this is such an important paper that many people should read.

      Comments:

      (1) Some figures are mislabeled. For example, Supplementary Figure 1 says (C) are scalp topographies, but those are (A), while (C) shows power spectra, but it's unclear what (C) is. I assume it's only the aperiodic part of the spectrum (oscillations removed)? But it would be better to plot on a log-log scale if so.

      In fact, I recommend showing all spectra on a log-log scale.

      (2) Supplementary Figure 6 is also mislabeled, saying (A) shows age (it does not) and so on.

      (3) In Supplementary Figure 7, is (B) the aperiodic-removed spectrum? The authors are very inconsistent with what they're showing in these spectral plots, and not actually explaining what they're showing: raw spectra, semi-logged or not, aperiodic-removed or oscillations-removed, etc.

      (4) For the HBN data, it is said that, "electrode impedances were kept below 40 kΩ, lower than EGI's standard recommendation of 50 (Net Station Acquisition Technical Manual)." For the validation data: "... electrode impedances were maintained below 5 kΩ." These are big impedance threshold differences. Of course, these recommendations differ by recording system, the use of active electrodes, and so on. But such differences can certainly influence signal-to-noise. The fact that the results are so consistent between them is a strength that perhaps should be explicitly called out.

      (5) The authors cite a lot of foundational / related work here, such as Finley et al, but they should also cite several other highly relevant ones:

      - Saad et al., "Is the Theta/Beta EEG Marker for ADHD Inherently Flawed?", J Atten Disord, 2015

      - Donoghue, Dominguez, Voytek, "Electrophysiological frequency band ratio measures conflate periodic and aperiodic neural activity", eNeuro, 2020

      - Karalunas et al., "Electroencephalogram aperiodic power spectral slope can be reliably measured and predicts ADHD risk in early development", Develop Psychobiol, 2022

      - Donoghue, "A systematic review of aperiodic neural activity in clinical investigations", Eur J Neurosci 2025

    1. eLife Assessment

      This valuable study presents a comprehensive comparison of human and macaque monkey behavior across a range of visual perceptual phenomena. The use of a unified oddball visual search paradigm enables direct cross-species comparison while minimizing task-related confounds. It provides solid evidence that visual perception is largely similar between these two species, with some interesting exceptions. These insights into qualitative and quantitative differences between species are relevant for evaluating macaques as a model organism for understanding human vision.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to conduct a behavioral comparison of macaque and human vision across a wide range of visual properties. Such a comparison is critical for evaluating the use of macaques as a model system for understanding human vision and the underlying neural mechanisms. This goal represents a unique endeavour since prior studies have typically focused on only highly specific tasks. While the authors found consistent coarse representational structure for objects, evidence for Weber's Law and amodal completion, there was divergence for mirror image confusion and the use of global or local image properties.

      Strengths:

      There are three major strengths of the study. First, the authors employed a behavioral paradigm (oddball search) that allowed them to test multiple perceptual phenomena without having to train the macaques on the specific type of stimuli tested. Second, humans and macaques could be tested in an identical manner. Third, the authors tested a range of different visual properties and phenomena, allowing a broader comparison between species.

      There are also some weaknesses to the study (described below), but that doesn't change the fact that the authors have demonstrated and validated a novel approach for systematic and comprehensive comparisons of vision across species.

      Weaknesses:

      The weaknesses of the study arise in part because of the breadth of the work. In cases where there are divergences between the two species, it would be helpful to know what might account for such divergence, to have more depth. Is it really a species difference, or could there be a different account? For example, does the difference in mirror image confusion arise because the stimuli were objects that would have been highly familiar to the humans but not the macaques? Further, the authors often used small sets of stimuli (e.g. 8 objects only in the test of object similarity; a small set of highly specific occlusion stimuli), and how well the findings will generalize beyond those stimuli is unclear.

      The authors discuss the implications of training macaques to perform specific tasks on specific stimuli in comparing across species. While I agree that extensive training in monkeys could change perception, it is important to also consider that humans have been extensively trained through the types of visual tasks we conduct throughout our lives, so I'm not sure it is universally true that the best comparison is between humans and untrained monkeys. But this consideration just highlights the general problem of comparing across species.

    3. Reviewer #2 (Public review):

      Summary:

      The macaque monkey is often considered the animal model of choice to study the neural correlates of visual perception, due to the close similarities to humans in terms of anatomy, physiology and behaviour (Van Essen and Dierker, 2007; DiCarlo et al., 2012; Roelfsema and Treue, 2014; Picaud et al., 2019; Van Essen et al., 2019; Hesse and Tsao, 2020). Quite a few studies have been performed to compare the behaviour of macaque monkeys and humans on visual perception tasks. However, it remains difficult to compare the results of these studies, as the methods used differ significantly between them. Furthermore, behavioural studies of macaque monkeys often involve extensive training as the tasks were relatively hard, making it difficult to compare the results with humans, who generally require very little training. The authors present a set of experiments to compare visual perception between macaque monkeys and humans, using the exact same behavioral task that is easy to learn and therefore requires very little training. As expected, they overall find that the two species behave similarly. However, they find a number of interesting exceptions.

      Strengths:

      A major strength of the current study is the relatively large number of tasks that were tested in the same subjects. This is made possible by using the oddball visual search task, which macaque monkeys can learn very quickly. This means that few trials are sufficient to obtain a significant difference between conditions, minimizing learning effects. Although this type of task has been used in previous studies (Sablé-Meyer et al., 2021), the current manuscript makes better use of the advantages and explains them more explicitly.

      In addition, the study finds a number of interesting differences between macaque monkeys and humans. In particular, while humans can dissociate horizontally mirrored images better than vertically mirrored images, monkeys show no difference between these two conditions (Experiment 4). Also, while humans dissociate images better based on the global shape of a stimulus, monkeys dissociate images better based on local elements of a stimulus (Experiments 5 and 6). Although these findings are largely a replication of previous results, they have not yet been studied together with other tasks within the same individual subjects, and the low number of trials avoids any learning effects.

      Weaknesses:

      A weakness of the study is that while the objects that were used can be considered to be familiar to humans, they are not familiar to macaque monkeys.

      In Experiment 4, humans can be expected to have 3D representations of familiar objects such as a Roman helmet or an office chair. Humans can therefore be expected to have view-invariant representations of these objects, predominantly for rotations around the vertical axis of the object (as movements are most common in the horizontal plane). This can explain why only humans confuse objects more often when mirrored vertically than when mirrored horizontally.

      Similarly, in Experiment 5, humans can be expected to be familiar with abstract geometric shapes such as squares and circles, while monkeys likely are not. This could explain why monkeys find it hard to recognize these geometric shapes in the global shape of the stimuli, even when thin grey lines are drawn to connect the local elements that constitute the global shape (Experiment 6). Instead, the combination of local shapes can be expected to form a texture that might be more easily recognized by the monkeys.

      More generally, as proposed by Fagot et al, it might well be that monkeys tend to conceive stimuli as a combination of low-level visual features, instead of as references to objects in the outside world, as humans have learned to do (Fagot et al., 2010). This line of critique would be relevant to take into account.

      Another weakness could be that only three monkeys are tested, while 24 human subjects are tested. According to some theoretical work, a finding in 3 animals is not sufficient to make a claim about an animal species (Fries and Maris, 2022). However, it seems that the results are largely consistent between the different monkeys. Moreover, the results generally agree with the results from previous literature.

      The conclusions by the authors are therefore largely supported by the results. Some results could be strengthened by additional experiments, or at least a more extensive discussion of the potential weaknesses.

      The potential impact of the paper is significant, as a start of a comprehensive comparison of visual perception between humans and macaque monkeys, which can inspire other labs to contribute. This comparison can also be extended to other animal species (e.g. crows and rodents), as well as to different types of artificial neural networks (Leibo et al., 2018).

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Cherian and colleagues compare visual perception between humans and monkeys using a common oddball visual search task across a battery of perceptual phenomena. By keeping the task constant and varying only the stimulus sets, the authors aim to isolate perceptual similarities and differences between species. Across six experiments, they report that monkeys and humans share similarities in coarse object representations, Weber's law, and amodal completion, but differ in mirror confusion and global/local processing.

      Strengths:

      A major strength of the study is the unified experimental framework. The authors designed the experiments such that the task procedures are identical across conditions and species, differing only in the images shown. This is a significant methodological advantage, as it minimizes task-related confounds that often complicate cross-species and cross-experiment comparisons. As a result, observed similarities and differences can be more directly attributed to perceptual processes rather than differences in training or task demands. This allows for a more comprehensive evaluation of visual perception than is typical in the literature, where individual studies often focus on a single effect with specialized training. The data are carefully collected, and the analyses are systematic and appropriate for the questions posed.

      Weaknesses:

      Despite its strengths, the study is largely descriptive and provides limited mechanistic or theoretical explanation for the observed similarities and differences. While the authors document several convergences and divergences between humans and monkeys, there is relatively little discussion of why these patterns arise or how they relate to existing theories of visual processing. As a result, it is difficult to assess the broader implications or generalizability of the findings beyond the specific task and stimuli used.

      Relatedly, the rationale for selecting the particular set of perceptual phenomena is not fully developed. Some tasks appear motivated by prior work comparing humans and deep neural networks, but it is unclear whether this set constitutes a representative or theoretically grounded sampling of visual perception. Without a clearer justification, it is difficult to interpret the absence or presence of specific effects (e.g., mirror confusion or global advantage) as reflecting fundamental species similarities/differences.

    1. eLife Assessment

      This valuable study examined the geometry of visual object representations across hierarchically organized stages of the mouse visual cortex. The use of large-scale training and recording techniques provides solid evidence for changes along the hierarchy that may contribute to invariant object recognition. These findings, particularly if they could be supported by further analyses and clarifications to rule out alternative explanations, including influences of low-level features on behavior and neural activity, help establish the potential usefulness of the mouse to understand the neural basis of object recognition.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a complex series of studies that measure and explain object recognition in mice. The authors demonstrate that mice are capable of solving an object recognition task, that object identity is decodable in different regions of cortex, and that this decodability can, to some extent, be captured by extant theory on object manifolds in deep neural networks. The authors further add some correlational analysis of the time courses of object discriminability to bolster their claim of an object processing hierarchy in the mouse cortex.

      The behavioral and neural data described in this paper are likely to be of interest to the general neuroscience community. That said, I have some issues with the analyses, modeling, and image dataset that I'll detail below.

      Strengths:

      (1) The behavioral work is incredibly cool. Getting mice to solve this task is a real achievement and opens up new avenues for mice as a model for complex visual tasks.

      (2) Similarly, the neural recordings are astounding in their scale.

      (3) This could be the most complete demonstration of a primate-analogous object processing network in the mouse.

      Weaknesses:

      No new weaknesses were noted by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      The paper argues that mice are capable of some view-invariant object recognition and that some of their visual areas (especially LM, LI, and AL) carry linearly-decodable signals that could, in principle, help in this process. Further, it argues that the population code in those areas makes linear decodability easier in two ways (fewer dimensions and a smaller radius).

      Strengths:

      It is very useful to see the performance of the mice in this difficult task, and to compare it to the performance of neurons in the mouse visual system. It is also useful to see analyses of the neural code that seek to understand how the code in some visual areas may be particularly suited to decoding object identity.

      Weaknesses:

      Though the paper has improved from the previous submission, there are still some open questions, especially about whether some lower-level properties of the neurons (such as receptive field location) might explain the differences between visual areas. This and other concerns are outlined below.

      (1) Do the signals from the visual areas outperform or underperform the mice? It is hard to tell, because for mice we get numbers in percent correct (Figure 1e, based on 2 alternatives), whereas for visual areas we get numbers in bits (Figure 2c, where it is not clear whether there are 2 or 4 alternatives). This makes it very hard to compare the two. The authors should provide a statement or figure where readers can compare the two. Also, if the behavioral data are obtained with 2AFC, why not run the analyses as 2AFC too?

      (2) Differences in discriminability across objects (Figure 1f). Are these differences also seen for the model based on the difference of Gaussians? (The authors should add those predictions to the plot.) If so, this could further point to possible low-level explanations. It is already quite interesting that the difference of Gaussians model predicts ~58% accuracy, which is not far from the ~65% accuracy of the mice.

      (3) Similarly, in a later figure about decoding visual cortical activity, the authors should show a similar breakdown by object. Are certain objects more decodable than others?

      (4) Number of neurons. It is wonderful to see so many neurons (489182, i.e., an average of ~15k per mouse). But might the same neurons have been recorded multiple times? Has a tool like ROICat or similar been run to exclude this? If not, that is ok, but the authors should add a sentence in Results to indicate that these are not unique neurons (some neurons may be duplicates or triplicates).

      (5) Retinotopy: "within the same ∼20° area of visual space". This is a useful analysis, but which 20 deg area was considered? Was it the one in front of the mice? This would be surprising, because some of the regions do not cover that area (Zhuang et al, eLife 2017). Was a different area chosen? What are its coordinates in azimuth and elevation? And how does it compare to the region where the stimulus was shown during imaging? The Methods do not explain where the stimulus was placed (only that it was in front of the left eye). This information should be added. Also, the screen covered ~120 deg of visual space (63 cm monitor placed 15 cm away), so the emphasis on a 20 deg area is not clear. The authors should provide a figure showing coverage of the screen by each visual area and the position of the stimuli presented during imaging.

      (6) If during imaging the stimuli were presented slightly above the horizontal meridian, then a possible explanation for the superiority of LM, AL, and LI is that their receptive fields tend to be in the upper visual field, whereas the rest of the higher visual areas tend to have receptive fields in the lower visual field (Zhuang et al, eLife 2017).

      (7) Dimensionality: "number of directions in which this variability is spread". Unless I missed the explanation, the Methods don't provide any information on how the dimensionality is computed. Is it done with cross-validation? If not, noise can be interpreted as having high dimension. There are methods to estimate dimensionality with cross-validation, thus excluding the contribution of noise (e.g., Stringer et al 2019). The authors should confirm that this was done with cross-validation and provide information in the Methods.

      (8) Temporal dynamics: "evidence for temporal integration during a trial". Are there really dynamics in the visual responses that last on the scale of seconds? This would be remarkable. Image recognition is usually thought to be done in 100 ms. The long scales presented here are more likely associated with behavioral responses or state responses, or similar. Might there be different brain state correlates in the different cases? For instance, pupil dilation might be different.

      (9) Methods: "to ensure animals were in an attentive state (eyes clear and open)". This sounds peculiar. Did the mice ever close their eyes? If so, that's a discovery. Mice keep their eyes open at all times, even when they are sleeping. So, using eye closure for online detection of "inattentive states" does not seem to make sense. (Also, and this is a minor point: why stop a scan when the animal is "inattentive"? Wouldn't one want to acquire the associated data for comparison? Is the point to save disk space?). This whole set of statements is a bit concerning.
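
      On point (1) above, one hedged way to put behavior and neural decoding on a common scale is to convert a two-alternative percent correct into bits. The sketch below assumes balanced classes and symmetric errors (a binary symmetric channel), so that the information is 1 - H2(p). The function names are illustrative only, and the bits reported in the paper may have been computed differently (for example, over 4 alternatives).

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; taken as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def accuracy_to_bits(p_correct: float) -> float:
    """Mutual information in bits for a balanced two-alternative task.

    Assumption (not from the paper): both alternatives are equally likely
    and errors are symmetric, so information = 1 - H2(p_correct).
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    return 1.0 - binary_entropy(p_correct)


if __name__ == "__main__":
    # Accuracies chosen to bracket the ~58% and ~65% values mentioned in point (2).
    for p in (0.50, 0.58, 0.65, 1.00):
        print(f"{p:.2f} correct -> {accuracy_to_bits(p):.3f} bits")
```

      Under these assumptions, accuracies in the 58-65% range correspond to well under 0.1 bits, so a direct side-by-side comparison requires that both quantities be expressed in the same units.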

    1. eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in shaping the tumor microenvironment in lung cancer. The solid findings of this study provide an essential foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

    2. Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in shaping the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies for BRCA-driven tumors. For example, the authors found that histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Comments on revisions:

      The authors have revised the manuscript accordingly. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Comments on revisions:

      The authors have addressed my concerns satisfactorily.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in shaping the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We carefully considered each comment from the two reviewers and revised the manuscript accordingly. Below, we provide a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in shaping the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies for BRCA-driven tumors. For example, the authors found that histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that the specific nature of epigenetic dysregulation in LUAD needs to be explored. We believe that future investigations using clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional details regarding the methods are necessary for clarity and reproducibility. We have expanded these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for the grammatical errors. In the revised manuscript, we carefully checked the grammar and made corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewer's comment. We have added labels to the revised version of the figures.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we added immunological experiments and validation based on pathological tissue sections from lung adenocarcinoma patients. In addition, we elaborated on the limitations of our study in the Discussion section and provided reasonable explanations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract includes a lot of abbreviations, which makes it difficult to follow. For example, "IFN" is not defined. And "HRR" is defined but used only once in the abstract. This issue also appears in other parts, such as "OAK" on page 5, line 114; "DFS" on page 15, line 398; and "DSBs" on page 20, line 558. Please try to avoid unnecessary abbreviations.

      Thank you for highlighting this. We have revised the manuscript to minimize the use of abbreviations. Specifically, we have now defined all necessary abbreviations upon first mention (including 'IFN') and have removed or spelled out those used infrequently to ensure the text flows more smoothly for the reader.

      (2) Page 5, line 129, what data type is used in this part of the analysis?

      We apologize for the oversight. The revised manuscript now states that whole-exome sequencing data were used here.

      Materials and methods, page 6, lines 131-132: “The raw reads (fastq) of whole exome sequencing were pre-processed and trimmed with fastp (Version: 0.23.4) based on default parameters.”

      (3) Page 6, line 138, Add citation for ANNOVAR.

      Thank you for your suggestion. We have added a citation for ANNOVAR in the revised manuscript.

      (4) Page 8, line 211, what cutoff is used to define the significant markers?

      Thank you for your insightful comment. We now provide the cutoff used to define significant markers.

      Materials and methods, page 8, lines 213-215: “Differential expression genes for specific clusters were identified using the “FindMarkers” function, with a threshold of |avg_log2FC| ≥ 0.5 and adjusted P-value ≤ 0.01.”

      (5) Page 11, line 276, HEK293T is not a lung cancer cell line. It would be better to label the details of this cell line.

      Thank you for your correction. We have now clarified HEK293T in the text by stating: 'human embryonic kidney cell line HEK293T'.

      Materials and methods, page 11, lines 277-278: “The human lung cancer cell line A549 (#SCSP-503) and the human embryonic kidney cell line HEK293T (#SCSP-502) were purchased from the Type Culture Collection of the Chinese Academy of Sciences, China.”

      (6) Page 16, line 415, what samples and how many individuals were used for the exome sequencing?

      We agree that specifying the sample set is crucial. The exome sequencing was conducted on 2 individuals (four samples). The samples used were tumor tissues (2 samples) and matched blood (2 samples). This information has been clarified in the revised manuscript.

      Results section, page 16, lines 415-416: “Exome sequencing was performed on four samples from two individuals: two tumor tissues and two matched blood samples.”

      (7) Page 17, line 468, Replace "Differently" with "In contrast" (more appropriate for scientific writing).

      Thank you for pointing this out. We agree that "In contrast" is more appropriate for scientific writing. Accordingly, we have replaced "Differently" with "In contrast" in this sentence (Results section, page 18, line 483).

      (8) Page 18, line 489, what is HMG?

      Thank you for pointing this out. HMG stands for High Mobility Group. We have clarified this by writing out the full term upon first mention in the manuscript (Results section, page 19, line 503).

      (9) Page 19, line 527, check the grammar for this sentence.

      We appreciate your careful reading. We have carefully rephrased this sentence to ensure clarity and grammatical accuracy.

      Results section, page 20, line 540: “Based on pseudotime order, we divided trajectories into 10 bins and analyze the activity changes of related features.”

      (10) Page 20, line 541-546. It would be better to split this long sentence into smaller ones.

      Thank you for your insightful comment. We have revised the text, splitting the long sentence into smaller ones for better clarity.

      Results section, page 20, lines 554-559: “MHC class I and II molecules showed increased activity in late pseudotime in BRCA1- and BRCA2-mutant cells, respectively (Fig. 4G-I). This pattern was also reflected in the cell density analysis (Fig. 4J). Furthermore, CD8<sup>+</sup> Tcm and Th1 signatures exhibited higher activity in late pseudotime in BRCA1- and BRCA2-mutant cells, respectively (Fig. S5F-G). These findings suggest a differential association with CD8<sup>+</sup> versus CD4<sup>+</sup> T cell engagement.”

      (11) Page 20, line 550, remove "." after "of".

      Thank you for catching this. We have removed it (Results section, Page 21, line 563).

      (12) Page 22, line 592, what is "LME"?

      Thank you for pointing this out. "LME" was indeed redundant in the original manuscript, so we have removed it in the revised version (Results section, Page 22, lines 607-609).

      (13) Page 24, line 674, Replace "suggest" with "suggested"?

      We apologize for our negligence. In the revised manuscript, we have replaced "suggest" with "suggested" (Results section, Page 25, lines 691-693).

      (14) Page 35, Figure 1I, Use "B cells" instead of "B".

      Thank you for your detailed review. We have changed to the appropriate label in Figure 1I.

      (15) Page 36, Figure 2H, the statistics and p-value need to be shown.

      Thank you for your suggestion. We have added the statistical analysis for Figure 2H, and the p-values are indicated in the revised figure.
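
      For readers who want to apply the marker cutoff quoted in response (4) above, the following is a minimal sketch of the filtering rule only, not the authors' pipeline. It assumes a table with the `avg_log2FC` and `p_val_adj` columns that Seurat's "FindMarkers" returns; the `MarkerRow` type, the function name, and the example rows are hypothetical and are not data from the study.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as quoted in the response: |avg_log2FC| >= 0.5 and adjusted P <= 0.01.
MIN_ABS_LOG2FC = 0.5
MAX_ADJ_P = 0.01


@dataclass
class MarkerRow:
    """One row of a differential-expression table (hypothetical container)."""
    gene: str
    avg_log2FC: float
    p_val_adj: float


def filter_markers(rows: List[MarkerRow]) -> List[MarkerRow]:
    """Keep rows that pass both the fold-change and adjusted P-value cutoffs."""
    return [
        row for row in rows
        if abs(row.avg_log2FC) >= MIN_ABS_LOG2FC and row.p_val_adj <= MAX_ADJ_P
    ]


# Made-up rows for illustration only.
EXAMPLE_ROWS = [
    MarkerRow("geneA", 1.2, 1e-5),    # passes both cutoffs
    MarkerRow("geneB", 0.3, 1e-6),    # fold change too small
    MarkerRow("geneC", -0.8, 0.004),  # passes: the absolute value is used
    MarkerRow("geneD", 2.0, 0.05),    # adjusted P-value too large
]

if __name__ == "__main__":
    for row in filter_markers(EXAMPLE_ROWS):
        print(row.gene, row.avg_log2FC, row.p_val_adj)
```

      Because the rule uses the absolute fold change, both up- and downregulated genes are retained.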

      Special thanks to you for your kind comments.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) Line 44. In the Introduction section, a brief description of the prevalence of HRD or BRCA1/2 mutations in lung cancer patients should be included to highlight the significance of the study.

      This is an excellent suggestion. We revised the Introduction section (page 3, lines 61-64) to include a brief overview of the prevalence of BRCA1/2 mutations specifically in lung cancer patients. We believe this addition will strengthen the background for readers.

      Introduction section, page 3, lines 61-64: “Among the key genetic mutations that drive LUAD, BRCA1 and BRCA2 mutations (with prevalence rates of approximately 4% and 5%, respectively) have been increasingly implicated in the pathogenesis and progression of lung cancer [9, 13].”

      (2) Line 302-355. There are relatively serious grammatical issues, and substantial revision of the text is recommended.

      We acknowledge the grammatical issues in the original text. We have now carefully revised the Materials and methods section of the manuscript (pages 11-14, lines 277-358) to correct these issues and improve the overall readability. We believe the revised version is significantly improved.

      (3) Line 375. The Results section lacks detailed information on the specific BRCA1/BRCA2 mutations and data explaining how these mutations lead to functional alterations of BRCA1/2.

      Thank you for your insightful comment. In the revised manuscript, we added the amino acid changes caused by the specific BRCA1/BRCA2 mutation sites and expanded the text to discuss the predicted and known pathogenic mechanisms of these variants (Results section, page 16, lines 420-433).

      Results section, page 16, lines 420-433: “Exome sequencing data show that these two types of tumor tissues harbor somatic nonsynonymous single nucleotide variants (SNV) in BRCA2 (p.N372H) and BRCA1 (p.E991G, p.S1566G, p.K1136R, p.P824L, and p.Y809H), respectively (Table S1). The BRCA2 p.N372H variant lies within the BRC3 or BRC4 motifs critical for RAD51 binding. It may alter binding affinity, impair high-fidelity homologous recombination repair, and promote genomic instability [39-41]. In BRCA1, mutations are distributed across two key functional domains: the Coiled-Coil domain (e.g., p.E991G, p.Y809H, p.P824L) and the BRCT domain (e.g., p.K1136R, p.S1566G). Coiled-Coil mutations disrupt BRCA1-PALB2-BRCA2 complex assembly, impairing localization to DNA damage sites and subsequent RAD51 recruitment; BRCT domain mutations compromise phospho-protein recognition and G2/M checkpoint control, leading to defective DNA damage response and unchecked proliferation of damaged cells [42-44]. Together, these defects promote the accumulation of genomic scars and chromosomal instability.”

      (4) Line 492-498. Changes in genes associated with BRCA1 and BRCA2 mutations should be validated by immunofluorescence.

      Thank you for your insightful comment. Immunofluorescence would provide valuable orthogonal validation of the protein-level consequences of these mutations. To address this, we obtained pathological tissue sections from patients carrying BRCA1/2 mutations and performed immunofluorescence staining for S100A10, a risk gene associated with BRCA1 mutations. We found that S100A10 was upregulated in BRCA1-mutated tumor tissue compared to adjacent non-cancerous tissue.

      Results section, page 24, lines 673-675: “Immunofluorescence experiments on patient tissue sections revealed that S100A10 was upregulated in BRCA1-mutated tumor tissue relative to adjacent non-cancerous tissue (Fig. S11D-E).”

      (5) Line 538. Although both BRCA1 and BRCA2 deficiencies impair DNA damage repair, BRCA1, but not BRCA2, activates the cGAS-STING pathway. This is a particularly interesting observation and should be validated by immunofluorescence experiments.

      Thank you for highlighting this observation. To address it, we conducted immunofluorescence experiments to quantify STING, a key protein of the cGAS-STING pathway, in BRCA1- and BRCA2-deficient tissues and confirm this phenotype. We have included these results in the revised manuscript.

      Results section, page 21, lines 578-584: “Furthermore, our results revealed that BRCA1-mutant tumors showed higher activity of cGAS-STING signaling and STING mediated induction of host immune responses compared to BRCA2-mutant tumors (Fig. 5G and Fig. S6F). Also, cGAS-STING signaling gens, including cGAS, STING1, and downstream factors STAT1 and CCL5, were upregulated in BRCA1-mutant tumor cells (Fig. 5H). This observation was validated through immunofluorescence staining experiments on patient tumor tissue sections (Fig. 5I-J).”

      (6) Line 599. "CD8+ Trm cells were more abundant in BRCA1-mutant sample, whereas CD4+ Trm cells were higher in BRCA2-mutant sample". This part is also recommended to be validated using immunofluorescence or more rigorous flow cytometry analyses.

      We sincerely appreciate this insightful suggestion. To address this, we performed immunofluorescence staining to quantify the abundance of CD8<sup>+</sup> and CD4<sup>+</sup> Trm cells in BRCA1- and BRCA2-mutant tissues. We have included these results in the revised manuscript.

      Results section, page 22, lines 614-617: We identified two tissue-resident memory T cell (Trm) subsets, CD8<sup>+</sup> Trm and CD4<sup>+</sup> Trm, both predominantly derived from tumor tissues (Fig. 6B). “Interestingly, our analysis revealed that CD8<sup>+</sup> Trm cells were more abundant in BRCA1-mutant tumor, whereas CD4<sup>+</sup> Trm cells were more abundant in BRCA2-mutant tumor (Fig. 6B-D, Fig. S7D, and Fig. S8A-B).”

      (7) Line 643-676. The authors identified four risk genes associated with BRCA1 mutations-S100A10, LDHA, MYL12A, and GAPDH; however, MYL12A was not validated in the subsequent in vitro experiments. The authors state that "S100A10 can promote cancer metastasis by recruiting MDSC cells, and increased LDHA activity contributes to tumor immune escape." However, because immune cells were not included in the in vitro assays, these results instead suggest that these genes may directly suppress tumor cell proliferation.

      We thank the reviewer for this insightful observation. Our intention was not to suggest that the reduction in proliferation observed in our in vitro assays was caused by the disruption of immune cell recruitment or immune escape pathways. As the reviewer correctly points out, those mechanisms are irrelevant in a system lacking immune cells. Our results showing that "Knockdown of S100A10, LDHA, and GAPDH reduced LUAD cell proliferation in vitro (Fig. 7D-E)" strongly suggest a direct, cell-autonomous role for these genes in regulating LUAD cell growth. For MYL12A, an existing study has shown that BRCA1 transcriptionally regulates this gene, which is involved in breast tumorigenesis (PMID: 12032322). Given the characteristics of MYL12A in lung cancer, we will conduct in-depth in vitro and in vivo validation experiments in future studies.

      (8) Line 677. The authors should emphasize the limitations arising from the small sample size and the lack of in vivo validation models in the Discussion section.

      Thank you for highlighting these important limitations. We agree that the small sample size and the lack of in vivo validation are significant limitations of the current study. We have explicitly addressed these points in the Discussion section (page 27, lines 740-750) to ensure the interpretation of our data is appropriately qualified and to provide transparency regarding the scope of our conclusions.

      Discussion section, page 27, lines 740-750: “Although we included both tumor tissues and matched paracancerous and blood samples, the sample size remains modest, which may limit the statistical power and generalizability of our findings. Therefore, our results should be interpreted as preliminary, and further studies with larger, independent cohorts are required to validate these observations. Single-cell RNA-seq and TCR-seq analyses in this study provide high-resolution insights into the cellular and clonal dynamics of the TME, the functional validation of key mechanisms remains largely correlative. While our in vitro experiments provide valuable mechanistic insight, the lack of in vivo validation, which cannot fully recapitulate the complex TME. Future studies utilizing murine models or patient-derived organoids are essential to establish causal relationships and elucidate the underlying molecular pathways.”

      Minor:

      (1) Line 163: cell/μl should be corrected to cells/μL.

      Thank you for catching this. We have corrected it in the revised manuscript (Methods section, page 7, line 165).

      (2) Line 388: Please clarify how the HRD score, tumor mutation burden, and neoantigen load were calculated.

      We thank the reviewer for this request for clarification. In the revised manuscript, we have expanded the Methods section (page 5, lines 117-121) to provide a detailed description of how these metrics were calculated. The HRD score was calculated as the unweighted sum of loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transitions (LST). Tumor mutation burden (TMB) was defined as the total number of somatic nonsynonymous mutations per megabase of the exome captured by the sequencing panel. Neoantigen load was predicted by NetMHCpan using the patient's HLA typing and the identified somatic mutations. The data for these three indicators were all obtained from a previous study (PMID: 29628290). We believe these additions provide the necessary transparency and reproducibility for our study.

      Methods section, page 5, lines 117-121: “The HRD score was determined by summing specific genomic alterations, including loss of heterozygosity (LOH), large-scale state transitions (LST), and telomeric allelic imbalances (TAI). Tumor mutation burden (TMB) was defined as the total number of somatic nonsynonymous mutations per megabase of the exome captured by the sequencing panel. Neoantigen load was predicted by NetMHCpan using the patient's HLA typing and the identified somatic mutations.”
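
      For illustration only, the two arithmetic definitions quoted above can be sketched as follows. This is a minimal sketch, not the pipeline used in the study (the response states that the values were obtained from a previous study); the function and parameter names are hypothetical, and the numbers in the usage note are made up.

```python
def hrd_score(loh: int, tai: int, lst: int) -> int:
    """Unweighted sum of the three genomic-scar counts (LOH + TAI + LST)."""
    return loh + tai + lst


def tumor_mutation_burden(n_nonsynonymous: int, captured_mb: float) -> float:
    """Somatic nonsynonymous mutations per megabase of captured exome."""
    if captured_mb <= 0:
        raise ValueError("captured_mb must be positive")
    return n_nonsynonymous / captured_mb
```

      With these hypothetical helpers, a sample with 10 LOH, 12 TAI, and 20 LST events would have an HRD score of 42, and 150 nonsynonymous mutations over a 30 Mb capture region would give a TMB of 5 mutations/Mb.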

      (3) Line 421: BRCA12 should be corrected to BRCA2.

      Thank you for your detailed review. We have revised it.

      (4) The order of Figures 7D and 7E should be reversed.

      Thank you for your insightful comment. According to your suggestion, we reversed the order of Figures 7D and 7E in the revised manuscript.

      Special thanks to you for your kind comments.

    1. eLife Assessment

      This study examines the role of the fungal pathogen Candida albicans in the progression of colorectal cancer, a relevant and urgent topic given the global incidence of colon cancer. While the findings are useful and provide solid experimental work and insight into how Candida may contribute to tumor progression, the small patient sample size, reliance on in vitro models, and absence of in vivo validation may limit its impact. This work will interest scientists studying cancer progression and the role played by pathogens.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed most of the comments raised in the previous round of review.]

      Summary:

      This study addresses the emerging role of fungal pathogens in colorectal cancer and provides mechanistic insights into how Candida albicans may influence tumor-promoting pathways. While the work is potentially impactful and the experiments are carefully executed, the strength of evidence is limited by reliance on in vitro models, small patient sample size, and the absence of in vivo validation, which reduces the translational significance of the findings.

      Strengths:

      (1) Comprehensive mechanistic dissection of intracellular signaling pathways.

      (2) Broad use of pharmacological inhibitors and cell line models.

      (3) Inclusion of patient-derived organoids, which increases relevance to human disease.

      (4) Focus on an emerging and underexplored aspect of the tumor microenvironment, namely fungal pathogens.

    3. Reviewer #2 (Public review):

      The authors in this manuscript studied the role of Candida albicans in Colorectal cancer progression. The authors have undertaken a thorough investigation and used several methods to investigate the role of Candida albicans in Colorectal cancer progression. The topic is highly relevant, given the increasing burden of colon cancer globally and the urgent need for innovative treatment options.

      Strengths:

      Authors have undertaken a thorough investigation and used several methods to investigate the role of Candida albicans in Colorectal cancer progression.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the emerging role of fungal pathogens in colorectal cancer and provides mechanistic insights into how Candida albicans may influence tumor-promoting pathways. While the work is potentially impactful and the experiments are carefully executed, the strength of evidence is limited by reliance on in vitro models, small patient sample size, and the absence of in vivo validation, which reduces the translational significance of the findings.

      Strengths:

      (1) Comprehensive mechanistic dissection of intracellular signaling pathways.

      (2) Broad use of pharmacological inhibitors and cell line models.

      (3) Inclusion of patient-derived organoids, which increases relevance to human disease.

      (4) Focus on an emerging and underexplored aspect of the tumor microenvironment, namely fungal pathogens.

      Weaknesses:

      (1) Clinical association data are inconsistent and based on very small sample numbers.

      We thank the reviewer for this important comment. We investigated 4 colorectal tumors (2 early-stage and 2 late-stage) and observed Candida albicans in the 2 late-stage samples but not in the early-stage ones. This result is consistent with the large-scale TCGA data, in which Candida albicans is mainly detectable in late-stage colorectal tumors (Fig. 1c), and suggests that Candida albicans contributes to colorectal cancer progression, which is the main research direction of this study.

      (2) No in vivo validation, which limits the translational significance.

      We appreciate the reviewer’s concern regarding the lack of in vivo validation. While we recognize the value of in vivo models, our current institutional biosafety protocols and animal facility designations do not support the handling of pathogenic microorganisms like Candida albicans in live infection models. Consequently, these experiments were beyond the immediate technical scope of this study. To validate the cell line findings, we instead performed Candida albicans infection experiments using organoids derived from colorectal cancer patients (Fig. 7). We have revised the Discussion section to acknowledge this limitation and clarify that the current work serves as a mechanistic study based on in vitro and ex vivo systems. We have also incorporated references to recent studies demonstrating the in vivo effects of C. albicans in tumor models, which support the biological relevance of our findings.

      (3) Species- and cell type-specificity claims are not well supported by the presented controls.

      We thank the reviewer for this insightful comment. We agree that our current dataset does not warrant definitive conclusions regarding species- or cell type-specificity. Accordingly, we have tempered our claims throughout the manuscript, describing the observed effects as context-dependent across different epithelial models. Specifically, we observed differential responses among the cell lines and epithelial systems evaluated, suggesting variability rather than strict specificity. Furthermore, the Discussion has been expanded to address potential underlying factors, such as variations in EGFR expression levels and other signaling determinants. We have also added a dedicated section to acknowledge this limitation and emphasize the need for future systematic investigations using a more diverse array of fungal species and cell models.

      (4) Reliance on colorectal cancer cell lines alone makes it difficult to judge whether findings are specific or general epithelial responses.

      We appreciate the reviewer’s thoughtful concern. Although most of our mechanistic experiments were performed in colorectal cancer cell lines, we also evaluated our findings across a broader range of epithelial models, including normal human colon-derived organoids and the breast epithelial cancer line MCF7 (Fig. 8). Neither model exhibited HIF-1α activation upon C. albicans exposure, suggesting that the hypoxia response we observed might not be universal. Interestingly, the observed response in non-colorectal epithelial cancer lines (e.g., HCC1937, NUGC-3) suggests that this mechanism is not strictly confined to CRC. Based on these observations, we propose that the specificity is likely related to EGFR levels but may involve additional epithelial determinants, which we aim to investigate in future work.

      Reviewer #2 (Public review):

      The authors in this manuscript studied the role of Candida albicans in Colorectal cancer progression. The authors have undertaken a thorough investigation and used several methods to investigate the role of Candida albicans in Colorectal cancer progression. The topic is highly relevant, given the increasing burden of colon cancer globally and the urgent need for innovative treatment options. However, there are some inconsistencies in the figures and some missing details in the figures, including:

      (1) The authors should clearly explain in the results section which patient samples are shown in Figure 1B.

      We thank the reviewer for pointing out this omission. We apologize for the lack of clarity in the initial submission. The patient samples shown in Figure 1B are from patients with stage III CRC. We have revised the manuscript to explicitly state this information in the legend for Figure 1B to ensure better clarity for the reader.

      (2) What do a, ab, b, b written above the bars in Figure 1F represent? Maybe authors should consider removing them, because they create confusion. Also, there is no explanation for those letters in the figure legend.

      We thank the reviewer for this helpful comment. The letters above the bars represent statistical groupings from post-hoc multiple-comparison tests (a standard convention used after ANOVA or similar analyses): bars sharing the same letter are not significantly different, whereas bars with no letter in common belong to statistically distinct groups. We chose this letter-based system over asterisks to avoid the visual clutter and potential confusion that often arise from numerous pairwise comparisons; therefore, we will retain the letter-based grouping. In the revised manuscript, we have explicitly defined this notation in the figure legend to ease interpretation for the reader.
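
      As a reading aid, the letter convention described above can be sketched in a few lines. This is a minimal sketch of how such labels are interpreted, not the authors' analysis code; the function name is hypothetical, and the labels a, ab, b, b are those mentioned in the reviewer's comment on Figure 1F.

```python
def significantly_different(letters_a: str, letters_b: str) -> bool:
    """Two bars differ significantly only if their labels share no letter."""
    return set(letters_a).isdisjoint(letters_b)


# Labels in the order mentioned by the reviewer for Figure 1F.
labels = ["a", "ab", "b", "b"]

# Pairs of bar indices whose labels have no letter in common.
distinct_pairs = [
    (i, j)
    for i in range(len(labels))
    for j in range(i + 1, len(labels))
    if significantly_different(labels[i], labels[j])
]
```

      Under this reading, the first bar differs from the third and fourth bars, while the second bar (ab) is not significantly different from any of the others.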

      (3) The authors should submit all the raw images of Western blot with appropriate labels to indicate the bands of protein of interest along with molecular weight markers.

      We appreciate the reviewer’s request for raw data. We have now included the raw images of the Western blots in the supplementary materials, with clear annotations of the bands corresponding to the proteins of interest as well as molecular weight markers.

      (4) The authors should do the quantification of data in Figure 2d and include it in the figure.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have quantified the subcellular localization of HIF-1α in PBS-treated versus C. albicans–infected cells shown in Figure 2d. The quantification results are shown in the following figure and provided in Supplementary Figure 3c.

      (5) In Figure 2h, the authors should indicate if the quantification represents VEGF expression after 6h or 12h of C. albicans co-culture with cells.

      We thank the reviewer for pointing this out. We have updated Figure 2h to specify that the quantification represents VEGF expression after 12 hours of co-culture with Candida albicans.

      (6) In Figure 2i, quantification of VEGF should be done and data from three independent experiments should be submitted. The authors should also mention the time point.

      We thank the reviewer for this helpful comment. In the revised manuscript, we have quantified VEGFA fluorescence intensity based on three independent experiments (the other 2 replicates are shown in Author response image 1). The corresponding time point (12 hours of co-culture) is now clearly indicated in the figure legend.

      Author response image 1.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Some of the statements regarding Candida albicans and CRC progression in Figure 1 may be overstated (since the association with stage/survival may be cross-confounded). That is, analyses of survival ought to be stage-adjusted.

      We thank the editor for this important comment. We agree that the association between C. albicans and patient survival may be influenced by tumor stage as a confounding factor. In the revised manuscript, we have moderated our statements regarding the clinical associations and clarified the limitations of these analyses, now presenting these findings as correlative observations rather than causal relationships. We have also noted in the Discussion that stage-adjusted analyses would be required to more rigorously assess the independent contribution of C. albicans to patient outcomes.

      (2) Fan et al. (citation 26) is incorrectly referenced. The paper states that Bacteroides fragilis does not affect Candida albicans colonization. Instead, Bacteroides thetaiotaomicron was shown to reduce C. albicans colonization, but B. fragilis was used in the current study as a control.

      We thank the editor for pointing out this error, and we have corrected the citation accordingly. As noted, the referenced study indicates that Bacteroides thetaiotaomicron, rather than Bacteroides fragilis, reduces C. albicans colonization. We selected B. fragilis as a control in this study because it is a prevalent gut commensal and has been previously implicated in colorectal cancer progression. Although prior reports suggest that B. fragilis does not significantly affect C. albicans growth, we observed that co-culture with B. fragilis led to a noticeable inhibition of C. albicans growth under our experimental conditions. This discrepancy may reflect differences in experimental settings. We believe these findings provide additional context for the complex interactions between gut microbiota and fungal pathogens.

      (3) The link between hypoxia signaling is interesting, but for the most part, these experiments were largely done in normoxic conditions, while the colon is generally hypoxic. So I would have encouraged the authors to consider testing the effect of C. albicans presence/absence under low-oxygen conditions, which may be more physiologically relevant.

      We thank the editor for this insightful suggestion. We fully agree that evaluating the effects of C. albicans under hypoxic or anaerobic conditions would be highly relevant to the physiological tumor microenvironment. Although we have attempted to assess the impact of C. albicans on cell migration under hypoxic conditions, we observed that tumor cells exhibited markedly accelerated migration and proliferation, resulting in near-complete wound closure within 24 hours in control groups. This limited our ability to reliably detect differences between conditions using standard migration assays. We agree that in vivo models may provide a more physiologically relevant context to address this question, and we will pursue this direction in future studies when appropriate experimental conditions become available.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1 inconsistencies: In Figure 1C, there is no significant difference in C. albicans detection between stage II and stage III CRC patients. In fact, more patients in stage II appear positive, which is inconsistent with Figures 1A and 1B. For Figures 1A and 1B, the sample size (n=2) is too low to support meaningful conclusions. Please also clarify which stage is represented in Figure 1B.

      We thank the reviewer for this important comment. In the revised manuscript, we have clarified the sample information and explicitly stated that the samples shown in Figure 1b are derived from stage III CRC patients. We have also moderated our conclusions and described these findings as exploratory observations. Regarding the apparent inconsistency between Figure 1C and Figures 1a-b, we consider that this discrepancy may be partly due to the small number of clinical samples analyzed in our study. In addition, the TCGA-based analysis relies on transcriptomic data, whereas our analysis is based on immunohistochemistry (IHC). These methodological differences may also contribute to the observed variation.

      (2) Weak link between clinical and in vitro data: The transition from clinical samples to CRC cell line models feels tenuous. While C. albicans may induce hypoxia signaling, it is unclear whether this is specific to CRC cells or could occur in other epithelial cell types. Some broader testing would help strengthen this link.

      We thank the reviewer for this insightful comment. We agree that reinforcing the bridge between clinical observations and in vitro mechanistic findings, as well as clarifying cell type specificity, is important for a comprehensive study. In the revised manuscript, we have clarified that the clinical data provide correlative evidence, while the mechanistic insights are derived from controlled in vitro systems. To address the issue of cell type specificity, we have included additional analyses across multiple epithelial cell models (Figure 8). These results indicate that the response to C. albicans is not restricted to colorectal cancer cells but varies across different epithelial contexts.

      (3) Lack of in vivo validation: The mechanistic findings would be substantially strengthened by in vivo data, e.g., murine CRC models. Without this, the translational impact is limited.

      We appreciate the reviewer’s concern regarding the lack of in vivo validation. While we recognize the value of animal models, our current institutional biosafety protocols and facility designations do not support the handling of pathogenic microorganisms like Candida albicans in live infection models. Consequently, these experiments were beyond the immediate technical scope of this study and would be better performed in future studies to validate the mechanisms.

      (4) Figure 8B interpretation: The authors conclude that C. albicans shows the strongest effect on c-Myc and c-Jun activation. However, from the presented blots, the differences compared to other fungi are not obvious. The claim should be toned down or quantified more rigorously.

      We thank the reviewer for this important comment. We agree that the differences in c-Myc and c-Jun activation among fungal species are not sufficiently pronounced to support a strong comparative claim. In the revised manuscript, we have moderated the corresponding statements to avoid overinterpretation.

      (5) Cell type specificity: Since the title emphasizes CRC specificity, the cell line comparison shown in Figure 8 should be moved earlier in the results. This would clarify from the start whether the described mechanisms are CRC-specific or more generalizable.

      We thank the reviewer for this insightful suggestion. We agree that earlier presentation of cell type comparisons would help clarify the scope of the observed effects. We have revised the Results section accordingly: “To evaluate the cell type specificity of this response, we further analyzed additional epithelial cell models, as shown in Figure 8”.

      In this study, we initially identified the effects of C. albicans in colorectal cancer (CRC) cells and therefore focused on establishing the underlying mechanisms in this context. Subsequently, we extended our analysis to additional epithelial cell types to evaluate whether these effects are shared or context-dependent. We believe that this stepwise organization, from detailed mechanistic investigation in CRC cells to broader comparison across cell types, provides a logical and coherent flow for the reader. In the revised manuscript, we have further clarified this rationale in the text to improve readability and interpretation.

      (6) It would be good to use a negative fungal control instead of a PBS control for most of the experiments.

      We thank the reviewer for this valuable suggestion. We agree that a negative fungal control would further strengthen the conclusions. Unfortunately, we were unable to incorporate additional controls during the revision; however, we believe that our comparative analysis across multiple fungal species (Figure 8) partially addresses this issue by demonstrating differential signaling responses. Future studies will incorporate appropriate negative fungal controls to further validate the specificity of these effects.

      (7) It is surprising that the Dectin-1 inhibitor shows a smaller effect compared with the TLR2 inhibitor. This result warrants further explanation, as Dectin-1 is a well-known receptor for C. albicans β-glucans. The authors should clarify whether this difference reflects cell type-specific expression (e.g., low Dectin-1 in CRC cells), ligand accessibility, or pathway redundancy, and discuss how it aligns with existing literature.

      We thank the reviewer for this insightful comment. We agree that the relatively modest effect of Dectin-1 inhibition compared to TLR2 inhibition warrants further consideration. In the revised manuscript, we have expanded the Discussion to address this observation. We propose several possible explanations. First, the expression level of Dectin-1 is relatively low in these epithelial cancer cells, thereby limiting its functional contribution. Second, differences in ligand accessibility, particularly in the context of fungal cell wall architecture, may influence receptor engagement. Finally, redundancy and cross-talk among pattern recognition receptor pathways may compensate for Dectin-1 inhibition. These observations highlight the context-dependent nature of host–fungal interactions.

      Reviewer #2 (Recommendations for the authors):

      All my comments that need to be addressed are given above and below:

      (1) What do a and b represent in Figure 2f? They should be removed or clearly explained in the figure legend, as they are creating confusion for the audience.

      We thank the reviewer for this comment. The letters indicate statistical groupings from post hoc multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (2) In the figure legend of S3a, the authors mentioned only the Caco2 cell line, whereas in the figure, there are two more cell lines, HCT116 and SW48. The authors should revise the figure legend.

      We thank the reviewer for this comment. We have addressed this point and made the necessary corrections in the revised manuscript.

      (3) The scale bar information is missing for Figure S3b. It should be included.

      We thank the reviewer for this comment. The same scale bar was applied across all images in this panel. We have clarified this in the figure legend.

      (4) In Figure 2e, the HIF-1α level in the Caco2 cells at 24 hr time point is a lot higher compared to the level at the 12-hour time point after C. albicans infection. But in the WB quantification in Figure 2f, the level of HIF-1α is not higher when compared to 12hr. Although it is relative data based on control, authors should check this calculation again for any errors.

      We thank the reviewer for carefully examining the data. We have re-verified the quantification and confirmed that the values represent relative protein levels normalized to the corresponding controls at each time point.

      Because samples from different time points were processed and analyzed separately, direct comparison of absolute protein levels across time points is not appropriate. Therefore, relative quantification within each time point provides a more accurate and representative assessment of HIF-1α changes.
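
      To make the normalization described above concrete, the following minimal sketch computes the relative level within each time point. It only illustrates the arithmetic; the function name and the band intensities are hypothetical and are not taken from Figure 2e or 2f.

```python
def relative_to_control(intensities: dict[str, dict[str, float]]) -> dict[str, float]:
    """Divide each infected band intensity by the control from the same time point."""
    return {
        time_point: values["infected"] / values["control"]
        for time_point, values in intensities.items()
    }


# Hypothetical densitometry values (arbitrary units), not the study's data.
example = {
    "12h": {"control": 100.0, "infected": 300.0},
    "24h": {"control": 400.0, "infected": 1000.0},
}
fold_changes = relative_to_control(example)
```

      In this made-up example the raw 24 h band (1000) is much stronger than the 12 h band (300), yet the fold change relative to the matched control is lower at 24 h (2.5) than at 12 h (3.0), which is the kind of pattern the reviewer asked about.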

      (5) Line 125-127: This sentence should be rephrased.

      We thank the reviewer for this comment. We have revised the corresponding section to improve clarity.

      (6) PHD-mediated ubiquitination is the primary mechanism regulating HIF-1α protein stabilization. The authors should add an appropriate reference here.

      We thank the reviewer for this suggestion. An appropriate reference has been added in the revised manuscript to support this statement.

      (7) The authors claim that they observed that although the total level of HIF-1α increased, the ratio of its ubiquitinated form to total HIF-1α decreased. The authors should clearly indicate in the figure which protein band from the WB image was used for quantification from Figure S3c, which resulted in the graph presented in Figure S3d.

      We thank the reviewer for this suggestion. We have revised the figure legend to improve clarity.

      (8) In Figure 3a, there are some faint grey color lines. These graphs should be reformatted.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (9) What do a and b in the bar graphs shown in Figure 3d,e; S4d,e,f represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (10) What do a,b,c in the bar graphs shown in Figure 4c,d,h represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (11) There are some faint grey lines in the bar graphs shown in Figure 4g. These lines should be removed.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (12) Grey line below HIF-1α in the graph shown in Figure h should be removed.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (13) The authors wrote - notably, despite treatment with AG1478, the levels of HIF-1α and c-MYC in C.albicans-infected cells remained significantly elevated compared to the uninfected control group (Figure 4b). There is no quantification for c-MYC. Statistics for HIF-1α quantification are missing. These should be added.

      We thank the reviewer for this comment. We have quantified HIF-1α levels, and the results are presented in Figure 4d, including statistical analysis.

      (14) There is no data for knockdown of MYD88, Dectin-1, and SYK as mentioned in the text lines 222-224. The authors should explain this discrepancy.

      We thank the reviewer for this important comment. MYD88, Dectin-1, and SYK are expressed at relatively low levels in HCT116 cells, and our preliminary qPCR analyses indicated that it would be technically challenging to achieve reliable and quantifiable knockdown of these targets. Nevertheless, previous studies have reported that Dectin-1 can be present on the surface of epithelial cells, suggesting that it may still contribute to fungal recognition even at low expression levels. Therefore, given the technical constraints of gene knockdown in this specific context, we reasoned that pharmacological inhibition would provide a more robust approach to suppress this pathway.

      (15) In line 227 in the results section it should be Figure S5c-e instead of Figure S5b-e. Figure S5b results do not match the results that are being explained here.

      We thank the reviewer for this comment. We have corrected the typos in the revised manuscript.

      (16) What do a,b,c in the bar graphs shown in Figure 5 a,b,i represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (17) Was the experiment in Figure 5e done in triplicate? If not, it should be done in triplicate and quantified. The scale bar information is missing for IF images shown in Figure 5e. It should be added.

      We thank the reviewer for this comment. The experiments were independently repeated three times, and the quantification shown in Figure 5g represents the combined results from these biological replicates. The same scale bar was applied across all images in this panel. We have clarified this in the figure legend.

      (18) Lines 273-274 in the results section: Als3 and Hwp1 are known to be involved in the adhesion of C. albicans to epithelial cells, while Ece1 encodes the virulence factor candidalysin. References should be added.

      We thank the reviewer for this suggestion. We have added a reference in the revised manuscript to support this statement.

      (19) What do a and b in the bar graphs shown in Figures 6 f,h,r represent? Since these letters are confusing and are present in several figures, they should be either deleted or clearly explained in the figure legends or text.

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (20) What do a,b, and c in the bar graphs shown in Figure S8 b represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (21) Scale bar should be added in Figure S9.

      We thank the reviewer for these helpful comments. We have addressed this point and made the necessary corrections in the revised manuscript.

      (22) What do a and b, in the bar graphs shown in Figure S11 represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from post hoc multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (23) Were the organoids used in this paper characterized? If yes, how? Also, it should be mentioned in the appropriate section in the manuscript.

      The organoids were not characterized; they were cultured from patient samples according to our previously published protocol (He et al. Cell Stem Cell 2022).

    1. eLife Assessment

      This paper presents a collection of analyses relating structure and function in the whole-brain Drosophila EM connectome and whole-brain calcium imaging data. The linkage of detailed anatomical structure with population activity is of broad interest in circuit neuroscience in light of increasingly detailed brain maps, but the methods used made the evidence inadequate due to a lack of consideration of neurotransmitter identity and technical issues with the network analysis. The conclusions are useful for specific network observations, but a more thorough analysis of the anatomical and functional data is needed to support the overall claims.

    2. Reviewer #1 (Public review):

      In this revision the authors address some of the points, but they also make some technical errors. My overall view of the manuscript hasn't changed since the original evaluation.

      Previously I noted that SC sparsity presents an issue when comparing to full FC matrices. The authors misinterpreted the Honey et al. paper. They resampled ALL entries of the SC matrix (including zeros) from a Gaussian distribution. In effect, this assigns zeros small (but uniform) weights. In Honey et al., the authors resampled only existing edge weights from a Gaussian distribution (the rationale at the time was that there might be pushback against the extremely heavy-tailed edge weight distribution). In other words, the zeros are still zeros following this resampling procedure.
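
      The distinction can be made concrete with a minimal sketch, assuming a flattened list of edge weights and rank-preserving resampling. The function name `gaussian_resample`, the `nonzero_only` flag, and the Gaussian parameters (mean 0.5, SD 0.1) are illustrative assumptions, not code or settings taken from the manuscript.

```python
import random

def gaussian_resample(weights, nonzero_only=True, mean=0.5, sd=0.1, seed=0):
    """Rank-preserving Gaussian resampling of edge weights (hypothetical helper).

    nonzero_only=True  : only existing edges are resampled; zeros stay zero.
    nonzero_only=False : every entry, zeros included, receives a Gaussian value.
    """
    rng = random.Random(seed)
    # Positions that take part in the resampling.
    idx = [i for i, w in enumerate(weights) if w != 0 or not nonzero_only]
    # One Gaussian draw per participating entry, sorted ascending.
    draws = sorted(rng.gauss(mean, sd) for _ in idx)
    # Participating positions ordered by original weight (smallest first).
    order = sorted(idx, key=lambda i: weights[i])
    out = [0.0] * len(weights)
    for rank, i in enumerate(order):
        out[i] = draws[rank]  # the k-th smallest weight gets the k-th smallest draw
    return out

weights = [0, 5, 0, 1, 20, 0]
edges_only = gaussian_resample(weights, nonzero_only=True)
all_entries = gaussian_resample(weights, nonzero_only=False)
```

      With `nonzero_only=True` the absent connections remain exactly zero, whereas with `nonzero_only=False` they are given the lowest-ranked Gaussian values and so enter any subsequent correlation as weak but present edges.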

      That said, I agree that the log transform is likely useful or necessary given edge weight distributions.

      In short, I still think that the approach is interesting and meritorious, I just don't think the execution is correct.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution. The revision now uses morphological cable length for some analyses rather than straight-line distance, which improves the realism and interpretability of these results.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric. While it is likely that some FC-SC relationship does exist at large scales, it does not in my view justify use of this metric for evaluating noise removal methods, since such methods may inadvertently remove real neural correlates. This concern remains unaddressed in the revision.

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. The authors examine the ratios of input and output neurotransmitters in different defined regions. However, I think it would be more useful to integrate the neurotransmitter information more fully into the assessment of SC, for instance: examining the signed weight (excitatory - inhibitory), or by examining the excitatory and inhibitory networks separately.
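
      As a rough illustration of the signed-weight idea, the following sketch builds a region-by-region matrix of excitatory minus inhibitory synapse counts. The record layout `(pre_region, post_region, count, transmitter)`, the function name `signed_sc`, and the sign table are assumptions made for the example; only acetylcholine and GABA are assigned signs here, and all other transmitters are ignored rather than guessed.

```python
# Hypothetical sign table: only transmitters with an unambiguous sign are listed.
SIGN = {"acetylcholine": 1, "gaba": -1}

def signed_sc(connections, n_regions):
    """Sum (excitatory - inhibitory) synapse counts per region pair."""
    matrix = [[0] * n_regions for _ in range(n_regions)]
    for pre, post, count, transmitter in connections:
        # Transmitters missing from SIGN contribute nothing to the signed weight.
        matrix[pre][post] += SIGN.get(transmitter.lower(), 0) * count
    return matrix

connections = [
    (0, 1, 10, "acetylcholine"),
    (0, 1, 4, "GABA"),
    (1, 0, 7, "GABA"),
    (1, 0, 3, "glutamate"),
]
signed = signed_sc(connections, n_regions=2)
```

      Separate excitatory and inhibitory networks could be obtained from the same records by keeping only the positive or only the negative contributions.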

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods; the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not in my view constitute evidence of "similar design concepts."

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec* et al, PNAS 2024).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors analyze connectome data from Drosophila and compare the physical wiring with functional connectivity estimated from calcium imaging data. They quantify structure-function relationships as a correlation of the two connectivity modalities. They report correlations roughly comparable to what has been described in the literature on sc/fc relationships in mammalian connectome data at the meso-scale. They then repeat their analysis, focusing on segregated versus unsegregated synapses. They derive separate connectomes using one or the other class of synapse. They show differential contributions to the sc/fc relationships by segregated versus unsegregated synapses.

      Strengths:

      There is a nice synthesis of multimodal imaging data (Ca and EM data from flies and meso-scale data from human and marmoset).

      Thank you very much for your comments.

      Weaknesses:

      (1) The paper is written in an unusual way. The introduction intermingles results with background, making it hard to figure out what precisely is being tested.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) There are also major methodological gaps. Though the mammalian connectomes are used as a point of reference, no descriptions of their origins or processing are included.

      The reanalysis of marmoset data is presented in Ext. Data Figure. However, as pointed out by other reviewers, the data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      (3) A major weakness stems from the actual calculation of the sc/fc correlation. In general, SC is sparse. In the case of the EM connectomes, it is *exceptionally* sparse (most neural elements are not connected to one another). The authors calculated sc/fc coupling by correlating the off-diagonal elements of sc (the logarithm of its edge weights) and fc matrices with one another. The logarithmic transformation yields a value of infinity for all zero entries. The authors simply impute these elements with 0. This makes no sense and, depending on whether these zero elements are distributed systematically versus uniformly random, could either inflate or deflate the sc/fc correlations. Care must be taken here.

      Thank you for pointing this out. As you mentioned, the SC matrix becomes increasingly sparse as the number of ROIs increases (Ext. Data Fig.2-2b). In contrast, the FC matrix may contain values even when there are no direct connections between ROIs (indirect connections). We conducted an investigation into this issue. To deal with this issue, Honey et al. (2009) [6] resampled the elements of the SC matrix in rank order using a Gaussian distribution and calculated the FC-SC correlation between this resampled SC and FC.

      Ext. Data Fig. 2-2a compares resampled SC (Honey et al.’s method) with log-scaled SC (our method). Up to 200 ROIs, fewer than 10% of SC matrix elements are zero (Ext. Data Fig. 2-2b), so few log-scaled elements require zero replacement. In this situation, Gaussian resampling tends to increase the correlation coefficient (Ext. Data Fig. 2-2a). With 10,000 ROIs, on the other hand, sparsity is extremely high and more than 70% of SC matrix elements are zero. In this situation, the zero elements (70-80% of the matrix) are randomly assigned values from the lower end of the Gaussian distribution, which lowers the correlation coefficient (Ext. Data Fig. 2-2a, c, d). For these reasons, we believe that log-scaled SC is less biased than resampling with a Gaussian distribution, and we conclude that using log-scaled SC as is in this paper is reasonable. Log-scaled SC has also been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. To show that we have considered this issue, we have added Ext. Data Fig. 2-2 to the manuscript.
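
      For readers who want to see what the log-scaled calculation amounts to, here is a minimal sketch, assuming square SC and FC matrices given as nested lists. The base-10 logarithm, the helper names (`log_scaled`, `offdiag`, `sparsity_rate`, `fc_sc_correlation`), and the toy matrices are assumptions for illustration; the response does not specify the log base or implementation.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def offdiag(matrix):
    """Flatten the off-diagonal elements in row-major order."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]

def log_scaled(sc):
    """log10 of each weight; zero weights (log = -infinity) are imputed with 0."""
    return [[math.log10(v) if v > 0 else 0.0 for v in row] for row in sc]

def sparsity_rate(sc):
    """Fraction of off-diagonal SC elements that are zero."""
    values = offdiag(sc)
    return sum(1 for v in values if v == 0) / len(values)

def fc_sc_correlation(sc, fc):
    """Correlate off-diagonal log-scaled SC with off-diagonal FC."""
    return pearson(offdiag(log_scaled(sc)), offdiag(fc))

sc = [[0, 10, 0], [100, 0, 1], [0, 1000, 0]]       # toy synapse counts
fc = [[1.0, 0.3, 0.05], [0.6, 1.0, 0.1], [0.02, 0.9, 1.0]]  # toy FC values
r = fc_sc_correlation(sc, fc)
```

      One consequence visible in the sketch is that a connection with a single synapse (log10(1) = 0) becomes indistinguishable from an absent connection after imputation, which is part of why the sparsity rate matters for interpreting the correlation.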

      (4) Further, in constructing the segregated versus unsegregated connectomes, they use absolute thresholds for collecting synapses. It is unclear, however, whether similar numbers of synapses were included in both matrices. If the number is different, that might explain the differential relationship with fc; one matrix has more non-zero entries (and as noted earlier, those zero entries are problematic).

      Author response image 1.

      a, Sparsity rate histogram of the SC matrix with cPPSSI (0-0.1) and subsampled null SC matrices, corresponding to Fig. 4e. The red line indicates the sparsity rate of the SC matrix with cPPSSI (0-0.1). b, Sparsity rate histogram of the SC matrix with cPPSSI (0.9-1) and subsampled null SC matrices, corresponding to Fig. 4f. c, Sparsity rate histogram of the SC matrix with reciprocal synapses (≤2 μm) and subsampled null SC matrices, corresponding to Fig. 4i.

      Thank you for pointing this out. The number of synaptic connections in the SC matrix differs greatly between those extracted with cPPSSI (0-0.1) and cPPSSI (0.9-1) (Fig. 4e, f). However, when 99 null SC matrices were generated for each and compared with the cPPSSI-extracted matrices, the FC-SC correlation was significantly higher or lower. Because the sparsity rates of the original null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the sparsity rates of the extracted SCs (red lines) fall within the range of the null-generated matrices. This figure was added to Ext. Data Fig. 4-5, and the main text was also revised. The sparsity rates are 0.52 for cPPSSI (0-0.1) and 0.123 for cPPSSI (0.9-1). Since both cases involve comparisons with null SC matrices that have closely matched sparsity rates, we believe comparison using log-scaled SC is appropriate.
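
      A comparison against 99 null matrices is commonly summarized as an empirical p-value. The sketch below shows one standard way to compute it; the function name `empirical_p`, the one-sided `tail` argument, and the toy null values are assumptions, and the response does not state which test the authors actually applied.

```python
def empirical_p(observed, null_values, tail="greater"):
    """One-sided empirical p-value with the usual +1 correction.

    With 99 null values the smallest attainable p-value is 1 / 100 = 0.01.
    """
    if tail == "greater":
        extreme = sum(1 for v in null_values if v >= observed)
    else:
        extreme = sum(1 for v in null_values if v <= observed)
    return (1 + extreme) / (1 + len(null_values))

# Toy null distribution of FC-SC correlations (99 values, 0.00 to 0.98).
null_correlations = [i / 100 for i in range(99)]
p_high = empirical_p(0.99, null_correlations, tail="greater")
p_low = empirical_p(-1.0, null_correlations, tail="less")
```

      Such a test is only informative if the null matrices match the observed matrix in sparsity, which is the property the regenerated nulls are meant to guarantee.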

      (5) There was also considerable text (in the results) describing the processing of the Ca data. In this section, the authors frequently refer to some pipelines as "better" or "worse" (more or less effective). But it is not clear what measures they adopted to assess the effectiveness of a pipeline.

      The detailed registration flow for the Ca data is described in “Preprocessing of D. melanogaster calcium imaging data” in the Materials and Methods section (Ext. Data Fig. 1-1a). We then investigated the optimal nuisance factor removal methods and smoothing size, using both correlation analysis (FC-SC correlation) and ROC curve analysis (FC-SC detection). Since signals are assumed to be transmitted between regions based on SC, we treated SC as the ground truth and considered a pipeline with higher FC-SC similarity and higher detection performance to be better. We updated the Results section to include this point.
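
      The ROC-based "FC-SC detection" score can be illustrated with a minimal sketch in which each region pair is labeled connected or unconnected from SC and ranked by its FC value. The function name `roc_auc`, the binarization rule (SC greater than zero counts as connected), and the toy values are assumptions; the authors' exact thresholding is not described in this response.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: FC value per region pair; labels: True where SC marks a connection.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

sc_values = [12, 3, 0, 0]            # toy synapse counts per region pair
fc_values = [0.9, 0.8, 0.1, 0.2]     # toy FC per region pair
connected = [v > 0 for v in sc_values]
auc = roc_auc(fc_values, connected)
```

      An AUC of 0.5 corresponds to FC carrying no information about which pairs are structurally connected. Note that this score, like the correlation, rewards any preprocessing step that pushes FC toward SC, which is the circularity concern raised elsewhere in the reviews.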

      Reviewer #2 (Public review):

      Summary:

      Okuno et al. investigate the structure-function relationship in the fruit fly Drosophila melanogaster. To do so, they combine published data from two recent synapse-level connectomes ("hemibrain" and "FlyWire") with a dataset comprising functional whole-brain calcium imaging and behavioural data. First, they investigate the applicability of fMRI pre-processing techniques on data from calcium imaging. They then cross-correlate this pre-processed functional data with structural data extracted from the connectomes, including a comparison to humans. The authors proceed to compare the two connectomes and find significant differences, which they attribute to differences in the accuracy of the synapse detections. Next, they present a novel algorithm to quantify whether neurons are segregated (pre- and postsynapses are spatially separate) or unsegregated (pre- and postsynapses are mixed). Using this approach, they find that unsegregated neurons may contribute more to function than segregated neurons. Applying a general linear model to the functional dataset suggests that activity in two brain areas (Wedge and AVLP) is suppressed during walking. The authors identify a GABAergic neuron in the connectome that could be responsible for this effect and suggest it may provide feedback to the fly's "compass" in the central complex.

      Strengths:

      The study tackles a relevant question in connectomics by exploring the relationship between structural and functional connectivity in the Drosophila brain. The authors apply a range of established and adapted analytical methods, including fMRI-style preprocessing and a novel synaptic segregation index. The effort to integrate multiple datasets and to compare across species reflects a broad and methodical approach.

      Thank you very much for your comments.

      Weaknesses:

      The manuscript would benefit from a clearer overarching narrative to unify the various analyses, which currently appear somewhat disjointed. While the technical methods are extensive, the writing is often convoluted and lacks crucial details, making it difficult to follow the logic and interpret key findings. Additionally, the conclusions are relatively incremental and lack a compelling conceptual advance, limiting the overall impact of the work.

      (1) The introduction currently contains a number of findings and conclusions that would be better placed in the results and discussion to clearly delineate past findings from new results and speculations.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) The narrative would benefit greatly from some clear statements along the lines of "we wanted to find out X, therefore we did Y".

      Thank you for pointing this out. In many biology papers, the problem is clear, but as you say, this paper starts by comparing the very fine SC and FC of flies, which makes the problem unclear and the results sporadic. We have revised the structure of the introduction.

      (3) More concise terminology would be helpful. For example, the connectomes are currently referred to as either "hemibrain", "FlyEM", "whole-brain", or "FlyWire".

      Thank you for pointing this out. We revised the manuscript to separate "hemibrain" and "whole-brain" from "connectome." "hemibrain" and "whole-brain" retain their original meanings.

      (4) The abstract claims "a new, more robust method to quantify the degree of pre- and post-synaptic segregation". However, the study fails to provide evidence that this method is indeed more robust than existing methods.

      We apologize that this information was not included in the main figures or the Results section. It was presented in the Methods section and Ext. Data Fig. 4-1i, j. We have moved the related text from the Methods to the Results section.

      (5) The authors define unsegregated neurons as having mixed pre- and postsynapses in the same space. However, this ignores the neurons' topology: a neuron can exhibit a clearly defined dendrite with (mostly) postsynapses and a clearly defined axon with (mostly) presynapses, which then occupy the same space. This is different from genuinely unsegregated neurons with no distinct dendritic and axonal compartments, such as CT1.

      Thank you for pointing this out. Regarding this point, we think it is difficult to discuss the neuron’s topology in this paper. We defined PPSSI and demonstrated only that unsegregated neurons with mixed pre- and post-synapses are scattered throughout the brain (Ext. Data Fig. 4-2e). Further research is needed to determine the relationship with morphology in individual neurons.

      One possibility is that inhibitory, non-spiking unsegregated neurons, such as the CT1 amacrine cell [24, 27, 28] or interneurons in the antennal lobe [29], may be widely used throughout the brain (WAGN is also a candidate for this). Grimes et al. [34] noted that “The retina is a beautiful example of a neural network that optimizes signal processing capacity while minimizing cellular cost.” To maintain the signal dynamic range, A17 amacrine cells must optimize the processing units and wiring costs. If one unit equaled one cell, an enormous number of cell bodies would be required, reducing the number of processing units per volume and increasing the energy cost during development. To optimize this, they proposed arranging units capable of parallel processing within a single cell, thereby optimizing the processing units and wiring costs per volume.

      Signal bursts might also occur in the central nervous system (CNS), in which case CNS neurons also require dynamic range adjustment. The concept of optimizing processing units per volume is highly compelling and is thought to apply not only to the retina but throughout the entire brain.

      (6) It is not entirely clear where the marmoset dataset originates from. Was it generated for this study? If not, why is there a note in the Ethics Declaration?

      The marmoset data were reported in [10] and were not generated for this study. We therefore removed the Ethics Declaration.

      (7) On the differences between hemibrain and FlyWire: What is the "18.8 million post-synapses" for FlyWire referring to? The (thresholded) FlyWire synapse table has 130M connections (=postsynapses). Subsetting that synapse cloud to the hemibrain volume still gives ~47M synapses. Further subsetting to only connections between proofread neurons inside the hemibrain volume gives 19.4M - perhaps the authors did something like that? Similarly, the hemibrain synapse table contains 64M postsynapses. Do the 21M "FlyEM" post-synapses refer to proofread neurons only? If the authors indeed used only (post-)synapses from proofread neurons, they need to make that explicit in results and methods, and account for differences in reconstruction status when making any comparisons. For example, the mushroom body in the hemibrain got a lot more attention than in FlyWire, which would explain the differences reported here. For that reason, connection weights are often expressed as, e.g., a fraction of the target's inputs instead of the total number of synapses when comparing connectivity across connectomic datasets. Furthermore, in Figure 3b, it looks like the FlyWire synapse cloud was not trimmed to the exact hemibrain boundaries: for example, the trimmed FlyWire synapse cloud seems to extend further into the optic lobes than the hemibrain volume does.
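
      The normalization mentioned above, expressing each connection as a fraction of the target's total input, can be sketched as follows. The convention that `sc[i][j]` holds the synapse count from source `i` to target `j`, the function name `input_fraction`, and the toy matrices are assumptions for illustration.

```python
def input_fraction(sc):
    """Column-normalize a synapse-count matrix.

    sc[i][j] is the number of synapses from source i to target j; the result
    gives each connection as a fraction of target j's total input.
    """
    n = len(sc)
    totals = [sum(sc[i][j] for i in range(n)) for j in range(n)]
    return [
        [sc[i][j] / totals[j] if totals[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Two toy datasets with the same wiring, one with half as many synapses detected.
dataset_a = [[0, 20, 10], [40, 0, 30], [0, 60, 0]]
dataset_b = [[0, 10, 5], [20, 0, 15], [0, 30, 0]]
fractions_a = input_fraction(dataset_a)
fractions_b = input_fraction(dataset_b)
```

      In this toy case the raw counts differ by a factor of two while the input fractions are identical, which is why relative weights are less sensitive to uniform differences in synapse detection or reconstruction completeness. Non-uniform differences, such as one neuropil being proofread more thoroughly, would still show up.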

      Thank you for pointing this out. FlyEM connectome data version 1.2 was downloaded and used as described in Data Availability. This data is provided in the format defined by https://neuprint.janelia.org/public/neuprintuserguide.pdf, and we extracted neurons and synapses from it.

      The entire segmentation comprises 28M segments, of which 99,644 are Traced (proofread) neurons. In addition, there were 73M synapses (counting pre- and post-synapses individually), 87M records in synapseSets, and 128M records in synapseSet-to-synapse. When we extracted post-synapses between Traced neurons, the total number was 21.4M (i.e., connections from Traced neurons to other body fragments such as Orphans were excluded).

      The FlyWire dataset (v783) was downloaded from the FlyWire Codex and Zenodo. This dataset contained 139,255 proofread neurons and 54.5M synapses (pre- and post-synapse pairs), as described in Dorkenwald et al. [13], with 18.8M post-synapses in the regions corresponding to the hemibrain primary ROIs. We have updated the Results and Methods sections to take your comment into account.

      In Fig. 3b, these images were created using a mask that extended the boundaries of the hemibrain primary ROIs, making the boundaries unclear. Therefore, we corrected the images in Fig. 3b by adjusting the mask so that the boundaries were properly aligned.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have a higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution.

      Thank you very much for your comments.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric.

      Thank you for pointing this out. We believe that the FC in C. elegans is based on cell body dynamics, which differ from the synaptic population dynamics in a region of fly calcium imaging or fMRI data (BOLD [Blood Oxygenation Level Dependent] signal). The BOLD signal in a region is thought to correspond to the neurovascular coupling of synaptic population dynamics. Furthermore, compartmentalization of a neuron has been observed in C. elegans (Hendricks et al., 2012)*, showing different dynamics across neuron compartments. Thus, the dynamics of the cell body and the dynamics of the synaptic population in other regions are different in C. elegans. We speculate that there is some relationship between FC and SC across regions, because the FC-SC correlation in the fly brain reached r=0.87 with 20 ROIs (Fig. 2d). We believe that this result is different from the cell body dynamics in C. elegans.

      *Hendricks et al., “Compartmentalized calcium dynamics in a C. elegans interneuron encode head movement,” Nature 487, 99-103 (2012)

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. However, here the authors do not perform any analyses with neurotransmitter information.

      A comparison between FC-SC correlation and neurotransmitter type has been added to the Results section. We investigated the ratios of neurotransmitter input (Ext. Data Fig. 3-2a) and output (Fig. 3f) in each region, and examined the relationship between this ratio and the FC-SC correlation for each neurotransmitter. This revealed significant correlations for acetylcholine (r=0.39, p=0.0013) and GABA (r=-0.25, p=0.046) (Fig. 3g). That is, the higher the percentage of excitatory connections, the higher the FC-SC correlation; conversely, the higher the percentage of inhibitory connections, the lower the FC-SC correlation.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods; the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not in my view constitute evidence of "similar design concepts."

      Thank you for pointing this out. As you say, fiber bundles of DTI and EM connectome are completely different. Nevertheless, the fact remains that the FC-SC correlation is high in both the fly and human brains. As mentioned above, both regional signal from calcium imaging and BOLD signal from fMRI are based on synaptic population dynamics. It was estimated that 43% of the energy consumption in the gray matter is due to synaptic activity of neurons (Harris et al., 2012), and the BOLD signal fluctuates greatly due to this activity. Furthermore, synaptic activity is thought to be much faster than the activity of microglia and astrocytes, so the FC of fMRI is thought to mainly capture the regional correlation of synaptic activity. In other words, in both flies and humans, although the size is different, the pre-synaptic activity in one region and the pre-synaptic activity in another region via neural fibers are being compared in a common manner in the form of FC-SC.

      In addition, non-spiking unsegregated neurons exist in mammals as well, such as the amacrine cells of the retina [34], and even pyramidal cells in the neocortex show local mixtures of pre- and post-synapses (Ext. Data Fig. 1-2). If a functional unit is realized by a local compartment within a neuron, as mentioned in [34], the fly will be a powerful model organism for investigating such units, and its functional “design concept” may also be useful for mammals.

      Harris et al., “The Energetics of CNS White Matter,” J. Neurosci., 2012, 32 (1) 356-371

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      Thank you for pointing this out. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections to address your comment.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec* et al, PNAS 2024).

      As you mentioned, functional analysis has limitations in spatial resolution. In particular, the resolution along the Z axis is 4 μm, which is 1,000 times lower than the resolution of electron microscopy data. This makes it difficult to perfectly match synaptic activity to a synapse in the structural data. Furthermore, spatial smoothing is applied to functional images to absorb inter-individual variability, so group analyses can only provide blurred results. These are considered limitations of the methods used in fMRI analysis. Despite these limitations, we applied GLM analysis to walking behavior and observed a clear region of inactivity. This region roughly corresponds to the synaptic cloud of a neuron named WAGN (Fig. 5b and c). This neuron also connects to WPNb and ANs in the connectome data, suggesting that it may be related to walking behavior. This is merely a screening reference; therefore, further biological experimentation is needed to pursue this topic.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We should emphasize that the reviewers encouraged revision and resubmission. If the reviewers' comments were to be addressed in full in a revision to strengthen the evidence, this would significantly increase the impact of the findings and the relevance of the work to the fly neuroscience community and to the connectomics field more broadly.

      Thank you very much for your comments.

      Major Issues:

      (1) Structural correlation and functional correlation measure very different aspects of network data, yet a simple correlation between the off-diagonal elements of the two is used. It would be expected that this would not be directly proportional, and it's not clear why this would be a sensible measure. The authors need a better solution for dealing with the zero entries in the SC matrix. Replacing the infinities with zeros and then running the linear regression to get an SC/FC relationship is not appropriate. Even with a better metric, given that both intuition and other studies have shown a weak correlation between FC and SC, using FC-SC correlation as a quality descriptor for other properties is not proper. Furthermore, the authors don't account for neurotransmitter identity in the structural data, which would have strong implications for the relationships between FC and SC.

      Thank you for pointing this out. To investigate this issue we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate is low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the FC-SC relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and Gaussian resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Log-scaled SC has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. When zero replacement is undesirable, using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown). It may be possible to compare various methods, but this is outside the scope of this study and requires further research.

      The C. elegans studies presented by Reviewer #3 showed a weak correlation between FC and SC. However, C. elegans neurons do not fire and exhibit different calcium fluctuations depending on the region (Hendricks et al., 2012). This suggests that the cell body and various synaptic terminal regions have different FCs, which is consistent with the objective of our study (neuronal compartmentalization). If a functional unit is locally composed of multiple neurons and synapses, it is expected that SC and FC from that region will show a strong relationship. Larger regions would include multiple functional units, and a relationship between SC and FC would also be found, which is consistent with the results of our study. The C. elegans study compared FC of the cell body (one region) with SC of the whole cell (not the same region), which would be inconsistent.

      (2) Synaptic segregation on neurons can be topologically present even if pre- and post-synaptic synapses are present in similar regions of space, as an axon branch and dendrite branch can overlap in space but remain distinct along the arbor. The authors emphasize a region-based definition that does not reflect cellular anatomy. Moreover, the authors do not make an argument for their claim of better robustness of their new synaptic segregation measures.

      Author response image 2.

      Distance calculation for DBSCAN. a, Example synapse pair (black dots) for the distance calculation. The red line shows the straight-line distance, and the green line shows the morphology-based distance. DBSCAN will place the two synapses in the same cluster based on the straight-line distance, but in different clusters based on the morphology-based distance.

      Thank you for pointing this out. We changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy. We updated all related figures, such as Figure.4, Ext. Data Figure.4-1, 4-2, 4-3, 4-4, Figure.5h. Also, we updated related text in the Results and Methods sections.
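
      The difference between the two distance definitions can be shown with a small, self-contained sketch. It is an illustration under stated assumptions, not the authors' pipeline: the toy skeleton, the node names, the placement of synapses directly on skeleton nodes, and the `eps` and `min_samples` values are all hypothetical. The path distance is computed with Dijkstra's algorithm along the skeleton, and a plain DBSCAN is run once with each distance.

```python
import heapq
import math

def path_lengths(coords, edges, source):
    """Dijkstra along the skeleton; each edge weighs its Euclidean length."""
    adj = {n: [] for n in coords}
    for a, b in edges:
        w = math.dist(coords[a], coords[b])
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = {n: math.inf for n in coords}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue
        for m, w in adj[n]:
            if d + w < dist[m]:
                dist[m] = d + w
                heapq.heappush(heap, (d + w, m))
    return dist

def dbscan(points, dist, eps, min_samples):
    """Plain DBSCAN over an arbitrary distance function; -1 marks noise."""
    labels = {p: None for p in points}
    cluster = -1
    for p in points:
        if labels[p] is not None:
            continue
        neighbours = [q for q in points if dist(p, q) <= eps]
        if len(neighbours) < min_samples:
            labels[p] = -1
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbours if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbours = [r for r in points if dist(q, r) <= eps]
            if len(q_neighbours) >= min_samples:
                queue.extend(q_neighbours)
    return labels

# Hypothetical Y-shaped arbor: two branches run side by side, 1 unit apart.
coords = {
    "R": (0.0, 0.0, 0.0),
    "A1": (5.0, 0.5, 0.0), "A2": (10.0, 0.5, 0.0), "A3": (11.0, 0.5, 0.0),
    "B1": (5.0, -0.5, 0.0), "B2": (10.0, -0.5, 0.0), "B3": (11.0, -0.5, 0.0),
}
edges = [("R", "A1"), ("A1", "A2"), ("A2", "A3"),
         ("R", "B1"), ("B1", "B2"), ("B2", "B3")]
synapses = ["A2", "A3", "B2", "B3"]  # assumed to sit on skeleton nodes

along_arbor = {s: path_lengths(coords, edges, s) for s in synapses}

euclid_labels = dbscan(synapses, lambda p, q: math.dist(coords[p], coords[q]),
                       eps=2.0, min_samples=2)
path_labels = dbscan(synapses, lambda p, q: along_arbor[p][q],
                     eps=2.0, min_samples=2)
print("straight-line:", euclid_labels)
print("morphology-based:", path_labels)
```

      With the straight-line distance the four synapses merge into one cluster, whereas the distance along the arbor keeps the two branches apart.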

      (3) Reviewers found the overall structure of the paper is difficult to follow, with sections appearing disjoint and the aims of different sections not well described. This extended to the paper organization as well, with the introduction not clearly setting up the questions and being distinct from the results. The manuscript would benefit from a clearer overarching narrative to unify the various analyses.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (4) Similarly, there are several descriptions of data and analysis that are unclear or lacking, including the source of the marmoset data and how the FlyWire synapse was subsampled.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      We have updated the Results and Methods sections regarding the extraction of "traced" neurons and synapses in FlyEM connectome data, and the extraction of post-synapses in hemibrain primary ROIs in FlyWire connectome data.

      (5) Comparisons between FlyWire and Hemibrain have shown many similarities and some clear examples of inter-individual variability. There was concern that technical decisions with handling FlyWire synapse sampling were responsible for some of the differences observed between the datasets.

      In response to Reviewer #2's question, we answered that both FlyEM and FlyWire use proofread neurons and their connecting synapses. We also updated Fig. 3b and the Results and Methods sections.

      Reviewer #1 (Recommendations for the authors):

      The paper is written in an unusual way. It would be helpful if the introduction read more like a standard introduction. Describe the relevant background that the reader needs to understand the results that come later. Frame the experiments in terms of a question or hypothesis. Results should be relegated to the results section (or, if you like, a final paragraph that summarizes the findings). They should not be intermingled throughout the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      The authors must be more attentive in terms of how they construct the segregated/unsegregated connectomes. I suggest exploring various thresholds/bins, but also considering proportionality thresholds that match the number of synapses.

      Thank you for pointing this out. As pointed out by other reviewers, we changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy.

      We also considered the sparsity rates of the SC matrices. Since the sparsity rates of the null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices, shown in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the extracted SCs fit within the null-generated matrices. This figure was added to Ext. Data Fig.4-5, and the main text was also revised.

      The authors need a better solution for dealing with the zero entries in the sc matrix. Replacing the infinities with zeros and then running the linear regression to get an sc/fc relationship is not appropriate.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. It may be possible to compare various methods, but this is outside the scope of this study and requires further research.

      It would be useful to include a description of where the human/marmoset datasets came from. It would be useful to describe the processing of those datasets and whether they're comparable to how the fly data was processed.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      The pre-processing of fly calcium imaging data is described in the Methods section. Unfortunately, this processing method is not comparable to that used in humans/marmosets as it was highly customized.

      The authors report sc/fc correlations for the human/marmoset datasets based on single papers. However, in the human case, especially, the strength of sc/fc correlations is highly variable. Not just based on number/size of parcels, but based on amount of data, processing pipeline, single-subject versus group averaged (incidentally, single-subject sc/fc is much lower than group-averaged, which has big implications for this study, where the fly datasets are, in essence, N=1 studies).

      Yes, there are numerous FC-SC correlation studies. We consider Honey et al. (2009) [6] to be a highly representative study. It showed r = 0.39 to 0.48 for individual participants in 998 ROIs, and r = 0.36 for the averaged one, but this increased to r = 0.53 when absent or inconsistent structural connections were excluded. So, single-subject correlations may not be much lower than group-averaged ones. Since the SC for a fly is an N=1 study, the FC-SC correlation for the same individual cannot be calculated. We think further research will be necessary.

      Reviewer #2 (Recommendations for the authors):

      Abstract:

      Please introduce the term "ROI"

      Thank you for pointing this out. We have revised the Abstract.

      Introduction:

      (1) On a general note: the introduction reads like an extended abstract (i.e., a mix of results and discussion).

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) Line 43: Does this mean FC-SC correlation is higher in flies but not significantly so? Please clarify.

      We performed a Mann-Whitney U test and the difference was not significant (p = 0.2667).

      (3) Line 51: The "confidence" score does not indicate the degree of synaptic detection.

      The NeuPrint user guide (https://neuprint.janelia.org/public/neuprintuserguide.pdf) states “confidence - The certainty that an annotated synapse is correct and valid.” Since “degree of synaptic detection” may be difficult to understand, we changed it to “certainty of an annotated synapse.”

      (4) Line 59-61: This statement needs refining: post-synapses do not "receive" neurotransmitters, action potentials aren't conducted along nerve fibres.

      We changed “receive” to “sense.” About “action potentials,” we changed “conduct an action potential” to “graded potentials”, and removed “along nerve fibers.”

      (5) Line 61: calcium activity as detected via GCaMP correlates with (electric) neuronal activity - please cite relevant GCaMP literature here.

      We added F. Helmchen and J. Waters, "Ca2+ imaging in the mammalian brain in vivo," Eur J Pharmacol., vol. 447, pp. 119-129, 2002.

      (6) Line 76: "interconnected" is rather vague; just say "many Drosophila neurons are reciprocally connected".

      Thank you for pointing this out. Lin et al. (2024) performed a motif analysis and found many reciprocal, three-node and rich-club connections. However, the introduction was updated and this sentence was removed.

      (7) Line 77: comparing unsegregated vs reciprocal synapses is overly simplistic; these are separate features of the same object - i.e., a synapse can be reciprocal and at the same time be segregated in the presynaptic neuron but unsegregated in the postsynaptic neuron.

      Thank you for pointing this out. As you say, the relationship is complicated. In this paper, we are concerned with the degree of segregation of pre- and post-synapses, and we are looking at the segregation within a neuron. In this case, nearby reciprocal synapses (<=2 μm) are included in unsegregated synapses. We have made a correction to the sentence.

      (8) Line 79: I don't understand how we get from unsegregated synapses to local activity.

      Retinal amacrine cells have extensive unsegregated synapses, which provide local feedback inhibition of burst inputs [34]. We changed the text around these descriptions.

      (9) Line 80: What does "more essential function" mean?

      We removed this sentence.

      (10) Line 85: "as shown earlier": Is this based on results in this study or prior work? See also the general above note on mixing results/discussion into the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (11) Line 85-87: I don't understand how the applicability of certain fMRI analysis methods in turn means that functional activity is locally compartmentalized. Did you mean to say something along the lines of "we applied common fMRI methods which showed functional activity is locally compartmentalized"?

      These sentences discuss the commonality between fMRI (BOLD signal) and calcium signal, which both represent presynaptic population dynamics within a local region (voxel). Furthermore, unsegregated synapses are widespread throughout the fly brain (Ext. Data Fig.4-2) and can also be observed in human pyramidal cells (Ext. Data Fig.1-2). Unsegregated synapses suggest local compartment activity [33, 34, 39, 40] and contribute more to functional activity (Fig.4b). Therefore, the similar trend in FC-SC correlation (Fig.2d) between humans and flies suggests that both species exhibit localized compartmental activity via unsegregated synapses throughout the entire brain.

      Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      (12) Line 87: Please provide a reference for "common among various species".

      Thank you for pointing this out. Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      Results:

      (1) Line 91-92:

      (a) Please explain where the calcium data came from, how it was generated, etc.

      We added the data source and a reference (Brezovec et al. [14]).

      (b) Please clarify: what registration method?

      This is not simple. Please see the Methods section and Ext. Data Fig.1-1. This is also indicated in the text.

      (c) "calcium image" → "calcium image data"?

      We changed “calcium image” to “calcium imaging data”.

      (d) What is the "FDA template"?

      This is a brain template created by Brezovec et al. [14]. JRC2018 is a well-known brain template, but it was created by immunostaining postmortem brains and did not fit well with calcium imaging data from living flies. Therefore, we used the FDA template.

      (2) Line 93: Please introduce the term "ROI".

      We added “(Region of Interest)” in Line 38.

      (3) Line 94: Ito et al., Neuron (2014) "A systematic nomenclature for the insect brain" is a better reference for Drosophila neuropils; for the hemibrain, the ROIs were generated to match that original atlas

      Thank you for pointing this out. We added a reference.

      (4) Line 95/96: It is unclear what was used as the basis for the k-means/distance-based clustering

      This was because we wanted to investigate whether nuisance factor removal methods are robust, even for such diverse types of ROI. We added this point to the text.

      (5) Line 120ff: I'm not sure how the total number of ROIs is relevant for comparing flies and humans, given (a) the huge difference in brain size and (b) the difference in resolution of the functional data.

      Indeed, the fly brain and the human neocortex are completely different. We are investigating whether there are commonalities between them using a metric called FC-SC correlation. As described in our answer for (11), both the fMRI (BOLD signal) and calcium signal represent presynaptic population dynamics within a local region (voxel). FC represents the synchronization of synaptic activity between regions, and SC represents the structural connectivity of neurons. Both flies and humans showed high SC-FC correlation and showed similar trends (Fig. 2d), so we believe it would be interesting to investigate this phenomenon.

      (6) Line 123: "by contrast" is misleading here since, as you say, there isn't really a difference.

      We changed “by contrast” to “and.”

      (7) Line 141: I'm somewhat worried that the differences between FlyWire and hemibrain synapse counts are due to the issues mentioned above.

      Thank you for the comment, but we are not sure what “the issues mentioned above” refers to.

      (8) Line 148: There is no evidence that any differences in synapse are due to the resolution or anisotropy (as suggested in the introduction).

      We apologize that we don’t have direct evidence for it. We changed this to the sentence “This may be caused by differences in detection accuracy resulting from the resolution of EM scanning, but not to inter-individual variability.”

      (9) Line 155: References "39,45" have no brackets.

      These are not referencing numbers, but brain regions of Brodmann area 39 and 45.

      (10) Line 155-157: I don't think we can infer the composition of brain areas in humans based on a tenuous correlation in flies; this is highly speculative and really should be in the discussion.

      In humans, there are areas with strong and weak FC-SC correlations [8], which may be due to the E-I (Excitatory-Inhibitory) balance of connections. We investigated this possibility by comparing the correlation between neurotransmitters and FC-SC correlations in the fly brain. We slightly changed this sentence.

      (11) Line 159: I find the first 2-3 sentences in this paragraph confusing. Are you saying that you did all these things in the prior results sections, or that you wanted to look at X and therefore you did Y? Maybe there is an issue with the tense here?

      We changed the sentences around this description.

      (12) Line 161: "whole-brain" = FlyWire?

      We changed “whole-brain” to “FlyWire”.

      (13) Line 163: Please explain the "PPSSI" acronym.

      This is now explained on Line 75.

      (14) Line 165: The description of how the cPPSSI was calculated is hard to follow. For example, what's the "fraction of synapse number".

      We changed our sentences around this description to be clearer. The cPPSSI is the degree of segregation within a cluster and is also assigned to each synapse. The PPSSI is then the average of the cPPSSI values of all synapses in a neuron.
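
      The relationship between the two indices can be made concrete with a short sketch. The exact cPPSSI formula is not given in this response, so the per-cluster index below, |pre - post| / (pre + post), is purely a stand-in assumption, and the function names and toy counts are hypothetical. Only the last step, averaging the per-synapse values over all synapses of a neuron, follows the description above.

```python
def cluster_index(n_pre, n_post):
    """ASSUMED stand-in for cPPSSI: 0 for an evenly mixed cluster,
    1 for a cluster containing only pre- or only post-synapses."""
    return abs(n_pre - n_post) / (n_pre + n_post)

def neuron_index(clusters):
    """Average of the per-synapse cluster index over all synapses of a neuron.
    `clusters` is a list of (n_pre, n_post) counts, one tuple per cluster."""
    per_synapse = []
    for n_pre, n_post in clusters:
        value = cluster_index(n_pre, n_post)
        # Every synapse in the cluster inherits the cluster's value.
        per_synapse.extend([value] * (n_pre + n_post))
    return sum(per_synapse) / len(per_synapse)

# Toy neuron (invented): one purely presynaptic cluster, one evenly mixed
# cluster, and one purely postsynaptic cluster.
toy_clusters = [(10, 0), (5, 5), (0, 4)]
cluster_values = [cluster_index(a, b) for a, b in toy_clusters]
neuron_value = neuron_index(toy_clusters)
print(cluster_values, round(neuron_value, 3))
```

      Because the neuron-level value is weighted by synapse count, a neuron with a few large mixed clusters can score low even when most of its clusters are fully segregated.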

      (15) Line 166: Is there a difference between "cPPSSI" and "PPSSI"?

      Yes, there is. Please see our answer for (14).

      (16) Line 167: "The result showed a histogram resembling a normal distribution" → I suggest running a normality test.

      Thank you for pointing this out. We tested it with the Lilliefors test and the result was p = 0.001 (significantly different from a normal distribution). Since there are numerous values with PPSSI = 1, the distribution is not judged to be normal. We therefore changed this description.

      (17) Line 173: I am somewhat worried about a selection bias in your correlation of segregated vs unsegregated synapses. First, it seems like only a small fraction of neurons are in the 0-0.1 and 0.9-1 PPSSI range. I would suggest running a proper correlation between PPSSI and FC-SC correlation instead of looking at just the two extremes. Second, your examples for segregated neurons (APL + CT1) are large neurons that densely innervate spatially close and functionally very similar neuropils. If the sample of unsegregated neurons consists mainly of these large interneurons, I'm not at all surprised that they contributed strongly to FC-SC correlation.

      Thank you for pointing this out. For this work we investigated synapses (not neurons), extracting those with cPPSSI of 0-0.1 and 0.9-1, and performed a rank test against the FC-SC correlation of randomly sub-sampled synapses. We aimed to demonstrate that unsegregated synapses, in particular, strongly contribute to FC-SC, and we hope to investigate overall trends in a future study.

      (18) Line 185: I don't think the function of reciprocal synapses is "considered to be clear". There are examples of feedback inhibition through reciprocal synapses, in particular in the visual system, but that does not mean that this is true across the board.

      We changed “considered to be clear” to “considered to be clearer than unsegregated synapses.” Of course, the function of reciprocal synapses is unknown for the whole brain, but we think it is better studied than that of unsegregated synapses.

      (19) Line 188 / Figure 4h: that figure panel does not appear to show transmitter pairs.

      Figure 4h (FlyWire) showed transmitter pairs. Ext. Data Fig.4-1g did not, because FlyEM does not have transmitter information.

      (20) Line 192: Please clarify "functionally common".

      We changed our sentences to clarify this.

      (21) Line 199: "ventral nerve code" → "ventral nerve cord".

      We fixed this typo.

      (22) Line 201: I don't think you can use "conversely" here.

      We changed “Conversely” to “Moreover.”

      (23) Line 201: How certain are you that the WAGN neuron is the only candidate? Also, it would be nice to provide the neuron IDs so that people can identify them in the connectome.

      Thank you for pointing this out. We added Root ID: 720575940644632087 in the text. Actually, we found several GABA neuron candidates, such as 720575940637611365, 720575940644632087, 720575940613552947, 720575940640333109 and 720575940612264817. We investigated whether ER1(L) was present in these downstream connections and found that 720575940644632087 had the strongest connection with the largest number of synapses, so we adopted this.

      (24) Line 207: When you say "the left WAGN was strongly connected", are those connections not also present for the right WAGN?

      There is a right WAGN (Root ID: 720575940624377224), but it does not have strong interconnections with WPNb tier 2/3 (left) neurons. For the right WAGN, there are few inputs from WPNb tier 2/3 (left). We added “(left)” in the text.

      (25) Line 212: I don't think you can use "however" here.

      We removed “however.”

      (26) Line 214: "well unsegregated" → "very unsegregated"?

      This sentence was removed, because we recalculated Fig. 5h.

      Ethics Declaration:

      It seems the marmoset data were reported on in [10], so why is there a reference to the generation of the dataset?

      Yes, marmoset data were reported in [10], so we removed the Ethics Declaration.

      Reviewer #3 (Recommendations for the authors):

      (1) In my opinion, the title and framing of this manuscript dramatically overstate the results presented here. Also, the results presented in the different figures in this manuscript seem disjointed and are not very related to each other.

      Thank you for pointing this out. We have rewritten our manuscript slightly to address this. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections.

      (2) There are multiple ways to compute structural correlation matrices-the methods the authors implemented should be discussed in greater detail in the manuscript.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach, used in Honey et al. (2009) [6], and the log-scaled SC approach, used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in fewer zero replacements. Therefore, log-scaled SC is likely to more accurately represent the relationship in our study. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the Gaussian distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. The log-scaled SC approach has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. It may be possible to compare various methods in depth, but this is outside the scope of this study and requires further research.

      (3) The use of the FC-SC detection score defined by the authors should be discussed and justified more extensively in the text.

      Thank you for pointing this out. This has already been discussed in [10]. We defined our own “FC-SC detection score,” but we consider the overall approach to be well established in the literature. For example, Stafford et al. (2014) carried out FC-SC detection for 168 mouse cortical regions, and obtained 78.26% sensitivity and 81.69% specificity for the top 1% of SC. Hori et al. (2020) also investigated FC-SC detection for 55 cortical regions of the marmoset brain left hemisphere, achieving an AUC of 0.72. We think FC-SC detection is an index that evaluates the relationship between FC and SC from a different angle than FC-SC correlation and is worthwhile.

      Hori et al., (2020). Comparison of resting-state functional connectivity in marmosets with tracer-based cellular connectivity. NeuroImage, 204, 116241.

      Stafford et al., (2014). Large-scale topology and the default mode network in the mouse connectome. Proc. Natl. Acad. Sci. U.S.A., 111(52), 18745-18750.
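
      The detection measures quoted in response (3) above (sensitivity, specificity, AUC for recovering the strongest SC from FC) can be sketched generically as follows. This is not the authors' "FC-SC detection score", whose definition is given in [10] rather than here; the function names, the 30% cut-off, the 0.5 threshold, and the toy values are assumptions chosen only to show how such measures are computed.

```python
def top_fraction_labels(sc_values, fraction):
    """Mark the strongest `fraction` of SC entries as positives."""
    k = max(1, round(len(sc_values) * fraction))
    cutoff = sorted(sc_values, reverse=True)[k - 1]
    return [v >= cutoff for v in sc_values]

def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for s, flag in zip(scores, labels) if flag]
    neg = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when FC >= threshold predicts a connection."""
    tp = sum(1 for s, flag in zip(scores, labels) if flag and s >= threshold)
    fn = sum(1 for s, flag in zip(scores, labels) if flag and s < threshold)
    tn = sum(1 for s, flag in zip(scores, labels) if not flag and s < threshold)
    fp = sum(1 for s, flag in zip(scores, labels) if not flag and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy ROI pairs (invented): structural weights and functional correlations.
sc_values = [800, 120, 45, 3, 0, 0, 0, 0, 0, 0]
fc_values = [0.8, 0.6, 0.5, 0.2, 0.1, 0.05, 0.55, 0.15, 0.0, 0.25]

labels = top_fraction_labels(sc_values, fraction=0.3)
detection_auc = auc(fc_values, labels)
sens, spec = sensitivity_specificity(fc_values, labels, threshold=0.5)
print(f"AUC = {detection_auc:.3f}, sensitivity = {sens:.2f}, "
      f"specificity = {spec:.2f}")
```

      Unlike a correlation coefficient, these measures ask only whether strong structural connections can be ranked above weak or absent ones, so they are insensitive to how zero entries are scaled.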

    1. eLife Assessment

      This useful study addresses the interesting question of how immune cells recognise infected erythrocytes in malaria. It proposes the parasite protein PfGBP-130 as an interaction partner of the human cell surface protein LFA 1, which could help explain how NK cells recognize infected erythrocytes. The conclusions are partially supported by pull-down and cell-based activation data. However, the overall evidence of direct interaction at the cell-cell interface and downstream effects is incomplete; stronger evidence is required to demonstrate surface exposure of PfGBP-130, as well as a direct role of this antigen in killing.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors aim to determine the ligand on Plasmodium falciparum infected erythrocytes for the NK cell integrin, LFA-1, following up on previous evidence that LFA-1 is important for immune cell-mediated recognition of iRBCs.

      They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Fig 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information, as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying GP130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      They then conduct a pulldown using LFA-Fc, which does show GP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data is presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. I think that the data needs to be shown with some baseline before the addition of the ligand so that association can be seen. The data is also a bit messy with a downward drift and the curves showing different shapes, for example, with the 1.0uM curve seeming to have a different association rate. This is the only data which shows a direct interaction between LFA1 and GBP, as pulldowns are done with lysates, which might mean bridging components. I therefore think that it is important to repeat the BLI, or use additional biophysical methods to assess binding, to obtain more convincing data.

      The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is overinterpreted. Personally, I would remove this data as I did not find it convincing and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      They next made GP130 and tested the binding of this to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots related to the graph in the bottom right corner of Figure 3Bii. In the flow plots, the hIgG control shows 12.8% of cells in the gated region, while the unstained cells have 5.63%, but the MFI data shows a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more than the 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, perhaps an additional experiment would be to add soluble LFA-1 into this assay as an additional control to determine whether this blocks PfGBP binding to the THP-1 cells? It could be that there are additional mechanisms of binding which indicate why the siRNA has a partial effect. The same is true for the NK cell experiments in Figure 3Ci in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim 'only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.

      The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by addition of antiCD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing as there are small effects and no demonstration of significance. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I did not find this experiment convincing and would either remove or supplement with additional data using a more robust assay, with repeats and tests of statistical significance.

      In summary, the authors present a set of data which comes together to indicate an interaction between LFA1 and PfGBP on the Plasmodium infected erythrocyte surface. Pulldown studies show convincingly that these two proteins co-precipitate and BLI data suggest that this is direct. Also convincing is that NK cell activation can be reduced using antibodies against either LFA1 or PfGBP, indicating that this interaction does play a role in immune cell recognition of iRBCs.

      Comments on revised version:

      The authors made some minor changes in response to my review, but did not present any substantial new data to demonstrate a direct interaction between PfGBP and LFA1 or to convincingly show differences in NK cell-mediated killing.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used an LFA-1 αI-Fc fusion protein to pull down potential ligands, followed by LC-MS/MS, leading to selection of PfGBP-130 as a potential membrane protein on the surface of infected cells. PfGBP-130 antibodies were raised and used to support the surface localization. This putative ligand interacted strongly with LFA-1 (Kd = 15 nM). A presumed PfGBP-130 ectodomain interacts with monocytes and NK cells but not cells that lack LFA-1. PfGBP-130 antibodies also interfered with NK cell-mediated infected cell killing; the effect, although statistically significant, is modest. The authors propose that NK cells recognize infected cells via LFA-1 interaction with PfGBP-130 exposed on the host cell and that this interaction is critical to initiation of NK cell activation and killing of infected cells.

      Comments on revised version:

      The authors submit a minimally revised manuscript that does not address any of my comments, as itemized here:

      (1) This reviewer suggested immunoblotting with hypotonic lysis and alkaline extraction as a simple test of whether PfGBP-130 is a membrane protein as the authors propose, despite PEXEL cleavage that removes a signal peptide they originally proposed to be a TM domain. Instead of performing this simple immunoblot, the authors state that it is unnecessary: because their LC-MS/MS of membrane-associated proteins recovered PfGBP-130, they argue that it must be a membrane protein. Unfortunately, this is insufficient because the high sensitivity of LC-MS/MS leads to detection of many soluble proteins. (For example, it is almost certain that their LC-MS/MS recovered hemoglobin, which is soluble and not a surface-exposed protein on infected cells.)

      (2) I also suggested a simple immunoblot using a few different immature-stage cultures to detect the full-length and pre-proteins of PfGBP-130 because their immunoblot detected only a 95 kDa band whereas the PEXEL-processed protein is expected to migrate at 85 kDa. The authors state this is unnecessary because their LC-MS/MS of LFA-1 pulldowns enriched for PfGBP-130 and that a single band was detected in immunoblots. This is insufficient because pulldowns often enrich for more than one protein (e.g. some proteins adsorb onto the immunoprecipitation beads or precipitate with beads in certain buffers); immunoblotting often fails to detect some proteins depending on stringency of blocking and wash buffers. They state that the processed form at 85 kDa "may not be well resolved under our current conditions" as a reason not to perform the simple experiment. This reviewer's original statement that P. falciparum antigens frequently cross-react with nominally specific antibodies (with two examples provided in my original review) remains an important concern that would undermine the authors' main conclusion.

      (3) As PfGBP-130 is not essential, a knockout was suggested to more directly test their model given the above concerns. The authors state this cannot be done and that their "multiple orthogonal approaches" suggest it is unnecessary. This reviewer considers this an essential experiment to support a provocative, fundamentally new finding, such as the identification of the NK cell activation ligand.

      (4) This reviewer suggested that the authors add some speculation about why PfGBP-130 is retained in parasites if it triggers NK cell-mediated killing and is nonessential. Rather than adding relevant hypotheses to the Discussion, the authors appear to dismiss this suggestion by stating that PfEMP1, STEVOR, and RIFIN are retained despite being nonessential. The problem with this response is that each of these other antigens has a clearly defined role on the surface of infected erythrocytes that benefits the parasite. It is not clear that the authors have considered possible advantages the parasite may gain from exposing PfGBP-130 on the red cell surface.

    4. Reviewer #3 (Public review):

      Summary:

      Malhotra and colleagues present evidence that the integrin LFA-1 on NK cells is a ligand for the Plasmodium falciparum protein GBP130 on the infected erythrocyte surface and that this interaction plays a role in the clearance of infected erythrocytes by NK cells.

      The authors first select a subdomain contained within the CD11a subunit of LFA-1 as a probe to discover possible binding proteins on the infected erythrocyte surface. Parasite-infected erythrocytes stained positively with this probe; the level of staining increased as the parasites progressed through the life cycle. Using the LFA-1-based probe in cross-linking pull-down experiments, GBP130 was identified by mass spectrometry as a co-purifying parasite protein. The N-terminal portion of GBP130 was recombinantly expressed and shown to interact with LFA-1 alpha-I by biolayer interferometry experiments. The full-length extracellular domain of GBP130 was then recombinantly expressed and used to stain primary human NK cells and THP-1 cells. Knocking down LFA-1 by siRNA reduced staining by GBP130. To assess the contribution of GBP130 to the activation of NK cells, CHO cells exogenously expressing GBP130 were incubated with primary NK cells. Transfecting CHO cells with GBP130 led to increased activation of co-incubated NK cells compared to mock-transfected cells and to GBP130-transfected cells incubated with anti-CD11a to block NK cell adhesion. Finally, CHO cells expressing GBP130 led to increased activation of NK cells compared to mock-transfected CHO cells.

      Overall, although the authors present data from NK cell killing assays that include appropriate controls, the data suggesting a direct interaction between PfGBP-130 and LFA-1 does not include the same necessary controls, for example, the use of blocking antibodies. Most critically, the biolayer interferometry experiments use a recombinant fragment of PfGBP-130, which does not include the residues predicted to be important for mediating specific interaction with LFA1. The biolayer interferometry data instead suggest non-specific interactions between PfGBP-130 and LFA1, as binding does not reach saturation.

      Comments on revised version:

      The authors have addressed all minor concerns; however, the major point regarding the biophysical data supporting direct interaction between PfGBP-130 and LFA-1 has not, in my opinion, been satisfactorily addressed. Biophysical data supporting the interaction was generated using a fragment of PfGBP-130, which does not include residues that the authors predict by structural modelling to be important for the interaction. The authors argue that PfGBP-130 is a repeat-containing protein and may have multiple binding sites for LFA-1. If this is the best mechanistic hypothesis given the current data, the authors need to explain this in the results section.

      Overall though, I agree with Reviewer #1 that the structural modelling results are not convincing, and given that the modelling data do not straightforwardly agree with the experiment, the clarity of the manuscript would benefit from their omission.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Figure 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      We thank the reviewer for this important suggestion. We agree that the uninfected RBC (uRBC) control is critical for interpreting the specificity of LFA-1 αI-Fc binding. In the revised manuscript, we have ensured that these control data are clearly presented and appropriately referenced in the main text; however, we have retained them in the Supplementary Information (Supplementary Figure S1) to maintain clarity and avoid overcrowding Figure 1, while still ensuring their visibility and accessibility to the reader. Importantly, these data demonstrate negligible binding of LFA-1 αI-Fc to uRBCs compared to iRBCs, supporting specificity. We have explicitly stated the parasite strain used (Plasmodium falciparum 3D7) in the Methods section (line 475).

      (2) They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying GBP130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      We agree that cross-linking can introduce potential artefacts. To mitigate this, we included hIgG control pulldown experiments performed under identical conditions. Proteins identified in the control eluate were excluded as background (summarized in Supplementary Table S1). Importantly, PfGBP-130 was the only protein specifically enriched in the LFA-1 αI-Fc pulldown across all three biological replicates (Fig. 2A, Venn Diagram). While cross-linking was used to stabilize transient interactions, consistent enrichment of PfGBP-130 across the three biological replicates precludes any concerns of non-specificity.

      (3) They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      The question pertains to Fig. 2Biii. The IFA images include both infected and neighboring uninfected erythrocytes within the same field. No PfGBP-130 staining is observed in uninfected cells. PfGARP staining, done specifically to verify the parasite-infected cell and surface localisation, coincides completely with PfGBP-130 staining. This unequivocally shows that the antibodies raised specifically recognise only infected RBCs.

      (4) They then conduct a pulldown using LFA-Fc, which does show GBP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data is presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. I think that the data needs to be shown with some baseline before the addition of the ligand so that the association can be seen. The data is also a bit messy, with a downward drift and the curves showing different shapes; for example, the 1.0 µM curve seems to have a different association rate. Also, is this n=1? I think that this data needs to be repeated and replicated. This is the only data which shows a direct interaction between LFA1 and GBP, as the pulldowns are done with lysates, which might contain bridging components. I therefore think that it is important to repeat the BLI or use additional biophysical methods to assess binding, to obtain more convincing data.

      We sincerely thank the reviewer for highlighting this important concern regarding the BLI data presentation and interpretation. We would like to clarify that the baseline signal prior to ligand addition was subtracted during data processing; therefore, the plotted curves represent the net response following ligand association. However, we agree that this may have obscured the visualization of the association phase. Accordingly, in the revised manuscript, we have re-plotted the data with adjusted y-axis scaling to better capture the association kinetics. In addition, to ensure robustness and reproducibility, the BLI experiments were performed in multiple independent replicates (n ≥ 3) using independently purified protein batches. The original figure showed a representative dataset; we have now included averaged sensorgrams along with standard deviation in the calculated KD values [KD = (1.7 ± 0.22) × 10⁻⁸ M] (Figure 2C(i)). These revisions provide a clearer and more accurate representation of the binding interaction.

      (5) The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is over-interpreted. Personally, I would remove this data as I did not find it convincing, and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      We thank the reviewer for this thoughtful comment regarding the modelling analysis. We agree that computational docking and homology-based modelling have inherent limitations and should not be over-interpreted. In our study, these analyses were included strictly as supporting evidence to provide a structural framework for the PfGBP-LFA-1 interaction, while the primary conclusions are based on direct biochemical and functional validation, including pull-down, BLI measurements, receptor knockdown, and cellular inhibition assays. Importantly, the use of docking approaches such as ClusPro, followed by interface analysis and MD simulations, is a widely accepted and routinely used strategy to generate testable hypotheses for protein-protein interactions, particularly when experimental structures are unavailable (e.g., Comeau et al., 2004; Weng et al., 2019). We believe that the current modelling serves as a useful complementary analysis that is consistent with, and supportive of, the experimentally validated interactions.

      (6) They next made GBP130 and tested the binding of this to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots related to the graph in the bottom right corner of Figure 3Bii. In the flow plots, the hIgG control shows 12.8% of cells in the gated region, while the unstained cells have 5.63%, but the MFI data shows a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more than 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, perhaps an additional experiment would be to add soluble LFA-1 into this assay as an additional control to determine whether this blocks PfGBP binding to the THP-1 cells? It could be that there are additional mechanisms of binding which indicate why the siRNA has a partial effect. The same is true for the NK cell experiments in Figure 3Ci, in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim 'only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.

      We thank the reviewer for this insightful comment. The apparent discrepancy arises because the flow plots depict the percentage of cells within a defined positive gate, whereas the graphs quantify mean fluorescence intensity (MFI) across the entire population. We have revised the figure legend to state this. Regarding the partial reduction in binding upon LFA-1 (CD11a) knockdown, we agree that this indicates LFA-1 is a major but not exclusive contributor, which is biologically plausible given incomplete siRNA depletion and the known avidity-dependent nature of integrin interactions. Importantly, our conclusion is supported by multiple orthogonal approaches (αI-domain binding, LC-MS/MS identification, BLI, docking, receptor knockdown, and functional blockade). We also appreciate the suggestion of soluble LFA-1 competition, which we acknowledge as an important future experiment. Finally, we have revised the text regarding HEK293T, HepG2, and stem cells to reflect that PfGBP-Fc binding is minimal but not absent, consistent with low/non-expression of LFA-1 in non-immune cells. Overall, we have moderated our claims to state that PfGBP-LFA-1 interaction is a dominant and functionally relevant mechanism, while not excluding additional low-affinity or accessory interactions.

      Figure legend change: Representative flow plots depict the percentage of cells within a predefined positive gate, whereas the accompanying summary graph quantifies fluorescence intensity across the analyzed population. These two metrics report distinct properties of the distribution and are therefore not expected to be numerically identical.
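
      The distinction drawn in the legend can be illustrated with a small numerical sketch. The per-cell intensities and the gate threshold below are synthetic values invented for illustration; they are not taken from Figure 3B. The sketch only shows that a sample can have a larger fraction of cells inside a positive gate while having a lower mean intensity across the whole population.

```python
from statistics import mean

GATE = 1000  # hypothetical positive-gate threshold, arbitrary fluorescence units


def percent_in_gate(intensities, gate=GATE):
    """Percentage of cells whose intensity is at or above the gate threshold."""
    return 100 * sum(x >= gate for x in intensities) / len(intensities)


# Synthetic populations (assumed values, 100 cells each).
# Sample A: dim bulk population with a larger bright tail.
sample_a = [100] * 87 + [1100] * 13
# Sample B: brighter bulk population with a smaller bright tail.
sample_b = [250] * 94 + [1050] * 6

for name, cells in (("A", sample_a), ("B", sample_b)):
    print(f"sample {name}: {percent_in_gate(cells):.1f}% gated, mean = {mean(cells):.0f}")
```

      In this toy case sample A has more gated cells than sample B but a lower population mean, because the mean is dominated by the bulk of ungated cells. Whether that is what happens in the actual data can only be judged from the underlying distributions.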

      (7) The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by the addition of anti-CD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs, but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B, as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      We thank the reviewer for this positive assessment and helpful suggestion regarding figure organization. We agree that the CHO-PfGBP and iRBC-based NK cell activation assays represent conceptually related experiments that both address LFA-1-PfGBP dependent activation using similar readouts. We have retained separate panels to distinguish the reductionist CHO-based system from the physiologically relevant iRBC context. We believe that the combined evidence from both systems strengthens the conclusion that PfGBP-LFA-1 interaction is a key contributor to NK cell recognition and activation.

      (8) The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing as there are small effects and no demonstration of significance. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I did not find this experiment convincing and would either remove or supplement with additional data using a more robust assay, with repeats and tests of statistical significance.

      We respectfully disagree that the assay should be removed, because flow-cytometric quantification of P. falciparum parasitemia using DNA dyes such as Hoechst is a widely used, accepted, and high-throughput approach for measuring infected erythrocytes and parasite growth, with clear separation of infected from uninfected RBCs and good reproducibility across malaria studies (Dent et al., 2009; Jang et al., 2014). Importantly, closely related immune-cell killing experiments in the malaria field have used the same general strategy, co-culture with effector cells followed by flow-cytometric enumeration of parasitemia to infer parasite control, including the seminal NK-cell study by Chen et al., 2014, which our assay design follows conceptually, and later work showing reduced parasitemia after co-incubation with cytotoxic lymphocytes measured by nucleic-acid dye flow cytometry. We therefore believe the experiment is methodologically valid and directly relevant to the biological question, namely whether disrupting PfGBP-LFA-1 engagement alters NK-cell-mediated restriction of parasite expansion.

      Reviewer #2 (Public review):

      (1) PfGBP-130 is proposed to be a membrane protein based on a single predicted transmembrane domain. Figures 2b and 3a show ribbon schematics with this TM domain at residues 51-68, in agreement with TM prediction algorithms such as TMHMM 2.0 and Phobius. However, this predicted TM is upstream of the PEXEL motif (residues 84-88, sequence RILAE), a conserved sequence for parasite protein export to host cytosol that is proteolytically processed at its 4th residue. Thus, residues 1-87 are removed from PfGBP-130 prior to export, yielding a mature protein without predicted TMs. Prior studies have determined that the mature PfGBP-130 lacks TMs and is retained as a soluble protein in host cell cytosol (PMID: 19055692, 35420481). Thus, the authors' model of PfGBP-130 as a surface-exposed membrane protein conflicts with both computational analysis of the mature protein and these prior reporter studies. An important simple experiment would be to evaluate PfGBP-130 membrane association in immunoblots using the authors' PfGBP-130 antibody after hypotonic lysis (PMID: 19055692) and after alkaline extraction (e.g. 100 mM Na2CO3, pH 11 as frequently used, PMID: 33393463). If the prior studies and computational analyses are correct, the protein will be predominantly in the soluble and/or alkaline supernatant fractions.

      We thank the reviewer for this important observation regarding PfGBP-130 topology and export. We agree that the presence of a PEXEL motif supports proteolytic processing and that the mature protein may lack a classical transmembrane domain. However, consistent with our model of surface accessibility, we would like to clarify that in an independent proteomic study performed in our laboratory on the membrane-enriched fraction of Plasmodium falciparum-infected erythrocytes, PfGBP-130 was reproducibly identified by LC-MS/MS among membrane-associated proteins (data not shown; can be provided upon request). These findings support the conclusion that, irrespective of the absence of a canonical transmembrane domain, PfGBP-130 is associated with the iRBC membrane compartment, likely via peripheral or protein-complex–mediated interactions, as described for several exported Plasmodium proteins.
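
      The residue arithmetic behind point (1) can be written out as a short sketch. The residue numbers are those stated in the reviewer's comment; the average residue mass of about 110 Da is a common rule of thumb and is an assumption here, so the mass figure is only a rough estimate.

```python
# Residue positions as stated in the reviewer's point (1).
TM_SPAN = (51, 68)          # predicted transmembrane segment
LAST_REMOVED_RESIDUE = 87   # residues 1-87 are removed by PEXEL processing

# Assumption: rule-of-thumb average mass per amino acid residue, in daltons.
AVG_RESIDUE_DA = 110

# The TM segment is lost if it lies entirely within the removed N-terminal region.
tm_removed = TM_SPAN[1] <= LAST_REMOVED_RESIDUE

# Rough mass of the removed N-terminal segment, in kDa.
removed_mass_kda = LAST_REMOVED_RESIDUE * AVG_RESIDUE_DA / 1000

print(f"TM segment removed on processing: {tm_removed}")
print(f"approximate mass of removed segment: {removed_mass_kda:.1f} kDa")
```

      With these inputs the predicted TM segment falls inside the removed region, and the removed segment is on the order of 10 kDa. Actual gel mobility depends on composition and modifications, so this is an estimate rather than a prediction of band positions.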

      (2) Many findings rely on the specificity of antibodies generated against PfGBP-130 or NK cell receptors. Although the authors have included key controls (use of isotype control antibodies, lack of anti-PfGBP-130 binding to uninfected cells), cross-reactivity between P. falciparum antigens is well-recognized and could significantly undermine the interpretation of experiments (PMID: 2654292 and 1730474 provide key examples of antigens recognized by antibodies raised against other proteins). For example, the surface localization in IFA experiments (Figure 2B(iii)) could reflect anti-PfGBP-130 binding to an unrelated parasite surface antigen, a possibility not addressed by any of the authors’ controls. As another example, the iRBC lysate immunoblot using this antibody in Fig. 2B(iv) suggests a MW of 95 kDa, which corresponds to the unprocessed pre-protein before export; cleavage in the PEXEL motif yields a processed mature protein of 85 kDa, which should be readily resolved from the pre-protein in immunoblots (PMID: 19055692). A better immunoblot using immature infected cell stages might show both the pre-protein and the mature protein as a doublet band.

      We thank the reviewer for raising this important concern regarding antibody specificity. We agree that cross-reactivity among P. falciparum antigens is a known issue and have taken multiple steps to ensure specificity in our study. First, the anti-PfGBP-130 antibodies were generated against a defined recombinant fragment and show no detectable binding to uninfected RBCs and no signal in hIgG control immunoprecipitates, supporting specificity. Importantly, in our LC-MS/MS analysis of LFA-1 αI-domain pull-downs, PfGBP-130 was specifically enriched and consistently identified across replicates, independently validating the target recognized by the antibody. Furthermore, the same antibody detects a single dominant band in both iRBC lysates and αI pull-down fractions, arguing against widespread cross-reactivity. Regarding the apparent molecular weight (~95 kDa), we agree that this likely corresponds to the precursor form, and that a processed form (~85 kDa) may not be well resolved under our current conditions.

      (3) PfGBP-130 is not essential for in vitro cultivation (PMID: 18614010 and MIS of 1.0 in the piggyBac mutagenesis screen as tabulated on plasmodb.org, indicating a highly dispensable gene). The authors should use the knockout line as a control in their IFA localization experiments to address antibody specificity. More fundamentally, their model predicts that NK cells should not recognize or kill infected cells from the knockout line when compared to their untransfected parent. Such results with the knockout line would compellingly support the authors' model without reliance on antibodies that may cross-react with other parasite antigens. PMID: 18614010 reported that the PfGBP-130 knockout exhibited increased membrane rigidity, suggesting an intracellular scaffolding protein rather than a surface localization and use as a ligand for LFA-1 interaction and NK cell-mediated killing.

      We agree that a PfGBP-130 knockout line would provide a powerful genetic validation of both antibody specificity and the proposed functional role of PfGBP-130 in NK cell recognition. At present, such experiments were not included in this study, and we acknowledge this as an important limitation. However, we would like to emphasize that our conclusion does not rely on antibody-based localization alone; rather, it is supported by multiple orthogonal approaches, including LFA-1 αI-domain pull-down coupled to LC-MS/MS, biophysical interaction analysis, receptor knockdown, and functional blocking assays. In addition, in one of our previous proteomic analyses of the membrane-enriched fraction of infected erythrocytes, PfGBP-130 was identified among the proteins present in the membrane fraction, supporting its association with the iRBC membrane compartment despite lacking a classical mature transmembrane domain.

      (4) PfGBP-130 non-essentiality raises the question of why the gene would be retained if it triggers NK cell-mediated killing of infected cells in vivo. Presumably, this killing would pose strong selective pressure against retention of PfGBP-130. Some speculation is warranted to support the model.

      We thank the reviewer for this thoughtful evolutionary question. We agree that if PfGBP-130 enhances NK-cell recognition, its retention likely reflects a context-dependent fitness trade-off rather than a simple benefit or cost. This situation is not unusual in P. falciparum: several exported or surface-associated proteins are retained despite being immunogenic because they also provide advantages in other settings, such as erythrocyte remodeling, cytoadhesion, niche adaptation, immune modulation, or transmission. The clearest precedent is the PfEMP1/var system, in which highly immunogenic surface antigens are nevertheless strongly maintained because they mediate sequestration and in vivo fitness, while antigenic variation limits continuous immune exposure (Chew et al., 2022). Similarly, other variant surface antigens such as STEVOR and RIFIN are retained despite immune recognition because they contribute to erythrocyte binding, antigenic diversity, and immune evasion or modulation (Niang et al., 2009; Sakoguchi et al., 2025). More broadly, many P. falciparum genes that appear dispensable in standard in vitro culture are nevertheless preserved because culture does not recapitulate the selective pressures present in vivo, including splenic clearance, endothelial interactions, immune attack, and within-host competition.

      Reviewer #3 (Public review):

      (1) Anti-GBP130 antibodies are used in the cellular assays to block the interaction between GBP130 and LFA1. They should therefore also block interactions between GBP130 and LFA1 recombinant proteins in the biolayer interferometry experiment. Do the authors have data to show this? Similarly, the anti-CD11a antibodies used to block the interaction in the cellular assays should also block the in vitro interaction between recombinant LFA1 and GBP130.

      We thank the reviewer for this insightful suggestion. We agree that demonstrating antibody-mediated inhibition of the recombinant PfGBP-LFA-1 interaction would provide an additional orthogonal validation of the interface. While such blocking experiments were not included in the original BLI dataset, our current study already establishes the specificity of this interaction through multiple independent approaches, including αI-domain pull-down and LC-MS/MS identification, BLI-derived high-affinity binding (KD ~10⁻⁸ M), structural docking, receptor knockdown, and antibody-mediated inhibition in cellular systems. We note that antibody-mediated blocking in a purified biophysical system is not always directly comparable to cellular assays, as epitope accessibility, orientation on biosensor surfaces, and conformational states of integrins (which are known to undergo activation-dependent structural changes) can influence inhibition efficiency. Nonetheless, we fully agree that this represents an important validation experiment.

      (2) The structural modelling analysis of the predicted complex between GBP130 and LFA1 (Figure 2cii) predicts that the majority of the important GBP130 interface residues are located in the region D509-N607. However, the authors present BLI data for the GBP130-LFA1 interaction, which used the N-terminal fragment of GBP (residues 69-270), which does not include the GBP130 residues predicted to be important for the formation of the complex between the two proteins. Could the authors provide an explanation for how an interaction was observed with the GBP130-N fragment, which does not contain the residues predicted to be important for interacting with LFA1?

      We thank the reviewer for this important observation. We agree that the structural model predicts a major interaction interface within the D509-N607 region of PfGBP-130; however, this does not preclude the existence of additional or auxiliary binding determinants within the N-terminal region used in our BLI assays (aa 69-270). PfGBP-130 is a multi-domain, repeat-containing protein, and such proteins frequently exhibit distributed or multivalent interaction interfaces, where individual regions can independently engage binding partners with lower affinity while the full-length protein achieves higher avidity through cooperative interactions. In our study, the BLI data using the N-terminal fragment demonstrate that this region is sufficient to mediate direct interaction with the LFA-1 αI domain, whereas the structural model based on full-length predictions likely captures a dominant or higher-affinity interface in the C-terminal region. Importantly, the interaction is supported by multiple orthogonal datasets, including pull-down/LC-MS/MS, cellular binding assays, and functional inhibition, indicating that the observed binding is not an artefact of fragment choice.

      Author response image 1.

      To further examine this, we performed docking and binding energy analyses comparing the full-length PfGBP-130-LFA-1 complex with the N-terminal domain-LFA-1 complex. Using the PRODIGY server, the predicted binding affinity for the full-length complex was -9.8 kcal/mol, whereas the N-terminal domain complex exhibited a still favorable binding energy of -5.6 kcal/mol. Similarly, HawkDock (v2) analysis yielded binding energies of -22.2 kcal/mol for the full-length complex and -14.1 kcal/mol for the domain-only complex. While reduced relative to the full-length protein, these values remain well within the range of stable protein-protein interactions, supporting the ability of the N-terminal region to independently contribute to binding. These energy calculations take into account all non-covalent interactions. For clarity, hydrogen bonds have been specifically highlighted in the figure to represent key interaction interface.
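
      For orientation, a predicted binding free energy can be converted to an equivalent dissociation constant with the standard relation ΔG = RT ln(Kd). The sketch below applies this to the two PRODIGY values quoted above. The temperature of 298.15 K is an assumption, since none is stated in the response. The HawkDock values are left out because scoring-function energies of that kind are not generally interpreted as absolute binding free energies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # assumed temperature (25 C); not stated in the response


def kd_from_dg(dg_kcal_per_mol, temp_k=TEMP_K):
    """Dissociation constant (M) implied by a binding free energy via dG = RT ln(Kd)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


# PRODIGY values quoted in the response (kcal/mol).
predicted = {"full-length complex": -9.8, "N-terminal domain complex": -5.6}

for label, dg in predicted.items():
    print(f"{label}: dG = {dg} kcal/mol -> Kd ~ {kd_from_dg(dg):.1e} M")
```

      Under this assumption the full-length value corresponds to a Kd in the tens-of-nanomolar range and the N-terminal value to a Kd in the tens-of-micromolar range. For comparison, the BLI-derived KD reported for the N-terminal fragment is on the order of 10⁻⁸ M.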

      (3) There is no section in the materials and methods describing how the BLI was performed; this should be added. The highest concentration of GBP130 used in the interaction measurements is 1.4 µM, almost 100x the measured Kd (0.015 µM) for the GBP130-LFA1 interaction. At these high concentrations of GBP130, I would expect to start seeing saturation of binding, but the interferometry curves show that saturation is not close to being reached. This strongly suggests that the binding of GBP130 to LFA1 is non-specific.

      We thank the reviewer for raising these important technical points. We have included a detailed description of the biolayer interferometry (BLI) methodology in the Materials and Methods section of the manuscript. Regarding the concern about lack of saturation at higher analyte concentrations, we respectfully disagree that this necessarily indicates non-specific binding. In BLI assays, incomplete saturation can arise from several well-recognized factors, including suboptimal orientation or partial inaccessibility of immobilized ligand on the biosensor, mass transport limitations, or heterogeneous binding populations, which are particularly relevant for integrins such as LFA-1, whose αI domain exists in multiple conformational states with distinct affinities. Importantly, the interaction exhibits clear concentration-dependent association and dissociation kinetics that fit a 1:1 binding model with a KD in the nanomolar range, which is inconsistent with non-specific interactions that typically show poor fitting and minimal dissociation. Furthermore, the specificity of the PfGBP-LFA-1 interaction is supported by multiple independent lines of evidence in our study, including selective enrichment in αI-domain pull-downs, absence in IgG controls, reduction upon CD11a knockdown, and functional inhibition by blocking antibodies in cellular assays. We have now clarified these points in the revised manuscript and tempered the interpretation to acknowledge potential experimental constraints of BLI while maintaining that the cumulative data strongly support a specific interaction.
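
      The expectation behind the reviewer's point can be made explicit with the equilibrium occupancy of an ideal 1:1 (Langmuir) interaction, θ = C / (C + KD). The sketch below is a minimal illustration that uses the KD and top concentration quoted in the comment; the intermediate concentrations and the function name `fractional_occupancy` are assumptions chosen for illustration.

```python
def fractional_occupancy(conc_um, kd_um):
    """Equilibrium fraction of immobilized ligand bound in an ideal 1:1 model."""
    return conc_um / (conc_um + kd_um)

KD_UM = 0.015  # KD quoted in the reviewer's comment, in uM

# 1.4 uM is the top concentration quoted above; the lower values are illustrative.
for conc in (0.015, 0.15, 1.4):
    theta = fractional_occupancy(conc, KD_UM)
    print(f"{conc:>6} uM -> occupancy {theta:.3f}")
```

      This is an equilibrium value for a single homogeneous site. Whether a sensorgram reaches that plateau within the association step also depends on the kinetics and on the factors listed above (ligand accessibility, mass transport, conformational heterogeneity), none of which this idealized calculation captures.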

      Minor points:

      (1) For the pulldown experiments, can the authors confirm that cross-linking was also performed for the protein A beads + hIgG control?

      Yes, DTSSP cross-linking was performed identically in the protein A beads + hIgG control arm. This is consistent with the control design described in the manuscript.

      (2) If the recombinant CD11a I subdomain used as a probe is correctly folded and functional, it should bind ICAM1. Do the authors have this data?

      We agree that ICAM-1 binding is an important functional validation for the recombinant CD11a αI probe (Hogg et al., 1998). The isolated αI domain of LFA-1 is well established as the principal ICAM-1-binding module, and soluble αI-domain reagents have previously been shown to bind/block ICAM-1 interactions. We did not include this control in the current version.

      (3) Were the authors able to perform the reciprocal pull-down, using pfGBP130-N-Fc to pull down LFA1 from cell surfaces?

      We did not perform a reciprocal pull-down with PfGBP130-N-Fc and native cell-surface LFA-1 in the present study; we agree this would be a useful orthogonal experiment.

      (4) After identifying GBP130 as a co-purifying protein in the LFA-1 pull-down experiments, the authors select an N-terminal fragment of GBP130 to recombinantly express and use. How did the authors narrow down which region of GBP130 interacted with LFA-1?

      The N-terminal PfGBP130 fragment (aa 69-270) was selected empirically as a tractable, soluble recombinant segment containing a defined repeat-containing extracellular region, rather than because we had already mapped the full LFA-1-binding interface. We agree with the reviewer that our structural model suggests that additional residues, including a likely dominant interface outside this fragment, may contribute to the full interaction, and we have clarified that the N-terminal fragment should be interpreted as a minimal binding-competent region, not necessarily the sole binding site.

      (5) As erythrocytes age, their surface undergoes biochemical changes, most notably a drop in levels of sialylation, decreasing the net repulsive negative charge, and they generally become more adherent. Can the authors exclude the possibility that, rather than binding to a parasite-derived ligand, LFA alpha 1 is instead binding to a marker of older erythrocytes? In the data presented, increased binding of LFA alpha 1 is observed as parasites progress through the life cycle, but the host erythrocytes will be ageing during parasite replication, which could account for the increased levels of LFA alpha 1 binding. To rule out this explanation, data from LFA alpha 1 staining of age-matched uninfected erythrocytes could be provided.

      We agree that erythrocyte aging can alter surface sialylation and adhesiveness, and loss of sialic acid is known to reduce erythrocyte surface charge and increase adhesiveness. However, our data argue against aging alone explaining the signal, because LFA-1 αI-Fc binding was compared with uninfected RBC controls and the interaction led to enrichment of a parasite-derived ligand, PfGBP130, in pull-down/MS analyses.

      (6) Figure 3b(i) Surface staining of THP1 cells was performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. But no accompanying staining data using an anti-LFA1 antibody are shown, so it is not possible to determine whether staining profiles with GBP-130 Fc match staining profiles with anti-LFA1 antibodies. This is important to show what proportion of LFA1-positive cells can recognise parasite-derived GBP-130 Fc.

      (7) Figure 3c(i) Surface staining of peripheral NK cells is performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. Here, as well, there are no staining data using an anti-LFA1 antibody. This would allow a comparison between cell population LFA1 staining with an anti-LFA1 antibody and cell population LFA1 staining with GBP-130 Fc. The two staining profiles should be similar as both probes bind the same surface marker. However, it appears this might not be the case because the staining data using GBP-130 Fc show that only a minor proportion of NK cells (~20%) stain positive, but the majority of peripheral NK cells usually express CD11a, as it is a key adhesion molecule in the formation of immune synapses with target cells. This suggests that GBP-130 can only bind to a subset of NK cells, and if it is binding LFA1, then it can only play a role in mediating the formation of an immune synapse with this subpopulation of NK cells. Could the authors include a comment in the manuscript making clear that the GBP-130 only assists a small proportion of NK cells in adhering to parasite-infected erythrocytes? Are there any reasonable hypotheses as to why GBP-130 was only able to stain a small subpopulation of LFA1-expressing NK cells?

      For minor comments 6 and 7:

      We agree that parallel staining with anti-CD11a would help relate PfGBP130-Fc binding to total LFA-1-positive THP-1 and NK-cell populations. Importantly, LFA-1 expression and ligand binding competence are not equivalent, because integrin binding depends strongly on activation/conformation and avidity state; in NK cells, only a subset can display LFA-1 in a partially activated conformation at baseline despite broader CD11a expression. Thus, a smaller PfGBP130-Fc-positive subset than the total CD11a-positive population is biologically plausible and does not imply inconsistency.

    1. eLife Assessment

      This manuscript investigates inter-hemispheric interactions in the olfactory system of Xenopus tadpoles. Using a combination of electrophysiology, pharmacology, imaging, and uncaging, the transection of the contralateral nerve is shown to lead to larger odor responses in the un-manipulated hemisphere, and implicates dopamine signaling, likely originating from the lateral pallium, in this process. The study convincingly uses a rich and sophisticated array of tools to investigate olfactory coding, and uncovers valuable mechanisms of signaling likely to be conserved across vertebrates.

    2. Reviewer #1 (Public review):

      In this study, the authors investigate responses to methionine in the olfactory system of the Xenopus tadpole. They show that the LFP response is local to the glomerular layer, arises ipsilaterally, and is blocked by pharmacological blockade of AMPA and NMDA receptors, with little modulation during blockade of GABA-A receptors. They then show that this response is transiently enlarged following transection of the contralateral olfactory nerve, but not the optic lobe nerve. Measurement of ROS, a marker of inflammation, was not affected by contralateral nerve transection, and LFP expansion was not affected by pharmacological blockade of ROS production. Imaging biased towards presynaptic terminals suggests that the enlargement of the LFP has a presynaptic component. A D2 antagonist increases the LFP size and variability in intact tadpoles, while a GABA-B antagonist does not. Finally, the authors provide anatomical and physiological evidence that the contralateral dopamine signal may arise from the lateral pallium. Overall, I found the array of techniques and approaches applied in this study to be creatively and effectively employed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, the authors investigate LFP responses to methionine in the olfactory system of the Xenopus tadpole. They show that this response is local to the glomerular layer, arises ipsilaterally, and is blocked by pharmacological blockade of AMPA and NMDA receptors, with little modulation during blockade of GABA-A receptors. They then show that this response is transiently enlarged following transection of the contralateral olfactory nerve, but not the optic lobe nerve. Measurement of ROS, a marker of inflammation, was not affected by contralateral nerve transection, and LFP expansion was not affected by pharmacological blockade of ROS production. Imaging biased towards presynaptic terminals suggests that the enlargement of the LFP has a presynaptic component. A D2 antagonist increases the LFP size and variability in intact tadpoles, while a GABA-B antagonist does not. On this basis, the authors conclude that the increase driven by contralateral nerve transection is due to DA signaling.

      Overall, I found the array of techniques and approaches applied in this study to be creatively and effectively employed. However, several of the conclusions made in the Discussion are too strong, given the evidence presented. For example, the authors state that "The observed potentiation was not related to inflammatory mediators associated to injury, because it was caused by a release of the inhibition made by D2 dopamine receptor present in OSN axon terminals." This statement is too strong - the authors have shown that D2 receptors are sufficient to cause an increase in LFP, but not that they are required for the potentiation evoked by nerve transection. The right experiment here would be to get rid of the D2 receptors prior to transection and show that the potentiation is now abolished. In addition, the authors have not shown any data localizing D2 receptors to OSN axon terminals.

      Similarly, the authors state, "the onset of LFP changes detected in glomeruli is determined by glutamate release from OSNs." Again, the authors have shown that blockade of AMPA/NMDA receptors decreases the LFP, and that uncaging of glutamate can evoke small negative deflections, but not that the intact signal arises from glutamate release from OSNs. The conclusions about the in vivo contribution of this contralateral pathway are also rather speculative. Acute silencing of one hemisphere would likely provide more insight into the moment-to-moment contributions of bilateral signals to those recorded in one hemisphere.

      We thank the reviewer for their positive evaluation of our manuscript. We agree that new experimental evidence is needed to back up the discussion and conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is a creative and careful study, but I felt that the conclusions in the Discussion were too strong. I think these could either be toned down or additional experiments could be done to support the idea that D2 receptors are required for the nerve transection-evoked potentiation, that the source of glutamatergic input is OSNs, and that contralateral interactions are mediated by DA. In particular, I think anatomical stains showing which neurons are carrying the DA signal and whether there is any potentiation of DA release after nerve transection would greatly strengthen the conclusions.

      This new version of the manuscript contains two new figures: 6 and 9.

      New figure 6 addresses the suggestion of this reviewer and provides anatomical evidence for the distribution of dopaminergic neurons in the olfactory bulb of X. tropicalis tadpoles using a tyrosine hydroxylase antibody (mouse monoclonal, Immunostar cat. no. 22941, 1:250; RRID:AB_57226). We identified a discrete neuronal population present in the border between the mitral cell layer and the glomerular layer that resembles the type1 TH+ population described in adult frogs (Boyd and Delaney 2002). TH+ neurons send their processes to innervate olfactory glomeruli and we provide evidence that they contact the GFP lateral glomerulus labelled in Dre.mxn1:GFP X. tropicalis tadpoles (Fig. 6C). These results reinforce a modulatory role for dopamine on glomerular neurotransmission. Materials & methods (lines 152-167), results (lines 393-399) and discussion (lines 550-563) have been modified accordingly.

      Figure 9 provides new evidence on the interhemispheric connections involved in the potentiation of glomerular responses. We first demonstrate that dorsolateral pallial neurons participate in the processing of olfactory information based on the general consideration that the lateral pallium is an olfactory cortex. We confirmed this possibility by stimulating the olfactory epithelium and recording ipsilateral calcium transients in pallial neurons of tubb2b:GCaMP6s tadpoles. We next injured the dorsolateral pallium and 24-48h afterwards we recorded odor-evoked responses in the GFP labelled glomerulus located contralaterally. We observed a ~70% potentiation of responses, which was comparable to the ~75% potentiation obtained by olfactory nerve transection. These results illustrated the involvement of pallial neurons in the control of glomerular output by likely modifying the activity of TH+ neurons. The results (473-506) and discussion (569-576) now include these new results.

      Does the contribution of DA signalling change across development? I think this would be helpful to interpret the results and relatively straightforward to do: apply raclopride at different developmental stages and measure how much potentiation occurs at each stage.

      This is indeed an interesting point, but conducting a comprehensive study of dopamine release throughout development would require a substantial amount of work and delay the publication of this paper. To perform these experiments, we should first implement new technical approaches, such as successfully injuring young tadpoles or recording from late premetamorphic stages. We believe that the proposed experiments could define a new line of arguments rather than complement the present work. Nonetheless, we acknowledge the suggestion of this reviewer.

      In this new version, we provide strong evidence for dopamine release in the glomerular layer, and a key question that arises is the nature of TH+ neurons. Recent findings obtained in mice show that there are five different types of dopaminergic interneurons present in the olfactory bulb (Kosaka, Pignatelli, and Kosaka 2020), and important functional differences exist between axon-bearing and anaxonic neurons (Dorrego-Rivas et al. 2025). This evidence suggests a key role for development. A completely new study based on transgenic X. tropicalis displaying labeled TH+ neurons could bring together development, anatomy, and physiology to gain an understanding of how dopaminergic signaling shapes glomerular function.

      In addition, there are several places where showing additional raw data in the figures and carefully quantifying variability would be helpful. For example, in Figure 3B, the authors should show equivalent raw traces from intact and transected tadpoles. In Figure 5D, it would be helpful to show raw traces for LFP equivalent to what is shown for presynaptic imaging in Figure 5E. In Figures 6E-F, it would be helpful to show raw traces.

      Thank you for this suggestion. The examples have been added to the figure panels.

      I found the last experiment with photobleaching somewhat inconclusive, and I am not sure what it adds to the study as presently written. Line 418: Please quantify how many OSNs remained. Line 423: What is the hypothesis for the source of variability?

      The goal of this experiment is to investigate the participation of chemotopy in the potentiation induced by contralateral injury. The elimination of 30-50% of topographically related OSNs did not alter contralateral glomerular responses. This evidence suggests that chemotopy was not relevant to the gain of function observed; however, we cannot completely rule out a certain topographical contribution, as it was not possible to completely silence all inputs of the studied glomerulus. We now link these findings to the likely innervation of several glomeruli by TH+ neurons, which suggests the absence of a one-to-one glomerulus relationship. LFP amplitudes and their variance are now illustrated in box plots to highlight the absence of significant differences. Lines (457-471).

      An increase in the variance among the recordings obtained is a consistent empirical observation. Although it is a hallmark of the potentiation recorded, we cannot provide a mechanistic explanation. Considering that neurotransmitter release from OSN axon terminals is normally inhibited by dopamine, we hypothesize that disinhibition drives an increase in release probability, leading to larger variations in glutamate release. Such variations could be reflected in the amplitude of LFP negativities.
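
      One simple way to picture this hypothesis is the classical binomial model of quantal release, in which N independent release sites each release with probability p and quantal size q, giving a mean of Npq and a variance of Np(1-p)q². The sketch below is only an illustration under that assumed model; the number of sites, the probabilities and the function name `quantal_mean_var` are hypothetical and are not measurements from this study.

```python
def quantal_mean_var(n_sites, p_release, q=1.0):
    """Mean and variance of summed release under a simple binomial model.

    n_sites, p_release and q are hypothetical illustration values.
    """
    mean = n_sites * p_release * q
    var = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, var

# Hypothetical baseline (dopamine-inhibited) versus disinhibited terminals.
mean_base, var_base = quantal_mean_var(n_sites=100, p_release=0.1)
mean_dis, var_dis = quantal_mean_var(n_sites=100, p_release=0.3)

print(f"baseline:     mean={mean_base:.1f}, variance={var_base:.1f}")
print(f"disinhibited: mean={mean_dis:.1f}, variance={var_dis:.1f}")
```

      In this model the variance grows with p only while p stays below 0.5 and falls again above it, so the sketch is compatible with the hypothesis only if baseline release probability is low.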

      It would be helpful to include a measurement of LFP over time so we have some idea of how stable the odor delivery is.

      The amplitude of LFP responses was stable for >30 min. Figure 3B shows recordings obtained during 30 min and new Figure 7F over 42 min. We believe that these examples illustrate that the amplitude, as well as kinetics of the responses obtained were consistent over the period studied.

      Line 227: Small upward deflection - could this be an electrical artifact? Can you run the stimulus delivery with no odor (say, with water) to see if you get the same signal?

      We do not know the precise source of this upward deflection. It is not an electrical artifact related to stimulation, which is sometimes evident (Fig 7A, methionine application). When present, it occurs after the activation of OSNs. One possibility is that the deflection originates in the layer of nerve fibers, reflecting some aspect related to the conduction of APs and the relative position of the electrode. Interestingly, some recordings of LFP responses at the level of glomeruli carried out in rats also show a positive deflection (see Figs. 1B, 2A, 3B in Lecoq, Tiret, and Charpak 2009), thus suggesting it is an intrinsic characteristic of this type of recording.

      Line 237-239: I wasn't clear from the text whether this was a variation due to development, to transection, or natural variability.

      We now indicate that the relationship reflects normal development (lines 261-264).

      Line 521: N-type VGCCs: can these be targeted with pharmacology to strengthen the argument?

      We acknowledge this suggestion but we have not carried out these experiments as we believe that the interpretation could be complex due to the high density of synapses present in glomeruli and the likely involvement of other types of VGCCs in neurotransmitter release.

      Small issues:

      (1) Line 190-196: Some of this could potentially be moved to the Discussion section.

      These are some arguments to defend the validity of our experimental approach to record the response of the lateral glomerulus labeled by GFP. If we move them to the discussion, the information related to the spatial extent of our recordings would be split between results and discussion. We believe that the current format of the paper allows the discussion to focus on the interpretation of the results obtained.

      (2) Line 268: exponential recover phase.

      Thanks. Corrected.

      (3) Line 278: affected to -> arises from

      Thanks. Corrected.

      (4) Line 282: affect to -> can affect.

      Thanks. Corrected.

      (5) Line 403: 2Phatal technique: Please state briefly what this is

      It is now indicated: two-photon chemical apoptotic targeted ablation (2Phatal).

      NOTE:

      During the revision of this manuscript we realized that Figures 3C and 4B indicated mean±SD. The panels have been amended to show mean±s.e.m.
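
      For readers comparing the old and new panels, the two error measures differ by a factor of the square root of the sample size: s.e.m. = SD / sqrt(n). A minimal sketch follows, using made-up amplitudes rather than values from the figures; the function name `sd_and_sem` is likewise only illustrative.

```python
import math
import statistics

def sd_and_sem(values):
    """Return the sample standard deviation (n-1 denominator) and the s.e.m."""
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return sd, sem

# Hypothetical LFP amplitudes (arbitrary units), not data from the study.
amplitudes = [1.0, 2.0, 3.0, 4.0, 5.0]
sd, sem = sd_and_sem(amplitudes)
print(f"mean={statistics.mean(amplitudes):.2f}, SD={sd:.3f}, s.e.m.={sem:.3f}")
```

      Because the s.e.m. shrinks as n grows, error bars drawn as s.e.m. are always narrower than SD bars for the same data whenever n is greater than 1.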

      References

      Boyd, J. D., and K. R. Delaney. 2002. "Tyrosine hydroxylase-immunoreactive interneurons in the olfactory bulb of the frogs Rana pipiens and Xenopus laevis." J Comp Neurol 454 (1):42-57. doi: 10.1002/cne.10428.

      Dorrego-Rivas, A., D. J. Byrne, Y. Liu, M. Cheah, C. Arslan, M. Lipovsek, M. C. Ford, and M. S. Grubb. 2025. "Strikingly different neurotransmitter release strategies in dopaminergic subclasses." Elife 14. doi: 10.7554/eLife.105271.

      Kosaka, T., A. Pignatelli, and K. Kosaka. 2020. "Heterogeneity of tyrosine hydroxylase expressing neurons in the main olfactory bulb of the mouse." Neurosci Res 157:15-33. doi: 10.1016/j.neures.2019.10.004.

      Lecoq, J., P. Tiret, and S. Charpak. 2009. "Peripheral adaptation codes for high odor concentration in glomeruli." J Neurosci 29 (10):3067-72. doi: 10.1523/JNEUROSCI.6187-08.2009.

    1. eLife Assessment

      This important study addresses the unresolved and long-debated question of whether atypical protein kinase C is required for the maintenance of synaptic potentiation and long-term memory. The convincing results confirm previous findings that persistent activity of PKMζ is required for lasting potentiation of hippocampal synapses and spatial memory. The study also adds new genetic evidence to support the earlier suggestion that enhanced expression of PKC iota/lambda compensates for the genetic reduction of PKM zeta to support synaptic potentiation and memory.

    2. Reviewer #1 (Public review):

      Summary:

      The authors convincingly demonstrate that when PKMzeta is genetically deleted from the hippocampus, the related atypical PKC, PKClambda, is upregulated and compensates both neurophysiologically and behaviorally for the missing PKMzeta. Specifically, the upregulation of PKClambda supports late-phase hippocampal long-term potentiation (L-LTP) and long-term spatial memory in the PKMzeta knockout mice.

      Strengths:

      The study uses up-to-date transgenic techniques to alter the expression of the two atypical PKCs. The synaptic and behavioral experiments are well-controlled and appear to have been carefully executed.

      Weaknesses:

      None

    3. Reviewer #2 (Public review):

      Summary:

      The authors significantly advance understanding of the role of unconventional PKCs, PKCM𝛇 and PKC𝜄/𝝀, in maintenance of late-phase LTP. Their results help to clarify the interplay between "structural" and "biochemical/enzymatic" mechanisms of LTP and learning in the hippocampus.

      Strengths:

      A strength is the use of state-of-the-art conditional knock-outs of PKCM𝛇 and PKC𝜄/𝝀 to confirm that PKC𝜄/𝝀 compensates for KO of PKCM𝛇 in the hippocampus to maintain long-term potentiation even when PKCM𝛇 is conditionally knocked out in the adult. The authors use both electrophysiological and behavioral methods to assess the effects of genetic manipulations on late-phase LTP and long-term memory. The authors present an informative discussion of the possible molecular mechanisms that may enable compensation by PKC𝜄/𝝀 for KO of PKCM𝛇 in the hippocampus. They correctly emphasize that the notions of "structural" and "enzymatic" mechanisms for maintenance of LTP are not mutually exclusive. With this publication, the experimental case for a role of PKCM𝛇 in maintenance of late-phase LTP is now quite strong.

      Weaknesses:

      There are no significant weaknesses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      An ongoing controversy in the field of learning and memory is the specific neural mechanism that maintains long-term memory (LTM). A prominent hypothesis proposed by Sacktor and Fenton and their colleagues is that LTM is maintained by the ongoing activity of the atypical PKC isoform PKMζ. Early evidence in support of this hypothesis came from experiments showing that an inhibitory peptide, ZIP, whose activity was purported to be specific for PKMζ, blocked late-phase hippocampal LTP (L-LTP) and LTM. However, in 2013, two articles reported that LTM was normal in PKMζ knockout mice and that ZIP erased LTM in the knockout mice, indicating that ZIP lacked specificity for PKMζ. In response, Sacktor and Fenton and colleagues reported in 2016 that in PKMζ null mice, there is an increase in the expression of PKCι/λ, a related isoform of atypical PKC, and this increased expression can compensate for PKMζ; their data indicated that the upregulation of PKCι/λ mediates L-LTP and LTM in the PKMζ null mice. In the present article, the authors provide additional support for this idea. They replicate the finding of an upregulation of PKCι/λ expression in the hippocampus of PKMζ knockout mice; in addition, they show that the expression of several other PKC isoforms is upregulated in the knockouts. They find that down-regulation of PKCι/λ expression in the hippocampus using the Cre-LoxP technology (the 2016 paper merely used an inhibitor to block the activity of PKCι/λ) blocks L-LTP. Finally, the authors demonstrate that, although LTM is preserved in the single PKMζ knockout mouse, it is eliminated in the PKMζ/PKCι/λ double knockout mouse.

      Strengths:

      The experiments appear to have been carefully executed, the results reliable, and the paper well-written. Overall, the article provides significant additional support for the idea that the activity of PKMζ is critical for the maintenance of hippocampal L-LTP and LTM. The article uses genetic methods, rather than simply pharmacological ones, to demonstrate that when PKMζ is genetically deleted, PKCι/λ compensates for the missing PKMζ.

      Weaknesses:

      The paper sets up what I believe is probably a false dichotomy between a structural explanation - a change in the number of synaptic connections among neurons - and the persistent kinase activity explanation for memory maintenance. Why are these two explanations necessarily antithetical? It is possible that an increase in synaptic connections and the ongoing activity of PKMζ both contribute substantially to memory maintenance. The authors certainly don't provide any evidence that the number of synapses in the hippocampus remains unchanged after the induction of L-LTP or LTM. Indeed, I see no reason why persistent PKMζ activity could not be a mechanism for the maintenance of an enhanced number of synaptic connections following the induction of LTP/LTM. To the best of my knowledge, this possibility has not yet been explored. Consequently, I don't see why the present results would lead one to favor a biochemical explanation over a structural one for memory maintenance. Given the significant experimental evidence that LTM involves persistent structural changes in neurons, both explanations are equally plausible at present.

      As requested, we eliminated the discussion of a dichotomy between structural and biochemical mechanisms of long-term memory in the Abstract and Introduction. We now briefly address the relationship between the two hypotheses, which are not mutually exclusive, in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      The authors are attempting to advance understanding of the role of unconventional PKCs, PKCM𝛇, and PKC𝜄/𝝀 in maintenance of late-phase LTP. Their results help to clarify the interplay between "structural" and "biochemical/enzymatic" mechanisms of LTP and learning in the hippocampus.

      Strengths:

      A strength is the use of conditional knock-outs of PKCM𝛇 and PKC𝜄/𝝀 to assess the role of these two enzymes in maintaining long-term potentiation and in compensating for each other when one of them is conditionally knocked out in the adult.

      Weaknesses:

      The paper is extremely difficult to read because the abstract does not clearly state the advances made over earlier studies by the use of conditional KO mutation. For example, in line nine of the abstract, the authors state, "Here, we found PKC𝜄/𝝀 persists in LTP and long-term memory when PKM𝛇 is genetically deleted." This is confusing because it sounds as though the experiments have repeated earlier published experiments in which the gene encoding PKM𝛇 is deleted in the embryo. The authors are not clear throughout the manuscript that they are using conditional KO of the two enzymes in the adult animal, rather than deletion of the gene. The term "genetically deleted" does not mean "conditionally deleted in the adult." The final sentences of the abstract are: "Whereas deleting PKM𝛇 and PKC𝜄/𝝀 individually induces compensation, deleting both aPKCs abolishes hippocampal late-LTP. Hippocampal 𝜄/𝝀-𝛇 -double-knockout eliminates spatial long-term memory but not short-term memory. Thus, in the absence of PKM𝛇 , a second persistent biochemical process compensates to maintain late-LTP and long-term memory." These sentences do not convey a clear logical conclusion. The Discussion does a better job of stating the importance of the experiments.

      We have clarified the genotypes of the mice in the abstract and throughout the text.

      Reviewer #3 (Public review):

      Summary:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether atypical protein kinase C is required for the maintenance of late-long-term synaptic potentiation (L-LTP) and long-term memory (LTM). The authors confirm previous findings that persistent activity of PKMζ is required for hippocampal L-LTP and spatial memory. They demonstrate that genetically deleting PKCι/λ and PKMζ individually induces compensatory upregulation, whereas deleting both atypical PKCs abolishes hippocampal L-LTP and spatial long-term memory. The study uses an elegant combination of immunoblots, electrophysiology, and behavioral assays. The use of Cre-recombinase to target specific hippocampal regions and neurons adds to the rigor of the findings.

      Strengths:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether PKMζ is required for the maintenance of L-LTP and LTM. The study demonstrates that PKCι/λ, which was previously shown to be critical for the initial generation of the early phase of LTP and short-term memory, becomes persistently active in L-LTP and LTM in a PKMζ knock-out model, compensating for the loss of PKMζ. Furthermore, when the compensation mechanisms are eliminated by simultaneous deletion of both PKMζ and PKCι/λ, maintenance of LTP and long-term spatial memory, but not of short-term memory, is diminished. The strength of this study is that the authors used a double-knockout strategy to directly address the controversy concerning the roles of PKMζ in memory formation. By showing that PKCι/λ compensates when PKMζ is deleted, the authors provided a compelling explanation for previous contradictory findings.

      Weaknesses:

      (1) The authors should provide the numerical values for all data.

      (2) It appears that blind procedures were only used for the behavioral experiments. Some explanation is warranted.

      (3) The description of the immunoblotting procedures lacks sufficient detail. The authors state that immunoblots were stained with multiple antisera to visualize multiple PKCs on the same immunoblot. To conserve antisera, the immunoblots were cut to isolate the relevant proteins based on molecular weight. Isoforms with similar molecular weights were either stained with antisera of different species or on separate blots. Despite this explanation, it is unclear how immunoblotting was performed in practice. For example, in Figure 1B, the authors compared the changes of four conventional PKC isoforms. Because all four antibodies are mouse monoclonal antibodies recognizing proteins of similar molecular weights, each probing should presumably have its own actin loading controls. However, these controls are missing from the figure. Some clarification is warranted.

      (4) The legend to Figure 4B states that, because the increases of maximum avoidance time from pretraining to trial 1 are not different, both groups of mice successfully established short-term memory. This inference is not correct. The analysis only reveals that there is no difference between the two groups. The absence of a difference could be due to both groups learning equally, as the authors suggest, or alternatively to no learning in either group.

      (5) The labeling on some of the illustrations (e.g., Figure 2B) is unreadable.

      (6) In Figure 4B, only the single statistical comparison between "pretraining" and "1 trial" is shown. The other comparisons described in the legend should also be illustrated.

      (7) There is no documentation to support the statement that "The prevailing textbook mechanism for how memory is retained asserts that stable structural changes at synapses, the result of initial protein synthesis and growth, sustain memory without the need for ongoing biochemical activity dedicated to storing information" or for the statement in the Discussion that the structural model of memory storage is the standard account.

      (1) Numerical data used in statistical analyses are now provided for LTP experiments in Figure 4 figure supplement 1. Numerical values for all other experiments are presented in the figures.

      (2) Blind procedures were performed for all experiments except for LTP experiments that involved the transfection of eGFP as control, as the eGFP could be detected visually in the hippocampal slice by the experimenter. This is now clarified in the Statistics section of the Methods.

      (3) The description of immunoblotting was clarified in the Methods, and actin loading controls are now presented for all immunoblots in Figure 1 and Figure 1 figure supplements 1 and 2.

      (4) Short-term memory (Figure 5B) is now determined by 2 methods. First, we show that for both groups the times to enter the shock zone increase in the first training trial, as compared to the pretraining session with the shock off. The increases are not different between the groups. Second, we show increases of the maximal avoidance time from pretraining to trial 1 for both groups are significant, and that the increases are not different. These data show that short-term memory was present in both groups and not measurably different between the groups.

      (5) The fonts of the figure labels were enlarged.

      (6) The comparisons between pretraining and training trial 1 and between training trials 1 and 3 for the two groups are now shown in Figure 5B.

      (7) We abbreviated our discussion of the structural model, which is now presented at the end of the Discussion (as per Reviewer 1), and removed the comment that it is the prevailing view, stating instead that the hypothesis is “widely held.”

      Additional points: As requested, the timing of tamoxifen injections and tissue collection for immunohistochemistry is clarified in the protocol schematic of a new Figure 2A and Figure 2A legend.

    1. eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures, specifically host immune function and antibiotic exposure. The evidence presented is convincing, supported by rigorous phenotypic and genomic data from within-host evolution experiments. The manuscript now provides a nuanced and robust interpretation of how pathogens adapt to complex selective landscapes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, synergistically results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves parallel mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt. Furthermore, the detailed mapping of specific parallel mutations provides a highly useful genomic resource for the microbiology community.

      Revisions and Re-Appraisal:

      In the initial version of the manuscript, a primary limitation was the use of causal language to link specific mutations to phenotypes, despite the evidence from the evolution experiment being correlational. In this revised version, the authors have excellently addressed this limitation. They have meticulously revised the text to accurately reflect these relationships as strong, statistically significant genetic associations rather than confirmed facts. Furthermore, they explicitly acknowledge that future ancestral reconstruction experiments will be required to confirm direct causality. The authors have also appropriately clarified the visual interpretations of their data (such as the PCA clustering) and refined their discussion of mutation rates. With these revisions, the claims made are fully supported by the data presented.

      Impact and Context:

      The authors successfully achieve their aims, demonstrating that the combined effects of host and antibiotic pressures collaboratively propel the evolution of heightened virulence. While the nematode model does not perfectly mimic human or mammalian infection, the evolutionary principles uncovered here are highly relevant to both evolutionary biology and infectious disease management. The evidence presented is compelling, and the strong correlational hypotheses generated by this study offer a robust and significant basis for upcoming mechanistic research into pathogen adaptation.

      Comments on revisions:

      I commend the authors for their thorough, thoughtful, and highly constructive revision. You have successfully addressed all of my major and minor comments. The addition of Table S2 and the careful revisions to the causal language have significantly strengthened the manuscript and clarified the data interpretation. I have no further recommendations. Great work!

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because

      (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and

      (2) because the site of infection is different in C. elegans and human hosts.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

      Comments on revisions:

      Following the revision, I now think the weakness I initially described has been addressed well by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Comments on revisions:

      For the most part, my comments have been addressed. It seems that the authors have not addressed my comments about quantifying population sizes in order to understand mutation supply, particularly in light of which experimental phase exhibits the strongest selection and possible increases in mutation rates. While I think this information would be very useful if they had collected it during the experiment, I don't think it is important enough to require additional experiments. I am therefore satisfied with the current state of the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures. The evidence presented is convincing, with rigorous data that characterizes the outcomes of the evolution experiments. However, the manuscript's primary weakness is in its presentation, as claims about the causal relationship between genotypes and phenotypes are based on correlational evidence. The manuscript needs to be revised to address these limitations, clarify the implications of the experimental design, and adjust the overall narrative to better reflect the nature of the findings.

      Thank you for your feedback. Here, we summarize the major changes made in the revised manuscript:

      (1) We did not test causality between mutations and phenotypes in our study. We were intentional about not using causal wording (“mutation X caused/led to/resulted in phenotype Y”), and only discussed these results using the terms “correlation” and “association”, and only when they were statistically significant. We understand that some readers may view these terms as being equivalent to “causation”, thus in the revision, we have modified our wording as suggested (please see below for specific lines).

      (2) We agree that experimental evolution in nematodes is not a direct simulation of evolution in humans. The goal of our study was, first and foremost, a test of how multiple selective pressures can shape pathogen evolution. This point was presented in the first paragraph, the second to last paragraph of the Introduction (which included our hypotheses), and the last paragraph of the manuscript. References to humans and other mammalian systems were intended to point out similarities between our findings and what had already been found in S. aureus outside the lab. Despite differences between mammals and nematodes, several parallels arose at both the phenotypic and genomic levels, which is interesting from an evolutionary standpoint. We understand that more experiments and tests would be needed before we can make claims about the selective pressures acting on S. aureus outside the lab. We presented some information in the context of humans because a large part of the literature on S. aureus is on its role as a major bacterial pathogen; we did not want to neglect this aspect of its natural life history.

      In the revised manuscript, we are more explicit in stating these points, as well as tempering some language regarding human infection, and removing some references to humans. Please see below for specific lines as well as justification for specific references to humans/mammalian systems.

      (3) We have included additional details on the experimental design below. We hope this is sufficiently clarifying.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt.

      Thank you very much.

      Weaknesses:

      (1) The main limitation of the paper is that its findings on the function of specific genes are based on correlation, not cause-and-effect evidence. While the parallel evolution evidence is strong, the authors have not yet performed the definitive tests (i.e., reconstruction of ancestral genes) to ensure that the mutations identified in isolation are enough to account for the virulence or resistance changes observed. This makes the conclusions more like firm hypotheses, not confirmed facts.

      We have replaced instances of “association” and “correlation” with wording similar to that suggested where applicable, including:

      L 342 – 344: “The loss of SCCmec and ACME was more often identified in populations exhibiting an increase in total growth from the ancestor outside the host…”

      L 371 – 375: “Mutations in three genes were regularly identified in populations exhibiting significant increases in virulence from the ancestor: codY, gdpP, and pbpA. Mutations in agr in general were not associated with changes in overall virulence, but MSSA populations harboring mutations in this gene were more likely to exhibit greater virulence compared to MRSA populations (Wilcoxon rank sum exact test P = 0.045).”

      L 377: “Mutations in specific genes were often found in populations able to hemolyze red blood cells…”

      L 379 – 381: “There were also significant differences between the mutations regularly identified in oxacillin-resistant populations evolved from the MSSA ancestor...”

      L 384 – 385: “By contrast, mutations in agr were often in populations exhibiting loss of hemolytic activity, consistent with previous findings...”

      L 409 – 410: “Mutations that arose during experimental evolution are regularly found in strains associated with human systemic infections.”

      We have also stated that ancestral reconstruction is needed:

      L 553 – 555: “Future experiments may include introducing these mutations into the ancestral background to directly link the mutations in these genes to evolved virulence.”

      (2) In some instances, the claims in the text are not fully supported by the visual data from the figures or are reported with vagueness. For example, the display of phenotypic clusters in the PCA (Figure 6A) and the sweeping generalization about the effect of antibiotics on the mutation rates (Figure S5) can be more precise and nuanced. Such small deviations dilute the overall argument somewhat and must be corrected.

      In reference to Fig. 6A, we have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Line 442

      In reference to Fig. S5, we conducted statistics to include both MRSA and MSSA populations and examined the effect of oxacillin on the number of mutations. While oxacillin had a significant effect on the number of mutations, we agree with the reviewer that this may be driven by the MRSA populations and have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because

      (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and

      (2) because the site of infection is different in C. elegans and human hosts.

      We have removed the two sentences referencing site of infection:

      Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      For our rationale for discussing humans in general, please see below.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

      Thank you very much.

      Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Thank you very much.

      Weaknesses:

      There are a few clarifications that could be made to better understand and contextualize the results. Primarily, when comparing the number of mutations and selection across conditions in an evolution experiment, information about population sizes is important to be able to calculate the mutation supply and number of generations throughout the experiment. These calculations can be difficult in vivo, but since several steps in the methodology require plating and regrowth, those population sizes could be determined. There was also no mention of how the authors controlled the inoculation density of bacteria introduced to each host. This would need to be known to calculate the generation time within the host. These caveats should be addressed in the manuscript.

      While the population sizes within hosts and generation time could be determined, we would need to conduct additional experiments (e.g., infecting nematodes with S. aureus, then crushing, plating, and counting colony forming units across time intervals) in order to obtain measurements for pathogen growth in hosts across time. For experimental evolution, we crushed a set number of dead nematodes (30) and all bacteria that were released were allowed to grow in liquid media before an aliquot (25%) was used to seed the next passage. Picking and crushing nematodes across 48 populations for one time point was an arduous task. The additional steps of picking, crushing, and plating nematodes across multiple time intervals at the same time experimental evolution was being performed would not be logistically sound.

      In terms of the inoculation density of bacteria, all nematodes were placed on abundant lawns of S. aureus. Nematodes were exposed to full lawns throughout the entire infection step; bacteria remained in abundance. While we do not know the exact inoculum each individual nematode was exposed to, we know that they ingested the bacteria because of the high mortality rate. Furthermore, we followed the same procedure for every replicate across every host-associated treatment. Host individuals within and across passages were also genetically identical to one another. Altogether, these factors allowed for more consistency across the experiment, such that relative inoculum size should be similar across individual hosts. Please refer to the evolution experiment diagram (Author response image 1) for more details.

      Ultimately, while knowing the absolute population size, inoculum size, and generation time within the host is interesting, the number of rounds of selection (the number of times each population was exposed to the selective pressures) is also important in addressing our major question. Every treatment, which started out from one ancestral clone (MRSA or MSSA), was exposed to the same number of bouts of selection (passages), yet we see significant divergence in terms of traits and mutations. Future directions would certainly involve determining the number of steps (e.g., number of generations within hosts) required to reach these end points, but not knowing exactly how many steps were required does not detract from addressing the larger question of determining how pathogens respond to multiple selective pressures.

      Another concern is the number of generations the populations of S. aureus spent either with relaxed selection in rich media or under antibiotic pressure in between the host exposure periods. It is probable then that the majority of mutations were selected for in these intervening periods between host infection. Again, a more detailed understanding of population sizes would contribute to the understanding of which phase of the experiment contributed to the mutation profile observed.

      We conducted every step of the evolution experiment on the same timeline. For example, all replicates across treatments were grown in liquid media at the same time (see Author response image 1). All populations were exposed to the same selective pressures at this step of the experiment. We can then compare populations that were subsequently exposed to hosts against those that were not. Populations passaged without a host served as the control. Mutations that were unique to host-exposed populations would more likely contribute to the traits of interest, compared to mutations that were in common between the host-exposed and no-host treatments. Similar comparisons could be made with the oxacillin-exposed and no-oxacillin populations.

      In general, the only differences between treatments would be driven by the treatments themselves. Given that we are interested in treatment-level effects, any differences in population size or generation time between treatments could contribute to the treatment effects we observe, and thus were not something we aimed to hold uniform across our experiment.

      Author response image 1.

      Schematic of procedural steps involved in one passage of S. aureus through nematodes (+host -ox) compared to without nematodes (-host -ox).

      Recommendations for the authors:

      Reviewing Editor Comments:

      We encourage you to address all other comments raised by the reviewers; however, the review team has identified the following points as the most critical and fundamental to improve your manuscript:

      (i) Reframing the narrative: You will need to adjust the narrative so that the study is presented as a "proof of principle" rather than a direct simulation of a human infection.

      While we referenced human infection, we believe the study had been presented as a proof of principle. Examples include:

      (1) We discussed the gap of knowledge in the first paragraph: “It is unclear how virulence evolves in the face of more than one selective pressure and whether this trait is constrained or facilitated by antibiotic resistance.” Lines 86 – 88

      (2) In the second to last paragraph in the Introduction, we presented the main hypotheses: “Adaptation may require resources to be expended toward either virulence or antibiotic resistance, leading to a trade-off between these traits (Ferenci, 2016). Alternatively, weaker selection from sub-MIC antibiotics may interact synergistically with hosts and facilitate the evolution or maintenance of high virulence and antibiotic resistance.” Lines 176 – 179

      (3) The last paragraph concluded with “Our findings ultimately emphasize the importance of considering the host context in the evolution of antibiotic resistance. Integrating multiple traits, such as virulence, antibiotic resistance, and fitness may be critical in identifying the factors that facilitate host shifts and persistence of drug-resistant pathogens.” Lines 613 – 616

      These paragraphs, which set up the context for our work, did not primarily discuss human infections.

      In the revised manuscript, we have further tempered language regarding human infection:

      L 169 - 172: “Experimentally evolving S. aureus in C. elegans thus allows us to track the early stages of virulence and antibiotic resistance evolution in novel host populations with the potential to identify conserved genomic regions underlying evolved traits.”

      L 595 – 596: “Additional direct tests are needed to evaluate the role of these mutations in adaptation of S. aureus to different infection sites.”

      L 610 – 611: “Pathogen evolution in a tractable invertebrate animal model yielded phenotypes and genotypes similar to those identified in mammalian hosts, highlighting the utility of evolution experiments to identify potential ecological and genetic mechanisms that may give rise to pathogen traits conserved across systems.”

      And removed some references to humans:

      In the Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      In the Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      Otherwise, our rationale for referencing humans/mammalian systems in our Introduction includes:

      Setting the context of our study system: we discussed humans and clinical significance when we first introduced S. aureus (lines 132 – 151) and experimental evolution (lines 153 – 172). Much of what is known about S. aureus outside the lab concerns its interactions with humans; thus we wove in relevant information that has been discovered in other organisms.

      Hemolysis: This ability is important for S. aureus virulence toward C. elegans (Sifri et al., 2003).

      S. aureus genomic database: we intended to leverage this large-scale database of genomes isolated from S. aureus outside the lab to compare patterns emerging from experimental evolution to those in existing isolates. Due to its relevance as a major bacterial pathogen, most of the isolates happen to be from clinical settings.

      (ii) Adjusting the causal language: You will need to soften the language so that correlational claims do not appear to be causal.

      We have adjusted language as noted above.

      (iii) Clarifying methodological aspects: You will need to provide more details on the methodology, such as population sizes, and clarify the implications of these in the conclusions of the work.

      We have provided additional explanation of methodology and the role of control (no host) treatments above.

      Reviewer #1 (Recommendations for the authors):

      The paper is robust, and the study is of great significance. Tackling the subsequent issues would greatly enhance the paper and elucidate its findings.

      Major Recommendations:

      (1) Revising Causal Language: The main flaw of the manuscript lies in its presentation of correlational data as if it were causal. We highly suggest a thorough review of the text to soften causal language when connecting genotypes to phenotypes. The absence of ancestral reconstruction should be recognized as a constraint. Assertions ought to be presented as robust, evidence-based hypotheses. For instance, rather than saying a mutation "associated with significant increases in virulence," you might say "was regularly identified in groups that developed increased virulence, strongly suggesting this gene's role in the adaptation." This will more precisely clarify the contribution of the work.

      We have softened language and stated that ancestral reconstruction is needed as noted above.

      (2) Expand on Parallel Mutations: The examination of parallel evolution in Figure 4A is intriguing but would be notably stronger with additional details. I suggest including an additional supplementary figure or table detailing the specific non-synonymous mutations identified in the highly parallel genes (e.g., codY, agr, gdpP). It is essential for the reader to understand whether parallel evolution is happening at the gene level (different mutations in a single gene) or at the nucleotide level (the precise same mutation appearing again). Kindly specify if any of these mutations were nonsense mutations, as this suggests that the loss-of-function is advantageous.

      The full table of mutations is on figshare (10.6084/m9.figshare.28745558). We have added a Supplemental Table (Table S2) containing mutations in genes occurring in more than two populations. Many of these mutations were not the same, indicating parallel evolution at the gene level (lines 315 – 317).

      Minor Recommendations for Clarity and Accuracy:

      (1) Introduction:

      Lines 176-177: Please add a citation for the statement describing the function of the SCCmec cassette, as this is established knowledge.

      Done.

      (2) Results:

      Section Title (Line 254): The title "Host and sub-MIC antibiotic promoted growth..." is imprecise. Figure 3B shows that it is the combination of these factors that promotes growth in MRSA, while oxacillin alone is detrimental. Please revise the title to reflect this synergistic effect.

      “Synergistically” has been added to the title: “Host and sub-MIC antibiotic synergistically promoted growth of MRSA…” Lines 269 – 270

      Lines 261-263: The description of Figure 3B is incomplete. The text should explicitly state that the -host+ox treatment resulted in the lowest growth for MRSA, which provides a critical contrast and suggests a fitness cost.

      We have added “By contrast, exposure to sub-MIC oxacillin alone yielded the lowest growth, suggesting a fitness cost.” Lines 277 – 278

      Line 294: The claim that "Sub-MIC oxacillin selection also resulted in more mutations" is a generalization not supported for the MSSA genotype, according to Figure S5. Please revise this sentence to specify that this effect was observed in the MRSA populations.

      We have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Lines 419-421: The claim that the +host+ox populations in Figure 6A "formed a distinct cluster" is an overstatement, as there is visible overlap with one other treatment (e.g., host-ox). Please revise this to more accurately describe the visual data (e.g., "clustered together, largely separating...").

      We have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Lines 442 – 443

      Lines 422-424: The interpretation of the MRSA PCA (Figure 6A) focuses on the correlation between virulence and sub-MIC growth. However, the correlation between "biofilm production" and "growth without oxacillin" appears visually stronger. Please address this correlation as well for a more complete interpretation.

      We have added “For MRSA populations, biofilm production and growth without oxacillin also appeared to be positively correlated.” Lines 447 – 448

      (3) Discussion:

      Lines 469-470: The statement that "exposure to oxacillin resulted in pathogens causing the greatest host mortality" is imprecise. The data in Figure 2A show that it is the combination of host and oxacillin. Please revise this for accuracy and add a direct citation to Figure 2A here.

      We have added clarification: “Nonetheless, we observed differing evolutionary trajectories, where exposure to oxacillin in host-associated treatments resulted in pathogens causing the greatest host mortality.” Lines 496 – 498

      Reviewer #2 (Recommendations for the authors):

      After reviewing the paper and reading the previous reviews from PLoS Biology, my biggest criticism of the paper is the way the story is told. In principle, the results are interesting and relevant, but the analogy to human infection and immune system/ antibiotic treatment strategies does not fit entirely with the experimental design or the results. I think the motivation needs to be reframed. In the study, antibiotic exposure is purely environmental, i.e., not in the host. How does environmental antibiotic use affect in vivo evolution, as this is not tested? As previous reviewers have pointed out, S. aureus is not an enteric pathogen in humans but most often causes skin infections. Furthermore, much of the results and discussion is focused on haemolysis of red blood cells, a cell type that C. elegans does not have. What the paper does present, on the other hand, and something that is interesting and novel, is a test in a model system of how a bacterial pathogen evolves to competing selection pressures. I might have hypothesised a priori that these competing pressures result in trade-offs, something which there is no evidence of, even though growth rate does not appear to be negatively impacted as a consequence of selection for drug resistance and virulence together. Instead, many traits are correlated and seemingly at the mechanistic level. This is cool and is a proof of principle, even if the system does not completely mirror reality, and I think the story should be told as such.

      We agree entirely with the reviewer that testing how pathogens respond to multiple selective pressures and the resulting lack of trade-offs are significant and interesting. We presented this question (lines 86 – 88) and our hypothesis about such trade-offs in the Introduction (lines 176 – 179). As stated above, we had framed our paper to highlight these points and have removed references to antibiotic concentrations in treated humans.

      We measured and discussed hemolysis because it is important for virulence toward C. elegans (lines 195 – 197) (Sifri et al., 2003). We believe our manuscript contained a reasonable discussion of this trait. For example, three panels of the main figures presented the main hemolysis results (Figures 2B, 2C, and 2D), whereas 23 other panels did not at all involve hemolysis. In the Discussion, hemolysis took up half of the shortest paragraph (lines 509 – 519) and an additional sentence (lines 589 – 591), out of seven total paragraphs.

      Specific comments:

      (1) L137-138. Can S. aureus really survive for long periods of time outside of the host? Can you clarify this statement? Do you mean it is an opportunistic pathogen and can also replicate in the environment?

      S. aureus can form biofilms and persist for weeks on inert surfaces (Kramer et al., 2024; Tran et al., 2023), indicating that it may replicate in non-host environments. We have included the phrase “opportunistic pathogen” to clarify (line 145).

      (2) L187 - to ascertain

      Corrected.

      (3) Figure 2B - there seems to be a benefit of haemolysis activity to oxacillin resistance, perhaps a crossover in mechanism? In MSSA, without a host, it goes to complete fixation, whereas it is completely lost when antibiotics aren't present. I know this is discussed later, but I would appreciate a more detailed hypothesis of why this could be.

      Antibiotics have been found to induce expression of virulence traits, such as in the case of oxacillin and hemolysis. Thus, it is reasonable that exposure to oxacillin during evolution would maintain MSSA’s hemolytic ability. We hypothesize that the loss of hemolysis in the absence of oxacillin may be due to the cost of hemolysis expression: without a stimulant (oxacillin), hemolysis may not be expressed as often and may be subject to deleterious mutations. Alternatively, the stress that cells were under, rather than the direct action of the antibiotic, may have favored virulence in some way.

      (4) L225-228 - As C. elegans do not have red blood cells, why would we expect this? Do you see increased lysis of C. elegans gut cells? Or could it be due to iron accumulation as you are growing the staph on BHI?

      We measured and correlated nematode mortality with hemolytic ability because hemolysis had been found to be involved in virulence toward C. elegans (Sifri et al., 2003). The hemolysis phenotype is a surrogate for S. aureus virulence gene expression.

      (5) Figure 3A - There seems to be a growth cost of evolving oxacillin resistance in the absence of a host. Why might this be?

      MRSA populations exposed to oxacillin without a host during evolution visually exhibited the lowest growth rate. While this is an interesting question, the result was not statistically significant, so we cannot speculate in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Some claims in the introduction are either not cited or not correctly stated. The second sentence has a claim about the interplay between antibiotic resistance and virulence with no citation listed. Additionally, there is a claim about S. aureus "evading detection" by attacking the host's immune cells. That is by definition not avoiding detection. Perhaps phrasing it as resisting host immune function would make it clearer.

      We have added a citation (lines 80 – 81) and clarified our wording: “Once inside the host, S. aureus resists host immune function by hindering or lysing immune cells.” Lines 140 – 141

      (2) Once in the introduction and in the discussion, the authors referred to S. aureus as a novel pathogen for C. elegans, I do not think enough is known to make this statement.

      This S. aureus strain is novel because it was isolated from humans, so at least in its recent evolutionary past, it has not interacted with C. elegans. Furthermore, we used a C. elegans isolate (N2) that had been frozen and maintained in the lab on E. coli, and had not been exposed to other microbes in its recent evolutionary past. Finally, S. aureus has not been found to be a native pathogen of C. elegans in nature (Ekroth et al., 2021).

      (3) Key suggestion: Change Figure 1C to reflect the design better. So you could have the +OXA before the host and then have an arrow looping back again to show the cycle of each step. So a figure that would have something like: MRSA > +OXA > +host>+OXA --> MRSA .

      We have updated the figure as suggested.

      (4) Suggest changing "greatest" on line 191, section header to greater.

      Done.

      (5) Line 258: Rich media can still provide selective pressures that are difficult to quantify - fast growth, cofactor and other nutrient limitations due to that fast growth

      We have adjusted our wording: “Importantly, rich media reduced the risk of introducing additional selective pressures than those being tested.” Lines 273 – 274

      (6) Why were intergenic mutations routinely ignored? These can often be very important phenotypically.

      We had focused on genes because there was a sufficient number of genes to discuss, but we have added a Supplemental Table (Table S2) containing all mutations (including intergenic and synonymous) appearing in more than 2 populations. We have also added information regarding mecA, an accessory gene, highlighting the role non-core genes may have in shaping bacterial evolution:

      “Despite evolving in similar environments, MRSA and MSSA populations differing only in the presence of an intact accessory gene (mecA)—proceeded on divergent evolutionary paths…” Lines 66 – 68

      “Carriage of Staphylococcal cassette chromosome mec (SCCmec), which encodes mecA, an accessory gene that provides resistance…” Lines 187 – 188

      “As MRSA and MSSA only differed in the presence of an intact mecA gene at the start of the experiment, accessory genes may play important roles in shaping bacterial evolution (Jackson et al., 2011).” Lines 472 – 474

      (7) Line 294: more mutations than what?

      We have clarified the sentence: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence…” Lines 310 – 311

      (8) Lines 295-297: wording is pretty confusing. It seems that the discussion is about increased mutation rates, possibly due to hypermutators resulting from mutL or recA mutations, but this isn't well-thought out and much is implied here. Furthermore, see the above comment about comparing mutations across conditions - it's hard to make inferences of mutation rates without knowing the mutation supply as a result of varying population sizes across conditions and through the experiment.

      We have clarified the sentence: “…there were only two mutations in DNA and mismatch repair genes (mutL and recA), suggesting repair genes were not the sole mechanism involved.” Lines 313 – 314

      Because all populations evolved from one ancestral clone (either MRSA or MSSA), all mutations that are found at the end of the experiment would have arisen de novo from that ancestor. Since all populations experienced the same number of passages/rounds of selection, we determined mutation rate by counting the number of mutations that were found at the last passage for each replicate population. Populations that acquired significantly more mutations had a higher mutation rate in terms of # of mutations/# of selection rounds.

      (9) Line 486: typo "Mutations genes".

      Corrected.

      (10) Line 487: "antibiotics may allow" is awkward; suggest changing to more precise language, possibly relating to pleiotropy if that is what was meant here.

      We had intended to mean “adaptation [to antibiotics] may allow”. We have clarified: “Mutations in genes involved in resistance to antibiotics were found more often in populations with increased virulence, suggesting that antibiotic adaptation may also favor evolution of virulence.” Lines 514 – 516
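
      The per-round mutation count described in the response to comment (8) can be sketched as below. This is only an illustration of the arithmetic (number of mutations at the last passage divided by number of selection rounds); the function name, the population labels, the counts, and the number of rounds are hypothetical and are not data from the study.

```python
def mutations_per_round(mutation_counts, n_rounds):
    """Return mutations per selection round for each replicate population.

    mutation_counts: dict mapping a population label to the number of
                     mutations detected at the final passage.
    n_rounds:        number of passages / rounds of selection (same for all).
    """
    if n_rounds <= 0:
        raise ValueError("n_rounds must be positive")
    return {pop: count / n_rounds for pop, count in mutation_counts.items()}


# Hypothetical end-point counts and round number, for illustration only.
example_counts = {"pop_A": 12, "pop_B": 6, "pop_C": 3}
example_rates = mutations_per_round(example_counts, n_rounds=12)
```

      Because every population shares the same denominator, ranking populations by this rate is equivalent to ranking them by raw mutation count; differences in population size or mutation supply are not accounted for.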

      REFERENCES

      Ekroth AKE, Gerth M, Stevens EJ, Ford SA, King KC. 2021. Host genotype and genetic diversity shape the evolution of a novel bacterial infection. ISME Journal 15:2146–2157. DOI: https://doi.org/10.1038/s41396-021-00911-3, PMID: 33603148

      Kramer A, Lexow F, Bludau A, Köster AM, Misailovski M, Seifert U, Eggers M, Rutala W, Dancer SJ, Scheithauer S. 2024. How long do bacteria, fungi, protozoa, and viruses retain their replication capacity on inanimate surfaces? A systematic review examining environmental resilience versus healthcare-associated infection risk by “fomite-borne risk assessment.” Clinical Microbiology Reviews. PMID: 39388143

      Sifri CD, Begun J, Ausubel FM, Calderwood SB. 2003. Caenorhabditis elegans as a model host for Staphylococcus aureus pathogenesis. Infection and Immunity 71:2208–2217. DOI: https://doi.org/10.1128/IAI.71.4.2208-2217.2003, PMID: 12654843

      Tran NN, Morrisette T, Jorgensen SCJ, Orench-Benvenutti JM, Kebriaei R. 2023. Current therapies and challenges for the treatment of Staphylococcus aureus biofilm-related infections. Pharmacotherapy 43:816–832. DOI: https://doi.org/10.1002/phar.2806, PMID: 37133439

    1. eLife Assessment

      This important study shows that action potentials undergo frequency-dependent failure along the axons of fast-spiking interneurons during sustained high-frequency firing, offering a mechanistic explanation for why inhibition may fail to restrain seizures. The evidence is solid, though additional analyses could further strengthen the mechanistic interpretation. The work will be of broad interest to neuroscientists studying axonal physiology, cortical inhibition, and epilepsy.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines whether action potentials (APs) reliably propagate to the distal axon in neocortical parvalbumin-expressing interneurons (PV-INs) during prolonged high-frequency activity, as occurs during epileptiform activity. The authors use dual soma and axon-attached patch-clamp recordings from mouse and human PV-INs and show that axon AP amplitude declines when the firing frequency exceeds ~200 Hz and fails during seizure-like bursts. Finally, they show that elevation of external K+ to 10 mM also reduces AP amplitude. Taken together, these data strongly suggest that the reduction in transmitter release observed during intense PV-IN activity or during seizure-like events is mainly mediated by the reduction in the presynaptic AP amplitude in PV-INs.

      Strengths:

      This paper is very interesting, well-written and technically impressive. It provides new and important results. The paper will have a great impact in the field of both axon physiology and epilepsy.

      Weaknesses:

      I did not find any significant weakness in the methods, data analysis and results.

    3. Reviewer #2 (Public review):

      Summary:

      The authors demonstrate a frequency-dependent progressive failure of action potential propagation through the axonal arbors in fast-spiking interneurons.

      Strengths:

      The experimental protocols are technically challenging, but the data is of very high quality, and the presentation and writing are very clear.

      I congratulate the authors on submitting a really excellent study demonstrating an activity-dependent alteration in the efficacy of axonal propagation of action potentials in fast-spiking interneurons. It is a well-designed project involving technically challenging experiments, and yet the data is of very high quality, the results are compelling, and the presentation is clear.

      Weaknesses:

      I have some minor suggestions and comments, including those below, but I hope and expect that these could be performed quickly and without difficulty.

      Two of the most interesting figures were consigned to the supplementary information, and I would recommend that they are "upgraded" to be in the main document. The two figures are Figure 1 - Figure Supplement 2, showing the inverse correlation of the AP size with recording distance and branch; and Figure 6 - Figure Supplement 1, showing the postsynaptic effect. My rationale for saying this is that I feel that both add useful biological information to the narrative.

      I was glad to see that "realistic" firing patterns were used, because I recall an old modelling paper from Mainen and Sejnowski (https://pubmed.ncbi.nlm.nih.gov/7770778/) that is highly relevant to this paper and should be referenced. However, I would like to suggest one further bit of analysis of the data presented in Figure 4, because I think it will support the main story. In Figure 4, the ostensible conclusion is that there is relative preservation of spike amplitude for this natural firing pattern, but that is almost certainly because the average firing rate is substantially below the level where spike amplitude suppression was seen in Figures 2 & 3. Instead, I recommend analysing, for each consecutive spike pair, the ratio of the heights of the two spikes with respect to the interspike interval, viz. t2 - t1 versus spike 2 amplitude / spike 1 amplitude.
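
      The pairwise analysis suggested here could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `spike_pair_ratios`, the input format (parallel lists of spike times and amplitudes), and the example values are all hypothetical.

```python
def spike_pair_ratios(spike_times, spike_amplitudes):
    """For each consecutive spike pair, return (t2 - t1, amplitude2 / amplitude1).

    spike_times:      spike times in ascending order (any consistent unit).
    spike_amplitudes: amplitude of each spike, same length as spike_times.
    Pairs with a non-positive interval or a zero first amplitude are skipped.
    """
    if len(spike_times) != len(spike_amplitudes):
        raise ValueError("spike_times and spike_amplitudes must have equal length")
    pairs = []
    for i in range(1, len(spike_times)):
        interval = spike_times[i] - spike_times[i - 1]
        first_amplitude = spike_amplitudes[i - 1]
        if interval <= 0 or first_amplitude == 0:
            continue
        pairs.append((interval, spike_amplitudes[i] / first_amplitude))
    return pairs


# Hypothetical values, for illustration only.
example_times = [0.0, 0.5, 1.0, 2.0]
example_amplitudes = [2.0, 1.0, 1.0, 2.0]
example_pairs = spike_pair_ratios(example_times, example_amplitudes)
```

      Plotting the ratio against the interval for all pairs would show whether short intervals are associated with ratios below 1, which is the suppression effect in question.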

      The data may be a little noisy, but given the very large number of spike pairs, I would expect the suppression effect to be fully evident, and that can feed directly into the model.

      I think the authors' intuition that dissipation of ionic gradients is a key factor is correct, so I was pleased that Na+ was not ignored in the discussion (the results section only talked about K+).

      Perhaps the fact that Na gradients may also be depleted could be mentioned in the results section, too. In the discussion, perhaps the authors could mention two other details: that this "fatigue" may reflect ATP depletion, and progressive failure of the Na-K-ATPase in the axons. That could be examined in a follow-up study (I certainly am not suggesting a raft of experiments for this study), but it could be mentioned in the discussion. And second, that the ionic depletion may be greater within the confines of the cell-attached pipette tip, which is why the branching pattern/distance data (F1FS2), the Ca imaging data and the post-synaptic effects (F6FS1) are such important additional supporting data, because together they indicate that the effect is along the whole axon.

      Regarding the rise in [K+]o, it would be worth mentioning the fact that this will be greatly exacerbated by the postsynaptic effects of high-frequency PV activity, because the consequent Cl loading of the postsynaptic cell is subsequently cleared by coupling to K+ extrusion. A good reference for this is http://www.ncbi.nlm.nih.gov/pubmed/20211979; see also a recent review (https://pubmed.ncbi.nlm.nih.gov/39637123/), which argues that this may even be the dominant source of raised [K+]o in the immediate preictal period, larger even than that exiting cells through the Hodgkin-Huxley mechanism.

      The referencing needs some attention. In some instances, the citations either do not really illustrate preceding statements or are simply the wrong citation.

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting paper which asks a compelling and translationally relevant question: since the firing rate of GABAergic PV+ interneurons (which powerfully control pyramidal cell excitability) increases prior to and during seizures, why doesn't this increase in inhibition do more to prevent seizure propagation? The authors hypothesize that increased PV+ spiking might lead to spike propagation failures in the axon.

      To test this hypothesis, the authors conduct paired electrophysiological recordings from PV+ neurons in acute barrel cortex slices of mice and from a handful of human neurosurgical samples. They use patch clamp recordings to measure the membrane potential of PV+ neurons at the soma, while simultaneously measuring spike propagation with a recording electrode in the axon of the same neuron.

      After a variety of elegant experiments and modeling, the authors conclude that extracellular K+ accumulation around the axon during high-frequency firing might be causing propagation failures.

      Strengths:

      Overall, the paper is nicely written, the experiments are technically challenging, and the figures are, for the most part, well laid out. The topic will be of broad interest for the neuroscience field, given the relevance of PV+ interneurons to cortical circuit function, plasticity across development, and disease.

      Weaknesses:

      In addition to the strengths here, I feel the need to highlight a few weaknesses which, if rectified, could improve the work.

      (1) The key hypothesis in this paper is that extended periods of somatic spiking lead to progressive decreases in the axonal AP amplitude, which eventually lead to failures, potentially (but not necessarily) at branchpoints. Two comments here.

      It would be helpful for the authors to show us examples of the axonal spike waveforms at a faster time base (along with the somatic recording) so that we can really understand what's happening to the spike in the axon.

      Their data are also compatible with failures of spike initiation at the AIS. Could the authors show us the first derivative of somatic voltage for successes and failures, and maybe show us some phase plots of Vm vs dV/dt for the failures, successes, and attenuated spikes? Effectively, what I'm asking is whether the changes they see in the distal axon are downstream of the initiation zone. It's very possible that extended spiking is simply depolarizing the AIS and inactivating Na+ channels there. In which case, the authors should be able to pull this out in a phase plot.

      (2) There's no baseline period for their calcium fluorescence signals, which is necessary to compare their "signal" magnitude to frame-by-frame variance of dG/R. Could the authors correct this issue in Figure 6B?

      (3) Some of their stats are a bit unorthodox. Why are they doing two separate Wilcoxon tests in 6D and 6E? Why not throw all that into a one-way ANOVA model followed by appropriate post hoc tests?

      (4) Why don't the authors observe washout of their effect after high K+ application? I am concerned that their high K+ application is having secondary and long-lasting effects on PV excitability, which mimic (but are not necessarily identical to) their hypothesized mechanism of axonal failures.
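
      The phase-plot analysis requested in point (1) could be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `phase_plane`, the use of a central-difference derivative, and the example trace are assumptions chosen for clarity.

```python
def phase_plane(vm, dt):
    """Return (Vm, dV/dt) pairs for a uniformly sampled voltage trace.

    vm: membrane potential samples (e.g. in mV).
    dt: sampling interval (e.g. in ms), so dV/dt is in mV/ms (= V/s).
    The derivative is a central difference, so the first and last
    samples are dropped.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(vm) < 3:
        return []
    return [
        (vm[i], (vm[i + 1] - vm[i - 1]) / (2 * dt))
        for i in range(1, len(vm) - 1)
    ]


# Hypothetical, heavily simplified trace, for illustration only.
example_vm = [-70.0, -60.0, -40.0, -60.0, -70.0]
example_points = phase_plane(example_vm, dt=0.5)
```

      Overlaying these loops for full-amplitude, attenuated, and failed spikes would let a reader compare the voltage at which dV/dt takes off and the peak dV/dt across the three groups.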

    1. eLife Assessment

      This valuable study contributes to the field of neuro-glial biology by establishing a direct causal link between astrocytic metabolism (glycolysis) and the structural wiring of neural circuits. Connecting the metabolic-synaptic mechanism to locomotor reorientation in the dopaminergic circuit offers new insights into how energy metabolism shapes circuit assembly and function. The evidence offers a solid foundation, moving logically from molecular mechanisms to circuit-level anatomy and finally to behavior; however, several central conclusions currently exceed the direct evidence presented. With appropriate calibration of claims and interpretations and/or additional clarifying experiments, the manuscript has the potential to make a significant contribution to our understanding of glial regulation of circuit assembly.

    2. Reviewer #1 (Public review):

      This study investigates how astrocyte metabolic state influences astrocyte-synapse interactions and the organization of the dopaminergic circuit in the Drosophila CNS. Using a creative split-GFP-based contact reporter ("PEAPOD"), combined with genetic perturbations of glycolytic enzymes, synaptic labeling, EM, transsynaptic tracing, single-cell transcriptomics, and behavioral assays, the authors propose that disruption of astrocyte glycolysis enhances astrocyte-dopamine neuron contacts, promotes synaptogenesis, and biases dopaminergic-motor circuit connectivity through a mechanism involving altered Neuroligin 2 trafficking.

      The work is conceptually ambitious and technically broad. The development and application of a contact reporter for astrocyte-neuronal interfaces is potentially valuable to the field, and the convergence of multiple glycolytic perturbations on similar phenotypes is a notable strength. However, several central conclusions currently extend beyond the direct evidence presented. Clarification and calibration of these claims would substantially strengthen the manuscript.

      Major Points:

      (1) Astrocyte glycolytic impairment is inferred rather than directly demonstrated

      The central premise of the manuscript is that reduced astrocyte glycolysis drives the observed phenotypes. While multiple glycolytic enzymes (e.g., pfk, eno, pyk) are genetically perturbed and produce similar increases in PEAPOD signal, the manuscript does not directly demonstrate altered glycolytic flux or metabolic state in astrocytes under these conditions. Reduced enzyme levels or genetic mutation do not necessarily establish functional metabolic deficiency, particularly given potential compensatory mechanisms.

      Because glycolytic impairment is foundational to the proposed mechanism, the conclusions should either be supported by direct metabolic readouts in astrocytes or framed more cautiously as perturbations of glycolytic enzymes rather than confirmed metabolic deficiency.

      (2) Interpretation of the PEAPOD signal requires clearer calibration

      The PEAPOD system is an innovative tool to detect membrane proximity between astrocytes and dopamine neurons. However, the manuscript frequently interprets increased PEAPOD intensity and volume as increased "ensheathment" or increased synaptic contact. A split-GFP-based reporter measures membrane apposition within a defined spatial range but does not directly quantify structural ensheathment, synapse number, or functional synaptic engagement.

      Although the authors show an association of the PEAPOD signal with presynaptic markers in some regions, the distinction between increased membrane contact, altered membrane organization, and true changes in perisynaptic coverage should be more explicitly discussed. Several conclusions would benefit from clearer wording that distinguishes contact proximity from ultrastructural or functional synapse remodeling.

      (3) Evidence for biased dopaminergic-motor circuit wiring is indirect

      The manuscript proposes that disruption of astrocyte glycolysis biases dopaminergic-motor connectivity. This conclusion relies heavily on trans-Tango labeling intensity and downstream cell-type composition analysis via FACS and single-cell RNA sequencing.

      Transsynaptic labeling approaches can be influenced by expression levels, reporter trafficking, labeling efficiency, and differential recovery during dissociation and FACS. Changes in labeled cell abundance or reporter intensity do not necessarily equate to altered synaptic wiring. Given that this conclusion represents a major conceptual advance of the study, the manuscript should either provide additional orthogonal support or temper the claim to reflect that altered labeling efficiency or synaptic engagement, rather than definitive rewiring, has been demonstrated.

      (4) Mechanistic claims regarding Neuroligin 2 trafficking are suggestive but not definitive

      The authors propose that astrocyte glycolytic disruption alters Neuroligin 2 (Nlg2) trafficking, leading to ER retention and downstream synaptogenic effects. The observation of Nlg2-positive intracellular bodies colocalizing with ER markers is intriguing. However, quantitative analysis, additional compartment markers, and/or biochemical support would be necessary to firmly establish altered ER exit or glycosylation status.

      At present, the mechanistic model is plausible but should be presented more explicitly as a working model supported by suggestive evidence rather than a fully established trafficking defect.

      (5) Behavioral phenotypes are not yet causally linked to dopaminergic circuit changes

      The locomotor phenotypes observed upon astrocyte glycolytic perturbation are clear. However, the manuscript attributes these changes to altered dopaminergic-motor connectivity. A direct causal linkage between astrocyte metabolic state, dopaminergic circuit remodeling, and behavior is not conclusively demonstrated. The discussion should either clarify the inferential nature of this link or provide additional evidence supporting dopamine-specific dependence.

      Minor Points:

      (1) Statistical analyses across multi-group comparisons should be more clearly justified, particularly where multiple pairwise tests are performed. A clarification of the multiple-comparison correction and the exact comparison strategy would improve rigor.

      (2) The temporal interpretation of activity-dependent remodeling experiments would benefit from a clearer explanation of what timescale is being tested.

      (3) Developmental compensation versus the acute effects of glycolytic perturbation are not fully distinguished and should be discussed.

      (4) The orthology and functional equivalence of Drosophila Nlg2 should be described with greater precision to avoid potential confusion.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a significant advance in our understanding of how metabolic states in astrocytes directly influence the structural assembly and functional output of neural circuits. By focusing on the Drosophila larval dopaminergic system, the authors uncover an interesting mechanism: astrocyte glycolysis acts as a negative regulator of PEAPODs, ultimately altering locomotor behavior. Metabolic fluctuations (e.g., due to diet, development, or disease) could fundamentally reshape neural connectivity, with broad implications for neurodevelopmental and metabolic disorders.

      Strengths:

      The manuscript offers a compelling narrative linking astrocyte metabolism to DA-MN circuit wiring and behavior. For the field, this study serves as an important prompt to investigate how metabolic states might dynamically tune neural connectivity during development and in disease.

      Weaknesses:

      The definitive acceptance of the proposed linear mechanism depends on future validation through genetic interaction tests and rescue experiments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors are trying to demonstrate how astrocytes influence the connections within neural circuits that control behavior.

      Strengths:

      The data presented in the manuscript are thorough and well-executed, using advanced Drosophila approaches (Ca2+ imaging, GRASP, clonal analysis, trans-Tango) in new ways (PEAPODs) and with new tools (pyk mutants, anti-pyk Ab, LexAop2-pykRNAi). Use of two RNAi lines for each of three glycolytic enzymes is strong evidence that perturbation of glycolysis is responsible, though it does not rule out that inappropriate build-up of intermediates, or shunting to alternative pathways, may play a role here. Subsequent focus on Pyk alone is understandable.

      Weaknesses:

      As strong as the data is, it does not always support some of the stated claims, and this should be addressed in any revision. In addition, there seems to be an oversimplification of the possible effects of Pyk RNAi, and some missing pieces that could fill in gaps and align the proposed mechanism with observed phenotypes.

      Where the data does not support stated claims:

      (1) The authors claim larvae executed more reorientation actions during locomotion "as a result" of attenuated astrocyte-to-DAergic neuron signaling through neuroligin 2 (astrocyte) and neurexin 1 (DA Neuron). They correlated these, but did not make a direct connection.

      (2) There is a claim that "at the circuit level, behavioral alterations were found to arise from increased DAergic neuronal synaptogenesis and DAergic-motor connection" (sic). However, the work does not build a causal relationship between behavior and synaptogenesis or connectivity. At present, the manuscript does not directly address whether increased DA-motor neuron synapses are sufficient to explain the increased orientation reactions observed.

      (3) It is asserted (line 182, and elsewhere) that "astrocyte glycolysis deficiency increased PEAPODs and DAergic neuron synaptogenesis". While astrocyte Pyk KD increased PEAPODs (Figure 2), and it also increased endogenous Brp-GFP in DA neurons (via STaR, Figure 3F), the added Brp-GFP was not localized to synapses under these conditions (pyk KD), to unequivocally demonstrate that the increased PEAPODs are at the sites of DAergic synapses. This is also seen in Figure 6I-J.

      (4) It may be premature to refer to this strictly as synaptogenesis, as alternative explanations (e.g., stabilization or impaired pruning) could also account for the observations.

      (5) The use of trans-Tango is an elegant way to support the idea that extra DAergic synapses are formed onto motor neurons, with potential impact on motor circuits. But again, the claim (line 215, and elsewhere) that this "Biased DAergic-motor wiring" is what "alters motor output", would benefit from additional evidence.

      (6) Oversimplification of the possible effects of Pyk RNAi: Because Pyk knockdown is likely to alter glycolytic flux rather than abolish glycolysis entirely, it may be clearer to describe the manipulation as 'Pyk loss' rather than 'glycolysis-deficient' in most contexts.

      (7) Filling gaps to align the proposed mechanism with observed phenotypes:

      a) Figure 6K-M - the ER retention of Nlg2 should also be tested using Pyk-RNAi, in addition to the pyk mutants shown. This would confirm the astrocyte-specific nature of this effect and close the loop to align the phenotypes.

      b) From the mechanism proposed (ER retention of Nlg2, presumably leading to loss of Nlg2 function in astrocytes), one might expect that loss of Nlg2 from astrocytes would phenocopy the behavioral deficits seen upon pyk KD in astrocytes. Ackerman et al. (2021) knocked down Nlg2 in astrocytes and examined motor behavior with FIMTrack. They saw increased accumulated distance but did not see the effects seen upon pyk KD in this manuscript (increased pausing, sweeping). The authors could perform this experiment themselves or, alternatively, should address this inconsistency in the discussion.

    1. eLife Assessment

      This important study by Kong et al. systematically and rigorously dissects the gene regulatory network underlying melanoma and breast cancer risk at the multi-cancer 2q33 locus. The authors provide compelling evidence that rs3769823 is a key functional variant that acts through allele-preferential binding of the transcription factors E4F1 and IRF2 to regulate CASP8 and FLACC1 in a cell-type-specific manner. The work makes a significant contribution to understanding the mechanisms operating at multi-cancer risk loci.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Kong et al. conduct a systematic analysis of the multi-cancer risk locus at 2q33. The authors start with a careful analysis of co-localization between the melanoma risk SNPs and several other cancers and conclude that a subset of credible causal SNPs is shared among different cancers, including breast cancer. Next, they define a starting list of 27 SNPs as potential credible causal SNPs and use analysis of TADs (topologically associating domains) to zoom in on CASP8 and FLACC1 as potential target genes. They then systematically rule out coding and splicing variants in the set and focus on a smaller set of three SNPs constituting a melanocyte enhancer element. Using a combination of mass spectrometry, reporter assays, and electrophoretic mobility shift assays, the authors define a role for transcription factors IRF2 and E4F1 in the regulatory network driving risk at the locus.

      This work represents a high-quality tour de force, using multiple tools, to zoom in on a gene expression regulatory network associated with risk for multiple cancers. It provides a detailed framework for analyses of other multi-cancer risk loci. A limitation of the work, which is really a current limitation of the field, is the lack of a model to study how the identified network of regulatory elements, transcription factors, and target genes mechanistically drives risk at the organismal level. Advances such as those described in this manuscript contribute significantly to our knowledge of how common risk variants drive risk.

    3. Reviewer #2 (Public review):

      Summary:

      Kong et al. investigate a well-validated risk locus at chromosome band 2q33.1 adjacent to CASP8, a ubiquitously expressed and central initiator caspase in the extrinsic apoptotic pathway. Importantly, this region is a multi-cancer risk locus harboring multiple highly correlated risk alleles that are confounded by linkage. In addition to protein-coding and splicing variants, further evaluation of eQTL and TWAS results for the locus suggests a cis-regulatory effect is present for CASP8 and nearby FLACC1. The authors prioritize variants using orthogonal statistical fine-mapping approaches and triage top candidates for functional assays. Luciferase reporter assays demonstrated convincing allele-specific regulatory activity of the rs3769823 variant as well as suggestive evidence for rs3769821 and rs59308963. These three variants lie in close proximity within a melanocyte regulatory element marked by overlapping promoter and enhancer chromatin state signals. The authors employ a haplotype reporter assay, which shows that the combination of risk alleles in the forward direction has additive effects compared to the protective haplotype. These effects are also cell-type-specific among melanocytes, melanoma, and breast cancer cell states. Utilizing electrophoretic mobility shift assays, the authors convincingly show augmented nuclear protein binding of the rs3769823-A risk allele, and mass spectrometry of allele-specific rs3769823 binding proteins revealed specific activity of E4F1 and IRF2, whose motif score is strengthened by the risk allele. Correlation of these transcription factors' expression with CASP8 expression suggested repressive effects of E4F1 and activating effects of IRF2, which were confirmed in siRNA assays across multiple cell types. These data provide important evidence towards the molecular mechanisms governing disease susceptibility at the 2q33.1 risk locus and nominate rs3769823 as a causal variant through cis-regulatory activity by E4F1 and IRF2.

      Strengths:

      Major strengths of the work include the authors' employment of orthogonal fine-mapping approaches and functional assays in multiple cell types. These help to fortify a novel molecular mechanism of rs3769823 and also work together to propose a complicated multi-variant and cell-type-specific effect at this locus, which is worth future investigation.

      Weaknesses:

      The rs3769823 variant is a protein-coding variant for CASP8. While the authors conclude that this is likely neutral to CASP8 function, their evidence is suggestive at best and does not close the door on a protein-coding function for this variant.

      Similarly, another variant, rs10804111, is associated with alternative splicing of CASP8. The authors do well to include the potent rs10804111 sQTL effect on CASP8 and further confirm it by a minigene assay. However, its exclusion from the fine-mapping results may be due to a potent bias towards active chromatin marks. Therefore, rs10804111 still requires further investigation.

      Some attention is given to FLACC1, whose promoter may be in contact with multiple variants. However, little is known about FLACC1 function, and the authors don't provide meaningful supporting data to illustrate whether FLACC1 is relevant in the context of melanocyte, melanoma, or other cancer types that share this risk locus (breast, prostate). Showing the absolute expression levels in the eQTL analysis would be helpful towards this.

      Phenotypic assays interrogating the rs3769823-E4F1-IRF2 relevance to melanocyte biology and melanoma pathogenesis are not included.

      Finally, the segmented figure organization negatively impacts the readability of the paper.

    1. eLife Assessment

      This important study introduces an innovative synthetic nanobody approach to probe the function of the bacterial SMC complex. The work is a compelling example of the potential of this approach. The authors generate protein chimeras to provide convincing evidence that their identified nanobodies target the coiled-coil region of the SMC subunit, demonstrating that this region is critical for SMC function in vivo. Overall, the work is significant for the fields of genome organisation, SMC protein biology, synthetic biology, and bacterial cell biology.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the major comments raised in the previous round of review. Public Reviews below refer to the version submitted to Review Commons.]

      Summary:

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies generated against the Bacillus Smc-ScpAB complex using ribosome/phage display. Specifically, they use an ATP hydrolysis-deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between Bacillus and S. pneumoniae Smc to narrow down to specific regions within the Bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it would be informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out have not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis-deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis-deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis-deficient effect?

      Significance:

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled-coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function, and in parallel demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

    3. Reviewer #2 (Public review):

      Summary:

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      Significance:

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

    4. Reviewer #3 (Public review):

      Summary:

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it is biased towards sybodies that bind to Smc in a quite special way, it seems. Using wild-type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      Line 105: "Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis." This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study; it would just clarify a minor point.

      Fig. 2B: It is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed. I believe the authors should state, though, that this is not a replication block, but a failure to segregate origins.

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised that the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, these are rather indirect experiments. This leads to the point "Revealing Smc arm dynamics through synthetic binders" in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation; at least this is not clearly shown.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      Significance:

      The work describes the generation and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies generated against the Bacillus Smc-ScpAB complex using ribosome/phage display. Specifically, they use an ATP hydrolysis-deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between Bacillus and S. pneumoniae Smc to narrow down to specific regions within the Bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it would be informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not trivial, even for a smaller selection of sybodies. We have now incorporated ELISA data as new Table S1, which shows that most sybodies support clear binding to Smc-ScpAB. Curiously, while (only) some sybodies show a clear preference for ATP-bound or unbound Smc, this is not a strong predictor of the strength of phenotype observed in vivo. We have also attempted to characterize the binding of Smc to sybodies by other methods including pull-downs, cross-linking, and by biophysical methods (GCI). However, we prefer to not include these data as the outcomes are not clear due to inconsistencies in the behaviour of purified sybodies.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out have not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction. However, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface that we have confidently identified by the mapping using chimeric proteins. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the mapped binding sites are located on the SMC coiled coils, the latter scenario seems unlikely and would be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with GFP and performed live cell imaging. This shows that sybodies without strong phenotypes are similarly expressed, at least at low inducer concentration. Moreover, many sybodies localize as foci in the cell, presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We have included example data in the revised version of the manuscript as Figure S4 and Figure S5. Notably, a sybody (Sb007) with a weak growth phenotype shows focal localization at low inducer concentration and high expression levels when fully induced, comparable to sybodies with strong phenotypes. Altogether, this suggests that the lack of phenotype is not due to absence of sybody expression or localization.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis-deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis-deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis-deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes of sybody expression are similar to those of the null mutant rather than the ATP hydrolysis-defective EQ mutant, which displays even more severe growth phenotypes in B. subtilis. To highlight this point, we have added the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state.”

      “ELISA data revealed that nearly all clones bind purified Smc-ScpAB (Table 1). However, the ELISA signals of only few Sybodies showed clear dependence on the presence or absence of ATP and DNA (Table S1).”

      Significance:

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled-coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function, and in parallel demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

      Reviewer #2 (Public review):

      Summary:

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      Significance:

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Reviewer #3 (Public review):

      Summary:

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it seems biased towards sybodies that bind to Smc in a quite special way. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      The reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on this point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this is not expected to restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). Notably, the effect size of ATP/DNA on ELISA signals was not a strong predictor of the strength of phenotypes observed in vivo. The text has been revised accordingly. See line 84 and line 92.

      We are thus quite confident, based on prior work (and on the now included ELISA data), that the Smc ATPase mutation did not strongly bias the selection in one way or another. The surprising bias towards coiled coil binding sites likely has other explanations, as these sites likely form a preferred epitope recognized by sybodies from the loop library.

      Line 105: Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis. This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results of two of the three libraries but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then sybodies are likely ineffective in inactivating Smc in B. subtilis, with the notable exceptions of the sybodies that we have isolated and characterized in this manuscript. We have added this notion to the manuscript.

      Fig. 2B: It is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed. I believe the authors should state, though, that this is not a replication block, but a failure to segregate origins.

      We agree that this is an important point. We have added the following statement to clarify this point: “These elongated cells are known to harbour expanded nucleoids, consistent with delayed oriC separation rather than delayed DNA replication.”

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised why the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts of testing direct binding with mixed outcomes and decided to not include those results in the light of the stronger and more relevant in vivo mapping. However, we have added ELISA results (new Table S1) that support a direct interaction.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, rather indirect experiments. This leads to the point "Revealing Smc arm dynamics through synthetic binders" in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation; at least, this is not clearly shown.

      We agree and have simplified the statement by removing “stabilize” and “transient”.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are less well represented in the SMC literature (when compared to the heads and hinge domains, for example), likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Significance:

      The work describes the generation and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

    1. eLife Assessment

      This valuable study focuses on a unique morphogenetic module, the junction-based lamellipodia (JBL). It provides a biomechanical understanding of how JBLs control endothelial cell-cell junctional remodelling to generate lumenised, multicellular blood vessels. The manuscript represents a robust, thoughtfully executed, and convincing study that uses high-resolution time-lapse imaging combined with pharmacological treatments to advance our understanding of lumen formation in vascular development.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Original review:

      Summary:

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      Strengths:

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      Weaknesses:

      The study primarily presents descriptive observations and includes limited quantitative analyses or genetic modifications. Molecular mechanisms are typically interrogated through the use of pharmacological inhibitors rather than genetic approaches. Furthermore, the precise semantic distinction between JAIL and JBL requires additional clarification, as current evidence suggests their biological relevance may substantially overlap.

    3. Reviewer #2 (Public review):

      Original review:

      Summary:

      In Maggi et al., the authors investigated the mechanisms that regulate the dynamics of a specialized junctional structure called junction-based lamellipodia (JBL), which they have previously identified during multicellular vascular tube formation in the zebrafish. They identified the Arp2/3 complex to dynamically localize at expanding JBLs and showed that the chemical inhibition of Arp2/3 activity slowed junctional elongation. The authors therefore concluded that actin polymerization at JBLs pushes the distal junction forward to expand the JBL. They further revealed the accumulation of Myl9a/Myl9b (marker for MLC) at the junctional pole, at interjunctional regions, suggesting that contractile activity drives the merging of proximal and distal junctions. Indeed, chemical inhibition of ROCK activity decreased junctional mergence. With these new findings, the authors added new molecular and cellular details into the previously proposed clutch mechanism by proposing that Arp2/3-dependent actin polymerization provides pushing forces while actomyosin contractility drives the merging of proximal and distal junctions, explaining the oscillatory protrusive nature of JBLs.

      Strengths:

      The authors provide detailed analyses of endothelial cell-cell dynamics through time-lapse imaging of junctional and cytoskeletal components at subcellular resolution. The use of zebrafish as an animal model system is invaluable in identifying novel mechanisms that explain the organizing principles of how blood vessels are formed. The data is well presented, and the manuscript is easy to read.

      Weaknesses:

      While the data generally support the conclusions reached, some aspects can be strengthened. For the untrained eye, it is unclear where the proximal and distal junctions are in some images, and so it is difficult to follow their dynamics (especially in experiments where Cdh5 is used as the junctional marker). Images would benefit from clear annotation of the two junctions. All perturbation experiments were done using chemical inhibitors; this can be further supported by genetic perturbations.

    4. Reviewer #3 (Public review):

      Original review:

      The paper by Maggi et al. builds on earlier work by the team (Paatero et al., 2018) on oriented junction-based lamellipodia (JBL). They validate the role of JBLs in guiding endothelial cell rearrangements and utilise high-resolution time-lapse imaging of novel transgenic strains to visualise the formation of distal junctions and their subsequent fusion with proximal junctions. Through functional analyses of Arp2/3 and actomyosin contractility, the study identifies JBLs as localized mechanical hubs, where protrusive forces drive distal junction formation, and actomyosin contractility brings together the distal and proximal junctions. This forward movement provides a unique directionality which would contribute to proper lumen formation, EC orientation, and vessel stability during these early stages of vessel development.

      Time-lapse live imaging of VEC, ZO-1, and actin reveals that VEC and ZO-1 are initially deposited at the distal junction, while actin primarily localizes to the region between the proximal and distal sites. Using a photoconvertible Cdh5-mClav2 transgenic line, the origin of the VEC aggregates was examined. This convincingly shows that VE-cadherin was derived from pools outside the proximal junctions. However, in addition to de novo VEC derived from within the photoconverted cell, could some VEC also be contributed by the neighbouring endothelial cell to which the JBL is connected?

      As seen for JAILs in cultured ECs, live imaging of Arpc1b-Venus in conjunction with ZO-1 and actin reveals that Arp2/3 is enriched when JBLs form. Therefore, Arp2/3 likely contributes to the initial formation of the distal junction in the lamellipodium.

      Inhibiting Arp2/3 with CK666 prevents JBL formation, and filopodia form instead of lamellipodia. This loss of JBLs leads to impaired EC rearrangements.

      Is the effect of CK666 treatment reversible? Since only a short (30 min) treatment is used, the overall effect on the embryo would be minimal, and thus washing out CK666 might lead to JBL formation and normalized rearrangements, which would further support the role of Arp2/3.

      From the images in Figure 4d it appears that ZO-1 levels are increased in the ring after CK666 treatment. Has this been investigated, and could this overall stabilization of adhesion proteins further prevent elongation of the ring?

      To explore how the distal and proximal junctions merge, spatiotemporal imaging of Myl9 and VEC is conducted. It indicates that Myl9 is localized at the interjunctional fusion site prior to fusion. This suggests pulling forces are at play to merge the junctions, and indeed Y-27632 treatment reduces or blocks the merging of these junctions.

      For this experiment, a truncated version of VEC was used, which lacks the cytoplasmic domain. Why have the authors chosen to image this line, since lacking the cytoplasmic domain could also impair the efficiency of tension on VEC at both junction sites? This is as described in the discussion (lines 328-332).

      Since the time-lapse movies involve high-speed imaging of rather small structures, it is understandable that these are difficult to interpret. Adding labels to indicate certain structures or proteins at essential timepoints in the movies would help the readers understand these.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      Strengths:

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      Weaknesses:

      The study primarily presents descriptive observations and includes limited quantitative analyses or genetic modifications. Molecular mechanisms are typically interrogated through the use of pharmacological inhibitors rather than genetic approaches. Furthermore, the precise semantic distinction between JAIL and JBL requires additional clarification, as current evidence suggests their biological relevance may substantially overlap.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations. We have expanded our discussion on the distinction between JAIL and JBL and hope that this will clarify why, in our opinion, these terms should be used differently in different cell biological contexts (see below and lines 348-374 in the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In Maggi et al., the authors investigated the mechanisms that regulate the dynamics of a specialized junctional structure called junction-based lamellipodia (JBL), which they have previously identified during multicellular vascular tube formation in the zebrafish. They identified the Arp2/3 complex to dynamically localize at expanding JBLs and showed that the chemical inhibition of Arp2/3 activity slowed junctional elongation. The authors therefore concluded that actin polymerization at JBLs pushes the distal junction forward to expand the JBL. They further revealed the accumulation of Myl9a/Myl9b (marker for MLC) at the junctional pole, at interjunctional regions, suggesting that contractile activity drives the merging of proximal and distal junctions. Indeed, chemical inhibition of ROCK activity decreased junctional mergence. With these new findings, the authors added new molecular and cellular details into the previously proposed clutch mechanism by proposing that Arp2/3-dependent actin polymerization provides pushing forces while actomyosin contractility drives the merging of proximal and distal junctions, explaining the oscillatory protrusive nature of JBLs.

      Strengths:

      The authors provide detailed analyses of endothelial cell-cell dynamics through time-lapse imaging of junctional and cytoskeletal components at subcellular resolution. The use of zebrafish as an animal model system is invaluable in identifying novel mechanisms that explain the organizing principles of how blood vessels are formed. The data is well presented, and the manuscript is easy to read.

      Weaknesses:

      While the data generally support the conclusions reached, some aspects can be strengthened. For the untrained eye, it is unclear where the proximal and distal junctions are in some images, and so it is difficult to follow their dynamics (especially in experiments where Cdh5 is used as the junctional marker). Images would benefit from clear annotation of the two junctions. All perturbation experiments were done using chemical inhibitors; this can be further supported by genetic perturbations.

      We have added annotations to several figures and paid particular attention to the proximal and distal junctions.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations.

      Reviewer #3 (Public review):

      The paper by Maggi et al. builds on earlier work by the team (Paatero et al., 2018) on oriented junction-based lamellipodia (JBL). They validate the role of JBLs in guiding endothelial cell rearrangements and utilise high-resolution time-lapse imaging of novel transgenic strains to visualise the formation of distal junctions and their subsequent fusion with proximal junctions. Through functional analyses of Arp2/3 and actomyosin contractility, the study identifies JBLs as localized mechanical hubs, where protrusive forces drive distal junction formation, and actomyosin contractility brings together the distal and proximal junctions. This forward movement provides a unique directionality which would contribute to proper lumen formation, EC orientation, and vessel stability during these early stages of vessel development.

      Time-lapse live imaging of VEC, ZO-1, and actin reveals that VEC and ZO-1 are initially deposited at the distal junction, while actin primarily localizes to the region between the proximal and distal sites. Using a photoconvertible Cdh5-mClav2 transgenic line, the origin of the VEC aggregates was examined. This convincingly shows that VE-cadherin was derived from pools outside the proximal junctions. However, in addition to de novo VEC derived from within the photoconverted cell, could some VEC also be contributed by the neighbouring endothelial cell to which the JBL is connected?

      Yes, the green (non-converted) VE-cadherin can indeed originate from either of the two cells. The main point we want to make, based on our observations, is that the red (converted) VE-cadherin from the proximal junction (as defined by the ROI) does not contribute to the distal junction.

      As seen for JAILs in cultured ECs, live imaging of Arpc1b-Venus in conjunction with ZO-1 and actin reveals that Arp2/3 is enriched when JBLs form. Therefore, Arp2/3 likely contributes to the initial formation of the distal junction in the lamellipodium.

      Inhibiting Arp2/3 with CK666 prevents JBL formation, and filopodia form instead of lamellipodia. This loss of JBLs leads to impaired EC rearrangements.

      Is the effect of CK666 treatment reversible? Since only a short (30 min) treatment is used, the overall effect on the embryo would be minimal, and thus washing out CK666 might lead to JBL formation and normalized rearrangements, which would further support the role of Arp2/3.

      We have performed washout experiments and find that the ectopic filopodia disappear when the inhibitor is removed. This experiment is shown in supplementary Figure 3 and supplementary Movies 12 and 13.

      From the images in Figure 4d it appears that ZO-1 levels are increased in the ring after CK666 treatment. Has this been investigated, and could this overall stabilization of adhesion proteins further prevent elongation of the ring?

      This is an interesting thought, and we have taken a closer look. There is quite a bit of sample-to-sample variation in the ZO-1 signal. The quantification (Author response image 1) indicates that, on average, there is no increase in the CK666-treated embryos.

      Author response image 1.

      To explore how the distal and proximal junctions merge, spatiotemporal imaging of Myl9 and VEC is conducted. It indicates that Myl9 is localized at the interjunctional fusion site prior to fusion. This suggests pulling forces are at play to merge the junctions, and indeed Y-27632 treatment reduces or blocks the merging of these junctions.

      For this experiment, a truncated version of VEC was used, which lacks the cytoplasmic domain. Why have the authors chosen to image this line, since lacking the cytoplasmic domain could also impair the efficiency of tension on VEC at both junction sites? This is as described in the discussion (lines 328-332).

      This line was used because it labels the entire JBL protrusion more clearly. We have also included an example using the VE-cad-Venus line (supplementary Figure 4b), which shows a Myl-Cherry pattern consistent with the other examples.

      Since the time-lapse movies involve high-speed imaging of rather small structures, it is understandable that these are difficult to interpret. Adding labels to indicate certain structures or proteins at essential timepoints in the movies would help the readers understand these.

      We have added annotations and labels to all movies. We have also improved annotations in several figures (i.e. Figs. 1, 2, 5, 6 and 7).

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers are supportive of the manuscript but identify a number of areas where the clarity of the presented data could be improved, and further quantification could be provided to strengthen your conclusions. We would encourage you to address these minor concerns as best you can and to consider the recommendations of all three reviewers when deciding how to revise your manuscript.

      Reviewer #1 (Recommendations for the authors):

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      JBL are described as oscillating membrane protrusions emerging at endothelial junctions, operating in a ratchet-like manner to mediate convergent cell movements. This ratchet mechanism allows endothelial cells to approach each other, thereby aligning and joining local luminal segments into a continuous vascular structure. The study employs in vivo high-resolution time-lapse imaging, a technically demanding method that captures spatiotemporal dynamics of cytoskeletal and adhesion complexes during JBL activity with unprecedented detail.

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      An intriguing observation is that a novel junction arises at the distal pole of a JBL. This distal junction is formed from a pool of VE-cadherin that is spatially redistributed from regions outside the initial JBL domain. The distal junction then merges with the proximal junction through a process dependent on actomyosin contractility, as judged by Myl9 recruitment.

      The alternation between pushing forces (Arp2/3-dependent JBL protrusion) and pulling forces (actomyosin-driven junction fusion) defines JBL as a bidirectional mechanical module. Inhibition of actomyosin prevents merging of proximal and distal junctions, thereby stalling lumen continuity. This two-phase system, actin-based extension followed by actomyosin-mediated constriction, ensures both elongation and maturation of endothelial arrangements, ultimately securing vascular patency.

      This manuscript represents a robust and thoughtfully executed study that advances our understanding of lumen formation during vascular development. The overarching conclusions are well substantiated, and the results section provides a clear and detailed exposition of the key findings. I appreciate the explanatory movie at the end. Nevertheless, I offer several remarks for further improvement:

      (1) The fluorescent images presented are visually compelling, yet lack quantitative analysis in the initial figure. Although quantification is included in Figure 3, it is advisable to incorporate this analysis into Figure 1 as well. Early presentation of quantification will help the reader to appreciate the impact and significance of the findings from the outset.

      We appreciate the reviewer’s suggestion and have now added line graphs to measure the spatiotemporal intensities of the Utrophin and ZO-1 reporters in Figure 1b. These measurements demonstrate the sequence of F-actin protrusion and subsequent junctional movement. In Figure 1a, we have added a double-headed arrow which shows the overall movement of the junction towards the dorsal side of the forming DLAV.

      (2) For the fluorescence images, further quantitative analysis of membrane overlap, either in terms of width or pixel overlap, would enhance the rigor of the study. Temporal quantification of overlap may provide valuable insights into the stability and reproducibility of the process across experimental replicates.

      JBL are quite heterogeneous with respect to size, shape and dynamics, which makes quantification of membrane overlap (JBL size) across experimental replicates difficult. We have published some quantifications of JBL orientation and oscillation in our previous paper (Paatero et al., 2018, Nat. Commun., Figures 1 and 2), which are in agreement with our current study.

      (3) When referencing the role of Arp2/3, the authors employ an ArpC1b transgenic fish. The results section should thus specifically address the involvement of ArpC1b, rather than generalizing to Arp2/3. In the discussion, it would be appropriate to speculate on the potential involvement of the complete Arp2/3 complex. Notably, the use of CK is acknowledged as a broadly accepted inhibitor of actin polymerization.

      As ArpC1b is a subunit of an active Arp2/3 complex (Padrick et al., 2011), we have used ArpC1b-Venus as a readout for Arp2/3 localization. The construct has been validated before in cell culture (Law et al., 2021) as well as in zebrafish (Malchow et al., 2024), and the spatiotemporal distribution of the reporter was shown to be consistent with that of the Arp2/3 complex. We state this in the results section (lines 173-178) and subsequently use the term Arp2/3 to facilitate reading of the text. In the corresponding figure legends, we maintain the term ArpC1b. CK666 interferes with the dimerization of Arp2 and Arp3 subunits and thus prevents activity of the Arp2/3 complex.

      (4) The discussion regarding JAIL versus JBL involvement remains challenging to interpret. If JAIL structures arise from the loss of cell-cell contacts, both JAIL and JBL resemble membrane protrusions and are likely governed by similar molecular mechanisms, predominantly actin polymerization and Arp2/3 activity, with probable contribution from Rac1 signaling. The precise semantic distinction between JAIL and JBL warrants further clarification, as their biological relevance may be overlapping.

      We agree with the reviewer. Below we outline the reasons why lamellipodial protrusions that emanate from cell-cell junctions should not be indiscriminately called JAIL, but that JAIL and JBL constitute different cellular activities acting in different tissue contexts. We have modified the text in the Discussion (lines 348-374).

      (1) JAIL have originally been described in cell culture experiments (Abu-Taha et al., 2014). According to this and subsequent papers by the same group, local dissolution of endothelial adherens junctions (i.e. downregulation of VE-cadherin) triggers the formation of lamellipodia-like structures. These protrusions eventually retract, followed by the reestablishment of EC junctions.

      (2) In our in vivo studies, we observed lamellipodial protrusions during endothelial cell rearrangements, and we call these structures JBL (Paatero et al., 2018). While JBL appear very similar to JAIL in general (i.e. regulation by Arp2/3 and its localization), we also observe two critical differences. For one, JBL form while maintaining the original (proximal) junction. Moreover, a distal junction is formed at the front edge of the JBL, leading to a “double junction” configuration. In our current manuscript, we have examined the role of actomyosin contractility and find that it correlates with and is required for the merging of proximal and distal junctions during JBL cycles. These observations indicate that the proximal and distal junctions are essential components of JBL function during endothelial cell elongation and rearrangements. These salient and distinct features prompted us to adopt the term junction-based-lamellipodia (JBL), in order to differentiate them from JAIL.

      (3) We would argue that JAIL and JBL represent similar but different lamellipodia-like protrusions. JAILs are associated with the maintenance of endothelial integrity, and control permeability and trans-endothelial cell migration, as has been suggested by several publications (Cao et al., 2017; Kipcke et al., 2025; Seebach et al., 2021; Taha et al., 2014). In contrast, JBL drive cell rearrangements, by step-wise elongation of cell junctions leading to convergent cell movements.

      (4) Although JAIL have also been implicated in endothelial cell migration (Cao and Schnittler, 2019; Cao et al., 2017; Seebach et al., 2021), neither junctional patterns nor junctional dynamics have been analyzed in this context. We therefore propose that JAIL and JBL are actin-based protrusions forming at endothelial cell-cell junctions, but act in different contexts to provide cell motility (JBL) or endothelial integrity (JAIL).

      (5) Some of the quantification plots, specifically in figures 5d and 6c, do not display significant differences or distribution patterns. It would be beneficial to revise these graphs to clearly represent statistical significance and underlying data distributions.

      Because of the spatiotemporal heterogeneity, it is difficult to perform statistical quantifications across samples. In Figure 5c/d, we have imaged/analyzed myl9-EGFP in a mosaic situation, in which only one of the interacting cells expresses high levels of myl9-EGFP. This is a rare situation and we managed to image only this example. Nevertheless, it is consistent with our other expression data of myl9-reporters and also with our previous photoconversion experiments using photoconvertible UCHD (Paatero et al., 2018, Figure 4), which show that actin-rich JBL form at the front end of the endothelial cell in the direction of junction elongation. In Figure 5d, we have quantified the average intensity of GFP signal within the region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that it is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.
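
The ROI quantification described for Figure 5d (average intensity plus the standard deviation between pixel intensities within the ROI) can be illustrated with a minimal sketch. This is not the authors' pipeline: the rectangular ROI, the toy image values, the function name `roi_stats`, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def roi_stats(image, roi):
    """Return (mean, SD) of pixel intensities inside a rectangular ROI.

    image: 2D list of intensities (one inner list per pixel row).
    roi:   (row_start, row_stop, col_start, col_stop), stop-exclusive.
    The SD is the population SD over the ROI pixels (an assumption; the
    response does not say which estimator was used).
    """
    r0, r1, c0, c1 = roi
    pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return mean(pixels), pstdev(pixels)


# Hypothetical 3 x 4 image: dim signal on the left, bright on the right.
image = [
    [10, 12, 11, 50],
    [9, 13, 12, 55],
    [10, 11, 60, 58],
]

# Two hypothetical ROIs standing in for the two poles of a junctional ring.
rear_pole = roi_stats(image, (0, 3, 0, 2))
front_pole = roi_stats(image, (0, 3, 2, 4))
print("rear pole  (mean, SD):", rear_pole)
print("front pole (mean, SD):", front_pole)
```

Note that an error bar computed this way reflects pixel-to-pixel variation within a single image, not variation between embryos or cells, which is consistent with the authors' statement that only one example could be imaged.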

      (6) The observation of myosin recruitment does not inherently imply a concomitant increase in actomyosin contractile activity. The inclusion of phospho-MLC staining would considerably strengthen the evidence for enhanced actomyosin activity.

      This is a good suggestion and we have extensively tried different anti-P-Myl antibodies (and protocols), but did not get them to work reliably on zebrafish embryos. We therefore rely on published work that has established the correlation between the recruitment of myosin light chain and increased actomyosin tension (Fernandez-Gonzalez et al., 2009; Munjal et al., 2015).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1a is not described/mentioned in the Results.

      We have corrected this (lines 102-108). We have also added measurements to better present the different dynamics of F-actin (UCHD) and ZO1 within the JBL and the relative endothelial cell movements (see Figure 1b), as suggested by reviewer #1.

      (2) In Figure 3a, the authors claim that Arp2/3 is deposited at the distal side of the junction ring. While it is clear where the proximal junction is (ZO1-rich), the distal junction is less so (hardly any ZO1). It is therefore difficult to agree based on this time-lapse imaging that Arpc1b-Venus is at the distal junction. Can the authors please include panels showing merged channels and annotate where the proximal and distal junctions are?

      The activation of the Arp2/3 complex and the formation of the distal junction are sequential events. We see that ArpC1b oscillates with an accumulation at the onset and during JBL protrusion. In contrast, the distal junction is formed when the protrusive activity has been stopped. One caveat of the analysis shown in Figure 3a is that our ZO1 reporters label the distal junction only very weakly – this is in particular the case for the ZO1-tdTomato knock-in. The distal junction is better visible in VE-cadherin and UCHD reporters, as shown in Figures 5 to 7.

      (3) In Figures 3b and c, it is also difficult to distinguish proximal and distal junctions in these images. Please mark the boundaries in the image panels (Figure 3b) and indicate on the x-axis where the proximal and distal junctions are (Figure 3c).

      In Figure 3b, we show ArpC1b-Venus and mRuby-UCHD side-by-side. This Figure demonstrates that the Arp2/3 complex maintains its position at the front of the JBL during the protrusive phase (always distal to the UCHD signal). The imaging is done at very short intervals (1/30sec), which makes it difficult to follow entire oscillations due to photo-bleaching of the ArpC1b reporter.

      (4) The treatment of CK666 resulted in perturbed localization of Arpc1b-Venus. Therefore, the inhibition of junctional elongation can also be explained by the mislocalization of Arp2/3, rather than the inhibition of Arp2/3 activity at the junctions. Can the authors discuss this or perform another experiment that is more specific to manipulating Arp2/3 activity?

      CK666 is a well-established inhibitor of Arp2/3. Structural and functional analyses have shown that CK666 interferes with the interaction between Arp2 and Arp3, thereby preventing the activation of the complex (Hetrick et al., 2013; Padrick et al., 2011). We therefore conclude that the phenotypes we observe in CK666 treatment are due to Arp2/3 inhibition.

      It is possible that CK666 prevents ArpC1b binding to the Arp2/3 complex. However, published work suggests that ArpC1b can bind to Arp2/3 also in its inactive state (Chou et al., 2022). Thus, we can only speculate why we lose localization of ArpC1b under CK666. We prefer not to do so.

      (5) In Figures 5d and 6c, is the quantification of Myl9 intensity of one cell only? If so, can the authors show the dynamics of average Myl9 intensity i) between forwarding and non-forwarding JBL poles and ii) as the proximal and distal junctions merge several endothelial cells?

      Figure 5c/d depicts two interacting cells, expressing different levels of Myl9a-EGFP. This is a rare experimental situation and we managed to image only this example. We quantified the average signal at both poles of the junctional ring within a region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI. The analysis has been done on immunofluorescent images, therefore a dynamic analysis over time is not possible.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that it is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.

      (6) Figure 5. The 'f' in the figure legend should be 'e' since there is no panel 'f'.

      We have corrected this.

      (7) Figure 7. As the boundaries for proximal and distal junctions are not always clear, especially when Cdh5 appears as clusters, how do you determine where the two junctions are in order to measure the interjunctional space? Please offer a clearer explanation in the Methods.

      We have added the following in the M&M. “Junctional merging tracking Speed of junctional merge was evaluated by monitoring isolated junctional rings during DLAV formation. Inhibitor treatment Y-27632 (75 μM) or DMSO (1%) were applied 30 min before mounting. The same concentrations of chemicals were applied to the low-melting-point agarose mounting medium and the E3 medium on top of it before imaging and imaging the junctions for 10-15 min on an Olympus SpinSR spinning disc microscope. Distances were measured using Fiji software. In each frame, the interjunctional distance was defined as the maximum distance between the proximal and distal junctions. A line was manually drawn between the proximal and distal junctions in Fiji, and its length was recorded. The same proximal and distal junction landmarks were used consistently across all time points.”
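
The measurement quoted above (a manually drawn line between proximal and distal junction landmarks in each frame, with its length recorded) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' Fiji workflow: the landmark coordinates, the pixel calibration `PIXEL_SIZE_UM`, the frame interval `FRAME_INTERVAL_MIN`, and both function names are hypothetical, and the speed is taken as the simple first-to-last-frame average.

```python
import math


def line_length_um(start, end, pixel_size_um):
    """Length of a straight line between two (x, y) pixel coordinates, in micrometres."""
    return math.dist(start, end) * pixel_size_um


def merging_speed(distances_um, frame_interval_min):
    """Average closing speed (um/min) between the first and the last frame."""
    elapsed_min = frame_interval_min * (len(distances_um) - 1)
    return (distances_um[0] - distances_um[-1]) / elapsed_min


# Hypothetical landmarks per frame: (proximal point, distal point), in pixels.
frames = [
    ((10, 10), (10, 40)),
    ((10, 12), (10, 36)),
    ((10, 15), (10, 33)),
]
PIXEL_SIZE_UM = 0.1       # assumed calibration, not taken from the manuscript
FRAME_INTERVAL_MIN = 5.0  # assumed frame interval, not taken from the manuscript

distances = [line_length_um(p, d, PIXEL_SIZE_UM) for p, d in frames]
speed = merging_speed(distances, FRAME_INTERVAL_MIN)
print("interjunctional distances (um):", distances)
print("average merging speed (um/min):", speed)
```

Using the same landmark pair in every frame, as the quoted Methods text specifies, is what makes the frame-to-frame distances comparable; a positive speed here means that the two junctions are approaching each other.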

      (8) One would think that upon the inhibition of junctional mergence (by ROCK inhibition), actin polymerization would persist to push the distal junction forward to elongate the JBL. However, there is instead a decrease in junctional elongation (Figure 7b). Can the authors speculate why? Additionally, junction elongation can probably be achieved by continuous "pushing" of the distal junction alone (through actin polymerization). Can the authors speculate why there is a need/what is the benefit of merging proximal and distal junctions for junction elongation?

      These are all very interesting questions, but they are quite complex and would require extensive and speculative answers, which is outside the scope of this study. Nevertheless, here are a few quick thoughts on these issues.

      (1) When endothelial cells elongate, they have to overcome tensile forces at the junctions (generated by the subjunctional actomyosin belt). JBL are providing a tractive and deforming force, which overcomes the tensile force and thus promotes junctional elongation.

      (2) The distal junction is then providing an anchor to which the actin cytoskeleton can attach. The space between proximal and distal junction becomes a compartment of local actomyosin contraction, which provides the force for the ratchet to move the proximal junction forward → junctional mergence.

      (3) Thus, it is not the protrusion (pushing) itself that elongates the cell but the elongation of the junction (driven by actomyosin contraction)!

      (4) The maintenance of the proximal junction is most likely needed to ensure endothelial integrity during the JBL cycles.

      (5) How the frequency and the size of JBLs are regulated is not known. One possible player that might be involved is an internal clock mechanism (e.g. a feedback loop via small GTPases (such as Rac) → Arp2/3 regulation). Another possibility is that JBL size is limited by its sweeping up basally localized VE-cadherin (in cis-configuration). Increasing cell-cell adhesion (by VE-cad trans-interactions between the JBL and the underlying cell) eventually stops the protrusion. It is also possible that a cell-autonomously controlled mechanism of F-actin polymerization (actin pulses) is involved in the regulation of the JBL cycle length.

      (9) The animation showing the molecular mechanism of JBL function during endothelial junction elongation (Video 25) is very helpful in understanding the dynamic coupling between junctional proteins, actomyosin cytoskeleton, and junction remodelling. However, I wonder why there are no Myosin II proteins binding to the actin bundles during the merging of proximal and distal junctions (between 0:25 and 0:28), since this is one of the main findings by the authors in this study.

      Since we show two JBL cycles, we want to spread the information over both of them.

      References:

      Cao, J. and Schnittler, H. (2019). Putting VE-cadherin into JAIL for junction remodeling. J. Cell Sci. 132.

      Cao, J., Ehling, M., März, S., Seebach, J., Tarbashevich, K., Sixta, T., Pitulescu, M. E., Werner, A. C., Flach, B., Montanez, E., et al. (2017). Polarized actin and VE-cadherin dynamics regulate junctional remodelling and cell migration during sprouting angiogenesis. Nat. Commun. 8, 1–20.

      Chou, S. Z., Chatterjee, M. and Pollard, T. D. (2022). Mechanism of actin filament branch formation by Arp2/3 complex revealed by a high-resolution cryo-EM structure of the branch junction. Proc. Natl. Acad. Sci. U. S. A. 119, e2206722119.

      Fernandez-Gonzalez, R., Simoes, S. de M., Röper, J. C., Eaton, S. and Zallen, J. A. (2009). Myosin II Dynamics Are Regulated by Tension in Intercalating Cells. Dev. Cell 17, 736–743.

      Hetrick, B., Han, M. S., Helgeson, L. A. and Nolen, B. J. (2013). Small molecules CK-666 and CK-869 inhibit actin-related protein 2/3 complex by blocking an activating conformational change. Chem. Biol. 20, 701–712.

      Kipcke, J. P., Odenthal-Schnittler, M., Aldirawi, M., Franz, J., Bojovic, V., Seebach, J. and Schnittler, H. (2025). TNF-α induces VE-cadherin-dependent gap/JAIL cycling through an intermediate state essential for neutrophil transmigration. Front. Immunol. 16.

      Law, A. L., Jalal, S., Pallett, T., Mosis, F., Guni, A., Brayford, S., Yolland, L., Marcotti, S., Levitt, J. A., Poland, S. P., et al. (2021). Nance-Horan Syndrome-like 1 protein negatively regulates Scar/WAVE-Arp2/3 activity and inhibits lamellipodia stability and cell migration. Nat. Commun. 12, 5687.

      Malchow, J., Eberlein, J., Li, W., Hogan, B. M., Okuda, K. S. and Helker, C. S. M. (2024). Neural progenitor-derived Apelin controls tip cell behavior and vascular patterning. Sci. Adv. 10, 1174.

      Munjal, A., Philippe, J. M., Munro, E. and Lecuit, T. (2015). A self-organized biomechanical network drives shape changes during tissue morphogenesis. Nature 524, 351–355.

      Paatero, I., Sauteur, L., Lee, M., Lagendijk, A. K., Heutschi, D., Wiesner, C., Guzmán, C., Bieli, D., Hogan, B. M., Affolter, M., et al. (2018). Junction-based lamellipodia drive endothelial cell rearrangements in vivo via a VE-cadherin-F-actin based oscillatory cell-cell interaction. Nat. Commun. 9.

      Padrick, S. B., Doolittle, L. K., Brautigam, C. A., King, D. S. and Rosen, M. K. (2011). Arp2/3 complex is bound and activated by two WASP proteins. Proc. Natl. Acad. Sci. U. S. A. 108, E472–E479.

      Sauteur, L., Krudewig, A., Herwig, L., Ehrenfeuchter, N., Lenard, A., Affolter, M. and Belting, H. G. (2014). Cdh5/VE-cadherin promotes endothelial cell interface elongation via cortical actin polymerization during angiogenic sprouting. Cell Rep. 9, 504–513.

      Seebach, J., Klusmeier, N. and Schnittler, H. (2021). Autoregulatory “Multitasking” at Endothelial Cell Junctions by Junction-Associated Intermittent Lamellipodia Controls Barrier Properties. Front. Physiol. 11.

      Taha, A. A., Taha, M., Seebach, J. and Schnittler, H. J. (2014). ARP2/3-mediated junction-associated lamellipodia control VE-cadherin-based cell junction dynamics and maintain monolayer integrity. Mol. Biol. Cell 25, 245–256.

    1. eLife Assessment

      This important study presents a technically rigorous and carefully controlled analysis of the signalling potential of cancer-associated gain-of-function Notch alleles. The work is clearly presented, and the experiments are robust, comprehensive, and well-controlled. While some data primarily establish the system or report negative findings, the comparative approach in a well-characterized model provides convincing mechanistic evidence for how these Notch variants function. This study will be of interest to researchers in both developmental and cancer biology.

    2. Reviewer #1 (Public review):

      Summary:

      In their paper, Shimizu and Baron describe the signaling potential of cancer gain-of-function Notch alleles using the Drosophila Notch transfected in S2 cells. These cells do not express Notch or the ligand Dl or Dx, which are all transfected. With this simple cellular system, the authors have previously shown that it is possible to measure Notch signaling levels by using a reporter for the 3 main types of signaling outputs: basal signaling, ligand-induced signaling and ligand-independent signaling regulated by deltex. The authors proceed to test 22 cancer mutations for the above-mentioned 3 outputs. The mutations considered cluster in the negative regulatory region (NRR) that is composed of 3 LNR repeats wrapping around the HD domain. This arrangement shields the S2 cleavage site that starts the activation reaction.

      The main findings are:

      (1) Figure 1: the cell system can recapture ectopic activation of 3 existing Drosophila alleles validated in vivo.

      (2) Figure 2: Some of the HD mutants do show ectopic activation that is not induced by Dl or Dx, arguing that these mutations fully expose the S2 site. Some of the HD mutants do not show ectopic activation in this system, a fact that is suggested to be related to retention in the secretory pathway.

      (3) Figure 3: Some of the LNR mutants do show ectopic activation that is induced by Dl or Dx, arguing that these might partially expose the S2 site.

      (4) Figure 4-6: 3 sites of the LNR3 on the surface that are involved in receptor heterodimerization, if mutated to A, are found to cause ectopic activation that is induced by Dl or Dx. This is not due to changes in their dimerization ability, and these mutants are found to be expressed at a higher level than WT, possibly due to decreased levels of protein degradation.

      Strengths and Weaknesses:

      The paper is very clearly written, and the experiments are robust, complete, and controlled. It is somewhat limited in scope, considering that Figure 1 and 5 could be supplementary data (setup of the system and negative data). However, the comparative approach and the controlled and well-known system allow the extraction of meaningful information in a field that has struggled to find specific anticancer approaches. In this sense, the authors contribute limited but highly valuable information.

      Comments on revised version:

      I reviewed the changes and response to criticism, and it seems to me that all has been reasonably addressed.

    3. Reviewer #3 (Public review):

      Summary:

      This manuscript by Shimizu et al. systematically analyzes cancer-associated mutations in the Negative Regulatory Region (NRR) of Drosophila Notch to reveal diverse regulatory mechanisms with implications for cancer modelling and therapy development. The study introduces cancer-associated mutations equivalent to human NOTCH1 mutations, covering a broad spectrum across the LNR and HD domains. By linking mutant-specific mechanistic diversity to differential signaling properties, the work directly informs targeted approaches for modulating Notch activity in cancer cells. These are an exciting set of observations from S2 cells, which should be taken further to assess any physiological implications.

      Strengths:

      This manuscript by Shimizu et al., systematically analyzes cancer-associated mutations in the Negative Regulatory Region (NRR) of Drosophila Notch to reveal diverse regulatory mechanisms with implications for cancer modelling and therapy development. The study introduces cancer-associated mutations equivalent to human NOTCH1 mutations, covering a broad spectrum across the LNR and HD domains. The authors use rigorous phenotypic assays to classify their functional outcomes. By leveraging the S2 cell-based assay platform, the work identifies mechanistic differences between mutations that disrupt the LNR-HD interface, core HD, and LNR surface domains, enhancing understanding of Notch regulation. The discovery that certain HD and LNR-HD interface mutations (e.g., R1626Q and E1705P) in Drosophila mirror the constitutive activation and synergy with PEST deletion seen in mammalian T-ALL is nice and provides a platform for future cancer modelling. Surface-exposed LNR-C mutations were shown to increase Notch protein stability and decrease turnover, suggesting a previously unappreciated regulatory layer distinct from canonical cleavage-exposure mechanisms. By linking mutant-specific mechanistic diversity to differential signaling properties, the work directly informs targeted approaches for modulating Notch activity in cancer cells.

      Weaknesses:

      This is an exciting set of observations; however, the work is entirely cell-line-based, which is its primary weakness. I list my main specific concerns herewith:

      (1) The analysis is confined to Drosophila S2 cells, which may not fully recapitulate tissue or organism-level regulatory complexity observed in vivo.

      (2) Perhaps for this reason too, some Drosophila HD domain mutants accumulate in the secretory pathway and do not phenocopy human T-ALL mutations, possibly due to limitations on physiological inputs that S2 cells cannot account for or species-specific differences such as the absence of S1 cleavage. Thus, the findings may not translate directly to understanding Notch 1 function in mammalian cancer models.

      (3) Also, while the manuscript highlights mechanistic variety, the functional significance of these mutations for hematopoietic malignancies or developmental contexts in live animals remains untested. Thus even though the changes are evident in Notch signaling, any impact on blood cells or hematopoiesis leading to aberrant malignancies remains to be seen.

      (4) Which hematopoietic cell type, progenitor or differentiating cells, would be most sensitive to this kind of altered Notch signaling also remains unclear.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Shimizu and Baron describe the signaling potential of cancer gain-of-function Notch alleles using the Drosophila Notch transfected in S2 cells. These cells do not express Notch or the ligand Dl or Dx, which are all transfected. With this simple cellular system, the authors have previously shown that it is possible to measure Notch signaling levels by using a reporter for the 3 main types of signaling outputs: basal signaling, ligand-induced signaling and ligand-independent signaling regulated by deltex. The authors proceed to test 22 cancer mutations for the above-mentioned 3 outputs. The mutations considered cluster in the negative regulatory region (NRR) that is composed of 3 LNR repeats wrapping around the HD domain. This arrangement shields the S2 cleavage site that starts the activation reaction.

      The main findings are:

      (1) Figure 1: the cell system can recapture ectopic activation of 3 existing Drosophila alleles validated in vivo.

      (2) Figure 2: Some of the HD mutants do show ectopic activation that is not induced by Dl or Dx, arguing that these mutations fully expose the S2 site. Some of the HD mutants do not show ectopic activation in this system, a fact that is suggested to be related to retention in the secretory pathway.

      (3) Figure 3: Some of the LNR mutants do show ectopic activation that is induced by Dl or Dx, arguing that these might partially expose the S2 site.

      (4) Figure 4-6: 3 sites of the LNR3 on the surface that are involved in receptor heterodimerization, if mutated to A, are found to cause ectopic activation that is induced by Dl or Dx. This is not due to changes in their dimerization ability, and these mutants are found to be expressed at a higher level than WT, possibly due to decreased levels of protein degradation.

      Strengths and Weaknesses:

      The paper is very clearly written, and the experiments are robust, complete, and controlled. It is somewhat limited in scope, considering that Figure 1 and 5 could be supplementary data (setup of the system and negative data). However, the comparative approach and the controlled and well-known system allow the extraction of meaningful information in a field that has struggled to find specific anticancer approaches. In this sense, the authors contribute limited but highly valuable information.

      Reviewer #2 (Public review):

      Summary:

      This ambitious study introduced 22 mutations corresponding to amino acid substitution mutations known to induce cancer in human Notch1, located within the Negative Regulatory Region, into the Drosophila Notch gene. It comprehensively examined their effects on activity, intracellular transport, protein levels, and stability. The results revealed that the impact of amino acid substitutions within the Negative Regulatory Region can be grouped based on their location, differing between the Heterodimerization Domain and the Lin12/Notch Repeat. These findings provide important insights into elucidating the mechanisms by which amino acid substitution mutations in human Notch1 cause leukemia and cancer.

      Strengths:

      In this study, the authors successfully measured the activity of amino acid-substituted Notch with high precision by effectively leveraging the advantages of their previously established experimental system. Furthermore, they clearly demonstrated ligand-dependent and Deltex-dependent properties.

      Weaknesses:

      Amino acid substitution mutations exhibit interesting effects depending on their position, so interest naturally turns to the mechanisms generating these differences. Unfortunately, however, elucidating these mechanisms will require considerable time in the future. Therefore, it is reasonable to conclude that questions regarding the mechanism fall outside the scope of this paper.

      We thank the editors and reviewers for their initial reviews and constructive suggestions. We have revised the manuscript with some additional data contained in two additional supplementary figures and by the inclusion of additional text.

      Reviewer #3 (Public review):

      While this is indeed an exciting set of observations, the work is entirely cell-line-based, and is the primary reason why this approach dampens the enthusiasm for the study. The analysis is confined to Drosophila S2 cells, which may not fully recapitulate tissue or organism-level regulatory complexity observed in vivo. Some Drosophila HD domain mutants accumulate in the secretory pathway and do not phenocopy human T-ALL mutations. Possibly due to limitations on physiological inputs that S2 cells cannot account for, or species-specific differences such as the absence of S1 cleavage.

      Thus, the findings may not translate directly to understanding Notch 1 function in mammalian cancer models. While the manuscript highlights mechanistic variety, the functional significance of these mutations for hematopoietic malignancies or developmental contexts in live animals remains untested. Overall, the work does not yet provide evidence for altered Notch signaling that is physiologically relevant.

      S2 cells are a standard cell culture model which has been extensively used for analysing Notch signalling mechanisms and by and large is found to recapitulate the mechanisms of Notch activation and its regulation in vivo. However, we agree that it will be desirable in future work to build on our current findings by generating Notch mutants in vivo in Drosophila, as the in vivo context may introduce additional nuances in the behaviour of the mutants. This can be done by overexpressing cDNA constructs in particular tissues, or more physiologically by generating endogenous gene mutations using CRISPR/Cas9-based gene editing. However, the likely outcome of the latter approach is embryo lethality due to constitutive over-activation during development. Therefore, methods of genetic manipulation need to be applied which allow the final activating mutant form to be generated in somatic clones. We feel that this would be a considerable amount of additional work and is out of scope for the current study, but we look forward to developing this approach in future work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (a) Table 1: Explain the rationale for mapping non-conserved residues between human and fly Notch; consider adding an alignment or supplementary figure.

      We have added a new Supplementary figure S2 showing an alignment of Notch sequences from different species to indicate the degree of conservation at the sites chosen for our mutagenesis study. Some locations were highly conserved and some locations less so. Both conserved and non-conserved residues were included to examine how structural perturbations at equivalent positions affect signalling activity, independent of sequence conservation. In addition to the new supplementary figure, we have changed the text in the Table 1 legend to clarify.

      (b) Add or discuss data connecting LNR and HD mutant expression levels with stability and degradation mechanisms.

      We have added additional text in the results section referring to Fig6A/B regarding the varying Notch protein levels between the different mutants. With regard to the slower degradation kinetics of certain LNR-C mutants in Fig6 E/F, we have also added a new supplementary figure S3 which shows that mutants from the LNR/HD interface do not behave similarly to the LNR-C mutants with respect to their degradation kinetics.

      (c) Some mutants, especially those retained in the secretory pathway, are insufficiently characterized. The mechanism underlying their differential trafficking and stability remains underexplored.

      We have added some extra text to the discussion section which explores the issue of secretory pathway retention of HD mutants in Drosophila cells further.

      (2) Figure Legends:

      (a) Figure 1A - Explain the ribbon vs. space-filling representation and color coding; include a definition of the Heterodimerization Domain.

      We have added extra text to the Figure 1A legend.

      (b) Figure 2E - Clarify mutant selection; if possible, include additional examples for consistency.

      We have added extra text to the legend of Figure 2 regarding the selection of mutants for study.

      (c) Figure 3-4 - Explain logic for alanine substitutions; discuss difference at residue 1570 (P vs. A).

      We added the following text to the Results section: “Y1532 and Y1535 are not mutated in human cancers and therefore could not be assessed through patient-derived variants. Alanine substitution provides a controlled way to probe their contribution to NRR integrity and activation sensitivity by selectively removing their side-chain interactions while preserving overall fold.” We also added text to the Discussion section regarding the differences in the outcomes of the 1570 to A and P mutations.

      (d) Figure 4 - Improve resolution and legibility.

      We have replaced figure 4.

      (e) Figure 6C - Correct residue numbering (1563, 1566).

      Thank you for spotting this. This has been corrected.

      (f) Figure 6F - Include control where protein levels do not increase.

      A new supplementary figure S3 has been added, which includes these control data.

      (3) Contextual and Conceptual Framing:

      (a) Incorporate the limitations of the S2 system, and delineate which mechanistic insights are likely conserved versus those that may be species- or context-specific.

      We have incorporated text to discuss S2 cell limitations.

      (b) The study does not test functional consequences in hematopoietic or developmental contexts. Expand the discussion to emphasize how these cell-based findings could inform future in vivo studies or mammalian cancer modeling.

    1. eLife Assessment

      This manuscript offers valuable structural and mechanistic insights into the assembly of the Type II internal ribosome entry site (IRES) from encephalomyocarditis virus (EMCV) and the translation initiation complex, revealing a direct interaction between the IRES and the 40S ribosomal subunit. A solid experimental strategy, combining cryo-EM analysis, complementary biochemistry, and detailed structural comparisons, provides mechanistic insights into IRES-based translation initiation systems. This paper will attract researchers in cap-independent translation, host-pathogen interactions, and virology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      Comments on revised version:

      I have gone through the revised manuscript by Das and Hussain along with the rebuttal comments. While the poor resolution of the ribosomal complex limits detailed analysis of the molecular interactions, addition of the luciferase reporter assay in the supplementary has enriched the paper.

    3. Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      Comments on revised version:

      Thanks to the authors for providing thorough responses to the reviewer questions and comments. I appreciate their attempts at improving the overall resolution of the complex via the various processing strategies that the reviewers suggested.

      The authors' interpretations of their cryo-EM data match those reported by Bhattacharjee et al. 2025 (EMCV-IRES 48S) and can be contextualized in the light of Velazquez et al. 2025 (poliovirus IRES-48S).

      The authors' contextualization of their results with previously published studies (Discussion section lines 355-402) is satisfactory to me but can be improved.

    4. Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Comments on revised version:

      The revised manuscript does not improve the resolution; however, the authors provide a detailed and well-reasoned rationale that directly addresses the concerns I raised about their structural interpretation. In addition, two independent preprints have been released since the initial submission. In one case, the authors report a higher-resolution structure and, importantly, all three studies present consistent assignments and interpretations. Together, these observations strengthen confidence in the authors' conclusions. I therefore do not have major concerns regarding the publication of this revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript offers valuable structural and mechanistic insights into the structure and assembly of the Type II internal ribosome entry site (IRES) from encephalomyocarditis virus (EMCV) and the translation initiation complex, revealing a direct interaction between the IRES and the 40S ribosomal subunit. While a solid cryo-EM method was used, enhancing the overall resolution or adding complementary biochemical data would further improve the clarity and impact of this study. This manuscript will attract researchers in cap-independent translation, host-pathogen interactions, and virology.

      We thank the editorial team for a favourable assessment and for describing our work as ‘valuable’. In the following sections, we address the weaknesses and recommendations pointed out by the Reviewers, and we hope that the description of this work is improved as a result.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and finding structural insights about the Type 2 IRES-based initiation presented in this study as novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs - 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4) is well-justified.

      We agree that the low resolution of the map has compromised the data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Due to the low resolution, we have reported findings for stretches or regions such as the domain I loops and stems, rather than individual nucleotides.

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We agree that this study lacked experiments other than cryo-EM to probe the importance of regions such as the GNRA and RAAA loops. Multiple previous studies have reported on the importance of these loops, and we have cited them in the manuscript. The essentiality of the RAAA loop for type 2 IRESs was demonstrated in an earlier report (López de Quinto and Martínez-Salas, 1997; cited in the manuscript). Further, the conservation of this loop across the type 2 IRES family adds to its importance (manuscript Figure 6B). This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and the RAAA loop appears to adopt a mimicry mechanism to interact with the 40S ribosomal protein uS19, which highlights its importance for interaction with the 40S subunit. Experiments destabilising the G-C stem also compromise IRES activity, as shown for the FMDV IRES (Fernández et al 2011). Previous studies in which the GNRA or GCGA loop of the EMCV IRES was mutated showed a deficiency in IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting the importance of these regions in viral IRES biology; these reports are cited in the manuscript. Beyond the EMCV IRES, mutation of the GUAA loop (a representative GNRA loop) of the FMDV IRES also significantly reduced IRES activity (López de Quinto and Martínez-Salas, 1997). In this work, we observe that the GCGA loop interacts with tRNA<sub>i</sub> in the EMCV IRES-48S PIC, which points to the importance of this loop. Moreover, incubation of the FMDV IRES with 40S ribosomes decreased SHAPE reactivity in the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the EMCV IRES domain I apex.

      To address this concern in the revised manuscript, we mutated these loops and performed a luciferase assay (Supplementary figure 4A). The results showed decreased IRES activity (Pg 10), consistent with previous reports demonstrating the importance of these regions for overall IRES activity.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      (a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We have included an image of the rejected 2D classes (Author response image 1). We agree with the Reviewer's point about the large number of micrographs relative to the small number of particles in the final reconstruction. The total of 22000 micrographs is the sum of multiple datasets prepared and collected at different times, and the distribution of particles per micrograph was not uniform across sessions, ranging from good to poor. Around 8000 of these micrographs have poor particle number and distribution. As a result, the number of particles per micrograph is heterogeneous across the compiled dataset, and only 237054 ribosomal particles were obtained after multiple rounds of 2D and 3D classification. Further, the final reconstruction was performed using particles obtained after masked classification for IRES and ternary complex density. Only the particles that show the best density for both the IRES and the ternary complex were used for this map. A second set of particles, which show only a portion of the IRES and no density for the ternary complex, yields another map, and a third map corresponds to an empty 40S.

      We thank the reviewer for the suggestions to improve the quality of the maps further. As suggested, we started reprocessing the data. However, during this process the common computational cluster that we were using for data processing had to be physically relocated, and after the relocation we faced technical issues in accessing the cluster and continuing with the processing. Several attempts to resolve the issue with the help of the IT team failed, and we lost 3-4 months without any progress. We therefore used Relion on our in-house workstation to process the data files from the start, as our in-house computational resources cannot run cryoSPARC jobs on a dataset of this size because of memory limitations.

      We reprocessed the datasets in Relion5 and performed Bayesian polishing for reference-based, per-particle correction of beam-induced motion. After this step, we used cryoSPARC to merge the particles and tried classifying the good ribosome particles using focus-based masked classification, as shown in Supplementary Figure 1.1. However, this processing did not improve the resolution: Map B (containing 40S, tRNA, and IRES) had an overall resolution of 4.8 Å (Author response image 2). Therefore, we would like to report the same maps as in the initial submission.

      We estimate that redoing the entire processing in cryoSPARC on the common computational cluster would take another 3-4 months or more, and we do not anticipate a substantial improvement in the extra density.

      Author response image 1.

      The selected and rejected 2D classes from the initial round of classification, and the final selected 2D classes, which were subjected to ab initio reconstruction to obtain the good ribosome particles.

      Author response image 2.

      Reprocessing of the entire dataset using Relion5 for polishing of selected particles, followed by 3D classification and refinements in cryoSPARC.

      (b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We tried to predict the full tertiary structure of the IRES using AlphaFold 3; however, including the full-length sequence (nucleotides 280-905) gave models of extremely low confidence (Author response image 3), and in a few domains the models do not agree with the secondary structure of the EMCV IRES reported in Duke et al 1992.

      Author response image 3.

      Prediction of tertiary structure of EMCV IRES (280-905 nucleotides) and zoomed features for each domain present in the IRES. The predicted aligned error plot for the RNA structure is shown.

      We then predicted the tertiary structure of individual EMCV IRES domains with AlphaFold 3, each independently of the other IRES domains. The confidence scores improved, and the tertiary structures also correlated with the experimentally determined EMCV IRES secondary structure (Duke et al 1992; Maloney and Joseph, 2024). Although an overall tertiary structure of the EMCV IRES is lacking, recent studies have solved the structures of EMCV IRES domains in complex with their respective binding proteins. We superimposed the independently predicted tertiary structures of domains D, E, and F on the NMR ensemble of IRES domains D to F with PTB1 (Dorn et al 2023), and the predicted domains fit the experimental model. Similarly, we used the cryo-EM structure of domain J-K-eIF4G-eIF4A (Imai et al 2023) and found a close fit with the predicted structures. The analysis highlighted that the domain I apex is the best fit for the extra density with respect to architecture and fitting. This analysis is now included in the revised manuscript as Supplementary figure 3.2.

      Furthermore, 3D structural models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains H, I, and J-K) were predicted from SHAPE reactivity values using the RNAComposer server (Figure 3, Lozano et al 2018). The predicted architecture of the domain 3 apex (FMDV IRES) coincides with our domain I apex model (EMCV IRES).

      (c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed how other IRESs, such as type 1 and type 5, might use strategies similar to that of the EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across the viral IRESs. Like the EMCV IRES, type 1 IRESs (poliovirus, Coxsackie virus, etc.) also harbour a GNRA loop, preceded by a C-rich loop, in their longest domain, which is known for long-range RNA-RNA interactions. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi virus IRES harbours a GNRA loop in its longest domain, domain J. Deletion of the GNRA loop compromised IRES activity; however, substitution mutations in this region either elevated IRES activity or left it unaltered (Yu et al 2011). We hypothesize that these IRESs might use the GNRA motif in their longest domain (domain IV in type 1, and domain J in the Aichi virus type 5 IRES) because its location and architecture resemble those in the EMCV IRES, where the GNRA loop is present in the longest domain (I) and is preceded by a C-rich loop, and where it can potentially mediate long-range interactions with tRNA<sub>i</sub>; all these IRESs require the eIF2 ternary complex for formation of the 48S PIC. Similarly, as in the EMCV IRES, the GNRA motif-containing domain in type 1 and type 5 IRESs is placed before the eIF4G-binding domain. Thus, we suggest that these IRESs may adopt a similar strategy to interact with tRNA<sub>i</sub> during formation of the 48S PIC. During the revision of this work, a preprint reported the structure of the poliovirus IRES-48S PIC, in which the domain IV apex (similar to the domain I apex in the EMCV IRES) interacts with uS13 and uS19, and the GNRA loop directly interacts with tRNA<sub>i</sub> during start codon recognition (Velazquez et al 2025). By extension, we hypothesize that the Aichi virus IRES might use this motif to mediate long-range interactions with tRNA<sub>i</sub>, as type 1 and type 2 IRESs do.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for the positive and encouraging comments on our work, and for appreciating our ‘innovative’ approach of using an IRES-associated factor to assemble and pull down the IRES-bound ribosomal complex.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the concerns raised by the reviewer about the resolution of the EMCV IRES-48S PIC map. We refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches, or motifs that could be inferred with more confidence from the IRES density, as shown in Figure 4. The EMCV IRES can interact directly with the 40S ribosome through its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. Based on the placement of a portion of domain I in the cryo-EM map, we observe that the CAAA loop of the domain I apex interacts with the 40S ribosome. This is also reflected in the SHAPE data (Chamond et al 2014, Supplementary figures 2 and 8), where a decrease in reactivity is evident in the presence of the 40S ribosome. In addition, incubation of the EMCV IRES with rabbit reticulocyte lysate (RRL) protected regions of the domain I apex, including the CAAA loop (Maloney and Joseph, 2024, Figure 4b).

      Furthermore, a similar decrease in SHAPE reactivity is evident for the FMDV IRES domain 3 apex (similar to domain I in the EMCV IRES) in the presence of the 40S ribosome (Lozano et al 2018). These studies are thus consistent with the placement of the IRES model in the cryo-EM map. Moreover, the structural analysis described above showed that the domain I apex is the best fit for the extra density with respect to architecture and fitting (Supplementary figure 3.2).

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for bringing up this point, which we failed to mention in the initial submission. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is more intense than that at AUG-826. Further, AUG-834 lies in a good Kozak context, whereas AUG-826 has a poor Kozak context, and the AUG-826 codon is not in-frame with AUG-834. Therefore, synthesis of the polypeptide requires AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that AUG-834 is more likely than AUG-826 to be placed at the P site. We have mentioned in the revised manuscript that we did not mutate AUG-826 (Pg 8).
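
      The frame statement above can be checked with simple arithmetic. The following is a minimal sketch, assuming that 826 and 834 both denote the first nucleotide (A) of the respective AUG in the same numbering scheme; the function name `in_frame` is hypothetical and introduced only for illustration.

```python
def in_frame(pos_a: int, pos_b: int) -> bool:
    """Return True if two codon start positions share a reading frame,
    i.e. their separation is a whole number of codons (multiple of 3)."""
    return (pos_b - pos_a) % 3 == 0


# Assumed positions of the A of each AUG (taken from the text above).
offset = 834 - 826
print(offset, offset % 3, in_frame(826, 834))  # prints: 8 2 False
```

      Under this assumption the two codons are 8 nucleotides apart, which is not a multiple of 3, so initiation at AUG-826 would be read in a different frame from AUG-834.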

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, without any supporting biochemical data, are not warranted by the data.

      We thank the reviewer for considering our major claims warranted. Because of the low resolution, we have reported findings for stretches or regions, such as the domain I loops and stems, rather than for individual nucleotides. The interaction of the domain I apical region with uS13, uS19, and tRNA<sub>i</sub> is also observed in the high-resolution structure of the reconstituted EMCV IRES-48S PIC, which was reported in a preprint while our work was under peer review (Bhattacharjee et al 2025). That structure thus supports our assignment of domain I and its conserved loops as interacting with the ribosome and tRNA<sub>i</sub>.

      Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and for finding it biologically significant. We understand the concerns about the low resolution and the need for complementary biochemical validation of the observations and interpretations reported in the manuscript. We tried to improve the resolution, but the improvement was not sufficient to resolve the IRES at the nucleotide level. Independently, another group reported the same findings at higher resolution while our work was under peer review (Bhattacharjee et al 2025), which corroborates our structural data on the EMCV IRES and its interaction with the ribosome and tRNA<sub>i</sub> at the 48S PIC stage. Further, in the revised manuscript we also present biochemical validation for the GNRA and RAAA loops in the EMCV IRES. We mutated these loops and performed a luciferase assay (Supplementary figure 4A). The results showed decreased IRES activity (Pg 10), consistent with previous reports (Roberts and Belsham, 1997; López de Quinto and Martínez-Salas, 1997; Robertson et al 1999) demonstrating the importance of these regions for overall IRES activity.

      Reviewing Editor Comments:

      The reviewers' comments are appended. While the reviewers acknowledge the complexity associated with this system, they also raised concerns about the modeling of RNA and registering its sequence in low-resolution maps. We believe that the strength of evidence and overall impact of your study can be elevated by providing higher-resolution cryo-EM data or complementary biochemical studies and addressing reviewers' concerns.

      Reviewer #2 (Recommendations for the authors):

      (1) Science:

      Have the authors tried a focused refinement (local refinement in cryoSPARC) using a generous mask that encloses the head and the IRES but excludes the ternary complex and the body of the 40S? This can be done with all the particles in map B (~55K) and has the possibility of improving the resolution of domain I which can be subsequently used to build a better model of the IRES. See the middle right panel, light yellow colored mask in Figure 1A in PMID 37659578 for the type of mask being suggested.

      We performed another round of 2D classification to eliminate any residual junk in the ~55k particle set corresponding to Map B. After classification, 49439 particles were selected and refined using non-uniform refinement to obtain Map B11, with an overall resolution of 4.6 Å. We then created a mask around the 40S head-IRES-tRNA region of Map B11 and subjected this class to local refinement. The resolution in the masked region improved to 4.5 Å (Author response image 4).

      Author response image 4.

      Data processing: Map B particles were 2D classified, and remaining junk was removed as rejected particles. The selected particles were refined using non-uniform refinement to obtain Map B11; a focused mask enclosing the head-tRNA-IRES region was then used for local refinement to yield Map B111.

      We estimated the local resolution across the focused region in Map B111 and compared this with that of Map B (Author response image 5). The local refinement shows minor improvement in the local resolution in this region, and is not sufficient to resolve the IRES density at the level of nucleotides.

      Author response image 5.

      Comparison of local resolution across the head-IRES-tRNA region in Map B1 (as reported in the manuscript) and Map B111.

      (2) Presentation:

      (a) Please use the previously established convention of naming the domains: "domain I", "domain H", etc, instead of "I domain" or "J-K domain" while describing parts of the IRES.

      We have made the changes as per the established convention.

      (b) Figure 2B reports a 6.9 Å distance vs. 7 Å in the text. Please use ~ or approximately to keep numbers consistent.

      We have used the ~ symbol to indicate that the distance is approximate.

      (c) References missing on page 15 when referring to "previously determined HCV and CrPV structures".

      We have added the references (Pg 12).

      (d) Please edit the text for typos and sentence structure.

      The typos and sentence structure were corrected wherever necessary.

      (e) Some phrases and sentences (e.g. last few sentences of the first paragraph in the discussion) could be rewritten for clarity.

      Previous sentence- “The domain I of EMCV IRES is similar to domain IV of polioviral IRES (or other type 1 IRESs such as Coxsackie viral IRES) in terms of length, secondary structure, and conserved motifs (GNRA, C-rich) positioning (Fig. 6C), therefore, anticipating a similar interaction with tRNA<sub>i</sub>, highlighting a sequestering tendency by competing with cellular mRNAs.”

      Rephrased sentence- “Like EMCV IRES, the type 1 IRES (Poliovirus, Coxsackie virus, etc.) also harbours the GNRA loop, preceded by a C-rich loop at its longest domain, known for long-range RNA-RNA interactions. The segment harbouring GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The domain I of EMCV IRES is similar to domain IV of polioviral IRES or other type 1 IRESs in terms of length, secondary structure, and conserved motifs (GNRA, C-rich) positioning (Fig. 6C). Therefore, we anticipate a similar interaction of domain IV (in type 1 IRES class) with tRNA<sub>i</sub>. Also, this interaction of IRES with tRNA<sub>i</sub> could be a strategy by which these IRESs can sequester the tRNA<sub>i</sub> pool in the cell, rendering them unavailable for capped cellular mRNAs.”

      Reviewer #3 (Recommendations for the authors):

      (1) For the revision process, the authors provided three atomic models alongside their corresponding cryo-EM density maps, including a 48S complex in closed conformation. Given this conformation, it is reasonable to interpret the structure as representing a post-start codon recognition state (late-stage initiation). However, this reviewer finds that the local resolution within the mRNA channel is insufficient to support the atomic model building as presented. The density does not allow for an unambiguous assignment of nucleotides in this region; the authors should either improve the local resolution or remove the modeled mRNA from the structure.

      We understand the Reviewer's concern. Although the mRNA density in the channel is poor, we modelled the mRNA with AUG-834 at the P site because of the known biology of EMCV IRES. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is more intense than that at AUG-826. Further, AUG-834 lies in a strong Kozak context, whereas AUG-826 has a poor Kozak context, and the AUG-826 codon is not in-frame with AUG-834. Therefore, the synthesis of the polypeptide requires AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that AUG-834 is very likely placed at the P site.

      (2) As noted by the authors, the start codon in the EMCV IRES is positioned within a strong Kozak sequence. The nucleotide at position -3 is known to interact with eIF2α, yet, in the current model, A831 is positioned such that physical contact with eIF2α would be structurally impossible. This discrepancy raises concerns about the accuracy of the modeled eIF2α, which, like other regions of the structure, is not clearly supported by the cryo-EM density. The authors should revise the atomic model of eIF2α to ensure it is consistent with the experimental map and established molecular interactions.

      In our analysis of EMCV IRES-48S PIC, we could observe eIF2α and eIF2γ in Map B and B1. However, the local resolution was too low to model the entire protein with side chains (Supplementary figure 1.2 A), so we used rigid-body fitting of eIF2α and eIF2γ (Author response image 6). From the model, we could trace the backbone of Arg55 but could not resolve its side chain. Similarly, the mRNA in the channel was modelled based on the placement of AUG-834 at the P site for EMCV IRES, which enabled us to model the flanking residues, rather than on nucleotide-level density. We anticipate that a higher-resolution structure will be able to capture the interaction of eIF2α with the mRNA nucleotide at position -3, and we have therefore refrained from commenting on this interaction in the manuscript. In the revised manuscript, we have removed the side chains of eIF2α and eIF2γ and kept only the Cα backbone. The map-model statistics of map B1 are updated in table 1.

      Author response image 6.

      (left) Fitting of eIF2α model in the map. (right) Fitting of Cα backbone of eIF2α and mRNA in the map.

      (3) The authors observed additional density interacting with ribosomal proteins uS19 and uS13, and tRNA, which they tentatively assign to domain I of the IRES. Although the local resolution in this region does not allow an unambiguous assignment, the interpretation is reasonable. However, further structural and functional validation is necessary to support this assignment. The authors should improve the local resolution, either by performing focused refinement or by increasing the number of particles used in the reconstruction.

      The assignment of the extra density to domain I of the IRES was based on the architecture of the density. This density allows no other IRES domain to fit in this region (Supplementary figure 3.2). We tried to improve the local resolution using focused refinement, but the resolution was insufficient to resolve the IRES at the nucleotide level. Please see the above-mentioned comments in this regard on Pg 12.

      (4) Figure 5 shows a slight shift in the position of the ternary complex. Is the observed tRNA conformation compatible with the structural rearrangements required for 60S subunit joining?

      During the transition of 48S PIC to 80S elongation-competent complex, there are major changes in the conformation of tRNA<sub>i</sub>, due to the joining of eIF5B, and release of eIF2 (Petrychenko et al 2024). This joining event of eIF5B positions the tRNA<sub>i</sub> elbow and acceptor stem towards the 40S body to aid 60S ribosomal subunit joining (Petrychenko et al 2024). However, in the context of EMCV IRES-48S PIC, we observed that the position of tRNA<sub>i</sub> elbow and acceptor stem is towards the 40S head, and away from the body. On superimposing the human 48S PIC structure (before 60S joining), 48S-5 (PDB Id- 8PJ5- Petrychenko et al 2024), we note that tRNA<sub>i</sub> in EMCV IRES-48S PIC is away from the canonical tRNA<sub>i</sub> position (in contact with eIF5B). Therefore, we anticipate a change in tRNA<sub>i</sub> conformation during eIF5B joining and eIF2 release. This hypothesis is consistent with the fact that the IRES interacting with the tRNA<sub>i</sub> elbow needs to be displaced from its position to facilitate the interaction of tRNA<sub>i</sub> with eIF5B. Moreover, this rearrangement would also aid in 60S joining and prevent any clash with the IRES domain I. We have added this in Results section 5 and Figure 5D.

      (5) In the discussion section, the authors state: "eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation, so we do not rule out the possibility that EMCV IRES may dislodge eIF3 from its position on the solvent surface as observed in the case of HCV IRES (Hashem et al, 2013)." This statement is highly speculative. Is there any experimental or structural evidence to support this proposed mechanism in the context of EMCV IRES?

      Previous biochemical reports on the eIF3-eIF4G interaction suggested that eIF4G residues 1011-1104 interact with eIF3 (Villa et al 2013). In the context of EMCV IRES, this region of eIF4G is not required to form 48S PIC on the IRES, suggesting that the eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation. However, the recent structure of the human canonical 48S PIC has shown that the eIF4G-HEAT1 domain can interact with eIF3 subunits c, h, and l, and that eIF4G-bound eIF4A can interact with 40S ribosomal protein eS7, thus mediating the interaction between eIF4-bound mRNA and the 43S PIC (Brito Querido et al 2024), but the known eIF3-binding region in eIF4G was not captured in the map. Although the canonical eIF3-eIF4G interaction is essential in the case of cap-dependent initiation, this interaction could be dispensable for 48S PIC formation on EMCV IRES. In the case of HCV IRES-mediated initiation, eIF3 is displaced from its canonical position, which facilitates the binding of HCV IRES to the 40S ribosomal subunit (Hashem et al 2013). We did not see any density corresponding to eIF3 in the obtained maps. Further, we used focused classification with a mask on the canonical eIF3 position; however, we still do not see any density corresponding to eIF3 in the EMCV IRES-48S PIC complex. Therefore, we hypothesized that eIF3 might be dislodged from its canonical binding site on the 40S ribosomal subunit. However, as per the recent independent report on EMCV IRES-48S PIC, eIF3 is present in the complex (Bhattacharjee et al 2025).

      Hence, we have rephrased the existing sentence- “However, eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation, so we do not rule out the possibility that EMCV IRES may dislodge eIF3 from its position on the solvent surface as observed in case of HCV IRES (Hashem et al 2013).”

      Rephrased sentence- “However, the canonical eIF3-eIF4G interaction (Villa et al 2013) is dispensable for EMCV IRES-48S PIC formation (Lomakin et al 2000; Sweeney et al 2014), and we do not see any density for eIF3 even after focused classification. However, as per the recent independent report on reconstituted EMCV IRES-48S PIC, eIF3 is present in the complex at the canonical position (Bhattacharjee et al 2025). This position of eIF3 further highlights the possibility that eIF4G-eIF4A proteins are also placed similarly to the canonical eIF3-eIF4G-eIF4A position (Brito Querido et al 2024) in context to EMCV IRES-48S PIC. Thus, placing eIF4G-domain J-K close to ES6 of 40S ribosome, which coincides with the previous hydroxyl radical cleavage assay (Yu et al 2011).”

      (6) eIF4A has been shown to directly interact with eIF3 and facilitate recruitment of the 43S PIC. Is the interaction of the J-K domain with eIF4G/eIF4A compatible with the known eIF4A-eIF3 interaction within the 43S PIC? In other words, during EMCV IRES-mediated initiation, could the eIF4A-eIF3 interaction functionally substitute for the eIF4G-eIF3 interaction?

      Reports on EMCV IRES-mediated translation initiation have shown eIF4G to be an essential component of 48S PIC formation (Pestova et al 1996; Lomakin et al 2000; Kolupaeva et al 2003; Sweeney et al 2014), where eIF4G directly interacts with domain J-K of the IRES and with eIF4A, thus enabling loading of eIF4A on the IRES. In our study, the cryo-EM map of EMCV IRES-48S PIC lacks density for eIF3 and eIF4 proteins, and locating eIF4F is challenging due to the inherent flexibility associated with the complex. Previous studies on EMCV IRES-48S PIC have mapped the location of eIF4G close to ES6 towards the platform side of the body and eIF3 using the hydroxyl radical cleavage assay (Yu et al 2011). The human 48S initiation complex structures have shown a similar location for eIF4G, which is at the mRNA exit site, contacting eIF3 (Brito Querido et al 2020; Brito Querido et al 2024). On overlapping the 18S rRNA of EMCV IRES-48S PIC to that of the human 48S PIC in closed conformation (PDB Id- 8OZ0), and further superimposing the J-K-St- eIF4G- eIF4A (PDB Id- 8HUJ) on human 48S PIC (PDB Id- 8OZ0) with respect to HEAT1 of eIF4G, the domain J-K becomes positioned at the subunit face of 40S body, close to ES6 (Author response image 7). This correlates with the previously reported position for eIF4G with respect to EMCV IRES-48S PIC (Yu et al 2011). The predicted model shows no clashes with the canonical eIF4A-eIF3/ eIF4G-eIF4A-eIF3 interaction, or with the domain J-K-eIF4G-eIF4A model. This highlights a possibly compatible interaction axis among eIF3, eIF4G, eIF4A, and domain J-K of the IRES.

      Author response image 7.

      (upper left) Location of eIF4G-eIF4A in canonical human 48S PIC (PDB Id- 8OZ0). (upper right) Superimposition of 18S rRNA from human 48S and EMCV IRES 48S. (lower left) Superimposition of Human Closed 48S PIC structure (PDB Id- 8OZ0) on EMCV IRES-48S PIC model and placement of EMCV IRES- J-K domain-HEAT1-eIF4A structure (PDB Id- 8HUJ) with respect to eIF4G-HEAT1 domain. (lower right) Predicting location of eIF3 and eIF4 proteins in EMCV IRES-48S PIC.

      (7) Assuming that the additional density near the ternary complex corresponds to Domain I of the IRES and that the codon in the P site represents the EMCV AUG start codon, what is the authors' mechanistic model for EMCV IRES-mediated initiation? Specifically, how is the mRNA positioned or inserted into the 40S mRNA channel in the absence of canonical scanning? As it stands, the discussion does not sufficiently address this key aspect of the EMCV initiation mechanism.

      The EMCV IRES start codon (A-834) is directly placed in the P site (Pestova et al 1996), and the captured complex harboured the initiator tRNA in P<sub>IN</sub> state with AUG at the P site. This start codon is preceded by domains J-K-L, where domain J-K interacts with eIF4 proteins via the eIF4G1-HEAT1 domain, and domain L is 20 residues upstream of the AUG and known to interact with eIF4B (Pestova et al 1996; de Quinto et al 2001). Based on the position and binding partners of these domains, domain L could be placed at the mRNA exit site, preceded by domain J-K, which could be placed close to the eIF4G-eIF4A position on EMCV IRES 48S PIC, near expansion segment 6 (ES6). Domain J-K can interact with eIF4G, localized close to the left foot or ES6 as per previous biochemical experiments (Yu et al 2011). This suggests that the position of eIF4G and eIF4A could be the same as in cap-dependent initiation, where they can interact with the eIF3 core subunits as well as the IRES domain J-K. The predicted path of the mRNA from the exit site could then follow the path of the mRNA in the human closed 48S PIC (PDB Id- 8OZ0), where it interacts with the eIF3 core.

      Examining the path of the RNA in the channel from G-825 (exit site) to C-785 (domain J-K), we found that the shortest distance is ~173 Å. This gap could be bridged by a single-stranded stretch of 40 nucleotides. However, the presence of domain L (stem loop, residues 782 to 810) might hinder the placement of A-834 in the P site (Author response image 8). We anticipate that, to accommodate the start codon at the P site, either the domain L stem loop is resolved, which is an energetically expensive process (free energy of the thermodynamic ensemble is -11.12 kcal/mol, predicted using RNAfold), or the orientation or conformation of domain J-K changes such that the start codon is directly placed at the P site without resolving domain L.

      Author response image 8.

      (left) The shortest distance between the last fitted residue- 825th of EMCV IRES to 785th of J-K domain of IRES (keeping eIF4G position same as that of PDB Id- 8OZ0) is 173 Å. (right) Tracing the path of mRNA (red) upstream of AUG coming out of the exit site of 40S ribosome and the possible position of eIF4G on EMCV IRES-48S PIC. Addition of nucleotides between C-785 and G-825 would fill the gap. The route of predicted mRNA from the exit channel is based on the mRNA (green) exiting the channel (PDB Id- 8OZ0).

      The domain I is followed by domain J-K, which lies close to the left foot of the 40S ribosomal subunit as per previous biochemical experiments (Yu et al 2011). However, the minimum distance connecting domain I at the 601st nucleotide to the 682nd nucleotide of domain J-K (at the predicted location) is ~300 Å, which would be difficult to span with 80 nucleotides (from 601 to 682) present as a double-helical strand. We suppose there could be instances of domain J-K repositioning in the EMCV IRES-48S PIC such that the apical region of domain I can contact the 40S head while the start codon is simultaneously placed at the P site (Author response image 9).
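
      The two distance arguments above (about 173 Å to be covered by 40 nucleotides, and about 300 Å to be covered by 80 nucleotides) can be checked with rough arithmetic. The sketch below is illustrative only. The per-residue rise values are approximate textbook figures assumed here, not measurements from this study, and the function names are hypothetical.

```python
# Back-of-the-envelope span check for the distances quoted above.
# ASSUMPTIONS (illustrative, not taken from the manuscript):
#   - a fully stretched single-stranded RNA covers at most ~5.9 Å per nucleotide
#   - an A-form RNA duplex rises ~2.8 Å per base pair
SS_MAX_RISE = 5.9   # Å per nucleotide, assumed upper bound
DUPLEX_RISE = 2.8   # Å per base pair, assumed


def single_strand_reach(n_nt):
    """Longest distance (Å) that n_nt unpaired nucleotides could bridge."""
    return n_nt * SS_MAX_RISE


def duplex_reach(n_bp):
    """Length (Å) of an A-form helix containing n_bp stacked base pairs."""
    return n_bp * DUPLEX_RISE


def required_rise(distance, n_nt):
    """Average rise per residue (Å) needed to cover `distance` with n_nt residues."""
    return distance / n_nt


if __name__ == "__main__":
    # G-825 to C-785: ~173 Å, 40 nucleotides, single-stranded
    print(required_rise(173.0, 40), single_strand_reach(40))
    # nucleotide 601 to 682: ~300 Å, 80 nucleotides, helical
    print(required_rise(300.0, 80), duplex_reach(80))
```

      Under these assumed values, 40 single-stranded nucleotides would need about 4.3 Å per nucleotide to cover 173 Å, which is below the assumed maximum extension. By contrast, 80 nucleotides would need 3.75 Å per residue to cover 300 Å, which exceeds the assumed duplex rise even in the generous case where all 80 residues are stacked in a single helix.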

      Author response image 9.

      Rotated views of EMCV IRES domains- I apical part in contact with 40S head and tRNAi and predicted location of J-K domain in contact with eIF4G, close to the left foot of 40S (predicted from PDB Id- 8OZ0). The minimum distance connecting 601st nucleotide in I domain to 682nd nucleotide in J-K domain is 295.5 Å.

      We lack any details on the other IRES domains, such as domain I lower stem, domain J-K, or L; therefore, we refrained from commenting on these in our manuscript.

      (8) Supplementary Figure 1 is missing labels for the RNA ladders.

      The size of the DNA ladder used is mentioned.

      References:

      Bhattacharjee S, Abaeva IS, Brown ZP, Arhab Y, Fallah H, Hellen CUT, Frank J, Pestova TV. The mechanism of ribosomal recruitment during translation initiation on Type 2 IRESs. bioRxiv [Preprint]. 2025 Jun 11:2025.06.11.659010. doi: 10.1101/2025.06.11.659010. PMID: 40568087; PMCID: PMC12191231.

      Brito Querido J, Sokabe M, Díaz-López I, Gordiyenko Y, Fraser CS, Ramakrishnan V. The structure of a human translation initiation complex reveals two independent roles for the helicase eIF4A. Nat Struct Mol Biol. 2024 Mar;31(3):455-464. doi: 10.1038/s41594-023-01196-0. Epub 2024 Jan 29. PMID: 38287194; PMCID: PMC10948362.

      Brito Querido J, Sokabe M, Kraatz S, Gordiyenko Y, Skehel JM, Fraser CS, Ramakrishnan V. Structure of a human 48S translational initiation complex. Science. 2020 Sep 4;369(6508):1220-1227. doi: 10.1126/science.aba4904. PMID: 32883864; PMCID: PMC7116333.

      Chamond N, Deforges J, Ulryck N, Sargueil B. 40S recruitment in the absence of eIF4G/4A by EMCV IRES refines the model for translation initiation on the archetype of Type II IRESs. Nucleic Acids Res. 2014;42(16):10373-84. doi: 10.1093/nar/gku720. Epub 2014 Aug 26. PMID: 25159618; PMCID: PMC4176346.

      Dorn G, Gmeiner C, de Vries T, Dedic E, Novakovic M, Damberger FF, Maris C, Finol E, Sarnowski CP, Kohlbrecher J, Welsh TJ, Bolisetty S, Mezzenga R, Aebersold R, Leitner A, Yulikov M, Jeschke G, Allain FH. Integrative solution structure of PTBP1-IRES complex reveals strong compaction and ordering with residual conformational flexibility. Nat Commun. 2023 Oct 13;14(1):6429. doi: 10.1038/s41467-023-42012-z. PMID: 37833274; PMCID: PMC10576089.

      Duke GM, Hoffman MA, Palmenberg AC. Sequence and structural elements that contribute to efficient encephalomyocarditis virus RNA translation. J Virol. 1992 Mar;66(3):1602-9. doi: 10.1128/JVI.66.3.1602-1609.1992. PMID: 1310768; PMCID: PMC240893.

      Fernández N, Fernandez-Miragall O, Ramajo J, García-Sacristán A, Bellora N, Eyras E, Briones C, Martínez-Salas E. Structural basis for the biological relevance of the invariant apical stem in IRES-mediated translation. Nucleic Acids Res. 2011 Oct;39(19):8572-85. doi: 10.1093/nar/gkr560. Epub 2011 Jul 8. PMID: 21742761; PMCID: PMC3201876.

      Hashem Y, des Georges A, Dhote V, Langlois R, Liao HY, Grassucci RA, Pestova TV, Hellen CU, Frank J. Hepatitis-C-virus-like internal ribosome entry sites displace eIF3 to gain access to the 40S subunit. Nature. 2013 Nov 28;503(7477):539-43. doi: 10.1038/nature12658. Epub 2013 Nov 3. PMID: 24185006; PMCID: PMC4106463.

      Imai S, Suzuki H, Fujiyoshi Y, Shimada I. Dynamically regulated two-site interaction of viral RNA to capture host translation initiation factor. Nat Commun. 2023 Aug 28;14(1):4977. doi: 10.1038/s41467-023-40582-6. PMID: 37640715; PMCID: PMC10462655.

      Kim H, Kim K, Kwon T, Kim DW, Kim SS, Kim YJ. Secondary structure conservation of the stem-loop IV sub-domain of internal ribosomal entry sites in human rhinovirus clinical isolates. Int J Infect Dis. 2015 Dec;41:21-8. doi: 10.1016/j.ijid.2015.10.015. Epub 2015 Oct 27. PMID: 26518063.

      Lomakin IB, Hellen CU, Pestova TV. Physical association of eukaryotic initiation factor 4G (eIF4G) with eIF4A strongly enhances binding of eIF4G to the internal ribosomal entry site of encephalomyocarditis virus and is required for internal initiation of translation. Mol Cell Biol. 2000 Aug;20(16):6019-29. doi: 10.1128/mcb.20.16.6019-6029.2000. PMID: 10913184; PMCID: PMC86078.

      López de Quinto S, Martínez-Salas E. Conserved structural motifs located in distal loops of aphthovirus internal ribosome entry site domain 3 are required for internal initiation of translation. J Virol. 1997 May;71(5):4171-5. doi: 10.1128/JVI.71.5.4171-4175.1997. PMID: 9094703; PMCID: PMC191578.

      Lozano G, Francisco-Velilla R, Martinez-Salas E. Ribosome-dependent conformational flexibility changes and RNA dynamics of IRES domains revealed by differential SHAPE. Sci Rep. 2018 Apr 3;8(1):5545. doi: 10.1038/s41598-018-23845-x. PMID: 29615727; PMCID: PMC5882922.

      Maloney A, Joseph S. Validating the EMCV IRES Secondary Structure with Structure-Function Analysis. Biochemistry. 2024 Jan 2;63(1):107-115. doi: 10.1021/acs.biochem.3c00579. Epub 2023 Dec 11. PMID: 38081770; PMCID: PMC10896073.

      Pestova TV, Hellen CU, Shatsky IN. Canonical eukaryotic initiation factors determine initiation of translation by internal ribosomal entry. Mol Cell Biol. 1996 Dec;16(12):6859-69. doi: 10.1128/MCB.16.12.6859. PMID: 8943341; PMCID: PMC231689.

      Petrychenko V, Yi SH, Liedtke D, Peng BZ, Rodnina MV, Fischer N. Structural basis for translational control by the human 48S initiation complex. Nat Struct Mol Biol. 2024 Sep 17. doi: 10.1038/s41594-024-01378-4. Epub ahead of print. PMID: 39289545.

      Roberts LO, Belsham GJ. Complementation of defective picornavirus internal ribosome entry site (IRES) elements by the coexpression of fragments of the IRES. Virology. 1997 Jan 6;227(1):53-62. doi: 10.1006/viro.1996.8312. PMID: 9007058.

      Robertson ME, Seamons RA, Belsham GJ. A selection system for functional internal ribosome entry site (IRES) elements: analysis of the requirement for a conserved GNRA tetraloop in the encephalomyocarditis virus IRES. RNA. 1999 Sep;5(9):1167-79. doi: 10.1017/s1355838299990301. PMID: 10496218; PMCID: PMC1369840.

      Sweeney TR, Abaeva IS, Pestova TV, Hellen CU. The mechanism of translation initiation on Type 1 picornavirus IRESs. EMBO J. 2014 Jan 7;33(1):76-92. doi: 10.1002/embj.201386124. Epub 2013 Dec 15. PMID: 24357634; PMCID: PMC3990684.

      Velazquez MA, Nuthalapati SS, Hankinson J, Fominykh K, Lulla V, Sweeney TR, Hill CH. Structural and mechanistic insights into translation initiation on the enterovirus Type 1 IRES. bioRxiv [Preprint]. 2025 Oct 3: 2025.10.04.680434. doi: 10.1101/2025.10.04.680434.

      Yu Y, Sweeney TR, Kafasla P, Jackson RJ, Pestova TV, Hellen CU. The mechanism of translation initiation on Aichivirus RNA mediated by a novel type of picornavirus IRES. EMBO J. 2011 Aug 26;30(21):4423-36. doi: 10.1038/emboj.2011.306. PMID: 21873976; PMCID: PMC3230369.

    1. eLife Assessment

      This is a valuable study on single-cell transcriptomic analyses, focused on morphogenesis of the zebrafish inner ear in wildtype and lmx1bb mutants. The supporting evidence is mostly convincing, but incomplete in parts.

    2. Reviewer #1 (Public review):

      Summary:

      The authors dissected the ears with some surrounding tissue from 600 embryos at 4 developmental time points of wild-type larvae, as well as from an lmx1bb mutant, performed scRNA-seq analyses, and subclustered the ear/neuromast clusters. They identified cluster markers and performed PAGA pseudotime analyses to build developmental timelines of lineages. They validated some of the cluster markers with HCRs. Many of the clusters are not annotated in detail, but the data sets are still valuable for the community.

      Strengths:

      Using scRNA-Seq, the authors identified cluster markers for tissues of the developing zebrafish ear and validated some of them with HCRs. The data they compiled and submitted to public databases is a valuable resource for the community.

      Weaknesses:

      Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell type/state-specific ones.

      Major comments:

      (1) It would be very useful if the cluster numbers in the Excel files also had the associated cell type annotations as a second column (at least for the ones that are known). E.g., in Supplemental Table 2, the text states which clusters represent which neuromast and ear cell type, but these are not mentioned in the Excel table.

      (2) Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell-type/state-specific ones.

      (3) Uploading the data to gEAR (https://umgear.org/dataset_explorer.html), a web-based, publicly available ear database, would further increase the usefulness of this study to the broader community.

      Method:

      The authors should provide the details about how many cells were sequenced for each ear developmental stage, how many cells were present per cluster (page 8), and how many cells were present in each subcluster of ear and lateral line clusters (page 10).

    3. Reviewer #2 (Public review):

      Summary:

      Munjal and colleagues present a single-cell RNAseq atlas of otic tissue at 4 developmental stages, generate coarse-grained PAGA graphs to describe the development of various otic cell types, rigorously validate their scRNAseq annotations using fluorescent in situ hybridization, and identify changes in epcam expression in lmx1bb mutants that potentially cause the dramatic defects in otic vesicle formation in these mutants.

      Strengths:

      The data set is very nice, and the annotations are extremely rigorous and more in-depth than other datasets that include these tissues, since these investigators have enriched significantly for this tissue of interest. Their use of PAGA to identify potential developmental relationships within the data is rigorous. I also would like to specifically point out how incredibly gorgeous the microscopy of the lmx1bb phenotype is in Figure 7. Wow.

      Weaknesses:

      A missed opportunity is that the authors describe creating an additional scRNAseq dataset from lmx1bb mutants, but do not show any comparative scRNAseq analyses that would identify broader sets of differentially expressed genes. It seems almost as if a key element of the study was removed at the last minute, and as a result, the discussion of changes in epcam expression in lmx1bb mutants in Figure 7 seems somewhat tacked onto the end of the study and not motivated by the analyses presented in the manuscript.

      Overall, I do not think this study requires any major revisions to be appropriate and useful to the community. This study would be potentially stronger with a more formal analysis of what gene expression changes occurred in otic tissue in lmx1bb mutants, but it is also useful without this. I did have a couple of minor suggestions for the presentation of some aspects that would have made it easier for me as a reader.

    4. Reviewer #3 (Public review):

      Summary:

      The authors use single-cell transcriptomic analysis to identify distinct cell types in the zebrafish inner ear. They identify markers of hair cells and supporting cells associated with sensory patches, cells that generate the semicircular canals, endolymphatic duct and sac, and periotic mesenchymal cells.

      Strengths:

      The computational analysis is thorough, and the findings are clearly described. In situ hybridization provides corroboration of cell identities in many cases. This resource atlas will be of particular interest for studies of inner ear morphogenesis. Indeed, the identification of a smooth muscle marker in the endolymphatic sac suggests future analysis of the degree to which this structure undergoes contraction. Identification of cell signaling components in BMP, Wnt, FGF, and other signaling pathways will also provide a resource for understanding signals coordinating ear development.

      Weaknesses:

      The manuscript is incomplete. Important details that would allow replicable analysis are not provided, with notebooks not available on the referenced GitHub site, and additional files are missing.

      The authors make a detailed description of hair cells and supporting cells that are consistent with previous findings (Figures 2 and 3). By contrast, the analysis of distinct cell types that have not been previously well characterized in zebrafish is somewhat incomplete. Markers are described for cells forming the semicircular canals, including ccn1l1 (Figure 4). The authors report an intriguing pattern of its expression before overt bud formation; however, they provide no detailed expression analysis to support this assertion.

      The authors also identify new markers for subsets of periotic mesenchyme (Figure 6). These include epyc and otos, which mark distinct populations within the mammalian inner ear - cochlea supporting cells, spiral limbus, and ligament, respectively. Identification of the equivalent of the spiral ligament would be of particular interest. However, the expression analysis is not of sufficient resolution to identify which cell types these represent in the zebrafish inner ear.

      Differences in gene expression are reported for lmx1bb mutants. However, none of the single-cell data for mutants is provided, and the table (S8) of differential gene expression is missing. Significantly more detail would be needed to interpret these findings.

    5. Author response:

      We thank the editors and reviewers for their careful consideration of our manuscript and for their constructive feedback, which we will address in detail in our revised version. We value that Reviewer #1 noted that the “data they compiled and submitted to public databases is a valuable resource for the community.” We are also encouraged by Reviewer #2’s statement that “The data set is very nice, and the annotations are extremely rigorous and more in-depth than other datasets that include these tissues, since these investigators have enriched significantly for this tissue of interest. Their use of PAGA to identify potential developmental relationships within the data is rigorous. I also would like to specifically point out how incredibly gorgeous the microscopy of the lmx1bb phenotype is in Figure 7. Wow.” We were encouraged by Reviewer #3’s comments that “The computational analysis is thorough, and the findings are clearly described. In situ hybridization provides corroboration of cell identities in many cases. This resource atlas will be of particular interest for studies of inner ear morphogenesis.”

      We spent significant time and effort considering and addressing the reviewers’ public criticisms.

      Below we address the criticisms of the reviewers’ Public Reviews individually.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell type/state-specific ones.

      Major comments:

      (1) It would be very useful if the cluster numbers in the Excel files also had the associated cell type annotations as a second column (at least for the ones that are known). E.g., in Supplemental Table 2, the text states which clusters represent which neuromast and ear cell type, but these are not mentioned in the Excel table.

      Thank you for the suggestion; we will include additional annotations in the revised version.

      (2) Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell-type/state-specific ones.

      We recognize the need to evaluate potential new markers. In the revised version, we will include a heat map of markers and clusters to assess cell-type/state specificity.

      (3) Uploading the data to gEAR (https://umgear.org/dataset_explorer.html), a web-based, publicly available ear database, would further increase the usefulness of this study to the broader community.

      We appreciate the suggestion and will upload the data to gEAR in the near future.

      Method:

      The authors should provide the details about how many cells were sequenced for each ear developmental stage, how many cells were present per cluster (page 8), and how many cells were present in each subcluster of ear and lateral line clusters (page 10).

      We will add cell numbers for each cluster in the revised version as an additional column in the supplemental tables.
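      The requested tallies (cells per developmental stage, per cluster, and per stage within each cluster) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `cells` list, the stage labels, and the cluster labels are hypothetical placeholders for per-cell metadata exported from the dataset.

```python
from collections import Counter

# Hypothetical per-cell metadata: one (stage, cluster) pair per sequenced cell.
# Real labels would come from the dataset's own annotation table.
cells = [
    ("stage_A", "cluster_0"),
    ("stage_A", "cluster_1"),
    ("stage_B", "cluster_0"),
    ("stage_B", "cluster_0"),
    ("stage_B", "cluster_2"),
]

# Number of cells sequenced at each developmental stage.
cells_per_stage = Counter(stage for stage, _ in cells)

# Number of cells assigned to each cluster.
cells_per_cluster = Counter(cluster for _, cluster in cells)

# Number of cells for each (stage, cluster) combination.
cells_per_stage_and_cluster = Counter(cells)

for cluster, n in sorted(cells_per_cluster.items()):
    print(f"{cluster}\t{n}")
```

      The per-cluster counts produced this way could be written out as the additional column mentioned above.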

      Reviewer #2 (Public review):

      Weaknesses:

      A missed opportunity is that the authors describe creating an additional scRNAseq dataset from lmx1bb mutants, but do not show any comparative scRNAseq analyses that would identify broader sets of differentially expressed genes. It seems almost as if a key element of the study was removed at the last minute, and as a result, the discussion of changes in epcam expression in lmx1bb mutants in Figure 7 seems somewhat tacked onto the end of the study and not motivated by the analyses presented in the manuscript.

      Overall, I do not think this study requires any major revisions to be appropriate and useful to the community. This study would be potentially stronger with a more formal analysis of what gene expression changes occurred in otic tissue in lmx1bb mutants, but it is also useful without this. I did have a couple of minor suggestions for the presentation of some aspects that would have made it easier for me as a reader.

      We will include analysis of the lmx1bb mutant data in the revised version and value the suggestions for improved presentation. We will work on improving the presentation of the mutant data, including a UMAP with the WT cells in one color and the mutant cells in another.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript is incomplete. Important details that would allow replicable analysis are not provided, with notebooks not available on the referenced GitHub site, and additional files are missing.

      Python notebooks will be added shortly, and files for mapping inDrops data will be provided on the GitHub site.

      The authors make a detailed description of hair cells and supporting cells that are consistent with previous findings (Figures 2 and 3). By contrast, the analysis of distinct cell types that have not been previously well characterized in zebrafish is somewhat incomplete. Markers are described for cells forming the semicircular canals, including ccn1l1 (Figure 4). The authors report an intriguing pattern of its expression before overt bud formation; however, they provide no detailed expression analysis to support this assertion.

      The authors also identify new markers for subsets of periotic mesenchyme (Figure 6). These include epyc and otos, which mark distinct populations within the mammalian inner ear - cochlea supporting cells, spiral limbus, and ligament, respectively. Identification of the equivalent of the spiral ligament would be of particular interest. However, the expression analysis is not of sufficient resolution to identify which cell types these represent in the zebrafish inner ear.

      Thank you for your input regarding the analysis of the periotic mesenchyme. In the revised version, we will attempt to improve resolution of different populations, first by comparing epyc and otos expression by HCR. It is unclear how to correlate any patterns with structures that have yet to evolve, but we will look for similarities and differences to studies performed in mice (PMID: 37720106).

      Differences in gene expression are reported for lmx1bb mutants. However, none of the single-cell data for mutants is provided, and the table (S8) of differential gene expression is missing. Significantly more detail would be needed to interpret these findings.

      We will include analysis of the lmx1bb mutant data in the revised version and value the suggestions for improved presentation.

    1. eLife Assessment

      This study analyzes the temporal dynamics of gene expression following TNF stimulation in macrophages. The work brings valuable data and new methodological approaches to implicate the splicing rate of certain introns as a mechanism regulating mature mRNA expression. This will be of interest to audiences in RNA biology and innate immune response regulation. The experimental design is solid for the core findings, although in places the data limit the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors revisit a well-defined experimental system for studying temporal gene expression mechanisms in TNF-alpha-stimulated macrophages, bringing new tools to the process. Using a hybrid-capture approach, they are able to obtain deeper RNA sequencing of target genes, which allows them to identify potential differences in splicing kinetics of individual introns. Further implementing transcriptional blocks to measure intron half-lives, and predictive machine learning models to identify potential contributing cis-acting RNA elements, they define a group of 'bottleneck' introns whose delayed splicing is a rate-limiting step in mRNA maturation.

      Strengths:

      (1) The hybrid-capture approach enables deeper RNA sequencing of target transcripts.

      (2) The neural network is applied to identify motifs outside of splice sites that could be related to intron removal kinetics.

      (3) The paper uses splicing reporters with modulation of 5' splice sites to test the effect on reporter gene expression in the context of 'bottleneck' introns.

      Weaknesses:

      (1) While evidence is provided that these introns are distinct from previously published splicing kinetics studies, 'bottleneck' introns are not adequately placed in context for assessment of how they are similar or different.

      (2) Splicing reporters are a good approach, but the complexities of post-transcriptional gene expression regulation are not adequately addressed.

      (3) Deep learning models are a potentially powerful tool for identifying novel regulatory sequences; however, their use here is underdeveloped.

    3. Reviewer #2 (Public review):

      Summary:

      The authors analyzed the temporal dynamics of gene expression patterns within the inflammatory response transcriptome following TNF stimulation, and proposed that the splicing rate of certain introns is a key mechanism of regulating mature mRNA expression rate.

      Strengths:

      The measurement strategy is generally well-designed to address the core question of splicing rate and gene expression. The subsequent computational analysis, as well as the mutation or repair studies, further support the claims. The writing and presentation of the results are also generally clear and easy to follow. I think this manuscript will be of interest to a wide audience.

      Weaknesses: 

      I do have some questions regarding some of the results and conclusions, and I think either more analysis or more explanation and discussion can make the claims more solid. Please see below for details:

      (1) On the hybrid capture method and the RNA coverage results: The strategy of enriching for the last exon before sequencing does have significance in linking pre-mRNA and mature mRNA. If I understand correctly, this enriches for pre-mRNA molecules that are about to finish the full-length elongation of RNA polymerase. However, is this strategy biased towards measuring the splicing rate variation on introns closer to the 3' end? For example, if a gene takes 5 minutes for the RNA polymerase to elongate through the full length of the gene, for intron #1 that's very close to the 5' end, you can't tell if it takes 20s to be spliced out or 4 minutes, as both will show as fully spliced out in the sequencing library. In other words, for introns near the 5' end, a consistent "CoSI=1" pattern in the data doesn't necessarily suggest a true consistent fast splicing of that intron. Do you observe any general pattern of the measured "slowness" in relation to the 5'-3' location of the introns? If so, should the 5' introns be specially considered or even excluded from certain analyses that use all introns?

      (2) Following on my last point, it may benefit the readers if the authors can provide a more detailed comparison of possible sequencing library construction choices. For example, is it feasible to also enrich for other exons for the sequencing library, etc?

      (3) Figure 1C: Are there biological replicates, and should there be error bars and statistics on the plot? Similarly, in places like Figure 2, Supplemental Figure 4C, Supplemental Figure 6, etc., is there any statistical analysis that can be done to show if the claimed differences are statistically significant?

      (4) The logic behind measuring the half-lives of introns seems a little unclear to me. From the time-dependent RNA coverage plots in Figure 2, it seems that, if we assume a constant transcription elongation rate, then the splicing rate of a specific intron can vary across time after TNF stimulation, as represented by the temporal change of CoSI values, or the heights of the coverage plot relative to neighboring exons. This means the splicing rate or half-life of an intron is not necessarily constant but may be time-dependent, at least in the case of TNF stimulation. Shouldn't the half-life measurements be designed in a way to measure the half-life at multiple time points after TNF stimulation? And maybe the measured half-lives of some introns will show as time-dependent?

      (5) In Supplemental Figure 6, the interpretation is a little confusing to me: If delayed splicing is causing delayed expression of the corresponding gene, shouldn't the non-immediate gene groups (early/intermediate/late) have low CoSI beginning from the early time points (e.g. 4 minutes)? Why does the slowdown of splicing seem to peak at a later time point? Does it mean immediately after TNF stimulation, there's a different mechanism in delaying the expression of the non-immediate gene groups? Maybe it's better to have more explanation or use a different visualization to show what non-immediate gene groups are experiencing at very early time points.

      (6) On the fine-tuning of the deep sequence model: it's a little unclear whether the input and output are time-dependent. It's stated that expression at multiple time points is used for training, but it's unclear whether the model outputs time-dependent expression patterns and whether the time information is used as input.
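
      The experimental design raised in point (4) can be illustrated with a minimal sketch. It assumes, purely for illustration, that intron signal decays with first-order kinetics after a transcriptional block, so that I(t) = I0 * exp(-k t) and the half-life is ln(2)/k. The function names, the chase times, and the synthetic half-lives are hypothetical and do not come from the manuscript. Fitting one half-life per TNF time point would show whether the value is constant or time-dependent.

```python
import math

def fit_half_life(chase_times_min, intron_signal):
    """Fit ln(signal) = ln(I0) - k * t by least squares and return ln(2) / k."""
    logs = [math.log(s) for s in intron_signal]
    n = len(chase_times_min)
    mean_t = sum(chase_times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in chase_times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(chase_times_min, logs))
    decay_rate = -sxy / sxx  # k, the negative of the fitted slope
    return math.log(2) / decay_rate

def synthetic_decay(half_life_min, chase_times_min, initial=100.0):
    """Generate noise-free first-order decay data (illustrative only)."""
    return [initial * 0.5 ** (t / half_life_min) for t in chase_times_min]

# Minutes after the transcriptional block (hypothetical sampling scheme).
chase_times = [0, 2, 4, 8, 16]

# One chase series per time point after TNF stimulation (synthetic values).
series_by_tnf_time = {
    15: synthetic_decay(3.0, chase_times),
    60: synthetic_decay(6.0, chase_times),
}

half_lives = {
    tnf_min: fit_half_life(chase_times, signal)
    for tnf_min, signal in series_by_tnf_time.items()
}

for tnf_min, t_half in sorted(half_lives.items()):
    print(f"TNF {tnf_min} min: fitted intron half-life = {t_half:.2f} min")
```

      With real data, replicate measurements and confidence intervals on each fitted half-life would be needed before concluding that the half-life changes over the stimulation time course.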