  Jun 2025

    Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment from the last round of review: “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow-up information provided by Reviewer #1 that this comment refers specifically to the low-count alternative-pathway events we observe at the dimer interface, and not to the statistics of the manuscript overall, since they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations” (Reviewer #1). We agree with the Reviewer and note that, overall, our coarse-grained study represents the most comprehensive single-manuscript treatment of the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty of our observations of scrambling in the groove of open, Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the still unanswered question of how TMEM16s scramble in the absence of Ca2+, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer, especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) that feature structural rearrangements of TM6 which may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of membrane deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is, however, value in comparing the scrambling competence of these TMEM16 structures in a consistent manner within a single study, as this reduces the inconsistencies that can be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in previously published works on single family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and depends most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16 family members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM16 members, measured as the tendency of lipids to cross between leaflets. They validate their simulations with a direct qualitative comparison with cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can correspond to large energetic discrepancies in describing scrambling (a barrier error larger than 1 kT in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling rates occur over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry observed in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol-1 nm-2) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How might this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism so that it occurs mostly across a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that had not been done before. We do acknowledge that direct comparison of simulated versus experimental scrambling rates is complicated and best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
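
For context, the kind of event counting behind such per-μs rates can be sketched as a leaflet-crossing detector: a lipid is counted as scrambled only when its phosphate bead fully crosses from one leaflet threshold to the other, which filters out transient excursions near the bilayer midplane. This is a minimal illustration, not the manuscript's actual analysis pipeline, and the threshold values are arbitrary placeholders.

```python
def count_flips(z, upper=1.5, lower=-1.5):
    """Count full leaflet crossings for one lipid.

    z            : phosphate-bead z positions relative to the bilayer
                   midplane (nm), one value per frame.
    upper, lower : leaflet-assignment thresholds (nm, placeholder values);
                   a lipid must travel from beyond one threshold to beyond
                   the other for a crossing (flip) to be counted.
    """
    flips = 0
    state = None  # current leaflet: +1 (upper), -1 (lower), None (unassigned)
    for zi in z:
        if zi > upper:
            if state == -1:
                flips += 1
            state = 1
        elif zi < lower:
            if state == 1:
                flips += 1
            state = -1
    return flips


def scrambling_rate(z_all, sim_time_us):
    """Total flips across all lipids divided by simulation time (events/us)."""
    total = sum(count_flips(z) for z in z_all)
    return total / sim_time_us
```

A lipid that only dips toward the midplane without crossing the far threshold contributes no event, so brief groove entries are not over-counted.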

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids, membrane environments, or varying membrane curvature and tension could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and that the authors plan to further chase these questions, as this work sets a strong protocol for such a study. Placing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and a solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature, all of which could affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and from relating global/local motions of the protein to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially the reduced membrane thinning compared to fungal scramblases, offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is the difficulty of fully reconciling it with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, though this is admittedly challenging with coarse-grained molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is the ability to directly observe transient scrambling events at a spatial and temporal resolution that is currently unattainable in experiments. The experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and the statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding the limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations under identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling for all TMEM16 scramblase members with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions, or even at the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which have not yet been reported in any study, opens an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here, such as scrambling events outside the groove, the kinetics of scrambling, and the possibility that lipids line the groove of non-scramblers like TMEM16A. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, it is well beyond the scope of the current study to include all of this in the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and have replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged headgroup densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.
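
An ensemble-averaged headgroup density of this kind can be sketched as a 2D histogram of headgroup bead positions accumulated over protein-aligned frames and normalized by the frame count. This is an illustrative sketch, not the exact analysis used for the figure; the grid extent and bin count are placeholder values.

```python
import numpy as np


def headgroup_density(frames, extent=5.0, bins=50):
    """Time-averaged 2D number density of headgroup beads.

    frames : iterable of (N_i, 2) arrays holding the lateral (x, y)
             coordinates (nm) of headgroup beads in one frame, with all
             frames already aligned on the protein.
    Returns (density, edges): the per-bin count averaged over frames, so the
    map represents the whole trajectory rather than a single snapshot.
    """
    edges = np.linspace(-extent, extent, bins + 1)
    acc = np.zeros((bins, bins))
    n_frames = 0
    for xy in frames:
        counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges])
        acc += counts
        n_frames += 1
    return acc / max(n_frames, 1), edges
```

Averaging over frames rather than plotting one snapshot removes the frame-to-frame noise that a single-structure comparison with the cryo-EM map would carry.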

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEM16F also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect that these residues impact scrambling indirectly by influencing the conformation of the protein or the flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination, and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large-scale protein dynamics, and AA simulations, which are likely best suited for this, are beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all-atom MD surfaces to within ~1 Å, showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which we believe is what the reviewer is asking for. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and pinching/distortion (collective variable 2) near the groove arise from the interplay between the specific protein surface chemistry of each protein (patterning of polar and non-polar residues) and the membrane. This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that the thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with their nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, which is what we show in Fig. 1D, where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6, in the middle of the groove, contacts lipids in 9.2% of the simulation frames.

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This analysis indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) had only 1 lipid in the pathway, on average, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 (4WIS), had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.
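
A per-frame groove-occupancy count of this kind can be sketched by approximating the TM4-TM6 pathway as a cylindrical region in protein-aligned coordinates and counting the headgroup beads inside it. The cylindrical geometry and its dimensions here are illustrative assumptions, not the definition used in the manuscript.

```python
import numpy as np


def lipids_in_groove(frames, center, radius=0.8, z_range=(-1.5, 1.5)):
    """Per-frame count of headgroup beads inside a cylindrical groove region.

    frames  : iterable of (N, 3) headgroup bead coordinates (nm), with all
              frames aligned on the protein.
    center  : (x, y) position of the assumed groove axis; radius and z_range
              (placeholder values) define the cylinder.
    Returns (mean, std) of the per-frame counts, i.e. the kind of statistic
    that could be tabulated per structure.
    """
    cx, cy = center
    counts = []
    for xyz in frames:
        r = np.hypot(xyz[:, 0] - cx, xyz[:, 1] - cy)
        inside = (r < radius) & (xyz[:, 2] > z_range[0]) & (xyz[:, 2] < z_range[1])
        counts.append(int(inside.sum()))
    counts = np.asarray(counts)
    return counts.mean(), counts.std()
```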

      (5) Lines 295-298: Since the scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it is difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine whether the difference between the two conditions is statistically significant.

      In contrast to the reviewer’s comment, we do not claim that Ca2+ binding itself enhances lipid scrambling. Instead, what we show is that WT structures solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis redundant.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite the significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling rates occur over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the text, as opposed to the table caption. They should also be related to differences between the membrane compositions used in different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “cryo-EM” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.
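
The structural memory described here follows from how a Martini-style elastic network is built: harmonic bonds are placed between backbone beads that fall within a distance window in the starting structure, so a closed groove (TM4 and TM6 close together) gets cross-links that a dilated groove does not. The sketch below illustrates this pair-selection step; the 0.5-0.9 nm window is an assumed example, while the 500 kJ mol-1 nm-2 force constant is the value quoted above.

```python
import numpy as np


def elastic_network_pairs(bb_coords, lower=0.5, upper=0.9, k=500.0):
    """Select harmonic restraints between backbone beads of a starting structure.

    bb_coords    : (N, 3) backbone bead coordinates (nm) of the experimental
                   structure used to build the CG model.
    lower, upper : distance window (nm, assumed example values); beads whose
                   separation d satisfies lower < d < upper are connected by
                   a spring of force constant k (kJ mol^-1 nm^-2) with
                   equilibrium length d.
    Returns a list of (i, j, d0, k) tuples. Because pairs are chosen from the
    initial coordinates, an asymmetric dimer yields asymmetric restraints.
    """
    pairs = []
    n = len(bb_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = float(np.linalg.norm(bb_coords[i] - bb_coords[j]))
            if lower < d < upper:
                pairs.append((i, j, d, k))
    return pairs
```

Applied to the two subunits of 5OC9 separately, such a construction would cross-link TM4 to TM6 in the closed groove but leave them unconnected in the dilated one, consistent with the "breathing" behavior described above.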

      While answering this question, we also corrected a mislabeled chain identifier in Fig. 2-figure supplement 3A, which read ‘chain A’ in the original manuscript when it is actually ‘chain B’.

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of scrambling rates we measured for TMEM16K, where we noted that the groove with the highest scrambling rate is also, on average, wider than that of the opposite subunit, which remains below 6 Å. We do not suggest that the correlation between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission; rather, we think it is a binary outcome – lipids cannot easily enter narrow grooves (< 6 Å), and hence scrambling can only occur once this threshold is reached, at which point it occurs at a near constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.
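      To make the binary-threshold logic concrete, here is a minimal sketch of how such a classifier could look (our illustration, not the analysis code used in the study; the 6 Å threshold follows the text, while the 25% occupancy cutoff and the function name are assumed for illustration):

```python
import numpy as np

def is_scrambling_competent(tm4_tm6_dist, threshold=6.0, min_open_frac=0.25):
    """Classify a groove as scrambling-competent if it spends a sufficient
    fraction of simulation frames dilated beyond the distance threshold.

    tm4_tm6_dist: per-frame minimal TM4-TM6 distances (in angstroms).
    The 6 A threshold follows the text; min_open_frac is an illustrative cutoff.
    """
    d = np.asarray(tm4_tm6_dist, dtype=float)
    open_frac = float(np.mean(d > threshold))
    return open_frac >= min_open_frac, open_frac
```

      In this picture, a groove that hovers below 6 Å on average can still scramble robustly as long as its fluctuations carry it above the threshold for a sizable fraction of frames, as observed for the TMEM16F structures.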

      (10) Line 359: Authors have plotted the minimum distance between residues TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness: as described in the Methods section “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points on the interpolated upper and lower leaflet surfaces constructed from the glycerol beads (GL1 and GL2). The bulk membrane thickness is calculated as the vertical distance between the averaged glycerol surfaces at the membrane edge.
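      As an illustration of this procedure, here is a minimal NumPy/SciPy sketch (our simplified reconstruction, not the code used in the study; the function name, grid resolution, and the nearest-neighbor search via a KD-tree are our own choices):

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree

def thickness_metrics(upper_beads, lower_beads, n=41):
    """Estimate minimal and bulk membrane thickness from glycerol beads.

    upper_beads / lower_beads: (N, 3) arrays of bead coordinates for each
    leaflet. Each leaflet is interpolated onto a common x-y grid; minimal
    thickness is the shortest distance between any two points of the two
    surfaces, and bulk thickness is the vertical separation of the surfaces
    averaged over the edge of the patch.
    """
    xy = np.vstack([upper_beads[:, :2], lower_beads[:, :2]])
    eps = 1e-6  # inset the grid slightly so all queries lie inside the hull
    gx, gy = np.meshgrid(
        np.linspace(xy[:, 0].min() + eps, xy[:, 0].max() - eps, n),
        np.linspace(xy[:, 1].min() + eps, xy[:, 1].max() - eps, n))
    zu = griddata(upper_beads[:, :2], upper_beads[:, 2], (gx, gy), method="linear")
    zl = griddata(lower_beads[:, :2], lower_beads[:, 2], (gx, gy), method="linear")

    ok = ~(np.isnan(zu) | np.isnan(zl))  # drop points outside the convex hull
    upper = np.column_stack([gx[ok], gy[ok], zu[ok]])
    lower = np.column_stack([gx[ok], gy[ok], zl[ok]])

    # Minimal thickness: nearest-neighbor search between the two surfaces.
    d_min = float(cKDTree(lower).query(upper)[0].min())

    # Bulk thickness: vertical leaflet separation averaged over the patch edge.
    border = np.zeros((n, n), dtype=bool)
    border[[0, -1], :] = True
    border[:, [0, -1]] = True
    edge = ok & border
    d_bulk = float(np.mean(zu[edge] - zl[edge]))
    return d_min, d_bulk
```

      Note that the nearest-neighbor minimum is taken over all pairs of surface points, not just vertically aligned ones, so a locally pinched, tilted membrane is handled naturally.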

      The concern of localized membrane deformation due to force field artifacts is well-founded. However, the sinusoidal deformations shown here are much larger than the 2-3 Å Martini 3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplement 1-5 for a more detailed description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculation and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather, we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase but the simulating scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 ms, which is a very small rate compared to the scrambling-competent states. In a previous large-survey Martini CG simulation study that inspired our protocol (Li et al, PNAS 2024), a 1 event/ms cut-off was employed to distinguish scramblers from non-scramblers. Hence, they would have called TMEM16A a non-scrambler as well. We expect that false positives in this context might be an artifact of the CG forcefield, or it could be that TMEM16A can scramble but too slowly to be experimentally detected. Regarding the collective variable for lipid flipping, it is correctly defined, and we verified that these lipids actually flipped.

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue) and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues will add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and the same interaction does not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids in the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would kindly like to point out that we did not find examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the name of the protein family; “A” denotes the specific homolog rather than a residue.

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations because all experimental procedures used standard physiological values (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the fullstop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claims with different lipids and to further substantiate the dimer interface pathway independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in the context of a diverse lipid environment is an exciting area to explore. We are indeed actively working on a project that pursues a similar idea. We decided not to include that study because we think the additional discussion involved would be excessive for the current manuscript. We, however, look forward to publishing our findings in a separate manuscript in the near future. In terms of Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborator that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, the authors should show that they were consistently placed throughout the simulations.

      As stated in the Methods section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions are manually placed into the coarse-grained structure at the beginning of the simulation, at positions identical to their corresponding positions in the experimental structure, and harmonically bonded to adjacent acidic residues throughout the duration of the simulation. We have also added a label to Fig 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be based on the complete simulation, and not on the last structure of the trajectory. Depending on the conformational variability, the latter might be misleading.

      We agree and updated Fig. 1-supplement figure 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.


    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      In their manuscript, de las Mercedes Carro et al investigated the role of Ago proteins during spermatogenesis by producing a triple knockout of Ago1, 3 and 4. They first describe the pattern of expression of each protein and of Ago2 during the differentiation of male germ cells, then describe the spermatogenesis phenotype of triple knockout males, study gene deregulation by scRNA-seq, and identify novel interacting proteins by co-IP mass spectrometry, in particular BRG1/SMARCA4, a chromatin remodeling factor, and ATF2, a transcription factor. The main message is that Ago3 and 4 are involved in the regulation of XY gene silencing during meiosis, and also in the control of autosomal gene expression during meiosis. Overall, the manuscript is well written, the topic very interesting, and the experiments well executed. However, there are some parts of the methodology and data interpretation that are unclear (see below).

      Major comments

      1= Please clarify how the triple KO was obtained, and whether it is constitutive or specific to the male germline. In the result section a Cre (which Cre?) is mentioned but it is not described in the M&M. In Figure S1, a MICER vector is shown instead of a deletion, but nothing is explained in the text or legend. Could the authors provide more details in the results section as well as in the M&M? This is essential to fully interpret the results obtained for this KO line, and to compare its phenotype to other lines (such as lines 184-9, comparison of the triple KO phenotype with that of the Ago4 KO). Also, if it is a constitutive KO, the authors should mention whether they observed other phenotypes in triple KO mice, since AGO proteins are not only expressed in the male germline.

      Response: We apologize for omitting this vital information. We have now incorporated a more detailed description of how the Ago413 mutant was created in the results and M&M sections (lines 120 and 686, respectively).

      As mentioned in the manuscript, Ago4, Ago1 and Ago3 are widely expressed in mammalian somatic tissues. Mutations or deletions of these genes do not disrupt development; however, there is limited research on the impact of these mutations in mammalian models in vivo. In humans, mutations in the Ago1 and Ago3 genes are associated with neurological disorders, autism and intellectual disability (Tokita, M.J., et al. 2015- doi: 10.1038/ejhg.2014.202., Sakaguchi et al. 2019- doi: 10.1016/j.ejmg.2018.09.004, Schalk et al 2021- doi: 10.1136/jmedgenet-2021-107751). In mouse, global deletion of Ago1 and Ago3 simultaneously was shown to increase the susceptibility of mice to influenza virus through impaired inflammation responses (Van Stry et al 2012- doi.org/10.1128/jvi.05303-11). Studies performed in female Ago413 mutants (the same mutant line used herein) have shown that knockout mice present postnatal growth retardation with elevated circulating leukocytes (Guidi et al 2023- doi: 10.1016/j.celrep.2023.113515). Other studies of double conditional knockout of Ago1 and Ago3 in the skin associated the loss of these Argonautes with decreased weight of the offspring and severe skin morphogenesis defects (Wang et al 2012- doi: 10.1101/gad.182758.111). In our study, we did not observe major somatic or overt behavioral phenotypes, and we did not observe statistical differences in body weights of null males compared to WT, as shown in the figure below.

      2= The paragraph corresponding to G2/M analysis is unclear to me. Why was this analysis performed? What does the heatmap show in Figure S4? What is G2/M score? (Fig 2D). Lines 219-220, do the authors mean that Pachytene cells are in a cell phase equivalent to G2/M? All this paragraph and associated figures require more explanation to clarify the method and interpretation.

      __Response: __We have modified the methods to include more information about how the cell cycle scoring used in Figures 2D and S4 was calculated and will add more information regarding the interpretation of these figures.

      3= I have concerns regarding Fig 2G: to be convincing, the analysis needs to be performed on several replicates, and it is essential to compare tubules of the same stage - which does not appear to be the case. Besides, co-staining (immunofluorescence) with markers of different cell types should be shown to demonstrate the earlier expression of some markers and their colocalization with markers of the earlier stages.

      __Response: __We agree with the Reviewer. New images with staged tubules will be added to the analysis of Figure 2G.

      4= one important question that I think the authors should discuss regarding their scRNAseq: clusters are defined using well-characterized markers. But the Ago triple KO appears to alter the timing of expression of genes... could this deregulation affect the interpretation of scRNAseq clusters and results?

      __Response: __We thank the reviewer for this suggestion and agree that including this information is important. We expect that, at most, this dysregulation impacts the edges of these clusters slightly. Given that marker genes that have been used to define cell types in these data are consistently expressed between the knockout and wildtype mice (see Figure S4A), we do not think that the cells in these clusters have different identities, just dysregulated expression programs. We have added the relevant sentence to the discussion, and will include additional supplemental figure panels to document this point more comprehensively.

      5= XY gene deregulation is mentioned throughout the result section, but only X chromosome genes seem to have been investigated. Even though the gene content of the Y is highly repetitive, it would be very interesting to show the level of expression of Y single-copy and Y multicopy genes in a figure 3 panel.

      __Response: __We agree with the reviewer that including analysis of Y-linked genes is important. We will add a supplemental figure which includes the Y:Autosome ratio and differential expression analysis.

      6= Can the authors elaborate on the observation that X gene upregulation is visible in the KO before MSCI; that is, in lepto/zygotene clusters (and in spermatogonia, if the difference visible in 3A is significant)?

      Response: We do see that X gene expression is upregulated before pachynema. Previous scRNA-seq studies that have looked at MSCI have seen that silencing of genes on the X and Y chromosomes starts before the cell clusters that are defined as pachynema, though silencing is not fully completed until pachynema. We have clarified this point in the manuscript.

      7 = miRNA analysis: could the authors indicate if X-encoded miRNAs were identified and found deregulated? Because Ago4 KO has been shown to lead to a downregulation of miRNAs, among which many are X-encoded. It is therefore puzzling to see that the triple KO does not recapitulate this observation. Were the analyses performed differently in the present study and in the Ago4 KO study?

      __Response: __The analysis identifying downregulation of miRNA in the original Ago4 mutant analysis was conducted relative to total small RNA expression. Amongst those altered miRNA families in the Ago4 mutants, we demonstrated both upregulation and downregulation of miRNA. We agree that confirming a similar global downregulation of miRNA counts compared to other small RNAs is important. Therefore, in a revised manuscript, we will add this information to the miRNA analysis section, especially highlighting the X chromosome-associated miRNAs, as well as whether the ratios between other small RNA classes change.

      8 = The last results paragraph would also benefit from some additional information. It is not clear why the authors focused on enhancers and did not investigate promoters (or maybe they were but it's unclear). Which regions (size and location from TSS) were investigated for motif enrichment analyses? To what correspond the "transcriptional regulatory regions previously identified using dREG" mentioned in the M&M? I understand it's based on a previous article, but more info in the present manuscript would be useful.

      Response: We thank the reviewer for this suggestion. The regions that were used for motif enrichment will be included as supplementary information in the fully revised manuscript. We have also clarified in the methods that these transcriptional regulatory regions were obtained from analysis of previously published ChRO-seq data downloaded from GEO. These data are run through the dREG pipeline, which identifies regions predicted to contain transcription start sites, including promoters and enhancers.

      Minor comments

      1) In the introduction: The sentence "Ago1 is not expressed in the germline from the spermatogonia stage onwards allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis." is misleading because Ago1 is expressed at least in spermatogonia; It would be more precise to write "after spermatogonia stage" and rephrase the sentence. Otherwise it is surprising to see AGO1 protein in testis lysate and it is not in line with the scRNA seq shown in figure 2.

      __Response: __We agree with the Reviewers suggestion and have edited the sentence on line 100. This sentence now reads "Ago1 is not expressed in the germline after the spermatogonia stage allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis".

      2) Could the authors precise if AGO proteins are expressed in other tissues? In somatic testicular cells?

      __Response: __Expression patterns of mammalian AGOs have been described in somatic and testicular tissues for the mouse by Gonzales-Gonzales et al (2008) by qPCR. They found that Ago2 is expressed in all the somatic tissues analyzed (brain, spleen, heart, muscle and lung) as well as the testis, with the highest expression in brain and lowest in heart. Ago1 is highly expressed in spleen compared to all the tissues analyzed, while Ago3 and Ago4 showed highest expression in testis and brain. Within somatic tissues of the testis, the four argonautes are expressed in Sertoli cells; however, Ago1, 3 and 4 expression is very low compared to Ago2, with the latter showing a 10-fold higher transcript level. We have included a sentence with this information in the introduction in line 89.

      3) Pattern of expression: How do the authors explain that AGO3 disappears at the diplotene stage and reappears in spermatids?

      __Response: __Single cell RNAseq data in the germline show reduced transcript levels for Ago3 from the pachytene stage onwards, suggesting minimal if any new transcription in round spermatids. We hypothesize that the AGO3 protein present in the round spermatid stage is cytoplasmic, presumably coming from the pool of AGO3 in the chromatoid body, a cytoplasmic structure with functional association with the nucleus in round spermatids (Kotaja et al, 2003 doi: 10.1073/pnas.05093331).

      4) It would be useful to show the timing of expression of AGO 1 to 4 throughout spermatogenesis in the first paragraph of the article. Maybe the authors could present data from fig2B earlier?

      Response: We understand the Reviewer's concern; however, given that Ago expression throughout spermatogenesis was obtained from scRNA-seq, we consider that these data should be presented after introducing the Ago413 knockout and the scRNA-seq experiment. As Ago1-4 expression was also described in an earlier manuscript by Gonzales-Gonzales et al in the mouse male germline, and our data align with this report, we included a sentence about these previous findings in the earlier results section.

      5) Line 190: please modify the sentence "reveal no differences in cellular architecture of the seminiferous tubules when compared to wild-type males" to " reveal no gross differences..." since even without quantification of the different cell types it is visible that KO seminiferous tubules are different from WT tubules.

      __Response: __We agree with the reviewer, and we modified line 190 (now 173) as suggested. Grossly, seminiferous tubules from Ago413 null males contain the same cell types as in wild type tubules, including spermatozoa. However, our studies show that the number and quality of germ cells is compromised in knockouts, as shown by sperm counts and TUNEL staining.

      6) TUNEL analysis: please stage the tubules to determine the stage(s) at which apoptosis is the most predominant.

      __Response: __We have complied with the Reviewer's suggestion. Figure 1G now shows staged seminiferous tubules, and we have replaced the wild type image with one where the staged tubules match the knockout image.

      7) Figure S4B does not show an increase of cells at the pachytene stage but at the lepto/zygotene stage (as well as an increase of spermatogonia). Please comment on this discrepancy with the results shown in Fig 2.

      __Response: __Figures 2 and S4 show the distribution of cells in different substages of spermatogenesis and prophase I, measured with very different methods: a cytological approach using chromosome spread cells vs. a transcriptomic approach that involves clustering of cells. We attribute the differences in cell type distribution to differences in the sensitivity of the methods to identify each cell type, and therefore to identify differences in the number of cells in each group. Moreover, our scRNA-seq data group the leptotene and zygotene stages together, while the cytological approach allows for separation of these two sub-stages. Importantly, both results show that Ago413 spermatocytes are progressing more slowly from pachynema into diplonema and/or are dying after pachynema, as stated in line 194 in our manuscript.

      8) Figures 5H and 5I are not mentioned in the result section. Also, it would be useful to label them with "all chromosomes" and "XY" to differentiate them easily.

      __Response: __We apologize for the omission and have now cited Figures 5H and 5I in the manuscript (line 453). We have added the suggested labels.

      9) Line 530 "data provide further evidence for a functional association between AGO-dependent small RNAs and heterochromatin formation, maintenance and/or silencing." Please rephrase; the present article does not really show that the AGO nuclear role depends on small RNAs.

      __Response: __We agree with the reviewer that these data do not directly show a dependence on small RNAs. As our identified localization of AGO proteins to the pericentric heterochromatin coincides with localization of DICER shown previously by Yadav and collaborators (2020, doi: 10.1093/nar/gkaa460), we do believe that our data further implicate small RNAs in the silencing of heterochromatin. Yadav et al show that DICER localizes to pericentromeric heterochromatin and processes major satellite transcripts into small RNAs in mouse spermatocytes, and cKO germ cells have reduced localization of SUV39H2 and H3K9me3 to the pericentromeric heterochromatin. Given the colocalization of both the small RNA-producing machinery and AGOs at pericentromeric heterochromatin, the AGOs may bind these small RNAs, and the statement in line 530 refers to how our results provide further evidence for the involvement of RNAi machinery, which likely includes small RNAs, in the silencing of pericentromeric heterochromatin investigated by Yadav et al.

      To clarify this point, we have modified the text accordingly.

      10) Line 1256: replace "cite here " by appropriate reference

      __Response: __The reference was added to line 1256.

      11) Please use SMARCA4 instead of BRG1 name as it is its official name.

      __Response: __We have replaced BRG1 with SMARCA4 in the text and figures.

      Figures:

      Figure 1: Are the pictures shown for Ago3-tagged and floxed from the same stages? The leptotene stage in 1A looks like a zygotene, while some pachytene/diplotene stage pictures do not look alike.

      __Response: __New representative images have been added to figure 1 to match the same substages across the figure.

      Figure 1D, please label the Y scale properly (testis weight relative to body weight)

      __Response: __We have fixed this.

      FigS1: Please comment on the presence of the non-specific bands in the figure legend

      __Response: __We have added a sentence in Figure S1 Legend.

      Fig 2E and F, please indicate on the figure (in addition to its legend), what are the X and Y axes respectively to facilitate its reading.

      __Response: __X and Y axes are now labelled in Figure 2E and F.

      2F: please use an easier abbreviation for Spermatocyte than Sp (which could be spermatogonia, sperm, etc.) such as Scyte I? (same comment for Fig 3C)

      Response: The abbreviation for spermatocyte was changed from Sp to Scyte I in Figures 2 and 3.

      Overall, for all figures showing GSEA analyses, could the authors explain what a High positive NES and a High negative NES mean in the results section?

      Response: Thank you for this suggestion. We have added this information where the GSEA score of the cell markers is initially introduced.

      Significance

      Ago proteins are known for their roles in post-transcriptional gene regulation via small RNA-mediated cleavage of mRNA, which takes place in the cytoplasm. Some Ago proteins have been shown to also be located in the nucleus, suggesting other non-canonical roles. This is the case for Ago4, which has been shown to localize to the transcriptionally silenced sex chromosomes (called the sex body) of the spermatocyte nucleus, where it contributes to regulating their silencing (Modzelewski et al 2012). Interestingly, Ago4 knockout leads to Ago3 upregulation, including on the sex body, indicating that Ago3 and Ago4 are involved in the same nuclear process. In their manuscript, de las Mercedes Carro et al. investigate the consequences of the loss of both Ago3 and Ago4 in the male germline through the production of a triple knockout of Ago1, 3 and 4 in the mouse. With this model, the authors describe the role of Ago3 and Ago4 during spermatogenesis and show that they are involved in sex chromosome gene repression in spermatocytes and in round spermatids, as well as in the control of autosomal meiotic gene expression. Triple KO males have impaired meiosis and spermiogenesis, with fewer and abnormal spermatozoa resulting in reduced fertility. Since Ago1 male germline expression is restricted to pre-meiotic germ cells, it is not expected to contribute to the meiotic and postmeiotic phenotypes observed in the triple KO. The strengths of the study are i) the thorough analyses of mRNA expression at the single-cell level, and in purified spermatocytes and spermatids (bulk RNAseq), ii) the identification of novel nuclear partners of AGO3/4 relevant for their described nuclear role: ATF2, which they show to also co-localize with the sex body, and BRG1/SMARCA4, a SWI/SNF chromatin remodeler. The main limitation of the study is the lack of information in the methods regarding the production of the triple KO, as well as some aspects of the transcriptome and motif analyses.
It is also surprising to see that the triple KO does not recapitulate the miRNA deregulation observed in Ago4 KO. The characterization of a non-canonical role of AGO3/4 in male germ cells will certainly influence researchers of the field, and also interest a broader audience studying Argonaute proteins and gene regulation at transcriptional and posttranscriptional levels.

      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript titled "Argonaute proteins regulate the timing of the spermatogenic transcriptional program" by Carro et al., the authors present their findings on how Argonaute proteins regulate spermatogenic development. They utilize a mouse model featuring a deletion of the gene cluster on chromosome 4 that contains Ago1, Ago3, and Ago4 to investigate the cumulative roles of AGO3 and AGO4 in spermatogenic cells. The authors characterize the distribution of AGO proteins and their effects on key meiotic milestones such as synapsis, recombination, meiotic transcriptional regulation, and meiotic sex chromosome inactivation (MSCI). They analyze stage-specific transcriptomes in spermatogenic cells using single-cell and bulk RNA sequencing and determine the interactome of AGO3 and AGO4 through mass spectrometry to examine how AGO proteins may regulate gene expression in these cells during meiotic and post-meiotic development. The authors conclude that both AGO3 and AGO4 are essential for regulating the overall gene expression program in spermatogenic cells and specifically modulate MSCI to repress sex-linked genes in pachytene spermatocytes, which may be partially mediated by the proper distribution of DNA damage repair factors. Additionally, AGO3 is suggested to interact with the chromatin remodeler SWI/SNF factor BRG1, facilitating its removal from the sex-chromatin to enable the repression of sex-linked genes during MSCI.

      Major Comments: 1. The study utilized a triple knockout mouse model to determine the effect of AGO3 on spermatogenesis, following up on their previous report about the role of AGO4 in spermatogenesis, which resulted from an upregulation of AGO3 in Ago4-/- spermatocytes. However, the results are more difficult to interpret and ascertain the role of AGO3 in these cells, given the absence of any observable phenotype from Ago3 interruption. AGO4 regulates sex body formation, meiotic sex chromosome inactivation (MSCI), and miRNA production in spermatocytes, all of which were noted in the absence of both AGO3 and AGO4, with only an increased incidence of cells containing abnormal RNAPII at the sex chromosomes. It will be necessary to characterize how AGO3 regulates spermatogenic development, including meiotic progression and the regulation of the meiotic transcriptome, and compare these findings with the current observations to determine if the proposed mechanism involving AGO3, BRG1, and possibly AP2 is relevant in this context.

      __Response: __While we agree with the Reviewer that a single Ago3 knockout would help us understand the distinct roles of AGO3 and AGO4 in spermatogenesis, the time and resources required to generate a new mouse model are substantial. The analysis included in this current manuscript has already taken over seven years, and with the lengthy production of a new single-mutant mouse, validation of the new mouse, and then final analysis, we would be looking at another 3-5 years of work. In the current funding climate, and with strong concerns over reducing the use of laboratory mice, we consider this request to be far in excess of what is required to move this important story forward.

      The Ago413-/- mouse model has allowed us to associate a nuclear role of Argonaute proteins with a strong reproductive phenotype in the mouse germline. Given the redundancy between Ago3 and Ago4, it is likely that a single Ago3 knockout would have a mild phenotype just like the Ago4 KO. All this said, we agree with the reviewer that analysis of an Ago3 knockout mouse is a valuable next step, just not within this chapter of the story.

      1. Does Ago413-/- mice recapitulate the early meiotic entry phenotype observed in Ago4-/- mice? If not, could it be possible that AGO3 promotes meiotic entry, given its strong mRNA expression in spermatogonia according to the scRNAseq data (Fig. 2B)

      Response: Our scRNA-seq data show strong expression of Ago3 in spermatogonia, as mentioned by the Reviewer. Analysis of cell cycle marker expression also shows that the transcriptomic profile of spermatogonia is altered, with higher levels of transcripts corresponding to the later G2/M stages (Figure 2D). Moreover, Ago413 knockouts present an increase in the number of spermatogonial stem cells (Supplementary Figure S4B). However, this cluster represents a pool of quiescent and mitotically active cells entering meiosis; therefore, interpretation of these data might be challenging. While specific experiments could be conducted to answer this question, this is outside the scope of our manuscript. The manuscript as it stands is already rather large, and a full analysis of meiotic entry dynamics would dilute the core message relating to chromatin regulation in the sex body.

      1. The authors suggested that the removal of BRG1 by AGO3 is necessary during sex body formation and the eventual establishment of MSCI. However, the BAF complex subunit ARID1A has been shown to facilitate MSCI by regulating promoter accessibility. It will be interesting to determine how BRG1 distribution changes across the genome in the absence of AGO proteins and how that correlates with alterations in sex-linked gene expression.

      __Response: __We agree that changes in BRG1 distribution across the genome would be very interesting to identify. However, in this work we show that BRG1/SMARCA4 protein changes its localization in the sex body very rapidly between early and late pachynema. These two substages are only discernible by immunofluorescence using synaptonemal complex markers, as there are currently no available techniques to enrich for these subfractions. Therefore, studies of the genome occupancy of BRG1 in these specific substages by techniques such as CUT&Tag are not currently possible. However, we are currently working on new methods to distinguish these cell populations and hope eventually to use these purification strategies to perform the studies suggested by this reviewer. Alternatively, the hope is that single-cell CUT&Tag methods will become more reliable and will enable us to address these questions. Both of these options are not currently available to us. The studies by Menon et al (2024, doi: 10.7554/eLife.88024.5) provide strong evidence to support that ARID1A is needed to reduce promoter accessibility of XY silenced genes in prophase I through modulation of H3.3 distribution. However, this mechanism and our identification of the removal of BRG1 between early and late pachynema are not inconsistent with one another, as either SMARCA4 or SMARCA2 can associate with ARID1A as part of the cBAF complex, and ARID1A is also not present in all forms of the BAF complex that BRG1 is in. The difference between our results and those seen in Menon et al likely indicates that there are multiple forms of the BAF complex which are differentially regulated during MSCI and play different roles in silencing transcription. Further studies of specific BAF subunits are needed to elucidate how different flavors of the BAF complex act at specific genomic locations and meiotic time points.

      1. The observations presented in this manuscript (Fig. 1D, 2C, 3D, and 4) suggest a haploinsufficiency of the deleted locus in spermatogenic development. How does this compare with the ablation of either Ago3 or Ago4? Please explain.

      Response: Our previous studies in single Ago4 knockouts did not present a heterozygous phenotype (Modzelewski et al 2012, doi: 10.1016/j.devcel.2012.07.003, data not shown). Triple Ago413 knockouts show a much stronger fertility phenotype than single Ago4 knockouts. Testis weights of Ago413 homozygous null males present a 30% reduction, while heterozygous mice show a 15% reduction (Figure 1D), comparable to the 13% reduction previously observed in Ago4-/- males. Sperm counts of Ago413 null and heterozygous males are reduced by 60% and 39% compared to wild type (Figure 1E), respectively, whereas Ago4 null mice have a milder phenotype, with only a 22% reduction in sperm counts. At the MSCI level, both homozygous and heterozygous Ago413 mutants show a similar increase in the proportion of pachytene spermatocytes with RNA pol II ingression into the sex body with respect to wild type (35% and 30%, respectively). Ago4 single knockouts show an almost 18% increase in Pol II ingression when compared to wild type. These comparisons are now included in our manuscript in lines 170, 172 and 288. The milder phenotype of the Ago4 knockout, and haploinsufficiency in triple Ago413 knockouts but not in Ago4 single knockouts, are likely a consequence of the overlapping functions of Ago3 and Ago4 in mammals (and/or overexpression of Ago3 in Ago4 knockouts). In the context of their role in RISC, Wang et al (doi: 10.1101/gad.182758.111) studied the effects of single and double conditional knockouts for Ago1 and Ago2 in miRNA-mediated silencing. They discovered that the interaction between miRNAs and AGOs is highly correlated with the abundance of each AGO protein, and only double knockouts presented an observable phenotype.

      Minor Comments: Based on the interactome analysis, it was argued that AGO3 and AGO4 may function separately. Please discuss how AGO3 might compensate for AGO4 (Line 109).

      Response: We hypothesize that the combined function of AGO3 and AGO4 is needed for proper sex chromosome inactivation during meiosis. We base this hypothesis on the facts that (i) both proteins localize to the sex body in pachytene spermatocytes, (ii) loss of Ago4 leads to upregulation of Ago3, and (iii) the MSCI phenotype of Ago413 knockout mice is much stronger than that of the single Ago4 knockout (see above). However, AGO3 and AGO4 might not induce silencing through the same mechanism or pathway. In this work, we observed that their temporal expression in prophase I is different; while AGO3 protein seems to disappear by the diplotene stage, AGO4 is present in the sex body of these cells. Moreover, the proteomic analysis revealed a very low number of common interactors, an observation which could support the idea of AGO3 and AGO4 acting by different (albeit perhaps related) mechanisms to achieve MSCI. It is also possible that common interactors were not identified in our proteomic analysis due to the low abundance of AGO3 and AGO4 in the germ cells, limiting the resolution of the proteomics analysis (note that in order to visualize AGO proteins in WB experiments, at least 60 μg of enriched germ cell lysate must be loaded per lane). Moreover, given the difficulty in obtaining enough isolated pachytene and diplotene spermatocytes to perform immunoprecipitation experiments, we performed IP experiments in whole germ cell lysates, which limits the interpretation of our analysis. If AGO3 and AGO4 protein interactors overlap, then AGO3 would directly substitute for AGO4, leading to silencing in single Ago4 knockouts. However, if AGO3 and AGO4 work together through different, complementary mechanisms, then Ago4 mutant mice likely compensate for the loss of Ago4 by upregulation of Ago3 along with specific interactors of the given pathway.
We have added a sentence addressing this matter in line 411 of the results section and lines 506 and 513 of the discussion in the revised manuscript.

      In Line 221, it is unclear what is meant by 'cell cycle transcripts'. Does this refer to meiotic transcripts? It is also important to discuss the relevance of the G2/M cell cycle marker genes at later stages of meiotic prophase.

      Response: Thank you for this suggestion. We have changed the relevant text to remove redundancies and include more information. We agree that considering the importance of these genes across meiotic prophase is needed, as cells which are in the dividing stage will already have produced the proteins necessary for division. These cells likely correspond to the diplotene/M cluster cells that have a lower G2/M score, potentially causing the bimodal distribution seen in Figure 2D. We have added a sentence addressing this to the manuscript.

      While identified as a common interactor of both AGO3 and AGO4 in lines 440-445, HNRNPD is not listed among AGO4 interactors in Table S6. Please correct or explain this discrepancy.

      Response: HNRNPD was originally identified as an AGO4 interactor using a less strict criterion than the one used in our manuscript: we required consistent enrichment in at least two rounds of IP-MS experiments. This reference to HNRNPD was a mistake, given that HNRNPD was only enriched in one of our three replicates. Thus, we apologize and have removed the sentence in lines 440-445.

      It is unclear whether wild-type cell lysate or lysate containing FLAG-tagged AGO3 was used for BRG1 immunoprecipitation, and which antibody was used to detect AGO3 in the BRG1 IP sample. A co-IP experiment demonstrating interaction between BRG1 and wild-type AGO3 would be ideal in this context. Furthermore, co-localization by IF would be beneficial to determine the subcellular localization and the cell stages the interaction may be occurring. Additionally, co-IP and Western blot methodologies should be included in the methods section.

      __Response: __MYC-FLAG-tagged AGO3 protein lysates were used for BRG1 co-immunoprecipitation, along with an anti-MYC antibody to detect AGO3. This is now detailed in the Methods section of our revised manuscript (line 1133).

      Regarding BRG1 and AGO3 colocalization by IF, we can confidently show that both AGO3 and BRG1 localize to the sex chromosomes in early pachynema by comparing BRG1/SYCP3 and FLAG-AGO3/SYCP3 stained spreads. We were not able to show colocalization simultaneously on the same cells, given the lack of appropriate antibodies. Our anti-FLAG antibody is raised in mouse, while anti-BRG1 is raised in rabbit; therefore, a non-rabbit, non-mouse anti-SYCP3 antibody would be needed to identify prophase I substages, and our lab does not possess such a validated antibody. However, we now have access to a multiplexing kit that allows the use of same-species antibodies for immunofluorescence, and we can perform these experiments for a revised manuscript.

      __Response: __The Methods section now includes a description of the co-IP methodologies (line 1132). Western blot methodologies are explained in line 718, under the “Immunoblotting” title.

      In line 599, it is unclear what is meant by 'persistence of sex chromosome de-repression'. Please correct or clarify this.

      Response: This sentence has been changed and reads: "The persistence of sex chromosome gene expression".

      If possible, please add an illustration to summarize the findings together.

      Response: We thank the reviewer for this suggestion, and have now added this in Figure 6

      Significance

      Overall, this study enhances the understanding of gene expression regulation by AGO proteins during spermatogenesis. Several approaches, including functional, histological, and molecular characterization of the triple knockout phenotype, were instrumental in elucidating the role of AGO proteins in MSCI and meiotic as well as postmeiotic gene regulation. The main limitation of the study is that it is challenging to appreciate the role of AGO3 in addition to the previously published role of AGO4 without the inclusion of necessary control groups. Furthermore, the mechanism of action for AGO proteins in meiotic gene regulation was left relatively unexplored. This study presents new findings that will be significant for the research community interested in gene regulation, chromatin biology, and reproductive biology with the above suggestions considered.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)): __

      The authors characterize a CRISPR-Cas9 mouse mutant that targets 3 genes that encode AGO family proteins, 2 of which are expressed during spermatogenesis (AGO3 and AGO4) and one that is said not to be expressed, AGO1. This mouse mutant showed that AGO3 and AGO4 both contribute to spermatogenesis success, as the "Ago413" mutation gave rise to an additive reduction in testis weight, due to spermatocyte apoptosis, and a reduction in sperm count. Furthermore, they use insertion mouse mutants for Ago3 and Ago2 that express tagged versions of their corresponding proteins, which they use in combination with pan-AGO antibodies and Ago mutants to show differential expression and localization properties of AGO2, AGO3, and AGO4 (and the absence of AGO1) during spermatogenesis, with a particular focus on meiotic prophase. They perform single-cell RNAseq and intricate analyses to demonstrate a change in the distribution of meiotic stages in Ago413 mutants, and the overall cell cycle in spermatogonia and spermatocytes is altered. This analysis shows that the mutation leads to an inability to downregulate prior spermatogonia/spermatocyte stage transcripts in a timely manner. On the other hand, later-stage spermatocytes are abnormally expressing spermiogenesis genes. Similar to the previously characterized Ago4 mutant, MSCI is disrupted. The authors also show that AGO3 has different interaction partners compared to AGO4 and focus their final assessment on a novel interaction partner of AGO3, BRG1. They show that this factor, which is involved in chromatin remodeling, is aberrantly localized to the sex body during meiotic prophase and diplonema. As BRG1 is involved in open chromatin, it is proposed that AGO3 restricts BRG1 (and related proteins) from the XY chromosome to ensure MSCI. Overall, this paper is very well constructed with mechanistic insights that make this a very impactful contribution to the research community.

      Major Comments:

      1. The abstract contains "Ago413-/- mouse" without any explanation of what that is. The abstract needs to be a stand-alone document that does not require any referencing for context.

      Response: We have included a sentence describing Ago413 in line 27.

      Figure 2C. - The significance bars are confusing as they appear to overlap strangely.

      Response: We have modified this figure; the significance bars are now presented on top of the data points.

      On line 235, the authors state that "we first identified the top non-overlapping upregulated genes for Ago413+/+ germ cells in each cluster. Why did the authors not also select down-regulated genes in each cluster to perform a similar analysis?

      __Response: __Thank you for this question. As our goal was to identify genes that are markers of the transcriptional program in each cell type, we used only uniquely upregulated genes for each cluster. Genes that are downregulated for a cluster may be indicative of transcription in several other cell types, which is not easily interpretable. For a revised manuscript, we will perform this analysis to determine if there are any specific alterations in these downregulated genes.

      Their Ago413 mutant characterization does a good job of assessing meiotic prophase and spermatozoa. However, their assessment of the stages in between these is lacking (meiotic divisions and spermiogenesis).

      Response: We understand the reviewer's concern; however, it is not usual to study the stages between the first meiotic division and spermiogenesis, because meiosis II is so rapid that we lack tools to dissect it. In general, any defect that impacts meiosis I (and particularly prophase I) leads to cell death during prophase I or at metaphase I due to strictly enforced checkpoints that eradicate defective cells. Thus, the increased TUNEL staining in prophase I indicates to us that defective cells are cleared before exit from meiosis I, and those cells progressing to the spermatid stage are "normal" for meiosis II progression. For those cells that did complete meiosis I and progressed normally through meiosis II, we analyzed their spermiogenic outcome extensively (see the section entitled "Post-meiotic spermatids from Ago413-/- males exhibit defective spermiogenesis and poor spermatozoa function"). This section included extensive analyses of sperm morphology, sperm motility, and sperm fertility through in vitro fertilization assays. That said, we have added a sentence on line 268 to explain the transit through meiosis II.

      The discovery of the interaction between BRG1 and AGO3 is exciting. They should assess BRG1 localization in later sub-stages, including late diplonema and diakinesis.

      __Response: __BRG1 (SMARCA4) was analyzed throughout prophase I, as shown in Figure 5G, and the quantification of fluorescence intensity included the analysis of diplonema (Figure 5H-I). However, diakinesis was not included here since there was no observable signal of BRG1 in these cells. We have explained this in line 459.

      ATF2 should have been assessed in more detail, as was done for BRG1 in Figure 5.

      __Response: __We agree with the Reviewer; however, staining of chromosome spreads with the anti-ATF2 antibody was not possible in our hands after several attempts and changes in staining conditions. Nevertheless, as staining of sections was successful, we showed localization of ATF2 in spermatocytes by co-staining sections with SYCP3 and ATF2.

      Reviewer #3 (Significance (Required)): Overall, this paper is very well constructed with mechanistic insights, as described in my reviewer comments, that make this a very impactful contribution to the research community.

    1. Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting it to their own needs.

      Weaknesses/Limitations:

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative, and additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined, if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g. figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit? What exactly was the response? What exactly were the predictors (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response").
So were different models run for each scenario? for different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation
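
      To make the suggestion concrete, the variance partitioning I describe can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration of the classic ANOVA-based repeatability (intraclass correlation); the data below are simulated purely for illustration and are not the authors' measurements, and a real analysis would use a mixed-model package (e.g., lme4 in R or MixedLM in statsmodels) rather than this hand-rolled estimator:

```python
# Sketch: partition variance in a repeatedly measured behavior into
# among- and within-individual components, then compute repeatability
# R = V_among / (V_among + V_within). Simulated data, balanced design.
import random
import statistics

random.seed(1)
n_flies, n_trials = 50, 3
data = []
for _ in range(n_flies):
    indiv_mean = random.gauss(10, 2)                # among-individual SD = 2
    data.append([indiv_mean + random.gauss(0, 1)    # within-individual SD = 1
                 for _ in range(n_trials)])

# one-way ANOVA variance components for a balanced design
ms_within = statistics.mean(statistics.variance(fly) for fly in data)
ms_among = n_trials * statistics.variance([statistics.mean(fly) for fly in data])
var_among = (ms_among - ms_within) / n_trials
repeatability = var_among / (var_among + ms_within)
print(round(repeatability, 2))  # theoretical value here: 4 / (4 + 1) = 0.8
```

      A mixed-model fit with a random intercept per fly recovers the same two variance components, and the multivariate extension gives the among-individual covariances across behaviors directly.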

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to the Github Address: https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf, and this is also mentioned in the figure legend for Fig. S8

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Coulomb, 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measurement of anxiety (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this view is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we only included a subset of parameters in our final analysis, as many were unsuitable for comparison across assays while still providing valuable assay-specific information, which is important for relating these results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency,” and ultimately went with “stability” as the only descriptor to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” with “consistency” throughout the current version of the manuscript in order to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis adds clear advantages in terms of statistical power, in addition to our chosen approach. At the same time, we believe that the simplicity of our initial analysis, both for correlational and mean data, makes it easy for readers to understand and reproduce our data. While preparing the previous version of the manuscript, we were skeptical because more complex analyses often involve numerous choices, which can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparing this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis nicely recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
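      To make the variance-explained arithmetic in this exchange explicit, the conversion from a correlation coefficient to the fraction of predictable variance is simply R² = r². A minimal Python illustration (the r values are the examples quoted above, not a new analysis):

      ```python
      def variance_explained(r: float) -> float:
          """Fraction of variance in one measure predictable from the other (R^2 = r^2)."""
          return r * r

      # Correlation coefficients quoted above (illustrative, not the raw data)
      for r in (0.263, -0.197):
          r2 = variance_explained(r)
          print(f"r = {r:+.3f} -> R^2 = {r2:.3f} ({r2:.1%} of variance explained)")
      ```

      This makes the reviewer's point concrete: even a highly significant r of 0.263 leaves roughly 93% of the retest variance unexplained, which is why we rely on significance rather than effect size when asking whether individuality persists at all.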

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 
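      As a sketch of this rank-based adjustment (stdlib Python only; the numbers below are made up for illustration, not fly data), correlating in-group ranks rather than raw values discards group-level mean and slope differences between conditions; this is equivalent to a Spearman correlation:

      ```python
      from statistics import mean

      def ranks(xs):
          """In-group ranks (1 = smallest); tied values share the average rank."""
          order = sorted(range(len(xs)), key=lambda i: xs[i])
          out = [0.0] * len(xs)
          i = 0
          while i < len(order):
              j = i
              while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
                  j += 1
              avg_rank = (i + j) / 2 + 1
              for k in range(i, j + 1):
                  out[order[k]] = avg_rank
              i = j + 1
          return out

      def pearson(x, y):
          mx, my = mean(x), mean(y)
          num = sum((a - mx) * (b - my) for a, b in zip(x, y))
          den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
          return num / den

      # Made-up numbers for five flies tested in two conditions:
      cond_a = [12.0, 30.5, 18.2, 25.1, 9.9]   # e.g. % time walked in condition 1
      cond_b = [20.1, 41.0, 29.3, 17.5, 33.8]  # the same flies in condition 2
      rho = pearson(ranks(cond_a), ranks(cond_b))
      print(f"rank (Spearman) correlation = {rho:.2f}")
      ```

      Because each fly's value is replaced by its rank within its own condition, a uniform shift or rescaling of one condition (e.g., everyone walking faster when it is warmer) no longer attenuates the correlation.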

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits like exploration, attention, and anxiety always involves subjective decisions. The classification into these three categories is even considered partially controversial in the mouse-specific literature, which uses the term “anxiety” in similar experiments (see for example Carter, Sheh, 2015, chapter 2. https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge for comparing insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We have now uploaded all codes and materials to GitHub and made them available as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well-defined, and unbiased by any interpretation: we only modified what we stated in the experimental description, without designing a specific situation that might itself be perceived differently by different individuals. For example, comparing a context with a predator to one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other fraction does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined, if the authors employ hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?).

      The reviewer raises an important point, also raised by reviewer #1. On one hand, we agree with both reviewers that a more aggregated analysis has clear advantages, such as more statistical power, and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend the initial approach we took, since the simplicity of the analysis, for both correlational and mean data, makes it easy to understand and reproduce. More complex analyses necessarily include the selection of a specific statistical toolbox by the experimenters, and based on these decisions, different analyses become less comparable and more complicated to reproduce unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, leading to very surprising conclusions (Paul et al., 2024), since some paths even reversed the findings. Such variance in conclusions is hardly possible with the rather simple and easily reproducible analysis we performed. One of the major strengths of our study is the simple experimental design, allowing for rather simple and easy-to-understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).
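      As an illustrative, stdlib-only sketch of the variance partitioning such hierarchical models perform (simulated numbers, not the fly data), the individual-level consistency the reviewer describes can be summarized as a repeatability, i.e., an intraclass correlation estimated here from a one-way random-effects ANOVA:

      ```python
      import random
      from statistics import mean

      # Simulate repeated tests of one behavioral parameter: each fly has a stable
      # individual component (SD = 1.0) plus trial-to-trial noise (SD = 0.5).
      random.seed(7)
      n_flies, n_tests = 40, 4
      flies = []
      for _ in range(n_flies):
          indiv = random.gauss(0.0, 1.0)  # stable individual component
          flies.append([indiv + random.gauss(0.0, 0.5) for _ in range(n_tests)])

      # One-way random-effects ANOVA: between- vs within-fly mean squares.
      fly_means = [mean(f) for f in flies]
      grand = mean(fly_means)
      ms_between = n_tests * sum((m - grand) ** 2 for m in fly_means) / (n_flies - 1)
      ms_within = sum((x - m) ** 2
                      for f, m in zip(flies, fly_means) for x in f) / (n_flies * (n_tests - 1))

      # Repeatability (ICC): share of total variance due to stable individual differences.
      icc = (ms_between - ms_within) / (ms_between + (n_tests - 1) * ms_within)
      print(f"repeatability (ICC) = {icc:.2f}")  # true value here is 1 / (1 + 0.25) = 0.8
      ```

      A mixed model generalizes exactly this decomposition, estimating the between-individual variance (random intercepts) and residual variance simultaneously across time points and situations.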

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes has been listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the author's interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (readouts of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others.

      We thank the reviewer for this extraordinary kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addressing a pretty basic question that is interesting (at least I think so!)

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data and just doing Pearson correlation coefficients across time points and situations is very clunky and actually losing out on a lot of information. The most up-to-date, state-of-the-art approaches are mixed models - these models can handle very complex (or not so complex) random structures which can estimate variance and, importantly, covariance in individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance partitioning models!

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least 2 of the 3 groups of assays used here: the IndyTrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (as otherwise, sample sizes will drop over time, and the flies will have gone through too many experimental alterations). To make this clearer, we have elaborated on this point in the main text, and we added group sample sizes to the figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction/distinction. We have now made this clearer at the beginning of the results in the main text by focusing on the term ‘group’. By ‘group’ we refer to the average of all individuals tested in the same situation. Sample sizes in the figure legends now indicate group/population sizes to make this explicit.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we had hoped that we had described our results as clearly and briefly as possible. We apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both the main text and methods, resulting in a significantly expanded manuscript. We hope that these additions are helpful for the general, non-expert audience.

    1. But the artifacts of the New Literalism seem to embrace mediation, even to double down on it with their supplementary signposts, historical snapshots, and expository tics. Many works insist precisely on the value of ambiguity—that liberal shibboleth “It’s complicated”—just in a ham-fisted, didactic way. And while Kornbluh finds immediacy narcissistic, I’m inclined to diagnose us instead with what Freud called repetition compulsion, a phenomenon that he linked to the death drive.

      Include in paper

    1. Author response:

      The following is the authors’ response to the original reviews.

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model, and this led to a different model from the original being preferred. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they collected many participants in ages that ranged from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification: this work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback, not to evaluate sensorimotor adaptation. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings with secondary analyses, current assumptions that impact both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c)The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where it is assumed variability only comes from motor noise) compared to baseline (where variability would come from: (a) Motor noise.

      (b) Likely some exploratory noise since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

We have now compared 31 models (see full details in next response) which include the 7 models in Roth et al. (2023). Several of these model variants include updating even after success (with so-called planning noise). We also now fit the model to the data that includes the clamp phases (we can't easily fit to the success clamp alone as there are only 10 trials). We find that the preferred model is one that does not include updating after success.

(d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al. (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of a), b), and c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023) as well as in other papers, are all special cases of a more general model. Moreover, in total there are 30 possible variants of the full model so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit by Kalman smoother to the actual data (rather than to summary statistics which has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model we then refit this model to all trials including the clamps so as to obtain the best parameter estimates.
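The selection procedure described above can be sketched in a few lines. This is a minimal illustration with made-up model names and log-likelihood values (not the actual 31 fits), showing only how BIC trades off goodness of fit against parameter count:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    # Bayesian information criterion: lower is better.
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical log-likelihoods (and parameter counts) for three toy model
# variants fit to 100 learning trials; illustrative values only.
fits = {
    "motor_noise_only":   (-250.0, 1),
    "plus_exploration":   (-235.0, 2),
    "plus_learning_rate": (-234.0, 3),
}
scores = {name: bic(ll, k, n_obs=100) for name, (ll, k) in fits.items()}
preferred = min(scores, key=scores.get)
print(preferred)  # -> plus_exploration: the extra parameter is not justified
```

Here the slightly better fit of the three-parameter variant does not overcome its BIC penalty, mirroring the logic by which a simpler model can be preferred.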

The preferred model was the same whether we combined the continuous and discrete probabilistic data or examined each task separately, either for only the children or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018) and has exploration variability (after failure) and motor noise, with full updating of the desired reach by the exploration (if any) after success. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this was considered the learning rate. The current model suggests a unity learning rate.

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

We have included our hypotheses and predictions at two points in the paper. In the introduction, we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure.”

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our tasks has features not present in typical cognitive tasks.

      (2) Links to other scholarship.

If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task. We now discuss that exploration is not one size fits all; its benefits vary depending upon the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age indicating that adults explore less compared to children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maxima. In our task by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but minimal if any role in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of R230 increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."
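The interaction between motor noise and the optimal amount of exploration can be illustrated with a toy grid search. This is our own schematic sketch under simplified assumptions (the learner, parameter values, and reward zone are all invented for illustration; this is not the authors' model or their Appendix 1 simulations):

```python
import random

def mean_successes(motor_sd, explore_sd, n_trials=100, n_runs=500, halfwidth=0.1):
    # Toy learner: starts away from a reward zone centred on 0, adds zero-mean
    # Gaussian exploration after a failure, and fully adopts a successful
    # exploratory shift; returns the average number of rewarded trials per run.
    rng = random.Random(7)
    total = 0
    for _ in range(n_runs):
        desired, rewarded = 0.4, False
        for _ in range(n_trials):
            explore = 0.0 if rewarded else rng.gauss(0.0, explore_sd)
            reach = desired + explore + rng.gauss(0.0, motor_sd)
            rewarded = abs(reach) <= halfwidth
            if rewarded:
                desired += explore
            total += rewarded
    return total / n_runs

# Sweep exploration s.d. at two motor-noise levels and report which
# exploration level earns the most reward in each case.
explore_grid = [0.05, 0.1, 0.2, 0.3]
best = {m_sd: max(explore_grid, key=lambda e_sd: mean_successes(m_sd, e_sd))
        for m_sd in (0.02, 0.25)}
print(best)
```

A sweep of this kind is one way to ask what exploration level is "rational" at a given motor-noise level, which is the spirit of the argument above.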

      (3) Modeling.

      First, I may have missed something, but it is unclear to me if the model is actually accounting for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45°, but then don't get one at 40°, I should be more likely to try 50° next then 35°). I couldn't tell from the current equations if this was the case, or if exploration was essentially "unsigned," nor if the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this was more clear in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

In some sense, the learning rate in the model in the original submission is highly related to perseveration. For example, if the learning rate is 0, then there is complete perseveration as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate the learning rate from perseveration. Adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models and those that have a non-unity learning rate are not preferred, suggesting there is little perseveration.
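The point that the learning rate doubles as a perseveration parameter can be made concrete with a small sketch (our own illustrative function, not code from the fitted model):

```python
def update_desired(desired, explore, learning_rate):
    """Shift the desired reach toward a successful exploratory reach.

    learning_rate = 0.0 -> complete perseveration (repeat the same desired reach)
    learning_rate = 1.0 -> full update (adopt the successful exploration)
    """
    return desired + learning_rate * explore

# A successful exploration of +0.25 from a desired reach of 0.5:
print(update_desired(0.5, 0.25, 0.0))  # 0.5   (perseverate)
print(update_desired(0.5, 0.25, 0.5))  # 0.625 (partial update)
print(update_desired(0.5, 0.25, 1.0))  # 0.75  (full update, unity learning rate)
```

Because one scalar spans the whole continuum from repeating to fully adopting, a separate perseveration parameter would trade off against it, which is the identifiability concern noted above.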

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

(1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to those that also refer to this as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

(1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments of reviewers 1 and 3) and take the following additional steps: plotting r-squared values for individual participants, discussing whether individual values of the fitted parameters are plausible, and whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see response to public review comments of Reviewer #1 in which we explain that we have now fit a value-based model as well as 31 other noise-based models providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that 32 models span a reasonable amount of space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved. “select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)"

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

(4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

We use mechanistic in the standard use in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models as to what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer); both are considered mechanistic, just at different levels. Quoting Levenstein: "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one-way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counter intuitive. We have changed the language to be "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

There appears to be no Supplemental 2 in the referenced paper, so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.

In our tasks, we are limited to only 9 trials to analyze in each clamp phase, so we do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' is but instead are calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (as we ensure in simulations that the data do not fall outside the workspace), making the previous analytic formulas approximations that are no longer used.
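As an illustration of the simulation approach, the following sketch estimates reach s.d. in a failure clamp under our own simplifying assumptions (a fixed desired reach with combined exploration and motor noise, clipped to a unit workspace; this is not the authors' actual simulation code or fitted model variant):

```python
import random
import statistics

def failure_clamp_sd(motor_sd, explore_sd, n_trials=9, n_runs=4000, lo=0.0, hi=1.0):
    # Simulate many short failure-clamp phases: each trial adds exploration
    # plus motor noise around a fixed desired reach, and reaches are clipped
    # to the workspace [lo, hi]. Clipping is why the simulated s.d. can
    # deviate from the analytic sqrt(motor_sd**2 + explore_sd**2).
    rng = random.Random(1)
    sds = []
    for _ in range(n_runs):
        reaches = [min(hi, max(lo, 0.5 + rng.gauss(0.0, explore_sd)
                               + rng.gauss(0.0, motor_sd)))
                   for _ in range(n_trials)]
        sds.append(statistics.stdev(reaches))
    return statistics.mean(sds)

print(failure_clamp_sd(0.05, 0.1))  # near sqrt(0.05**2 + 0.1**2) ~ 0.11
```

With only 9 trials per phase, the average sample s.d. sits slightly below the analytic value even without clipping, which is one reason simulation is preferable to the closed-form formula here.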

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3 to 5 year olds who exhibited more mature learning ability which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was out of the scope of the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks; however, our study was not designed to test these processes directly, and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplemental are interesting but are not mentioned in the main paper. I think it would be helpful to mention these as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods and results are briefly summarized in the Results section with a point to the supplementary figures. We have updated the text to more explicitly report the age related differences in path length ratios.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution from motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the R-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, R-squared values, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. However, since this is a Kalman smoother, it fits the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data; in that sense R<sup>2</sup> is always 1, so it is not a helpful measure here.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit such as a standard R<sup>2</sup> for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output: each time the model is simulated with the fit parameters it produces a different output (due to motor noise, exploration variability and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability and reward stochasticity of a trial. For example, the model could fit pure Gaussian motor noise across trials (for a poor learner) by accurately fitting the standard deviation of motor noise, but it would not be expected to actually match each data point and so would have a traditional R<sup>2</sup> of 0.

      To provide an overall goodness of fit we have to reduce the noise component, and to do so we examined the traditional R<sup>2</sup> between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant) so as to reduce the stochastic variation. The results for the continuous probabilistic and discrete probabilistic tasks are R<sup>2</sup> of 0.41 and 0.72, respectively.
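      As a sketch of this noise-reduced R<sup>2</sup> procedure (the function names and the shape of the data array are illustrative assumptions, not taken from our analysis code):

      ```python
      import numpy as np

      def simulation_r2(data, simulate, n_sims=1000, seed=0):
          """R^2 between the across-participant average of the data and the
          across-participant average of model simulations, where each
          participant's contribution is the median over n_sims runs.

          data: (n_participants, n_trials) array of reach measures.
          simulate: function (participant_index, rng) -> (n_trials,) array.
          """
          rng = np.random.default_rng(seed)
          # median over repeated stochastic simulations for each participant
          sims = np.stack([
              np.median([simulate(p, rng) for _ in range(n_sims)], axis=0)
              for p in range(data.shape[0])
          ])
          y = data.mean(axis=0)      # average of all participants' data
          y_hat = sims.mean(axis=0)  # average model simulation
          ss_res = np.sum((y - y_hat) ** 2)
          ss_tot = np.sum((y - y.mean()) ** 2)
          return 1.0 - ss_res / ss_tot
      ```

      Averaging over participants and over many stochastic runs is what makes a traditional R<sup>2</sup> meaningful here, since any single simulation contains irreducible motor and exploration noise.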

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases over age from the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we only have 9 trials to calculate variability, which limits our power to detect significance with age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the fits of the model for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match, and although the variance explained is 16 and 56% (we limit to 9 trials so as to match the fail clamp), the correlations are 0.52 and 0.78, suggesting a reasonable relation although there may be other small sources of variability not captured in the model.

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      Recalculating after removing that point in the original Figure 5C, the correlation was still significant, and we feel the plots mentioned in the previous point add useful information to this issue. With the new model this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8 children (out of 48) were better explained by the noisy only model than the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noisy-only model. It appears that there is a higher percentage of the "young" children who didn't get the instruction than the older ones.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand the underlying goal to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach and, crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65 and 86% of children (out of N = 21, 22, 20 and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.

      The reduced numbers fit by the preferred model for the other conditions likely reflects differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.
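      The likelihood ratio test described above compares nested models via the difference in their maximized log-likelihoods. A minimal sketch for the one-extra-parameter (df = 1) case, where the chi-square survival function has a closed form; this is an illustration under that assumption, and our actual model comparison may involve a different number of additional parameters:

      ```python
      import math

      def lr_test_df1(ll_full, ll_reduced):
          """Likelihood ratio test of a full model against a nested reduced
          model (e.g., motor-noise-only) differing by one free parameter.
          Under the null, 2*(ll_full - ll_reduced) is chi-square(1)
          distributed; its survival function is erfc(sqrt(stat / 2))."""
          stat = 2.0 * (ll_full - ll_reduced)
          p_value = math.erfc(math.sqrt(stat / 2.0))
          return stat, p_value
      ```

      For example, maximized log-likelihoods of -10 (full) and -12 (reduced) give a statistic of 4.0 and p ≈ 0.046, so the full model would be preferred at the 0.05 level.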

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example, in Figure 2—figure Supplement 1, column 1, row 1 shows a mouse-using participant aged 3 to 5 years, while column 3, row 2 shows a touchscreen-using participant aged 6 to 8 years. We have edited the labeling on both figures to make the arrangement of the data more clear.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This article investigates the phenotype of macrophages with a pathogenic role in arthritis, particularly focusing on arthritis induced by immune checkpoint inhibitor (ICI) therapy. 

      Building on prior data from monocyte-macrophage coculture with fibroblasts, the authors hypothesized a unique role for the combined actions of prostaglandin PGE2 and TNF. The authors studied this combined state using an in vitro model with macrophages derived from monocytes of healthy donors. They complemented this with single-cell transcriptomic and epigenetic data from patients with ICI-RA, specifically, macrophages sorted out of synovial fluid and tissue samples. The study addressed critical questions regarding the regulation of PGE2 and TNF: Are their actions co-regulated or antagonistic? How do they interact with IFN-γ in shaping macrophage responses? 

      This study is the first to specifically investigate a macrophage subset responsive to the PGE2 and TNF combination in the context of ICI-RA, describes a new and easily reproducible in vitro model, and studies the role of IFNgamma regulation of this particular Mф subset. 

      Strengths: 

      Methodological quality: The authors employed a robust combination of approaches, including validation of bulk RNA-seq findings through complementary methods. The methods description is excellent and allows for reproducible research. Importantly, the authors compared their in vitro model with ex vivo single-cell data, demonstrating that their model accurately reflects the molecular mechanisms driving the pathogenicity of this macrophage subset. 

      Weaknesses: 

      Introduction: The introduction lacks a paragraph providing an overview of ICI-induced arthritis pathogenesis and a comparison with other types of arthritis. Including this would help contextualize the study for a broader audience.

      Thank you for this suggestion; we have added a paragraph on ICI-arthritis to the introduction (pg. 4, middle paragraph).  

      Results Section: At the beginning of the results section, the experimental setup should be described in greater detail to make an easier transition into the results for the reader, rather than relying just on references to Figure 1 captions.

      We have clarified the experimental setup (pg. 5).  

      There is insufficient comparison between single-cell RNA-seq data from ICI-induced arthritis and previously published single-cell RA datasets. Such a comparison may include DEGs and GSEA, pathway analysis comparison for similar subsets of cells. Ideally, an integration with previous datasets with RA-tissue-derived primary monocytes would allow for a direct comparison of subsets and their transcriptomic features.

      We thank the Reviewer for this suggestion, which has increased the impact of our data and analysis. A computationally rigorous representation mapping approach showed that ICI-arthritis myeloid subsets predominantly mapped onto 4 previously defined RA subsets including IL-1β+ cells. This result was corroborated using a complementary data integration approach. Analysis of (TNF + PGE)-induced gene sets (TP signatures) in ICI-arthritis myeloid cells projected onto the RA subsets using the AUCell package showed elevated TP gene expression in similar ICI-arthritis and RA monocytic cell subsets. We also found mutually exclusive expression of TP and IFN signatures in distinct RA and ICI-arthritis myeloid cell subsets, which supports that the opposing cross-regulation between IFN-γ and PGE2 pathways that we identified in vitro also functions similarly in vivo. This analysis is shown in the new Fig. 3, described on pg. 7, and discussed on pp. 13-14.

      While it's understandable that arthritis samples are limited in numbers and myeloid cell numbers, it would still be interesting to see the results of PGE2+TNF in vitro stimulation on the primary RA or ICI-RA macrophages. It would be valuable to see RNA-Seq signatures of patient cell reactivation in comparison to primary stimulation of healthy donor-derived monocytes.

      We agree that this would be interesting but given limited samples and distribution of samples amongst many studies and investigators this is beyond the scope of the current study.  

      Discussion: Prior single-cell studies of RA and RA macrophage subpopulations from 2019, 2020, 2023 publications deserve more discussion. A thorough comparison with these datasets would place the study in a broader scientific context. 

      Creating an integrated RA myeloid cell atlas that combines ICI-RA data into the RA landscape would be ideal to add value to the field. 

      As one of the next research goals, TNF blockade data in RA and ICI-RA patients would be interesting to add to such an integrated atlas. Combining responders and non-responders to TNF blockade would help to understand patient stratification with the myeloid pathogenic phenotypes. It would be great to read the authors' opinion on this in the Discussion section. 

      Please see our response to point 3 above. This point is addressed in Fig. 3, pg. 7, and pp. 13-14, which includes a discussion of responders and nonresponders and patient stratification.  

      Conclusion: The authors demonstrated that while PGE2 maintains the inflammatory profile of macrophages, it also induces a distinct phenotype in simultaneous PGE2 and TNF treatment. The study of this specific subset in single-cell data from ICI-RA patients sheds light on the pathogenic mechanisms underlying this condition, however, how it compares with conventional RA is not clear from the manuscript. 

      Given the substantial incidence of ICI-induced autoimmune arthritis, understanding the unique macrophage subsets involved for future targeting them therapeutically is an important challenge. The findings are significant for immunologists, cancer researchers, and specialists in autoimmune diseases, making the study relevant to a broad scientific audience. 

      Reviewer #2 (Public review): 

      Summary/Significance of the findings: 

      The authors have done a great job by extensively carrying out transcriptomic and epigenomic analyses in the primary human/mouse monocytes/macrophages to investigate TNF-PGE2 (TP) crosstalk and their regulation by IFN-γ in the Rheumatoid arthritis (RA) synovial macrophages. They proposed that TP induces inflammatory genes via a novel regulatory axis whereby IFN-γ and PGE2 oppose each other to determine the balance between two distinct TNF-induced inflammatory gene expression programs relevant to RA and ICI-arthritis. 

      Strengths: 

      The authors have done a great job on RT-qPCR analysis of gene expression in primary human monocytes stimulated with TNF and showing the selective agonists of PGE2 receptors EP2 and EP4 that signal predominantly via cAMP. They have beautifully shown IFN-γ opposes the effects of PGE2 on TNF-induced gene expression. They found that TP signature genes are activated by cooperation of PGE2-induced AP-1, CEBP, and NR4A with TNF-induced NF-κB activity. On the other hand, they found that IFN-γ suppressed induction of AP-1, CEBP, and NR4A activity to ablate induction of IL-1, Notch, and neutrophil chemokine genes but promoted expression of distinct inflammatory genes such as TNF and T cell chemokines like CXCL10, indicating that TP induces inflammatory genes via IFN-γ in RA and ICI-arthritis. 

      Weaknesses: 

      (1) The authors carried out most of the assays in monocytes/macrophages. How do APC cells like dendritic cells behave with respect to this TP treatment at similar dosing? 

      We agree that this is an interesting topic especially as TNF + PGE2 is one of the standard methods of maturing in vitro generated human DCs and promoting antigen-presenting function. As DC maturation is quite different from monocyte activation this would represent a new study and is beyond the scope of the current manuscript. We have instead added a paragraph to the discussion (pg. 12) and cited the literature on DC maturation by TNF + PGE2 including one of our older papers (PMID: 18678606; 2008)  

      (2) The authors studied the transcriptome and epigenome at 3h and 24h post-treatment. What happens to TP-induced inflammatory genes at 12h, 36h, 48h, and 72h post-treatment? It is critical to see whether the upregulated/downregulated genes get normalised or stay the same throughout the innate immune response.

      We now clarify that subsets of inducible genes showed distinct kinetics of induction with transient expression at 3 hr versus sustained expression over the 24 hr stimulation period as shown in Supplementary Fig. 1 (pg. 5).

      (3) The authors showed IL1-axis in response to the TP-treatment. Do other cytokine axes get modulated? If yes, then how do they cooperate to reduce/induce inflammatory responses along this proposed axis?

      This is an interesting question, which we approached using a combination of pathway analysis and targeted inspection of pathways important in the pathogenesis of RA, which is the inflammatory condition most relevant for this study. In addition to genes in the IL-1-NF-κB core inflammatory pathway, pathway analysis of genes induced by TP co-stimulation showed enrichment of genes related to leukocyte chemotaxis, in particular neutrophil migration. Accordingly, TP costimulation increased expression of CSF3, which plays a key role in mobilizing neutrophils from the bone marrow, and the major neutrophil chemokines CXCL1, CXCL2, CXCL3 and CXCL5 that recruit neutrophils to sites of inflammation, including in inflammatory arthritis. Analysis of the late response to TNF similarly showed enrichment of genes important in chemotaxis, and suppression of genes in the cholesterol biosynthetic pathway, which we and others have previously linked to IFN responses. Targeted inspection of genes in additional pathways implicated in RA pathogenesis showed increased expression of genes in the Notch pathway. We believe that these pathways work together with the IL-1 pathway to increase immune cell recruitment and activation in inflammatory responses; these results are described on pp. 5-6 and are incorporated into Figures 1, 2 and Supplementary Fig. 2. 

      Overall, the data looks good and acceptable but I need to confirm the above-mentioned criticisms. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):   

      The discussion section of the manuscript claims: "In this study, we utilized transcriptomics to demonstrate a 'TNF + PGE2' (TP) signature in RA and ICI-arthritis IL-1β+ synovial macrophages." This statement is misleading, as no new transcriptomic data from RA synovial samples were generated in this study. To support such a claim, the authors would need to compare primary monocytes or macrophages from RA patients using bulk RNA-seq or single-cell RNA-seq. Based on the current data, the comparison is limited to bulk RNA-seq findings from the authors' in vitro model and prior monocyte-fibroblast coculture studies. 

      We have modified the abstract and discussion (pg. 10) to reflect that we have compared an in vitro generated TP signature with gene expression in previously identified RA macrophage subsets.

    1. Author response:

      The following is the authors’ response to the original reviews

      Main revision made to the manuscript

      The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s comment on reinterpreting our results as a superposition of an attractor model with fast timescale dynamics. We expanded our analysis window to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.

      We first acknowledge that our results are not in contradiction with evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcome explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of the reversal probability activity, so a closer analysis of the neural dynamics is needed to assess the feasibility of a line attractor.

      In the revised manuscript, we show that x_rev exhibits two different activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial, and this non-stationary activity is incompatible with the line attractor model, as claimed in the original manuscript. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties of x_rev support that it is a point attractor at the start of a trial and is compatible with the line attractor model. 

      We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity if the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by an underlying dynamics with the initial condition provided by the stationary state at the start of trial.

      These results suggest an extension of the line attractor model where an attractor state at the start of a trial provides an initial condition from which non-stationary activity is generated during a trial by an underlying dynamics associated with task-related behavior (Figure 4, Augmented model). 

      The separability of non-stationary trajectories (Figure 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.

      This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial is incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by an underlying dynamics, and the attractor state serves as the initial state that launches the non-stationary activity.

      Responses to the Public Reviews:

      Reviewer # 1:

      (1) To provide a better explanation of the reversal learning task and the network training method, we added a detailed description of the RNN and monkey task structure (Result Section 1), included a schematic of target outputs (Figure 1B), explained the rationale behind using an inhibitory network model (Method Section 1) and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.

      (2) Our understanding is that the augmented model discussed on the previous page is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. It is likely that the “fast” non-stationary activity observed during the trial is driven by task-related behavior and is thus transient. For instance, we do not observe such non-stationary activity in the inter-trial interval when the task-related behavior is absent. For this reason, the non-stationary trajectories were not considered to be part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe such a characterization of the faster timescale dynamics is consistent with Reviewer 1’s view, and we wanted to clarify that there are two different activity modes.

      (3) We appreciate the reviewers’ (Reviewer 1 and Reviewer 2) comment that TDR may be limited in isolating the neural subspace of interest. Our study presents what could be learned from TDR but is by no means the only way to interpret the neural data. It would be a topic of future work to apply other methods for isolating task-related neural activities.

      We would appreciate it if the reviewers could share thoughts on what other alternative methods could better isolate the reversal probability activity.

      Reviewer # 2:

      (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the output of RNN learns to choose a sign (+ or -), as Reviewer 2 pointed out, to make a choice. This is how the RNN takes an action. It is unclear to us what Reviewer 2 has intended by “action” and how reaching a target value (not just taking a sign) would make a significant difference in how the network performs the task. 

      (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we noticed that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice. These are the behaviors that intervene before the reversal of the preferred choice. To clarify its meaning, in the revised manuscript, we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”

      (iii) As pointed out by Reviewer 2, the lack of fixation period in the RNN could make differences in the neural dynamics of RNN and PFC, especially at the start of a trial. We demonstrate this issue in Result Section 4 where we analyze the stationary activity at the start of a trial. We find that fixating the choice output to zero before making a choice promotes stationary activity and makes the RNN activity more similar to the PFC activity.

      Reviewer #3:

      (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.

      (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when their activities are compressed into a one-dimensional subspace. The one-dimensional activities can easily collide since their activities are compressed into a low-dimensional space. We revised the manuscript to bring out these points. We added an opening paragraph that discusses separability of trajectories and revised the main text to bring out the findings on separability. 

      (iii) We agree with Reviewer 3 that it would be interesting to look at what happens in other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper was the reversal probability activity, and we consider these questions out of the scope of the current paper. We point out that, using the same dataset, neural activity related to other experimental variables was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ‘21] 

      (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing an inhibitory network as a simplified model for the balanced state. In brief, strong inhibitory recurrent connections with strong excitatory external input operate in the balanced state, as in the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.

      (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, it is possible to analyze the effect of the random feedback on the one-dimensional x_rev dynamics by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.
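      The projection mentioned here is a simple inner product. As an illustrative sketch (the function and variable names are hypothetical, not from our code), the scalar effect of a random feedback vector on the one-dimensional x_rev dynamics is:

      ```python
      import numpy as np

      def feedback_effect_on_xrev(feedback_vec, rev_axis):
          """Project a random feedback input vector onto the (unit-normalized)
          reversal-probability direction to obtain its scalar effect on the
          one-dimensional x_rev dynamics."""
          unit_axis = rev_axis / np.linalg.norm(rev_axis)
          return float(feedback_vec @ unit_axis)
      ```

      Any component of the feedback orthogonal to the reversal-probability direction drops out of this projection, which is why a randomly sampled feedback vector still has a well-defined influence on x_rev.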

      (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1:

      (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?

      In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning. 

      In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined. They switch abruptly at the behavioral reversal trial.
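
      For concreteness, the ideal-observer computation described above can be sketched as follows. This is a minimal illustration, not the exact implementation used for training: the reward probabilities, block length, and the offset of a few trials are placeholder values.

```python
import numpy as np

def map_reversal_trial(correct, n_trials, offset=3, p_hi=0.7, p_lo=0.3):
    """Infer the behavioral reversal trial from a block of outcomes.

    correct[t] = 1 if the initially high-value option was rewarded on
    trial t. Assumes that option is rewarded with probability p_hi before
    the reversal and p_lo after it (illustrative values), with a uniform
    prior over candidate reversal trials. A fixed `offset` of a few trials
    is added to the MAP estimate to mimic the delayed behavioral reversal.
    """
    correct = np.asarray(correct, dtype=float)
    log_post = np.zeros(n_trials)
    for r in range(n_trials):  # candidate scheduled reversal trial
        p = np.where(np.arange(n_trials) < r, p_hi, p_lo)
        log_post[r] = np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))
    inferred = int(np.argmax(log_post))  # MAP estimate of the scheduled reversal
    return inferred + offset             # behavioral reversal trial

def target_outputs(behavioral_reversal, n_trials):
    """Step-like target choices: option A (0) before the behavioral reversal, B (1) after."""
    return np.where(np.arange(n_trials) < behavioral_reversal, 0, 1)
```

      With such targets in hand, the network weights would then be updated by backpropagation-through-time, as described above.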

      (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.

      We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block with a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the middle of a block. The differences in the number of trials in a block that the RNNs (36) and the monkeys (80) perform are also explained. We also pointed out the differences in how the reversal trial is randomly sampled.

      However, it is unclear what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block with a fixed number of trials is executed. The reversal of the reward schedule occurs only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. This is different from other versions of reversal learning where the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.

      (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.

      We agree with Reviewer 1’s comment that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which is different from how animals learn via reinforcement learning in real life. We revised the sentence in the Discussion as follows:

      “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”

      (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.

      a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.

      Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.

      b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.

      c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.

      d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.

      We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics. 

      (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity of their findings in the principal component space to ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme.

      We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics, therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and our trained RNNs.

      (2) We revised our original manuscript substantially to reconcile the line attractor model with the nonstationary activity observed during a trial. 

      Here are the highlights of the revised interpretation of the PFC and the RNN network activity

      - The dynamics of x_rev consists of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. Analyses of the time derivative (dx_rev/dt) and the contractivity of the stationary state, shown in Figure 4B,C, demonstrate the two activity modes.

      - We discuss in Result Section 4 main text that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model. 

      - The two activity modes are linked dynamically. There is an underlying dynamics that can map the stationary state to the non-stationary trajectory. This is shown by predicting the non-stationary trajectory from the stationary state using a support vector regression model. The prediction results are shown in Figure 4D,E,F.

      - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.

      - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.
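
      The prediction of the non-stationary trajectory from the stationary state, mentioned in the highlights above, can be sketched as follows. We used a support vector regression model; plain ridge regression is substituted here to keep the example self-contained, and all array shapes are illustrative.

```python
import numpy as np

def fit_trajectory_predictor(X_start, Y_traj, alpha=1.0):
    """Fit a linear map from the stationary state at trial start
    (X_start, shape n_trials x d) to the non-stationary within-trial
    trajectory (Y_traj, shape n_trials x T).

    Ridge regression stands in for the support vector regression used
    in the paper; the point is only that the stationary state carries
    enough information to predict the subsequent trajectory.
    """
    X = np.hstack([X_start, np.ones((X_start.shape[0], 1))])  # add bias term
    W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y_traj)
    return W

def predict_trajectory(W, x_start):
    """Predict the within-trial trajectory from a single stationary state."""
    return np.append(x_start, 1.0) @ W
```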

      To strengthen their claims, the authors should:

      (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.

      (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).

      (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.

      Our response to these three comments is described above.

      Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.

      In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.

      Additional Minor Concerns:

      (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.

      In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.

      “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”

      In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.

      “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of target outputs are shown in Fig.\,\ref{fig_behavior}B.”

      (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?

      The initial state at the start of a block was random. We think the initial state is less relevant, as the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We mentioned this setup and observation in Step 1 of training.

      “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”

      (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.

      We described the learning approach in more detail, but also tried to be concise without going into technical details.

      We revised the sentence in Introduction as follows:

      “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies \cite{costa2015reversal, bartolo2020prefrontal} have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhere to the preferred choice until the reversal of reward is detected and then switch abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”

      We also added a paragraph in Result Section 1 that explains in detail how the training approach works.

      (4) In Figure 1B, it would be helpful to show the target output.

      We added a schematic in Figure 1B that shows how the target output is generated.

      (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.

      As discussed above, we interpret the “curved” activity during the trial as non-stationary activity. We do not think this non-stationary activity should be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when perturbed into its neighborhood [Strogatz, Nonlinear dynamics and chaos]. If we consider the autonomous system without the behavior-related external input as the base system, then the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we include the behavior-related external input in the autonomous dynamics, then it may be possible that the non-stationary trajectories are part of the attractor. We adopted the former interpretation, as the behavior-related inputs are external and transient.

      (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.

      The neural activity x_rev is correlated with the reversal probability, but it is unclear whether the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Result Section 7 to clarify the reason for performing the perturbation experiments.

      “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”

      Reviewer #2:

      Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:

      (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).

      We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.

      (2) It would be nice to give more detail about the monkey task and better explain its trial structure.

      In Result Section 1 we added a description of the overall task structure (and its differences from other versions of the reversal learning task), the RNN / monkey trial structure and the differences between the RNN and monkey tasks.

      (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?

      The fact that neural activity during a hold period, i.e., the fixation period before the target images are presented, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20).

      We realize that our intention was to state that, during the hold period, the reversal probability activity is stationary as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reinterpret our findings: there are two activity modes, where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.

      (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.

      Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.

      (5) For clarity, in the first section, when it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task" it would be nice to be more precise, that this is to be expected given the strategy used to train the network.

      We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.

      We added a sentence in Result Section 1 that the RNN’s abrupt behavioral reversal is expected as they are trained to mimic the target choice outputs of the Bayesian model.

      “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”

      (6) What is the value of tau used in eq (1), and how does it compare to trial duration?

      We described the value of the time constant tau in Eq. (1) and also discussed in Result Section 1 that tau = 20 ms is much shorter than the trial duration of 500 ms; thus the persistent behavior seen in trained RNNs is due to learning.
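
      To illustrate the point, a minimal Euler integration of leaky-rate dynamics (assuming Eq. (1) takes the standard form tau dx/dt = -x + W phi(x), plus external input; this sketch is not the trained model) shows that, without recurrent connectivity, activity decays well within a single trial:

```python
import numpy as np

def simulate(x0, W, tau=20.0, dt=1.0, T=500.0, phi=np.tanh):
    """Euler-integrate tau * dx/dt = -x + W @ phi(x) for T ms.

    With W = 0, activity decays with time constant tau = 20 ms, far
    shorter than the 500 ms trial, so persistent activity in the trained
    RNN must come from learned recurrence rather than the single-unit
    time constant.
    """
    x = np.array(x0, dtype=float)
    trace = [x.copy()]
    for _ in range(int(T / dt)):
        x = x + (dt / tau) * (-x + W @ phi(x))
        trace.append(x.copy())
    return np.array(trace)
```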

      (7) It would be nice to expand around the notion of « temporally flexible representation » to help readers grasp what this means.

      Instead of stating that the separable dynamic trajectories have a “temporally flexible representation”, we break down in what sense they are temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.

      “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”

      Reviewer #3:

      (1) Data:

      It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).

      We provided more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section in Methods 2.1 to describe the monkey experiment.

      The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.

      We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.

      Finally, it would be useful for more analyses of monkey behaviour and performance, either in the main text or supplementary figures.

      We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.

      (2) Model:

      When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.

      We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes. 

      Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.

      Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?

      Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.

      We added a description in Training Scheme Step 4 that the initial high-value option is selected randomly. This is also explained in Result Section 1 when we give an overview of the RNN training procedure.

      (3) Content:

      It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.

      We added that the error bars represent the standard error of the mean.

      Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.

      We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.

      Figure 3E/F: how is prediction accuracy defined?

      We added that the prediction accuracy is based on Pearson correlation.

      Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?

      The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.

      The relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures where “relative trial” is used.

      Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.

      We could not understand what Reviewer 3 was asking in this comment. It would help if Reviewer 3 could clarify the following question:

      “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”

      Question (i): We wanted to look at how the trajectory shifts in the subsequent trial if a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of the “current” trial) in the top and bottom panels are different.

      Question (ii): The black line is from the trial preceding the red/blue lines, so if trials were continuous through the inter-trial interval, then black and red/blue would be continuous. However, in the monkey experiment, the inter-trial intervals were variable, so the end of the current trial does not align with the start of the next trial. The neural trajectories presented in the manuscript did not include the activity in this inter-trial interval.

      Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.

      Yes, the individual dots are different RNN perturbations. We added an explanation of the dots in the Figure 7C caption.

      We agree with the comment that \Delta x_choice did not decrease. This sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than x_choice for v_- stimulation. We performed a KS test to confirm statistical significance.
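
      The statistical comparison can be sketched with a two-sample KS test via scipy.stats.ks_2samp (the variable names here are illustrative, not the actual perturbation data):

```python
import numpy as np
from scipy.stats import ks_2samp

def compare_stimulation_effects(x_choice_plus, x_choice_minus):
    """Compare the distributions of x_choice obtained under v_+ versus v_-
    stimulation with a two-sample Kolmogorov-Smirnov test. A small p-value
    indicates that the two distributions differ significantly.
    """
    stat, pval = ks_2samp(x_choice_plus, x_choice_minus)
    return stat, pval
```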

      Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.

      We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavioral outputs similar to those of a Bayesian observer emerged by simply learning to do the task, instead of directly mimicking the outputs of a Bayesian observer as done in our study.

      “Authors showed that trained RNNs exhibited behavior outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of directly mimicking the outputs of Bayesian observer as done in our study.”

      Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.

      Thank you for pointing out the non-uniform prior. We have not conducted this analysis, but we would guess that the convergence to the posterior distribution would be faster. We would have to perform further analysis, which is outside the scope of this paper, to investigate whether the posterior distribution would differ from what we obtained with the uniform prior.

      Making the code available would make the work more transparent and useful to the community.

      The code is available in the following Github repository: https://github.com/chrismkkim/LearnToReverse

    1. Aggressive return to office policies cooled the flight from cities, but ultimately, the need for such aggressive return to office policies underscores that we often conflate the demand for urban living with the demand for a well-paying and high status jobs.

      I wonder if there's anywhere where the urban housing supply isn't so terribly constrained where you could compare to see if it's just that the cost math is insane

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their comments, we see great value in the suggestions they made to strengthen our work. We are glad to see that they are in general positive about the manuscript. In the following, we include a point-by-point response to their comments, which are in general consistent with each other.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Sanchez-Cisneros and colleagues, examine how tracheal cell adhesion to the ECM underneath the epidermis helps shape the tracheal system. They show that if cell-ECM adhesion is perturbed the development of the tracheal system and the epidermis is disrupted. They also detect protrusions extending from the dorsal trunk cells towards the ECM. The work is novel, the figures are clear, and the questions are well addressed. However, I find that some of the claims are not completely supported by the data presented. I have some suggestions that will, I believe, clarify certain points.

      Major comments

      At the beginning of the results section as in the introduction the authors claim that "It is generally assumed that trunk displacement occurs due to tip cells pulling on the trunks so that they follow their path dorsally." This sentence is not referenced, and I do not know where it has been shown or proposed to be like this. In addition, the comparison with the ventral branches is also not referenced and the movie does not really show this. Forces generated by tracheal branch migration have been shown to drive intercalation (Caussinus E, Colombelli J, Affolter M. Tip-cell migration controls stalk-cell intercalation during Drosophila tracheal tube elongation. Curr Biol. 2008;18(22):1727-1734. doi:10.1016/j.cub.2008.10.062), but not dorsal trunk (DT) displacement.


      We agree that dorsal trunk displacement has not been discussed in previous works, just the fact that tip-cell migration influences stalk cell intercalation. We will rephrase this sentence, stating that dorsal trunk displacement has not been studied.

      However, to rule out the possibility that DT displacement and the phenotype observed in XXX is due to dorsal branch pulling forces, the authors should analyze what happens in the absence of dorsal branches (in condition of Dpp signalling inhibition as in punt mutants or Dad overexpression conditions).

      This is a great idea, and we thank the reviewer for suggesting this. We tried to achieve a similar goal by expressing a Dominant Negative FGFR (Breathless-DN) in the tracheal system, since its expression under btl-gal4 affects tip cell migration. However, the phenotype arises too late to have an effect in dorsal branch migration during the stages we were interested in analyzing. The alternative proposed by the reviewer should be more efficient, as blocking Dpp signalling prevents the formation of dorsal branches completely. We have just received flies carrying the UAS-Dad construct. We will express Dad under btl-gal4 and see how this affects dorsal trunk displacement.

      I am concerned about the TEM observations. The authors claim they can identify tracheal cells by their lumen (Fig. 2 C'). However, at stage 15, the tracheal lumen should be clearly identifiable, and the interluminal DT space should be wider relative to the size of the cells. In this case, there is nothing telling us that we are not looking at a dorsal branch or lateral trunk cell. Furthermore, at embryonic stage 15, the tracheal lumen is filled with a chitin filament, which is not visible in these micrographs. Also, there is quite a lot of tissue detachment and empty spaces between cells, which might be a sign of problems in sample fixing. Better images and more accurate identification of dorsal trunk cells is necessary to support the claim that "These experiments revealed a novel anatomical contact between the epidermis and tracheal trunks".

      The protocol that we use for TEM involves performing 1-μm sections that allow us to stage embryos and to identify the anatomical regions using light microscopy and then switch to ultra-thin sections for electron microscopy once we have found the right position within the sample. This approach also allows us to determine the integrity of the sample. We attach here a micrograph of the last section we analyzed before we decided to do the EM analysis. The asterisk (*) points to a region where the multicellular lumen of the trunk is visible. Due to its proximity to the posterior spiracles, we are confident this is the dorsal trunk and not the lateral trunk. We realize now, after comparing this image with an atlas of development (Campos-Ortega and Hartenstein, 2013), that the stage we chose to illustrate the interaction is a stage 14 embryo instead of the stage 15 we indicated in the manuscript. We will change the stage but given that dorsal closure has already started by stage 14, this does not affect our analysis. Still, we apologize for the mis-staging of the embryo.

      In the light-microscopy image, we have overlaid the EM section onto the corresponding region of interest. We agree that the lumen should be thick relative to the length of the cells if the section cut the trunk through its largest diameter. However, the protrusions we see do not emerge from the middle part of the trunk, where the lumen is found, but towards the dorsal side of the trunk, where the lumen is no longer visible in longitudinal sections such as the ones we present. In the embryo shown in Figure 2A-C, our interpretation is that the cut passed through a very shallow portion of the lumen (represented below). We infer this from the abundant electron-dense areas, which we think are adherens junctions from multiple cells. These junctions are visible in Figure 2C but are currently not labelled; we will add arrows to increase their visibility.

      Given that protruding cells lie at the base of dorsal branches, it would be expected that in some sections we would find the protrusions close to the dorsal branches. This is in fact what we show in the micrograph shown in Figure 2D, with a lower magnification overview image shown in Figure S2D. In this case, we see a cell in close proximity to the tendon cells on one side (Figure 2D), which is connected to a dorsal branch on the opposite side (shown in Figure S2D). This dorsal branch is clearly autocellular and chitin deposition is visible as expected for the developmental stage. Again, in Figure S2E we see an electron-dense patch near the lumen that corresponds to the adherens junctions that seal the lumen. We see that all this needs to be better explained in the manuscript, so we will elaborate on the descriptions, and incorporate the light microscopy micrograph to the supplemental figures. This should also aid with the anatomical descriptions requested by Reviewer #3. Nevertheless, we think these observations confirm that what we are describing are the contact points between the dorsal trunk and tendon cells.

      Timelapse imaging of the protrusions in DT cells is done with frames every 4 minutes (Video S3). This is not enough to properly show cellular protrusions and the images do not really show interaction with the epidermis. Video S4 has a better time resolution but it is very short and only shows the cut moment. Video S4 shows the cut, but the reported (and quantified) recoil is not clear. Nevertheless, the results are noteworthy and should be further analysed.

      We will acquire high temporal resolution time-lapse images using E-Cadherin::GFP and btl-gal4, UAS-PH::mCherry to show the behaviour of the protrusions on a short time scale.


      Provided these embryos survive, would it be possible to check if embryos after laser cutting will develop wavy DTs?

      We think it would be interesting to carry out this experiment, but the laser cut experiments were done under a collaborative visit and we would not be able to repeat it in a short-term period.

      What happens to the larvae under the genetic conditions presented in Fig.S3? Do they reach pupal stages? Do these animals reach adult stages?

      We have seen escapers out of these crosses, but we have not quantified the lethality of the experiment. We will analyse this and include it in the manuscript.

      The kayak phenotypes are very interesting and perhaps the authors could explore them more. As with inhibition of adhesion to the ECM, kay mutants display wavy dorsal trunks. Do they have defective adhesion? Fos being a transcription factor, this is a possibility. The authors should at least discuss the kay phenotypes more extensively and present a suitable hypothesis for the phenotype.

      We agree that the kayak experiments might bring more consequences than just preventing dorsal closure. We will complement this approach by blocking dorsal closure by other independent means. We will use pannier-gal4 (a lateral epidermis driver), engrailed-gal4 (a driver for epidermal posterior compartment), and 332-gal4 (an amnioserosa driver) to express dominant-negative Moesin. In our experience, this also delays dorsal closure and it should result in a similar tracheal phenotype as the one we see in kayak embryos.

      Minor comments

      Page 2 Line 9/10 The sentence "tracheal tubes branch and migrate over neighbouring tissues of different biochemical and mechanical properties to ventilate them." should be rewritten. Tracheal cells do not migrate over other tissues to ventilate them.

      We meant to say that tracheal cells migrate over other tissues at the same time as they branch and interconnect to allow gas exchange in their surroundings after tracheal morphogenesis is completed. Ventilation is used here as a synonym for gas exchange or breathing. We will rephrase this if the reviewer considers it confusing.

      Page 2 Line 24/25 The sentence "It has been generally assumed that trunks reach the dorsal side of the embryo because of the pulling forces of dorsal branch migration." needs to be backed up by a reference.

      As explained above, we will rephrase this sentence.

      Page 7 Line 32/23 In this sentence, the references are not related to dorsal closure "Similarly, the signals that regulate epidermal dorsal closure do not participate in tracheal development, or vice versa (Letizia et al., 2023; Reichman-Fried et al., 1994)."

      Our goal in this sentence was to explain that while JNK is required for proper epidermal dorsal closure, loss of JNK signaling in the trachea does not affect tracheal development (as shown by Letizia et al., 2023). At the same time, Reichman-Fried et al., 1994 described the phenotypes of loss of breathless (btl). We will remove this last reference as the work does not study the epidermis. We will rephrase the sentence as: “Similarly, the signals that regulate epidermal dorsal closure do not participate in tracheal development; namely, JNK signaling (Letizia et al., 2023).”

      Page 12 Line 1 "Muscles attach to epidermal tendon cells through a dense meshwork of ECM" this sentence must be referenced.

      We will add the corresponding references for this statement: (Fogerty et al., 1994; Prokop et al., 1998; Urbano et al., 2009). We will change “dense” for “specialized”.

      Fig. S1- Single channel images (A'-C' and A''-C'') should be presented in grayscale.

      Fig. S4- Single channel images (A'-D' and A''-D'') should be presented in grayscale.

      We will add the grayscale, single-channel images for these figures.

      Reviewer #1 (Significance (Required)):

      The findings shown in this manuscript shed light on the interactions and cooperation between two organs, the tracheal system and the epidermis. These interactions are mediated by cell-ECM contacts which are important for the correct morphogenesis of both systems. The strengths of the work lie on its novelty and live analysis of these interactions. However, its weaknesses are related to some claims not completely backed by the data, some technical issues regarding imaging and some over-interpreted conclusions.

      This basic research work will be of interest to a broad cell and developmental biology community as they provide a functional advance on the importance of cell-ECM interactions for the morphogenesis of a tubular organ. It is of specific interest to the specialized field of tubulogenesis and tracheal morphogenesis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, the authors explore the relationships between two Drosophila tissues - the epidermis and tracheal dorsal trunk (DT) - that get dorsally displaced during mid-late embryogenesis. They show a nice temporal correlation between the movements of the epithelia during dorsal closure and DT displacement. They also show a correlation between the movement of an endogenously tagged version of collagen and the DT, suggesting that the ECM may contribute to this coordinated movement. Through high magnification TEM, they show that tracheal cells make direct contact with the subset of epithelial cells, known as tendon cells, that also serve as muscle attachment sites. In between these contact sites, tracheae are separated from the epithelia by the muscles. Furthermore, the TEMs and confocal imaging of tracheal cells expressing a membrane marker at these contact sites show that the tracheal cells are extending filopodia toward the tendon cells. The authors then explore how a variety of perturbations to the ECM produced by the tendon and DT cells affect DT and epithelial movement. They find that expressing membrane-associated matrix metalloproteases (MMP1 or MMP2) in tendon cells as well as perturbations in integrin or integrin signaling components leads to delays in dorsal displacement as well as defective lengthening of the tracheal DT tubes. They find that defects in the association between the tracheal and epidermal ECM attachments affect dorsal displacement of the epidermis, disrupting dorsal closure.

      Major comments: I like the goals of this paper testing the idea that the ECM plays important roles in the coordination of tissue placement, and I think they have good evidence of that from this study. However, I disagree with the conclusions of the authors that disrupting contact between DT and the tendon cells has no effect on DT dorsal displacement. DT tracheal positioning is clearly delayed; the fact that it takes a lot longer indicates that the ECM does affect the process. It's just that there are likely backup systems in place - clearly not as good since the tracheal tubes end up being the wrong length.

      We agree with this view; in our deGradFP experiments we see a delayed DT displacement. We focused our analyses on the coordination with epidermal remodelling, which remained unaltered, but we in fact see a delayed progression in dorsal displacement of both tissues (Figure 5I-J). We will emphasize this in the corresponding section of the Results.

      It also seems important that the parts of the DT where the dorsal branches (DB) emanate are moving dorsally ahead of the intervening portions of the trachea. This suggests to me that the DB normally does contribute to DT dorsal displacement and that this activity may be what helps the DT eventually get into its final position. The authors should test whether the portions of the DT that contact the DB are under tension. If the DB migration is providing some dorsal pulling force on the DT, this may also contribute to the observed increases in DT length observed with the perturbations of the ECM between the tendon cells and the trachea - if tube lengthening is a consequence of the pulling forces that would be created by parts of the trachea moving dorsally ahead of the other parts. Here again, it would be good to test if the DT itself is under additional tension when the ECM is disrupted.


      We thank the reviewer for the suggested experiments. We agree with the fact that the dorsal branches should pull on the dorsal trunk and that this interaction should generate tension. Unfortunately, we are unable to test this with the experiments proposed by the reviewer, but we propose an alternative strategy to overcome this. We understand that the reviewer suggests we do laser cut experiments in dorsal branches to see if there is a recoil in the opposite direction of dorsal branch migration. We carried out our laser cut experiments using a 2-photon laser through a visit to the EMBL imaging facility, using funds from a collaborative grant. Funding a second visit would require us to apply for extra funding, which would delay the preparation of the experiments. We are aware of UV-laser setups within our university, however, UV-laser cuts would also affect the epidermis above the dorsal branches, which we think might contribute to recoil we would expect to see.

      Instead of doing laser cuts, we have designed an experiment based on the suggestion of reviewer #1 of blocking Dpp signaling (with UAS-Dad), which would prevent the formation of dorsal branches. We expect that in this experimental setup, the trunk will bend ventrally in response to the pulling forces of the ventral branches. We will also co-express UAS-Dad (to prevent dorsal branch formation) and UAS-Mmp2 (to ‘detach’ the dorsal trunk from the epidermis), and we would expect to at least partially rescue the wavy trunk phenotype.

      Minor comments: The authors need to do a much better job in the intro and in the discussion of citing the work of the people who made many of the original findings that are relevant to this study. Many citations are missing (especially in the introduction) or the authors cite their own review (which most people will not have read) for almost everything (especially in the discussion). This fails to give credit to decades of work by many other groups and makes it necessary for someone who would want to see the original work to first consult the review before they can find the appropriate reference. I know it saves space (and effort) but I think citing the original work is important.


      The reviewer is right; we apologize for falling into this practice. We will reference the original works wherever it is needed.

      Figure 7 is not a model. It is a cartoon depicting what they see with confocal and TEM images.

      We will change the figure; we will include our interpretations of the phenotypes we observed under different experimental manipulations.

      Reviewer #2 (Significance (Required)):

      Overall, this study is one of the first to focus on how the ECM affects coordination of tissue placement. The coordination of tracheal movement with that of the epidermis is very nicely documented here and the observation that the trachea make direct contact with the tendon cells/muscle attachment sites is quite convincing. It is less clear from the data how exactly the cells of the trachea and the ECM are affected by the different perturbations of the ECM. It seems like this could be better done with immunostaining of ECM proteins (collagen-GFP?), cell type markers, and super resolution confocal imaging with combinations of these markers. What happens right at the contact site between the tendon cell and the trachea with the perturbation? I think that at the level of analysis presented here, this study would be most appropriate for a specialized audience working in the ECM or fly embryo development field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary The manuscript by Sanchez-Cisneros et al provides a detailed description of the cellular interactions between cells of the Drosophila embryonic trachea and nearby tendon and epidermal cells. The researchers use a combination of genetic experiments, light sheet style live imaging and transmission electron microscopy. The live imaging is particularly clear and detailed, and reveals protruding cells. The results overall suggest that interactions mediated through the ECM contribute to development of trachea and dorsal closure of epidermis. One new aspect is the existence of dorsal trunk filopodia that are under tension and may impact tracheal morphogenesis through required integrin/ECM interactions.

      Major comments: - Are the key conclusions convincing? Generally, the key conclusions are well supported by the data, and the movies are very impressive. Interactions between the cell types are clearly shown, as is the correlations in their development. However, some of the images are challenging to decipher for a non-expert in Drosophila trachea, especially the EM images, and some of the data is indirect or a bit weak.

      We thank the reviewer for their observations. As mentioned above in response to Reviewer #1, we will add an overview image of the embryo we processed for TEM that is presented in Figure 2.

      The data related to failure of dorsal closure affecting trachea relies on one homozygous allele of one gene (kayak), and so this is somewhat weak evidence. Even though kay is not detected in trachea, there could be secondary effects of the mutation or another lesion on the mutant chromosome. The segments look a bit uneven in the mutant examples.


      The reviewer is right; as we proposed before, we will complement the kayak experiments with independent approaches that will delay dorsal closure.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Some of the experiments have low n values, especially in imaging experiments, so these may be more preliminary, but they are in concordance with other data.

      The problem we face in our live-imaging experiments is related to the probability of finding the experimental embryos. In most of our experiments we combine double-tissue labelling plus the expression of genetic tools. This generally corresponds to a very small proportion of the progeny. We will aim to have at least 4 embryos per condition.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Higher n-values would substantiate the claims. To strengthen the argument that dorsal closure affects trachea morphogenesis mechanically, the authors might consider using of a combination of kay mutant alleles or other mutant genes in this pathway to provide stronger evidence. Or they could try a rescue experiment in epidermis and trachea separately for the kay mutants.

      We think our experiments delaying dorsal closure using the Gal4/UAS system and a variety of drivers should address the point of the possible indirect effects of kay in tracheal development.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Imaging data can take a while to obtain, but the genetic experiments could be done in a couple of months, and the authors should be able to obtain any needed lines within a few weeks.

      The reviewer is correct, we will be able to plan our crosses for the proposed experiments within a couple of months.

      • Are the data and the methods presented in such a way that they can be reproduced? Generally, yes. For the deGrad experiments, it is not clear how the fluorescent intensity was normalized - was this against a reference marker?

      Briefly, we used signals from within the embryo as internal controls. In the case of en-gal4, we normalized the signal to the sections of the embryo where en is not expressed and therefore, beta-integrin levels should not be affected. In the case of btl-gal4, we normalized against the signal surrounding the trunks which should also not be affected by the deGradFP system. We will elaborate on these analyses in the methods section.
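The arithmetic behind this internal-control normalization is simple enough to sketch. The following is a minimal, hypothetical illustration in plain Python (the function name and toy pixel values are ours, and the actual quantification was presumably done in image-analysis software): mean signal in the region of interest divided by mean signal in a control region of the same embryo.

```python
from statistics import mean

def normalize_to_internal_control(signal_pixels, control_pixels):
    """Normalize mean fluorescence in a region of interest against an
    internal-control region from the same embryo, e.g. beta-integrin
    signal in en-gal4-expressing stripes vs. non-expressing stripes,
    or around the trunks for btl-gal4."""
    control_mean = mean(control_pixels)
    if control_mean == 0:
        raise ValueError("control region has zero mean intensity")
    return mean(signal_pixels) / control_mean

# Synthetic example: a knockdown region at ~40% of control intensity
control = [100, 98, 102, 101, 99]
knockdown = [40, 41, 39, 42, 38]
ratio = normalize_to_internal_control(knockdown, control)  # 0.4
```

Because both regions come from the same image of the same embryo, differences in illumination or staining between embryos cancel out in the ratio.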

      • Are the experiments adequately replicated and statistical analysis adequate? There are several experiments with low n values, so this could fall below statistical significance. For example, data shown in Fig 1G: n=3; Fig 4D n=4, n=3; Fig 6J n=4

      As mentioned above, we will increase our sample sizes.

      Minor comments: - Specific experimental issues that are easily addressable. To make the TEM images more easily interpreted, it would be helpful to provide a fluorescent image of all the relevant cell types (especially trachea, epidermis, muscle, and tendon cells, plus segmental boundaries) labelled accordingly, so that readers can correlate them more easily with the TEM images. They might also include a schematic of an embryo to show where the TEM field of view is.

      We believe this should be addressed by adding the light microscopy section of the embryo with the TEM image overlaid as illustrated above.

      It is hard to be confident that the EM images reflect the cells they claim and that the filopodia are in fact that, at least for people not used to looking at these types of images.

      As we explained in the response to Reviewer #1, we will elaborate on the descriptions of our TEM data. We think that adding the reference micrograph will aid with the interpretations of the TEM images.

      • Are prior studies referenced appropriately? yes
      • Are the text and figures clear and accurate? yes

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? The writing could be revised to be a bit clearer. Since the results of the experiments do not support the initial hypothesis, I found it a bit confusing as I read along. It may help to introduce an alternative hypothesis earlier to make the paper more logical and easy to follow. To be more specific, on page 3, the authors say they "show that dorsal trunk displacement is mechanically coupled to the remodelling of the epidermis" and also in the results comment that "With two opposing forces pulling the trunks other factors likely participate in their dorsal displacement, but so far these have remained unstudied." But that doesn't end up being what they find. The results from figure 5 and the related interpretation on page 17 say "cell-ECM interactions are important for proper trunk morphology, but not for its displacement." So this was confusing to read and I would encourage the authors to frame the issues a bit differently in terms of tube morphogenesis.

      We see how this might be confusing. We will rewrite the introduction so that the work is easier to follow. To achieve this, we will state from the beginning the mechanisms we anticipate that regulate trunk displacement: 1) adhesion to the epidermis, 2) pulling forces from the dorsal branches and 3) a combination of both.

      Some minor presentation issues: What orientation is the cross-sectional view in figure 1C and movie 1?

      We will add a dotted box that indicates the region that we turned 90° to show the cross-section.

      On page 12, the authors say the "Electron micrographs also suggested high filopodial activity" but activity suggests dynamics that are not clear from EM. This could be re-phrased.

      As the reviewer indicates, we cannot conclude dynamics from a static image. We will replace “suggested high filopodial activity” with “revealed filopodial abundance”.

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The results of the paper are significant in that they characterize a mechanical interaction between two tissue types in development, which are linked by the extracellular matrix that sits between them. It is not clear to me that this describes a "novel mechanism for tissue coordination" as stated in the abstract, but it does characterize this type of interaction in a detailed cellular way.

      • Place the work in the context of the existing literature (provide references, where appropriate). For specialists, the work identifies a novel protruding cell type in the fly embryonic trachea, and provides beautiful and detailed imaging data on tracheal development. The "wavy" trachea phenotype is also uncommon and very interesting, so this result could be linked to the few papers that also describe this phenotype and be built up.

      • State what audience might be interested in and influenced by the reported findings. As it stands, this is most interesting for a specialized audience because it requires some understanding of the development of this system in particular. As it characterizes this to a new level of detail, it could be influential to those in the field. Some addition clarification of the results and re-framing could make the manuscript more clear and interesting for non-specialists.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. I work with Drosophila and have studied embryonic and adult cell types, although not trachea specifically. I am familiar with all the genetic techniques and imaging techniques used here.


    1. Theory is all about the question “why?”

      I never thought about how much of my design education has been focused on “how” to do things. This line helped me realize that asking “why” we design something gives it more meaning and direction. It’s not just about using tools; it’s also about purpose.

    1. The code from which this message has been taken is none other than that of the French language; the only knowledge required to decipher it is a knowledge of writing and of French.

      Barthes says that reading the linguistic message mainly just needs language skills, like knowing how to read and understand French. But I think the look of the text—especially the font—also matters a lot. Different typefaces give off different vibes. For example, a handwritten font might feel friendly or personal, while a clean, modern font might feel professional or serious. So even though it’s still text, the style of the typography adds another layer of meaning. It’s not just what the words say, but also how they visually come across.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Flowers et al describe an improved version of qFit-ligand, an extension of qFit. qFit and qFit-ligand seek to model conformational heterogeneity of proteins and ligands, respectively, in cryo-EM and X-ray (electron) density maps using multi-conformer models - essentially extensions of the traditional alternate conformer approach in which substantial parts of the protein or ligand are kept in place. By contrast, ensemble approaches represent conformational heterogeneity through a superposition of independent molecular conformations.

      The authors provide a clear and systematic description of the improvements made to the code, most notably the implementation of a different conformer generator algorithm centered around RDKit. This approach yields modest improvements in the strain of the proposed conformers (meaning that more physically reasonable conformations are generated than with the "old" qFit-ligand) and real space correlation of the model with the experimental electron density maps, indicating that the generated conformers also better explain the experimental data than before. In addition, the authors expand the scope of ligands that can be treated, most notably allowing for multi-conformer modeling of macrocyclic compounds.

      Strengths:

      The manuscript is well written, provides a thorough analysis, and represents a needed improvement of our collective ability to model small-molecule binding to macromolecules based on cryo-EM and X-ray crystallography, and can therefore have a positive impact on both drug discovery and general biological research.

      Weaknesses:

      There are several points where the manuscript needs clarification in order to better understand the merits of the described work. Overall the demonstrated performance gains are modest (although the theoretical ceiling on gains in model fit and strain energy is not clear!).

      We thank the reviewer for their thoughtful review. To address comments, we have added clarifying statements and discussion points around the extent of performance gains, our choice of benchmarking metrics, and the “standards” in the field for significance. We expanded our analysis to highlight how to use qFit-ligand in “discovery” mode, which is aimed at supporting individual modeling efforts. As we now write in the discussion:

      “It is advisable to employ qFit-ligand selectively, focusing on cases with a moderate correlation between your input model and the experimental data, strong visual density in the binding pocket, high map resolution, or when your single-conformer ligand model is strained.”

      Additionally, we note in the discussion:

      “qFit-ligand primarily serves as a “thought partner” for manual modeling. Modelers still must resolve many ambiguities, including initial ligand placement, to fully take advantage of qFit capabilities. In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Flowers et al. aimed to enhance the accuracy of automated ligand model building by refining the qFit-ligand algorithm. Recognizing that ligands can exhibit conformational flexibility even when bound to receptors, the authors developed a bioinformatic pipeline to model alternate ligand conformations while improving fitting and more energetically favorable conformations.

      Strengths:

      The authors present a computational pipeline designed to automatically model and fit ligands into electron density maps, identifying potential alternative conformations within the structures.

      Weaknesses:

      Ligand modeling, particularly in cases of poorly defined electron density, remains a challenging task. The procedure presented in this manuscript exhibits clear limitations in low-resolution electron density maps (resolution > 2.0 Å) and low-occupancy scenarios, significantly restricting its applicability. Considering that the maps used to establish the operational bounds of qFit-ligand were synthetically generated, it's likely that the resolution cutoff will be even stricter when applied to real-world data.

      We thank Reviewer #2 for their comments on the role of conformational flexibility and how our tool addresses the complexity involved in modeling alternative conformations. We agree that there are limitations at low resolution, limiting the application of our algorithm. That is the case with all structural biology tools. Automatically finding alternative conformations of ligands in high-resolution structures is an enhancement to the toolbox of ligand fitting. Expanding the algorithm to work with fragment screening data is important in this realm, as almost all of this data fits in the high-resolution range where qFit-ligand works best.

      The reported changes in real-space correlation coefficients (RSCC) are not substantial, especially considering a cutoff of 0.1. Furthermore, the significance of improvements in the strain metric remains unclear. A comprehensive analysis of the distribution of this metric across the Protein Data Bank (PDB) would provide valuable insights.

      We agree that the changes are small, partially because the baseline (manually modeled ligands) is very high. To provide additional evidence, we added evaluations using EDIAm, which is a more sensitive metric. In Figure 2 (page 10), representing the development dataset, we see more improvements above 0.1. With this being said, it is unclear what constitutes a ‘substantial’ improvement for either of these metrics, especially considering alternative conformations may only change the coordinates of a subset of ligands, just slightly improving the fit to density.

      We agree that looking across the PDB on strain would provide valuable insight. To explore this, we looked to see how qFit-ligand could improve the fitting of deposited ligands with high strain (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, Page 15). While only a subset of these structures had alternative conformers placed (24.6%), we observed that in this subset, the ligands often improved the RSCC and strain. This figure also demonstrates that while RSCC may not change much numerically, the alternative conformers explain previously unexplained density with lower energy conformers than what is currently deposited.

      To mitigate the risk of introducing bias by avoiding real strained ligand conformations, the authors should demonstrate the effectiveness of the new procedure by testing it on known examples of strained ligand-substrate complexes.

      See above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A - Specific comments:

      (1) It appears necessary to provide qFit-ligand with an initial model with the ligand already placed. This is not clear from the start of the introduction on page 3. It appears that ligand position is only weakly adjusted fairly late in the process, in step F of Figure 1. It seems, therefore, that the accuracy of initial placement is rather critical (see the example discussed on page 21). At the same time, in my experience, ambiguous cases are quite common, for example with flat ligands with a few substituents sticking out or with ligands with highly mobile tails. It would be helpful for the authors to comment on the sensitivity to initial ligand placement, either in the discussion or, better yet, in the form of an analysis in which the starting model position is randomly perturbed.

      In our revised version, we have modified the introduction to clarify the necessity of including an initial ligand model (page 4).

      “The qFit-ligand algorithm takes as input a crystal or cryo-EM structure of an initial protein-ligand complex with a single conformer ligand in PDBx/mmCIF format, a density map or structure factors (encoded by a ccp4 formatted map or an MTZ), and a SMILES string for the ligand.”

      We also describe our sampling algorithm more clearly (see: Biasing Conformer Generation, page 6). Steps A-E generate many conformations (using RDKit), which are then selected/fit into experimental density (using quadratic programming). To help with additional shifting issues in the input ligand, after the first selection, we do additional rotation/translation of the generated conformers that are kept. We then do another round of fitting to the density (quadratic programming followed by mixed integer quadratic programming).
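      The selection step described here (choosing occupancies for generated conformers by fitting them to the density via quadratic programming) can be illustrated with a toy occupancy-fitting problem. This is only a minimal sketch of the underlying idea, not qFit-ligand's actual solver: the per-conformer density matrix `A`, the observed density `y`, and the projected-gradient loop are all invented for illustration.

      ```python
      import numpy as np

      def select_occupancies(A, y, steps=5000):
          """Toy illustration of conformer selection: find non-negative
          occupancies w (summing to at most 1) so that the weighted
          conformer densities A @ w match the observed density y.
          Solved here by simple projected gradient descent."""
          n = A.shape[1]
          w = np.full(n, 1.0 / n)
          lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # safe step size
          for _ in range(steps):
              grad = A.T @ (A @ w - y)
              w = np.clip(w - lr * grad, 0.0, None)  # enforce w >= 0
              s = w.sum()
              if s > 1.0:                            # enforce sum(w) <= 1
                  w /= s
          return w

      # Invented example: the "density" is 70% conformer 0 + 30% conformer 2.
      A = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 1.0, 1.0]])
      y = A @ np.array([0.7, 0.0, 0.3])
      w = select_occupancies(A, y)
      ```

      In this toy setup the solver recovers occupancies of roughly 0.7 and 0.3 for the two contributing conformers and drives the third to zero, mirroring how density-based selection can discard conformers that do not explain the map.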

      Given this sampling, we have not elected to do an additional computational experiment to test the “radius of convergence” or dependence on initial conditions. However, we outline the fundamental procedure here so that someone can build on the work and test the idea:

      - Create single conformer models as we currently do

      - randomly perturb the coordinates of the ligand by 0.1-0.3Å

      - refine to convergence, creating a series of “perturbed, modified true positives” for each dataset

      - Run qFit ligand

      - Evaluate the variability in the resulting multi-conformer models
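      The perturbation step of this outline could be sketched as below; the 0.1-0.3 Å magnitudes come from the text above, while the function name and the placeholder coordinate array are hypothetical.

      ```python
      import numpy as np

      def perturb_ligand(coords, rng, min_shift=0.1, max_shift=0.3):
          """Displace each atom by 0.1-0.3 Angstrom in a random direction,
          as in the outlined robustness test."""
          coords = np.asarray(coords, dtype=float)
          n = len(coords)
          directions = rng.normal(size=(n, 3))
          directions /= np.linalg.norm(directions, axis=1, keepdims=True)
          magnitudes = rng.uniform(min_shift, max_shift, size=(n, 1))
          return coords + magnitudes * directions

      rng = np.random.default_rng(0)
      coords = np.zeros((5, 3))  # placeholder ligand coordinates
      shifted = perturb_ligand(coords, rng)
      displacement = np.linalg.norm(shifted - coords, axis=1)
      ```

      Each atom's displacement magnitude falls in the requested 0.1-0.3 Å window, giving a family of "perturbed, modified true positives" per dataset.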

      (2) Top of page 6 ("Biasing Conformer Generation"): the authors say "as we only want to generate ligands that physically fit within the protein binding pocket, we bias conformation generation towards structures more likely to fit well within the receptor's binding site". Apart from the odd redundancy of this sentence, I am confused: at the stage that seems to be referred to here (A-C in Figure 1) is the fit to the electron density already taken into account, or does this only happen later (after step E)?

      Thank you for pointing this out. We have edited the statement to clarify it:

      “To guide the conformation generation from the Chem.rdDistGeom based on the ligand type and protein pocket, we developed a suite of specialized sampling functions to bias the conformational search towards structures more likely to fit well into the receptor’s binding site.”

      We do not consider the electron density during conformer generation (only selection from the generated conformers). The sampling is additionally biased by the type of ligand and the size of the binding pocket.

      (3) qFit-ligand appears to be quite slow. Are there prospects for speedup? Can the code take advantage of GPUs or multi-CPU environments?

      We agree with this. We have made some algorithmic improvements, most notably removing duplicate conformers based on root mean squared distance. This, along with parallelization, decreased the average runtime from ~19 minutes to ~8 minutes (see additional details: qFit-ligand runtime, page 8). We do not currently take advantage of GPU-specific code.
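      The duplicate-removal idea mentioned here (discarding conformers that are nearly identical by root mean squared distance) could be sketched as follows. The greedy strategy, the 0.2 Å threshold, and the omission of superposition are illustrative assumptions, not qFit-ligand's actual implementation.

      ```python
      import numpy as np

      def rmsd(a, b):
          """Root mean squared distance between two same-ordered
          coordinate sets (no superposition, for simplicity)."""
          return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

      def deduplicate(conformers, threshold=0.2):
          """Greedily keep conformers that differ from every kept
          conformer by more than `threshold` Angstrom RMSD."""
          kept = []
          for c in conformers:
              if all(rmsd(c, k) > threshold for k in kept):
                  kept.append(c)
          return kept

      base = np.zeros((4, 3))
      confs = [base, base + 0.01, base + 1.0]  # second is a near-duplicate
      unique = deduplicate(confs)              # near-duplicate is dropped
      ```

      Pruning near-duplicates before density fitting shrinks the pool of candidates the selection step must score, which is one straightforward way such a change can reduce runtime.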

      (4) Section: Detection of experimental true positive multi-conformer ligands:

      a) Why are carbohydrate ligands excluded? This seems like an important class of ligands that one would like qFit to be able to treat! Which brings me to a related question: can covalently attached groups (e.g., glycosylation sites!) be modeled using qFit-ligand, or is qFit-ligand restricted to non-covalently bound groups?

      Currently, qFit-ligand does not support covalently bound ligands, but this is an area of interest we are hoping to expand into. In the revised version, we added the non-covalently attached carbohydrates back into the true positive dataset. In Figure 4 (page 14), we show that qFit-ligand is able to improve fit to the experimental density in around 80% of structures, while also often reducing torsion strain (see additional details: qFit-ligand applied to unbiased dataset of experimental true positives, page 14).

      b) "as well as 758 cases where the ligand model's deposited alternate conformations (altlocs) were not bound in the same chain and residue number" - I do not understand what this means, or why it leads to the exclusion of so many structures. Likewise, a number of additional exclusions are described in Figure S3. Some more background on why these all happened would be helpful. Are you just left with the "easy" cases?

      Sometimes modelers will list the multiple conformations of a bound ligand as a separate residue within the PDB file, rather than as a single multiconformer model. For example, rather than writing a multiconformer LIG bound at A, 201 with altlocs ‘A’ and ‘B’, a modeler might write this instead as LIG A, 201 and LIG A, 301. We initially excluded these kinds of structures. However, we agree that this choice resulted in the removal of many potentially valid true positives. We have since updated our data processing pipeline to include these cases, and they are examined in the updated manuscript.

      c) I do not follow the argument made at the end of this section (last two paragraphs on page 9): "when using a single average conformation to describe density from multiple conformations, the true low-energy states may be ignored". I get that, but the conformations in the "modified true positives" dataset derive directly from models in which two conformations were modeled, so this cannot be the explanation for why qFit-ligand models result in somewhat lower average strain. It would seem that the paper could be served by providing examples where single conformations were modeled in deposited structures, but qFit detects multiple conformations.

      We agree with this comment that the strain obtained from the modified true positives is likely higher than the deposited models. However, the modified structure is refined with a single conformation, and therefore changed from the deposited “A” conformation. Thus, the reduced strain observed in our qFit-ligand models relative to the modified true positives is not unexpected.

      To expand our dataset, we also looked at deposited structures with high strain, all of which were modeled as single conformers. Here, we saw a decrease in strain when alternative conformers were placed (see section: Evaluating qFit-ligand on a set of structures known to be highly strained, page 15). Further, we provide an example from the XGen macrocycle dataset where a ligand initially modeled as a single conformer exhibited relatively high strain. After qFit‐ligand modeled a second conformation, the overall strain was reduced (Figure 6C, page 19; Figure 6—figure supplement 1C, page 59).

      (5) Section: qFit-ligand applied to an unbiased dataset of experimental true positives Bottom of page 14: The paragraph starting with "qFit-ligand shows particular strength in scenarios with strong evidence..." is enigmatic: there's no illustration (unless it directly relates to the findings in Figure 4, in which case this should be more explicit). Since this points out when the reader will and will not benefit from using qFit-ligand, it should be clear what the authors are talking about.

      This claim considers all the evidence presented in the manuscript, not necessarily one particular aspect of it. We advise using qFit-ligand when there is a moderate correlation between the input model and the experimental data, strong visual density in the binding pocket, high map resolution, and/or when your single conformer ligand model is strained. We have made all of these points clearer in the updated manuscript.

      B  - Section: qFit-ligand can automatically detect and model multiple conformations of macrocycles:

      This is an exciting extension of qFit-ligand, but some aspects of the analysis strike me as worrisome. Of the initial dataset of 150 structures, fewer than half make it all the way through analysis. It's hard to believe that this is a fully representative subset. Why, for example, could 29 structures not be refined against the deposited structure factors? Why does strain calculation (in RDKit?) fail on 30 ligands? What about the other 18 cases? Why did these fail (in PHENIX?).

      We agree that this is a striking number of failures, however, we note that they are not specific shortcomings of qFit-ligand (in fact, most are because standard structural biology and/or cheminformatics software fail on many PDB depositions). Therefore, these failures reflect broader limitations in standard bioinformatics and refinement restraint files when handling macrocycles. The strain calculator we used was not built for macrocycles, and after consulting with many experts in the field, the consensus was that no method works well with macrocycles. We discuss these issues in additional detail in the discussion (page 27):

      “Additionally, our algorithm’s placement within the larger refinement and ligand modeling ecosystem highlighted other areas that need improvement. We note that macrocycles, due to their complicated and interconnected degrees of freedom, suffer acutely from the refinement issues, as demonstrated by the failure of approximately one-third of datasets in our standard preparation or post-refinement pipelines due to ligand parameterization issues. Many of these stemmed from problematic ligand restraint files, highlighting the difficulty of encoding the geometric constraints of macrocycles using standard restraint libraries. Improved force-field or restraints for macrocycles are desperately needed to improve their modeling.”

      C  - Minor issues:

      (1) "Fragment-soaked event maps" - this is a semantically strange section title!

      We have updated the section title in our revised manuscript. The new title is ‘qFit-ligand recovers heterogeneity in fragment-soaked event maps’.

      (2) Too many digits! All over the manuscript, percentages are displayed with 0.01% precision, while these mostly refer to datasets with ~150 structures. Shifting just one structure from one category to another changes these percentages by nearly 1%.

      We have updated the sig figs in our revised manuscript.

      (3) The authors are keen to classify decreases in RSCC as significant only when these changes exceed 0.1, but do not apply the same standard for increases. For instance, in Figure 4B if we were to classify improvements as significant if ΔRSCC > 0.1, there would be fewer significant improvements than decreases in performance (although it is visually clear that for most datasets things get better). Similarly, in Figure 5A if we were to classify improvements as significant if ΔRSCC > 0.1, qFit-ligand would only yield significant improvements for two out of 73 cases, which is not a lot.

      We agree with the reviewer that there needs to be more consistency in our analysis of improvements/deteriorations. However, we note that operationally, when the decreases in model quality are observed, the modeler would simply reject the new model in favor of the input model. We have added to the discussion:

      “In active modeling workflows or large scale analyses, the workflow would only accept the output of qFit-ligand when it improves model quality. In cases where qFit-ligand degrades map-to-model fit and/or strain, we can simply revert to the input model. In practice, users can easily remove poorly fitting conformations using molecular modeling software such as COOT, while keeping the well modeled conformations, which is an advantage of the multiconformer approach over ensemble refinement methods.”

      There is generally no consensus in the field as to what might indicate a ‘significant’ change in RSCC, and any threshold we choose would be arbitrary. We note that in our manuscript, we had previously characterized a decrease in RSCC to be ‘significant’ if it exceeded 0.1. However, as there is no real scientific justification for this cutoff, or any cutoff, we moved away from this framing in the revised manuscript. Therefore, we just classify if we improve RSCC. For example, on page 9:

      “qFit-ligand modeled an alternative conformation in 72.5% (n=98) of structures. Compared with the modified true positive models, 83.7% (n=113) of qFit-ligand models have a better RSCC and 77.0% (n=104) structures saw an improvement in EDIAm, representing an improved fit to experimental data in the vast majority of structures.”

      In addition, we have conducted additional experiments using more sensitive metrics (EDIAm) to further illustrate qFit-ligand’s performance.

      (4) Small peptides are not discussed as a class of ligands, although these are quite common.

      Canonical peptides can be modeled with standard qFit. Non-canonical peptides present failure modes similar to the macrocycles discussed above, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons we have not included an analysis outside of the macrocycle section. We have noted this caveat in the discussion:

      “We note that even linear non-canonical peptides present similar failure modes to macrocycles, with a mix of ATOM and HETATM records and the need for custom cif definitions and link records. For these reasons, we did not include analysis on small peptide ligands; however, canonical peptides can be modeled with standard qFit [8].”

      (5) Top of page 10: "while refinement improves": what kind of refinement does this refer to?

      This refers to refinement with Phenix. We have updated this language to reflect this (page 8). “We refer to these altered structures as our ‘modified true positives’, which we use as input to qFit-ligand, and subsequent refinement using Phenix.”

      (6) Bottom of page 11: "they often did" -> "it often did"

      We have made this change in the revised version.

      (7) Top of page 14: RMSDs and B factors do have units.

      We have added the units in our revision.

      (8) Top of page 24. In the generation of a composite omit map, why are new Rfree flags being generated? Did I misunderstand that?

      r_free_flags.generate=True only creates R-free flags if they are not present in the input file as is the case for many (especially older) PDB depositions.

      (9) Bottom of page 27: how large is the mask? Presumably when alt confs of the ligand are possible, it would be helpful for the mask to cover those?

      We agree that this mask should be updated. In our revision, we define the mask around the coordinates of the full qFit-ligand ensemble. The same mask is used to calculate the RSCC of the input (single conformer) model versus the qFit-ligand model.

      (10) Middle of page 29: "These structure factors are then used to compute synthetic electron density maps." - It is not clear whether the following three sentences are an explanation of the details of that statement or rather things that are done afterwards.

      We clarify this in the manuscript (page 36).

      “These structure factors are then used to compute synthetic electron density maps. To each of these maps, we generate and add random Gaussian noise values scaled proportionally to the resolution. This scaling reflects the escalation of experimental noise as resolution deteriorates, a common occurrence in real-life crystallographic data.”
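      The resolution-scaled noise described in this passage might look like the following sketch; the linear scaling model and the constant `k` are illustrative assumptions, not the manuscript's actual parameters.

      ```python
      import numpy as np

      def add_resolution_scaled_noise(density, resolution, rng, k=0.05):
          """Add Gaussian noise whose standard deviation grows linearly
          with map resolution (in Angstrom), mimicking the noisier maps
          seen at lower resolution. The constant k is illustrative only."""
          sigma = k * resolution
          return density + rng.normal(0.0, sigma, size=density.shape)

      rng = np.random.default_rng(1)
      density = np.ones((8, 8, 8))                        # placeholder map
      noisy_hi = add_resolution_scaled_noise(density, 1.5, rng)
      noisy_lo = add_resolution_scaled_noise(density, 3.0, rng)
      ```

      Under this sketch the 3.0 Å map receives noise with twice the standard deviation of the 1.5 Å map, reproducing the qualitative behavior the quoted methods text describes.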

      (11) Chemical synthesis: I am not qualified to assess this and am surprised to see some much detail here rather than in some other manuscript. Are the corresponding structures deposited anywhere?

      All of the structures we discuss in this manuscript are deposited in the PDB and listed in Supplementary Table 5.

      Reviewer #2 (Recommendations for the authors):

      The data should consistently present the number of structures that exhibit improvements or deterioration in particular metrics, like RSCC and strain, using a cutoff that should be significant. For instance, stating that "85.93% (n=116) of structures having a better RSCC in the qFit-ligand models compared to the modified true positive models" without clarifying the magnitude of improvement (e.g., a marginal increase of 0.01 in RSCC) lacks meaningful context. The figures should clearly indicate the specific cutoff values used for each metric. The accompanying text should provide a detailed explanation for the selection of these cutoff values, justifying their significance in the context of the study.

      Currently, there is no established consensus within the field on what constitutes a 'significant' improvement in RSCC or strain values. As such, we chose not to impose an arbitrary cutoff and just look at which structures improve RSCC. We also removed all language stating significance, as there isn’t a good standard in the field to assess significance. This is especially important as only improvements would be considered in an active modeling project. In cases where qFit-ligand degrades the RSCC (or strain) to a large extent, the modeler would simply revert to the input model.

      In the first section of Results: "First, for all ligands, we perform an unconstrained search function allowing the generated conformers to only be constrained from the bounds matrix (Figure 1A). This is particularly advantageous for small ligands that benefit from less restriction to fully explore their conformational space. We then perform a fixed terminal atoms search function (Figure 1B)." It is unclear whether a fixed terminal atom search was conducted for each conformer generated in the initial step to further explore the conformational space. This aspect should be clarified to provide a more comprehensive understanding of the methodology.

      Each independent conformer generation function (A-E) is initialized with only the input ligand model and runs in parallel with the other functions. These functions do not build on each other, but rather perturb the input molecule independently of one another. In our updated manuscript, we have clarified the methodology (page 6).

      “First, in all cases, we perform an unconstrained search function (Figure 1A), a fixed terminal atoms search function (Figure 1B), and a blob search function (Figure 1C).”

      Phrase: "We randomly sampled 150 structures and, after manual inspection of the fit of alternative conformations, chose 135 crystal structures as a development set for improving qFit-ligand." The authors should explain why they filtered 10% of the structures.

      To develop qFit-ligand, we wanted to use a very high-quality dataset. We needed to know with some degree of certainty that if qFit-ligand failed to produce an alternate conformation (or generated conformations low in RSCC or high in strain), the failure was due to an algorithmic limitation rather than poor-quality input data. Therefore, after selection based on numerical metrics, we manually examined each ligand in Coot to observe if we believed the alternative conformers fit well into the density.

    1. We hope that by the end of this book you know a lot of social media terminology (e.g., context collapse, parasocial relationships, the network effect, etc.), that you have a good overview of how social media works and is used, and what design decisions are made in how social media works, and the consequences of those decisions. We also hope you are able to recognize how trends on internet-based social media tie to the whole of human history of being social and can apply lessons from that history to our current situations.

      How do social media platforms balance helping users share authentically while also managing the risks of context collapse? It feels like this tension is at the heart of how we interact online—and a big reason why sometimes it’s easier just to post nothing at all.

    1. Why would users want to be able to make bots?

      I think users want to make bots because they can be a really helpful way to save time and make tasks easier. Instead of doing the same things over and over, like answering simple questions or organizing information, a bot can handle that automatically. In my opinion, bots also make online experiences better, whether it’s getting quick customer support or just having a fun, interactive tool to use. For businesses especially, I believe bots are a smart way to stay connected with people without needing someone to always be available.

    1. We just need to be clear on terms. There are a few terms that are often confused or used interchangeably—“learning,” “education,” “training,” and “school”—but there are important differences between them. Learning is the process of acquiring new skills and understanding. Education is an organized system of learning. Training is a type of education that is focused on learning specific skills. A school is a community of learners: a group that comes together to learn with and from each other. It is vital that we differentiate these terms: children love to learn, they do it naturally; many have a hard time with education, and some have big problems with school.

      This whole paragraph really hit the nail right on the head in my opinion. Those four terms cannot be used interchangeably because they all mean different things entirely. I believe that everyone loves to learn naturally; it's the vector through which their education is brought to them that plays a key role in determining whether the training or education was valuable.

    2. Our freedoms in democratic societies are not automatic. They come from centuries of struggle against tyranny and autocracy and those who foment sectarianism, hatred, and fear. Those struggles are far from over.

      This statement reminds us that the freedoms we enjoy today weren’t just handed to us—they were hard-won through generations of resistance against oppression and fear. It’s a powerful reminder that democracy must be actively protected and never taken for granted. The fight for justice and equality is ongoing.

    1. How have your views on ethics changed (or been reinforced)?

      Throughout this course (or experience), my views on ethics have been both reinforced and expanded. I’ve always believed that ethics involves doing the “right thing,” but I now see that ethical decisions are often complex, situational, and require balancing competing interests. What was reinforced for me is the importance of empathy and fairness—considering how my actions affect others and striving for transparency.

      At the same time, my understanding of ethics has evolved. I used to think of ethical behavior as primarily personal—about individual honesty or integrity—but I now see how deeply ethics is tied to systems, power structures, and social responsibility. Learning about ethical dilemmas in business, technology, or healthcare (insert your relevant field) helped me understand that being ethical isn't just about avoiding harm; it's about actively promoting justice and accountability.

      Overall, I now approach ethical questions more thoughtfully, with a greater awareness of nuance, context, and the need to ask deeper questions about who benefits and who may be harmed.

    2. How have your views on ethics changed (or been reinforced)?

      My views on ethics have deepened over time. I used to think ethics was just about knowing what’s right and wrong, but now I see it’s more complicated than that. Real-life situations often don’t have clear answers, and doing the “right” thing can still feel uncomfortable or messy. I’ve learned that empathy, listening, and understanding different perspectives matter just as much as the rules. It’s made me more thoughtful and careful about the choices I make and how they affect others.

    1. I think that the students’ voice is not always heard entirely, even through dialogue. I feel that by doing this journal we can make a difference with our personal experience and touch the heart of someone who is willing to stand by us. I also wanted to get the attention of other students who may be feeling the same frustration I have felt

      Rashida’s words remind me that being asked to speak is not the same as being truly heard. Even when dialogue happens, students’ insights can be filtered or dismissed by adults who hold more power. Her hope that personal experience can move someone to take action reveals a quiet kind of strength. It’s thoughtful and brave—she’s using her voice not just to describe injustice, but to change who listens and how they respond

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) I miss some treatment of the lack of behavioural correlate. What does it mean that memantine benefits EEG classification accuracy without improving performance? One possibility here is that there is an improvement in response latency, rather than perceptual sensitivity. Is there any hint of that in the RT results? In some sort of combined measure of RT and accuracy?

      First, we would like to thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.  

      The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data, neither in the reported accuracy data nor in the RT data. We do not report RT results as participants were instructed to respond as accurately as possible, without speed pressure. We added a paragraph in the discussion section to point to possible reasons for this surprising finding:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that we found a tight link between these EEG decoding markers and behavioral performance in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine was just too subtle to show up in changes in overt behavior.”

      (2) An explanation is missing, about why memantine impacts the decoding of illusion but not collinearity. At a systems level, how would this work? How would NMDAR antagonism selectively impact long-range connectivity, but not lateral connectivity? Is this supported by our understanding of laminar connectivity and neurochemistry in the visual cortex?

      We have no straightforward or mechanistic explanation for this finding. In the revised discussion, we are highlighting this finding more clearly, and included some speculative explanations:

      “The present effect of memantine was largely specific to illusion decoding, our marker of feedback processing, while collinearity decoding, our marker of lateral processing, was not (experiment 1) or only weakly (experiment 2) affected by memantine. We have no straightforward explanation for why NMDA receptor blockade would impact inter-areal feedback connections more strongly than intra-areal lateral connections, considering their strong functional interdependency and interaction in grouping and segmentation processes (Liang et al., 2017). One possibility is that this finding reflects properties of our EEG decoding markers for feedback vs. lateral processing: for example, decoding of the Kanizsa illusion may have been more sensitive to the relatively subtle effect of our pharmacological manipulation, either because overall decoding was better than for collinearity or because NMDA receptor dependent recurrent processes more strongly contribute to illusion decoding than to collinearity decoding.”

      (3) The motivating idea for the paper is that the NMDAR antagonist might disrupt the modulation of the AMPA-mediated glu signal. This is in line with the motivating logic for Self et al., 2012, where NMDAR and AMPAR efficacy in macaque V1 was manipulated via microinfusion. But this logic seems to conflict with a broader understanding of NMDA antagonism. NMDA antagonism appears to generally have the net effect of increasing glu (and ACh) in the cortex through a selective effect on inhibitory GABAergic cells (e.g., Olney, Newcomer, & Farber, 1999). Memantine, in particular, has a specific impact on extrasynaptic NMDARs (in contrast to ketamine; Milnerwood et al., 2010, Neuron), and this type of receptor is prominent in GABA cells (e.g., Yao et al., 2022, JoN). The effect of NMDA antagonists on GABAergic cells generally appears to be much stronger than the effect on glutamatergic cells (at least in the hippocampus; e.g., Grunze et al., 1996).

      This all means that it's reasonable to expect that memantine might have a benefit to visually evoked activity. This idea is raised in the GD of the paper, based on a separate literature from the one I mentioned above. But all of this could be better spelled out earlier in the paper, so that the result observed in the paper can be interpreted by the reader in this broader context.

      To my mind, the challenging task is for the authors to explain why memantine causes an increase in EEG decoding, where microinfusion of an NMDA antagonist into V1 reduced the neural signal Self et al., 2012. This might be as simple as the change in drug... memantine's specific efficacy on extrasynaptic NMDA receptors might not be shared with whatever NMDA antagonist was used in Self et al. 2012. Ketamine and memantine are already known to differ in this way. 

      We addressed the reviewer’s comments in the following way. First, we bring up our (to us, surprising) result already at the end of the Introduction, pointing the reader to the explanation mentioned by the reviewer:

      “We hypothesized that disrupting the reentrant glutamate signal by blocking NMDA receptors with memantine would impair illusion and possibly collinearity decoding, as putative markers of feedback and lateral processing, but would spare the decoding of local contrast differences, our marker of feedforward processing. To foreshadow our results, memantine indeed specifically affected illusion decoding, but it enhanced rather than impaired it. In the Discussion, we offer explanations for this surprising finding, including the effect of memantine on extrasynaptic NMDA receptors in GABAergic cells, which may have resulted in boosted visual activity.”

      Second, as outlined in the response to the first point by Reviewer #2, we are now clear throughout the title, abstract, and paper that memantine “improved” rather than “modulated” illusion decoding.

      Third, and most importantly, we restructured and expanded the Discussion section to include the reviewer’s proposed mechanisms and explanations for the effect. We would like to thank the reviewer for pointing us to this literature. We also discuss the results of Self et al. (2012), specifically the distinct effects of the two NMDAR antagonists used in this study, more extensively, and speculate that their effects may have been similar to ketamine and thus possibly opposite of memantine (for the feedback signal):

      “Although both drugs are known to inhibit NMDA receptors by occupying the receptor’s ion channel, thereby blocking current flow (Glasgow et al., 2017; Molina et al., 2020), the drugs have different actions at receptors other than NMDA, with ketamine acting on dopamine D2 and serotonin 5-HT2 receptors, and memantine inhibiting several subtypes of the acetylcholine (ACh) receptor as well as serotonin 5-HT3 receptors. Memantine and ketamine are also known to target different NMDA receptor subpopulations, with their inhibitory action displaying different time courses and intensity (Glasgow et al., 2017; Johnson et al., 2015). Blockade of different NMDA receptor subpopulations can result in markedly different and even opposite results. For example, Self and colleagues (2012) found overall reduced or elevated visual activity after microinfusion of two different selective NMDA receptor antagonists (2-amino-5-phosphonovalerate and ifenprodil) in macaque primary visual cortex. Although both drugs impaired the feedback-related response to figure vs. ground, similar to the effects of ketamine (Meuwese et al., 2013; van Loon et al., 2016), such opposite effects on overall activity demonstrate that the effects of NMDA antagonism strongly depend on the targeted receptor subpopulation, each with distinct functional properties.”

      Finally, we link these differences to the potential mechanism via GABAergic neurons:

      “As mentioned in the Introduction, this may be related to memantine modulating processing at other pre- or post-synaptic receptors present at NMDA-rich synapses, specifically affecting extrasynaptic NMDA receptors in GABAergic cells (Milnerwood et al., 2010; Yao et al., 2022). Memantine’s strong effect on extrasynaptic NMDA receptors in GABAergic cells leads to increases in ACh levels, which have been shown to increase firing rates and reduce firing rate variability in macaques (Herrero et al., 2013, 2008). This may represent a mechanism through which memantine (but not ketamine or the NMDA receptor antagonists used by Self and colleagues) could boost visually evoked activity.”

      (4) The paper's proposal is that the effect of memantine is mediated by an impact on the efficacy of reentrant signaling in visual cortex. But perhaps the best-known impact of NMDAR manipulation is on LTP, in the hippocampus particularly but also broadly.

      Perception and identification of the Kanizsa illusion may be sensitive to learning (e.g., Maertens & Pollmann, 2005; Gellatly, 1982; Rubin, Nakayama, Shapley, 1997); what argues against an account of the results from an effect on perceptual learning? Generally, the paper proposes a very specific mechanism through which the drug influences perception. This is motivated by results from Self et al., 2012, where an NMDA antagonist was infused into V1. But oral memantine will, of course, have a whole-brain effect, and some of these effects are well characterized and - on the surface - appear as potential sources of change in illusion perception. The paper needs some treatment of the known ancillary effects of diffuse NMDAR antagonism to convince the reader that the account provided is better than the other possibilities.

      We cannot fully exclude an effect based on perceptual learning but consider this possibility highly unlikely for several reasons. First, subjects had performed more than a thousand trials in a localizer session (in experiment 2 even more than two thousand) before starting the main task containing the drug manipulation. Therefore, a large part of putative perceptual learning would have already occurred before starting the main experiment. Second, the main experiment was counterbalanced across drug sessions, so half of the participants first performed the memantine session and then the placebo session, and the other half of the subjects the other way around. If memantine had improved perceptual learning in our experiments, one might actually expect to observe improved decoding in the placebo session and not in the memantine session. If memantine had facilitated perceptual learning during the memantine session, the effect of that facilitated perceptual learning would have been most visible in the placebo session following the memantine session. Because we observed improved decoding in the memantine session itself, perceptual learning is likely not the main explanation for these findings. Third, perceptual learning is known to occur for several stimulus dimensions (e.g., orientation, spatial frequency or contrast). If these findings had been driven by perceptual learning, one would have expected to see perceptual learning for all three features, whereas the memantine effects were specific to illusion decoding. Especially in experiment 2, all features were equally often task-relevant, and in such a situation one would have expected to observe perceptual learning effects on those other features as well.

      To further investigate any potential role of perceptual learning, we analyzed participants’ performance in detecting the Kanizsa illusion over the course of the experiments. To this end, we divided the experiments’ trials into four time bins, from the beginning until the end of the experiment. For the first experiment’s first target (T1), there was no interaction between the factors bin and drug (memantine/placebo; F<sub>3,84</sub>=0.89, P=0.437; Figure S6A). For the second target (T2), we performed a repeated-measures ANOVA with the factors bin, drug, T1-T2 lag (short/long), and masks (present/absent). There was only a trend towards a bin by drug interaction (F<sub>3,84</sub>=2.57, P=0.064; Figure S6B), reflecting worse performance under memantine in the first three bins and slightly better performance in the fourth bin. The other interactions that included the factors bin and drug were not significant (all P>0.117). For the second experiment, we performed a repeated-measures ANOVA with the factors bin, drug, masks, and task-relevant feature (local contrast/collinearity/illusion). None of the interactions that included the bin and drug factors were significant (all P>0.219; Figure S6C). Taken together, memantine does not appear to affect Kanizsa illusion detection performance through perceptual learning. Finally, there was no interaction between the factors bin and task-relevant feature (F<sub>6,150</sub>=0.76, P=0.547; Figure S6D), implying there was no perceptual learning effect specific to Kanizsa illusion detection. We included these analyses in our revised Supplement as Fig. S6.
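
      This kind of bin × drug repeated-measures ANOVA can be sketched as follows. This is a hypothetical illustration on simulated per-subject accuracies, not the authors' analysis pipeline; the column names (`subject`, `bin`, `drug`, `accuracy`) are our own labels, and we use `AnovaRM` from statsmodels as one common implementation of this test.

```python
# Hypothetical sketch, not the authors' pipeline: a bin x drug
# repeated-measures ANOVA on simulated per-subject accuracies.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
n_subjects, n_bins = 20, 4

rows = []
for s in range(n_subjects):
    base = rng.normal(0.8, 0.05)  # subject's overall detection accuracy
    for b in range(n_bins):
        for drug in ("placebo", "memantine"):
            rows.append({"subject": s, "bin": b, "drug": drug,
                         "accuracy": base + rng.normal(0.0, 0.03)})
df = pd.DataFrame(rows)

# One observation per subject and cell, as AnovaRM requires.
res = AnovaRM(df, depvar="accuracy", subject="subject",
              within=["bin", "drug"]).fit()
print(res.anova_table)

# The bin-by-drug interaction (last row of the table) is the term the
# perceptual-learning argument hinges on; under this null simulation it
# should not be reliably significant.
p_interaction = res.anova_table["Pr > F"].iloc[-1]
```

      In the real analysis the dependent variable would be each participant's per-bin detection performance, and the additional within-subject factors (masks, lag, task-relevant feature) would be added to `within`.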

      (5) The cross-decoding approach to data analysis concerns me a little. The approach adopted here is to train models on a localizer task, in this case, a task where participants matched a kanisza figure to a target template (E1) or discriminated one of the three relevant stimuli features (E2). The resulting model was subsequently employed to classify the stimuli seen during separate tasks - an AB task in E1, and a feature discrimination task in E2. This scheme makes the localizer task very important. If models built from this task have any bias, this will taint classifier accuracy in the analysis of experimental data. My concern is that the emergence of the kanisza illusion in the localizer task was probably quite salient, respective to changes in stimuli rotation or collinearity. If the model was better at detecting the illusion to begin with, the data pattern - where drug manipulation impacts classification in this condition but not other conditions - may simply reflect model insensitivity to non-illusion features.

      I am also vaguely worried by manipulations implemented in the main task that do not emerge in the localizer - the use of RSVP in E1 and the manipulation of the base rate and staircasing in E2. This all starts to introduce the possibility that localizer and experimental data just don't correspond, that this generates low classification accuracy in the experimental results and ineffective classification in some conditions (i.e. when stimuli are masked; would collinearity decoding in the unmasked condition potentially differ if classification accuracy were not at a floor? See Figure 3c upper, Figure 5c lower).

      What is the motivation for the use of localizer validation at all? The same hypotheses can be tested using within-experiment cross-validation, rather than validation from a model built on localizer data. The argument may be that this kind of modelling will necessarily employ a smaller dataset, but, while true, this effect can be minimized at the expense of computational cost - many-fold cross-validation will mean that the vast majority of data contributes to model building in each instance. 

      It would be compelling if results were to reproduce when classification was validated in this kind of way. This kind of analysis would fit very well into the supplementary material.

      We thank the reviewer for this excellent question. We used separate localizers for several reasons, exactly to circumvent the kind of biases in decoding that the reviewer alludes to. Below we have detailed our rationale, first focusing on our general rationale and then focusing on the decisions we made in designing the specific experiments.  

      Using a localizer task in the design of decoding analysis offers several key advantages over relying solely on k-fold cross-validation within the main task:

      (1) Feature selection independence and better generalization: A separate localizer task allows for independent feature selection, ensuring that the features used for decoding are chosen without bias from the main task data. Specifically, the use of a localizer task allows us to determine the time-windows of interest independently based on the peaks of the decoding in the localizer. This allows for a better direct comparison between the memantine and placebo conditions because we can isolate the relevant time windows outside a drug manipulation. Further, training a classifier on a localizer task and testing it on a separate experimental task assesses whether neural representations generalize across contexts, rather than simply distinguishing conditions within a single dataset. This supports claims about the robustness of the decoded information.

      (2) Increased sensitivity and interpretability: The localizer task can be designed specifically to elicit strong, reliable responses in the relevant neural patterns. This can improve the signal-to-noise ratio and make it easier to interpret the features being used for decoding in the test set. We facilitate this by having many more trials in the localizer tasks (1280 in E1 and 5184 in E2) than in the separate conditions of the main task, in which we would have to do k-folding on very low trial numbers (e.g., the 2 (mask) × 2 (lag) design in E1 leaves fewer than 256 trials, after preprocessing, for specific comparisons). The same holds for experiment 2, which has a 2×3 design but also included the base-rate manipulation. Finally, we further facilitate the sensitivity of the model by presenting the stimuli at full contrast, without any manipulations of attention or masking during the localizer, which allows us to extract the feature-specific EEG signals in the most optimal way.

      (3) Decoupling task-specific confounds: If decoding is performed within the main task using k-folding, there is a risk that task-related confounds (e.g., motor responses, attention shifts, drug) influence decoding performance. A localizer task allows us to separate the neural representation of interest from these task-related confounds.
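
      The contrast between the two validation schemes above can be illustrated with a minimal, hypothetical sketch on simulated data (not the actual EEG pipeline; the helper `simulate_trials` and all parameters are ours): train a classifier once on a large, unbiased "localizer" set and test it on independent "main task" trials, versus k-folding within the small main-task set itself.

```python
# Hypothetical sketch (simulated data, not the actual EEG pipeline):
# cross-task decoding (train on localizer, test on main task) versus
# k-fold cross-validation within the main task only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_channels = 32
pattern = rng.standard_normal(n_channels)  # class topography shared across tasks

def simulate_trials(n_trials, effect=0.5):
    """Two-class 'EEG' trials: class 1 adds a fixed topography to noise."""
    y = rng.integers(0, 2, n_trials)
    X = rng.standard_normal((n_trials, n_channels)) + effect * np.outer(y, pattern)
    return X, y

X_loc, y_loc = simulate_trials(1280)   # large, unbiased localizer set
X_main, y_main = simulate_trials(200)  # small main-task set

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Scheme 1: fit on the localizer, test once on the independent main task.
clf.fit(X_loc, y_loc)
acc_cross_task = clf.score(X_main, y_main)

# Scheme 2: 10-fold cross-validation within the main task alone.
acc_kfold = cross_val_score(clf, X_main, y_main, cv=10).mean()

print(f"cross-task accuracy:     {acc_cross_task:.2f}")
print(f"within-task k-fold mean: {acc_kfold:.2f}")
```

      The first scheme keeps the training data free of main-task confounds (drug, masking, base rate) and uses the full localizer trial count; the second must train on far fewer trials per fold and inherits any condition imbalance of the main task.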

      Experiment 1 

      In experiment 1, the Kanizsa was always task-relevant in the main experiment in which we employed the pharmacological manipulation. To make sure that the classifiers were not biased towards Kanizsa figures from the start (which would have been the case had we done k-folding in the main task), we used a training set in which all features were equally relevant for task performance. As can be seen in Figure 1E, which plots the decoding accuracies of the localizer task, illusion decoding and rotation decoding were equally strong, whereas collinearity decoding was weaker. It may be that the Kanizsa illusion was quite salient in the localizer task, which we cannot know at present, but it was at least less salient and relevant than in the main task (where it was the only task-relevant feature). Based on the localizer decoding results, one could argue that the rotation and illusion dimensions were most salient, because decoding was highest for these dimensions. Clearly, the model was not insensitive to non-illusory features. The localizer task of experiment 2 reveals that collinearity decoding tends to be generally lower, even when that feature is task-relevant.

      Experiment 2 

      In experiment 2, the localizer task and main task were also similar, with three exceptions: during the localizer task no drug was active, and no masking and no base-rate manipulation were employed. To make sure that the classifier was not biased towards a certain stimulus category (due to the bias manipulation), e.g. the stimulus that is presented most often, we used a localizer task without this manipulation. As can be seen in Figure 4D, decoding of all the features was highly robust, including, for example, in the collinearity condition. Therefore, the low decoding that we observe in the main experiment cannot be due to poor classifier training or feature extraction in the localizer. We believe this is actually an advantage rather than a disadvantage of the current decoding protocol.

      Based on the rationale presented above we are uncomfortable performing the suggested analyses using a k-folding approach in the main task, because according to our standards the trial numbers are too low and the risk that these results are somehow influenced by task specific confounds cannot be ruled out.  

      Line 301 - 'Interestingly, in both experiments the effect of memantine... was specific to... stimuli presented without a backward mask.' This rubs a bit, given that the mask broadly disrupted classification. The absence of memantine results in masked results may simply be a product of the floor ... some care is needed in the interpretation of this pattern. 

      In the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      While floor is less likely to account for the absence of an effect in the masked condition in experiment 2, where illusion decoding in the masked condition was significantly above chance, it is still possible that to obtain an effect of memantine, decoding accuracy needed to be higher. We therefore also added here:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      In the discussion, we changed the sentence to read “…the effect of memantine on illusion decoding tended to be specific to attended, task-relevant stimuli presented without a backward mask.”

      Line 441 - What were the contraindications/exclusion parameters for the administration of memantine? 

      Thanks for spotting this. We have added the relevant exclusion criteria in the revised version of the supplement. See also below.

      – Allergy for memantine or one of the inactive ingredients of these products;

      – (History of) psychiatric treatment;

      – First-degree relative with (history of) schizophrenia or major depression;

      – (History of) clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic or pulmonary disease, including, but not limited to, fibrotic disorders;

      – Claustrophobia;

      – Regular usage of medicines (except antihistamines or occasional use of paracetamol);

      – (History of) neurological disease;

      – (History of) epilepsy;

      – Abnormal hearing or (uncorrected) vision;

      – Average use of more than 15 alcoholic beverages weekly;

      – Smoking;

      – History of drug (opiate, LSD, (meth)amphetamine, cocaine, solvents, cannabis, or barbiturate) or alcohol dependence;

      – Any known other serious health problem or mental/physical stress;

      – Use of psychotropic medication or recreational drugs over a period of 72 hours prior to each test session;

      – Use of alcohol within the last 24 hours prior to each test session;

      – (History of) pheochromocytoma;

      – Narrow-angle glaucoma;

      – (History of) ulcer disease;

      – Galactose intolerance, Lapp lactase deficiency or glucose-galactose malabsorption;

      – (History of) convulsion.

      Line 587 - The localizer task used to train the classifier in E2 was collected in different sessions. Was the number of trials from separate sessions ultimately equal? The issue here is that the localizer might pick up on subtle differences in electrode placement. If the test session happens to have electrode placement that is similar to the electrode placement that existed for a majority of one condition of the localizer... this will create bias. This is likely to be minor, but machine classifiers really love this kind of minor confound.

      Indeed, the trial counts in the separate sessions for the localizer in E2 were equal. We have added that information to the methods section.  

      Experiment 1: 1280 trials, collected during the intake session.

      Experiment 2: 1728 trials per session (intake and two drug sessions), i.e., 5184 trials across three sessions.

      Reviewer #2:

      To start off, I think the reader is being a bit tricked when reading the paper. Perhaps my priors are too strong, but I assumed, just like the authors, that NMDA-receptor antagonism would disrupt recurrent processing, in line with previous work. However, due to the continuous use of the ambiguous word 'affected' rather than the clearer 'increased' or 'perturbed' recurrent processing, the reader is left guessing what is actually found. That's until they read the results and discussion, finding that decoding is actually improved. This seems like a really big deal, and I strongly urge the authors to reword their title, abstract, and introduction to make clear they hypothesized a disruption in decoding in the illusion condition, but found the opposite, namely an increase in decoding. I want to encourage the authors that this is still a fascinating finding.

      We thank the reviewer for the positive assessment of our manuscript, and for many helpful comments and suggestions.  

      We changed the title, abstract, and introduction in accordance with the reviewer’s comment, highlighting that “memantine […] improves decoding” and “enhances recurrent processing” in all three sections. We also changed the heading of the corresponding results section to “Memantine selectively improves decoding of the Kanizsa illusion”.

      Apologies if I have missed it, but it is not clear to me whether participants were given the drug or placebo during the localiser task. If they are given the drug this makes me question the logic of their analysis approach. How can one study the presence of a process, if their very means of detecting that process (the localiser) was disrupted in the first place? If participants were not given a drug during the localiser task, please make that clear. I'll proceed with the rest of my comments assuming the latter is the case. But if the former, please note that I am not sure how to interpret their findings in this paper.

      Thanks for asking this; this was indeed unclear. In experiment 1 the localizer was performed in the intake session, in which no drugs were administered. In the second experiment the localizer was performed in all three sessions with equal trial numbers. In the intake session no drugs were administered. In the other two sessions the localizer was performed directly after pill intake, and therefore the memantine was not (or barely) active yet. We started the main task four hours after pill intake because that is the approximate peak time of memantine. Note that all three localizer tasks were averaged before being used as the training set. We have clarified this in the revised manuscript.

      The main purpose of the paper is to study recurrent processing. The extent to which this study achieves this aim is completely dependent on the extent to which we can interpret decoding of illusory contours as uniquely capturing recurrent processing. While I am sure illusory contours rely on recurrent processing, it does not follow that decoding of illusory contours captures recurrent processing alone. Indeed, if the drug selectively manipulates recurrent processing, it's not obvious to me why the authors find the interaction with masking in experiment 2. Recurrent processing seems to still be happening in the masked condition, but is not affected by the NMDA-receptor antagonist here, so where does that leave us in interpreting the role of NMDA-receptors in recurrent processing? If the authors cannot strengthen the claim that the effects are completely driven by affecting recurrent processing, I suggest that the paper shift its focus to making claims about the encoding of illusory contours, rather than making primary claims about recurrent processing.

      We indeed used illusion decoding as a marker of recurrent processing. Clearly, such a marker based on a non-invasive and indirect method to record neural activity is not perfect. To directly and selectively manipulate recurrent processing, invasive methods and direct neural recordings would be required. However, as explained in the revised Introduction,

      “In recent work we have validated that the decoding profiles of these features of different complexities at different points in time, in combination with the associated topography, can indeed serve as EEG markers of feedforward, lateral and recurrent processes (Fahrenfort et al., 2017; Noorman et al., 2023).”  

      The timing and topography of the decoding results of the present study were consistent with our previous EEG decoding studies (Fahrenfort et al., 2017; Noorman et al., 2023). This validates the use of these EEG decoding signatures as (imperfect) markers of distinct neural processes, and we continue to use them as such. However, we expanded the discussion section to alert the reader to the indirect and imperfect nature of these EEG decoding signatures as markers of distinct neural processes: “Our approach relied on using EEG decoding of different stimulus features at different points in time, together with their topography, as markers of distinct neural processes. Although such non-invasive, indirect measures of neural activity cannot provide direct evidence for feedforward vs. recurrent processes, the timing, topography, and susceptibility to masking of the decoding signatures obtained in the present study are consistent with neurophysiology (e.g., Bosking et al., 1997; Kandel et al., 2000; Lamme & Roelfsema, 2000; Lee & Nguyen, 2001; Liang et al., 2017; Pak et al., 2020), as well as with our previous work (Fahrenfort et al., 2017; Noorman et al., 2023).” 

      The reviewer is also concerned about the lack of effect of memantine on illusion decoding in the masked condition in experiment 2. In our view, the strong effect of masking on illusion decoding (both in absolute terms, as well as when compared to its effect on local contrast decoding), provides strong support for our assumption that illusion decoding represents a marker of recurrent processing. Nevertheless, as the reviewer points out, weak but statistically significant illusion decoding was still possible in the masked condition, at least when the illusion was task-relevant. As the reviewer notes, this may reflect residual recurrent processing during masking, a conclusion consistent with the relatively high behavioral performance despite masking (d’ > 1). However, rather than invalidating the use of our EEG markers or challenging the role of NMDA-receptors in recurrent processing, this may simply reflect a floor effect. As outlined in our response to reviewer #1 (who was concerned about floor effects), in the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      And for experiment 1:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      An additional claim is being made with regards to the effects of the drug manipulation. The authors state that this effect is only present when the stimulus is 1) consciously accessed, and 2) attended. The evidence for claim 1 is not supported by experiment 1, as the masking manipulation did not interact in the cluster analyses, and the analyses focussing on the peak of the timing window do not show a significant effect either. There is evidence for this claim coming from experiment 2 as masking interacts with the drug condition. Evidence for the second claim (about task relevance) is not presented, as there is no interaction with the task condition. A classical error seems to be made here, where interactions are not properly tested. Instead, the presence of a significant effect in one condition but not the other is taken as sufficient evidence for an interaction, which is not appropriate. I therefore urge the authors to dampen the claim about the importance of attending to the decoded features. Alternatively, I suggest the authors run their interactions of interest on the time-courses and conduct the appropriate cluster-based analyses.

      We thank the reviewer for pointing out the importance of key interaction effects. Following the reviewer’s suggestion, we dampened our claims about the role of attention. For experiment 1, we changed the heading of the relevant results section from “Memantine’s effect on illusion decoding requires attention” to “The role of consciousness and attention in memantine’s effect on illusion decoding”, and we added the following in the results section:

      “Also our time window-based analyses showed a significant effect of memantine only when the illusion was both unmasked and presented outside the AB (t<sub>28</sub> = -2.76, P = 0.010, BF<sub>10</sub> = 4.53; Fig. 3F). Note, however, that although these post-hoc tests of the effect of memantine on illusion decoding were significant, for our time window-based analyses we did not obtain a statistically significant interaction between the AB and memantine, and the interaction between masking and memantine only approached significance (P = 0.068). Thus, although these memantine effects were slightly less robust than for T1, probably due to reduced trial counts, these results point to (but do not conclusively demonstrate) a selective effect of memantine on illusion-related feedback processing that depends on the availability of attention. In addition to the lack of the interaction effect, another potential concern…”

      For experiment 2, we added the following in the results section:

      “Note that, for our time window-based analyses of illusion decoding, although the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking, we did not obtain a statistically significant interaction between memantine and task-relevance. Thus, although the memantine effect was significant only when the illusion was unmasked and task-relevant, just like for the effect of temporal attention in experiment 1, these results do not conclusively demonstrate a selective effect of memantine that depends on attention (task-relevance).”

      In the discussion, we toned down claims about memantine’s effects being specific to attended conditions, we are highlighting the “preliminary” nature of these findings, and we are now alerting the reader explicitly to be careful with interpreting these effects, e.g.:

      “Although these results have to be interpreted with caution because the key interaction effects were not statistically significant, …”
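The interaction analyses discussed above (testing a drug-by-masking or drug-by-attention contrast across the full decoding time-course rather than in a single window) are conventionally done with a cluster-based permutation test. The sketch below is a minimal pure-Python illustration of that procedure, not the study's actual pipeline; the data shapes and thresholds are hypothetical:

```python
import random
from statistics import fmean, stdev

def t_stat(xs):
    """One-sample t statistic against zero."""
    return fmean(xs) / (stdev(xs) / len(xs) ** 0.5)

def cluster_masses(tvals, thresh):
    """Summed |t| over each contiguous supra-threshold run of timepoints."""
    masses, cur = [], 0.0
    for t in tvals:
        if abs(t) > thresh:
            cur += abs(t)
        elif cur:
            masses.append(cur)
            cur = 0.0
    if cur:
        masses.append(cur)
    return masses

def cluster_perm_test(contrast, thresh=2.0, n_perm=1000, seed=0):
    """contrast: list of per-subject time courses of the interaction contrast,
    e.g. the (drug - placebo) decoding difference, unmasked minus masked.
    Returns observed cluster masses and sign-flip permutation p-values."""
    rng = random.Random(seed)
    n_time = len(contrast[0])
    obs_t = [t_stat([subj[i] for subj in contrast]) for i in range(n_time)]
    obs = cluster_masses(obs_t, thresh)
    null_max = []
    for _ in range(n_perm):
        # Under the null the contrast is exchangeable in sign per subject.
        flips = [rng.choice((-1.0, 1.0)) for _ in contrast]
        t = [t_stat([f * subj[i] for f, subj in zip(flips, contrast)])
             for i in range(n_time)]
        m = cluster_masses(t, thresh)
        null_max.append(max(m) if m else 0.0)
    pvals = [sum(nm >= c for nm in null_max) / n_perm for c in obs]
    return obs, pvals
```

Comparing each observed cluster mass against the maximum null cluster mass per permutation controls the family-wise error rate across timepoints, which is what makes this the appropriate test for time-resolved interaction effects.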

      How were the lengths of the peak-timing windows established in Figure 1E? My understanding is that this forms the training-time window for the further decoding analyses, so it is important to justify why they have different lengths, and how they are determined. The same goes for the peak AUC time windows for the interaction analyses. A number of claims in the paper rely on the interactions found in these post-hoc analyses, so the 223- to 323-ms time window needs justification.

      Thanks for this question. The lengths of these peak-timing windows differ because the decoding of rotation is temporally very precise and short-lived, whereas the decoding of the other features lasts much longer and is more temporally variable. In fact, we have followed the same procedure as in a previously published study (Noorman et al., eLife 2025) for defining the peak timing and length of the windows. We followed the same procedure for both experiments reported in this paper, replicating the crucial findings and therefore excluding the possibility that these findings are in any way dependent on the time windows that are selected. We have added that information to the revised version of the manuscript.

      Reviewer #3:

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks, etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      We thank the reviewer for the positive assessment of our work. The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data. We agree with the possible reasons for the absence of such an effect highlighted by the reviewer, and expanded our discussion section accordingly:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that in our previous work we found a tight link between these EEG decoding markers and behavioral performance (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      We would like to refer to our reply to the previous point, and we would like to add that in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023) similar EEG decoding markers were often tightly linked to changes in behavioral performance. This indicates that these particular EEG decoding markers do not simply reflect some side-effect not related to neural processing. However, as stated in the revised discussion section, “it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

  2. social-media-ethics-automation.github.io
    1. Olivia Solon. 'It's digital colonialism': how Facebook's free internet service has failed its users.

      This article is about how Facebook offers a free, tester version of the internet to people around the world. The service has been criticized as digital colonialism because it only provides certain information and restricts its users' access. Meta already has a huge hold on the digital market, and this is just another way of exploiting people who can't afford to know better.

  3. May 2025
    1. The real threat to your progress isn’t failure—it’s lack of focus. There are plenty of opportunities out there that are clear, obvious, and entirely wrong. Just because you can do it, doesn’t mean you should do it.

      You can only win if you can focus on what matters most

    1. Once English became the standard language for programming, people who learn programming learn English (or enough to program with it). Attempts to create a non-English programming language face an uphill battle,

      It's sad to learn that people who want to learn how to program have to learn English, since the majority of programming is in English. It is just another way of keeping minorities in a box and adds a layer of struggle to gaining an education in programming. It's truly unfortunate and unfair; I understand that some common words stay in English in order to keep some things consistent, but it gets to a point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The data are generated using ATP read-out (CTG assay). For any inhibitor of mitochondrial function, ATP assays are highly sensitive reflecting metabolic stress, yet these do not necessarily translate into cell growth inhibition using standard Trypan blue assays and tend to overestimate the effects. Please show orthogonal more robust assays of cell growth or proliferation.

      We acknowledge the sensitivity of the ATP read-out assay in reflecting metabolic stress. While additional cell growth assays such as Trypan blue exclusion could provide further insights, we believe that the current ATP assay data robustly demonstrate the effect of the IMT and venetoclax combination on cellular metabolism, which is a critical aspect of our study. The scope of our current work focused on metabolic inhibition, and we suggest that future studies could further explore cell proliferation assays to complement these findings.

      (2) It is concluded that AML cells do not utilize glucose for ATP production. Please provide formal measurements of glycolysis/lactate upon combinatorial treatment.

      We appreciate the reviewer’s suggestion to include glycolysis and lactate measurements, which could indeed add further granularity to our metabolic analysis. However, the primary focus of our study is on mitochondrial function and oxidative phosphorylation (OXPHOS) in AML cells treated with IMT and venetoclax. We believe the data presented in Figure 3 provide strong support for the conclusion that glycolysis is not a major energy source in these cells.

      Specifically, in Figure 3C, we demonstrate that AML cells maintain ATP levels and viability when cultured in galactose, a condition that restricts ATP production through glycolysis and forces cells to rely on OXPHOS. This result strongly suggests that these AML cells are not dependent on glycolysis for ATP production. Furthermore, in Supplementary Figure S3B, we show that oxygen consumption rate (OCR) measurements remain stable in the presence of excess glucose, further supporting our conclusion that the cells do not switch to glycolysis when OXPHOS is inhibited.

      These findings collectively indicate a primary reliance on OXPHOS for energy generation in AML cells, consistent with our study’s objectives to explore mitochondrial dependency and the therapeutic potential of targeting mitochondrial transcription in AML. Future studies could certainly expand on these insights by incorporating a more detailed analysis of glycolytic flux and lactate production under combinatorial treatment, but we believe the current data are sufficient to support our main conclusions.

      (3) The transcriptome data are shown without any analysis of pathways. The conclusion from this data beyond the higher number of genes impacted in the combination arm is unclear. Please provide analysis for example GO pathways and interpret in the context of the drugs' mechanism of action.

      In response to the reviewer’s question, we have added gene ontology (GO) pathway analysis to clarify the transcriptomic impact of our combination treatment with IMT and venetoclax. Functional annotation identified significant enrichment in pathways relevant to innate immune response, mitochondrial function, and cellular signaling processes. Specifically, pathways associated with immune defense, mitochondrial signaling, and intracellular signaling were notably affected. These findings suggest that the combination treatment not only disrupts cellular energy metabolism but also potentially primes immune signaling mechanisms. This aligns with the proposed mechanism, where IMT targets mitochondrial transcription and venetoclax induces apoptosis, together enhancing sensitivity in AML cells. The enriched pathways, therefore, support the mechanism of action of both drugs, showing how the combined inhibition of BCL-2 and mitochondrial transcription creates a compounded cellular disruption that enhances the therapeutic effect.
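Enrichment calls of the kind described (testing whether a GO term is over-represented among differentially expressed genes) are conventionally made with a one-sided hypergeometric (Fisher) test on gene counts. A minimal standard-library sketch; the gene numbers used below are hypothetical, not values from this study:

```python
from math import comb

def go_enrichment_p(N, K, n, k):
    """Hypergeometric upper tail P(X >= k): the probability that at least k
    of n differentially expressed genes fall in a GO term containing K genes,
    out of N annotated genes total."""
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Hypothetical counts: 2,000 genes, 40 in the term, 100 DE, 8 in the overlap
# (expected overlap by chance is 100 * 40 / 2000 = 2, so this is enriched).
p = go_enrichment_p(2000, 40, 100, 8)
```

In practice the resulting p-values would be corrected for multiple testing across all GO terms (e.g., Benjamini-Hochberg), which annotation tools typically do automatically.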

      (4) Please demonstrate (could be in supplement) matrix of combination to support the statement that the combination is synergistic using Bliss index. The actual Bliss values are missing.

      For the revision, we have now included a matrix of combination treatment effects with the corresponding Bliss synergy index values to substantiate our claim of synergy between IMT and venetoclax. This analysis, provided in the supplement, demonstrates that the observed effects exceed the expected additive impact of each drug alone, as calculated by the Bliss independence model. Specifically, the Bliss values confirm a synergistic interaction in venetoclax-sensitive AML cell lines, highlighting that the combined treatment significantly enhances inhibition of cell viability and apoptosis induction compared to single treatments. This data supports our interpretation of synergy and strengthens the mechanistic conclusions drawn from our findings on the combination therapy’s efficacy.
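The Bliss independence model invoked here treats the two drugs as acting independently: for fractional effects E_A and E_B (each between 0 and 1), the expected combined effect is E_AB = E_A + E_B - E_A*E_B, and an observed effect above that expectation scores as synergy. A minimal sketch with illustrative fractional-inhibition values, not measurements from this study:

```python
def bliss_expected(e_a, e_b):
    """Expected combined fractional effect under Bliss independence."""
    return e_a + e_b - e_a * e_b

def bliss_score(e_a, e_b, e_ab_observed):
    """Observed minus expected; > 0 indicates synergy, < 0 antagonism."""
    return e_ab_observed - bliss_expected(e_a, e_b)

# Illustrative values: 30% and 40% inhibition alone, 75% in combination.
score = bliss_score(0.30, 0.40, 0.75)  # expected is 0.58, so score is ~0.17
```

A full synergy matrix of the kind added to the supplement would evaluate this score over a grid of dose pairs, with positive average scores across the matrix supporting synergy.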

      (5) Please show KG1 data (OCR), here or in Supplement.

      In response to the reviewer’s request to include OCR data for the KG-1 cell line, we would like to clarify that OCR measurements were attempted; however, they did not yield conclusive results. This is noted in the revised manuscript (Results section), where we explain that the KG-1 cell line did not provide usable OCR data, likely due to limitations in detecting reliable mitochondrial respiration in this particular line under our experimental conditions. Therefore, we were unable to include KG-1 OCR data in the main figures or the supplement.

      Reviewer #2:

      (1) It's important that the authors show that the drug's effects in AML are due to on-target inhibition. It's critical that they show that IMT actually inhibits the mito polymerase in the AML cells in the dose range employed.

      We appreciate the importance of demonstrating on-target inhibition of mitochondrial RNA polymerase by IMT1, especially in light of the detailed characterization of IMT1b, a closely related compound, as presented in Bonekamp et al., Nature 2020. The work by Bonekamp et al. established the specificity and efficacy of IMT1b in targeting mitochondrial RNA polymerase across various tumor models. Building on these findings, we designed our study to primarily evaluate the combinatorial efficacy of IMT1 with venetoclax in AML models, assuming a similar mechanism of action as described for IMT1b. While direct confirmation of on-target inhibition in AML cells by IMT1 would undoubtedly provide additional mechanistic insight, we focused on translational aspects in this study. We believe that the foundational work provided by Bonekamp et al. supports the assumption of on-target effects by IMT1, and we suggest that future studies could explicitly verify this in the context of AML.

      (2) For Fig 1, the stated synergism between Venetoclax (Vex) and IMT in p53 mutant THP1 cells is really not evident, despite what the statistical analysis says. In some ways, the more interesting conclusion is that inhibiting mitochondrial transcription does NOT potentiate the efficacy of Bcl2 inhibition in TP53 mutant AML.

      We appreciate the reviewer’s observation regarding the lack of evident synergy between IMT and venetoclax in TP53 mutant THP-1 cells. In line with this comment, we have now expanded the discussion to emphasize that, while statistical analysis suggested a potential interaction, the biological response in TP53 mutant cells was minimal. This contrasts with the strong synergy observed in TP53 wild-type cell lines, such as MV4-11 and MOLM-13. We have now highlighted that TP53 mutation status may limit the effectiveness of mitochondrial transcription inhibition in potentiating BCL-2 inhibition. This addition underscores the importance of mutation profiles, such as TP53 status, in predicting response to combination therapies in AML and is now clearly addressed in the revised discussion.

      (3) They combine IMT with Vex, but Vex plus azacytidine or decitabine is the approved therapy for AML. Any clinical trial would likely start with this backbone (like Vex+Aza). They should test combinations of IMT with Vex/Aza or Vex/Dec.

      While we recognize the importance of testing IMT in combination with clinically approved therapies like Vex+Aza, our current study was designed to explore the potential of IMT in combination with venetoclax alone. Expanding to other combinations would be an excellent direction for future research but is beyond the scope of our current investigation.

      (4) It's interesting that AML cell lines do not show any reliance on ATP generation from glycolysis, but would this still be the case when OxPhos is inhibited with IMT? Such a simple experiment would be much more interesting and could help them better understand the mechanism of IMT efficacy.

      We thank the reviewer for highlighting this point regarding the reliance of AML cell lines on glycolysis under OxPhos inhibition. In our study, we observed that AML cells predominantly rely on OxPhos, and we did test for ATP production in conditions that favored glycolysis by growing AML cells with galactose instead of glucose in the medium. As described in the manuscript, we did not observe significant ATP production or cell viability from glycolysis, even under these conditions. This finding suggests that AML cells have a low capacity to adapt to glycolytic ATP generation when OxPhos is disrupted by IMT, reinforcing the view that they are highly dependent on mitochondrial function for energy production. We agree that this adaptation—or lack thereof—is an intriguing aspect of IMT efficacy in targeting energy metabolism in AML cells, and we have clarified this point in the discussion.

      (5) OxPhos measurements need statistical analyses.

      We appreciate the reviewer’s suggestion to include statistical analyses for the OXPHOS measurements. We would like to clarify that statistical analyses were included in the initial submission. These are detailed in Figure 3 and its legend, as well as in the Statistical Analysis section, where we specify methods such as the calculation of standard error across replicates. This approach was implemented to ensure the rigor of our OCR data and its conclusions on OXPHOS inhibition in AML cells.

      (6) Given that the combo-treated mice do not exhibit much leukemia in the blood through ~180 days, and yet start dying after 100 days, the authors should comment on this, given that the bone marrow has been shown to be a refuge that protects leukemia cells from various therapies.

      We thank the reviewer for highlighting the observed discrepancy between peripheral blood leukemia levels and survival in combo-treated mice. While leukemic cells were minimally detected in the blood up to approximately 180 days, treated mice began to show signs of disease progression and reduced survival around 100 days. This may suggest that residual leukemic cells persist within the bone marrow, which has been established as a sanctuary site for leukemic cells, providing protection from various therapies. The bone marrow environment likely supports a survival niche, enabling these residual cells to evade treatment effects and potentially initiate disease relapse. We have added this interpretation to the discussion to acknowledge the possibility of bone marrow as a protective refuge, which may limit the full eradication of leukemia in these models despite apparent peripheral blood clearance.

      (7) For Fig 5C, the authors should statistically compare the Combo with Vex alone.

      We have now included statistical comparisons between the combination treatment and venetoclax alone in Fig 5C to provide a clearer interpretation of the data.

      (8) The analyses of gene expression using RNAseq of harvested leukemia cells from the PDX model (Table S2), some more discussion of these results would be helpful, particularly given that neither drug is directly targeting nuclear gene expression.

      We thank the reviewer for their suggestion to discuss the RNAseq findings in more detail. In the revised manuscript, we have expanded on the functional annotation of the gene expression changes observed in leukemia cells from the PDX model following combination treatment (Table S2). The enriched pathways include innate immune involvement, mitochondrial function and immune signaling, and intracellular signaling. This suggests that while neither IMT nor venetoclax directly targets nuclear gene expression, the combined treatment induces secondary effects that alter these pathways, potentially contributing to the treatment’s efficacy in AML. This expanded discussion provides greater insight into how the drug combination impacts gene expression and cellular pathways.

      (9) We need more information on the PDX models, in terms of the classification (M1 to M6) of the patient AMLs and genetics (specific mutations, not just the genes mutated, and chromosomal alterations).

      Additional details regarding the classification and genetic background of the PDX models have been included in the manuscript to better contextualize our findings.

      (10) The authors should discuss whether or not IMT represents an improvement over other therapies intended to target Oxphos in AML (clearly, the low toxicity of IMT is a plus, at least in mice).

      We appreciate the reviewer’s suggestion to discuss IMT in comparison with other OXPHOS-targeting therapies for AML. In the revised discussion, we highlight IMT’s unique properties, particularly its low toxicity profile, which may offer advantages over other OXPHOS inhibitors. This low toxicity, demonstrated in preclinical studies, suggests that IMT might improve patient tolerability compared to existing therapies that target mitochondrial function.

      (11) The authors examined toxicity by weighing the mice and performing CBCs. Measurements of liver and kidney toxicity will be necessary for further clinical development.

      We thank the reviewer for the suggestion to further investigate liver and kidney toxicity. In our study, we assessed toxicity through regular weight monitoring and complete blood counts (CBCs) to evaluate overall health status. While additional liver and kidney toxicity measurements will indeed be important in future studies, resource limitations currently prevent us from performing these additional analyses in this model. We agree that these assessments will be essential as we progress towards clinical development, and we plan to address them in upcoming preclinical studies.

    1. The tendency for people to exaggerate, after knowing that something occurred, how much they could have predicted it before it occurred

      Really though, who doesn't do that. I am not totally sure that sometimes it's an exaggeration or something that we think we know subconsciously or if we just want to feel like we know it all. Who can relate to this?

    1. Author response:

      The following is the authors’ response to the previous reviews

      General Response to Reviewers:

      We thank the Reviewers for their comments, which continue to substantially improve the quality and clarity of the manuscript, and therefore help us to strengthen its message while acknowledging alternative explanations.

      All three reviewers raised the concern that we have not proven that Rab3A is acting on a presynaptic mechanism to increase mEPSC amplitude after TTX treatment of mouse cortical cultures.  The reviewers’ main point is that we have not shown a lack of upregulation of postsynaptic receptors in mouse cortical cultures. We want to stress that we agree that postsynaptic receptors are upregulated after activity block in neuronal cultures.  However, the reviewers are not acknowledging that we have previously presented strong evidence at the mammalian NMJ that there is no increase in AChR after activity blockade, and therefore the requirement for Rab3A in the homeostatic increase in quantal amplitude points to a presynaptic contribution. We agree that we should restrict our firmest conclusions to the data in the current study, but in the Discussion we are proposing interpretations. We have added the following new text:

      “The impetus for our current study was two previous studies in which we examined homeostatic regulation of quantal amplitude at the NMJ.  An advantage of studying the NMJ is that synaptic ACh receptors are easily identified with fluorescently labeled alpha-bungarotoxin, which allows for very accurate quantification of postsynaptic receptor density. We were able to detect a known change due to mixing 2 colors of alpha-BTX to within 1% (Wang et al., 2005).  Using this model synapse, we showed that there was no increase in synaptic AChRs after TTX treatment, whereas miniature endplate current increased 35% (Wang et al., 2005). We further showed that the presynaptic protein Rab3A was necessary for full upregulation of mEPC amplitude (Wang et al., 2011). These data strongly suggested Rab3A contributed to homeostatic upregulation of quantal amplitude via a presynaptic mechanism.  With the current study showing that Rab3A is required for the homeostatic increase in mEPSC amplitude in cortical cultures, one interpretation is that in both situations, Rab3A is required for an increase in the presynaptic quantum.”

      The point we are making is that the current manuscript is an extension of that work, and that our interpretation of the variable upregulation of postsynaptic receptors in our mouse cortical cultures further supports the idea that there is a Rab3A-dependent presynaptic contribution to homeostatic increases in quantal amplitude.

      Public Reviews:

      Reviewer #1 (Public review):

      Koesters and colleagues investigated the role of the small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cortical cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed no significant changes in GluA2 puncta size, intensity, and integral after TTX treatment in control and Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which neuronal Rab3A is required for homeostatic scaling of synaptic transmission, potentially through GluA2-independent mechanisms.

      The major finding - impaired homeostatic up-scaling after TTX treatment in Rab3A KO and Rab3 earlybird mutant neurons - is supported by data of high quality. However, the paper falls short of providing any evidence or direction regarding potential mechanisms. The data on GluA2 modulation after TTX incubation are likely statistically underpowered, and do not allow drawing solid conclusions, such as GluA2-independent mechanisms of up-scaling.

      The study should be of interest to the field because it implicates a presynaptic molecule in homeostatic scaling, which is generally thought to involve postsynaptic neurotransmitter receptor modulation. However, it remains unclear how Rab3A participates in homeostatic plasticity.

      Major (remaining) point:

      (1) Direct quantitative comparison between electrophysiology and GluA2 imaging data is complicated by many factors, such as different signal-to-noise ratios. Hence, comparing the variability of the increase in mini amplitude vs. GluA2 fluorescence area is not valid. Thus, I recommend removing the sentence "We found that the increase in postsynaptic AMPAR levels was more variable than that of mEPSC amplitudes, suggesting other factors may contribute to the homeostatic increase in synaptic strength." from the abstract.

      We have not removed the statement, but altered it to soften the conclusion. It now reads, “We found that the increase in postsynaptic AMPAR levels in wild type cultures was more variable than that of mEPSC amplitudes, which might be explained by a presynaptic contribution, but we cannot rule out variability in the measurement.”

      Similarly, the data do not directly support the conclusion of GluA2-independent mechanisms of homeostatic scaling. Statements like "We conclude that these data support the idea that there is another contributor to the TTX- induced increase in quantal size." should be thus revised or removed.

      This particular statement appears only in the previous response to reviewers. We deleted the sentence that starts, “The simplest explanation Rab3A regulates a presynaptic contributor….”, and “Imaging of immunofluorescence more variable…”. We also deleted “our data suggest….consistently leads to an increase in mEPSC amplitude and sometimes leads to….” We added “…the lack of a robust increase in receptor levels leaves open the possibility that there is a presynaptic contributor to quantal size in mouse cortical cultures. However, the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Reviewer #2 (Public review):

      I thank the authors for their efforts in the revision. In general, I believe the main conclusion that Rab3A is required for TTX-induced homeostatic synaptic plasticity is well-supported by the data presented, and this is an important addition to the repertoire of molecular players involved in homeostatic compensations. I also acknowledge that the authors are more cautious in making conclusions based on the current evidence, and the structure and logic have been much improved.

The only major concern I have still falls on the interpretation of the mismatch between GluA2 cluster size and mEPSC amplitude. The authors argue that they are only trying to say that changes in the cluster size are more variable than those in the mEPSC amplitude, and they provide multiple explanations for this mismatch. It seems incongruous to state that the simplest explanation is a presynaptic factor when you have all these alternative factors that very likely have contributed to the results. Further, the authors speculate in the discussion that Rab3A does not regulate postsynaptic GluA2 but instead regulates a presynaptic contributor. Do the authors mean that, in their model, the mEPSC amplitude increases can be attributed to two factors: postsynaptic GluA2 regulation and a presynaptic contribution (which is regulated by Rab3A)? If so, and Rab3A does not affect GluA2 whatsoever, shouldn't we see GluA2 increase even in the absence of Rab3A? The data in Table 1 seem to indicate otherwise.

      The main body of this comment is addressed in the General Response to Reviewers. In addition, we deleted text “current data, coupled with our previous findings at the mouse neuromuscular junction, support the idea that there are additional sources contributing to the homeostatic increase in quantal size.” We added new text, so the sentence now reads: “Increased receptors likely contribute to increases in mESPC amplitudes in mouse cortical cultures, but because we do not have a significant increase in GluA2 receptors in our experiments, it is impossible to conclude that the increase is lacking in cultures from Rab3A<sup>-/-</sup> neurons.”

      I also question the way the data are presented in Figure 5. The authors first compare 3 cultures and then 5 cultures altogether, if these experiments are all aimed to answer the same research question, then they should be pooled together. Interestingly, the additional two cultures both show increases in GluA2 clusters, which makes the decrease in culture #3 even more perplexing, for which the authors comment in line 261 that this is due to other factors. Shouldn't this be an indicator that something unusual has happened in this culture?

      Data in this figure is sufficient to support that GluA2 increases are variable across cultures, which hardly adds anything new to the paper or to the field. 

      A major goal of performing the immunofluorescence measurements in the same cultures for which we had electrophysiological results was to address the common impression that the homeostatic effect itself is highly variable, as the reviewer notes in the comment “…GluA2 increases are variable across cultures…” Presumably, if GluA2 increases are the mechanism of the mEPSC amplitude increases, then variable GluA2 increases should correlate with variable mEPSC amplitude increases, but that is not what we observed. We are left with the explanation that the immunofluorescence method itself is very variable. We have added the point to the Discussion, which reads, “the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent homeostatic plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

Finally, the implication of “Shouldn’t this be an indicator that something unusual has happened in this culture?”, if the result is not due to culture-to-culture variability in the homeostatic response itself, is that there was a technical problem with accurately measuring receptor levels. We have no reason to suspect anything was amiss in this set of coverslips (the values for controls and for TTX-treated were not outside the range of values in other experiments). In any of the coverslips, there may be variability in the amount of primary anti-GluA2 antibody, as this was added directly to the culture rather than prepared as a diluted solution and added to all the coverslips. But to remove this one experiment because it did not give the expected result would be to allow bias to direct our data selection.

      The authors further cite a study with comparable sample sizes, which shows a similar mismatch based on p values (Xu and Pozzo-Miller 2007), yet the effect sizes in this study actually match quite well (both ~160%). P values cannot be used to show whether two effects match, but effect sizes can. Therefore, the statement in lines 411-413 "... consistently leads to an increase in mEPSC amplitudes, and sometimes leads to an increase in synaptic GluA2 receptor cluster size" is not very convincing, and can hardly be used to support "the idea that there are additional sources contributing to the homeostatic increase in quantal size.”

      We have the same situation; our effect sizes match (19.7% increase for mEPSC amplitude; 18.1% increase for GluA2 receptor cluster size, see Table 1), but in our case, the p value for receptors does not reach statistical significance. Our point here is that there is published evidence that the variability in receptor measurements is greater than the variability in electrophysiological measurements. But we have softened this point, removing the sentences containing “…consistently leads and sometimes...” and “……additional sources contributing…”.

      I would suggest simply showing mEPSC and immunostaining data from all cultures in this experiment as additional evidence for homeostatic synaptic plasticity in WT cultures, and leave out the argument for "mismatch". The presynaptic location of Rab3A is sufficient to speculate a presynaptic regulation of this form of homeostatic compensation.

      We have removed all uses of the word “mismatch,” but feel the presentation of the 3 matched experiments, 23-24 cells (Figure 5A, D), and the additional 2 experiments for a total of 5 cultures, 48-49 cells (Figure 5C, F), is important in order to demonstrate that the lack of statistically significant receptor response is due neither to a variable homeostatic response in the mEPSC amplitudes, nor to a small number of cultures.

      Minor concerns:

      (1) Line 214, I see the authors cite literature to argue that GluA2 can form homomers and can conduct currents. While GluA2 subunits edited at the Q/R site (they are in nature) can form homomers with very low efficiency in exogenous systems such as HEK293 cells (as done in the cited studies), it's unlikely for this to happen in neurons (they can hardly traffic to synapses if possible at all).

      We were unable to identify a key reference that characterized GluA2 homomers vs. heteromers in native cortical neurons, but we have rewritten the section in the manuscript to acknowledge the low conductance of homomers:

      “…to assess whether GluA2 receptor expression, which will identify GluA2 homomers and GluA2 heteromers (the former unlikely to contribute to mEPSCs given their low conductance relative to heteromers (Swanson et al., 1997; Mansour et al., 2001)…”

      (2) Lines 221-222, the authors may have misinterpreted the results in Turrigiano 1998. This study does not show that the increase in receptors is most dramatic in the apical dendrite, in fact, this is the only region they have tested. The results in Figures 3b-c show that the effect size is independent of the distance from soma.

Figure 3 in Turrigiano et al. (1998) shows that the increase in glutamate responsiveness is higher at the cell body than along the primary dendrite. We have revised our description to indicate that an increase in responsiveness on the primary dendrite has been demonstrated in Turrigiano et al. (1998).

      “We focused on the primary dendrite of pyramidal neurons as a way to reduce variability that might arise from being at widely ranging distances from the cell body, or, from inadvertently sampling dendritic regions arising from inhibitory neurons. In addition, it has been shown that there is a clear increase in response to glutamate in this region (Turrigiano et al., 1998).”

“…synaptic receptors on the primary dendrite, where a clear increase in sensitivity to exogenously applied glutamate was demonstrated (see Figure 3 in (Turrigiano et al., 1998)).”

      (3) Lines 309-310 (and other places mentioning TNFa), the addition of TNFa to this experiment seems out of place. The authors have not performed any experiment to validate the presence/absence of TNFa in their system (citing only 1 study from another lab is insufficient). Although it's convincing that glia Rab3A is not required for homeostatic plasticity here, the data does not suggest Rab3A's role (or the lack of) for TNFa in this process.

      We have modified the paragraph in the Discussion that addresses the glial results, to describe more clearly the data that supported an astrocytic TNF-alpha mechanism: “TNF-alpha accumulates after activity blockade, and directly applied to neuronal cultures, can cause an increase in GluA1 receptors, providing a potential mechanism by which activity blockade leads to the homeostatic upregulation of postsynaptic receptors (Beattie et al., 2002; Stellwagen et al., 2005; Stellwagen and Malenka, 2006).”

      We have also acknowledged that we cannot rule out TNF-alpha coming from neurons in the cortical cultures: “…suggesting the possibility that neuronal Rab3A can act via a non-TNF-alpha mechanism to contribute to homeostatic regulation of quantal amplitude, although we have not ruled out a neuronal Rab3A-mediated TNF-alpha pathway in cortical cultures.”

      Reviewer #3 (Public review):

      This manuscript presents a number of interesting findings that have the potential to increase our understanding of the mechanism underlying homeostatic synaptic plasticity (HSP). The data broadly support that Rab3A plays a role in HSP, although the site and mechanism of action remain uncertain.

The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength is already elevated. In this context, it is unclear if the plasticity is absent, already induced by this mutation, or just occluded by a ceiling effect due to the synapses already being strengthened. Occlusion may also occur in the mixed cultures when Rab3A is missing from neurons but not astrocytes. The authors do appropriately discuss these options. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between changes in synaptic strength and AMPA receptor trafficking during HSP, and conclude that trafficking may not be solely responsible for the changes in synaptic strength during HSP.

      Strengths:

      This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is likely only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms, including whether Rab3A is active pre-synaptically to regulate quantal amplitude.

As Rab3A is primarily known as a pre-synaptic molecule, this possibility is intriguing. However, it is based on the partial dissociation of AMPAR trafficking and synaptic response and lacks strong support. On average, they saw a similar magnitude of change in mEPSC amplitude and GluA2 cluster area and integral, but the GluA2 data was not significant due to higher variability. It is difficult to determine if this is due to biology or methodology - the imaging method involves assessing puncta pairs (GluA2/VGlut1) clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, with usually less than 20 synapses per neuron analyzed, which would be expected to be more variable than mEPSC recordings averaged across several hundred events. However, when they reduce the mEPSC number of events to similar numbers as the imaging, the mEPSC amplitudes are still less variable than the imaging data. The reason for this remains unclear. The pool of sampled synapses is still different between the methods and recent data has shown that synapses have variable responses during HSP. Further, there could be variability in the subunit composition of newly inserted AMPARs, and only assessing GluA2 could mask this (see below). It is intriguing that pre-synaptic changes might contribute to HSP, especially given the likely localization of Rab3A. But it remains difficult to distinguish if the apparent difference in imaging and electrophysiology is a methodological issue rather than a biological one. Stronger data, especially positive data on changes in release, will be necessary to conclude that pre-synaptic factors are required for HSP, beyond the established changes in post-synaptic receptor trafficking.

      Regarding the concern that the lack of increase in receptors is due to a technical issue, please see General Response to Reviewers, above. We have also softened our conclusions throughout, acknowledging we cannot rule out a technical issue.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a strong frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. But the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the conclusions about the GluA2 imaging as compared to the mEPSC amplitude data.

The key finding in Figure 3 is that NASPM did not eliminate the statistically significant increase in mEPSC amplitude after TTX treatment (Fig 3A). Whether or not NASPM-sensitive receptors contribute to mEPSC amplitude is a separate question (Fig 3B). We are open to the possibility that NASPM reduces mEPSC amplitude in both control and TTX treated cells (p = 0.08 for both), but that does not change our conclusion that NASPM has no effect on the TTX-induced increase in mEPSC amplitude. The mechanism underlying the decrease in mEPSC frequency following NASPM is interesting, but does not alter our conclusions regarding the role of Rab3A in homeostatic synaptic plasticity of mEPSC amplitude. In addition, the Reviewer does not acknowledge Supplemental Figure 1, which shows a similar lack of correspondence between homeostatic increases in mEPSC amplitude and GluA1 receptors in two cultures where matched data were obtained. Therefore, we do not think our lack of a robust increase in receptors can be explained by our failing to look at the relevant receptor.

      To understand the role of Rab3A in HSP will require addressing two main issues:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. More concrete support for the authors' suggestion of a pre-synaptic site of control would be helpful.

We agree that definitive evidence for a presynaptic role of Rab3A in homeostatic plasticity of mEPSC amplitudes in mouse cortical cultures requires demonstrating that loss of Rab3A in postsynaptic neurons does not disrupt the plasticity, whereas loss in presynaptic neurons does. Without these data, we can only speculate that the Rab3A-dependence of homeostatic plasticity of quantal size in cortical neurons may be similar to that of the neuromuscular junction, where it cannot be receptors. We have added to the Discussion that the mechanism of Rab3A regulation of homeostatic plasticity of quantal amplitude could differ between cortical neurons and the neuromuscular junction (lines 448-450 in markup). Establishing a way to co-culture Rab3A-/- and Rab3A+/+ neurons in ratios that would allow us to record from a Rab3A-/- neuron that has mainly Rab3A+/+ inputs (or vice versa) is not impossible, but requires either transfection or transgenic expression with markers that identify the relevant genotype, and will be the subject of future experiments.

      (2): Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs or a decrease in GABA release (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at those synapses.

      We agree with the Reviewer, that it is important to determine the generality of Rab3A function in homeostatic plasticity. Establishing the homeostatic effect on mIPSCs and then examining them in Rab3A-/- cultures is a large undertaking and will be the subject of future experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor (remaining) points:

      (1) The figure referenced in the first response to the reviewers (Figure 5G) does not exist.

      We meant Figure 5F, which has been corrected in the current response.

      (2) I recommend showing the data without binning (despite some overlap).

The box plot in Origin does not allow unbinned data, but we can make the bin size so small that, for all intents and purposes, there is close to one sample per bin. When we do this, the majority of the data overlap in a straight vertical line. Previously described concerns were regarding the gaps in the data, but it should be noted that these are cell means; we are not depicting the distributions of mEPSC amplitudes within a recording or across multiple recordings.

      (3) Please auto-scale all axes from 0 (e.g., Fig 1E, F).

      We have rescaled all mEPSC amplitude axes in box plots to go from 0 (Figures 1, 2 and 6).

      (4) Typo in Figure legend 3: "NASPM (20 um)" => uM

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 140, frequencies are reported in Hz while other places are in sec-1, while these are essentially the same, they should be kept consistent in writing.

      All mEPSC frequencies have been changed to sec<sup>-1</sup>, except we have left “Hz” for repetitive stimulation and filtering.

      (2) Paragraph starting from line 163 (as well as other places where multiple groups are compared, such as the occlusion discussion), the authors assessed whether there was a change in baseline between WT and mutant group by doing pairwise tests, this is not the right test. A two-way ANOVA, or at least a multivariant test would be more appropriate.

We have performed a two-way ANOVA, with genotype as one factor and treatment as the other factor. The p values in Figures 1 and 2 have been revised to reflect p values from the post-hoc Tukey test on the specific interactions (for each particular genotype, TTX vs CON effects). The difference in the two WT strains, untreated, was not significant in the post-hoc Tukey test, and we have revised the text. The difference between the untreated WT from the Rab3A+/Ebd colony and the untreated Rab3AEbd/Ebd mutant was still significant in the post-hoc Tukey test, and this has replaced the Kruskal-Wallis test. The two-way ANOVA was also applied to the neuron-glia experiments and p values in Figure 6 adjusted accordingly.
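To illustrate the design referenced above, here is a minimal pure-Python sketch of a balanced two-way ANOVA with genotype and treatment as factors. The `two_way_anova` helper and the example numbers are hypothetical illustrations, not the manuscript's analysis code or data (the actual analysis was run in statistics software with a post-hoc Tukey test); the sketch stops at the three F statistics.

```python
# Minimal balanced two-way ANOVA (factor A x factor B), pure Python.
# The function and the data values below are hypothetical illustrations.

def mean(xs):
    return sum(xs) / len(xs)

def two_way_anova(cells):
    """cells maps (a_level, b_level) -> list of replicate measurements.
    Assumes a balanced design (equal n per cell). Returns the F statistics
    for factor A, factor B, and the A x B interaction."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # replicates per cell
    N = n * len(a_levels) * len(b_levels)        # total observations
    grand = mean([v for vs in cells.values() for v in vs])

    # Main-effect sums of squares from the marginal means
    a_means = {a: mean([v for b in b_levels for v in cells[(a, b)]]) for a in a_levels}
    b_means = {b: mean([v for a in a_levels for v in cells[(a, b)]]) for b in b_levels}
    ss_a = n * len(b_levels) * sum((a_means[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_means[b] - grand) ** 2 for b in b_levels)

    # Interaction: cell-mean variation not explained by the main effects
    ss_cells = n * sum((mean(vs) - grand) ** 2 for vs in cells.values())
    ss_ab = ss_cells - ss_a - ss_b

    # Within-cell (error) variation
    ss_err = sum((v - mean(vs)) ** 2 for vs in cells.values() for v in vs)
    ms_err = ss_err / (N - len(a_levels) * len(b_levels))

    f_a = (ss_a / (len(a_levels) - 1)) / ms_err
    f_b = (ss_b / (len(b_levels) - 1)) / ms_err
    f_ab = (ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1))) / ms_err
    return f_a, f_b, f_ab

# Hypothetical example: TTX raises amplitudes in WT cultures but not in the
# knockout, which appears as a genotype x treatment interaction.
data = {
    ("WT", "CON"): [10, 11, 9, 10],
    ("WT", "TTX"): [14, 15, 13, 14],
    ("KO", "CON"): [10, 10, 11, 9],
    ("KO", "TTX"): [10, 11, 10, 11],
}
f_genotype, f_treatment, f_interaction = two_way_anova(data)
```

A post-hoc Tukey test would then compare the individual cell means (e.g. WT CON vs. WT TTX); that step is omitted here.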

      (3) Relevant to the second point under minor concerns, I suggest this sentence be removed, as reducing variability and avoiding inhibitory projects are reasons good enough to restrict the analysis to the apical dendrites.

      We have revised the description of the Turrigiano et al., 1998 finding from their Figure 3 and feel it still strengthens the justification for choosing to analyze only synapses on the apical dendrite.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The comments on lines 256-7 could seem misleading - the NASPM results wouldn't rule out contribution of those other subunits, only non-GluA2 containing combinations of those subunits. I would suggest revising this statement. Also, NASPM does likely have an effect, just not one that changes much with TTX treatment.

At new line 213 (markup) we have added the modifier “homomeric” to clarify our point that the lack of NASPM effect on the increase in mEPSC amplitude after TTX indicates that the increase is not due to more homomeric Ca<sup>2+</sup>-permeable receptors. We have always stated that NASPM reduces mEPSC amplitude, but it does so in both control and treated cultures.

      Strong conclusions based on a single culture (lines 314-5) seem unwarranted.

      We have softened this statement with a “suggesting that” substituted for the previous “Therefore,” but stand by our point that the mEPSC amplitude data support a homeostatic effect of TTX in Culture #3, so the lack of increase in GluA2 cluster size needs an explanation other than variability in the homeostatic effect itself.

      Saying (line 554) something is 'the only remaining possibility' also seems unwarranted.

      We have softened this statement to read, “A remaining possibility…”.

      Beattie EC, Stellwagen D, Morishita W, Bresnahan JC, Ha BK, Von Zastrow M, Beattie MS, Malenka RC (2002) Control of synaptic strength by glial TNFalpha. Science 295:2282-2285.

Mansour M, Nagarajan N, Nehring RB, Clements JD, Rosenmund C (2001) Heteromeric AMPA receptors assemble with a preferred subunit stoichiometry and spatial arrangement. Neuron 32:841-853.

Stellwagen D, Malenka RC (2006) Synaptic scaling mediated by glial TNF-alpha. Nature 440:1054-1059.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Wang X, Li Y, Engisch KL, Nakanishi ST, Dodson SE, Miller GW, Cope TC, Pinter MJ, Rich MM (2005) Activity-dependent presynaptic regulation of quantal size at the mammalian neuromuscular junction in vivo. J Neurosci 25:343-351.

    1. The creative process is accessible to all learners. It’s flexible and can be altered and adapted to fit an individual student’s needs.

      I love how the creative process is accessible to all learners no matter what. This is great for students who struggle to learn. Being creative is one of the best ways to learn and this highlighted text says just that!

    1. What do you consider to be the most important factors in making an instance of public shaming bad?

I think public shaming is bad because it's done in front of strangers, which leads to the person's low self esteem. Especially if the person didn’t do something that bad, but people act like it’s huge, that’s not fair. Majority of the time people are just being mean or trying to feel better than someone, which is just embarrassing.

    2. What do you consider to be the most important factors in making an instance of public shaming bad?

      The most important factors that I think makes public shaming bad is the fact that it's public. There's a saying "correct in private" that I think is really important. Not many people learn when they are just humiliated in public.

    3. Jennifer Jacquet argues that shame can be morally good as a tool the weak can use against the strong:

The idea of shame being a tool the weak can use against the strong is rather interesting. I can see examples of this in how minorities make a point to speak out against the parts of history that are hidden away for the sake of making history more sanitized for the children of the majority. It's just a shame that the strength of the majority (hubris, pride, prejudice, and quite a lot of other qualities) is rather good at drowning out their shame.

  4. social-media-ethics-automation.github.io
    1. Trauma and Shame. URL: https://www.oohctoolbox.org.au/trauma-and-shame (visited on 2023-12-10).

      What I found particularly compelling was the idea that shame can be “stored” in the body, and unless it’s addressed through safe relational repair or therapeutic processing, it can remain stuck and continue to influence behavior in harmful ways. This resonates with some of the course themes about the role of community in healing. It makes me think we need more conversations not just about individual recovery, but about how groups and institutions can either perpetuate shame or help alleviate it.

    1. After the Conference:

      Add before this section as a reminder. Could be in its own separate box. Add a graphic, too.

      Active Listening Tips

      Pay Attention: Minimize distractions, make eye contact, and use open body language to show you are engaged.

      Show You're Listening: Use verbal cues ("yes," "I see") and nonverbal cues (nodding) to encourage the speaker.

      Provide Feedback: Paraphrase the speaker's points to confirm understanding and ask clarifying questions.

      Defer Judgment: Avoid interrupting or forming rebuttals while the speaker is talking.

      Respond Appropriately: Validate the speaker's emotions and respond thoughtfully, not just with advice.

      Examples of Active Listening Responses

      “I see, go on.”

      “Can you explain that part again?”

      “So, you’re saying…”

      “That must have been difficult for you.”

      “What happened next?”

      “I understand how you feel.”

      “It sounds like you’re upset.”

      “Can you give me an example?”

      “I appreciate your point of view.”

      “Let me make sure I got that right.”

    1. Table 5d: Common Sources of Tension

      Some helpful strategies would fit well here: Working through conflict with families can be challenging, but it's essential for fostering a positive and productive partnership that ultimately benefits the child. Here are some comprehensive tips:

      Tips for Working Through a Conflict with Families

      1. Prioritize Relationship Building (Proactive Strategy):

      Start Positive: Don't let the first communication with a family be negative. Reach out early in the year with positive news about their child. Share successes, no matter how small. This builds a bank of goodwill that you can draw on during difficult conversations.

      Be Accessible and Approachable: Make it easy for families to connect with you. Offer various communication channels (email, phone, in-person, class app) and be responsive. A warm and welcoming demeanor can significantly de-escalate potential conflict before it even begins.

      Be Culturally Responsive: Understand and respect diverse family values, communication styles, and cultural norms. What might seem like a conflict could be a misunderstanding rooted in different cultural expectations.

      2. Prepare for the Conversation:

      Gather Facts and Documentation: Base your concerns on objective observations and specific examples, not emotions or assumptions. Document dates, times, and details of relevant incidents or academic patterns. "Shira hit another child after they took the toy she was playing with" is more helpful than "Shira was aggressive today."

      Identify the Core Issue: What is the specific problem you need to address? Be clear in your own mind.

      Anticipate Family Concerns: Put yourself in their shoes. What questions might they have? What might make them defensive?

      Think About Solutions: Don't just present a problem; come with potential solutions or ideas for how to move forward. This shifts the conversation from blame to problem-solving.

      Choose the Right Time and Place: Schedule a dedicated time for the conversation when you can minimize interruptions and all parties can be fully present. Avoid impromptu discussions at drop-off or pick-up times. Ensure privacy.

      Consider a Third Party: If emotions are high or the situation is particularly complex, consider having another school administrator or colleague present as a neutral witness or mediator.

      3. During the Conversation:

      Stay Calm and Professional: Your demeanor can significantly influence the tone of the conversation. Speak in a calm, even tone, and maintain open body language. Avoid fidgeting or crossing your arms. If you feel yourself getting emotional, take a deep breath or ask for a brief pause.

      Listen Actively and Empathetically: This is perhaps the most crucial step. Give the family your full attention. Let them speak without interruption, even if they're emotional or angry. Listen for the underlying concerns or feelings. Use phrases like, "It sounds like you're feeling frustrated about..." or "I hear that you're concerned about..." Validating their feelings doesn't mean you agree with their perspective, but it shows you're listening and respect their emotions.

      Focus on the Child's Best Interest (Common Goal): Frame the discussion around the child's well-being and success. Remind the family that you both share a common goal: supporting their child. Use "we" language: "How can we work together to help [child's name] with this?"

      Stick to Facts, Not Personalities or Assumptions: Present your observations objectively. Avoid assigning blame or making assumptions about the family's intentions or home life.

      Use "I" Statements: Instead of "Your child is always disruptive," try "I've observed [child's name] frequently talking during instruction time, which is making it difficult for them to complete their work." This focuses on your experience and observations.

      Avoid Jargon: Speak in clear, accessible language, avoiding educational acronyms or terminology that families might not understand.

      Be Prepared to Apologize (When Appropriate): If you or the school made a mistake, acknowledge it and apologize sincerely. This can significantly de-escalate tension and rebuild trust. An apology for how a situation made them feel can also be powerful, even if you don't agree with their interpretation of events.

      Set Clear Boundaries (If Necessary): If a family member becomes verbally abusive or disrespectful, calmly but firmly state your boundaries. "I understand you're upset, but I need you to speak to me respectfully for us to continue this conversation." You can offer to reschedule if they cannot maintain a respectful tone.

      "Sandwich" Difficult Information: Start with a positive comment about the child, introduce the concern, and end with another positive comment or a collaborative plan for support.

      4. Develop a Plan and Follow Up:
         - Collaborate on Solutions: Work with the family to develop an action plan. What steps will you take? What steps can they take at home? This fosters a sense of partnership and shared responsibility.
         - Set Measurable Goals: Agree on specific, achievable goals and a timeline for checking progress.
         - Schedule a Follow-Up: This demonstrates commitment and allows both parties to assess progress and make adjustments.
         - Document the Conversation: Keep a professional record of the date, attendees, topics discussed, agreed-upon actions, and follow-up plan. This is crucial for future reference and accountability.

      By approaching conflicts with empathy, clear communication, and a focus on collaboration, you can navigate challenging family conversations more effectively and strengthen the home-school partnership.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that at a given time averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.

      The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity forms environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.

      The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore it is not clear whether the results generalize beyond the specifics of the setup here.

      Strengths:

      The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.

      The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).

      The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable, although the authors only show significant clustering - there is no analysis of its grid-like regularity.

      First of all, we would like to thank the reviewer for their comprehensive feedback, and their insightful comments. Importantly, as you point out, our goal with this model was to build a minimal model of place cell representations, where representations were encouraged to be place-like, but free to vary in tuning and firing locations. By doing so, we could explore what upstream representations facilitate place-like representations, and even remapping (as it turned out) with minimal assumptions. However, we agree that our task does not capture some of the nuances of real-world navigation, such as sensory observations, which could be useful extensions in future work. Then again, the simplicity of our setup makes it easier to interpret the model, and makes it all the more surprising that it learns many behaviors exhibited by real world place cells.

      As to the distribution of phases - we also agree that a hexagonal arrangement likely reflects some optimal configuration for decoding of location.

      And we agree that the symmetry within the experimental data is important; we have revised analyses on experimental phase distributions, and included an analysis of ensemble grid score, to quantify any hexagonal symmetries within the data.

      Weaknesses:

      The navigation problem that needs to be solved by the model is a bit of an odd one. Without any initial position information, the network needs to figure out where it is, and then path-integrate with respect to a velocity signal. As the authors remark in Methods 4.2, without additional input, the only way to infer location is from border interactions. It is like navigating in absolute darkness. Therefore, it seems likely that the salient wall representations found in the recurrent units are just a consequence of the specific navigation task here; it is unclear if the same would apply in natural navigation. In natural navigation, there are many more sensory cues that help infer location, most importantly vision, but also smell and whiskers/touch (which provide a more direct wall interaction; here, wall interactions are indirect by constraining velocity vectors). There is a similar but weaker concern about whether the (place-cell-like) localised firing fields of the output units are a direct consequence of the decoding procedure that only considers activity center locations.

      Thank you for raising this point; we absolutely agree that the navigation task is somewhat niche. However, this was a conscious decision, made to minimize any possible confounding from alternate input sources, such as observations. In part, this experimental design was inspired by the suggestion that grid cells support navigation/path integration in open-field environments with minimal sensory input (as they could, conceivably, do so with no external input). This also pertains to your other point, that boundary interactions are necessary for navigation. In our model, using boundaries is one solution, but there is another way around this problem, which is conceivably better: to path integrate in an egocentric frame, starting from your initial position. Since the locations of place fields are inferred only after a trajectory has been traversed, the network is free to create a new or shifted representation every time, independently of the arena. In this case, one might have expected generalized solutions, such as grid cells, to emerge. That this is not the case seems to suggest that grid cells may somehow not be optimal for pure path integration, or at the very least, hard to learn (but may still play a part, as alluded to by place field locations). We have tried to make these points more evident in the revised manuscript.

      As for the point that the decoding may lead to place-like representations, this is a fair point. Indeed, we did choose this form of decoding, inspired by the localized firing of place cells, in the hope that it would encourage minimally constrained, place-like solutions. However, compared to other works (Sorscher and Xu) that hand-tune the functional form of their place cells, ours (although biased towards centralized tuning curves) allows for flexible functional features, such as the positions of the place cell centers, their tuning widths, whether activity is center-surround, and how units tune to different environments/rooms. This allows us to study several features of the place cell system, such as remapping and field formation. We have revised the model description to make this clearer.
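      The decoding discussed here, position read out as the activity-weighted average of unit centers (as described in the reviewer summary), can be sketched in a few lines of NumPy. This is an illustrative sketch only; the array names (`centers`, `activity`) and shapes are our assumptions, not the actual implementation.

      ```python
      import numpy as np

      def decode_position(centers, activity):
          """Decode position as the activity-weighted average of unit centers.

          centers:  (N, 2) array of place-field center locations (assumed layout).
          activity: (N,) array of non-negative output-unit activations at one timestep.
          """
          weights = activity / activity.sum()  # normalize activations into weights
          return weights @ centers             # (2,) decoded position

      # Toy example: three units with centers on a line, strongest response in the middle.
      centers = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
      activity = np.array([1.0, 2.0, 1.0])
      pos = decode_position(centers, activity)  # -> [1.0, 0.0]
      ```

      Note that under this readout, only localized, unimodal activity profiles decode reliably across positions, which is why it encourages (but does not hard-code) place-like tuning.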

      The conclusion that 'contexts are attractive' (heading of section 2) is not well-supported. The authors show 'attractor-like behaviour' within a single context, but there could be alternative explanations for the recovery of stable ratemaps after noise injection. For example, the noise injection could scramble the network's currently inferred position, so that it would need to re-infer its position from boundary interactions along the trajectory. In that case the stabilisation would be driven by the input, not just internal attractor dynamics. Moreover, the authors show that different contexts occupy different regions in the space of low-dimensional projections of recurrent activity, but not that these regions are attractive.

      We agree that boundary interactions could facilitate the convergence of representations after noise injection. We did try to moderate this claim with the wording “attractor-like”, but we agree that boundaries could confound this result. We have therefore performed a modified noise injection experiment, where we let the network run for an extended period of time before noise injection, with no velocity signal (see Appendix Velocity Ablation in the revised text). Notably, representations converge to their pre-scrambled state after noise injection, even without a velocity signal. However, place-like representations do not converge for all noise levels in this case, possibly indicating that boundary interactions also serve an error-correcting function. Thank you for pointing this out.

      As for the attractiveness of contexts, we agree that more analyses were required to demonstrate this. We have therefore conducted a supplementary analysis where we run the trained network with a mismatch in context/geometry, and demonstrate that the context signal fixes the representation, up to geometric distortions.

      The authors report empirical data that shows clustering of place cell centers like they find for their output units. They report that 'there appears to be a tendency for the clusters to arrange in hexagonal fashion, similar to our computational findings'. They only quantify the clustering, but not the arrangement. Moreover, in Figure 7e they only plot data from a single animal, then plot all other animals in the supplementary. Does the analysis of Fig 7f include all animals, or just the one for which the data is plotted in 7e? If so, why that animal? As Appendix C mentions that the ratemap for the plotted animal 'has a hexagonal resemblance' whereas other have 'no clear pattern in their center arrangements', it feels like cherrypicking to only analyse one animal without further justification.

      Thank you for pointing this out; we agree that this was not sufficiently explained and explored in the previous version. We have therefore conducted a grid score analysis of the experimental place center distributions, to uncover possible hexagonal symmetries. We chose this particular animal in part because it featured the largest number of included cells, and in part because it demonstrated the most striking phase distribution; distributions for all animals were included in the supplementary. Originally, this was only intended as a preliminary analysis suggesting non-uniformity in experimental place field distributions, but we realize that these distributions may all provide interesting insight into the distributional properties of place cells.

      We have explained these choices in the revised text, and expanded analyses on all animals to showcase these results more clearly.

      Reviewer #2 (Public Review):

      Summary:

      The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; the decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. The authors also suggested that the remapping of place cells between different environments occurs through state transitions corresponding to unstable dynamical modes in the recurrent network.

      Strengths:

      The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes a plausible mechanism to generate hippocampal spatial representations without relying on grid cells. This result is consistent with the observation that grid cells are unnecessary to generate CA1 place cells.

      The suggestion about the remapping mechanism shows an interesting theoretical possibility.

      We thank the reviewer for their kind feedback.

      Weaknesses:

      The explicit mechanisms of generating border cells and place cells and those underlying remapping were not clarified at a satisfactory level.

      The model cannot generate entorhinal grid cells. Therefore, how the proposed model is integrated into the entire picture of the hippocampal mechanism of memory processing remains elusive.

      We appreciate this point, and hope to clarify: From a purely architectural perspective, place-like representations are generated by linear combinations of recurrent unit representations, which, after training, appear border-like. During remapping, the network is simply evaluated/run in different geometries/contexts, which, it turns out, causes the network to exhibit different representations, likely as solutions to optimally encoding position in the different environments. We have attempted to revise the text to make some of these interpretations more clear. We have also conducted a supplementary analysis to demonstrate how representations are determined by the context signal directly, which helps to explain how recurrent and output units form their representations.

      We also agree that our model does not capture the full complexity of the hippocampal formation. However, we would argue that its simplicity (focusing on a single cell type and a pure path integration task) makes it a useful baseline for studying the role of place cells during spatial navigation. The fact that our model captures a range of place cell behaviors (field formation, remapping and geometric deformation) without grid cells also points to several interesting possibilities, such as that grid cells may not be strictly necessary for place cell formation and remapping, or that border cells may account for many of the peculiar behaviors of place cells. Nevertheless, we wholeheartedly agree that including e.g. sensory information and memory storage/retrieval tasks would prove a very interesting extension of our model to more naturalistic tasks and settings. In fact, our framework could easily accommodate this, e.g. by decoding contexts/observations/memories from the network state, alongside location.

      Reviewer #3 (Public Review):

      Summary:

      The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.

      Strengths:

      The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.

      The paper and ideas were well explained.

      Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.

      Figure 7 was striking, and potentially very interesting.

      It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.

      Thank you for your insightful comments! Regarding the low path integration error, there is a slight statistical signal from the boundaries, as trajectories tend to turn away from arena boundaries. However, we agree, that studying path integration performance in the face of noise would make for a very interesting future development.

      Weaknesses:

      I felt that the stated neuroscience interpretations were not well supported by the presented evidence, for a few reasons I'll now detail.

      First, I was unconvinced by the interpretation of the reported recurrent cells as border cells. An equally likely hypothesis seemed to be that they were position cells that linearly encode the x and y position, which, when your environment only contains external linear boundaries, look the same. As in figure 4, in environments with internal boundaries the cells do not encode them; they encode (x,y) position. Further, if I'm not misunderstanding, there is, throughout, a confusing case of broken symmetry. The cells appear to code not for any random linear direction, but for either the x or y axis (i.e. there are x cells and y cells). These look like border cells in environments in which the boundaries are external only, and align with the axes (like square and rectangular ones), but the same also appears to be true in the rotationally symmetric circular environment, which strikes me as very odd. I can't think of a good reason why the cells in circular environments should care about the particular choice of (x,y) axes... unless the choice of position encoding scheme is leaking influence throughout. A good test of these would be differently oriented (45 degree rotated square) or more geometrically complicated (two diamonds connected) environments in which the difference between a pure (x,y) code and a border code is more obvious.

      Thank you for pointing this out. This is an excellent point, that we agree could be addressed more rigorously. Note that there is no position encoding in our model; the initial state of the network is a vector of zeros, and the network must infer its location from boundary interactions and context information alone. So there is no way for positional information to leak through to the recurrent layer directly. However, one possible reason for the observed symmetry breaking, is the fact that the velocity input signal is aligned with the cardinal directions. To investigate this, we trained a new model, wherein input velocities are rotated 45 degrees relative to the horizontal, as you suggest. The results, shown and discussed in appendix E (Learned recurrent representations align with environment boundaries), do indicate that representations are tuned to environment boundaries, and not the cardinal directions, which hopefully improves upon this point.

      Next, the decoding mechanism used seems to have forced the representation to learn place cells (no other cell type is going to be usefully decodable?). That is, in itself, not a problem. It just changes the interpretation of the results. To be a normative interpretation for place cells you need to show some evidence that this decoding mechanism is relevant for the brain, since this seems to be where they are coming from in this model. Instead, this is a model with place cells built into it, which can then be used for studying things like remapping, which is a reasonable stance.

      This is a great point, and we agree. We do write that we perform this encoding to encourage minimally constrained place-like representations (to study their properties), but we have revised to make this more evident.

      However, the remapping results were also puzzling. The authors present convincing evidence that the recurrent units effectively form 6 different maps of the 6 different environments (e.g. the sparsity of the code, or fig 6a), with the place cells remapping between environments. Yet, as the authors point out, in neural data the finding is that some cells generalise their co-firing patterns across environments (e.g. grid cells, border cells), while place cells remap, making it unclear what correspondence to draw between the authors' network and the brain. There are existing normative models that capture both entorhinal cortex's consistent and hippocampus' less consistent neural remapping behaviour (Whittington et al. and probably others), so what have we learnt from this exercise?

      Thanks for raising this point! We agree that this finding is surprising, but we hold that it actually shows something quite important: that border-type units are sufficient to create place-like representations, and that the model learns several of the behaviors associated with place cells and remapping (including global remapping and field stretching). In other words, a single cell type known to exist upstream of place cells is sufficient to explain a surprising range of phenomena, demonstrating that other cell types are not strictly necessary. However, we agree that understanding why the boundary-type units sometimes rate remap, and whether the same can be true for some border-type cells in the brain (either directly, or through gating mechanisms), would be an important future development. Related to this point, we have also expanded upon the influence of the context signal for representation selection (appendix F).

      Concerning the relationship to other models, we would argue that the simplicity of our model is one of its core strengths, making it possible to disentangle what different cell types are doing. While other models, including TEM, are highly important for understanding how different cell types and brain regions interact to solve complex problems, we believe there is a need for minimal, understandable models that allows us to investigate what each cell type is doing, and this is where we believe our work is important. As an example, our model not only highlights the sufficiency of boundary-type cells as generators of place cells, its lack of e.g. grid cells also suggest that grid cells may not be strictly necessary for e.g. open-field/sensory-deprived navigation, as is often claimed.

      One striking result was figure 7, the hexagonal arrangement of place cell centres. I had one question that I couldn't find the answer to in the paper, which would change my interpretation. Are the place cell centres within a single cluster of points in figure 7a, for example, from one cell across the 100 trajectories, or from many? If each cluster belongs to a different place cell, then the interpretation seems like some kind of optimal packing/coding of 2D space by a set of place cells, an interesting prediction. If multiple place cells fall within a single cluster, then that's a very puzzling suggestion about the grouping of place cells into these discrete clusters. From figure 7c I guess that the former is the likely interpretation, from the fact that clusters appear to maintain the same colour, and are unlikely to be co-remapping place cells, but I would like to know for sure!

      This is a good point, and you are correct: one cluster tends to correspond to one unit. To make this more clear, we have revised Fig. 7 so that each decoded center is shaded by unit identity, which makes this more evident. And yes, this does seem to be in line with some form of optimal packing/encoding of space!

      I felt that the neural data analysis was unconvincing. Most notably, the statistical effect was found in only one of seven animals. Random noise is likely to pass statistical tests 1 in 20 times (at 0.05 p value); this seems like it could have been something similar? Further, the data was compared to a null model in which place cell fields were randomly distributed. The authors claim place cell fields have two properties that the random model doesn't: (1) clustering to edges (as experimentally reported) and (2) much more provocatively, a hexagonal lattice arrangement. The test seems to conflate the two; I think that nearby ball radii could be overrepresented, as in figure 7f, due to either effect. I would have liked to see a computation of the statistic for a null model in which place cells were random but with a bias towards the boundaries of the environment that matches the observed changing density, to distinguish these two hypotheses.

      Thanks for raising this point. We agree that we were not clear enough in our original manuscript. We included additional analyses in one animal to showcase one preliminary case of non-uniform phases. To address this, we have performed the same analyses for all animals, and included a longer discussion of these results in the supplementary material. We have also moderated the discussion of Ripley's H to encompass only non-uniformity, and added a grid score analysis to showcase possible rotational symmetries in the data. We hope this gets our findings across more clearly.
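      The Monte Carlo comparison the reviewer asks for, observed field centers tested against a chosen null point process, can be sketched as follows. This is a hypothetical illustration, not the analysis code used in the paper: it uses mean nearest-neighbor distance as a simple clustering statistic instead of Ripley's H, and a uniform sampler as the null; a boundary-biased null would simply swap in a different `sampler`.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      def mean_nn_distance(points):
          """Mean nearest-neighbor distance: smaller means more clustered."""
          d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
          np.fill_diagonal(d, np.inf)  # exclude each point's distance to itself
          return d.min(axis=1).mean()

      def monte_carlo_p(points, sampler, n_null=1000):
          """One-sided p-value: fraction of null samples at least as clustered
          (mean NN distance at most as small) as the observed points."""
          obs = mean_nn_distance(points)
          null = np.array([mean_nn_distance(sampler(len(points))) for _ in range(n_null)])
          return (null <= obs).mean()

      # Uniform null in the unit square; a boundary-biased sampler could replace this.
      uniform = lambda n: rng.uniform(0.0, 1.0, size=(n, 2))

      # Tightly clustered toy data should yield a small p-value against the uniform null.
      clustered = rng.normal(0.5, 0.02, size=(30, 2))
      p = monte_carlo_p(clustered, uniform)
      ```

      Running the same statistic against both a uniform null and a boundary-biased null is what would distinguish edge clustering from lattice-like arrangement, per the reviewer's suggestion.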

      Some smaller weaknesses:

      - Had the models trained to convergence? From the loss plot it seemed like not, and when including regularisers recent work (grokking phenomena, e.g. Nanda et al. 2023) has shown the importance of letting the regulariser minimise completely to see the resulting effect. Else you are interpreting representations that are likely still being learnt, a dangerous business.

      Longer training did not seem to affect representations. However, due to the long trajectories and statefulness involved, training was time-intensive and could become unstable over very long runs. We therefore stopped training at the indicated time.

      - Since RNNs are nonlinear, it seems that eigenvalues larger than 1 don't necessarily mean unstable?

      This is a good point; stability is not guaranteed. We have updated the text to reflect this.

      - Why do you not include a bias in the networks? ReLU networks without bias are not universal function approximators, so it is a real change in architecture that doesn't seem to have any positives?

      We found that bias tended to have a detrimental effect on training, possibly related to the identity initialization used (see e.g. Le et al. 2015), and found that training improved when biases were fixed to zero.

      - The claim that this work provided a mathematical formalism of the intuitive idea of a cognitive map seems strange, given that upwards of 10 of the works this paper cites also mathematically formalise a cognitive map into a similar integration loss for a neural network.

      We agree that other works also provide ways of formalizing these concepts. However, our goal in doing so was to elucidate common features across these seemingly disparate models. We also found that the concept of a learned map and a target map made it easier to come up with novel models, such as one wherein place cells are constructed to match a grid cell label.

      Aim Achieved? Impact/Utility/Context of Work

      Given the listed weaknesses, I think this was a thorough exploration of how this network with these losses is able to path-integrate its position and remap. This is useful; it is good to know how another neural network with slightly different constraints learns to perform these behaviours. That said, I do not think the link to neuroscience was convincing, and as such, it has not achieved its stated aim of explaining these phenomena in biology. The mechanism for remapping in the entorhinal module seemed fundamentally different to the brain's, instead using completely disjoint maps; the recurrent cell types described seemed to match no described cell type (no bad thing in itself, but it does limit the permissible neuroscience claims) either in tuning or remapping properties, with a potentially worrying link between an arbitrary encoding choice and the responses; and the striking place cell prediction was unconvincingly matched by neural data. Further, this is a busy field in which many remapping results have been shown before by similar models, limiting the impact of this work. For example, George et al. and Whittington et al. show remapping of place cells across environments; Whittington et al. study remapping of entorhinal codes; and Rajkumar Vasudeva et al. 2022 show similar place cell stretching results under environmental shifts. As such, this paper's contribution is muddied significantly.

      Thank you for this perspective; we agree that all of these are important works that arrive at complementary findings. We hold that the importance of our paper lies in its minimal nature, and its focus on place cells via a purpose-built decoding that enables place-like representations. In doing so, we can point to possibly underexplored relationships between cell types, in particular place cells and border cells, while challenging the necessity of other cell types (i.e. grid cells) for open-field navigation. In addition, our work points to a novel connection between grid cells, place cells and even border cells, by way of the hexagonal arrangement of place unit centers. However, we agree that expanding our model to include more biologically plausible architectures and constraints would make for a very interesting extension in the future.

      Thank you again for your time, as well as your insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Even after reading Methods 5.3, I found it hard to understand how the ratemap population vectors that produce Fig 3e and Fig 5 are calculated. It's unclear to me how there can be a ratemap at a single timestep, because calculating a ratemap involves averaging the activity in each location, which would take a whole trajectory and not a single timestep. But I think I've understood from Methods 5.1 that instead the ratemap is calculated by running multiple 'simultaneous' trajectories, so that there are many visited locations at each timestep. That's a bit confusing because as far as I know it's not a common way to calculate ratemaps in rodent experiments (probably because it would be hard to repeat the same task 500 times, while the representations remain the same), so it might be worth explaining more in Methods 5.3.

      We understand the confusion, and have attempted to make this more clear in the revised manuscript. We did indeed create ratemaps over many trajectories for time-dependent plots, for the reasons you mentioned. We also agree that this would be difficult to do experimentally, but found it an interesting way to observe convergence of representations in our simulated scenario.

      Fig 3b-d shows multiple analyses to support output unit global remapping, but no analysis to support the claim that recurrent units remap by rate changes. The examples in Fig 3ai look pretty convincing, but it would be useful to also have a more quantitative result.

      We agree; previously we only showed that units turn off or become silent using ratemaps. We have therefore added an explicit analysis showcasing rate remapping in recurrent units (see Appendix G: Recurrent units rate remap).

      Reviewer #2 (Recommendations For The Authors):

      Some parts of the current manuscript are hard to follow. Particularly, the model description is not transparent enough. See below for the details.

      Major comments:

      (1) Mathematical models should be explained more explicitly and carefully. I had to guess or desperately search for the definitions of parameters. For instance, define the loss function L in eq.(1). Though I can assume L represents the least square error (in A.8), I could not find the definition in Model & Objective. N should also be defined explicitly in equation (3). Is this the number of output cells?

      Thank you for pointing this out, we have revised to make it more clear.

      (2) In Fig. 1d, how were the velocity and context inputs given to individual neurons in the network? The information may be described in the Methods, but I could not identify it.

      This was described in the methods section (Neural Network Architecture and Training), but we realize that we used confusing notation, when comparing with Fig. 1d. We have therefore changed the notation, and it should hopefully be clearer now. Thanks for pointing out this discrepancy.

      (3) I took a while to understand equations (3) and (4) (for instance, t is not defined here). The manuscript would be easier to read if equations (5) and (6) are explained in the main text but not on page 18 (indeed, these equations are just copies of equations 3 and 4). Otherwise, the authors may replace equations (3) and (4) with verbal explanations similar to figure legend for Fig. 1b.

      (4) Is there any experimental evidence for uniformly strong EC-to-CA1 projections assumed in the non-trainable decoder? This point should be briefly mentioned.

      Thank you for raising this point. The decoding from EC (the RNN) to CA1 (the output layer) consists of a trainable weight matrix, and may thus be non-uniform in magnitude. The non-trainable decoding acts on the resulting “CA1” representation only. We hope that improvements to the model description also make this more evident.

      (5) The explanation of Fig. 3 in the main text is difficult to follow because subpanels are explained in separate paragraphs, some of which are very short, as short as just a few lines.

      This presentation style makes it difficult to follow the logical relationships between the subpanels. This writing style is obeyed throughout the manuscript but is not popular in neuroscience.

      Thanks for pointing this out, we have revised to accommodate this.

      (6) Why do field centers cluster near boundaries? No underlying mechanisms are discussed in the manuscript.

      This is a good point; we have added a note on this; it likely reflects the border tuning of upstream units.

      (7) In Fig. 4, the authors presented how cognitive maps may vary when the shape and size of open arenas are modified. The results would be more interesting if the authors explained the remapping mechanism. For instance, on page 8, the authors mentioned that output units exhibit global remapping between contexts, whereas recurrent units mainly rate remapping.

      Why do such representational differences emerge?

      We agree! Thanks for raising this point. We have therefore expanded upon this discussion in section 2.4.

      (8) In the first paragraph of page 10, the authors stated ".. some output units display distinct field doubling (see both Fig. 4c), bottom right, and Fig. 4d), middle row)". I could not understand how Fig. 4d, middle row supports the argument. Similarly, they stated "..some output units reflect their main boundary input (with greater activity near one boundary)." I can neither understand what the authors mean to say nor which figures support the statement. Please clarify.

      This is a good point, there was an identifier missing; we have updated to refer to the correct “magnification”. Thanks!

      (9) The underlying mechanism of generating the hexagonal representation of output cells remains unclear. The decoder network uses a non-trainable decoding scheme based on localized firing patterns of output units. To what extent does the hexagonal representation depend on the particular decoding scheme? Similarly, how does the emergence of the hexagonal representation rely on the border representation in the upstream recurrent network? Showing several snapshots of the two place representations during learning may answer these questions.

      This is an interesting point, and we have added some discussion on this matter. In particular, we speculate whether it’s an optimal configuration for position reconstruction, which is demanded by the task and thus highly likely dependent on the decoding scheme. We have not reached a conclusive method to determine the explicit dependence of the hexagonal arrangement on the choice of decoding scheme. Still, it seems this would require comparison with other schemes. In our framework, this would require changing the fundamental operation of the model, which we leave as inspiration for future work. We have also added additional discussion concerning the relationship between place units, border units, and remapping in our model. As for exploring different training snapshots, the model is randomly initialized, which suggests that earlier training steps should tend to reveal unorganized/uninformative phase arrangements, as phases are learned as a way of optimizing position reconstruction. However, we do call for more analysis of experimental data to determine whether this is true in animals, which would strongly support this observation. We also hope that our work inspires other models studying the formation and remapping of place cells, which could serve as a starting point for answering this question in the future.

      (10) Figure 7 requires a title including the word "hexagonal" to make it easier to find the results demonstrating the hexagonal representations. In addition, please clarify which networks, p or g, gave the results shown here.

      We agree, and have added it!

      Minor comments:

      (11) In many paragraphs, conclusions appear near their ends. Stating the conclusion at the beginning of each paragraph whenever possible will improve the readability.

      We have made several rewrites to the manuscript, and hope this improves readability.

      (12) Figure A4 is important as it shows evidence of the CA1 spatial representation predicted by the model. However, I could not find where the figure is cited in the manuscript. The authors can consider showing this figure in the main text.

      We agree, and we have added more references to the experimental data analyses in the main text, as well as expanded this analysis.

      (13) The main text cites figures in the following format: "... rate mapping of Fig. 3a), i), boundary ...." The parentheses make reading difficult.

      We have removed the overly stringent use of double parentheses, thanks for letting us know.

      (14) It would be nice if the authors briefly explained the concept of Ripley's H function on page 14.

      Yes, we have added a brief descriptor.

    1. The missing element is simple: tension. The tension can be subtle, as it would be if the next line of the story was: …and behind the garden was a freshly dug grave.

      This explanation really helped me see why some stories are boring and others are gripping. Without tension, it’s just a list of facts, not something that makes you want to keep reading. I want to remember to always add some kind of conflict or challenge when I tell stories, even in professional settings.

    2. This is not a story; not yet. Are you engaged? It’s unlikely. Are you engrossed? Almost certainly not.

      I think this is a good line about how stories are more than just different statements about something that happened. You need a lot more emotion, adjectives, and other elements in a story to make it memorable or engaging. I think this is very similar to the way we talk as well. When we talk, especially when we are trying to teach something, stating facts is enough to convey the information, but adding emotion, character, and other elements makes it actually interesting.

    1. It’ll do anything to win the AI race. If that means burying the web, then so be it.

      Whoa. So in order to win the AI race, Google will kill off the web, just so it can win. It really shows that this isn't just moving from one stage to the next, or a new medium developing while the old medium survives; it's the old medium being killed off so the new medium can reign supreme.

    1. https://www.reddit.com/r/typewriters/comments/1kunlxr/the_rules_of_typewriter_club/

      Just like most areas of life relating to expertise, it's nice to have a broad set of rules when you start out. Then as your knowledge of the arts and sciences grow, you can begin to "paint outside the lines."

      Once you've used, tinkered on, collected, repaired, or restored more machines than there are rules, then you can consider them more like guidelines and feel free to experiment more freely. By that point you'll have enough experience to be a true typewriter artist. ⛵🧑‍🎨🎨🏴‍☠️

    1. stopgap

      I appreciate the idea of this; I just don't agree that it's a stopgap. I think AI used in this way could instead be a gateway for non-offenders to become exposed to CSAM, and for offenders to seek out contact offenses.

    1. A common measurement of code bulk is to count the number of lines of code, and a common measure for software productivity is to count the number of lines of code produced per unit time. These numbers are relatively easy to collect, and have actually demonstrated remarkable success in getting a quick reading of where things stand. But of course the code bulk metric pleases almost no one. Surely more than code bulk is involved in software productivity. If productivity can really be measured as the rate at which lines of code are produced, why not just use a tight loop to spew code as fast as possible, and send the programmers home?

      The modern incarnation of the fallacy of bulk-is-better is judging the quality of a project (esp. a component) by being overly concerned with whether it is more or less "active" than some other project.

      The question posed here is an obvious response for when you encounter instances of the bulk-is-better mindset (although it's not as easy to automate as Cox suggests here).

      I've often mused about the thought of pulling a Reddit* and automating some level of churn within a codebase for a project that's hosted on e.g. GitHub for the sole purpose of making sure that it appears "active".

      * A story about the early days of Reddit has become well-known after the creators volunteered some information about the early days when they made use of a bunch of "fake" accounts to submit links that the creators had aggregated/curated for the purpose of seeding the site.

    1. // 2 cubicTo( x1 = height, y1 = 0f, x2 = height + cornerLength / 3, y2 = 0f, x3 = height + cornerLength, y3 = 0f )

      Interpolation - it's a function f(P1, P2, t) that gives you a point between two points for a given t ∈ [0, 1] (imagine you walk along a path between point A and point B and you stop after finishing exactly 34.2353% of the whole road; to know your current point on the path, you calculate the interpolation f(A, B, 0.342353)).

      Polynomial interpolation - it's just an interpolation of two interpolations, which are themselves interpolations of two interpolations, and so on... So quadratic interpolation is just an interpolation of two interpolations (that's the situation when we have 3 points; cubic interpolation is when we have 4 points).

      Quadratic Bezier curve - it's a curve drawn using an interpolation of two interpolations (A -> B, B -> C).

      Cubic Bezier curve - it's a curve drawn using an interpolation of two interpolations of two interpolations. 1st interpolations: A->B, B->C, C->D. 2nd interpolations: (A->B)->(B->C), (B->C)->(C->D). 3rd (final) interpolation: ((A->B)->(B->C))->((B->C)->(C->D)).

      cubicTo(x1, y1, x2, y2, x3, y3) - it's just drawing a cubic Bezier curve. You might ask: where is the fourth point? The fourth point is where you currently are on the Path; it's the starting point.
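      The "interpolation of interpolations" idea above is De Casteljau's algorithm, and it can be sketched in a few lines. This is a minimal, illustrative Python version (not the Compose `cubicTo` implementation; the function names and sample points are my own), evaluating a cubic Bezier curve through the three levels of interpolation described:

```python
def lerp(p, q, t):
    # Linear interpolation between points p and q at parameter t in [0, 1].
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def cubic_bezier(a, b, c, d, t):
    # De Casteljau: interpolate the interpolations, three levels deep.
    ab, bc, cd = lerp(a, b, t), lerp(b, c, t), lerp(c, d, t)  # 1st level
    ab_bc, bc_cd = lerp(ab, bc, t), lerp(bc, cd, t)           # 2nd level
    return lerp(ab_bc, bc_cd, t)                              # 3rd (final) level

a, b, c, d = (0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)
print(cubic_bezier(a, b, c, d, 0.0))  # (0.0, 0.0) — the start point A
print(cubic_bezier(a, b, c, d, 1.0))  # (4.0, 0.0) — the end point D
print(cubic_bezier(a, b, c, d, 0.5))  # (2.0, 1.5) — midway along the curve
```

      At t = 0 the result is the start point and at t = 1 the end point, which is why `cubicTo` needs only three explicit points: the fourth (starting) point is wherever you currently are on the Path.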

    1. Jay Z tha mera Badshah, Nelly tha mera king

      "Jay Z tha mera Badshah, Nelly tha mera king"

      Literal Meaning

      • "Jay Z tha mera Badshah": Jay-Z was Raftaar's ultimate idol and "Emperor" (Badshah) in hip-hop – the undisputed master of the craft, business, and empire-building.
      • "Nelly tha mera king": Nelly was Raftaar's "King," a major inspiration for his more melodic flows, catchy delivery, and mainstream appeal.

      The Subtler Meaning within DHH

      • "Jay Z tha mera Badshah" - By saying Jay-Z was his "Badshah" (Emperor), Raftaar is making a nuanced statement about the Indian rapper Badshah. It subtly implies that in Raftaar's hierarchy of hip-hop, Jay-Z holds the ultimate position of "Badshah," implicitly placing any other "Badshah(rapper)" in a secondary or lesser role in his personal pantheon of greatness. It's a way of saying, "My 'Badshah' is a global legend, not just a peer from the Indian scene."

      • "Nelly tha mera king" - Listeners often speculate this also refers to the Indian rapper King (Rocco), as a subtle double entendre or competitive nod.

    2. Meri poori mob deep ’cause it’s tough down there

      "Meri puri mob deep 'cause it's tough down there"

      1. Literally means to have a large, loyal crew or entourage. This meaning fits perfectly with the latter part of the line, "'cause it's tough down there," implying that a strong, numerous crew is necessary for protection and support in a difficult or dangerous environment.

      2. Name-dropping "Mobb Deep", Mobb Deep was an American hip hop duo

      By using Mobb Deep, KR$NA isn't just describing the size of their crew; they are also evoking the specific gritty, street-hardened image and reputation associated with the legendary hip-hop group. This adds a layer of cultural resonance and meaning for listeners familiar with Mobb Deep's work. The phrase "it's tough down there" directly mirrors the kind of lyrical content Mobb Deep was famous for, making the double entendre even more potent.

    1. Poochhe mujhse label, what I brought to the table
       Jawab hoga kewal, I brought the fuckin’ table

      Poochhe mujhse label, what I brought to the table

      “What I brought to the table” is a common idiom that refers to the contributions, skills, ideas, or value a person brings to a project, organisation, or business.

      Jawab hoga kewal, I brought the fuckin’ table

      This is the powerful punchline. Instead of listing contributions on the table, KR$NA declares he brought the entire table itself.

      KR$NA has a history of being an independent artist who later signed with Kalamkaar. This line directly addresses the music industry or record labels that often seek to define an artist’s worth or dictate their path. For KR$NA, it’s a defiant statement that he doesn’t just contribute to their system; he defines the system itself.

    1. An annotation is a note added to a book, drawing or any other kind of text as a comment or explanation. It is an age-old learning practice, older than books themselves, one used by medieval scribes in the very process of transcription.

      This shows that annotation isn’t just something new or academic. Instead, it’s something people have been doing for centuries to make sense of what they read and remember it. When we annotate today, we’re really just continuing a long tradition of thinking deeply and staying connected to the text.

    1. reply to u/highspeed_steel at https://reddit.com/r/typewriters/comments/1krspvh/im_totally_blind_and_new_to_typewriters_wax/

      Your question is a great one, but I'll go another direction since I'd dug into some of the history and details of Helen Keller's mid-century typewriters a while back. You can find some details and descriptions here (and in the associated links which includes an accessible video of Ms. Keller using a solid and sexy black Remington Noiseless standard typewriter): https://www.reddit.com/r/typewriters/comments/1ihot96/helen_kellers_typewriters/

      She managed on both her Remington and her brailler as well as any sighted person could, though she obviously had someone check her printed work.

      I recently saw another heavily modified midcentury typewriter for someone who, if I recall correctly, was not only blind but had no arms. It was set up so that they could move a selector and type using a custom chin rest. Sadly, I didn't index it at the time, but it's interesting to know that such things existed for accessibility reasons.

      As for Braillers, you might appreciate this recent article about a repairman in Britain who was retiring: https://www.theguardian.com/society/2025/jan/02/wed-be-stuck-alarm-as-uks-last-braille-typewriter-repairer-ponders-retirement

      I've got my own brailler, which is a sleek-looking art-deco industrial piece of art with the loveliest shade of dark shiny gray paint I've ever seen on a typewriter. (I'm both a mathematician and information theorist into the areas of coding and cryptography, so Morse code, Braille, etc. are professionally fascinating to me.) I still need to take it apart and repair a few portions to get it back to perfection, but it generally works well.

      As for the aesthetics, I personally enjoy the solid industrial look and feel of the machines from the 1930s-1960s. The early 30s and some 40s have glossy black enamel, and machines like the Corona Standard/Silent from the 30s are low slung with flat tops that sort of resemble small pianos and just scream out "I'm a writer" with a flair for dark academia and just a hint of classical Roman design. Many of these machines come with gold-tinged water-slide decals which really set themselves off against the black enamel, though on the majority of machines the gold is beginning to dim from time, wear, and careless application of cleaning solutions.

      I love the Royal KMM, KMG, and the Remington 17, Standard, and Super-Riter for their industrial chonkiness and (usually) their glass keytops. One of my favorites is the Henry Dreyfuss designed Royal Quiet De Luxe from 1948 which always gives me the feel of what it would look like if a typewriter wore a tuxedo or the 1948 gray and chrome model which is similar but has the feel of a sleek gray flannel suit on a 1950s advertising executive prone to wearing dapper hats, smoking cigarettes, and always with a cocktail in his hand. Into the 50s and 60s almost everyone had moved to plastic keytops which I don't think are as pretty as the older glass keytops with the polished metal rings around them.

      At the opposite end of that spectrum are the late 50s Royal FP and Futura 800s which have some colorful roundness which evokes the aesthetic of the coming space age. They remind me of the modern curves and star shapes of the television show The Jetsons. Similarly space-aged are the sexy curves of the silver metallic spray paint on wooden cases for the Olympia SM3 from the same period. These to me are quintessential typewriter industrial design. In gray, green, maroon, brown, and sometimes yellow crinkle paint with just a hint of sparkle in their keytops, I really love the combination of roundedness and slight angularity these German designed machines provide. They have a definite understated sort of elegance most other typewriters just miss. I suspect that late-in-life Steve Jobs would have had an Olympia SM3.

      There's something comforting about the 40s and 50s sports-car vibe of the smaller Smith-Corona portables of the 5 series machines in the 1950s with their racing stripes on the hood. They feel like the sort of typewriter James Dean would have used as a student—just hip enough to be cool while still being solid and functional.

      Sadly, into the 70s, while machines typically got a broader range of colors outside of the typical black, gray, and browns, things became more plastic and angular. They also began to lose some of the industrial mid-century aesthetic that earlier machines had. They often feel very 70s in an uncomplimentary way, without the fun color combinations or whimsy that art and general design of that period may have had in the music or fashion spaces. They make me think of politics and war rather than the burgeoning sexual revolution of the time period.

      Interestingly, for me, I feel like most typewriter design was often 10-20 years behind the general design aesthetic/zeitgeist for the particular decades in which they were made.

      Good luck in your search for the right typewriter(s) for your own collection.

  5. social-media-ethics-automation.github.io
    1. Doxing. December 2023. Page Version ID: 1189390304. URL: https://en.wikipedia.org/w/index.php?title=Doxing&oldid=1189390304 (visited on 2023-12-10).

      The Wikipedia page on doxing defines it as the act of publicly releasing previously private personal information about someone, sometimes with hostile intent. One element that struck me was how doxing is sometimes utilized as a type of online vigilantism, with people believing they are "serving justice" but actually causing genuine harm. This adds to the chapter's thesis that individual harassment has evolved—now it's not just private messages or threats, but a coordinated public shaming that can have severe offline implications like job loss, safety hazards, or mental health crises.

    1. This sociotechnical system is sure to mark me as “risky Muhammad Khurram,” and that will trigger an escalation to the next level in the TSA security protocol.

      This sentence really resonated with me because even though I’m not trans, I’ve still experienced how rigid systems can make people feel “risky” just for being different. As a person of color, I’ve noticed that certain spaces whether it’s airport security or job interviews automatically treat me with more suspicion. I believe it’s unacceptable that these systems are designed in ways that don’t account for the diversity of real human bodies and identities. In my opinion, design should always prioritize inclusion and comfort, especially in places like airports where people already feel vulnerable.

    2. Airport security is also systematically biased against Disabled people

      This reminded me of when I had to take my little brother, who’s in a wheelchair, to this one older building. The front entrance had a big staircase, and we had to go around the back just to find a ramp. It felt like he wasn’t really supposed to be there, like the design forgot people like him exist. I realize that it's hard to accommodate everyone, but it's important that every design tries its best to do that, because the feeling of being an afterthought is very unpleasant and even humiliating.

  6. inst-fs-iad-prod.inscloudgate.net
    1. The major goals of a transformative curriculum that fosters multicultural literacy should be to help students to know, to care, and to act in ways that will develop and foster a democratic and just society in which all groups experience cultural democracy and cultural empowerment.

      This statement perfectly summarizes the heart of multicultural education. Banks emphasizes that true multicultural literacy goes beyond awareness—it involves cultivating empathy (to care) and encouraging civic engagement (to act). The goal isn't just academic—it’s deeply ethical and political. Multicultural education should prepare students to participate in building a society where all cultural groups are respected, heard, and empowered. “Cultural democracy” means not just inclusion, but shared ownership of the national narrative and equal opportunity to shape it.

    2. When curriculum transformation occurs, students and teachers make paradigm shifts and view the American and world experience from the perspectives of different racial, ethnic, cultural, and gender groups. Columbus's arrival in the Americas is no longer viewed as a "discovery" but as a cultural contact or encounter that had very different consequences for the Tainos (Arawaks), Europeans, and Africans (Bigelow & Peterson, 2003).

      This defines the heart of multicultural curriculum transformation: a shift in worldview. It’s not just about what we teach, but how we frame it. For example, teaching Columbus as a “discoverer” ignores the perspectives of Indigenous peoples. True transformation requires rethinking the foundational narratives that shape knowledge.

    3. The significant percentage of people of color-including African Americans and Latinos who are in positions of leadership in educational institutions-will continue to work to integrate the experiences of their people into the school and university curricula.

      This passage emphasizes the power of representation and advocacy in education reform. As people of color gain leadership roles in academia, they are pushing to ensure that the curriculum reflects the diverse realities of U.S. society—not just the dominant narratives. I found the statistic that over 50% of public school students are students of color especially compelling; it shows that the demand for inclusive curricula isn't just idealistic—it's necessary. The call for experimental schools for Black males also shows a willingness to rethink educational structures entirely to address long-standing disparities.

    1. In the meantime, for students across the spectrum of disability, navigating the system can be a draining battle. “We find that families of students with a range of needs struggle to get the evaluations and services that their children need,” said Randi Levine, policy director of Advocates for Children of New York. As happens with many children in special education, T.J. frequently did not get services that were recommended, and deadlines to re-evaluate him came and went. [Photo: Elizabeth D. Herman for The New York Times] — “At 12, He Reads at a First-Grade Level: How New York Failed T.J.,” https://www.nytimes.com/2018/10/05/nyregion/how-special-education-i...

      This part reveals how even with funding and systems in place, bureaucracy and technical failure can still prevent students from receiving the support they need. It’s especially frustrating to see how statistics like on-time graduation rates show just how far behind students with disabilities are. Promises of reform sound good, but for families, the day-to-day experience is still exhausting and

    2. Nobody knows how many T.J.s there are in the system, children for whom years were lost and opportunities slipped away. And for each of them, the stakes could not be higher. “We hear stories about this all the time,” said Lori Podvesker, senior manager of disability and education policy at IncludeNYC, which advocates for people with disabilities, and a member of the city’s Panel for Education Policy. “The same story, actually, in which the parents are involved and they’ve pushed and they’ve been fighting.

      This closing quote is both devastating and deeply revealing. It points to the widespread and largely invisible crisis within the special education system: children like T.J., who are lost not because of a lack of effort from their families, but because of systemic neglect, procedural failure, and institutional inertia. The fact that even engaged, persistent parents—those who are “pushing and fighting”—still face dead ends underscores just how broken the system truly is.

      By stating that “nobody knows” how many students have been failed in this way, the article calls attention to the lack of accountability and transparency in public education systems. This is not a rare outlier; it’s a recurring pattern. Lori Podvesker’s observation that it’s “the same story” across many families turns T.J.’s experience from a personal tragedy into a collective systemic indictment. If parents who advocate persistently are still ignored, what does that say about access to justice for those with fewer resources or less familiarity with the system?

    3. T.J. was just being passed through the system, she recalled thinking. “He isn’t where he’s supposed to be, and everyone is ignoring it.

      I think this sentence is particularly sad but also very true. At school, sometimes we can see that some students clearly need more help, but they are ignored due to insufficient system resources or teachers being too busy. Just like T.J., he clearly has his own study plan (I.E.P.), but no one really follows up or cares whether it has been implemented. This feeling of being pushed along by the system, I think, is actually experienced by many students. It's not because we don't work hard, but because no one really asks, "What do you need now?" This feeling of being ignored will make people lose confidence more and more and also make learning very lonely.

    1. Drawing on theories discussing gender as a process, homophobia, and intersectionality, this chapter examines the pervasiveness of heteronormativity and the varieties of queerness to help readers understand where bias comes from, as well as be attuned to differences in the experiences of gender diverse, creative, and/or nonconforming students and/or sexual minority students. Looking at the roots of homophobia in bias against gender diversity will help link homophobia to transphobia and sexism as well. Examining sexuality as racialized and gendered, in turn, will illuminate differences in experiences of sexual minority students across diverse identities and provide a fuller understanding of how race structures sexuality. This chapter will help readers understand the theories of gender, sexuality, and race that have influenced writing and research on LGBTQ students as we

      This introduction makes it clear that understanding gender and sexuality in schools isn't just about identity—it’s also about power and systems. I think it's really important how the chapter connects homophobia with broader structures like sexism and racism. It shows that these biases aren’t isolated but deeply connected, and understanding them requires looking at how identities intersect.

    2. Concerned that the institutional culture of schools not only creates rigid ideas about gender but also pits one gender against the other,

      It really is concerning how normalized gender dynamics are, especially amongst younger children, such as splitting teams up by gender or just overall establishing this sense of competition between them. Considering just how young these children are, it's clear that it would impact their development and how they think, allowing them to think that these gender roles or stereotypes are normal and okay.

    3. Transgender students, too, understand how difficult it is to negotiate the dynamics of gender difference and conformity, having to strategize their own gender identity in the context of social expectations unused to their innovative approaches to enacting gender or refusing their birth gender.

      This paragraph made me think more deeply about how much pressure transgender students face just to stay safe or be accepted. It’s painful that some feel forced to either hide who they are or try hard to match strict gender expectations. It shows how school environments can make identity feel like a risk, not a right. It’s not just about fitting in—it’s about avoiding harm.

    4. Research on sexual harassment points to ways that girls especially feel pressure to conform to gendered norms or feel the hostility of gender dynamics particularly keenly (American Association of University Women [AAUW], 2001).

      Thorne’s statement, as well as the studies mentioned above, remind us that gender-based traditional expectations, especially when it comes to conformity, can place an additional socialization burden on girls, just as dress codes often unfairly target women. It’s not just about what to wear; it’s about the control of gender by social rules and everyday activities. This is evident in the broader school environment, where gender-specific power relations influence students’ daily lives, often in subtle but harmful ways.

  7. inst-fs-iad-prod.inscloudgate.net
    1. owledge that is most worthwhile is already in place. This notion explains the popularity of E. D. Hirsch's series What Every [First, Second, Third ... ] Grader Needs to Know. Geared primarily to parents, this series builds on the fear that their children simply will not measure up if they do not possess the core knowledge (usually in the form of facts) that they need to succeed in school

      This section stated that multicultural education is not an extra; it is basic education. This text was important to me because the author makes clear that the traditional "canon" of knowledge reflects only a narrow strip of society (European, male, upper class). It is sad what we do to our children by keeping different voices out of reach, making students feel as if their own knowledge does not matter, and allowing only a single story to be told by so-called important voices amplified above all others. Committing to all children means helping them understand themselves and see that the world is something much larger, which is democratic in that sense. That is why making multicultural literacy a basic becomes so important.

    1. A recent study from Columbia University found that we’re bogged down by more than 70 decisions a day. The sheer number of decisions we have to make each day leads to a phenomenon called decision fatigue, whereby your brain actually tires like a muscle.

      I agree with this idea. It’s easy to overlook how many decisions we make in a day, but once you start paying attention, it’s overwhelming. From deciding what to wear, to figuring out what to eat, to even choosing which email to reply to first and it all adds up. By the end of the day, I can feel mentally drained, like my brain has just run out of steam. I definitely notice that when I’ve been making decisions non-stop, I become more prone to making mistakes or even procrastinating because I just don’t have the energy to decide anymore. It's like my brain is out of gas.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well-designed and well-controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low-dimensional (a change in hit rate or false alarm rate), but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimulus. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We have discussed this caveat and performed a new analysis to calculate the choice coding dimensions using right-lick and left-lick trials (Fig. S4h) on page 8.

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”

      We also have included the caveats for using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks to the unrewarded port / total # of licks, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”
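      The coding-dimension alignment measure used throughout this exchange (magnitude of the dot product between two unit-normalized CDs, compared against a shuffle control) can be sketched as below. This is a minimal illustration under assumed data shapes, not the authors' code; `cd_alignment` and the toy CDs are hypothetical, and the CI estimation on the shuffle distribution is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def cd_alignment(cd_a, cd_b):
    """Magnitude of the dot product between two unit-normalized coding
    dimensions: 1 = perfectly aligned, ~0 = orthogonal."""
    a = cd_a / np.linalg.norm(cd_a)
    b = cd_b / np.linalg.norm(cd_b)
    return abs(a @ b)

# Hypothetical choice CDs for a 50-neuron population under the two rules.
cd_touch = rng.normal(size=50)
cd_light = cd_touch + rng.normal(scale=2.0, size=50)  # partially related
true_overlap = cd_alignment(cd_touch, cd_light)

# Shuffle control: permuting neuron identities estimates chance alignment.
shuffled = np.mean([cd_alignment(cd_touch, rng.permutation(cd_light))
                    for _ in range(100)])
```

      With real data, each CD would come from trial-averaged activity differences between the two choice conditions, and the 95% CIs reported above would be bootstrapped across sessions.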

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect from Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionally stimulus selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons  (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.  

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10 ms bins). If the 95% CI on this averaged AUC value did not include 0.5, this unit was considered to show significant selectivity. This approach was highly conservative and may underestimate the percentage of units showing significant selectivity, particularly any units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when three consecutive time bins (>30 ms, 10-ms bins) of AUC values were significant. Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
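      The consecutive-bin criterion described above can be sketched as follows (a simplified illustration, not the authors' code; `is_selective` and the toy CI arrays are hypothetical):

```python
import numpy as np

def is_selective(ci_low, ci_high, min_consecutive=3):
    """A unit counts as tHit-tCR selective when the 95% CI on its AUC
    excludes 0.5 for at least `min_consecutive` consecutive time bins."""
    excludes_half = (ci_low > 0.5) | (ci_high < 0.5)
    run = 0
    for sig in excludes_half:
        run = run + 1 if sig else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical CIs for 15 ten-ms bins: a 4-bin run where the CI sits above 0.5.
lo = np.full(15, 0.45)
hi = np.full(15, 0.55)         # CI straddles 0.5 -> not significant
lo[5:9], hi[5:9] = 0.55, 0.65  # 4 consecutive significant bins
selective = is_selective(lo, hi)  # -> True with the 4-bin run above
```

      Requiring a run of bins, rather than significance of the window-averaged AUC, is what lets transiently selective units pass the criterion.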

      Next, we have checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM: 25-55%, ALM: 0-50%). We examined the relationship between the numbers of tHit-tCR selective neurons and tHit-tCR subspace overlaps. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r = -0.32, p = 0.08, Pearson correlation, n = 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this does neglect the magnitude of task rule modulation of individual neurons, which may also be relevant.

      In summary, the apparent disconnect between the effect sizes of task modulation of single neurons and of population dynamics could be explained by (1) the percentages of tHit-tCR selective neurons were underestimated in our old analysis, (2) tHit-tCR selective neurons were not uniformly distributed among sessions, and (3) the percentages of tHit-tCR selective neurons were weakly correlated with tHit-tCR subspace overlaps. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address a point made by Reviewer 3 as well as this point, we performed a new analysis to calculate choice coding dimensions using right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).” 

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks to the unrewarded port / total # of licks, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      That there is no significant change in stimulus coding dimensions but a change in subspace suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is well-trained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and AM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      Eighteen and ten percent of trials had at least one lick in the ITI in respond-to-touch and respond-to-light blocks, respectively. These relatively low rates of ITI licking could indeed make an effect of optogenetics on lick production harder to observe. We agree that future work would benefit from more complex tasks and measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript :

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p < 0.01, *** for p < 0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dashed lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”  on page 3.  

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1:10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”
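      The trajectory-distance computation described in these passages (Euclidean distance between matching time points for all tHit/tCR trajectory pairs, averaged per session) might be sketched as below; this is an assumed-shapes illustration, not the authors' code, and `trajectory_divergence` is hypothetical.

```python
import numpy as np

def trajectory_divergence(hit_trajs, cr_trajs):
    """For every (tHit, tCR) trajectory pair, the Euclidean distance between
    matching time points in PC space, averaged over pairs.
    Assumed trajectory shapes: (n_trials, n_bins, n_pcs)."""
    dists = [np.linalg.norm(h - c, axis=-1)   # one distance per time bin
             for h in hit_trajs for c in cr_trajs]
    return np.mean(dists, axis=0)             # session average, per bin

# Hypothetical data: 25 bins (-100 to 150 ms, 10-ms bins), top 3 PCs.
rng = np.random.default_rng(1)
hit = rng.normal(size=(20, 25, 3))
cr = rng.normal(size=(18, 25, 3)) + 1.0  # offset trajectories diverge
d = trajectory_divergence(hit, cr)       # shape (25,)
```

      The resulting per-bin distances correspond to the gray traces in Fig. 4b for one session, with the session averages plotted in black.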

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of reference tHit were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions, S2: 0.46 ± 0.18, n = 7 sessions, MM: 0.44 ± 0.16, n = 5 sessions, ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”   
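      The projection step described in this Methods passage can be illustrated with a generic subspace-overlap sketch. The paper's exact normalization is not given here, so the definition below (variance captured by the reference PCs relative to the data's own top PCs) is one common choice, not the authors' implementation, and all names and toy data are hypothetical.

```python
import numpy as np

def subspace_overlap(ref_activity, test_activity, n_pcs=3):
    """Variance of `test_activity` captured by the top PCs of `ref_activity`,
    relative to the variance captured by its own top PCs.
    Assumed activity shapes: (n_samples, n_neurons)."""
    ref = ref_activity - ref_activity.mean(axis=0)
    test = test_activity - test_activity.mean(axis=0)

    def top_pcs(x):
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        return vt[:n_pcs]                         # (n_pcs, n_neurons)

    var_in_ref_pcs = np.var(test @ top_pcs(ref).T, axis=0).sum()
    var_in_own_pcs = np.var(test @ top_pcs(test).T, axis=0).sum()
    return var_in_ref_pcs / var_in_own_pcs

rng = np.random.default_rng(2)
shared = rng.normal(size=(200, 30))               # hypothetical tHit activity
overlap_same = subspace_overlap(shared, shared)   # identical data -> 1.0
overlap_rand = subspace_overlap(shared, rng.normal(size=(200, 30)))
```

      In the authors' analysis, `ref_activity` would be tHit activity from the 0-150 ms stimulus window (to obtain reliable PCs) and `test_activity` would be tCR activity from either the pre-stimulus or stimulus window.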

      We acknowledge this time alignment issue and have now removed the reported subspace overlap between tHit and tCR during the pre-stimulus period from Figure 4e (light purple). However, we think the correlation between pre- and post-stimulus-onset subspace overlaps should remain similar regardless of the time windows that we used for calculating the PCs. For the PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p < 0.01, n = 31 sessions). For the PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p < 0.001, n = 31 sessions). Therefore, we keep Figure 4f.

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing51–53 and sensory-motor integration50,54–58.  Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. The reaction times to tactile stimuli were slightly but significantly shorter than the reaction times to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well activity of individual neurons in S1 could be used to discriminate the presence of the stimulus or the response of the mouse. For discriminability for the presence of the stimulus, S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminability for the response of the mouse, the onsets for significant discriminability occurred earlier for tactile compared with visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10⁻¹⁶, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines a range of advanced ultrastructural imaging approaches to define the unusual endosomal system of African trypanosomes. Compelling images show that instead of a distinct set of compartments, the endosome of these protists comprises a continuous system of membranes with functionally distinct subdomains as defined by canonical markers of early, late and recycling endosomes. The findings suggest that the endocytic system of bloodstream stages has evolved to facilitate the extraordinarily high rates of membrane turnover needed to remove immune complexes and survive in the blood, which is of interest to anyone studying infectious diseases.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Bloodstream stages of the parasitic protist, Trypanosoma brucei, exhibit very high rates of constitutive endocytosis, which is needed to recycle the surface coat of Variant Surface Glycoproteins (VSGs) and remove surface immune complexes. While many studies have shown that the endo-lysosomal systems of T. brucei BF stages contain canonical domains, as defined by classical Rab markers, it has remained unclear whether these protists have evolved additional adaptations/mechanisms for sustaining these very high rates of membrane transport and protein sorting. The authors have addressed this question by reconstructing the 3D ultrastructure and functional domains of the T. brucei BF endosome membrane system using advanced electron tomography and super-resolution microscopy approaches. Their studies reveal that, unusually, the BF endosome network comprises a continuous system of cisternae and tubules that contain overlapping functional subdomains. It is proposed that a continuous membrane system allows higher rates of protein cargo segregation, sorting and recycling than can otherwise occur when transport between compartments is mediated by membrane vesicles or other fusion events.

      Strengths:

      The study is a technical tour-de-force using a combination of electron tomography, super-resolution/expansion microscopy, immune-EM of cryo-sections to define the 3D structures and connectivity of different endocytic compartments. The images are very clear and generally support the central conclusion that functionally distinct endocytic domains occur within a dynamic and continuous endosome network in BF stages.

      Weaknesses:

      The authors suggest that this dynamic endocytic network may also fulfil many of the functions of the Golgi TGN and that the latter may be absent in these stages. Although plausible, this comment needs further experimental support. For example, have the authors attempted to localize canonical markers of the TGN (e.g. GRIP proteins) in T. brucei BF and/or shown that exocytic carriers bud directly from the endosomes?

      We agree with the criticism and have shortened the discussion accordingly and clearly marked it as speculation. However, we do not want to completely abandon our hypothesis.

      The paragraph now reads:

      Lines 740 – 751:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions has been described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      Furthermore, we removed the lines 51 - 52, which included the suggestion of the TGN as a master regulator, from the abstract.

      Reviewer #2 (Public Review):

      The authors suggest that the African trypanosome endomembrane system has unusual organisation, in that the entire system is a single reticulated structure. It is not clear if this is thought to extend to the lysosome or MVB. There is also a suggestion that this unusual morphology serves as a trans-(post)Golgi network rather than the more canonical arrangement.

      The work is based around very high-quality light and electron microscopy, as well as utilising several marker proteins, Rab5A, 11 and 7. These are deemed as markers for early endosomes, recycling endosomes and late or pre-lysosomes. The images are mostly of high quality but some inconsistencies in the interpretation, appearance of structures and some rather sweeping assumptions make this less easy to accept. Two perhaps major issues are claims to label the entire endosomal apparatus with a single marker protein, which is hard to accept as certainly this reviewer does not really even know where the limits to the endosomal network reside and where these interface with other structures. There are several additional compartments that have been defined by Rab proteins as well, and which are not even mentioned. Overall I am unconvinced that the authors have demonstrated the main things they claim.

      The endomembrane system in bloodstream form T. brucei is clearly delimited. Compared to mammalian cells it is tidy and confined to the posterior part of the spindle-shaped cell. The endoplasmic reticulum is linked to one side of the longitudinal cell axis, marked by the attached flagellum, while the mitochondrion locates to the opposite side. Glycosomes are easily identifiable as spheres, as are acidocalcisomes, which are smaller than glycosomes and – in electron micrographs – are characterized by high electron density. All these organelles extend beyond the nucleus, which is not the case for the endosomal compartment, the lysosome and the Golgi. The vesicles found in the posterior half of the trypanosome cell are quantitatively identifiable as COP1, CCVI or CCVII vesicles, or exocytic carriers. The lysosome has a higher degree of morphological plasticity, but this is not the topic of the present work. Thus, the endomembrane system in T. brucei is comparatively well structured and delimited, which is why we have chosen trypanosomes as a cell biological model.

      We have published EP1::GFP as a marker for the endosome system and flagellar pocket back in 2004. We have defined the fluid phase volume of the trypanosome endosome in papers published between 2002 and 2007. This work was not intended to represent the entirety of Rab proteins. We were only interested in 3 canonical markers for endosome subtypes. We do not claim anything that is not experimentally tested, we have clearly labelled our hypotheses as such, and we do not make sweeping assumptions.

      The approaches taken are state-of-the-art but not novel, and because of the difficulty in fully addressing the central tenet, I am not sure how much of an impact this will have beyond the trypanosome field. For certain this is limited to workers in the direct area and is not a generalisable finding.

      To the best of our knowledge, there is no published research that has employed 3D Tokuyasu or expansion microscopy (ExM) to label endosomes. The key takeaway from our study, the concept that "endosomes are continuous in trypanosomes", is certainly novel. We are not aware of any other report that has demonstrated this aspect.

      The doubts formulated by the reviewer regarding the impact of our work beyond the field of trypanosomes are not timely. Indeed, our results, and those of others, show that the conclusions drawn from work with just a few model organisms are not generalisable. We are finally on the verge of a new cell biology that considers the plethora of evolutionary solutions beyond opisthokonts. We believe that this message should be widely acknowledged and considered. And we are certainly not the only ones who are convinced that the term "general relevance" is unscientific and should no longer be used in biology.

      Reviewer #3 (Public Review):

      Summary:

      As clearly highlighted by the authors, a key plank in the ability of trypanosomes to evade the mammalian host's immune system is its high rate of endocytosis. This rapid turnover of its surface enables the trypanosome to 'clean' its surface removing antibodies and other immune effectors that are subsequently degraded. The high rate of endocytosis is likely reflected in the organisation and layout of the endosomal system in these parasites. Here, Link et al., sought to address this question using a range of light and three-dimensional electron microscopy approaches to define the endosomal organisation in this parasite.

      Before this study, the vast majority of our information about the make-up of the trypanosome endosomal system was from thin-section electron microscopy and immunofluorescence studies, which did not provide the necessary resolution and 3D information to address this issue. Therefore, it was not known how the different structures observed by EM were related. Link et al., have taken advantage of the advances in technology and used an impressive combination of approaches at the LM and EM level to study the endosomal system in these parasites. This innovative combination has now shown the interconnectedness of this network and demonstrated that there are no 'classical' compartments within the endosomal system, with instead different regions of the network enriched in different protein markers (Rab5a, Rab7, Rab11).

      Strengths:

      This is a generally well-written and clear manuscript, with the data well-presented supporting the majority of the conclusions of the authors. The authors use an impressive range of approaches to address the organisation of the endosomal system and the development of these methods for use in trypanosomes will be of use to the wider parasitology community.

      I appreciate their inclusion of how they used a range of different light microscopy approaches even though for instance the dSTORM approach did not turn out to be as effective as hoped. The authors have clearly demonstrated that trypanosomes have a large interconnected endosomal network, without defined compartments and instead show enrichment for specific Rabs within this network.

      Weaknesses:

      My concerns are:

      i) There is no evidence for functional compartmentalisation. The classical markers of different endosomal compartments do not fully overlap but there is no evidence to show a region enriched in one or other of these proteins has that specific function. The authors should temper their conclusions about this point.

      The reviewer is right in stating that Rab presence does not necessarily mean Rab function. However, this assumption is as old as the Rab literature. That is why we have focused on the 3 most prominent endosomal marker proteins. We report that for endosome function you do not necessarily need separate membrane compartments. This is backed by our experiments.

      ii) The quality of the electron microscopy work is very high but there is a general lack of numbers. For example, how many tomograms were examined? How often were fenestrated sheets seen? Can the authors provide more information about how frequent these observations were?

      The fenestrated sheets can be seen in the majority of the 37 tomograms recorded of the posterior volume of the parasites. Furthermore, we have randomly generated several hundred tiled (= very large) electron micrographs of bloodstream form trypanosomes for unbiased analyses of endomembranes. In these 2D-datasets the “footprint” of the fenestrated flat and circular cisternae is frequently detectable in the posterior cell area.

      We now have included the corresponding numbers in all EM figure legends.

      iii) The EM work always focussed on cells which had been processed before fixing. Now, I understand this was important to enable tracers to be used. However, given the dynamic nature of the system these processing steps and feeding experiments may have affected the endosomal organisation. Given their knowledge of the system now, the authors should fix some cells directly in culture to observe whether the organisation of the endosome aligns with their conclusions here.

      This is a valid criticism; however, it is the cell culture that provides an artificial environment. As for a possible effect of cell harvesting by centrifugation on the integrity and functionality of the endosome system, we consider this very unlikely for one simple reason. The mechanical forces acting in and on the parasites as they circulate in the extremely crowded and confined environment of the mammalian bloodstream are obviously much higher than the centrifugal forces involved in cell preparation. This becomes particularly clear when one considers that the mass of the particle to be centrifuged determines the actual force exerted by the g-forces. Nevertheless, the proposed experiment is a good control, although much more complex than proposed, since tomography is a challenging technique. We have performed the suggested experiment and acquired tomograms of unprocessed cells. The corresponding data is now included as supplementary movies 2, 3 and 4. We refer to it in lines 202 – 206:

      "To investigate potential impacts of processing steps (cargo uptake, centrifugation, washing) on endosomal organization, we directly fixed cells in the cell culture flask, embedded them in Epon, and conducted tomography. The resulting tomograms revealed endosomal organization consistent with that observed in cells fixed after processing (see Supplementary movies 2, 3, and 4)."

      We furthermore thank the reviewer for the experiment suggestion in the acknowledgments.

      iv) The discussion needs to be revamped. At the moment it is just another run through of the results and does not take an overview of the results presenting an integrated view. Moreover, it contains reference to data that was not presented in the results.

      We have improved the discussion accordingly.

      Recommendations for the authors:

      The reviewers concurred about the high calibre of the work and the importance of the findings.

      They raised some issues and made some suggestions to improve the paper without additional experiments - key issues include

      (1) Better referencing of the trypanosome endocytosis/ lysosomal trafficking literature.

      The literature, especially the experimental and quantitative work, is very limited. We now provide a more complete set of references. However, we would like to mention that we had cited a recent review that critically references the trypanosome literature with emphasis on the extensive work done with mammalian cells and yeast.

      (2) Moving the dSTORM data that detracts from otherwise strong data in a supplementary figure.

      We have done this.

      (3) Removal of the conclusion that the continuous endosome fulfils the functions of TGN, without further evidence.

      As stated above, this was not a conclusion in our paper, but rather a speculation, which we have now more clearly marked as such. Lines 740 to 751 now read:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      (4) Broader discussion linking their findings to other examples of organelle maturation in eukaryotes (e.g cisternal maturation of the Golgi)

      We have improved the discussion accordingly.

      Reviewer #1 (Recommendations For The Authors):

      What are the multi-vesicular vesicles that surround the marked endosomal compartments in Fig 1. Do they become labelled with fluid phase markers with longer incubations (e.g late endosome/ lysosomal)?

      The function of MVBs in trypanosomes is still far from being clear. They are filled with fluid phase cargo, especially ferritin, but are devoid of VSG. Hence it is likely that MVBs are part of the lysosomal compartment. In fact, this part of the endomembrane system is highly dynamic. MVBs can be physically connected to the lysosome or can form elongated structures. The surprising dynamics of the trypanosome lysosome will be published elsewhere.

      Figure 2. The compartments labelled with EP1::Halo are very poorly defined due to the low levels of expression of the reporter protein and/or sensitivity of detection of the Halo tag. Based on these images, it would be hard to conclude whether the endosome network is continuous or not. In this respect, it is unclear why the authors didn't use EP1-GFP for these analyses? Given the other data that provides more compelling evidence for a single continuous compartment, I would suggest removing Fig 2A.

      We have used EP1::GFP to label the entire endosome system (Engstler and Boshart, 2004). Unfortunately, GFP is not suited for dSTORM imaging. By creating the EP1::Halo cell line, we were able to utilize the most prominent dSTORM fluorescent dye, Alexa 647. This was not primarily done to generate super-resolution images, but rather to measure the dynamics of the GPI-anchored, luminal protein EP with single molecule precision. The results from this study will be published separately. But we agree with the reviewer and have relocated the dSTORM data to the supplementary material.

      The observation that Rab5a/7 can be detected in the lumen of lysosome is interesting. Mechanistically, this presumably occurs by invagination of the limiting membrane of the lysosome. Is there any evidence that similar invagination of cytoplasmic markers occurs throughout or in subdomains of the endocytic network (possibly indicative of a 'late endosome' domain)?

      So far, we have not observed this. The structure of the lysosome and the membrane influx from the endosome are currently being investigated.

      The authors note that continuity of functionally distinct membrane compartments in the secretory/endocytic pathways has been reported in other protists (e.g T. cruzi). A particular example that could be noted is the endo-lysosomal system of Dictyostelium discoideum which mediates the continuous degradation and eventual expulsion of undigested material.

      We tried to include this in the discussion but ultimately decided against it because the Dictyostelium system cannot be easily compared to the trypanosome endosome.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Not sure that 'common' is the correct term here. Frequent, near-universal..... it would be true that endocytosis is common across most eukaryotes.

      We have changed the sentence to “common process observed in most eukaryotes” (line 33).

      Immune evasion - the parasite does not escape the immune system, but does successfully avoid its impact, at least at the population level.

      We have replaced the word “escape” with “evasion” (line 35).

      The third sentence needs to follow on correctly from the second. Also, more than Igs are internalised and potentially part of immune evasion, such as C3, Factor H, ApoL1 etcetera.

      We believe that there may be a misunderstanding here. The process of endocytic uptake and lysosomal degradation has so far only been demonstrated in the context of VSG-bound antibodies, which is why we only refer to this. Of course, the immune system comprises a wide range of proteins and effector molecules, all of which could be involved in immune evasion.

      I do not follow the logic that the high flux through the endocytic system in trypanosomes precludes distinct compartmentalisation - one could imagine a system where a lot of steps become optimised for example. This idea needs expanding on if it is correct.

      Membrane transport by vesicle transfer between several separate membrane compartments would be slower than the measured rate of membrane flux.

      Again I am not sure 'efficient' on line 40. It is fast, but how do you measure efficiency? Speed and efficiency are not the same thing.

      We have replaced the word “efficient” with “fast” (line 42).

      The basis for suggesting endosomes as a TGN is unclear. Given that there are AP complexes, retromer, exocyst and other factors that are part of the TGN or at least post-Golgi differentiation of pathways in canonical systems, this seems a step too far. There really is no evidence in the rest of the MS that seems to support this.

      Yes, we agree and have clarified the discussion accordingly. We have not completely removed the discussion on the TGN but have labelled it more clearly as speculation.

      I am aware I am being pedantic here, but overall the abstract seems to provide an impression of greater novelty than may be the case and makes several very bold claims that I cannot see as fully valid.

      We are not aware of any claim in the summary that we have not substantiated with experiments, or any hypothesis that we have not explained.

      Moreover, the concept of fused or multifunctional endosomes (or even other endomembrane compartments) is old, and has been demonstrated in metazoan cells and yeast. The concept of rigid (in terms of composition) compartments really has been rejected by most folks with maturation, recycling and domain structures already well-established models and concepts.

      We agree that the (transient) presence of multiple Rab proteins decorating endosomes has been demonstrated in various cell types. This finding formed the basis for the endosomal maturation model in mammals and yeast, which has replaced the previous rigid compartment model.

      However, we do not appreciate attempts to question the originality of our study by claiming that similar observations have been made in metazoans or yeast. This is simply wrong. There are no reports of a functionally structured, continuous, single and large endosome in any other system. The only membrane system that might be similar was described in the American parasite Trypanosoma cruzi, however, without the use of endosome markers or any functional analysis. We refer to this study in the discussion.

      In summary, the maturation model falls short in explaining the intricacies of the membrane system we have uncovered in trypanosomes. Therefore, one plausible interpretation of our data is that the overall architecture of the trypanosome endosomes represents an adaptation that enables the remarkable speed of plasma membrane recycling observed in these parasites. In our view, both our findings and their interpretation are novel and worth reporting. Again, modern cell biology should recognize that evolution has developed many solutions for similar processes in cells, about whose diversity we have learned almost nothing because of our reductionist view. A remarkable example of this are the Picozoa, tiny bipartite eukaryotes that pack the entire nutritional apparatus into one pouch and the main organelles with the locomotor system into the other. Another one is the "extreme" cell biology of many protozoan parasites such as Giardia, Toxoplasma or Trypanosoma.

      Higher plants have been well characterised, especially at the level of Rab/Arf proteins and adaptins.

      We now mention plant endosomes in our brief discussion of the trypanosome TGN. Lines 744 – 747:

      “A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019).”

      The level of self-citing in the introduction is irritating and unscholarly. I have no qualms with crediting the authors with their own excellent contributions, but work from Dacks, Bangs, Field and others seems to be selectively ignored, with an awkward use of the authors' own publications. Diversity between organisms for example has been a mainstay of the Dacks lab output, Rab proteins and others from Field and work on exocytosis and late endosomal systems from Bangs. These efforts and contributions surely deserve some recognition?

      This is an original article and not a review. For a comprehensive overview the reviewer might read our recent overview article on exo- and endocytic pathways in trypanosomes, in which we have extensively cited the work of Mark Field, Jay Bangs and Joel Dacks. In the present manuscript, we have cited all papers that touch on our results or are otherwise important for a thorough understanding of our hypotheses. We do not believe that this approach is unscientific, but rather improves the readability of the manuscript. Nevertheless, we have now cited additional work.

      For the uninitiated, the posterior/anterior axis of the trypanosome cell as well as any other specific features should be defined.

      In lines 102 - 110 we wrote:

      “This process of antibody clearance is driven by hydrodynamic drag forces resulting from the continuous directional movement of trypanosomes (Engstler et al., 2007). The VSG-antibody complexes on the cell surface are dragged against the swimming direction of the parasite and accumulate at the posterior pole of the cell. This region harbours an invagination in the plasma membrane known as the flagellar pocket (FP) (Gull, 2003; Overath et al., 1997). The FP, which marks the origin of the single attached flagellum, is the exclusive site for endo- and exocytosis in trypanosomes (Gull, 2003; Overath et al., 1997). Consequently, the accumulation of VSG-antibody complexes occurs precisely in the area of bulk membrane uptake.”

      We think this sufficiently introduces the cell body axes.

      I don't understand the comment concerning microtubule association. In mammalian cells, such association is well established, but compartments still do not display precise positioning. This likely then has nothing to do with the microtubule association differences.

      We have clarified this in the text (lines 192 – 199). There is no report of cytoplasmic microtubules in trypanosomes. All microtubules appear to be either subpellicular or within the flagellum. To maintain the structure and position of the endosomal apparatus, they should be associated either with subpellicular microtubules, as is the case with the endoplasmic reticulum, or with the more enigmatic actomyosin system of the parasites. We have been working on the latter possibility and intend to publish a follow-up paper to the present manuscript.

      The inability to move past the nucleus is a poor explanation. These compartments are dynamic. Even the nucleus does interesting things in trypanosomes and squeezes past structures during development in the tsetse fly.

      The distance between the nucleus and the microtubule cytoskeleton remains relatively constant even in parasites that squeeze through microfluidic channels. This is not unexpected as the nucleus can be highly deformed. A structure the size of the endosome will not be able to physically pass behind the nucleus without losing its integrity. In fact, the recycling apparatus is never found in the anterior part of the trypanosome, most probably because the flagellar pocket is located at the posterior cell pole.

      L253 What is the evidence that EP1 labels the entire FP and endosomes? This may be extensive, but this claim requires rather more evidence. This is again suggested at l263. Again, please forgive me for being pedantic, but this is an overstatement unless supported by evidence that would be incredibly difficult to obtain. This is even sort of acknowledged on l271 in the context of non-uniform labelling. This comes again in l336.

      The evidence that EP1 labels the entire FP and endosomes is presented here (Engstler and Boshart, 2004; doi: 10.1101/gad.323404).

      Perhaps I should refrain from comments on the dangers of expansion microscopy, or asking what has actually been gained here. Oddly, the conclusion on l290 is a fair statement that I am happy with.

      An in-depth discussion regarding the advantages and disadvantages of expansion microscopy is beyond the manuscript's intended scope. Our approach involved utilizing various imaging techniques to confirm the validity of our findings. We appreciate that our concluding sentence is pleasing.

      F2 - The data in panel A seem quite poor to me. I also do not really understand why the DAPI stain in the first and second columns fails to coincide or why the kinetoplast is so diffuse in the second row. The labelling for EP1 presents as very small puncta, and hence is not evidence for a continuum. What is the arrow in A IV top? The data in panel B are certainly more in line with prior art, albeit that there is considerable heterogeneity in the labelling and of the FP for example. Again, I cannot really see this as evidence for continuity. There are gaps.... Albeit I accept that labelling of such structures is unlikely to ever be homogenous.

      We agree that the dSTORM data represents the least robust aspect of the findings we have presented, and we concur with relocating it to the supplementary material.

      F3 - Rather apparent, and specifically for Rab7, that there is differential representation - for example, Cell 4 presents a single Rab7 structure while the remaining examples demonstrate more extensive labelling. Again, I am content that these are highly dynamic structures but this needs to be addressed at some level and commented upon. If the claim is for continuity, the dynamics observed here suggest the usual; some level of obvious overlap of organellar markers, but the representation in F3 is clever but not sure what I am looking at. Moreover, the title of the figure is nothing new. What is also a bit odd is that the extent of the Rab7 signal, and to some extent the other two Rabs used, is rather variable, which makes this unclear to me as to what is being detected. Given that the Rab proteins may be defining microdomains or regions, I would also expect a region of unique staining as well as the common areas. This needs to at least be discussed.

      The differences in the representation result from the dynamics of the labelled structures. Therefore, we have selected different cells to provide examples of what the labelling can look like. We now mention this in the results section.

      The overlap of the different Rab signals was perhaps to be expected, but we now have demonstrated it experimentally. Importantly, we performed a rigorous quantification by calculating the volume overlaps and the Pearson correlation coefficients.

      In previous studies the data were presented as maximal intensity projections, which inherently lack the complete 3D information.

      We found that Rab proteins define microdomains and that there are regions of unique staining as well as common areas, as shown in Figure 3. The volumes do not completely overlap. This is now more clearly stated in lines 315 – 319:

      “These objects showed areas of unique staining as well as partially overlapping regions. The pairwise colocalization of different endosomal markers is shown in Figure 3 A, XI - XIII and 3 B. The different cells in Figure 3 B were selected to represent the dynamic nature of the labelled structures. Consequently, the selected cells provide a variety of examples of how the labelling can appear.”

      This had already been stated in lines 331 – 336:

      “In summary, the quantitative colocalization analyses revealed that on the one hand, the endosomal system features a high degree of connectivity, with considerable overlap of endosomal marker regions, and on the other hand, TbRab5A, TbRab7, and TbRab11 also demarcate separated regions in that system. These results can be interpreted as evidence of a continuous endosomal membrane system harbouring functional subdomains, with a limited amount of potentially separated early, late or recycling endosomes.”

      F4-6 - Fabulous images. But a couple of issues here; first, as the authors point out, there is distance between the gold and the antigen. So, this of course also works in the z-plane as well as the x/y-planes and some of the gold may well be associated with membraneous figures that are out of the plane, which would indicate an absence of colinearity on one specific membrane. Secondly, in several instances, we have Rab7 essentially mixed with Rab11 or Rab5 positive membrane. While data are data and should be accepted, this is difficult to reconcile when, at least to some level, Rab7 is a marker for a late-endosomal structure and where the presence of degradative activity could reside. As division of function is, I assume, the major reason for intracellular compartmentalisation, such a level of admixture is hard to rationalise. A continuum is one thing but the data here seem to be suggesting something else, i.e. almost complete admixture.

      We are grateful for the positive feedback regarding the image quality. It is true that the "linkage error", representing the distance between the gold and the antigen, also operates to some extent along the z-axis. However, it is important to note that the z-dimension of the section in these figures is 55 nm. Nevertheless, it is interesting that the Rab antigen corresponding to membranes that are not themselves visible within the section is still discernible in Figure 4C (indicated by arrows).

      We have clarified this in lines 397 – 400:

      “Consequently, gold particles located further away may represent cytoplasmic TbRab proteins or, as the “linkage error” can also occur in the z-plane, correspond to membranes that are not visible within the 55 nm thickness of the cryosection (Figure 4, panel C, arrows). “

      The coexistence of different Rabs is most likely concentrated in regions where transitions between different functions occur. Our focus was primarily on imaging membranes labelled with two markers. We wanted to show that the prevailing model of separate compartments in the trypanosome literature is not correct.

      F7 - Not sure what this adds beyond what was published by Grunfelder.

      First, this figure is an important control that links our results to published work (Grünfelder et al. (2003)). Second, we include double staining of cargo with Rab5, Rab7, and Rab11, whereas Grünfelder focused only on Rab11. Therefore, our data is original and of such high quality that it warrants a main figure.

      F8 - and l583. This is odd as the claim is 'proof' which in science is a hard thing to claim (and this is definitely not at a six sigma level of certainty, as used by the physics community). However, I am seeing structures in the tomograms which are not contiguous - there are gaps here between the individual features (Green in the figure).

      We have replaced the term "proof". It is important to note that the structures in individual tomograms cannot all be completely continuous because the sections are limited to a thickness of 250 nm. Therefore, it is likely that they have more connectivity above and below the imaged section. Nevertheless, we believe that the quality of the tomograms is satisfactory, considering that 3D Tokuyasu is a very demanding technique and the production of serial Tokuyasu tomograms is not feasible in practice.

      Discussion - Too long and the self-citing of four papers from the corresponding author to the exclusion of much prior work is again noted, with concerns about this as described above. Moreover, at least four additional Rab proteins are known associated with the trypanosome endosomal system, 4, 5B, 21 and 28. These have been completely ignored.

      We have outlined our position on referencing in original articles above. We also explained why we focused on the key marker proteins associated with early (Rab5), late (Rab7) and recycling endosomes (Rab11). We did not ignore the other Rabs; we simply did not include them in the present study.

      Overall this is disappointing. I had expected a more robust analysis, with a clearer discussion and placement in context. I am not fully convinced that what we have here is as extreme as claimed, or that we have a substantial advance. There is nothing here that is mechanistic or the identification of a new set of gene products, process or function.

      We do not think that this is constructive feedback.

      This MS suggests that the endosomal system of African trypanosomes is a continuum of membrane structures rather than representing a set of distinct compartments. A combination of light and electron microscopy methods are used in support. The basic contention is very challenging to prove, and I'm not convinced that this has been. Furthermore, I am also unclear as to the significance of such an organisation; this seems not really addressed.

      We acknowledge and respect varying viewpoints, but we hold a differing perspective in this matter. We are convinced that the data decisively supports our interpretation. May future work support or refute our hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      Line 81 - delete 's

      Done.

      Generally, the introduction was very well written and clearly summarised our current understanding but the paragraph beginning line 134 felt out of place and repeated some of the work mentioned earlier.

      We have removed this paragraph.

      For the EM analysis throughout quantification would be useful as highlighted in the public review. How many tomograms were examined, and how often were types of structures seen? I understand the sample size is often small but this would help the reader appreciate the diversity of structures seen.

      We have included the numbers.

      Following on from this, how were the cells chosen for tomogram analysis? For example, the dividing cell in 1D has palisades associating with the new pocket - is this commonly seen? Does this reflect something happening in dividing cells? This point about endosomal division was picked up in the discussion but there was little about it in the main results.

      This issue is undoubtedly inherent to the method itself, and we have made efforts to mitigate it by generating a series of tomograms recorded randomly. We have refrained from delving deeper into the intricacies of the cell cycle in this manuscript, as we believe that it warrants a separate paper.

      As the authors prosecute, the co-localisation analysis highlights the variable nature of the endosome and the overlap of different markers. When looking at the LM analysis, I was struck by the variability in the size and number of labelled structures in the different cells. For example, in 3A Rab7 is 2 blobs but in 3B Cell 1 it is 4/5 blobs. Is this just a reflection of the increase in the endosome during the cell cycle?

      The variability in representation is a direct consequence of the dynamic nature of the labelled structures. For this reason, we deliberately selected different cells as examples of what the labelling can look like. We have decided not to discuss the dynamics of the endosome during the cell cycle here. This will be the subject of a further report.

      Moreover, Rab 11 looks to be the marker covering the greatest volume of the endosomal system - is this true? I think there's more analysis of this data that could be done to try and get more information about the relative volumes etc of the different markers that haven't been drawn out. The focus here is on the co-localisation.

      Precisely because we recognize the importance of this point, we intend to turn our attention to the cell cycle in a separate publication.

      I appreciate that it is an awful lot of work to perform the immuno-EM and the data is of good quality but in the text, there could be a greater effort to tie this to the LM data. For example, from the Rab11 staining in LM you would expect this marker to be the most extensive across the networks - is this reflected in the EM?

      For the immuno-EM there were no numbers, the authors had measured the position of the gold but what was the proportion of gold that was in/near membranes for each marker? This would help the reader understand both the number of particles seen and the enrichment of the different regions.

      Our original intent was to perform a thorough quantification (using stereology) of the immuno-EM data. However, we later realized that the necessary random imaging approach is not suitable for Tokuyasu sections of trypanosomes. In short, the cells are too far apart, and the cell sections are only occasionally cut so that the endosomal membranes are sufficiently visible. Nevertheless, we continue to strive to generate more quantitative data using conventional immuno-EM.

      The innovative combination of Tokuyasu tomograms with immuno-EM was great. I noted though that there was a lack of fenestration in these models. Does this reflect the angle of the model or the processing of these samples?

      We are grateful to the referee, as we have asked ourselves the same question. However, we do not attribute the apparent lack of fenestration to the viewing angle, since we did not find fenestration in any of the Tokuyasu tomograms. Our suspicion is more directed towards a methodological problem. In the Tokuyasu workflow, all structures are mainly fixed with aldehydes. As a result, lipids are only effectively fixed through their association with membrane proteins. We suggest that the fenestration may not be visible because the corresponding lipids may have been lost due to incomplete fixation.

      We now clearly state this in the lines 563 – 568.

      “Interestingly, these tomograms did not exhibit the fenestration pattern identified in conventional electron tomography. We suspect that this is due to methodological reasons. The Tokuyasu procedure uses only aldehydes to fix all structures. Consequently, effective fixation of lipids occurs only through their association with membrane proteins. Thus, the lack of visible fenestration is likely due to possible loss of lipids during incomplete fixation.”

      The discussion needs to be reworked. Throughout it contains references to results not in the main results section such as supplementary movie 2 (line 735). The explicit references to the data and figures felt odd and more suited to the results rather than the discussion. Currently, each result is discussed individually in turn and more effort needs to be made to integrate the results from this analysis here but also with previous work and the data from other organisms, which at the moment sits in a standalone section at the end of the discussion.

      We have improved the discussion and removed the previous supplementary movies 2 and 3. Supplementary movie 1 is now mentioned in the results section.

      Line 693 - There was an interesting point about dividing cells describing the maintenance of endosomes next to the old pocket. Does that mean there was no endosome by the new pocket and if so where is this data in the manuscript? This point relates back to my question about how cells were chosen for analysis - how many dividing cells were examined by tomography?

      The fate of endosomes during the cell cycle is not the subject of this paper. In this manuscript we show only one dividing cell by tomography. An in-depth analysis focusing on what happens during the cell cycle will be published separately.

      Line 729 - I'm unclear how this represents a polarization of function in the flagellar pocket. The pocket I presume is included within the endosomal system for this analysis but there was no specific mention of it in the results and no marker of each position to help define any specialisation. From the results, I thought the focus was on endosomal co-localisation of the different markers. If the authors are thinking about specialisation of the pocket this paper from Mark Field shows there is evidence for the exocyst to be distributed over the entire surface of the pocket, which is relevant to the discussion here. Boehm, C.M. et al. (2017) The trypanosome exocyst: a conserved structure revealing a new role in endocytosis. PLoS Pathog. 13, e1006063

      We have formulated our statement more cautiously. However, we are convinced that membrane exchange cannot physically work without functional polarization of the pocket. We know that Rab11, for example, is not evenly distributed on the pocket. By the way, in Boehm et al. (2017) the exocyst is not shown to cover the entire pocket (as shown in Supplementary Video 1).

      We now refer to Boehm et al. (Lines 700 – 703):

      “Boehm et al (2017) report that in the flagellar pocket endocytic and exocytic sites are in close proximity but do not overlap. We further suggest that the fusion of EXCs with the flagellar pocket membrane and clathrin-mediated endocytosis take place on different sites of the pocket. This disparity explains the lower colocalization between TbRab11 and TbRab5A.”

      Line 735 - link to data not previously mentioned I think. When I looked at this data I couldn't find a key to explain what all the different colours related to.

      We have removed the previous supplementary movies 2 and 3. We now reference supplementary movie 1 in the results section.

    1. Sometimes even well-intentioned efforts can do significant harm. For example, in the immediate aftermath of the 2013 Boston Marathon bombing, the FBI released a security photo of one of the bombers and asked for tips. A group of Reddit users decided to try to identify the bomber(s) themselves. They quickly settled on a missing man (Sunil Tripathi) as the culprit (it turned out he had died by suicide and was in no way related to the case), and flooded the Facebook page set up to search for Sunil Tripathi, causing his family unnecessary pain and difficulty. The person who set up the “Find Boston Bomber” Reddit board said “It Was a Disaster” but “Incredible” [p26], and Reddit apologized for online Boston ‘witch hunt’ [p27].

      There are instances where this can help and ways it can do more harm than good. Intentions here were good, but with joint witch hunting it seems that folks online can get blinded to the possible repercussions of identifying a criminal incorrectly. It certainly happens already in the justice system, but human error in the online space can obviously extend the harm. I still think it can be a good thing, such as locating a vehicle using a license plate number in an Amber alert, but when it comes to just a person's looks, it can get hairy.

    1. Because of these (and other) differences, different forms of communication might be preferable for different tasks.

      I hadn’t really thought about how many dimensions go into communication until I read this. The breakdown—especially things like synchronicity and archiving—made me reflect on my own habits. For example, I love texting because I can respond when it’s convenient, but I never considered how helpful it is that messages are automatically saved. That’s really different from face-to-face conversations, where things can be easily forgotten or misunderstood. I also realized how often I choose communication tools based on how “safe” or anonymous I feel in the moment. This made me more aware of how tech shapes not just how we talk, but how we relate to people.

    1. Instead of carrying home two babies wrapped in blue blankets, Shauntay brought home one baby and planned a funeral for the other.

      This sentence really stood out to me because it puts a human face on the statistics about racial disparities in infant mortality. Instead of just giving us numbers, Bridges shares a story that makes the issue feel real and emotional. It made me think about how often data is used without considering the people behind it. I wonder if public health would get more attention or funding if more stories like this were highlighted. It’s a heartbreaking reminder of the impact of systemic inequality in health care.

    1. He indicates, at the very beginning of the account, that he wrote in an age where apostates and doubters of religious truths were inexplicably ascendant.

      So from the start he’s telling us that he’s writing during a time when faith is shaky. That kind of explains why he’s writing the way he is, it’s not just a story, it’s a defense of belief.

    1. One feels that Islam is ripe for wholesale dismantling, to be replaced by either a better reconstructed version or an altogether different sociointellectual system.

      It almost feels like some people think Islam, the way it’s been practiced or understood, needs to be torn down completely. Like, not just changed around the edges, but fully rethought. Either rebuilt into something that fits better with the modern world or replaced with a whole new way of thinking and living that speaks more to where people are now, socially, spiritually, intellectually.

    1. meaning his birth occurred under the shadow of a celestial alignment that predetermined his political ascendancy.

      Basically, Yazdi’s saying his rise to power was fate. He didn’t just take control. He was always meant to rule. It’s a way of justifying everything he did before the story even gets going.

    2. The beginning of the Zafarnama makes the point that when he was born, “a world came forth into the world in a human form” (Yazdi, Zafarnama, 1:235)

      This line is dramatic on purpose. It sets Tamerlane up as more than just a person. It’s like the text is saying his existence changed the world from the moment he arrived.

    1. Fixed in placid and parsimonious habits on the outside, Silva’s inner life has a vivacity that transforms with the genre of work he is contracted to proofread. He is given a manuscript to proofread that is a historical work on an event with a deep connection to the city and the quarter in which he lives. The event is the siege of Lisbon in 1147 by Dom Afonso Henriques (d. 1185), ruler of the county of Portugal. Afonso’s capture of Lisbon greatly enlarged his territory, as he founded the kingdom of Portugal that has continued to exist as a state, in one form or another, from the twelfth century to the present. As Silva reads the historian’s account of the city’s siege in the twelfth century, his active imagination starts to plot the presence of the army outside the walls of the quarter in which he lives in the twentieth century.

      Silva might seem super routine on the outside, just going through the motions, but once he gets into a manuscript, his imagination kicks in heavy. It’s like whatever he’s reading starts to shape how he sees the world around him. This time, he’s proofreading a history book about something that actually happened right where he lives, the siege of Lisbon in 1147.

    1. It can be argued that the primary significance of Jerusalem beyond its immediate environs rests not on the physical city in the Middle East but in the way it has been imagined by Muslims, Jews, and Christians the world over in the manner we see in the case of Kudus. The Middle Eastern city is the subject of a large literature concerned with what is found in its space. But this literature has little to do with the way Jerusalem has been the object of religious attention by those living far away.

      You could say that what really makes Jerusalem important, especially outside of its actual location, isn’t just the physical city itself, it’s how people all over the world imagine it. Muslims, Jews, and Christians have each built deep connections to Jerusalem, and a lot of that comes from religious meaning, not just geography. Kudus (the way Muslims refer to it) shows this really well. It’s less about what’s physically there and more about what the city represents spiritually and emotionally.

    1. It came about after this that his master's wife cast her eyes on Joseph and said, "Lie with me." 8 But he refused and said to his master's wife, "Look, my master does not pay attention to what I do in the house, and he has put everything that he owns under my care. 9 No one is greater in this house than I am. He has not kept back anything from me but you, because you are his wife. How then can I do this great wickedness and sin against God?"

      Joseph was tested by Yahweh through his master's wife to see if he would sin, and while Joseph stayed true to his beliefs, he was later falsely accused of making advances on her. Joseph was tested and put through suffering and almost faced exile and death, but justice prevailed in the end. Joseph ended up rising to a high position of power in Egypt and saved many people from famine. From a theological perspective, Yahweh in the Old Testament is hard, but fair. This teaches that Yahweh isn't a cruel God, but one that allows tests of faith, and if you're like Joseph, you will be redeemed for having that faith. Yahweh's world is a just one, where justice is always served, even if it's sometimes delayed.

    1. Français

      Canada’s two official languages are English and French. As such, it’s important not only that an official Canadian government website provides both languages, but also that the toggle between them sits higher up on the page; this greatly benefits those who rely on text-to-speech or who are unable to use a mouse, as they don’t need to navigate all the way to the bottom of the page just to change it.

    1. although Django has special conveniences for building “CMS-y” apps, that doesn’t mean it’s not just as appropriate for building “non-CMS-y” apps

      I now wonder if I can build a powerful CRM on Django.

    1. Throughout the history of educational technology, a majority of scholars emphasized the possibilities of technology; however, several scholars emerged as contrarians and technology Cassandras. For example, Postman (1993/2011), McLuhan and Fiore (1967), and Turkle (1984) cautioned that technologies created new environments, fundamentally altering individuals and society. Cuban (2009) famously critiqued technologies as “oversold and underused,” while Feenberg (1991) argued that technologies increase inequality and threaten democracy if society fails to democratize the use of technologies. Watters (2020) scrutinized educational technology’s historical (and failed) attempts to “personalize” education. Selwyn (2016) reminded the field to explore technologies as socially embedded and to take a broader and, often, a more global, perspective to consider the potential harms of technologies.

      This quote stands out because it presents a more critical lens on educational technology, one that challenges the common belief that more tech always means better learning. Cuban’s phrase “oversold and underused” reflects the reality that many tools don’t live up to their promises in the classroom. Feenberg’s warning about inequality and democracy highlights the broader social impact of tech use. As a future teacher, this pushes me to think more deeply about why I use certain technologies, not just how. It’s a reminder that I need to be thoughtful, inclusive, and critical in my approach to ed tech.

    1. "Design shapes our ability to access, participate in, and contribute to the world."

      This powerful quote from Kat Holmes emphasizes that design is not neutral, it directly impacts inclusion, opportunity, and equity. When applied to education, it reminds me that the way we design learning environments, tools, and systems can either support or hinder students' ability to thrive. As a future educator, this encourages me to be intentional about creating accessible, inclusive, and participatory learning experiences. Good design isn’t just about aesthetics, it’s about justice and belonging in the classroom.

    1. They demonstrate an ability to manage their personal data and examine how it is being used and tracked by others

      This quote highlights the long-term impact of students’ digital behavior and the importance of teaching responsible online habits. It reminds me that digital citizenship is more than just avoiding harm, it’s about actively building a positive presence online. As a future educator, I want to help students understand that what they post, share, and engage with online can follow them into college and careers. Teaching students to manage their digital footprint thoughtfully is just as important as teaching traditional academic skills in today’s world.

    1. This is a huge shift from how we viewed the privacy of our communications during the analog era.

      This quote highlights how dramatically our expectations of privacy have changed in the digital age. In the past, there were clear legal protections around private communication, like phone calls. But now, with digital tools and platforms, those lines are much blurrier, especially when it comes to what companies and governments can access. As a future educator, this reminds me that teaching digital literacy isn’t just about using tools, it’s also about helping students understand their rights, how data is collected, and how to protect their privacy online.

    1. As educators, we must strive to create fully accessible learning environments for our students. This requires designing inclusive learning environments and evaluating the accessibility of digital tools and apps before using them in the classroom to ensure all learners have the same opportunities to access and engage with course content.

      This quote highlights the critical responsibility educators have to ensure equity in learning by making environments accessible to all students. It reminds me that inclusivity isn’t just a nice-to-have; it’s essential for fair education. The emphasis on evaluating digital tools before using them is especially important in today’s tech-driven classrooms, where some apps or resources may unintentionally exclude learners with disabilities. This reinforces the need for me, as a future teacher, to be diligent about choosing tools that support diverse learning needs and to advocate for accessibility in all aspects of teaching.

    1. A PLN is made up of people, spaces, AND tools that support your ongoing learning and professional growth

      This quote stands out because it expands the idea of a Personal Learning Network beyond just the people we connect with. It emphasizes that the environments we engage in and the tools we use are equally important. As a future teacher, this reminds me that professional growth doesn't happen in isolation. I need to be intentional about the spaces I frequent (like online forums or conferences) and the tools I rely on (like note-taking apps or blogs). It’s also a good reminder that building a PLN isn’t just about networking, it’s about continuously seeking resources and environments that push my thinking, reflect diverse perspectives, and keep me engaged in lifelong learning.

    1. If sex-education has long been accepted by parents and administrators, what’s stopping schools from including the LGBTQ+ dialogue in the class curriculum? Perhaps the idea of sex-education only caters to cisgender and heterosexual audiences. It’s okay for discourses regarding male and female genitalia to pervade class discussions, or for students to snide and chuckle whenever pictures of condoms appear on the projector screen; but it’s not okay if a student wishes to understand why his male classmate is wearing a skirt in class, or why his female peer cut her hair short, or why he feels attracted to the boy sitting next to him at lunch.

      Rather than truly being out of "concern" for the safety of children, the refusal to discuss the LGBTQ+ community with children feels more like a desire to continue perpetuating a heteronormative society. This may in turn cause a lack of acceptance when children encounter those who are different later on, or self-resentment if they believe themselves to be "different" from others.

    2. I thought for a very long time that I was introverted. I realized that I just wanted to be my true and genuine self - and that’s difficult if people act like it’s weird”

      This statement is very relatable to many people, including myself. Many kids have trouble being themselves out of fear of how others will react. Will they lose their friends? Lose their social status? It can be very scary for anyone to be themselves. The easiest way is to find a group of people whom you trust and who share similar interests, like a team.

    3. “I thought for a very long time that I was introverted. I realized that I just wanted to be my true and genuine self - and that’s difficult if people act like it’s weird”

      This sentence is simple yet profound. Starting from a personal reflection, it exquisitely shows the state of mind of the LGBTQ+ community as they struggle between self-identification and external perceptions, and reflects the intangible pressure exerted on individuals by social concepts.

    4. “I thought for a very long time that I was introverted. I realized that I just wanted to be my true and genuine self - and that’s difficult if people act like it’s weird”

      I’ve heard classmates say almost the same thing—that they thought they were “quiet,” but really they were just hiding parts of themselves to avoid judgment. It shows how quick labels at school can push people to shrink back. When peers and teachers welcome different personalities, students feel safer showing who they really are.

    5. She believed that being successful in school, to some degree, also meant becoming cisgender and heterosexual

      This sentence left a particularly deep impression on me because it made me think that many times, a family's definition of "success" is actually conditional. It's not just about having good grades, but also about conforming to a certain "normal" appearance, such as cisgender or heterosexuality. I also feel this hidden pressure. It seems that once I'm different from others, I'll be regarded as "inattentive" or "not good enough". This made me start to reflect on whether it is possible for schools and families to challenge this single "success" standard, so that more students of different identities can be seen and supported.

    6. In terms of coming out, I suppose I never necessarily “came out” as a whole event. I feel like in the early 2010s, a lot of people viewed coming out as one huge thing in your life. I would say that perception comes out as this huge shocking thing. But when you’re actually LGBTQ+, you’re potentially coming-out whenever you meet someone new. (Ngo, 2022)

      Thi Ngo challenges the mainstream, often romanticized idea of "coming out" as a single, dramatic, once-in-a-lifetime event. This perception—reinforced by media and cultural narratives—fails to reflect the lived reality of LGBTQ+ individuals, who must navigate visibility and disclosure repeatedly across different settings, relationships, and stages of life.

      By describing coming out as a recurring process, Ngo emphasizes how exhausting and emotionally taxing it can be to continually assess safety and acceptance in new social contexts. This quote also sheds light on the emotional labor queer people endure—deciding when, how, and if they should reveal parts of themselves to others. It’s not just about a one-time declaration; it’s a constant negotiation of identity and safety.

    7. “I thought for a very long time that I was introverted. I realized that I just wanted to be my true and genuine self - and that’s difficult if people act like it’s weird”

      This quote powerfully captures the emotional and social toll of invisibility that many LGBTQ+ youth experience. Ngo’s reflection highlights a common misunderstanding: what might appear as introversion or social withdrawal can actually stem from a deeper, internal conflict—the fear of judgment for simply being oneself. In environments where queer and non-binary identities are viewed as “weird” or abnormal, students like Ngo often feel forced to suppress their personality in order to stay safe or accepted.

      Rather than a lack of sociability, this is a survival strategy. It underscores the importance of affirming environments where youth can express their identity without fear. When Ngo says he “just wanted to be [his] true and genuine self,” it reminds us that authenticity isn't possible without cultural and institutional support. This makes me wonder, how many students are mislabeled as shy, antisocial, or unmotivated when in reality they’re struggling to feel seen and safe? What can educators do to create spaces where being different isn’t treated as “weird,” but as valuable?

    8. A lot of the times when people label things to be age-appropriate or age-inappropriate, it’s not because of their genuine concern for their child. It’s because of their belief that, ‘I don’t want my kids to learn about things that I personally do not understand, or things that I do not wish to understand

      I found this really interesting, and I agree with the author. My brother in 8th grade recently had a sex ed class that included conversations on transgender individuals. The parents were unaware of this new addition of information and were outraged in the community boards on Facebook. I remember a lot of the comments were saying that what the students learned were completely inappropriate, that kids were too young to learn about "men being able to be pregnant," and that the material should've been kept for the SexEd classes in high school. From seeing the comments and the discourse, what really stood out to me was the personal discomfort and misunderstanding rather than concern for the students. I agree with this statement from the author in that often what is deemed age-inappropriate is associated with parents protecting their own beliefs or avoiding conversations that they are just not ready for or do not wish to have. This also made me realize the role parents have in limiting what a kid is allowed or able to learn.

    1. The overrepresentation of a particular group can manifest itself in several ways:
       • Culturally and linguistically diverse (CLD) students can be over-identified for special education at the national, state, and district levels.
       • CLD students can receive special education services at higher rates in more segregated or restrictive programs.
       • They can be overrepresented in specific special education categories, such as emotional disturbance and intellectual disabilities.
       • They can experience disciplinary actions, such as suspensions and expulsions, at higher rates than other students.

      This list makes it clear that overrepresentation isn’t just about numbers—it’s about outcomes. The pattern described here reflects deep-rooted structural racism that affects how CLD students are perceived, placed, treated, and disciplined in schools. How might these patterns be challenged through inclusive practices, culturally responsive training, and community advocacy?

    2. The disproportionate representation of students of color in special education is a serious concern that has lasted for forty years. Research suggests that students of color are too often not identified accurately for special education and that the programs they are placed in are frequently poor in quality. This trend contributes to a less-than-optimal learning environment that lowers their chances for future success. Some of the factors that may contribute to this problem include poverty and inaccurate teacher perceptions. To reduce this problem, teachers can be trained to be culturally responsive and the public-school system can be improved so that students from low-income households receive better services

      This passage introduces the central issue of the article: the longstanding systemic inequality in how students of color are identified and served in special education. The concern isn’t just about numbers—it’s about misidentification and inequitable treatment. Students of color are often either over-identified for special education (particularly in subjective categories like emotional or intellectual disabilities) or under-identified when they genuinely need support. Even when placed, the quality of the programs they enter is often subpar, reinforcing a cycle of educational marginalization.

    1. Despite what sometimes seems to be an overwhelmingly hostile context in schools, the concerted efforts of students, teachers, administrators, and other members of the school community can shift school climates. As the 2019 GLSEN survey (Kosciw et al., 2020) shows, schools can make a difference in the experiences of LGBTQ youth.

      This statement offers a crucial reminder: school climates are not fixed—they are changeable. While the chapter outlines the many forms of bias and harm LGBTQ+ students face, this quote shifts the focus toward collective responsibility and the potential for transformation. By emphasizing the role of not just students, but also teachers, administrators, and broader school communities, Mayo reinforces that change doesn't rest on one person or policy alone. It’s a collaborative effort that requires commitment across all levels of a school system.

      The reference to the 2019 GLSEN survey provides data-backed hope—evidence that inclusive curricula, GSAs, and active adult intervention do lead to better outcomes: less harassment, more feelings of safety, and stronger student connections to their schools. This makes the case for proactive, rather than reactive, strategies to support LGBTQ+ students.

    2. A year after her killing, the school district that refused to have a moment of silence for her immediately after her murder allowed the anniversary to be acknowledged by having a "No Name Calling Day" (Smothers, 2004). It is important to understand that homophobic violence and the potential for harassment do structure the lives of sexual minorities. But the understanding of their identities, of the places to go to find communities that support their gender and sexual identities, and of their ability to express their identities - even in challenging situations - demonstrates that sexual and gender minority youth like Gunn are actively and creatively involved in making their lives and communities

      This passage underscores a painful but essential truth: for LGBTQ+ youth, the threat of violence isn’t hypothetical—it’s a real and structuring force in their lives. The example of Sakia Gunn’s murder and the school district’s delayed, symbolic response highlights how institutions often fail to acknowledge queer lives until forced to do so—and even then, often through surface-level gestures like “No Name Calling Day” instead of deeper structural change.

      Despite these threats, the passage also honors the resilience and agency of queer youth. Students like Gunn not only assert their identities in dangerous environments, but they also work to create communities of care and resistance. This challenges narratives that frame LGBTQ+ youth solely as victims; instead, it centers their creativity, resistance, and leadership in shaping more inclusive futures. What would it look like if schools responded to violence not just with symbolic recognition, but with lasting structural support for LGBTQ+ youth?

    3. Despite what sometimes seems to be an overwhelmingly hostile context in schools, the concerted efforts of students, teachers, administrators, and other members of the school community can shift school climates. As the 2019 GLSEN survey (Kosciw et al., 2020) shows, schools can make a difference in the experiences of LGBTQ youth.

      Which is such a hopeful sign that things can change for the better. With data showing that inclusive curriculum correlates with less harassment and more connection between students, the importance of schools actively including LGBTQ+ in their curriculum cannot be overemphasized. It's not just about representation; it's more so about supporting environments where students feel safest, most valued, and allowed to be their true selves. It really brings out how much education can change more than minds, but also inclusive, respectful communities.

    4. Laws and regulations can help them improve school climate and help them know how to put inclusive knowledge into practice. Homophobia and transphobia, in a very real sense, affect everyone - even professionals who know they ought to do better by sexual and gender minority students feel constrained by the biases circulating in their schools

      This section emphasizes that while laws (like Title IX or state anti-discrimination statutes) provide a foundation for inclusive practice, laws alone aren’t enough. Even educators who want to support LGBTQ students often feel blocked—not by a lack of policy, but by the unspoken biases and pressures within their school communities. These biases may come from colleagues, parents, or administrators, creating an environment where silence and neutrality feel safer than advocacy.

      Mayo’s point here is crucial: homophobia and transphobia are institutional, not just interpersonal. They shape what gets taught, what gets ignored, and who feels safe to speak up. The implication is that real change must involve both top-down (policy) and bottom-up (cultural) efforts. It’s not just about knowing what’s right—it’s about having the support and freedom to act on it.

    5. Moreover, students in schools with inclusive curriculum reported lower levels of harassment, higher attendance rates, and more feelings of connection to their schools.

      This sentence provides strong evidence for the real-world impact of inclusive education. It shows that LGBTQ representation in the curriculum isn't just about visibility—it's about creating safer, more affirming learning environments. When LGBTQ students see themselves reflected in what they’re taught, they feel a greater sense of belonging and are less likely to face harassment. This supports the idea that inclusion isn’t merely symbolic; it directly improves students' mental health, safety, and academic engagement.

      Importantly, this doesn’t just benefit LGBTQ students—it fosters a more respectful school culture for everyone. By normalizing diverse identities through curriculum, schools send a message that all students belong, which reduces the “othering” that often fuels bullying.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Thank you for your appreciation and positive comments. In our view, an appealing aspect of this proposed biophysical mechanism for nucleoid segregation is its self-organizing nature and its ability to intrinsically couple nucleoid segregation to biomass growth, regardless of nutrient conditions.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      These above points are observational, I have no comments on these observations leading to their hypothesis.

      Thank you!

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      It is indeed difficult to prove causality in a definitive manner when the proposed coupling mechanism between nucleoid segregation and gene expression is self-organizing, i.e., does not involve a dedicated regulatory molecule (e.g., a protein, RNA, metabolite) that we could have eliminated through genetic engineering to establish causality. We are grateful to the reviewer for recognizing that our two causality tests are the best that can be done in this context.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      That’s correct; cell growth also stops when gene expression is inhibited, which is consistent with our model in which gene expression within the nucleoid promotes nucleoid segregation and biomass growth (i.e., cell growth), inherently coupling these two processes. This said, we understand the reviewer’s point: the rifampicin experiment doesn’t exclude the possibility that protein secretion and cell growth drive nucleoid segregation. We are assuming that the reviewer is envisioning an alternative model in which sister nucleoids would move apart because they would be attached to the membrane through coupled transcription-translation-protein secretion (transertion) and the membrane would expand between the separating nucleoids, similar to the model proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several observations arguing against cell elongation/transertion acting as a predominant mechanism of nucleoid segregation.

      (1) For this alternative mechanism to work, membrane growth must be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To circumvent the membrane fluidity issue, one could potentially invoke an additional connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid. However, peptidoglycan growth is dispersed early in the cell division cycle when the nucleoid splitting happens in fast-growing cells and only appears to be zonal after the onset of cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In our revised manuscript, we clarify this point and provide confirmatory data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate (Figure 1H and Figure 1 - figure supplement 5A), indicating that it cannot be the main driver.

      (3) The asymmetries in nucleoid compaction that we described in our paper are predicted by our model. We do not see how they could be explained by cell growth or protein secretion.

      (4) We also show that polysome accumulation at ectopic sites (outside the nucleoid) results in correlated nucleoid dynamics, consistent with our proposed mechanism. It is not clear to us how such nucleoid dynamics could be explained by cell growth or protein secretion (transertion).

      (1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, and perhaps independent of growth. This should at least be mentioned in the discussion, or if there are past experiments that rule this out it would be great to note them.

      It is not clear to us how the attachment of the DNA to the cytoplasmic membrane could alone create a directional force to move the sister nucleoids. We agree that old models have proposed a role for cell elongation (providing the force) and transertion (providing the membrane tether). Please see our response above for the evidence (from the literature and our work) against it. This was mentioned in the Introduction and Results section, but we agree that this was not well explained. We have now put emphasis on the related experimental data (Figure 1H, Figure 1 – figure supplement 5A) and revised the text (lines 199 - 210) to clarify these points.

      (1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time; it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      We reviewed the literature and could not find a drug that stops cell growth without stopping gene expression. Any drug that affects the integrity or potential of the membrane depletes cells of ATP; without ATP, gene expression is inhibited. However, our experiment in which we drive polysome accumulation at ectopic sites decouples polysome accumulation from cell growth. In this experiment, by redirecting most of chromosome gene expression to a single plasmid-encoded gene, we reduce the rate of cell growth but still create a large accumulation of polysomes at an ectopic location. This ectopic polysome accumulation is sufficient to affect nucleoid dynamics in a correlated fashion. In the revised manuscript, we have clarified this point and added model simulations (Figure 7 – figure supplement 2) to show that our experimental observations are predicted by our model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      We have included new flow cytometry data of fluorescently labeled DNA to show that DNA replication is not impacted.

      (3) What is not clearly stated and is needed in this paper is to explain how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanism causes it to arise (what causes the "force"), given that the "compaction force" arises from new polysomes being added into gaps created by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear whether (a) these repulsions alone are making the force, or (b) the accumulation of new polysomes in the center, by adding more "repulsive" material, is what causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      We apologize for the understandable confusion. In our picture, the polysomes and DNA (conceptually considered as small plectonemic segments) basically behave as dissolved particles. If these particles were noninteracting, they would simply mix. However, both polysomes and DNA segments are large enough to interact sterically. So as density increases, steric avoidance implies a reduced conformational entropy and thus a higher free energy per particle. We argue (based on Miangolarra et al. 2021 PMID: 34675077 and Xiang et al. 2021 PMID: 34186018) that the demixing of polysomes and DNA segments occurs because DNA segments pack better with each other than they do with polysomes. This raises the free energy cost associated with DNA-polysome interactions compared to DNA-DNA interactions. We model this effect by introducing a term in the free energy, χ_np, which we refer to as a repulsion between DNA and polysomes, though as explained above it arises from entropic effects. At realistic cellular densities of DNA and polysomes, this repulsive interaction is strong enough to cause the DNA and polysomes to phase separate.
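      For readers less familiar with this framework, the role of the χ_np term can be sketched with a Flory–Huggins-style mixing free energy. The form below is illustrative only; the symbols (volume fractions φ_n, φ_p and segment volumes v_n, v_p) are generic placeholders, and the actual model in the paper may include additional terms:

```latex
% Illustrative mixing free energy per unit volume for DNA-segment
% volume fraction \phi_n and polysome volume fraction \phi_p.
% The cross term \chi_{np}\,\phi_n\phi_p is the effective DNA-polysome
% repulsion; when it is large enough, the mixed state is no longer
% the free-energy minimum and the two components demix.
\frac{F}{k_B T} = \frac{\phi_n}{v_n}\ln\phi_n
                + \frac{\phi_p}{v_p}\ln\phi_p
                + \chi_{np}\,\phi_n\,\phi_p
```

      In this picture, the entropic mixing terms favor a uniform solution, while a sufficiently large χ_np makes demixing into a DNA-rich (nucleoid) phase and a polysome-rich phase favorable.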

      This same density-dependent free energy that causes phase separation can also give rise to forces, just as a higher pressure on one side of a wall gives rise to a net force on the wall. Indeed, the “compaction force” we refer to is fundamentally an osmotic pressure difference. At some stages during nucleoid segregation, the region of the cell between nucleoids has a higher polysome concentration, and therefore a higher osmotic pressure, than the regions near the poles. This results in a net poleward force on the sister nucleoids that drives their migration toward the poles. This migration continues until the osmotic pressure equilibrates. Therefore, both phase separation (due to the steric repulsion described above) and nonequilibrium polysome production and degradation (which creates the initial accumulation of polysomes around midcell) are essential ingredients for nucleoid segregation.
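      As a schematic illustration of how a density-dependent free energy yields a mechanical driving force, the osmotic pressure follows from any free energy density f(φ_n, φ_p) via the standard thermodynamic relation (the notation here is generic, not necessarily the paper's):

```latex
% Osmotic pressure of the DNA-polysome mixture. A higher polysome
% volume fraction \phi_p at midcell gives a higher \Pi there, and the
% resulting pressure difference across each sister nucleoid is the
% net poleward "compaction force" described above.
\Pi(\phi_n,\phi_p) = \phi_n\,\frac{\partial f}{\partial \phi_n}
                   + \phi_p\,\frac{\partial f}{\partial \phi_p} - f
```

      Migration stops once Π is uniform along the cell, which is why the force is transient and tied to the nonequilibrium production of polysomes at midcell.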

      This has been clarified in the revised text, with the support of additional simulation results showing how the asymmetry in polysome distribution causes a compaction force (Figure 4A).

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      In our case, phase separation means that the DNA-polysome steric repulsion is strong enough to drive their demixing, which creates a compact nucleoid. As mentioned in a previous point, this effect is captured in the free energy by the χ_np term, which is an effective repulsion between DNA and polysomes, though it arises from entropic effects.

      In the revised manuscript, we now illustrate this with our theoretical model by initializing a cell with a diffuse nucleoid and low polysome concentration. For the sake of simplicity, we assume that the cell does not elongate. We observe that the DNA-polysome steric repulsion is sufficient to compact the nucleoid and place it at mid-cell (new Figure 4A).

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      We think that the ectopic polysome accumulation drives nucleoid dynamics. In our theoretical model, we can introduce polysome production at fixed sources to mimic the experiments where ectopic polysome production is achieved by high plasmid expression. The model is able to recapitulate the two main phenotypes observed in experiments (Figure 7). These new simulation results have been added to the revised manuscript (Figure 7 – figure supplement 2).

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      It is a redistribution—the RplA-GFP signal remains constant in concentration from cell birth to division (Figure 1 – figure supplement 1E). This is now clarified in the revised text.

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      Good idea. We have added markers in Figure 1C to indicate the average start of cell constriction. This relative time from birth to division was estimated as described in the new Figure 1 – figure supplement 2. We have also indicated that cell birth and division correspond to the first and last images/timepoints in Figure 1B and C, respectively. The two-dimensional average cell projections presented in Figure 3D also indicate the average timing of cell constriction, consistent with our analysis in Figure 1 – figure supplement 2.

      In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments of Bacillus L-forms in microfluidic channels showed a clear requirement for rod shape (and a given width) in the positioning and the spacing of the nucleoids.

      Wu et al, Nature Communications, 2020. "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      Good point! We have revised the text to mention this work. Thank you.

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse - as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes

      We agree that the effects could go both ways at this early stage of the story. We have revised the text accordingly.

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and disomes, etc.). Rifampicin stops the assembly of all of these, and thus the loss of localization could arise from any of them. Thus, is it accurate to state that all translation events occur in polysomes? Or are they grouping all of the n-somes into one group?

      In the original discussion, we noted that our term “polysomes” also includes monosomes for simplicity, but we agree that the term should have been defined much earlier. The manuscript has been revised accordingly. Furthermore, in the revised manuscript, we have included additional simulation results with three different diffusion coefficients that reflect different polysome sizes to show that polysome species with fewer or more ribosomes give similar results (Figure 4 – figure supplement 4). This shows that the average polysome description in our model is sufficient.

      Thank you for the valuable comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Terrific summary! Thank you for your positive assessment.

      Strengths:

      In terms of strengths; the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Thank you!

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      The heterogeneity of ribosome distribution inside E. coli cells has been attributed to polysomes by many labs (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is also consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). These points are now mentioned in the revised manuscript.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      We appreciate the reviewer’s comment, but the output of our reaction-diffusion model is a bona fide phase separation (spinodal decomposition). So, we feel that we need to use the term when reporting the modeling results. Inside the cell, the situation is more complicated. As the reviewer points out, there are likely entanglements (not considered in our model) and other important factors (please see our discussion on the model limitations). This said, we have revised our text to clarify our terms and proposed mechanism.
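      To illustrate what we mean by bona fide phase separation (spinodal decomposition) in a minimal setting, the sketch below evolves a 1D Cahn-Hilliard equation from a nearly uniform mixture. The double-well free energy and all parameter values are illustrative stand-ins, not the fitted terms of our reaction-diffusion model.

```python
import numpy as np

# Minimal 1D Cahn-Hilliard sketch of spinodal decomposition.
# All parameters are illustrative, not values from the paper's model.
rng = np.random.default_rng(1)
N, dx, kappa, M, dt = 128, 1.0, 1.0, 1.0, 0.01

def lap(f):
    """Periodic second difference (discrete Laplacian)."""
    return (np.roll(f, 1) - 2.0 * f + np.roll(f, -1)) / dx**2

c = 0.01 * rng.standard_normal(N)       # near-uniform mix plus small noise
for _ in range(15000):
    mu = c**3 - c - kappa * lap(c)      # chemical potential of a double-well free energy
    c = c + dt * M * lap(mu)            # conserved (Cahn-Hilliard) dynamics

# The uniform state is unstable: the field demixes into domains near c = +1 and c = -1.
```

      Starting from tiny fluctuations, the conserved dynamics amplify long-wavelength modes and the field separates into two coexisting phases; this is the sense in which the model output is a spinodal decomposition rather than merely a region of distinct properties.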

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

      We agree that there is a lot to digest. We could not come up with great ideas for visuals others than the schematics we already provide. However, we have revised the text to clarify our points and added a simulation result (Figure 4A) to help explain biophysical concepts.

      Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Thank you for your feedback. Please see below for a response to each concern.

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite: separated nucleoids condense into a single lobe post-treatment (Bakshi et al., 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      We do not think that the results from chloramphenicol-treated cells are inconsistent with our model. Our proposed mechanism predicts that nucleoids will condense in the presence of chloramphenicol, consistent with experiments. It also predicts that nucleoids that were still relatively close at the time of chloramphenicol treatment could fuse if they eventually touched through diffusion (thermal fluctuation) to reduce their interaction with the polysomes and minimize their conformational energy. Fusion is, however, not expected for well-separated nucleoids since their diffusion is slow in the crowded cytoplasm. This is consistent with our experimental observations: In the presence of a growth-inhibitory concentration of chloramphenicol (70 μg/mL), nucleoids in relatively close proximity can fuse, but well-separated nucleoids condense and do not fuse. Since the growth rate inhibition is not immediate upon chloramphenicol treatment, many cells with well-separated condensed nucleoids divide during the first hour. As a result, the non-fusion phenotype is more obvious in non-dividing cells, achieved by pre-treating cells with the cell division inhibitor cephalexin (50μg/mL). In these polyploid elongated cells, well-separated nucleoids condensed but did not fuse, not even after an hour in the presence of chloramphenicol. We have revised the manuscript to add these data (illustrative images + a quantitative analysis) in Figure 4 – figure supplement 1.

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019), which conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      The reviewer makes a good point that DNA attachment to the membrane through transertion could contribute to the nucleoid being peripherally localized in A22 cells. We have revised the text to add this point. However, we do not think that this contradicts the proposed nucleoid segregation mechanism described in our model. On the contrary, by attaching the nucleoid to the cytoplasmic membrane along the cell width, transertion might help reduce the diffusion and thus exchange of polysomes across nucleoids. We have revised the text to discuss transertion over radial confinement.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      We originally evoked radial confinement to explain the observation that polysome accumulations do not equilibrate between DNA-free regions. We agree that transertion is an alternative explanation. Thank you for bringing it to our attention. However, please note that this does not contradict the model. In our view, it actually supports the 1D model by providing a reasonable explanation for the slow exchange of polysomes across DNA-free regions. The attachment of the nucleoid to the membrane along the cell width may act as a diffusion barrier. We have revised the text and the title of the manuscript accordingly.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      We have revised the manuscript to discuss the membrane attachment. Please see the previous response.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      We agree—we have revised the text to discuss membrane attachment in the radial dimension. See previous responses.

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      Please see above.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      This “membrane attachment/cell elongation” model is reminiscent of the hypothesis proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several lines of evidence arguing against it as the major driver of nucleoid segregation:

      (Below is a slightly modified version of our response to a comment from Reviewer 1—see page 3)

      (1) For this alternative model to work, axial membrane expansion (i.e., cell elongation) would have to be localized at the middle of the splitting nucleoids (i.e., the midcell position for slow growth and the ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To get around this fluidity issue, one could potentially evoke a connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid to “push” the sister nucleoids apart from each other. However, peptidoglycan growth is dispersed prior to cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we confirm that the cell elongation rate is indeed overall slower than the nucleoid segregation rate (see Figure 1 – figure supplement 5A, where subtracting the cell elongation rate from the nucleoid segregation rate at the single-cell level yields positive values).

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in the original manuscript (Figure 1I and Figure 1 – figure supplement 5B) but were not highlighted in this context. We have revised the text to clarify this point.

      (4) The membrane attachment/cell elongation model does not explain the nucleoid asymmetries described in our paper (Figure 3), whereas they can be recapitulated by our model.

      (5) The cell elongation/transertion model cannot predict the aberrant nucleoid dynamics observed when chromosomal expression is largely redirected to plasmid expression (Figure 7). In the revised manuscript, we have added simulation results showing that these nucleoid dynamics are predicted by our model (Figure 7 – figure supplement 2).

      Based on these arguments, we do not believe that a mechanism based on membrane attachment and cell elongation is the major driver of nucleoid segregation. However, we do believe that it may play a complementary role (see “Nucleoid segregation likely involves multiple factors” in the Discussion). We have revised the text to clarify our thoughts and mention the potential role of transertion.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      As noted above, we have revised the text to mention transertion.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

      Indeed, for simplicity, we adopt an average description of all polysomes with an average diffusion coefficient and interaction parameters, which is sufficient for capturing the fundamental mechanism underlying nucleoid segregation. To illustrate that considering multiple polysome species does not change the physical picture, we have considered an extension of our model, which contains three polysome species, each with a different diffusion coefficient (D<sub>P</sub> = 0.018, 0.023, or 0.028 μm<sup>2</sup>/s), reflecting that polysomes with more ribosomes will have a lower diffusion coefficient. Simulation of this model reveals that the different polysome species have essentially the same concentration distribution, suggesting that the average description in our minimal model is sufficient for our purposes. We present these new simulation results in Figure 4 – figure supplement 4 of the revised manuscript.
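      The intuition for why species with different diffusion coefficients end up with essentially the same distribution can be sketched with a toy 1D drift-diffusion (Smoluchowski) calculation: the steady state is the Boltzmann profile exp(-U), which does not depend on the diffusion coefficient. The Gaussian repulsive "nucleoid" potential below is a hypothetical stand-in, not the actual interaction terms of our model.

```python
import numpy as np

# Toy 1D Smoluchowski sketch: species with different diffusion coefficients D
# relax to the same Boltzmann steady state exp(-U). The Gaussian "nucleoid"
# potential U is a hypothetical stand-in, not the model's actual free energy.
L, N = 2.0, 80
x = np.linspace(0.0, L, N)
dx = x[1] - x[0]
U = 2.0 * np.exp(-((x - L / 2.0) ** 2) / (2.0 * 0.1**2))  # repulsive midcell bump
dU = np.gradient(U, dx)

def steady_profile(D, total_time=200.0):
    """Evolve dc/dt = d/dx [D (dc/dx + c dU/dx)] with no-flux walls."""
    dt = 0.2 * dx**2 / D                        # explicit-scheme stability limit
    c = np.ones(N) / L                          # uniform initial condition
    for _ in range(int(total_time / dt)):
        J = -D * (np.gradient(c, dx) + c * dU)  # diffusive + drift flux
        J[0] = J[-1] = 0.0                      # reflecting boundaries
        c = c - dt * np.gradient(J, dx)
        c = np.clip(c, 1e-12, None)
        c = c / (c.sum() * dx)                  # keep total mass fixed
    return c
```

      Profiles computed with, say, D = 0.018 and 0.028 μm²/s coincide at steady state, consistent with our observation that the different polysome species adopt essentially the same concentration distribution.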

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Does the polysome density correlate with the origins? If the majority of ribosomal genes are expressed near the origins,

      This is indeed an interesting point that we mention in the discussion. The fact that the chromosomal origin is surrounded by highly expressed genes (PMID: 30904377) and is located near the middle of the nucleoid prior to DNA replication (PMID: 15960977, 27332118, 34385314, 37980336) can only help the model that we propose by increasing the polysome density at the mid-nucleoid position.

      (2) Red lines in 3C are hard to resolve - can the authors make them darker?

      Absolutely. Sorry about that.

      Reviewer #2 (Recommendations for the authors):

      The authors use rifampicin treatment as a mechanism to trigger polysome disassembly and show this leads to homogeneous RplA distribution. This is a really important experiment as it is used to link RplA localization to polysomes, and to argue that RplA density is reporting on polysomes. Given rifampicin inhibits RNA polymerase, and given the only reference of the three linking rifampicin to polysome disassembly is the 1971 Blundell and Wild one, it would perhaps be useful to more conclusively show polysome depletion (as opposed to inhibition of mRNA synthesis, which is upstream of polysome assembly) by using an alternative compound more commonly linked to polysome disassembly (e.g., puromycin) and show timelapse loss of density as a function of treatment time. This is not a required experiment, but given the idea that RplA density reports on polysomes is central to the authors' interpretation, it feels like this would be a thing worth being certain of. An alternative model is that ribosomes undergo self-assembly into local storage depots when not being used, but those depots are not translationally active/lack polysomes. I don't know if I think this is likely, but I'm not convinced the rifampicin treatment + waiting for a relatively long period of time unambiguously excludes other possible mechanisms given the large-scale remodeling of the intracellular environment upon mRNA inhibition. I 100% buy the relationship between ribosomal distribution and nucleoid segregation (and the ectopic expression experiments are amazing in this regard), so my own pause for thought here is "do we know those ribosomes are in polysomes in the ribosome-dense regions". I'm not sure the answer to this question has any bearing on the impact and importance of this work (in my mind, it doesn't, but perhaps there's a reason it does?).
The way to unambiguously show this would really be to do CryoET and show polysomes in the dense ribosomal regions, but I would never suggest the authors do that here (that's an entire other paper!).

      We agree that mRNAs play a role, as mRNAs are major components of polysomes and most mRNAs are expected to be in the form of polysomes (i.e., in complex with ribosomes). In addition, as mentioned above, the enrichments of ribosome distribution are known to be associated with polysomes (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). This is also consistent with cryo-ET results that we actually published (see Figure S5, PMID: 34186018). We have added this information to the revised manuscript. Thank you for alerting us of this oversight.

      On line 320 the authors state "Our single-cell studies provided experimental support that phase separation between polysomes and DNA contributes to nucleoid segregation." - this comes pretty out of left field? I didn't see any discussion of this hypothesis leading up to this sentence, nor is there evidence I can see that necessitates phase separation as a mechanistic explanation unless we are simply using phase separation to mean cellular regions with distinct cellular properties (which I would advise against). If the authors really want to pursue this model I think much more support needs to be provided here, including (1) defining what the different phases are, (2) providing explicit description of what the attractive/repulsive determinants of these different phases could be/are, and (3) ruling out a model where the behavior observed is driven by a combination of DNA / polysome entanglement + steric exclusion; if this is actually the model, then being much more explicit about this being a locally arrested percolation phenomenon would be essential. Overall, however, I would probably dissuade the authors from pursuing the specific underlying physics of what drives the effects they're seeing in a Results section, solely because I think ruling in/out a model unambiguously is very difficult. Instead, this would be a useful topic for a Discussion, especially couched under a "our data are consistent with..." if they cannot exclude other models (which I think is unreasonably difficult to do).

      Thank you for your advice. We have revised the text to more carefully choose our words and define our terms.

      Minor comments:

      The results in "Cell elongation may also contribute to sister nucleoid migration near the end of the division cycle" are really interesting, but this section is one big paragraph, and I might encourage the authors to divide this paragraph up to help the reader parse this complex (and fascinating) set of results!

      We have revised this section to hopefully make it more accessible.

      Reviewer #3 (Recommendations for the authors):

      Technical Controls:

      The authors should conduct a photobleaching control to confirm that the perceived 'higher' brightness of new ribosomes at the mid-cell position is not an artefact caused by older ribosomes being photobleached during the imaging process. Comparing results at various imaging frequencies and intensities is necessary to address this issue.

      The ribosome localization data across 30 nutrient conditions (Figure 2, Figure 1 – figure supplement 6, Figure 2 – figure supplement 1, Figure 2 – figure supplement 3 and Figure 5) are from snapshot images, which do not have any photobleaching issue. They confirm the mid-cell accumulation seen by time-lapse microscopy. We have revised the text to clarify this point.

      Novelty of Experimental Measurements:

      While the scale of the study is unprecedented, claims of novelty (e.g., line 142) regarding ribosome-nucleoid segregation tracking are overstated. Similar observations have been made previously (e.g., Bakshi et al., 2012; Bakshi et al., 2014; Chai et al., 2014).

      Our apologies. The text in line 142 oversimplified our rationale. This has been corrected in the revised manuscript.

    1. You probably spew bullshit too. It’s just that you’re not famous, and so it may even be harder to be held accountable. Nobody writes essays poking the holes in some private belief you have.

      The at-oddsness of the two things mentioned here—spewing bullshit and private beliefs that someone could correct you about—is hard to skip over.

    1. Escalations Report Q1-25

      Ciaran Feedback: Emphasis of this analysis should be on new business. It's where our growth is. CS escalations were just looking for the same pricing on the same deal. And now we have that so the CS escalations in relation to pricing should drop significantly

    1. ATOSSA. Is it in skill of bow and shaft that Athens’ men excel? CHORUS. Nay, they bear bucklers in the fight, and thrust the spear-point well.

      Here, the differences in weaponry highlight the overall differences between the groups. It's not just about using particular tools, it's about what those tools say about a culture. From an Athenian perspective, a bow may seem like a more cowardly weapon. Calling attention to this perception could be a way of disparaging the foreign Persians while also building a sense of national pride around the Athenians’ own choice of weapon. CC BY-NC-SA

    1. There where the wild she-demons kept Their watch around, she sighed and wept.

      Sita cries due to being captured by the evil "Other". It's made clear that it’s not just the king that's evil, but also the subjects that are called "wild she-demons". This is a very direct example of othering a foreign people by perceiving them as uncivilized. This lays the groundwork for the judgment of others’ unique customs, which in turn, builds an identity around one’s own customs. CC BY-NC-SA

    1. Reviewer #1 (Public review):

      Summary

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections, and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity, both in the description of the methods and in the presentation of the results. Without these clarifications, it remains difficult to rigorously evaluate whether the authors' proposed method of estimating FOI is sound. Additionally, there are several limitations that call into question the stated generalizability of this method; these should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and the accompanying appendix are lacking several important details. The equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure; notably, several quantities in those equations are never defined, some of which are important for understanding the method (e.g., A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how it works in this paper. At minimum, all variables used in the equations should be clearly defined.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      c. Just observing the box plots of the mean and 95% CI on a plot with the FOI estimate (Figures 1, 2 and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations; it also seems that they are estimating the mean FOI per host on an annual basis, and showing a distribution of those per-host estimates would be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g., the proportion of simulations where the truth is captured in the 95% CI, or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g., Figure 10; Figure 1 under the mid-IRS panel), but it is not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second-moment) parameters for inter-arrival times is varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them; this is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates, which is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length? In other places the authors refer to these quantities as maximum likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify a likelihood and include in an appendix why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, along with how fine the grid of mean and variance values they tested is, since this could influence the overall quality of the estimation procedure.
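      For concreteness, the kind of coverage check suggested in point (c) can be sketched in a few lines. The Poisson draws below are a hypothetical stand-in for the per-host annual FOI estimates that each simulation would produce; the real check would use the MOI-derived estimates themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(sample, n_boot=500, alpha=0.05):
    """Percentile bootstrap CI for the sample mean."""
    n = sample.size
    idx = rng.integers(0, n, size=(n_boot, n))    # resample indices with replacement
    means = sample[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2.0, 1.0 - alpha / 2.0])

# Coverage: fraction of simulated datasets whose 95% CI contains the known truth.
true_foi = 5.0
n_sims = 200
covered = 0
for _ in range(n_sims):
    # Hypothetical stand-in for per-host annual FOI estimates from one simulation.
    sample = rng.poisson(true_foi, size=50).astype(float)
    lo, hi = bootstrap_ci(sample)
    covered += int(lo <= true_foi <= hi)
coverage = covered / n_sims
```

      A well-calibrated estimator should give a coverage close to the nominal 95%; a systematic bias of the kind suspected above would show up as coverage well below that.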

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5 year olds have the same duration-of-infection distribution as naïve adults co-infected with syphilis. For example, it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season, as estimated in Ghana (Figure 3), it seems implausible that a 4-year-old would be immune-naïve, and this assumption would certainly not generalize well to a school-aged or adult population.

      b. The choice of the capacity parameter c seems to be quite important. It is set at 30, but the authors only describe trying values of 25 and 30 and claim that this does not impact FOI inference; it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why increasing the carrying capacity will not influence FOI inference; absent that, this should be mentioned and discussed as a limitation.
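      As a rough illustration of why the capacity may (or may not) matter, a sketch under the simplifying assumption of exponentially distributed infection durations, i.e. an M/M/∞ queue truncated at capacity c (not necessarily the authors' exact model): for an FOI near 5 per host per year and a one-year mean duration, the truncated steady-state mean MOI is essentially identical for c = 25, 30, or 100, because the untruncated Poisson(5) distribution places negligible mass above 25. This is only a sanity check, not the mathematical argument the review asks for:

      ```python
      from math import factorial

      def truncated_queue_pmf(foi, clearance_rate, c):
          """Steady-state queue-length pmf of an M/M/infinity queue truncated
          at capacity c: infections arrive at rate `foi` and clear independently
          at `clearance_rate`, so the untruncated MOI is Poisson(foi/clearance)."""
          lam = foi / clearance_rate
          weights = [lam ** k / factorial(k) for k in range(c + 1)]
          total = sum(weights)
          return [w / total for w in weights]

      # mean MOI under capacities 25, 30, and 100 (FOI = 5/yr, 1-yr duration)
      for c in (25, 30, 100):
          pmf = truncated_queue_pmf(5.0, 1.0, c)
          print(c, sum(k * p for k, p in enumerate(pmf)))
      ```

      The same check at a much higher FOI, or with heavier-tailed durations, would reveal where truncation does start to bias inference.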

      Comments on revisions:

      The authors have adequately responded to all comments.

    1. Have you ever reported a post/comment for violating social media platform rules?

      I have reported a post before, but honestly not often. It was a comment on TikTok that was clearly racist and got a lot of likes, which made it worse. I don’t usually report things unless they’re really bad, because most of the time I assume nothing will happen. But that one just felt too much to ignore. Reading this chapter made me think more about how the report systems are kind of hidden or hard to trust. It feels like people don’t know what counts as “bad enough” to report unless it’s super obvious.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I can find no problems with the experiments performed in this study, but there are several results that are not easily explained. I would like to see more consideration of possible explanations. For example, one of the major differences between the CESA structures from primary and secondary cell walls is the displacement of TM7 in the primary cell wall CESAs that leads to the formation of a lipid-exposed channel. Why does this vary between primary and secondary cell wall CESA proteins? Could it explain differences in properties, such as crystallinity, between primary and secondary cell wall cellulose?

      At this time, the different position of TM helix 7 observed in our GmCesA structures is just an observation. We have some emerging evidence that this helix is also flexible in POCesA8 under certain conditions; however, we do not know whether this affects catalytic activity or cellulose coalescence. We have revised the text to avoid the interpretation that TM 7 repositioning is a characteristic feature of primary cell wall CesAs only.

      Similarly, regarding the formation of the larger structures from mixtures of different CESA trimers: why do they not form rosettes, particularly as these appear to be forming 2-dimensional structures?

      We have included additional data on the interaction between different CesA isoform trimers (Figure 6). To answer the reviewer’s question, the most likely reasons for not observing closely packed rosette-like structures are (a) steric interferences between the micelles harboring the individual CesA trimers, and (b) the lack of a stabilizing cellulose fiber. This interpretation is supported by 2D class averages of dimers of CesA1 and CesA3 trimers (now shown in Fig. 6). The class averages show an ‘upside-down and side-by-side’ orientation of the two trimers, consistent with interferences between the solubilizing detergent micelles. The implications of this non-physiological arrangement are discussed in the revised manuscript. In a biological membrane, the CesA trimers are confined to the same plane in the same orientation, which is likely necessary to form ordered arrangements.

      What role does the NTD play in trimer formation given its apparent very high class specificity?

      We have no data suggesting any contribution of the NTD to trimer formation. Recent work on moss CesA5 and similar AlphaFold predictions suggest that, for some CesAs, an extreme N-terminal region can interact with the beta sheet of the catalytic domain via beta-strand augmentation. Whether this interaction can contribute to CesA-CesA interactions remains unknown.

      Reviewer #2 (Recommendations For The Authors):

      The authors provide PDB codes but not EMDB codes for the EM maps, also I would encourage the authors to upload the raw micrographs to the EMPIAR database.

      The EMDB codes are shown in Table 1 and data transfer to EMPIAR is ongoing.  

      Page 6 line 144, the statement "All CesA isoforms show greatest catalytic activity at neutral pH" seems to contradict the data in Figure 1e and the subsequent statements. This sentence should be removed.

      The text has been revised to indicate that CesA1 and CesA6 show highest activity under mild alkaline conditions.  

      Page 6, line 150, the authors state "The affinities for substrate binding range from 1.4 mM for CesA1 to 0.6 and 2.4 mM for CesA3 and CesA6, respectively." How were the affinities determined? Is this the affinities or the Michaelis constants? Is it known whether CesAs are rapid equilibrium enzymes? This should be clarified.

      The text now states that we performed Michaelis Menten kinetics using the ‘UDP-Glo’ glycosyltransferase assay kit. We are uncertain about whether CesAs can be classified as rapid equilibrium enzymes. The rate-limiting step of cellulose biosynthesis has been proposed to be glycosyl transfer, rather than cellulose translocation. To avoid any confusion, we changed the text from '…reveals Michaelis Menten constants for substrate binding of CesA1 and CesA3' to '…reveals Michaelis Menten constants for CesA1 and CesA3 with respect to UDP-Glc'.
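      For reference, Michaelis Menten constants of this kind are typically obtained by fitting initial rates to v = Vmax[S]/(Km + [S]); a minimal sketch with hypothetical, noise-free data (the concentrations and rates below are illustrative and are not the authors' measurements):

      ```python
      import numpy as np
      from scipy.optimize import curve_fit

      def michaelis_menten(s, vmax, km):
          """v = Vmax * [S] / (Km + [S])"""
          return vmax * s / (km + s)

      # hypothetical UDP-Glc concentrations (mM) and initial rates (a.u.)
      s = np.array([0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
      v = michaelis_menten(s, vmax=10.0, km=1.4)  # synthetic, noise-free rates

      (vmax_fit, km_fit), _ = curve_fit(michaelis_menten, s, v, p0=(1.0, 1.0))
      print(round(km_fit, 2))  # recovers the Km used to generate the data
      ```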

      Page 6, line 153, the authors state "CesA1's apparent Ki for UDP is roughly 0.8 mM, whereas this concentration is increased to about 1.2 to 1.5 mM for CesA6 and CesA3, respectively." From the Figure 1g legend, it appears that the authors performed additional experiments at different UDP-Glc concentrations in order to determine Ki that are not shown. This data should be included as a figure supplement as the data presented are insufficient to determine Ki (only IC50).

      The UDP inhibition data show apparent IC50 values, and this has been corrected in the text. For each CesA isoform, the titration was done at one UDP-Glc concentration only.    
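      The distinction matters quantitatively: for a competitive inhibitor, an IC50 measured at a single substrate concentration relates to Ki through the Cheng-Prusoff equation, Ki = IC50 / (1 + [S]/Km). A sketch with round numbers in the range reported here, under the hypothetical assumption that UDP acts as a purely competitive inhibitor:

      ```python
      def ki_from_ic50(ic50_mM, substrate_mM, km_mM):
          """Cheng-Prusoff relation for a competitive inhibitor:
          Ki = IC50 / (1 + [S] / Km)."""
          return ic50_mM / (1.0 + substrate_mM / km_mM)

      # e.g. an apparent IC50 of 0.8 mM measured at [UDP-Glc] equal to Km (1.4 mM)
      print(ki_from_ic50(0.8, 1.4, 1.4))  # -> 0.4
      ```

      Since the titrations were done at a single UDP-Glc concentration, such a conversion would itself rest on an assumed inhibition mechanism, consistent with the authors reporting apparent IC50 values only.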

      Page 8, line 202, the authors state that TM helix 7 of the primary cell wall CesAs is more flexible "as evidenced by weaker density." The density for the TM helix 7 should be shown. If the density shown in Supplementary Figure 3 corresponds to TM helices the number of the helices should be indicated as it is not immediately obvious from the amino acid residue numbers.

      The densities for TM helix 7 of all CesA isoforms are shown in Supplemental Figure 3. The helices are now labeled to orient the reader.  

      Reviewer #2 (Public Review)

      The authors demonstrate via truncation that the N-terminus of the CesA is not involved in the interactions between the isoforms and propose that the CSR hook-like extensions are the primary mediator of trimer-trimer interactions. This argument would be strengthened by equivalent truncation experiments in which the CSR region is removed.

      We performed the suggested experiment. We replaced the CSR in N-terminally truncated GmCesA1 and GmCesA3 with a 20-residue long linker. The resulting constructs assemble into homotrimeric complexes as observed for the wild type and only N-terminally truncated versions. However, the CSR-truncated constructs of the different isoforms do not interact with each other in vitro. Further, CSR-deleted GmCesA3 also does not interact with full-length CesA1, suggesting that two CSR domains of different isoforms are necessary for homotrimer interaction. This data is now shown as Fig. 5.  

      Reviewer #3 (Recommendations For The Authors):

      Major Points

      (1) The authors state on Line 354 that they were unable to isolate heterotrimers, but they need to provide the data to support this claim; for example, it is important for readers to understand whether co-expression of all three CESAs leads to only homotrimers or only monomers. This information is essential to exclude model C in Figure 6.

      We have revised the corresponding discussion and toned down the statement that heterotrimeric complexes did not form in our recombinant expression system. Co-expression of differently tagged secondary or primary cell wall CesAs in Sf9 cells has consistently resulted in negligible amounts of material that can be purified sequentially over different affinity matrices (corresponding to the tags on the recombinantly expressed CesAs – His, Strep, Flag). While this does not exclude the formation of a small fraction of hetero-oligomeric complexes (which could be trimers as observed in the structures or monomers interacting via their CSR regions), it demonstrates that CesAs favor the same isoform for trimer formation, rather than partnering with other isoforms. An example of such a purification is now shown as Supplemental Figure 8.

      Determining whether heterotrimers are formed upon co-expression of different CesA isoforms requires high resolution structural analysis because co-purification of different isoforms can also be due to interactions between different homo-trimeric complexes, as demonstrated in this study.

      While we cannot exclude that factors exist in planta that may prevent the formation of homotrimers and favor the formation of hetero-trimers, it is important to keep in mind that currently no experimental data supports the formation of hetero-trimeric complexes. Instead, our work demonstrates that existing data on CesA isoform interactions can be explained by the interaction of homotrimers of different isoforms.

      (2) The evidence that the products of GmCEA1, GmCESA3, and GmCESA6 homotrimers are cellulose is that they consume UDP-glucose and produce a beta-glucanase-sensitive product. Other beta-glucans synthesized by similar GT2 family proteins (e.g. CSLDs, Yang et al., 2020 Plant Cell or CSLCs, Kim et al., 2020 PNAS) would be sensitive to this enzyme, and the product cannot truly be called cellulose unless it forms microfibrils. Previous reports of CESA activity in vitro have demonstrated that the products form genuine cellulose microfibrils rather than amorphous beta-glucan (via electron microscopy); extensively documented that the product is sensitive to beta-glucanase, but not other enzymes (e.g., callose or MLG degrading enzymes); provided linkage analysis of the product to conclusively demonstrate that it is a beta1,4-linked glucan; and documented a loss of activity when key catalytic residues were mutated (Purushotham et al., 2016 PNAS; Cho et al., 2017 Plant Phys; Purushotham et al., 2020 Science).

      Other GT2 characterization efforts have documented activity to similar standards (e.g. CSLDs, Yang et al., 2020 Plant Cell or CSLFs, Purushotham et al., 2022 Science Advances). At least one independent method should be provided, and the TEM of the product is necessary for readers to appreciate whether the product forms true cellulose microfibrils.

      There may be some confusion regarding the nomenclature. Therefore, we revised the second sentence of the Introduction to define ‘cellulose’ as a beta-1,4 linked glucose polymer, in accordance with the ‘Essentials of Glycobiology’. This is also consistent with enzyme nomenclature as the primary product of cellulose synthase is a single glucose polymer, and not a fibril. For example, most bacterial cellulose synthases only produce amorphous (single chain) cellulose. 

      We show that the GmCesA products can be degraded with a beta-1,4 specific glucanase (cellulase), which demonstrates the formation of authentic cellulose. This study does not focus on the formation of fibrillar cellulose apart from suggesting a revised model for a microfibril-forming CSC.

      (3) The position of isoxaben-resistant mutations implies that primary cell wall CESAs form heterotrimers (Shim et al., 2018 Frontiers in Plant Biology). Indeed, in their previous description of the POCESA8 structure (Purushotham et al., 2020 Science), the authors discussed the position of isoxaben-resistant mutations as a way to justify the way that TM7 of one CESA can contribute to forming the cellulose translocation pore in the neighbouring CESA within a heterotrimer. However, in this manuscript, the authors document a different location for TM7 in the GmCEA1, GmCESA3, and GmCESA6 homotrimers, which would change the position of these resistance mutations. Please discuss.

      As stated in the manuscript, we do not know what the functional implication of the TM7 flexibility may be, but we speculate that it could affect the alignment of the synthesized cellulose polymers. Regarding the previously reported POCesA8 structure, the mapping of one of the reported isoxaben resistance mutants to the C-terminus of TM7 was not used to justify the structure; the structure with its position of TM7 stands on its own.  Considering recent observations suggesting that isoxaben may affect cellulose biosynthesis via secondary effects, we prefer not to speculate on the mechanism by which these mutations cause the apparent resistance to isoxaben (PMID: 37823413).

      (4) The authors present no evidence that GmCESA1/3/6 are involved in primary cell wall synthesis. Please include gene expression information (documenting widespread expression consistent with primary CESAs) and rigorous molecular phylogenetic analysis (or references to these published data) to clarify that these are indeed primary cell wall CESAs.

      This has been addressed. We have included additional figures (Fig. 1 and S1B) that show the strong and wide distribution of the selected CesAs in soybean leaves, their co-expression with primary cell wall markers, and their phylogenetic clustering with Arabidopsis primary cell wall CesAs.  

      (5) Several small changes need to be made to the abstract to ensure that it aligns with the data: Line 28: add "in vitro" after "their assembly into homotrimeric complexes". Line 28: change "stabilized by the PCR" to "presumably stabilized by the PCR".

      We inserted ‘in vitro’ as requested. We did not insert the second modification as requested since CesA trimers are stabilized by the PCR. This is a fact arising from several experimentally determined CesA trimer structures.  

      (6) In all graphs in all figures it is unclear what the sample size is and what the bars represent. These must be stated in the figure legends. It is best practice to plot individual data points so that readers can easily interpret both the sample size and the variation.

      The sample sizes and error bars are now defined in the relevant figure legends.

      (7) The methods need to unambiguously define GmCESA1, GmCESA3, GmCESA6 protein identities using appropriate accession numbers.

      The accession codes are now provided in the Methods.

      Minor Points

      (1) Does CESA1 have higher activity in Figure 1D because of the pH at which the assay was conducted (see Figure 1E)? Could this difference in activity or pH preference have also affected their capacity to resolve TM7 of CESA1?

      We consistently observe higher in vitro catalytic activity of CesA1, compared to CesA3 and CesA6. Activity assays are performed at a pH of 7.5, roughly halfway between the activity maxima of CesA3 and CesA1/6. At this pH, we expect activity differences to arise from factors other than the buffer pH. As detailed above, we do not know whether the conformational flexibility of TM helix 7 affects catalytic activity.

      (2) Line 55: The authors should cite additional papers that also provide insight into CESA structure (e.g. Qiao et al 2021 PNAS).

      A recent publication on moss CesA5 has been included. Qiao et al unfortunately report on a dimeric assembly of a fragment of Arabidopsis thaliana’s CesA3 catalytic domain, which we consider non-physiological. We added a brief statement in the Discussion explaining that our GmCesA3 structure is inconsistent with the dimeric arrangement reported by Qiao et al.

      (3) Line 95: these references are about secondary cell wall CESA isoforms, but there are more appropriate references for the primary CESAs that should be included in place of these papers.

      Fagard et al report on growth defects in roots and dark-grown hypocotyls linked to Arabidopsis CesA 1 and CesA6, which are primary cell wall CesAs. Nevertheless, we have included two additional recent publications from the Meyerowitz and Persson labs.

      (4) Line 121-122: Please cite a specific figure that supports this claim, since the (Purushotham et al., 2020) reference refers to POCESA8 enrichment results, but the claims are about the GmCESA1/3/6 enrichment.

      The POCesA8 reference has been removed. The classification into monomers and trimers arises from the data processing described in this manuscript and is consistent with similar results obtained for POCesA8.

      (5) Line 314: It is more appropriate to use "enzyme activity" rather than "cellulose synthesis".

      We prefer to use cellulose biosynthesis since the enzyme produces cellulose.

      (6) Figure 1: please add colour to the graphs to clarify which trend lines belong to which data series (especially Figure 1G).

      The figure (now Fig. 2) has been revised as suggested.  

      (7) Figure 2D: It's not clear which parts are GmCESA and which are POCESA8; please clarify the figure legend.

      Thank you, the legend has been revised accordingly (now Fig. 3).

      (8) In Figure 5, It's not clear that the one CESA is maintained at a steady concentration throughout the assay since there is only a bar for that CESA at the highest concentration (e.g. in Figure 5A, the blue bar for CESA1 only appears on the right-most assay, but there was CESA1 in all assays, so this should be indicated).

      In the panel the reviewer is referring to, the blue bar corresponds to the activity measured for only CesA1 at a concentration of 20 µM. The red columns (indicated as ‘Mix’) represent the activities measured in the presence of 20 µM of CesA1 plus increasing concentrations of CesA3. The purple columns represent activities obtained for only CesA3 at the indicated concentrations. Numerical addition of the activities of CesA1 alone at 20 µM (blue column) and CesA 3 alone (purple columns) gives rise to the gray columns, now indicated by a capital ‘sigma’ sign. We are unclear on how the figure could be improved, but we have revised the legend to avoid confusion.    

      (9) Figure 5 legend needs to be clarified to indicate whether monomers or homotrimers were used in the assays.

      This is now shown as Fig. 7 and the legend has been revised as requested. The experiments were performed with the trimeric CesA fractions.

      (10) There seem to be some random dots near the top of Figures 6B & 6C

      Removed. Thank you.

  8. social-media-ethics-automation.github.io
    1. ShiningConcepts. r/TheoryOfReddit: reddit is valued at more than ten billion dollars, yet it is extremely dependent on mods who work for absolutely nothing. Should they be paid, and does this lead to power-tripping mods? November 2021. URL: www.reddit.com/r/TheoryOfReddit/comments/qrjwjw/reddit_is_valued_at_more_than_ten_billion_dollars/ (visited on 2023-12-08).

      It's quite astonishing to me that Reddit relies on unpaid mods. While mods might work out of passion, the huge valuation of Reddit makes it seem unfair that they don't receive compensation. Also, if Reddit is so valuable, it should consider how to support mods, whether with payment or other incentives. Just expecting them to work for free seems outdated and potentially harmful to the community. It also makes me wonder what Reddit's true priorities are.

    2. David Gilbert. Facebook Is Ignoring Moderators’ Trauma: ‘They Suggest Karaoke and Painting’. Vice, May 2021. URL: https://www.vice.com/en/article/m7eva4/traumatized-facebook-moderators-told-to-suck-it-up-and-try-karaoke (visited on 2023-12-08).

      I think Gilbert exposes how Facebook's outsourced moderators are left to deal with violent and psychologically damaging content on a regular basis with minimal mental health support. The suggestion that painting or karaoke can heal trauma caused by long-term exposure to violence, child abuse, or suicide is not just tone-deaf—it's reflective of a deep structural contempt for worker well-being. This piece strongly conveys that ethical content moderation is not just about what is removed, but about whom corporations step on in the process.

    3. Sarah T. Roberts. Behind the Screen. Yale University Press, September 2021. URL: https://yalebooks.yale.edu/9780300261479/behind-the-screen (visited on 2023-12-08).

      Reading about Behind the Screen really opened my eyes. I always assumed content moderation was handled mostly by AI, but Roberts shows that it’s often done by real people who have to view traumatic content day after day. That made me think about how much invisible labor goes into keeping our feeds “clean.” I feel kind of guilty now for complaining about my posts being flagged when others are doing such emotionally draining work just to protect users like me.

    1. Sometimes individuals are given very little control over content moderation or defense from the platform,

      I think it’s frustrating that users are often told to just “not read the comments” instead of being given real tools to protect themselves. It feels like the responsibility is shifted away from the platform and onto the person being targeted. I agree that platforms should do more—it shouldn’t just be up to individual users to block or mute others after the damage is already done.

    1. Another strategy for content moderation is using bots, that is computer programs that look through posts or other content and try to automatically detect problems. These bots might remove content, or they might flag things for human moderators to review.

      I’ve seen these bots in action, especially on TikTok and Instagram. Sometimes they flag completely harmless content—like one time a friend’s video got muted just because a lyric sounded suspicious. I get that bots are trying to help, but I feel like they still lack the human context to tell jokes from harm. It’s helpful for filtering spam quickly, but it also makes me nervous when something important gets taken down just because the algorithm didn’t “get” it.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. 
These results put into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons, which we have largely silenced, and the downstream endogenous activity that is perturbed. The effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that mediate each region’s interaction with other regions. Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that a silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns. This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it. 

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortex at a particular point during a motor behavior. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences in relation to movement execution, as disturbance to processes on which execution depends can impede execution itself. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous.

      That said, we would agree that the form of the causal interaction between RFA and CFA remains unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as knocking out a transcription factor gene does not expose how the transcription factor influences the expression of other genes. To show evidence for a specific type of interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). There thus is not much room for the effects on projection neurons in RFA to be much larger. We have measured these local effects in RFA as part of other work (Kristl et al., biorxiv, 2025), verifying that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing results from analysis of activity in these two regions have produced conflicting results on the strength of interaction between these regions during preparation. Compare, for example, Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach - a larger effect of RFA on CFA than vice versa.

Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between regions may be strongest. The similarity in alignment across lags we observed might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
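To make the logic of this lag scan concrete, here is a minimal Python/NumPy sketch (synthetic data and arbitrary parameters, not the analysis code used in the study): one region's activity matrix is shifted relative to the other's, and the leading canonical correlation is recomputed at each lag.

```python
import numpy as np

def first_canonical_corr(X, Y):
    # Leading canonical correlation: the largest singular value of the
    # product of orthonormal bases for the two centered data matrices.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Ux = np.linalg.svd(Xc, full_matrices=False)[0]
    Uy = np.linalg.svd(Yc, full_matrices=False)[0]
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)[0]

# Hypothetical data: two "regions" (time x units) driven by one shared latent
rng = np.random.default_rng(0)
T, n_units = 200, 10
latent = np.sin(np.linspace(0, 6 * np.pi, T))[:, None]
X = latent @ rng.normal(size=(1, n_units)) + 0.1 * rng.normal(size=(T, n_units))
Y = latent @ rng.normal(size=(1, n_units)) + 0.1 * rng.normal(size=(T, n_units))

# Scan lags by shifting X relative to Y and recomputing the alignment
lags = list(range(-20, 21, 5))
corrs = [first_canonical_corr(X[20 + lag:180 + lag], Y[20:180]) for lag in lags]
```

With a single shared, temporally structured latent like this, the alignment peaks at zero lag; the surprise in the real data was that no comparable peak emerged across the lags examined.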

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishkawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach aligned with those applied in the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation that is based on other differences in what is calculated by DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.

(2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may very well rely on distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could reveal interesting structure in interregional interactions. Since it remains a challenge to rigorously identify a subset of neural activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the activity that decoders use for predicting muscle activity matches the activity that actually drives muscle activity in situ.

      To address this issue, which related to one raised by Reviewer #3 below, we have added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      Reviewer #3 (Public review):

This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be preemptive to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details. 

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis that we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry following functional influence, our results imply that the remaining activity components would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS do show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses were performed on components accounting for well over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components should be included.

      To address the concern about condition-dependent and condition-invariant components, we have added a sentence to the Results section reporting our CCA and PLS results: “Because our results here involve the vast majority of trial-averaged activity variance, we expect that they encompass both components of activity that vary for different movement conditions (condition-dependent), and those that do not (condition-invariant).” To address the general concerns about potential differences in activity components specifically related to muscle activity, we have also added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

While we do not provide experimental support for this hypothesis, the data we present also do not contradict this hypothesis. Here we used modeling as it is often used - to capture experimental results and generate hypotheses about potential explanations. We do feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study, requiring numerous controls - a whole other paper in itself.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) There are a few small text/figure caption modifications that can be made for clarity of reading:

      (2) Unclear sentence in the second paragraph of the introduction: "For example, stimulation applied in PM has been shown to alter the effects on muscles of stimulation in M1 under anesthesia, both in monkeys and rodents."

      This sentence has been rephrased for clarity: “For example, in anesthetized monkeys34 and rodents35, stimulation in PM alters the effects of stimulation in M1 on muscles.”

      (3) The first section of the results presents the optogenetic manipulation. However, the critical control that tests whether this was strictly a local manipulation that did not affect cells in the other region is introduced only much later. It may be helpful to add a comment in this section noting that such a control was performed, even if it is explained in detail later when introducing the recordings.

      We have added the following to the first Results section: “we show below that direct optogenetic effects were only seen in the targeted forelimb area and not the other.”

      (4) Figure 1D - I imagine these averages are from a single animal, but this is not stated in the figure caption.

      “For one example mouse,” has been added to the beginning of the Figure 1D legend.

      (5) Figure 2F - N=6 is not stated in the panel's caption (though it can make it clearer), while it is stated in the caption of 2H.

      “n = 6 mice” has been added to the Figure 2F legend.

      (6) There's some inconsistency with the order of RFA/CFA in the figures, sometimes RFA is presented first (e.g., Figure 1D and 1F), and sometimes CFA is presented first (e.g., panels of Figure 2).

      We do not foresee this leading to confusion.

      (7) "As expected, the majority of recorded neurons in each region exhibited an elevated average firing rate during movement as compared to periods when forelimb muscles were quiescent (Figure 2D,E; Figure S1A,B)" - Figure S1A,B show histograms of narrow vs. wide waveforms, is this the relevant figure here?

      We apologize for the cryptic reference. The waveform width histograms were referred to here because they enabled the separation of narrow- and wide-waveform cells shown in Figure 2D,E. We have added the following clause to the referenced sentence to make this explicit:  “, both for narrow-waveform, putative interneurons and wide-waveform putative pyramidal neurons.”

      (8) Figure 2I caption - "The fraction of activity variance from 150 ms before reach onset to 150 ms after it that occurs before reach onset" - this sentence is not clear.

      The Figure 2I legend has been updated to “The activity variance in the 150 ms before muscle activity onset, defined as a fraction of the total activity variance from 150 ms before to 150 ms after muscle activity onset, for each animal (circles) and the mean across animals (black bars, n = 6 mice).”
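One reasonable reading of this definition can be sketched in Python/NumPy on a hypothetical firing-rate trace (illustration only, not the analysis code used for Figure 2I): the variance within the pre-onset window, divided by the variance over the full window spanning onset.

```python
import numpy as np

def preonset_variance_fraction(rate, onset, half_window):
    # Variance of the trace in the half_window samples before onset,
    # as a fraction of its variance over the full +/- half_window span.
    pre = rate[onset - half_window:onset]
    full = rate[onset - half_window:onset + half_window]
    return float(np.var(pre) / np.var(full))

# Hypothetical trace: flat baseline, then a linear ramp beginning at onset
t = np.arange(300)
rate = np.where(t < 150, 5.0, 5.0 + 0.2 * (t - 150))
frac = preonset_variance_fraction(rate, onset=150, half_window=150)
```

A trace that is perfectly flat before onset yields a fraction of 0; pre-onset modulation raises the fraction toward the small values plotted in Figure 2I.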

      (9) Figure 4B-G - is this showing results across the 6 animals? Not stated clearly.

      Yes - the 21 sessions we had referred to are drawn from all six mice. We have updated the legend here to make this explicit.

      (10) DLAG analysis - is there any particular reasoning behind choosing four across-region and four within-region components?

      In actuality, we completed this analysis for a broad range of component numbers and obtained similar results in all cases. Four fell in the center of our range, and so we focused the illustrations shown in the figure on this value. In general, the number of components is arbitrary. The original paper from Gokcen et al. describes a method for identifying a lower bound on the number of distinct components the method can identify. However, this method yields different results for each individual recording session. For the comparisons we performed, we needed to use the same range of values for each session.

      (11) Figure 5A seems to show 11 across-session components, it's unclear from the caption but I imagine this should show 12 (4 components times 3 sessions?)

      As we state in the Methods, any across-region latent variable with a lag that failed to converge between the boundary values of ±200 ms was removed from the analysis. In the case illustrated in this panel, the lag for one of the components failed to converge and is not shown. We have now clarified this both in the relevant Results paragraph and in the figure legend.

      (12) Figure 5B - is each marker here the average variance explained by all across/within components that were within the specified lag criteria across sessions per mouse? In other words, what does a single marker here stand for?

      We apologize for the lack of clarity here. These values reflect the average across sessions for each mouse. We have updated the legend to make this explicit.

      Reviewer #2 (Recommendations for the authors):

      As I have addressed most of my major recommendations in the public review, I will use this section to include relatively minor points for the authors to consider.

      (1) The EMG data in Figure 1C shows distinct patterns across spouts, both in the magnitude and complexity of muscle activations. It would be interesting to investigate whether these differences in muscle activity lead to behavioral variations (e.g., reaction time, reach duration) and how they relate to the relative involvement of the two areas.

      We agree that it would be interesting to examine how the interactions between areas vary as behavior varies. While the differences between reaches here are limited, we have addressed this question for two substantially different motor behaviors (reaching and climbing) in a follow-up study that was recently preprinted (Kristl et al., biorxiv, 2025).

      (2) How do the authors account for the lingering impact of RFA inactivation on muscle activity, which persists for tens of milliseconds after laser offset? Could this effect be due to compensatory motor activity following the perturbation? A further illustration of how the raw limb trajectories and/or muscle activity are perturbed and recovered would help readers better understand the impact of motor cortical inactivation.

      To clarify the effects of inactivation on a longer timescale, we have added a new supplemental figure showing the plots from Figure 1D over a longer time window extending to 500 ms after trial onset (new Figure S1). Lingering effects do persist, at least in certain cases. In general, we find it hard to ascertain the source of optogenetic effects on longer timescales like this. On the shortest timescales, effects will be mediated by relatively direct connections between regions. However, on these longer timescales, effects could be due to broader changes in brain and behavioral state that can influence muscle activity. For example, attempts to compensate for the initial disturbance to muscle activity could cause divergence from controls on these longer timescales. Muscle tissue itself is also known to have long timescale relaxation dynamics, and it would not be surprising if the relevant control circuits here also had long timescales dynamics, such that we would not expect an immediate return to control when the light pulse ends. Because of this ambiguity, we generally avoid interpretation of optogenetic effects on these longer timescales.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 9: ". We measured the time at which the activity state deviated from baseline preceding reach onset," - I cannot find how this deviation was defined (neither the baseline nor the threshold).

      We have added text to the Figure 2G legend that explicitly states how the baseline and activity onset time were defined.

(2) Given the shape of the curves in Figure 2G, the significance of this result seems susceptible to slight modifications of what defines a baseline or a deviation threshold. For example, it looks like the circle for CFA has a higher y-axis value, suggesting the baseline deviance is higher, but it is unclear why that would be from the plot. If the threshold for deviation in neural activity state were held uniform between CFA and RFA, is the difference still significant across animals?

      We have repeated the analysis using the same absolute threshold for each region. We used the higher of the two thresholds from each region. The difference remains significant. This is now described in the last paragraph of the Results section for Figure 2.

      (3) Since summed deviation of the top 3 PCs is used to show a difference in activity onset between CFA/RFA, but only a small proportion of variance is explained pre-movement (<2% in most animals), it seems relevant to understand what percentage of CFA/RFA neuron activity actually is modulated and deviates from baseline prior to movement and to show the distribution of activity onsets at the single neuron level in CFA/RFA. Can an onset difference only be observed using PCA? 

      Because many neurons have low firing rates, estimating the time at which their firing rate begins to rise near reach onset is difficult to do reliably. It is also true that not all neurons show an increase around onset - some show a decrease and others show no discernible change. Using PCs to measure onset avoids both of these problems, since they capture both increases and decreases in individual neuron firing rates and are much less noisy than individual neuron firing rates. 

      However, based on this comment, we have repeated this analysis on a single-neuron level using only neurons with relatively high average firing rates. Specifically, we analyzed neurons with mean firing rates above the 90th percentile across all sessions within an animal. Neurons whose activity never crossed threshold were excluded. Results matched those using PCs, with RFA neurons showing an earlier average activity onset time. This is now described in the last paragraph of the Results section for Figure 2.
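A minimal Python/NumPy sketch of threshold-crossing onset detection of the kind described (hypothetical data, threshold, and parameters; not our analysis code):

```python
import numpy as np

def activity_onset(rate, baseline_end, n_sd=5.0):
    # First post-baseline index at which the rate deviates from the
    # baseline mean by more than n_sd baseline standard deviations;
    # returns None if the threshold is never crossed (neuron excluded).
    base = rate[:baseline_end]
    mu, sd = base.mean(), base.std()
    dev = np.abs(rate[baseline_end:] - mu)
    crossings = np.flatnonzero(dev > n_sd * sd)
    return int(baseline_end + crossings[0]) if crossings.size else None

rng = np.random.default_rng(1)
t = np.arange(400)
rate = 4.0 + 0.05 * rng.normal(size=t.size)   # noisy baseline around 4 Hz
rate[250:] += 0.1 * (t[250:] - 250)           # ramp beginning at sample 250
onset = activity_onset(rate, baseline_end=200)
```

Returning None for neurons that never cross threshold mirrors the exclusion described above; only sufficiently high-rate neurons give onset estimates stable enough for this comparison.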

      (4) It is stated that to study the impact of inactivation on CFA/RFA activity, only the 50 highest average firing rate neurons were used (and maybe elsewhere too, e.g., convergent cross mapping). It is unclear why this subselection is necessary. It is justified by stating that higher firing rate neurons have better firing rate estimates. This may be supportable for very low firing rate units that spike sorting tools have a hard time tracking, but I don't think this is supported by data for most of the distribution of firing rates. It therefore seems like the results might be biased by a subselection of certain high firing rate neuron populations. It would be useful to also compute and mention if the results for all neurons/neuron pairs are the same. If there is worry about low-quality units being those with low firing rates, a threshold for firing rate as used elsewhere in the paper (at least 1 spike / 2 trials) seems justified.

      The issue here is that as firing rates decrease and firing rate estimates get noisier, estimates of the change in firing rate get more variable. Here we are trying to estimate the fraction of neurons for which firing rates decreased upon inactivation of the other region. Variability in estimates of the firing rate change will bias this estimate toward 50%, since in the limit when the change estimates are entirely based on noise, we expect 50% to be decreases. As expected, when we use increasingly liberal thresholds for this analysis, the fraction of decreases trends closer to 50%. 

      As a consequence of this, we cannot easily distinguish whether higher firing rate neurons might for some reason have a greater tendency to exhibit decreases in firing compared to lower firing rate neurons. However, we see no positive reason to expect such a difference. We have added a sentence noting this caveat in interpreting our findings to the relevant paragraph of the Results.
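      The intuition behind this bias can be illustrated with a quick simulation (a minimal sketch with made-up numbers, not our actual analysis: every simulated neuron truly decreases, and noisier change estimates pull the measured fraction of decreases toward 50%):

```python
import random

# Illustrative sketch: each neuron's firing rate truly decreases by
# 1 spike/s, but the change is estimated with Gaussian noise. As the
# noise grows, the measured fraction of decreases approaches 50%.
rng = random.Random(0)
n_neurons = 10_000
true_change = -1.0

fracs = []
for noise_sd in (0.5, 2.0, 10.0):
    n_dec = sum(1 for _ in range(n_neurons)
                if true_change + rng.gauss(0.0, noise_sd) < 0.0)
    fracs.append(n_dec / n_neurons)
print(fracs)  # fraction of apparent decreases shrinks toward 0.5
```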

      The lack of min/max axis values in Figure 3B-F makes it hard to interpret - are these neurons almost silent when near the bottom of the plot or are they still firing a substantial # of spikes?

      To aid interpretation of the relative magnitude of firing rate changes, we have added minimum firing rates for the averages depicted in Figure 3B,C,E and F to the legend. Our original thinking was that the plots in Figure 3G and H would provide an indication of the relative changes in firing.

      It would be interesting to know if the impact of optogenetic stimulation changed with exposure to the manipulation. Are all results presented only from the first X number of sessions in each animal? Or is the effect robust over time and (within the same animal) you can get the same results of optogenetic inactivation over time? This information seems critical for reproducibility.

      We have now performed brief optogenetic inactivations in several brain areas in several different behavioral paradigms, and have found that inactivation effects are stable both within and across sessions, almost surprisingly so. This includes cases where the inactivations were more frequent (every ~1.25 s on average) and more numerous (>15,000 trials per animal) than in the present manuscript. Thus we did not restrict our analysis here to the first X sessions or trials within a session. We have added additional plots as Figure S3T-AA showing the stability of optogenetic effects both within and across sessions.

      Given that it can be difficult to record from interneurons (as the proportion of putative interneurons in Figure S1 attests), the SALT analyses would be more convincing if a few recordings had been performed in the same region as optogenetic stimulation to show a "positive control" of what direct interneuron stimulation looks like. Could also use this to validate the narrow/wide waveform classification.

      We have verified that using SALT as we have in the present manuscript does detect vGAT+ interneurons directly responding to light. This is included in a recent preprint from the lab (Kristl et al., biorxiv, 2025). We (Warriner et al., Cell Reports, 2022) and others (Guo et al., Neuron, 2014) have previously used direct ChR2 activation to validate waveform-based classification.

      Simultaneous CFA/RFA recordings during optogenetic perturbation would also allow for time courses of inhibition to be compared in RFA/CFA. Does it take 25ms to inhibit locally, and the cross-area impact is fast, or does it inactivate very fast locally and takes ~25ms to impact the other region?

      Latencies of this sort are difficult to precisely measure given the statistical limits of this sort of data, but there does appear to be some degree of delay between local and downstream effects. We do not have a statistical foundation as of yet for concluding that this is the case. It will be interesting to examine this issue more rigorously in the future.

      Given the difference in the analytical methods, the authors should share data in a relatively unprocessed format (e.g., spike times from sorted units relative to video tracking + behavioral data), along with analysis code, to allow others to investigate these differences.

      We plan to post the data and code to our lab’s Github site once the Version of Record is online.

    1. Reviewer #1 (Public review):

      This paper presents a computational model of the evolution of two different kinds of helping ("work," presumably denoting provisioning, and defense tasks) in a model inspired by cooperatively breeding vertebrates. The helpers in this model are a mix of previous offspring of the breeder and floaters that might have joined the group, and can either transition between the tasks as they age or not. The two types of help have differential costs: "work" reduces "dominance value," (DV), a measure of competitiveness for breeding spots, which otherwise goes up linearly with age, but defense reduces survival probability. Both eventually might preclude the helper from becoming a breeder and reproducing. How much the helpers help, and which tasks (and whether they transition or not), as well as their propensity to disperse, are all evolving quantities. The authors consider three main scenarios: one where relatedness emerges from the model, but there is no benefit to living in groups, one where there is no relatedness, but living in larger groups gives a survival benefit (group augmentation, GA), and one where both effects operate. The main claim is that evolving defensive help or division of labor requires the group augmentation; it doesn't evolve through kin selection alone in the authors' simulations.

      This is an interesting model, and there is much to like about the complexity that is built in. Individual-based simulations like this can be a valuable tool to explore the complex interaction of life history and social traits. Yet, models like this also have to take care of both being very clear on their construction and exploring how some of the ancillary but potentially consequential assumptions affect the results, including robust exploration of the parameter space. I think the current manuscript falls short in these areas, and therefore, I am not yet convinced of the results. Much of this is a matter of clearer and more complete writing: the Materials and Methods section in particular is incomplete or vague in some important junctions. However, there are also some issues with the assumptions that are described clearly.

      Below, I describe my main issues, mostly having to do with model features that are unclear, poorly motivated (as they stand), or potentially unrealistic or underexplored.

      One of the main issues I have is that there is almost no information on what happens to dispersers in the model. Line 369-67 states dispersers might join another group or remain as floaters, but gives no further information on how this is determined. Poring through the notation table also comes up empty as there is no apparent parameter affecting this consequential life history event. At some point, I convinced myself that dispersers remain floaters until they die or become breeders, but several points in the text contradict this directly (e.g., l 107). Clearly this is a hugely important model feature since it determines fitness cost and benefits of dispersal and group size (which also affects relatedness and/or fitness depending on the model). There just isn't enough information to understand this crucial component of the model, and without it, it is hard to make sense of the model output.

      Related to that, it seems to be implied (but never stated explicitly) that floaters do no work, and therefore their DV increases linearly with age (H_work in eq.2 is zero). That means any floaters that manage to stick around long enough would have higher success in competition for breeding spots relative to existing group members. How realistic is this? I think this might be driving the kin selection-only results that defense doesn't evolve without group augmentation (one of the two main ways). Any subordinates (which are mainly zero in the no GA, according to the SI tables; this assumes N=breeder+subordinates, but this isn't explicit anywhere) would be outcompeted by floaters after a short time (since they evolve high H and floaters don't), which in turn increases the benefit of dispersal, explaining why it is so high. Is this parameter regime reasonable? My understanding is that floaters often aren't usually high resource holding potential individuals (either b/c high RHP ones would get selected out of the floater population by establishing territories or b/c floating isn't typically a thriving strategy, given that many resources are tied to territories). In this case, the assumption seems to bias things towards the floaters and against subordinates to inherit territories. This should be explored either with a higher mortality rate for floaters and/or a lower DV increase, or both.

      When it comes to floaters replacing dead breeders, the authors say a bit more, but again, the actual equation for the scramble competition (which only appears as "scramble context" in the notation table) is not given. Is it simply proportional to R_i/\sum_j R_j ? Or is there some other function used? What are the actual numbers of floaters per breeding territory that emerge under different parameter values? These are all very important quantities that have to be described clearly.

      I also think the asexual reproduction with small mutations assumption is a fairly strong one that also seems to bias the model outcomes in a particular way. I appreciate that the authors actually measured relatedness within groups (though if most groups under KS have no subordinates, that relatedness becomes a bit moot), and also eliminated it with their ingenious swapping-out-subordinates procedure. The fact remains that unless they eliminate relatedness completely, average relatedness, by design, will be very high. (Again, this is also affected by how the fate of the dispersers is determined, but clearly there isn't a lot of joining happening, just judging from mean group sizes under KS only.) This is, of course, why there is so much helping evolving (even if it's not defensive) unless they completely cut out relatedness.

      Finally, the "need for division of labor" section is also unclear, and its construction also would seem to bias things against division of labor evolving. For starters, I don't understand the rationale for the convoluted way the authors create an incentive for division of labor. Why not implement something much simpler, like a law of minimum (i.e., the total effect of helping is whatever the help amount for the lowest value task is) or more intuitively: the fecundity is simply a function of "work" help (draw Poisson number of offspring) and survival of offspring (draw binomial from the fecundity) is a function of the "defense" help. As it is, even though the authors say they require division of labor, in fact, they only make a single type of help marginally less beneficial (basically by half) if it is done more than the other. That's a fairly weak selection for division of labor, and to me it seems hard to justify. I suspect either of the alternative assumptions above would actually impose enough selection to make division of labor evolve even without group augmentation.
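      To make the suggestion concrete, the alternative could be sketched along these lines (hypothetical functional forms chosen purely for illustration; the saturating help effects and the Poisson helper are my own choices, not the manuscript's model):

```python
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm for a Poisson draw (stdlib-only helper)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1

def realized_offspring(work_help, defense_help, rng):
    # fecundity rises (and saturates) with "work" help
    mean_offspring = 2.0 * work_help / (1.0 + work_help)
    born = poisson(mean_offspring, rng)
    # each offspring survives with probability rising in "defense" help,
    # so zero defense help means zero realized fitness regardless of work
    p_survive = defense_help / (1.0 + defense_help)
    return sum(1 for _ in range(born) if rng.random() < p_survive)
```

      Under a construction like this, realized fitness is zero unless both task types are performed at all, which imposes far stronger selection for division of labor than merely halving the benefit of the overrepresented task.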

      Overall, this is an interesting model, but the simulation is not adequately described or explored to have confidence in the main conclusions yet. Better exposition and more exploration of alternative assumptions and parameter space are needed.

    1. Cognitive modeling [Olson, J. R., & Olson, G. M. (1990). The growth of cognitive modeling in human-computer interaction since GOMS. Human-Computer Interaction.] is a collection of methods that build models, sometimes computational models, of how people reason about tasks. GOMS [John, B.E. and Kieras, D.E. (1996). The GOMS family of user interface analysis techniques: comparison and contrast. ACM Transactions on Computer-Human Interaction (TOCHI).], for example, which stands for Goals, Operators, Methods, and Selection Rules, is a way of defining expert interactions with an interface and using the model to predict how long it would take to perform various tasks. This has been useful in trying to find ways to optimize expert behavior quite rapidly without having to conduct user testing.

      I totally agree with this statement. It's important to understand the logic behind our users' behavior instead of just considering the aesthetic design.

    2. Methods in this paradigm try to simulate people using a design and then use design principles and expert judgement to predict likely problems.

      I think it’s interesting that analytical methods don’t require real users, just simulations and expert knowledge. This seems really efficient, especially when you can’t get users to test your design right away. But I also wonder if this approach sometimes misses issues that only real users would notice.

    3. There are two adjacent commands that do very different things.

      I observed this design problem on several instances in different applications. Recently, I was playing a video game where I tried purchasing a new car and I was one click away from destroying my previous car that I worked hard for (you're only allowed to have one). There was just one warning which looked like any other generic warning in the game that this would happen and I almost pressed yes, which would've been pitiful. It's really important when designing interfaces to pay attention to these little things that can make or break the user experience.

    1. A major limitation of A/B tests is that because it’s difficult to come up with holistic measures of success, the results tend to be pretty narrow.

      This point made me realize that not everything important can be measured easily, especially with A/B tests. I used to think A/B testing was the ultimate way to know what works, but now I see its limits. It’s a reminder to think carefully about what really matters in a design, not just what’s easy to track.

    1. When I followed up with her over the phone, she said that it’s getting more and more difficult to catch A.I. use because a savvier user will recognize absurdities and hallucinations and go back over what a chatbot spits out to make it read more as if the user wrote it herself. But what troubles Martin more than some students’ shrewd academic dishonesty is “that there’s just no grit that’s instilled in them. There’s no sense of ‘Yes, you’re going to struggle, but you’re going to feel good at the end of it.’”

      This also adds to the reliance aspect of AI, and it really takes a toll on students' mental health.

    1. Energy, discipline, my own power will keep me going,” says ex-anorectic Aimee Liu, recreating her anorexic days. “I need nothing and no one else. . . . I will be master of my own body, if nothing else, I vow

      I've had similar thought processes around medication. It sucks, and it's very hard to get out of that cycle of thoughts. You feel like you just need to be a little stronger and push through it without needing anything.

    1. Your natural style of communication dictates how others perceive you. If the way you talk doesn’t reflect your intentions, others may not understand what you’re trying to communicate and may develop negative feelings towards you. To better understand the way you communicate and the ways you can improve:

      I agree with that statement because communication isn't just about the words you say; it's also about how you say them. Your tone, body language, pacing, and choice of words all contribute to how others interpret your message. If your delivery doesn’t align with your intention, people might misinterpret your message, leading to confusion, frustration, or even conflict.

  9. social-media-ethics-automation.github.io
    1. Neurodiversity

      This article explains that neurodiversity is about how everyone’s brain works in its own unique way, and that’s totally normal. It’s not always something that needs to be changed, it’s just different. The idea started with autistic people standing up for themselves and now includes things like ADHD and dyslexia too. One important point is that the term “neurodiversity” was first used by Judy Singer back in the 1990s. Another key point is that trying to "fix" neurodivergent people can actually hurt their mental health and make them feel like something’s wrong with them.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Glaser et al present ExA-SPIM, a light-sheet microscope platform with large volumetric coverage (Field of view 85mm^2, working distance 35mm), designed to image expanded mouse brains in their entirety. The authors also present an expansion method optimized for whole mouse brains and an acquisition software suite. The microscope is employed in imaging an expanded mouse brain, the macaque motor cortex, and human brain slices of white matter. 

      This is impressive work and represents a leap over existing light-sheet microscopes. As an example, it offers a fivefold higher resolution than mesoSPIM (https://mesospim.org/), a popular platform for imaging large cleared samples. Thus while this work is rooted in optical engineering, it manifests a huge step forward and has the potential to become an important tool in the neurosciences. 

      Strengths: 

      - ExA-SPIM features an exceptional combination of field of view, working distance, resolution, and throughput. 

      - An expanded mouse brain can be acquired with only 15 tiles, lowering the burden on computational stitching. That the brain does not need to be mechanically sectioned is also seen as an important capability. 

      - The image data is compelling, and tracing of neurons has been performed. This demonstrates the potential of the microscope platform. 

      Weaknesses: 

      - There is a general question about the scaling laws of lenses, and expansion microscopy, which in my opinion remained unanswered: In the context of whole brain imaging, a larger expansion factor requires a microscope system with larger volumetric coverage, which in turn will have lower resolution (Figure 1B). So what is optimal? Could one alternatively image a cleared (non-expanded) brain with a high-resolution ASLM system (Chakraborty, Tonmoy, Nature Methods 2019, potentially upgraded with custom objectives) and get a similar effective resolution as the authors get with expansion? This is not meant to diminish the achievement, but it was unclear if the gains in resolution from the expansion factor are traded off by the scaling laws of current optical systems. 

      Paraphrasing the reviewer: Expanding the tissue requires imaging larger volumes and allows lower optical resolution. What has been gained?

      The answer to the reviewer’s question is nuanced and contains four parts. 

      First, optical engineering requirements are more forgiving for lenses with lower resolution. Lower resolution lenses can have much larger fields of view (in real terms: the number of resolvable elements, proportional to ‘etendue’) and much longer working distances. In other words, it is currently more feasible to engineer lower resolution lenses with larger volumetric coverage, even when accounting for the expansion factor. 

      Second, these lenses are also much better corrected compared to higher resolution (NA) lenses. They have a flat field of view, negligible pincushion distortions, and constant resolution across the field of view. We are not aware of comparable performance for high NA objectives, even when correcting for expansion.

      Third, although clearing and expansion render tissues ‘transparent’, there still exist refractive index inhomogeneities which deteriorate image quality, especially at larger imaging depths. These effects are more severe for higher optical resolutions (NA), because the rays entering the objective at higher angles have longer paths in the tissue and will see more aberrations. For lower NA systems, such as ExaSPIM, the differences in paths between the extreme and axial rays are relatively small and image formation is less sensitive to aberrations. 

      Fourth, aberrations are proportional to the index of refraction inhomogeneities (dn/dx). Since the index of refraction is roughly proportional to density, scattering and aberration of light decreases as M^3, where M is the expansion factor. In contrast, the imaging path length through the tissue only increases as M. This produces a huge win for imaging larger samples with lower resolutions. 

      To our knowledge there are no convincing demonstrations in the literature of diffraction-limited ASLM imaging at a depth of 1 cm in cleared mouse brain tissue, which would be equivalent to the ExA-SPIM imaging results presented in this manuscript.  

      In the discussion of the revised manuscript we discuss these factors in more depth. 

      - It was unclear if 300 nm lateral and 800 nm axial resolution is enough for many questions in neuroscience. Segmenting spines, distinguishing pre- and postsynaptic densities, or tracing densely labeled neurons might be challenging. A discussion about the necessary resolution levels in neuroscience would be appreciated. 

      We have previously shown good results in tracing the thinnest (100 nm thick) axons over cm scales with 1.5 um axial resolution. It is the contrast (SNR) that matters, and the ExaSPIM contrast exceeds the block-face 2-photon contrast, not to mention imaging speed (> 10x).  

      Indeed, for some questions, like distinguishing fluorescence in pre- and postsynaptic structures, higher resolutions will be required (0.2 um isotropic; Rah et al Frontiers Neurosci, 2013). This could be achieved with higher expansion factors.

      This is not within the intended scope of the current manuscript. As mentioned in the discussion section, we are working towards ExA-SPIM-based concepts to achieve better resolution through the design and fabrication of a customized imaging lens that maintains a high volumetric coverage with increased numerical aperture.  

      - Would it be possible to characterize the aberrations that might be still present after whole brain expansion? One approach could be to image small fluorescent nanospheres behind the expanded brain and recover the pupil function via phase retrieval. But even full width half maximum (FWHM) measurements of the nanospheres' images would give some idea of the magnitude of the aberrations. 

      We now included a supplementary figure highlighting images of small axon segments within distal regions of the brain.  

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Glaser et al. describe a new selective plane illumination microscope designed to image a large field of view that is optimized for expanded and cleared tissue samples. For the most part, the microscope design follows a standard formula that is common among many systems (e.g. Keller PJ et al Science 2008, Pitrone PG et al. Nature Methods 2013, Dean KM et al. Biophys J 2015, and Voigt FF et al. Nature Methods 2019). The primary conceptual and technical novelty is to use a detection objective from the metrology industry that has a large field of view and a large area camera. The authors characterize the system resolution, field curvature, and chromatic focal shift by measuring fluorescent beads in a hydrogel and then show example images of expanded samples from mouse, macaque, and human brain tissue. 

      Strengths: 

      I commend the authors for making all of the documentation, models, and acquisition software openly accessible and believe that this will help assist others who would like to replicate the instrument. I anticipate that the protocols for imaging large expanded tissues (such as an entire mouse brain) will also be useful to the community. 

      Weaknesses: 

      The characterization of the instrument needs to be improved to validate the claims. If the manuscript claims that the instrument allows for robust automated neuronal tracing, then this should be included in the data. 

      The reviewer raises a valid concern. Our assertion that the resolution and contrast is sufficient for robust automated neuronal tracing is overstated based on the data in the paper. We are hard at work on automated tracing of datasets from the ExA-SPIM microscope. We have demonstrated full reconstruction of axonal arbors encompassing >20 cm of axonal length.  But including these methods and results is out of the scope of the current manuscript. 

      The claims of robust automated neuronal tracing have been appropriately modified.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Smaller questions to the authors: 

      - Would a multi-directional illumination and detection architecture help? Was there a particular reason the authors did not go that route?

      Despite the clarity of the expanded tissue, and the lower numerical aperture of the ExA-SPIM microscope, image quality still degrades slightly towards the distal regions of the brain relative to both the excitation and detection objective. Therefore, multi-directional illumination and detection would be advantageous. Since the initial submission of the manuscript, we have undertaken re-designing the optics and mechanics of the system. This includes provisions for multi-directional illumination and detection. However, this new design is beyond the scope of this manuscript. We now mention this in L254-255 of the Discussion section.

      - Why did the authors not use the same objective for illumination and detection, which would allow isotropic resolution in ASLM? 

      The current implementation of ASLM requires an infinity corrected objective (i.e. conjugating the axial sweeping mechanism to the back focal plane). This is not possible due to the finite conjugate design of the ExA-SPIM detection lens.

      More fundamentally, pushing the excitation NA higher would result in a shorter light sheet Rayleigh length, which would require a smaller detection slit (shorter exposure time, lower signal to noise ratio). For our purposes an excitation NA of 0.1 is an excellent compromise between axial resolution, signal to noise ratio, and imaging speed. 

      For other potentially brighter biological structures, it may be possible to design a custom infinity corrected objective that enables ASLM with NA > 0.1.

      - Have the authors made any attempt to characterize distortions of the brain tissue that can occur due to expansion? 

      We have not systematically characterized the distortions of the brain tissue pre and post expansion. Imaged mouse brain volumes are registered to the Allen CCF regardless of whether or not the tissue was expanded. It is beyond the scope of this manuscript to include these results and processing methods, but we have confirmed that the ExA-SPIM mouse brain volumes contain only modest deformation that is easily accounted for during registration to the Allen CCF. 

      - The authors state that a custom lens with NA 0.5-0.6 lens can be designed, featuring similar specifications. Is there a practical design? Wouldn't such a lens be more prone to Field curvature? 

      This custom lens has already been designed and is currently being fabricated. The lens maintains a similar space bandwidth product as the current lens (increased numerical aperture but over a proportionally smaller field of view). Over the designed field of view, field curvature is <1 µm. However, including additional discussion or results of this customized lens is beyond the scope of this manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      System characterization: 

      - Please state what wavelength was used for the resolution measurements in Figure 2.

      An excitation wavelength of 561 nm was used. This has been added to the manuscript text.

      - The manuscript highlights that a key advance for the microscope is the ability to image over a very large 13 mm diameter field of view. Can the authors clarify why they chose to characterize resolution over an 8 mm diameter field rather than the full area? 

      The 13 mm diameter field of view refers to the diagonal of the 10.6 x 8.0 mm field of view. The results presented in Figure 1c are with respect to the horizontal x direction and vertical y direction. A note indicating that the 13 mm is with respect to the diagonal of the rectangular imaging field has been added to the manuscript text. The results were presented in this way to show the axial and lateral resolution as a function of y (the axial sweeping direction).

      - The resolution estimates seem lower than I would expect for a 0.30 NA lens (which should be closer to ~850 nm for 515 nm emission). Could the authors clarify the discrepancy? Is this predicted by the Zemax model and due to using the lens in immersion media, related to sampling size on the camera, or something else? It would be helpful if the authors could overlay the expected diffraction-limited performance together with the plots in Figure 2C. 

      As mentioned previously, the resolution measurements were performed with 561 nm excitation and an emission bandpass of ~573 – 616 nm (595 nm average). Based on this we would expect the full width half maximum resolution to be ~975 nm. The resolution is in fact limited by sampling on the camera. The 3.76 µm pixel size, combined with the 5.0X magnification, results in a sampling of 752 nm. Based on Nyquist sampling, the resolution is limited to ~1.5 µm. We have added clarifying statements to the text.
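      The arithmetic behind these numbers is simply:

```python
# Object-space sampling and the resulting Nyquist-limited resolution
pixel_um = 3.76          # camera pixel size
magnification = 5.0
sampling_um = pixel_um / magnification   # object-space pixel size
nyquist_limit_um = 2.0 * sampling_um     # two pixels per resolvable period
print(sampling_um, nyquist_limit_um)     # ~0.75 um sampling, ~1.5 um limit
```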

      - I'm confused about the characterization of light sheet thickness and how it relates to the measured detection field curvature. The authors state that they "deliver a light sheet with NA = 0.10 which has a width of 12.5 mm (FWHM)." If we estimate that light fills the 0.10 NA, it should have a beam waist (2wo) of ~3 microns (assuming Gaussian beam approximations). Although field curvature is described as "minimal" in the text, it is still ~10-15 microns at the edge of the field for the emission bands for GFP and RFP proteins. Given that this is 5X larger than the light sheet thickness, how do the authors deal with this? 

      The generated light sheet is flat, with a thickness of ~ 3 µm. This flat light sheet will be captured in focus over the depth of focus of the detection objective. The stated field curvature is within 2.5X the depth of focus of the detection lens, which is equivalent to the “Plan” specification of standard microscope objectives.
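      The rough numbers behind this reply can be checked with standard Gaussian-beam and depth-of-focus estimates (the wavelengths and the λ/NA² depth-of-focus form below are illustrative assumptions, not values taken from the manuscript):

```python
import math

wav_exc_um = 0.561                    # assumed excitation wavelength
na_exc = 0.10
w0 = wav_exc_um / (math.pi * na_exc)  # Gaussian waist radius
sheet_thickness = 2.0 * w0            # light-sheet thickness of a few microns

wav_em_um = 0.595                     # assumed mean emission wavelength
na_det = 0.30
dof = wav_em_um / na_det ** 2         # classical depth-of-focus estimate
print(sheet_thickness, dof, 2.5 * dof)
```

      With these values, a 10-15 µm field curvature sits within roughly 2.5x the detection depth of focus, consistent with the reply above.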

      - In Figure 2E, it would be helpful if the authors could list the exposure times as well as the total voxels/second for the two-camera comparison. It's also worth noting that the Sony chip used in the VP151MX camera was released last year whereas the Orca Flash V3 chosen for comparison is over a decade old now. I'm confused as to why the authors chose this camera for comparison when they appear to have a more recent Orca BT-Fusion that they show in a picture in the supplement (indicated as Figure S2 in the text, but I believe this is a typo and should be Figure S3). 

      This is a useful addition, and we have added exposure times to the plot. We have also added a note that the Orca Flash V3 is an older generation sCMOS camera and that newer variants exist, including the Orca BT-Fusion. The BT-Fusion has a read noise of 1.0 e- rms versus 1.6 e- rms, and a peak quantum efficiency of ~95% vs. 85%. Based on the discussion in Supplementary Note S1, we do not expect that these differences in specifications would dramatically change the data presented in the plot. In addition, the typo in Figure S2 has been corrected to Figure S3.

      - In Table S1, the authors note that they only compare their work to prior modalities that are capable of providing <= 1 micron resolution. I'm a bit confused by this choice given that Figure 2 seems to show the resolution of ExA-SPIM as ~1.5 microns at 4 mm off center (1/2 their stated radial field of view). It also excludes a comparison with the mesoSPIM project which at least to me seems to be the most relevant prior to this manuscript. This system is designed for imaging large cleared tissues like the ones shown here. While the original publication in 2019 had a substantially lower lateral resolution, a newer variant, Nikita et al bioRxiv (which is cited in general terms in this manuscript, but not explicitly discussed) also provides 1.5-micron lateral resolution over a comparable field of view. 

      We have updated the table to include the benchtop mesoSPIM from Nikita et al., Nature Communications, 2024. Based on the published version of that manuscript, the lateral resolution is 1.5 µm and the axial resolution is 3.3 µm. Assuming the Iris 15 camera sensor, with the stated 2.5 fps, the volumetric rate (megavoxels/sec) is 37.41.
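      The quoted megavoxel rate follows directly from the sensor geometry; a minimal sketch, assuming the Iris 15's 5056 × 2960 pixel sensor format:

```python
def volumetric_rate_mvx(width_px, height_px, fps):
    """Volumetric imaging rate in megavoxels per second."""
    return width_px * height_px * fps / 1e6

# Assumed Iris 15 sensor format (5056 x 2960 pixels) at the stated 2.5 fps
print(round(volumetric_rate_mvx(5056, 2960, 2.5), 2))  # 37.41
```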

      - The authors state that, "We systematically evaluated dehydration agents, including methanol, ethanol, and tetrahydrofuran (THF), followed by delipidation with commonly used protocols on 1 mm thick brain slices. Slices were expanded and examined for clarity under a macroscope." It would be useful to include some data from this evaluation in the manuscript to make it clear how the authors arrived at their final protocol. 

      Additional details on the expansion protocol may be included in another manuscript.

      General comments: 

      There is a tendency in the manuscript to use negative qualitative terms when describing prior work and positive qualitative terms when describing the work here. Examples include: 

      - "Throughput is limited in part by cumbersome and error-prone microscopy methods". While I agree that performing single neuron reconstructions at a large scale is a difficult challenge, the terms cumbersome and error-prone are qualitative and lacking objective metrics.

      We have revised this statement to be more precise, stating that throughput is limited in part by the speed and image quality of existing microscopy methods.

      - The resolution of the system is described in several places as "near-isotropic" whereas prior methods were described as "highly anisotropic". I agree that the ~1:3 lateral to axial ratio here is more isotropic than the 1:6 ratio of the other cited publications. However, I'm not sure I'd consider 3-fold worse axial resolution than lateral to be considered "near" isotropic.

      We agree that the term near-isotropic is ambiguous. We have modified the text accordingly, removing the term near-isotropic and where appropriate stating that the resolution is more isotropic than that of other cited publications.

      - In the manuscript, the authors describe the photobleaching in their imaging conditions as "negligible". Figure S5 seems to show a loss of 60% fluorescence after 2000 exposures (which in the caption is described as "modest"). I'd suggest removing these qualitative terms and just stating the values.

      We agree and have changed the text accordingly.

      - The results section for Figure 5 is titled "Tracing axons in human neocortex and white matter". Although this section states "larger axons (>1 um) are well separated... allowing for robust automated and manual tracing" there is no data for any tracing in the manuscript. Although I agree that the images are visually impressive, I'm not sure that this claim is backed by data.

      We have now removed the text in this section referring to automated and manual tracing.

    1. In 2019 the company Facebook (now called Meta) presented an internal study that found that Instagram was bad for the mental health of teenage girls, and yet they still allowed teenage girls to use Instagram. So, what does social media do to the mental health of teenage girls, and to all its other users?

      I think it's a great question whether we should still allow people to use Instagram if it's bad for us. If Meta's study proves that Instagram can be bad for teenage girls, shouldn't we find ways to make it beneficial instead of just banning it? I would also consider the study a way to stop Instagram from taking customers away from Facebook. The study indicated that in general the platform could be harmful, but there are people who benefit from it. It would be unfair to close it just because it may be bad for teenage girls' mental health. Finding ways to make it beneficial to mental health would be the right solution.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-02888

      Corresponding author(s): Christian, Fankhauser

      General Statements

      We were pleased to see that the three reviewers found our work interesting and provided supportive and constructive comments.

      Our answers to their comments and/or how we propose to address them in a revised manuscript are included in bold.

      1. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Plant systems sense shading by neighbors via the phytochrome signaling system. In the shade, PHYTOCHROME-INTERACTING FACTORS (PIFs) accumulate and are responsible for transcriptional reprogramming that enable plants to mobilize the "shade-avoidance response". Here, the authors have sought to examine the role of chromatin in modulating this response, specifically by examining whether "open" or "closed" chromatin regions spanning PIF target genes might explain the transcriptional output of these genes. They used a combination of ATAC-seq/CoP-qPCR (to detect open regions of chromatin), ChIP (to assay PIF binding) and RNA-seq (to measure transcript abundance) to understand how these processes may be mechanistically linked in Arabidopsis wild-type and pif mutant lines. They found that some chromatin accessibility changes do occur after LRFR (shade) treatment (32 regions after 1h and 61 after 25 h). While some of these overlap with PIF-binding sites, the authors found no correlation between open chromatin states and high levels of transcription. Because auxin is an important component of the shade-avoidance response and has been shown to control chromatin accessibility in other contexts, they examined whether auxin might be required for opening these regions of chromatin. They find that in an auxin biosynthesis mutant, there is a small subset of PIF target genes whose chromatin accessibility seems altered relative to the wild-type. Likewise, they found that chromatin accessibility for certain PIF targets is altered in phyB and pif mutant and propose that PIFs are necessary for changing the accessibility of chromatin in these genes. The authors thus propose that PIF occupancy of already open regions, rather than increased accessibility, underly the increase in transcript of abundance of these target genes in response to shade.

      Major comments:

      • I find that the data generally support the hypothesis presented in the manuscript that chromatin accessibility alone does not predict transcription of PIF target genes in the shade. That said, I think that a paragraph from the discussion (lines 321-332) would benefit from some careful rephrasing. I think it is perfectly reasonable to propose that PIF occupancy is more predictive of shade-induced transcriptional output than chromatin accessibility, but I think that calling PIF occupancy "the key drivers" (line 323) or "the main driving force" (line 76) risks ignoring the observation that levels of PIF occupancy specifically do not predict expression levels of PIF target genes (Pfeiffer et al., 2014, Mol Plant). For PIL1 and HFR1, the authors have shown that PIF promoter occupancy and transcript levels are correlated, but the central finding of Pfeiffer et al. was that this pattern does not apply to the majority of PIF direct target genes. Finding factors (i.e. histone marks) that convert PIF-binding information into transcriptional output appears to have been the impetus for the experiments devised in Willige et al., 2021 and Calderon et al., 2022. It is great that the authors have outlined in the discussion that there are a number of factors that modulate PIF transcriptional activating activity but I think that the emphasis on PIF-binding explaining transcript abundance should be moderated in the text.

      We appreciate the reviewer's comments and will address them by introducing appropriate changes to the discussion. One element that should be pointed out is that the study of Willige et al., 2021 allows us to look at sites where PIF7 is recruited in response to the shade stimulus (a low R/FR treatment) and relate this to higher transcript abundance of the nearby genes. The study of Pfeiffer et al., 2014, which analyses PIF ChIP studies from several labs, does not include this dynamic view of PIF recruitment in response to a stimulus. For example, this study re-analyses data from our lab (Hornitschek et al., 2012), in which we did PIF5 ChIP in low R/FR, but we did not compare that to high R/FR to enable an analysis of sites where we see recruitment of PIF5 in response to a shade cue. In the revised manuscript we will also include a new figure comparing PIF7 recruitment and changes in gene expression at direct PIF target genes.

      • I think that the hypothesis could be further supported by incorporating the previously published ChIP-seq data on PIF1, PIF3 and PIF5 binding. Given these data are published/publicly available, I think it would be helpful to note which of the 72 DARs are bound by PIF1, PIF3 and/or PIF5. Especially so given that PIF5 (Lorrain et al., 2008, Plant J) and PIF1/PIF3 (Leivar et al., 2012, Plant Cell) contribute at least in some capacity to transcriptional regulation in response to shade. At the very least, it might help explain some of the observed increases in nucleosome accessibility observed for genes that don't have PIF4 or PIF7-binding.

      This is a thoughtful suggestion. Our choice to focus on PIF7 target genes is dictated by two reasons. First, the finding that amongst all tested PIFs, PIF7 is the major contributor to the control of low R/FR (neighbor proximity) induced responses in seedlings (e.g. Li et al., 2012; de Wit et al., 2016; Willige et al., 2021). In addition, the PIF7 ChIP-seq and gene expression data from the Willige et al., 2021 paper was obtained using growth conditions very similar to the ones we used, hence allowing us to compare it to our data. As the reviewer suggests, other PIFs also contribute to the low R/FR response, and hence looking at ChIP-seq for those PIFs in publicly available data is also informative. One limitation of this data is that ChIP-seq was not always done in seedlings grown in conditions directly comparable to the conditions we used (except for PIF5, see above). Nevertheless, we have performed this analysis with the available data suggested by the reviewer and intend to include the results in the revised version of the manuscript, presumably in an updated Figure 4B.

      • In the manuscript, there are several instances where separate col-0 (wild type) controls have been used for identical experiments. Specifically, qPCR (Fig 3C, Fig S7C/D and Fig S8C/D), CoP-qPCR (Fig 5B/5C and Fig S8E/F) and hypocotyl measurements (Fig S7A/B and Fig S8A/B). In the cases of the hypocotyl measurements, there appear to be hardly any differences between col-0 controls indicating the measurements can be confidently compared between genotypes.

      We appreciate this comment but to be comprehensive, we like to include a Col-0 control for each experiment (whenever possible) and hence also include the data when available.

      • In some cases of qPCR and CoP-qPCR experiments however, the differences in values obtained from col-0 samples that underwent identical experimental treatments appear to vary significantly. In Figure 3C for example, the overall trend for PIL1 expression in col-0 is the same (e.g. HRFR levels are low, LRFR1 levels are much higher and LRFR25 levels drop down to some intermediate level) but the expression levels themselves appear to differ almost two-fold for the LRFR 1h timepoint (~110 on the left panel vs ~60 for the right panel). Given the size of the error bars, it appears that these data represent the mean from only one biological replicate. PIL1 expression levels at LRFR 1h as reported in Fig S7C and D also show similar ~2-fold differences.

      This is a good comment. Having looked at PIL1 gene induction by low R/FR in dozens of similar experiments made us realize that indeed, while the PIL1 induction is always massive, its extent is somewhat variable. Based on the data that we have (including from RNA-seq), we are convinced that this is due to the very low level of expression of PIL1 in high R/FR conditions. Given that induction by low R/FR is expressed as fold increase relative to baseline high R/FR expression, small changes in the lowly expressed PIL1 in high R/FR lead to seemingly significant differences in its induction by low R/FR across experiments.

      All qPCR data is represented by three biological replicates, and the variation between them per experiment is low, which is reflected in the size of the SD error bars. Data on technical and biological replicates in each panel will be clearly indicated in the revised figure legends.

      • I would recommend that the authors explicitly describe the number of biological replicates used for each experiment in the methods section. If indeed these experiments were only performed once, I think the authors should be very careful in the language used in describing their conclusions and in assigning statistical significance. One possibility that could also be helpful would be normalizing LRFR 1h and LRFR 25h values to HRFR values and plotting these data somewhere in the supplemental data. If, for example, the relative levels of PIL1 are different between replicates but the fold-induction between HRFR and LRFR 1h are the same, this would at least allay any concerns that the experimental treatments were not the same. I understand that doing so precludes comparison between genotypes, but I do think it's important to show that at least the control data are comparable between experiments.

      All qPCR and CoP-qPCR experiments have been performed with three biological replicates, as described in the Materials and Methods section, and these are represented in the figures. Relative gene expression in the qPCR experiments was normalized to two housekeeping genes, YLS8 and UBC21, and afterwards to one biological replicate of the Col-0 control in HRFR. As indicated for the previous comment, information about replicates will be included in the updated figure legends.
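      This normalization is a standard 2^-ddCt calculation with the reference Ct taken as the mean of the two housekeeping genes. The sketch below is only illustrative of that arithmetic (it assumes 100% amplification efficiency and hypothetical Ct values; it is not the analysis script used in the study):

```python
def relative_expression(ct_target, ct_ref1, ct_ref2,
                        ct_target_cal, ct_ref1_cal, ct_ref2_cal):
    """2^-ddCt relative expression, normalized to the mean Ct of two
    reference genes (here YLS8 and UBC21) and then to a calibrator
    sample (one Col-0 replicate in HRFR)."""
    d_ct = ct_target - (ct_ref1 + ct_ref2) / 2
    d_ct_cal = ct_target_cal - (ct_ref1_cal + ct_ref2_cal) / 2
    return 2 ** -(d_ct - d_ct_cal)

# Hypothetical Ct values: a target amplifying 5 cycles earlier than in
# the calibrator sample corresponds to a 32-fold induction
print(relative_expression(20, 18, 19, 25, 18, 19))  # 32.0
```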

      • Similarly, for the CoP-qPCR experiments presented in Fig 5B and 5C, the col-0 values for region P2 between Fig 5B and 5C shows that while HRFR and LRFR 1h look comparable, the values presented for LRFR 25h are quite different.

      This comment of the reviewer prompted us to propose a different, clearer way of representing the data (new Figure 5B and 5C). We believe that this facilitates the comparison between the genotypes. Enrichment over the input was calculated for the chromatin accessibility of each region. Chromatin accessibility was further normalized against two open control regions in the promoters of ACT2 (AT3G18780, region chr3:6474579-6474676) and an RNA polymerase II transcription elongation factor gene (AT1G71080, region chr1:26811833-26811945). The difference from the previous representation is that the values are not additionally normalized to Col-0 in HRFR. We will update the Materials and Methods and figure legend sections with this information.
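      In this scheme, each region's enrichment over input is scaled by the mean of the two control regions. A hedged sketch of the arithmetic follows; the 10% input fraction and all Ct values are hypothetical, not values from the study:

```python
import math

def percent_input(ct_input, ct_ip, input_fraction=0.10):
    """Percent-of-input from qPCR Ct values; the input Ct is first
    adjusted to represent 100% of the chromatin."""
    ct_input_adj = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adj - ct_ip)

def normalized_accessibility(region, ctrl1, ctrl2):
    """Scale a region's percent-of-input by the mean of two open
    control regions (here the ACT2 and AT1G71080 promoter regions)."""
    return region / ((ctrl1 + ctrl2) / 2)

# With a 50% input and equal Ct values, half the material is recovered
print(percent_input(21, 21, input_fraction=0.5))  # 50.0
```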

      Minor comments: • Presentation of Supplemental Figure 7A/7B and Supplemental Figure 8A/8B could be changed to make the data more clear (i.e. side-by-side rather than superimposed).

      We propose changing the presentation of the hypocotyl length data to show the values for days side-by-side as the Reviewer suggests.

      • I think that the paragraph introducing auxin (lines 25-37) could be reduced to 1-2 sentences and merged into a separate introductory paragraph given that the SAV3 work makes up a relatively minor component of the manuscript.

      We agree with the reviewer and will reduce the paragraph about auxin and merge it with the previous paragraph about transcription.

        • For Figure 3A, I would strongly encourage the authors to show some of the raw western blot data for PIF4, PIF5 and PIF7 protein abundance (and loading control), not just the normalized values. This could be put into supplemental data, but I think it should accompany the manuscript.

      We agree that presenting the raw data that was used for quantification is important. We will include the western blots used for quantifying PIF4, PIF5 and PIF7 protein abundance (and the loading control DET3). This information will presumably be included in Supplementary Figure 3C (figure number to be confirmed once we decide on all new data to be presented).

      • Lines 145-147 "we observed a strong correlation between PIF4 protein levels (Figure 3A) and PIL1 promoter occupancy (Figure 3B), and a similar behavior was seen with PIF7 (Figure 3B)." According to Fig 3A, there is no statistically significant increase in PIF7 abundance after 1h shade. There is an apparent increase in PIF7 promoter occupancy, but the variation appears too large for it to be statistically significant. I agree that qualitatively there is a correlation for PIF4 but I think the description of the behavior of PIF7 should be rephrased.

      As suggested by the reviewer, we will rephrase this paragraph to more accurately account for our data and also for what was reported by others (e.g. Willige et al., 2021; Li et al., 2012) regarding the regulation of PIF7 levels and phosphorylation in response to a low R/FR treatment.

      • There appear to be issues in the coloring of the labels (light blue dots vs dark blue dots) for the PIF7 panels of Fig 3B and Supplemental Fig 3B.

      We thank the reviewer for pointing this out. This will be clarified by appropriate changes in the figure to avoid confusion in the revised version of Figure 3B.

      Reviewer #1 (Significance (Required)):

      This authors here have sought to examine the possibility that the transcriptional responses to shade mediated by the phy-PIF system might involve large-scale opening or closing of chromatin regions. This is an important and unanswered question in the field despite several studies that have looked at the role of histone variants (H2A.Z) and modifications (H3K4me3 and H3K9ac) in modulating PIF transcriptional activating activity. The authors have shown that, at least in the case of the transcriptional response to shade mediated by PIF7 (and to an extent PIF4), large-scale changes in chromatin accessibility are not occurring in response to shade treatment.

      The results presented in this study support the hypothesis that large-scale changes in chromatin accessibility may have already occurred before plants see shade. This opens the possibility that perhaps the initial perception of light by etiolated (dark-grown seedlings) might trigger changes in chromatin accessibility, opening up chromatin in regions encoding "shade-specific" genes and/or closing chromatin in regions encoding "dark-specific" genes.

      The audience for this particular manuscript encompasses a fairly broad group of biologists interested in understanding how environmental stimuli can trigger changes in chromatin reorganization and transcription. The results here are important in that they rule out chromatin accessibility changes as underlying the changes in transcription between the short-term and long-term shade responses. They also reveal that there are a few cases in which chromatin accessibility does change in a statistically-significant manner in response to shade. These regions, and the molecular players which regulate their accessibility, merit further exploration.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study by Paulisic et al. explores the variations in the chromatin accessibility landscape induced by plant exposure to light with low red/far-red ratios (LRFR), which mimics neighbor shade perception. The authors further compare these changes with the genome association of the PIF4 and PIF7 transcription factors, two major actors of gene expression regulation in response to LRFR. While this is not highlighted in the main text, the analyses of chromatin accessibility are performed on INTACT-mediated nucleus sorting, presumably to ensure proper and clean isolation of nuclei.

      Major comments

      • Why is the experimental setup exposing plants to darkness overnight? Does this affect the response to LRFR, by a kind of reset of phytochrome signaling? I guess this choice was made to maintain a strong circadian rhythm. Yet, given that PIF genes are clock-regulated, I am afraid that this choice complicates data interpretation concerning the specific effects of LRFR exposure.

      There appears to be some confusion which prompts us to better explain our protocol both by changing Figure 1A (that outlines the experimental conditions) and in the text.

      Seedlings are grown in long day conditions because this is more physiologically relevant than growing them in constant light, which is a rather unnatural condition.

      The reviewer is correct that PIF transcription is under circadian control and the shade avoidance response is gated by the circadian clock (e.g. Salter et al., 2003). To prevent conflating circadian and light quality effects, all samples that are compared are harvested at the same ZT (circadian time – hours after dawn). This allows us to focus our analysis on light quality effects specifically. We are therefore convinced that our protocol does not complicate the interpretation of the LRFR effects reported here.

      • As a result of this setup, the 1h exposure to LRFR immediately follows HRFR while the 3h final LRFR exposure of the « 25h LRFR » samples immediately follows a long period of darkness. Can this explain why in several instances (e.g., at the ATHB2 gene) 1h LRFR seems to have stronger effects than 25h LRFR on chromatin accessibility?

      Please check the explanation above. Both samples are harvested at the same ZT (ZT3, meaning 3 hours after dawn). The 1h LRFR seedlings went through the night, had 2 hours of HRFR and then 1h of LRFR. The 25h samples are harvested at the very same ZT, meaning 3h after dawn. Importantly, the HRFR control was also harvested at ZT3, meaning 3h after dawn. As indicated above, this protocol allows us to focus on the light quality effects by comparing samples that are all harvested at the same ZT.

      We expect that the changes in Fig. 1A and associated text changes will clarify this issue.

      • Lane 42 cites the work by Calderon et al 2022 as « Transcript levels of these genes increase before the H3K4me3 levels, implying that H3K4me3 increases as a consequence of active transcription ». Despite this previous study being reviewed and published, such a strong conclusion should be taken cautiously, and I disagree with it. The study by Calderon et al compares RNA-seq with ChIP-seq data, two methodologies with very different sensitivity, especially when employing bulk cells/whole seedlings as starting materials. For example, a gene strongly induced in a few cells will give a good Log2FC in RNA-seq data analysis (as new transcripts are produced after a low level of transcripts before shade) but, even though its chromatin variations would follow the same temporality or would even precede gene induction, this would be invisible in bulk ChIP-seq data analysis (which averages the signal of all cells together). I understand the rationale for relying on the conclusions made in an excellent lab with strong expertise in light signaling, but I recommend being cautious when relying on these conclusions to interpret new data.

      We agree with this comment, and we will change the text to reflect this.

      • The problem is that the same issue holds true when comparing ATAC-seq and RNA-seq data. ATAC signals reflect average levels over all cells while RNA-seq data can be influenced by a few cells highly expressing a given gene. Even though the authors carefully sorted nuclei using an INTACT approach, this should be discussed, in particular when gene clusters (such as cluster C-D) show no match between chromatin accessibility and transcript level variations. In this regard, is PIF7 expressed in many cells or a small niche of cells upon LRFR exposure? The conclusions on its role in chromatin accessibility, analyzed here as mean levels of many different seedling cells, could be affected by the PIF7 activity pattern (e.g., at lane 293).

      This is a good comment. PIF7 is expressed in the cotyledons and leaves in LD conditions (Kidokoro et al., 2009; Galvao et al., 2019), and the few available scRNA-seq datasets indicate an enrichment of PIF7 in the epidermis (Kim et al., 2021; Lopez-Anido et al., 2021). LRFR exposure only mildly represses PIF7 expression, as seen in Figure 3A and also in our bulk RNA-seq study (Table S4). We will discuss this potential limitation to our study in a revised version of the manuscript.

      • Lane 89, the conclusion linking DNA methylation and DNA accessibility is unclear to me, this may be rephrased. Also, it should be noted that in gene-rich regions, most DNA methylation is located along the body of moderately to highly transcribing genes (gene-body methylation) while promoters of active and inactive genes are most frequently un-methylated.

      We will rephrase to better reflect the presence or absence of DNA methylation on promoter regions of shade-regulated genes that contain accessible sites.

      • Figure 3B shows a few ChIP-qPCR results with important conclusions. Why not sequencing the ChIPped DNA to obtain a genome-wide view of the PIF4-PIF7 relationships at chromatin, and also consequently a more robust genome-wide normalization?

      Several studies have shown that in the conditions that we studied here (transfer of seedlings from high R/FR, simulated sun, to low R/FR, neighbor proximity), amongst all PIFs, PIF7 is the one that plays the most dominant function (e.g. Li et al., 2012; de Wit et al., 2016; Willige et al., 2021). PIF4 and PIF5 also contribute, but to a lesser extent. Given that Willige et al., 2021 did extensive ChIP-seq studies for PIF7 using conditions similar to the ones we used, we decided to rely on their data (which we re-analyzed), rather than performing our own PIF7 ChIP-seq analysis. While also performing a ChIP-seq analysis for PIF4 in similar conditions might be useful (this data is not available as far as we know), we are not convinced that doing that experiment would substantially modify the message. In the revised version we will also include an analysis of the data from Pfeiffer et al., 2014, which comprises a ChIP-seq dataset for PIF5 (the closest paralog of PIF4) initially performed by Hornitschek et al. in 2012 in low R/FR conditions (see comment to reviewer 1 above). For a new ChIP-seq, we would have to perform this experiment from scratch with substantially more material than what we used for the targeted ChIP-qPCR analyses. We thus do not feel that such an investment (time and money) is warranted.

      • Given the known functional interaction between PIF7 and INO80, it would be relevant to test whether changes in chromatin accessibility at ATHB2 and other genes are affected in ino80 mutant seedlings.

      We agree with the reviewer that this is potentially an interesting experiment. This will allow us to determine whether the nucleosome histone composition has an influence on nucleosome positioning at selected shade-regulated genes (e.g. ATHB2). We note that according to available data, the effect of INO80 would be expected once PIF7 has started transcribing shade-induced genes. We therefore propose comparing the WT with an ino80 mutant for their seedling growth phenotype, expression of selected shade marker genes (e.g. ATHB2) and chromatin accessibility before (high R/FR) and after a low R/FR treatment at selected shade marker genes. This will allow us to determine whether INO80 influences chromatin accessibility prior to a low R/FR treatment and/or once the treatment has started. Our plan is to include this data in a revised version of the manuscript.
      • On the same line, it would be interesting to test whether PIF7 target regions with pre-existing accessible chromatin would exist in ino80 mutant plants. In other words, testing a model in which chromatin remodeling by INO80 defines accessibility under HRFR to enable rapid PIF recruitment and DNA binding upon LRFR exposure.

      See our answer just above.

      Minor comments

      • In Figure 1C, it seems that PIF7 target genes do not match the set of LRFR-downregulated genes (even less than at random). Why not exclude these 4 genes from the analyses?

      This is correct. There are indeed only 4 downregulated PIF7 target genes as we define them. Removing these genes from the analyses does not change our interpretation of the data, and hence for completeness we propose keeping them in a revised version of the manuscript.

      • Figure 3A shows the quantification of protein blots, but I did not find the corresponding images. These should be shown in the figure or as a supplementary figure with proper controls.

      We will include the raw Western blots used for quantification of PIF4, PIF5 and PIF7 in the revised version of the manuscript.

      • Lane 102, it is unclear why PIF7 target genes were defined as the -3kb/TSS domains while Arabidopsis intergenic regions are on average much shorter. Gene regulatory regions, or promoters, are typically called within -1kb/TSS regions to avoid annotating a ChIP peak to the upstream gene or TE. A better proxy of PIF7 typical binding sites in gene regulatory regions could be determined by analysing the mean distance between PIF7 peak coordinates and the closest TSS. Typically, a gene meta-plot would give this information.

      We agree that the majority of PIF7 binding peaks are close to the 5' side of the TSS, based on the PIF7 binding distribution meta-plot. However, several known PIF binding sites are actually further upstream than 1 kb 5' of the TSS (e.g. ATHB2 and HFR1). We re-analyzed the data following the reviewer's suggestion with -2kb/TSS and -1kb/TSS windows, and while the number of target genes is reduced, it does not change our conclusions about PIF7 binding sites being located on accessible chromatin regions. Importantly, some well characterized LRFR induced genes such as HFR1 would not be annotated correctly if only the peaks closest to a gene TSS were taken into account, without considering flanking genes. In this case only the neighboring AT1G02350 would be annotated, hence missing some important PIF7 target genes. Taking this into consideration, we will not modify this part of the analysis in a revised manuscript.
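      The distinction between nearest-TSS annotation and window-based annotation can be made concrete with a small sketch; the gene names and coordinates below are hypothetical, and this is not the pipeline actually used in the study:

```python
def annotate_peak(peak_pos, tss_by_gene, window=3000):
    """Return genes whose TSS lies within `window` bp of the peak,
    alongside the single nearest gene. Positions are simplified to
    scalars on one strand for illustration."""
    in_window = [g for g, tss in tss_by_gene.items()
                 if abs(tss - peak_pos) <= window]
    nearest = min(tss_by_gene, key=lambda g: abs(tss_by_gene[g] - peak_pos))
    return in_window, nearest

# Hypothetical layout: a peak 2.5 kb from gene B's TSS but only 1 kb
# from gene A's -- nearest-only annotation would miss gene B.
tss = {"geneA": 1000, "geneB": 4500}
print(annotate_peak(2000, tss))  # (['geneA', 'geneB'], 'geneA')
```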
      • Figure 4B, what's represented in the ATAC-seq heatmap: does a positive z-score represent high accessibility?

      On the ATAC-seq heatmap we have represented z-scores of the average CPM (counts per million) for accessible chromatin regions. Z-scores are calculated by subtracting the median of the averaged CPMs from the average CPM for each accessible chromatin region, and then dividing by the standard deviation (SD) of those averaged CPMs across all groups per accessible region (in our case a group is an average of three biological replicates for either HRFR, 1h or 25h of LRFR). In that sense, the z-score indicates a change in accessibility: a higher z-score indicates opening of the region and a lower z-score indicates a region becoming more closed when compared among the three light treatments (HRFR, 1h or 25h of LRFR). We will make sure that this is clear in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      Contradicting the naive hypothesis that PIFs may target shade-inducible genes to "open" their chromatin with the help of chromatin remodelers, such as INO80, the study highlights that PIF7 typically associates with pre-existing accessible chromatin states. Thus, even though this is not stated, results from this study indicate that PIF7 is not a pioneer transcription factor. The data seem very robust, and while some conclusions need clarification, the study should be of great interest to the community of scientists studying plant light signaling and shade responses.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Paulisic et al. investigate whether the transcriptional response of Arabidopsis seedlings to shade depends on chromatin accessibility, with a specific focus on PIF7-regulated genes. To this end, they perform ATAC-seq and RNA-seq, along with other experiments, on seedlings exposed to short and long shade and correlate the results with previously reported PIF7 and PIF4 ChIP-seq data. Based on their findings, they propose that shade-mediated transcriptional regulation may not require extensive remodeling of DNA accessibility. Specifically, they suggest that the open chromatin conformation allows PIFs to easily access and recognize their binding motifs, rapidly initiating gene expression in response to shade. This transcriptional response primarily depends on a transient increase in PIF stability and gene occupancy, with changes in chromatin accessibility occurring in only a small number of genes.

      Major comments:

      • I have one issue that, in my opinion, requires more attention. To define the PIF7 target genes, which were later used to estimate whether PIF7 binds to open or closed chromatin and affects DNA accessibility after its binding, the authors compared the 4h LRFR data point from Willige et al. (2021) ChIP-seq with their 1h RNA-seq data point. This comparison might have missed early genes where PIF7 binds before the 1h time point but is no longer present on DNA at 4h. I understand the decision to choose the 4h Willige et al. ChIP-seq data point, performed under LD conditions, as it matches the data in this study, rather than the 5min-30min data points, which were conducted in constant light. However, if possible, it would be interesting to also compare the RNA-seq data with the early PIF7 binding genes to assess how many additional PIF7 target genes could be identified based on that comparison and whether this might alter the conclusions. If the authors do not agree with this point, it should at least be emphasized that the ChIP-seq data and the RNA-seq/ATAC-seq data were performed under different LRFR conditions (R/FR 0.6 vs. 0.1), which may lead to the misidentification of PIF7 target genes in the manuscript.

      1) This is an interesting suggestion; we therefore reanalyzed the 5, 10 and 30 min ChIP-seq timepoints from Willige et al., 2021 and compared them to 4h of LRFR (ZT4). We crossed these lists of potential PIF7 targets with our 1h LRFR PIF457-dependent genes based on our RNA-seq. While some PIF7 targets appear only in the early 5-10 min timepoints of LRFR exposure, overall the number and composition of PIF7 target genes is rather constant across these timepoints. We propose to include these additional analyses in a revised version of the manuscript as a supplemental figure. However, these additional analyses do not influence our general conclusions.

      2) The comment regarding the R/FR ratio is important. We will point out that although the conditions used by Willige et al., 2021 and the ones we used are similar, they are not exactly the same in terms of R/FR ratio. Importantly, in both studies the early transcriptional response largely depends on the same PIFs, many of the same response genes are induced (e.g. PIL1, AtHB2, HFR1, YUC8, YUC9 and many others) and the physiological response (hypocotyl elongation) is similar. This shows that this low R/FR response yields robust responses.

      Minor comments:

      • In Fig. 1D, please describe the meaning of the blue shaded areas and the blue lines under the ATAC-seq peaks, as they do not always correlate.

      The shaded areas and the bars define the extension of the ATAC-seq accessible chromatin peaks. We will add the meaning of the shaded areas and the blue bars in the Figure legend and correct the colors in a revised manuscript.

      • In Fig. 1E, it could be helpful to note that the 257 peaks in the right bar correspond to the peaks associated with the 177 genes in the left bar.

      We will update Figure 1E and the figure legends for better understanding, as the Reviewer suggested.

      • In lines 116, 119, and 122, I believe it should read "Fig. 2" instead of "Fig. 2A."

      We thank the Reviewer for noticing the error, which we will correct.

      • Lines 138-139: "PIF7 total protein levels were overall more stable, and only a mild and non-significant increase of PIF7 levels was seen at 1 h of LRFR." Since PIF7 usually appears as two bands in HRFR and only one band in LRFR, how was the protein level of PIF7 quantified in Fig. 3A? Additionally, I was wondering about the authors' thoughts on the discrepancy with Willige et al. (2021, Extended Data Fig. 1d), where PIF7 abundance seems to increase after 30 min and 2 h of LRFR.

      PIF7 protein levels were quantified by considering both the upper and the lower band in HRFR (total PIF7) and normalizing its levels to the DET3 loading control. We still observe an increase in total PIF7 protein levels at 1h of LRFR; however, this change was not statistically significant in these experiments. In our conditions, as in Willige et al., 2021, the increase in PIF7 protein levels in response to short-term shade seems consistent, as is the pronounced shift or disappearance of the upper band (phosphorylated form) on the Western blots (raw data will be available in the revised manuscript). We will introduce text changes referring to the phosphorylation status of PIF7 in our conditions.

      • Line 150: "... many early PIF target genes (Figure 3C)." Since only PIL1 is shown in Fig. 3C, I would recommend revising this sentence. Alternatively, the data could be presented, as in Fig. 2, for all the PIF7 target genes with transient expression patterns.

      We will introduce changes in the text to reflect that we only show PIL1 in the main Figure 3C.

      • Line 204: I'm not sure if Supplementary Fig. 7C-D is correct here. If it is, could the order of the figures be changed so that Supplementary Fig. 7C-D becomes Supplementary Fig. 7A-B?

      The order of the panels A-B in the Supplementary Figure 7 follows the order of the text in the manuscript and is mentioned before panels C-D. It refers to the sentence “Overexpression of phyB resulted in a strong repression of hypocotyl elongation in both HRFR and LRFR, while the absence of phyB promoted hypocotyl elongation (Supplementary Figure 7A-B).”

        • Line 208: "In all three cases...". Please clarify what the three cases refer to.

      We will change the text to refer more explicitly to the differentially accessible regions (DARs) of the genes ATHB2 and HFR1 shown in Figure 5A.
      • Line 231: Should Fig. 5C also be cited here in addition to Supplementary Fig. 7?

      We will add the reference to Figure 5C, which was missing.

      • In Supplementary Table 3, more information is needed. For example, it could mention: "This data is presented in Fig. 3 and is based on datasets from ChIP-seq, RNA-seq, etc."


      The table will be updated with more information as suggested by the Reviewer.

      • In the figure legend of Fig. 4B, please check the use of "( )".

      We will correct the error and include the references inside the parenthesis.

      Reviewer #3 (Significance (Required)):

      Paulisic et al. present novel discoveries in the field of light signaling and shade avoidance. Their findings extend our understanding of how DNA organization, prior to shade, affects PIF binding and how PIF binding remodels DNA accessibility. The data presented support the conclusions well and are backed by sufficient experimental evidence.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      The manuscript has not been modified yet.

      3. Description of analyses that authors prefer not to carry out


      Reviewer 2 asked for new ChIP-seq analyses for PIF7 and PIF4. For reasons that we outlined above, we believe that such analyses are not required, and we currently do not intend performing these experiments.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Plant systems sense shading by neighbors via the phytochrome signaling system. In the shade, PHYTOCHROME-INTERACTING FACTORS (PIFs) accumulate and are responsible for transcriptional reprogramming that enables plants to mobilize the "shade-avoidance response". Here, the authors have sought to examine the role of chromatin in modulating this response, specifically by examining whether "open" or "closed" chromatin regions spanning PIF target genes might explain the transcriptional output of these genes. They used a combination of ATAC-seq/CoP-qPCR (to detect open regions of chromatin), ChIP (to assay PIF binding) and RNA-seq (to measure transcript abundance) to understand how these processes may be mechanistically linked in Arabidopsis wild-type and pif mutant lines. They found that some chromatin accessibility changes do occur after LRFR (shade) treatment (32 regions after 1h and 61 after 25 h). While some of these overlap with PIF-binding sites, the authors found no correlation between open chromatin states and high levels of transcription. Because auxin is an important component of the shade-avoidance response and has been shown to control chromatin accessibility in other contexts, they examined whether auxin might be required for opening these regions of chromatin. They find that in an auxin biosynthesis mutant, there is a small subset of PIF target genes whose chromatin accessibility seems altered relative to the wild-type. Likewise, they found that chromatin accessibility for certain PIF targets is altered in phyB and pif mutants and propose that PIFs are necessary for changing the accessibility of chromatin in these genes. The authors thus propose that PIF occupancy of already open regions, rather than increased accessibility, underlies the increase in transcript abundance of these target genes in response to shade.

      Major comments:

      I find that the data generally support the hypothesis presented in the manuscript that chromatin accessibility alone does not predict transcription of PIF target genes in the shade. That said, I think that a paragraph from the discussion (lines 321-332) would benefit from some careful rephrasing. I think it is perfectly reasonable to propose that PIF occupancy is more predictive of shade-induced transcriptional output than chromatin accessibility, but I think that calling PIF occupancy "the key drivers" (line 323) or "the main driving force" (line 76) risks ignoring the observation that levels of PIF occupancy specifically do not predict expression levels of PIF target genes (Pfeiffer et al., 2014, Mol Plant). For PIL1 and HFR1, the authors have shown that PIF promoter occupancy and transcript levels are correlated, but the central finding of Pfeiffer et al. was that this pattern does not apply to the majority of PIF direct target genes. Finding factors (i.e. histone marks) that convert PIF-binding information into transcriptional output appears to have been the impetus for the experiments devised in Willige et al., 2021 and Calderon et al., 2022. It is great that the authors have outlined in the discussion that there are a number of factors that modulate PIF transcriptional activating activity but I think that the emphasis on PIF-binding explaining transcript abundance should be moderated in the text.

      I think that the hypothesis could be further supported by incorporating the previously published ChIP-seq data on PIF1, PIF3 and PIF5 binding. Given these data are published/publicly available, I think it would be helpful to note which of the 72 DARs are bound by PIF1, PIF3 and/or PIF5. Especially so given that PIF5 (Lorrain et al., 2008, Plant J) and PIF1/PIF3 (Leivar et al., 2012, Plant Cell) contribute at least in some capacity to transcriptional regulation in response to shade. At the very least, it might help explain some of the observed increases in nucleosome accessibility observed for genes that don't have PIF4 or PIF7-binding.

      In the manuscript, there are several instances where separate col-0 (wild type) controls have been used for identical experiments. Specifically, qPCR (Fig 3C, Fig S7C/D and Fig S8C/D), CoP-qPCR (Fig 5B/5C and Fig S8E/F) and hypocotyl measurements (Fig S7A/B and Fig S8A/B). In the cases of the hypocotyl measurements, there appear to be hardly any differences between col-0 controls indicating the measurements can be confidently compared between genotypes.

      In some cases of qPCR and CoP-qPCR experiments, however, the differences in values obtained from col-0 samples that underwent identical experimental treatments appear to vary significantly. In Figure 3C for example, the overall trend for PIL1 expression in col-0 is the same (e.g. HRFR levels are low, LRFR 1h levels are much higher and LRFR 25h levels drop down to some intermediate level) but the expression levels themselves appear to differ almost two-fold for the LRFR 1h timepoint (~110 on the left panel vs ~60 for the right panel). Given the size of the error bars, it appears that these data represent the mean from only one biological replicate. PIL1 expression levels at LRFR 1h as reported in Fig S7C and D also show similar ~2-fold differences.

      I would recommend that the authors explicitly describe the number of biological replicates used for each experiment in the methods section. If indeed these experiments were only performed once, I think the authors should be very careful in the language used in describing their conclusions and in assigning statistical significance. One possibility that could also be helpful would be normalizing LRFR 1h and LRFR 25h values to HRFR values and plotting these data somewhere in the supplemental data. If, for example, the relative levels of PIL1 are different between replicates but the fold-induction between HRFR and LRFR 1h are the same, this would at least allay any concerns that the experimental treatments were not the same. I understand that doing so precludes comparison between genotypes, but I do think it's important to show that at least the control data are comparable between experiments.

      Similarly, for the CoP-qPCR experiments presented in Fig 5B and 5C, the col-0 values for region P2 between Fig 5B and 5C shows that while HRFR and LRFR 1h look comparable, the values presented for LRFR 25h are quite different.

      Minor comments:

      Presentation of Supplemental Figure 7A/7B and Supplemental Figure 8A/8B could be changed to make the data more clear (i.e. side-by-side rather than superimposed).

      I think that the paragraph introducing auxin (lines 25-37) could be reduced to 1-2 sentences and merged into a separate introductory paragraph given that the SAV3 work makes up a relatively minor component of the manuscript.

      For Figure 3A, I would strongly encourage the authors to show some of the raw western blot data for PIF4, PIF5 and PIF7 protein abundance (and loading control), not just the normalized values. This could be put into supplemental data, but I think it should accompany the manuscript.

      Lines 145-147 "we observed a strong correlation between PIF4 protein levels (Figure 3A) and PIL1 promoter occupancy (Figure 3B), and a similar behavior was seen with PIF7 (Figure 3B)." According to Fig 3A, there is no statistically significant increase in PIF7 abundance after 1h shade. There is an apparent increase in PIF7 promoter occupancy, but the variation appears too large for it to be statistically significant. I agree that qualitatively there is a correlation for PIF4 but I think the description of the behavior of PIF7 should be rephrased.

      There appear to be issues in the coloring of the labels (light blue dots vs dark blue dots) for the PIF7 panels of Fig 3B and Supplemental Fig 3B.

      Significance

      The authors here have sought to examine the possibility that the transcriptional responses to shade mediated by the phy-PIF system might involve large-scale opening or closing of chromatin regions. This is an important and unanswered question in the field, despite several studies that have looked at the role of histone variants (H2A.Z) and modifications (H3K4me3 and H3K9ac) in modulating PIF transcriptional activating activity. The authors have shown that, at least in the case of the transcriptional response to shade mediated by PIF7 (and to an extent PIF4), large-scale changes in chromatin accessibility are not occurring in response to shade treatment.

      The results presented in this study support the hypothesis that large-scale changes in chromatin accessibility may have already occurred before plants see shade. This opens the possibility that perhaps the initial perception of light by etiolated (dark-grown seedlings) might trigger changes in chromatin accessibility, opening up chromatin in regions encoding "shade-specific" genes and/or closing chromatin in regions encoding "dark-specific" genes.

      The audience for this particular manuscript encompasses a fairly broad group of biologists interested in understanding how environmental stimuli can trigger changes in chromatin reorganization and transcription. The results here are important in that they rule out chromatin accessibility changes as underlying the changes in transcription between the short-term and long-term shade responses. They also reveal that there are a few cases in which chromatin accessibility does change in a statistically-significant manner in response to shade. These regions, and the molecular players which regulate their accessibility, merit further exploration.

      My fields of expertise are photobiology, photosynthesis and early seedling development.

    1. Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion)

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for less credible feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the *right* way to define asymmetry.
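
      To make this concrete (with made-up CA values, not parameters fit in the paper): two conditions can share the same absolute asymmetry while the normalized measure differs, because the normalization divides by overall sensitivity.

```python
def absolute_bias(ca_pos, ca_neg):
    # non-normalized asymmetry: CA+ minus CA-
    return ca_pos - ca_neg

def relative_bias(ca_pos, ca_neg):
    # asymmetry normalized by total sensitivity, as is common in the RL literature
    return (ca_pos - ca_neg) / (ca_pos + ca_neg)

# Same absolute difference (0.2), but the relative bias shrinks as
# overall sensitivity grows:
print(absolute_bias(0.6, 0.4), relative_bias(0.6, 0.4))  # ratio ≈ 0.2
print(absolute_bias(1.2, 1.0), relative_bias(1.2, 1.0))  # ratio ≈ 0.09
```
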

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but *not* the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the non-normalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (c.f. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) is suggestive that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance besides that it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But, assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al, 2024 https://doi.org/10.1016/j.cognition.2023.105693 and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature....

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to *disinformation*, rather than simply _less information._ I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore *not* Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.
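
      A minimal sketch of this regression reading (the decay value, the ±1 feedback coding, and the omission of the perseveration term are all simplifying assumptions for illustration):

```python
from collections import defaultdict

def decayed_feedback_sums(history, decay):
    """Exponentially decaying sums of past feedback, keyed by
    (credibility, sign) -- the regressors in the logistic-regression
    reading of the model. History is ordered oldest to most recent."""
    sums = defaultdict(float)
    n = len(history)
    for t, (credibility, sign) in enumerate(history):
        sums[(credibility, sign)] += sign * decay ** (n - 1 - t)
    return sums

def repeat_logit(sums, ca):
    """Log-odds of repeating the action: each CA parameter acts as an
    ordinary regression coefficient on its decayed feedback sum."""
    return sum(ca[key] * value for key, value in sums.items())

# Two positive feedbacks from a 75%-credible agent, most recent last
sums = decayed_feedback_sums([(0.75, +1), (0.75, +1)], decay=0.5)
logit = repeat_logit(sums, ca={(0.75, +1): 2.0})  # 2.0 * (0.5 + 1.0) = 3.0
```
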

    1. “Tendency to continue to surf or scroll through bad news, even though that news is saddening, disheartening, or depressing. Many people are finding themselves reading continuously bad news about COVID-19 without the ability to stop or step back.”

      It's wild seeing this with a dictionary definition, I thought it was just a slang term. I have absolutely nothing to back this up, but I do wonder if this was something that became a little bit more normalized during the height of the COVID-19 pandemic. I remember scouring articles every day to see if anything had changed, and I'm sure that was something that a large portion of the population was doing.

    1. One of the ways social media can be beneficial to mental health is in finding community (at least if it is a healthy one, and not toxic like in the last section). For example, if you are bullied at school (and by classmates on some social media platform), you might find a different online community that supports you. Or take the example of Professor Casey Fiesler finding a community that shared her interests (see also her article [m26]):

      I think it's really important that this chapter mentions that social media can help people find a community that supports them. For instance, some people may be bullied by their classmates in real life and even attacked on social media. At such times, finding a warm group online can make them feel that they are not alone and give them strength. I myself have found friends online who share the same interests as me, and that feeling made me extremely happy, as if someone suddenly understood me. Although social media can sometimes have negative effects, if used properly it can really help our mental health.

    1. Education changes lives in ways that go far beyond economic gains. The data show clearly that children who get better schooling are healthier and happier adults, more civically engaged and less likely to commit crimes. Schools not only teach students academic skills but also noncognitive skills, like grit and teamwork, which are increasingly important for generating social mobility. Even the friendships that students form at school can be life-altering forces for social mobility, because children who grow up in more socially connected communities are much more likely to rise up out of poverty.

      Conversely, limited social mobility hurts not just these children but all of society. We are leaving a vast amount of untapped talent on the table by investing unequally in our children, and it’s at all of our expense.

      Researchers have also used big data to uncover many specific education reforms that could lead to huge improvements. For instance, the evidence is clear that teachers are critical; my co-authors and I found that, when better teachers arrive at a school, the students in their classrooms earn around $50,000 more over each of their lifetimes. This adds up to $1.25 million for a class of 25 in just a single year of teaching.

      Smaller classes and increased tutoring also lead to long-term gains for students. Charter schools have revealed a range of effective approaches as well, often to the benefit of some of society’s most disadvantaged children. Children also benefit from longer school days, greater access to special education and less aggressive cutoffs for holding students back a grade.

      This sentence expands the value of education beyond academics, emphasizing that schools shape character and social behavior. It supports the idea that education has broad, lasting impacts on a child’s future, especially for those trying to overcome poverty.

    1. And yet suppression is never more dangerous than in such a time of social tension.

      It's like looking into a mirror of today's events. It's crazy that this statement is just as relevant today as it was the day it was written (and rewritten). It's so crazy to me that, as a collective, we are in an ongoing cycle of freedom and suppression, conformity and nonconformity. It's a never-ending fight.

  10. doolsetbangtan.wordpress.com doolsetbangtan.wordpress.com
    1. It’s not that you’re being punished

      Cluster criticism, developed by Kenneth Burke, helps identify themes of an artifact based on recurring words that point to the artifact's main theme. When picking an artifact for this criticism, both discursive and nondiscursive artifacts are allowed. However, if the artifact is incomplete, then, as Sonja K. Foss's book notes, "a section of text or looking at a small poem might not work or be enough," because there will be too little data to actually do a cluster analysis. Once an artifact is chosen, it is analyzed in three steps. The first is to find the key terms of the artifact: the recurring terms that are significant. It is recommended not to pick more than five or six terms. The second step is to chart the terms that cluster around the key terms: other related words that can be grouped under a specific key term and strengthen its significance. The third step is to discover an explanation for the artifact: what are the rhetor's thoughts or beliefs regarding the key terms of the artifact, and what could that tell us about the rhetor?

      In this specific artifact, "Pied Piper" by BTS, the title connects us to the fairy tale of the Pied Piper of Hamelin. In the tale, the Pied Piper is hired by the people of Hamelin to deal with an infestation of mice; he easily leads the mice out by playing a tune on his flute that puts them into a trance. Once every single mouse is removed, the town closes its gates, and when the Pied Piper returns to be paid for his services, the town refuses to pay him and casts him away. As retribution for this shameless, entitled act, the Pied Piper plays his flute specifically targeting the children of the town; his tune hypnotizes all the children and leads them out of town, which is where the tale ends. Throughout the song, BTS takes on the role of the Pied Piper, the song being the tune the piper plays to attract their audience (ARMY). We arrive at this understanding from the terms used constantly throughout the song. Key terms and their clusters: - Paradise: I'm, your, prize, sweet, rescue, warm hand - Song: listen, pied piper, leads you, follow, sound, blow, pipes, pulled<br /> - Dangerous: guilty pleasure, takin' over, ruin, forbidden fruit, can't live without, no escape

      Significance: There's a lot of symbolism tied to the creation of Adam and Eve and the temptation of the forbidden fruit, with the Pied Piper as the tempter who leads his target to their doom, or in this instance to paradise. The bits of danger that are emphasized are those of listeners being at the mercy of the musician: unable to refuse, they unconsciously follow and are led away toward this new realm known as paradise. The song, in this instance, is the tool that causes the audience to follow, listen attentively, and get swept away to this better universe. Paradise, or this "better universe," is the bigger gift listeners receive for being a part of this community. Listeners get prizes such as songs and content (lives, interactions, photos, vlogs). They'll be the receivers of sweetness and will feel safe and welcomed. It feels like the advertisement of a vacation package that is too good and too difficult to refuse. There is an understanding of this effortless hook/enchantment that listeners are being warned about, but because they are already listening to the warning, there is no turning back; the benefits are greater than any potential consequence they could imagine. <br /> Although we get one rap verse addressing common fan behaviors that may be seen as negative by those outside the community, most of the song is about acceptance and telling listeners to keep doing what they've been doing so far (having fun, supporting their favorite artists, buying merchandise, etc.). <br /> [1](https://www.youtube.com/watch?v=zsAiX0M7H6U) As can be seen in the video and in the meaning of the song, ARMY is in complete synchronization with BTS and the rhythm of the song, both in movement and in singing. Not only are they synchronized in this song, but in every song ever made.

    1. But ‘jazz’ was not merely in the name of a catchy tune. As the word ‘jazz’ began to appear with increasing frequency across the city, it came to signify a whole range of meanings – as would also be the case in many other locales around the planet. Indeed, between 1917 and 1921, the word ‘jazz’ disseminated rapidly throughout the world attaining, along the way, a multiplicity of meanings, sometimes related to musical practices from New Orleans, Chicago, New York, and elsewhere in the United States, but quite often also associated with a diverse array of things, objects, ideas, and situations in the worlds of music entertainment, dance, leisure, and fashion.

      I am drawn to the idea that jazz music has been more than just a cut-and-dry label for a genre of music since the beginning. 1917 was the first year of jazz music being released in a traditional sense (sales of records/albums), and I think it’s cool that even that early on, people were already recognizing jazz as more than just an objective style of music, but also as an idea, attitude, and culture as well. In some of the research that I’ve been doing for my paper, I’ve been looking at early newspaper articles about jazz music, and it’s interesting how much jazz is perceived as much as a rebellious act/cultural phenomenon as it is a genre of music. I also find it interesting how jazz music spread and was influenced over time across multiple different cultures, and that its history was not just limited to its development domestically.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Compelling and clearly described work that combines two elegant cell fate reporter strains with mathematical modelling to describe the kinetics of CD4+ TRM in mice. The aim is to investigate the cell dynamics underlying the maintenance of CD4+ TRM.

      The main conclusions are that:

      (1) CD4+ TRM are not intrinsically long-lived.

      (2) Even clonal half-lives are short: 1 month for TRM in skin, and even shorter (12 days) for TRM in lamina propria.

      (3) TRM are maintained by self-renewal and circulating precursors.

      Strengths:

      (1) Very clearly and succinctly written. Though in some places too succinctly! See suggestions below for areas I think could benefit from more detail.

      (2) Powerful combination of mouse strains and modelling to address questions that are hard to answer with other approaches.

      (3) The modelling of different modes of recruitment (quiescent, neutral, division linked) is extremely interesting and often neglected (for simpler neutral recruitment).

      Weaknesses/scope for improvement:

      (1) The authors use the same data set that they later fit for generating their priors. This double use of the same dataset always makes me a bit squeamish as I worry it could lead to an underestimate of errors on the parameters. Could the authors show plots of their priors and posteriors to check that the priors are not overly-influential? Also, how do differences in priors ultimately influence the degree of support a model gets (if at all)? Could differences in priors lead to one model gaining more support than another?

      We now show the priors and posteriors overlaid in Figure S2. The posteriors lie well within the priors, giving us confidence that the priors are not overly influential.

      (2) The authors state (line 81) that cells were "identified as tissue-localised by virtue of their protection from short-term in vivo labelling (Methods; Fig. S1B)". I would like to see more information on this. How short is short term? How long after labelling do cells need to remain unlabelled in order to be designated tissue-localised (presumably label will get to tissue pretty quickly -within hours?). Can the authors provide citations to defend the assumption that all label-negative cells are tissue-localised (no false negatives)?

      And conversely that no label-positive cells can be found in the tissue (no false positives)? I couldn't actually find the relevant section in the methods and Figure S1B didn't contain this information.

      We did describe the in vivo labeling in the first section of Methods (it was for 3 mins before sacrifice). The two aims of Fig S1B were to show the gating strategy (label-positive and negatives from tissue samples were clearly separated) and to address the false-positive issue. Less than 3% of cells in our tissue samples were positive; therefore, at most 3% of truly tissue-resident cells acquired the i.v. label, and likely less. Excluding those (as we did) therefore makes little difference to our analyses in terms of cell numbers. False negative rates are expected to be extremely low; labeling within circulating cells is typically >99% (see refs in Methods).

      (3) Are the target and precursor populations from the same mice? If so is there any way to reflect the between-individual variation in the precursor population (not captured by the simple empirical fit)? I am thinking particularly of the skin and LP CD4+CD69- populations where the fraction of cells that are mTOM+ (and to a lesser extent YFP+) spans virtually the whole range. Would it be nice to capture this information in downstream predictions if possible?

      This is a great point. We do indeed isolate all populations from each mouse. We are very aware of the advantages of using this grouping of information to reduce within-mouse uncertainty – we employ this as often as we can. The issue here was that the label content within the tissue (target) at any time depends on the entire trajectory of the label frequency in the precursor, in that mouse, up to that point. We can’t identify this curve for each animal individually – so we are obliged to use a population average.

      To mitigate this lack of pairing we do take a very conservative approach and fit this empirical function describing the trajectories of YFP and mTom in precursors at the same time as the label kinetics in the target; that is, we account for uncertainty in label influx in our fits and parameter estimates.

      Another issue is that to be sure that we are performing model selection appropriately, we only use the distribution of the likelihood on the target observations when comparing support for different precursors with LOO-IC. If we had been able to pair the precursor and target data in some way, the two would then be entangled and model comparison across precursors would not be possible.

      We’ve added some of this to the discussion.

      (4) In Figure 3, estimates of kinetics for cells in LP appear to be more dependent on the input model (quiescent/neutral/division-linked) than the same parameters in the skin. Can the authors explain intuitively why this is the case?

      This is a nice observation and it has a fairly straightforward explanation. As we pointed out in the paper, estimated rates of self renewal become more sensitive to the mode of recruitment the greater the rate of influx. If immigrants are quiescent, all Ki67 in the tissue has to be explained by self renewal. If all new immigrants are Ki67 high, the estimate of the rate of self renewal within the tissue will be lower. Across the board, the estimated rates of influx into gut were consistently higher than those in skin, and so the sensitivity of parameters to the mode of recruitment was much more obvious at that site.

      The importance of this trade-off for the division-linked model can also be seen when you look at the neutral and quiescent models; they give similar parameter estimates because Ki67 levels within all precursor populations were less than 25%, and so those two modes of recruitment are difficult to distinguish.
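To make this trade-off concrete, here is a toy Ki67-structured model (an illustrative sketch only, not the model fitted in the paper): cells enter a tissue with Ki67-high fraction ε (ε = 0 for quiescent recruitment, ε = 1 for division-linked), divide at rate p with both daughters Ki67-high, lose Ki67 at rate β, and die at rate δ. At steady state the Ki67-high fraction is f = (ε(δ − p) + 2p)/(p + β + δ), so the self-renewal rate inferred from an observed f is p = (f(β + δ) − εδ)/(2 − f − ε); the gap between the quiescent and division-linked estimates of p widens as turnover δ (and hence the required influx) increases.

```python
def inferred_division_rate(f, beta, delta, eps):
    """Self-renewal rate p consistent with an observed Ki67-high fraction f
    in a toy model: influx with Ki67-high fraction eps, division at rate p
    (both daughters Ki67-high), Ki67 loss at rate beta, death at rate delta.
    Steady state: f = (eps*(delta - p) + 2*p) / (p + beta + delta)."""
    return (f * (beta + delta) - eps * delta) / (2 - f - eps)

f, beta = 0.20, 0.30           # observed Ki67-high fraction; Ki67 decay rate (/day)
for delta in (0.05, 0.07):     # higher turnover implies proportionally more influx
    p_quiescent = inferred_division_rate(f, beta, delta, eps=0.0)
    p_divlinked = inferred_division_rate(f, beta, delta, eps=1.0)
    # quiescent-influx estimate exceeds the division-linked one,
    # and the gap grows with delta
    print(delta, round(p_quiescent, 4), round(p_divlinked, 4))
```

All parameter values here are invented for illustration; the qualitative point is that the sensitivity of p to the assumed recruitment mode scales with the rate of influx, mirroring the gut-versus-skin contrast described above.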

      (5) Can the authors include plots of the model fits to data associated with the different strengths of support shown in Figure 4? That is, I would like to know what a difference in the strength of say 0.43 compared with 0.3 looks like in "real terms". I feel strongly that this is important. Are all the fits fantastic, and some marginally better than others? Are they all dreadful and some are just less dreadful? Or are there meaningful differences?

      This is another good point (and from the author recommendations list, is your most important concern).

      We find that a fairly common issue is that models that are clearly distinguished by information criteria or LRTs can often give visually quite similar fits. Our experience is that this is partly due to the fact that models are usually fit on transformed scales (e.g. log for cell counts, logit for fractions) to normalise residuals, and this uncertainty is compressed when one looks at fits on the observed scale (e.g. linear). Another issue in our case is that for each model (precursor, target, and mode of recruitment) we fit 6 time courses simultaneously. Visual comparisons of fits of different models can then be a little difficult or misleading; apparently small differences in each fitted timecourse can add up to quite significant changes in the combined likelihood. We added this to the Discussion.

      The number of models is combinatorial (Fig. 4) so showing them all seems a bit cumbersome. But now in the supporting information (Fig. S3), for each target we show the best, second best, and the worst model fits overlaid, to give a sense of the dynamic range of the models we considered. As you will now see, visual differences among the most strongly supported models were not huge (but refer to our point just above). Measures of out-of-sample prediction error (LOO-IC) discriminated between these models reasonably well, though (weights shown in Fig. 4).

      It’s also worth mentioning here that we have substantially greater confidence in the identity of the precursors than in the precise modes of recruitment - you can see this clearly in the groupings of weights in Figure 4A. We did comment on this in the text but now emphasise it more.

      (6) Figure 4 left me unclear about exactly which combinations of precursors and targets were considered. Figure 3 implies there are 5 precursors but in Figure 4A at most 4 are considered. Also, Figure 4B suggests skin CD69- were considered a target. This doesn't seem to be specified anywhere.

      Thanks for pointing this out. When we were considering CD4+ EM in bulk as target, this population includes CD69- cells; in those fits, therefore, we couldn't use CD69- as a precursor. We now clarify this in the caption. Thanks also for the observation about Figure 4B; we didn’t consider CD69- cells as a target, so we’ve also made that clearer.

      Reviewer #2 (Public review):

      This manuscript addresses a fundamental problem of immunology - the persistence mechanisms of tissue-resident memory T cells (TRMs). It introduces a novel quantitative methodology, combining the in vivo tracing of T-cell cohorts with rigorous mathematical modeling and inference. Interestingly, the authors show that immigration plays a key role in maintaining CD4+ TRM populations in both skin and lamina propria (LP), with LP TRMs being more dependent on immigration than skin TRMs. This is an original and potentially impactful manuscript. However, several aspects were not clear and would benefit from being explained better or worked out in more detail.

      (1) The key observations are as follows:

      a) When heritably labeling cells due to CD4 expression, CD4+ TRM labeling frequency declines with time. This implies that CD4+ TRMs are ultimately replenished from a source not labeled, hence not expressing CD4. Most likely, this would be DN thymocytes.

      That’s correct.

      b) After labeling by Ki67 expression, labeled CD4+ TRMs also decline - This is what Figure 1B suggests. Hence they would be replaced by a source that was not in the cell cycle at the time of labeling. However, is this really borne out by the experimental data (Figure 2C, middle row)? Please clarify.

      (2) For potential source populations (Figure 2D): Please discuss these data critically. For example, CD4+ CD69- cells in skin and LP start with a much lower initial labeling frequency than the respective TRM populations. Could the former then be precursors of the latter?

      A similar question applies to LN YFP+ cells. Moreover, is the increase in YFP labeling in naïve T cells a result of their production from proliferative thymocytes? How well does the quantitative interpretation of YFP labeling kinetics in a target population work when populations upstream show opposite trends (e.g., naïve T cells increasing in YFP+ frequency but memory cells in effect decreasing, as, at the time of labeling, non-activated = non-proliferative T cells (and hence YFP-) might later become activated and contribute to memory)?

      These are good (and related) points. We've added some text to the discussion, paragraphs 2 and 3; we reproduce it here, slightly expanded.

      Fig 1B was a schematic but did faithfully reflect the impact of any waning of YFP in the precursor on its kinetics in the targets. However, in our experiments, as you noted, the kinetics of YFP in most of the precursor populations were quite flat. This was due in part to memory subsets being sustained by the increasing levels of YFP within naïve cells from the cohort of thymocytes labeled during treatment. There is also likely some residual permanent labeling of lymphocyte progenitor populations. We discussed this in Lukas Front Imm 2023. (The latter is not a problem; all that matters for our analysis is that we generate a reasonable empirical description of the label kinetics in naive cells, however it arises). YFP is therefore not cleanly washed out in the periphery, and so for models with circulating memory as the tissue precursor, the flatness of their YFP curves leads to rather flat curves in the tissues.

      The mTom labelling was more informative as it was clearly diluted out of all peripheral populations by mTom-negative descendants of thymically-derived cells, as you point out in (a).

      Regarding (2), re: interpreting the initial levels of labels in precursors and targets. The important point here is that YFP and mTom were induced quickly in all populations we studied; therefore our inferences regarding precursors and targets aren’t informed by the initial levels of label in each. (Imagine a slow precursor feeding a rapidly dividing target; YFP levels in the former would start lower than those in the latter). The causal issue that we think you’re referring to would matter if one expects the targets to begin with no label at all; for instance, in our busulfan chimeric mouse model (e.g. Hogan PNAS 2015) new, thymically derived ‘labelled’ (donor) cells progressively infiltrate replete ‘unlabelled’ (host) populations. In that case, one can immediately reject certain differentiation pathways by looking at the sequence of accrual of donor cells in different subsets.

      The trends in YFP and mTom frequencies after treatment do matter for pathway inference, though, because precursor kinetics must leave an imprint on the target. For the case you mentioned, with opposite trends in label kinetics, such models would be unlikely to be supported strongly; indeed, we never saw strong support for naïve cells (strongly increasing YFP) as a direct precursor of TRM (fairly flat).

      We’ve added a condensed version of this to the Discussion.

      (3) Please add a measure of variation (e.g., suitable credible intervals) to the "best fits" (solid lines in Figure 2).

      Added.

      (4) Could the authors better explain the motivation for basing their model comparisons on the Leave-OneOut (LOO) cross-validation method? Why not use Bayesian evidence instead?

      Bayes factors are very sensitive to priors and are either computationally unstable if calculated with importance sampling methods, or very expensive to calculate if one uses the more stable bridge sampling method. (We also note that fitting just a single model here takes a substantial amount of time). Further, using BF can be unreliable unless one of the models is close to the 'true' data-generating model; though they seem to work well, we can be sure that none of our models are! For us, a more tractable and real-world selection criterion is based on the usefulness of a model, for which predictive performance is a reasonable proxy. In this case the mean out-of-sample prediction error (which LOO-IC reflects) is a well-established and valid means of ascribing support to different models.
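For intuition, out-of-sample prediction error can be computed exactly by leave-one-out refitting when models are cheap to fit. The sketch below is illustrative only (two toy regression models, not the paper's models, and not the PSIS approximation that LOO-IC uses in practice): each point is held out in turn, the model is refit on the rest, and the log predictive density of the held-out point is accumulated; the model with the larger total predicts better out of sample.

```python
import numpy as np

def elpd_loo(X, y):
    """Summed leave-one-out log predictive density for a Gaussian linear
    model fit by ordinary least squares, with exact refitting."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = y[keep] - X[keep] @ beta
        sigma = max(resid.std(ddof=X.shape[1]), 1e-9)  # plug-in noise scale
        mu = X[i] @ beta
        total += -0.5 * np.log(2 * np.pi * sigma**2) - (y[i] - mu)**2 / (2 * sigma**2)
    return total

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, x.size)    # synthetic data with a real slope
X0 = np.ones((x.size, 1))                          # model A: intercept only
X1 = np.column_stack([np.ones_like(x), x])         # model B: intercept + slope
print(elpd_loo(X0, y), elpd_loo(X1, y))            # larger is better; B should win
```

Because the data are generated with a genuine slope, the linear model attains the higher leave-one-out score, which is the same logic by which LOO-IC weights discriminate among the precursor/target models in Figure 4.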

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, the fusion between the gonad and genital disc is necessary for the development of continuous testes and seminal vesicle tissue essential for fertility. To do this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility. 

      Strengths: 

      This is a rigorous and comprehensive study that identifies the Hamlet-dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion. 

      We appreciate that the reviewer thinks our study is rigorous.

      Weaknesses: 

      The image resolution and presentation of figures is a major issue in this study. As a nonexpert, it is nearly impossible to see the morphological changes as described in the results. Quantification of all cell biological phenotypes is also lacking therefore reducing the impact of this study to those familiar with tissue fusion events in Drosophila development. 

      In the revised version, we have improved the image presentation and resolution. For all the images with more than two channels, we included single-channel images, changed the green color to lime and the red to magenta, and highlighted the testis (TE) and seminal vesicles to make morphological changes more visible.  

      We had quantification for marker gene expression in the original version, and have now also included quantification for cell biological phenotypes, which generally show 100% penetrance.  

      Reviewer #2 (Public review): 

      Strengths: 

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain-containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream pathways they characterized, which are Wnt2 and Toll. 

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as the dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level. 

      We appreciate that the reviewer thinks our study is original and important.

      Weaknesses: 

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects fusion, as myotubes might create the first "bridge" bringing the epithelia together. The authors clearly showed that their driver tools do not cause expression in myoblasts/myotubes, but this does not exclude non-tissue autonomous defects in their RNAi screen. Nevertheless, this is outside the scope of this work. 

      We thank the reviewer for raising the possibility of non-tissue-autonomous defects upon gene knockdown. The driver, hamRSGal4, drives reporter gene expression mainly in the RS epithelia, but we did observe weak expression of the reporter in the myoblasts before they differentiate into myotubes. Thus, we could not rule out a non-tissue-autonomous effect in the RNAi screen. We have therefore included a statement in the results: “Given that the hamRSGal4 driver is highly expressed in the TE and SV epithelia, we expect highly effective knockdown occurs only in these epithelial cells. However, hamRSGal4 also drives weak expression in the myoblasts before they differentiated into myotubes (Supplementary Fig. 5B), which may result in a non-tissue-autonomous effect when knocking down the candidate genes expressed in myoblasts.”

      However, one point that could be addressed in this study: the RNAseq and CUT&TAG experiments would profit from adding principal component analyses, elucidating similarities and differences of the diverse biological and technical replicates. 

      Thanks for the suggestion. We have now included the PCA analyses in Supplementary Figure 6A-B, with the corresponding description in the text. The PCA graphs validated the consistency between biological replicates of the RNA-seq samples. The CUT&Tag graphs confirm the consistency between the two biological replicates from the GFP samples, but show higher variability between the w1118 replicates. Importantly, we only considered the overlapping peaks pulled down by the GFP antibody from the ham_GFP genotype and by the Ham antibody from the wildtype (w1118) sample as true Ham binding sites. 
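For readers who want to reproduce this kind of replicate-consistency check, the usual recipe is a PCA of the centered log counts: samples from the same condition should sit closer together in PC space than samples from different conditions. The snippet below is a generic numpy sketch on synthetic counts (not the authors' actual pipeline or data):

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Project samples (rows) onto principal components via SVD of the
    centered log2-transformed count matrix."""
    logged = np.log2(counts + 1.0)
    centered = logged - logged.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
base = rng.gamma(2.0, 50.0, size=200)       # shared expression profile, 200 genes
effect = rng.normal(0, 1.0, size=200)       # condition-specific log2 fold changes
samples = []
for cond in (0, 1):                         # 2 conditions x 3 replicates
    for _ in range(3):
        mu = base * 2.0 ** (cond * effect)
        samples.append(rng.poisson(mu))     # Poisson counting noise per replicate
scores = pca_scores(np.array(samples, dtype=float))
print(scores.round(2))                      # replicates 0-2 and 3-5 form two groups
```

Here replicates of the same synthetic condition cluster on PC1 because the condition effect dominates the counting noise; outlying replicates (like the more variable w1118 CUT&Tag samples mentioned above) would show up as points far from their group.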

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      Major Concern: 

      (1) The image resolution and presentation of figures (Figures 2, 5, 6, and 7) is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Images need to be captured at higher resolution and zoomed in with arrows denoting changes as described. Individual channels, particularly for intensity measurement need to be shown in black and white in addition to merged images. Images also need pseudo-colored for color-blind individuals (i.e. no red-green staining). 

      The images were captured at high resolution, but the resolution was dramatically reduced in the BioRxiv PDF. We have tried to overcome this by submitting the PDF directly through the eLife submission system. In the revised version, we have included single-channel images and changed the green and red colors to lime and magenta for color-blind readers. We also highlighted the testis (TE) and seminal vesicle structures in the images to make morphological changes more visible.  

      (2) The penetrance of morphological changes observed in RT development is also unclear and needs to be rigorously quantified for data in Figures 2, 5, and 7. 

      We have now included quantification for cell biological phenotypes, which generally show 100% penetrance. The percentage of penetrance and the number of animals used are indicated in each corresponding image.  

      Reviewer #2 (Recommendations for the authors): 

      Major Points 

      (1) Lines 193- 220 I would strongly suggest pointing out the obvious shape defects of the testes visible in Figure 2A ("Spheres" instead of "Spirals"). These are probably a direct consequence of a lack in the epithelial connection that myotubes require to migrate onto the testis (in a normal way) as depicted in the cartoons, allowing the testis to adopt a spiral shape through myotube-sculpting (Bischoff et al., 2021), further confirming the authors' findings! 

      Good point. In the revised text, we have added more description of the testis shape defects and pointed out a potential contribution from compromised myotube migration.   

      (2) Line 216: "Often separated from each other". Here it would be important to mention how often. If the authors cannot quantify that from existing data, I suggest carrying it out in adult/pharate adult genital tracts (if there is no strong survivor bias due to the lethality of stronger affected animals), as this is much easier than timing prepupae. This should be a quick and easy experiment. 

      Because it is hard to tell whether the separation of the SV and TE was caused by developmental defects or could sometimes be due to technical issues (poor dissection), we have changed the description to: “control animals always showed connected TE and SV, whereas ham mutant TE and SV tissues were either separated from each other, or appeared contacted but with the epithelial tubes being discontinuous (Fig. 2B).” Additionally, we quantified the disconnection phenotype, which showed 100% penetrance in 18 mutant animals. This quantification is now included in the figure. 

      (3) Lines 289-305, Figure 3. I could only find how many replicates were analyzed in the RNAseq/CUT&Tag experiments in the Material & Methods section. I would add that at least in the figure legends, and perhaps even in the main text. Most importantly, I would add a Principal Component Analysis (one for RNAseq and one for the CUT&TAG experiment), to demonstrate the similarity of biological replicates (3x RNaseq, 4x Cut&Tag) but also of the technical replicates (RNAseq: wt & wt/dg, ham/ham & ham/df, GD & TE; CUT&TAG: Antibody & GFP-Antibody, TG&TE...). This should be very easy with the existing data, and clearly demonstrate similarities & differences in the different types of replicates and conditions. 

      Principal component analysis and its description have now been added to Supplementary Fig. 6 and the main text, respectively. 

      (4) Line 321; Supplementary Table 1: In the table, I cannot find which genes are down- or upregulated - something that I think is very important. I would add that, and remove the "color" column, which does not add any useful information. 

      In Supplementary Table 1, the first sheet lists upregulated genes and the second sheet lists downregulated genes. We removed the “color” column as suggested.  

      (5) Line 409: SCRINSHOT was carried out with candidate genes from the screen. One gene I could not find in that list was the potential microtubule-actin crosslinker shot. If shot knockdown caused a phenotype, then I would clearly mention and show it. If not, I would mention why a shot is important, nonetheless. 

      shot is one of the candidate target genes selected from our RNA-seq and Cut&Tag data. However, in the RNAi screen, knocking down shot with the available RNAi lines did not cause any obvious phenotype. This could be due to inefficient RNAi knockdown or redundancy with other factors. We nonetheless wanted to examine the shot expression pattern in the developing RS, given the important role of shot in epithelial fusion (Lee S., 2002). Using SCRINSHOT, we could detect epithelial-specific expression of shot, implying its potential function in this context. We have revised the text to clarify this point. 

      Minor points 

      (1) Cartoons in Figure 1: The cartoons look like they were inspired by the cartoon from Kozopas et al., 1998 Fig. 10 or Rothenbusch-Fender et al., 2016 Fig 1. I think the manuscript would greatly profit from better cartoons, that are closer to what the tissue really looks like (see Figure 1H, 2G), to allow people to understand the somewhat complicated architecture. The anlagen of the seminal vesicles/paragonia looks like a butterfly with a high columnar epithelium with a visible separation between paragonia/seminal vesicles (upper/lower "wing" of the "butterfly"). Descriptions like "unseparated" paragonia/seminal vesicle anlagen, would be much easier to understand if the cartoons would for example reflect this separation. It would even be better to add cartoons of the phenotypic classes too, and to put them right next to the micrographs. (Another nitpick with the cartoons: pigment cells are drastically larger and fewer in number (See: Bischoff et al., 2021 Figure 1E & MovieM1).) 

      Thanks for the suggestion. We have updated Figure 1 by adding additional illustrations showing the accessory gland and seminal vesicle structures in the pupal stage and changing the size of pigment cells.

      (2) Line 95-121 I would also briefly introduce PR domains, here. 

      We have added a brief description of the PR domains.

      (3) Line 152, 158, 160, 162. When first reading it, I was a bit confused by the usage of the word sensory organ. I would at least introduce that bristles are also known as external mechanosensory organs. 

      We have now revised the description to “mechano-sensory organ”.

      eg. Line 184, 194, and many more. Most times, the authors call testis muscle precursors "myoblasts". This is correct sometimes, but only when referring to the stage before myoblast-fusion, which takes place directly before epithelial fusion (28 h APF). Postmyoblast-fusion (eg. during migration onto the testis), these cells should be called myotubes or nascent myotubes, as the fly muscle community defined the term myoblast as the singlenuclei precursors to myotubes. 

      We have now revised the description accordingly.  

      (4) Line 217/Figure 2B. It looks like there is a myotube bridge between the testis and the genital disc. I would point that out if it's true. If the authors have a larger z-stack of this connection, I suggest creating an MIP, and checking if there are little clusters of two/three/four nuclei packed together. This would clearly show that the cells in between are indeed myotubes (granted that loss of ham does not introduce myoblast-fusion-defects). 

      We do not have a Z-stack of this connection, and thus cannot confirm whether the cells in this image are myotubes. However, we found that myotubes can migrate onto the testis and form the muscular sheet in the ham mutant despite reduced myotube density. At the junction there are myotubes, suggesting that loss of ham does not introduce myoblast-fusion defects. These results are now included in the revised manuscript, Supplementary Fig. 5 C-D.

      (5) Line 231/Supplementary Fig. 3C-G: I would add to the cartoons, where the different markers are expressed. 

      We have added marker gene expression in the cartoons.

      (6) Line 239. I don't see what Figure 1A/1H refers to, here. I would perhaps just remove it. 

      Yes, we have removed it.

      (7) Line 232. I would rephrase the beginning of the sentence to: Our data suggest Ham to be... 

      Yes, we have revised it.

      (8) Line 248-250/Figure 2F. Clonal analyses are great, but I think single channels should be shown in black and white. Also, a version without the white dashed line should be shown, to clearly see the differences between wt and ham-mutant cells. 

      Now single channel images from the green and red images are presented in Supplementary Figures. This particular one is in Supplementary Figure 3B. 

      (9) Line 490. The Toll-9 phenotype was identified on the sterility effect/lack-of-spermphenotype alone, and it was deduced, that this suggests connection defects. By showing the right focus plane in Fig S8B (lower right), it should be easy to directly show whether there is a connection defect or not. Also, one would expect clearer testis-shaping defects, like in ham-mutants, as a loss of connection should also affect myotube migration to shape the testis. This is just a minor point, as it only affects supplementary data with no larger impact on the overall findings, even if Toll-9 is shown not to have a defect, after all. 

      We find that scoring defects at the junction site at the adult stage is difficult and may not always be accurate. Instead, we score the presence of sperm in the SV, which indirectly but firmly indicates a successful connection between the TE and SV. We have now included a quantification graph showing the penetrance of the phenotype in the new Supplementary Fig. 14C. There were indeed morphological defects of the TE in Toll-9 RNAi animals. We have now included the image and quantification in the new Supplementary Fig. 14B.

    1. History is replete with cruelty, yet we seem incapable of shocking one another with its mention. The impact of a specific story of undue suffering leaves a mark, but on whom, and under what circumstances, remains uncertain—uncertain in any way that might guide us toward clear resolutions or the avoidance of suffering in some moral form. I take this to be true of humanity, or of any population within a district, city, or collective we might belong to. I don’t yet know how to properly assess the importance of Greek tragedy, its cathartic effect, or the social function it served, any more than I feel compelled to comment on its counterpart in the perverse spectacle of public capital punishment, or the codes, glee, or grim necessity driving prison guards and military personnel to inflict beatings or torture in relative privacy. The point is that, outside of artistic redemption or intentional extremes, most of us are agents of death—often, but also mere facilitators of it. We all have a job to do. The closer that job aligns with our immediate goals, the farther we are from that death or suffering, padding our own bank accounts to avoid the same destruction—or perhaps, the closer to it. It would be unfair to label a grocery clerk an agent of death for ringing up food that might clog a customer’s arteries and contribute to their demise. There is an element of choice, but also the indirectness of facilitating death. Of entropy. What can we say of the owner of a liquor store in the heart of an urban hellscape? I’m sure it depends.

      Perhaps it is just a taste thing, but I think the story doesn't really need this opening. It's well written, and stands well on its own - but it pulls focus away from the narrative. Maybe just a few opening lines could have the same effect?

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      (1) This manuscript introduces a useful curation pipeline of antibody-antigen structures downloaded from the PDB database. The antibody-antigen structures are presented in a new database called AACDB, alongside annotations that were either corrected from those present in the PDB database or added de-novo with a solid methodology. Sequences, structures, and annotations can be very easily downloaded from the AACDB website, speeding up the development of structure-based algorithms and analysis pipelines to characterize antibody-antigen interactions. However, AACDB is missing some key annotations that would greatly enhance its usefulness.

      Here are detailed comments regarding the three strengths above:

      I think potentially the most significant contribution of this database is the manual data curation to fix errors present in the PDB entries, by cross-referencing with the literature. However, as a reviewer, validating the extent and the impact of these corrections is hard, since the authors only provided a few anecdotal examples in their manuscript.

      I have personally verified some of the examples presented by the authors and found that SAbDab appears to fix the mistakes related to the misidentification of antibody chains, but not other annotations.

      (a) "the species of the antibody in 7WRL was incorrectly labeled as "SARS coronavirus B012" in both PDB and SabDab" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the pdb annotation.

      (b) "1NSN, the resolution should be 2.9 , but it was incorrectly labeled as 2.8" → I have verified the mistake and fix, and that sabdab does not fix it, just uses the PDB annotation.

      (c) "mislabeling of antibody chains as other proteins (e.g. in 3KS0, the light chain of B2B4 antibody was misnamed as heme domain of flavocytochrome b2)" → SAbDab fixes this as well in this case.

      (d) "misidentification of heavy chains as light chains (e.g. both two chains of antibody were labeled as light chain in 5EBW)" → SAbDab fixes this as well in this case.

      I personally believe the authors should make public the corrections made, and describe the procedures - if systematic - to identify and correct the mistakes. For example, what was the exact procedure (e.g. where were sequences found, how were the sequences aligned, etc.) to find mutations? Was the procedure run on every entry?

      We appreciate the reviewer’s valuable feedback. Our correction procedures combined manual curation with systematic sequence analysis. While most metadata discrepancies were resolved through cross-referencing original literature, we implemented a structured approach for identifying mutations in specific cases. For PDB entries labeled as variants (e.g., "Bevacizumab mutant" or "Ipilimumab variant Ipi.106") where the "Mutation(s)" field was annotated as "NO," we retrieved the canonical therapeutic antibody sequence from Thera-SAbDab, then performed pairwise sequence alignment against the PDB entry using the BLAST program to identify mutated residues.
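      The mutation-detection step described above can be sketched as follows. This is a minimal illustration, assuming the canonical reference sequence (e.g., from Thera-SAbDab) and the PDB-entry sequence have already been aligned (equal length, gaps as '-'); the sequences shown are illustrative, not from a real entry, and the actual pipeline uses BLAST for the alignment itself.

```python
def find_mutations(reference: str, query: str) -> list:
    """Return mutations as 'RefResidue{Position}QueryResidue' strings,
    numbering positions over the reference residues (gaps excluded)."""
    assert len(reference) == len(query), "sequences must be pre-aligned"
    mutations = []
    pos = 0
    for ref_aa, qry_aa in zip(reference, query):
        if ref_aa != '-':
            pos += 1  # advance reference numbering only on real residues
        if ref_aa != qry_aa and '-' not in (ref_aa, qry_aa):
            mutations.append(f"{ref_aa}{pos}{qry_aa}")
    return mutations

# Illustrative heavy-chain fragment with a single substitution at position 5
print(find_mutations("EVQLVESGGG", "EVQLLESGGG"))  # → ['V5L']
```

A mutation list produced this way can then be checked against the PDB "Mutation(s)" field to flag mislabeled variant entries.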

      This procedure was not applied to all entries, as mutations are context-dependent. Therapeutic antibodies have well-defined reference sequences, enabling systematic alignment. For antibodies lacking unambiguous wild-type references (e.g., research-grade or non-therapeutic antibodies), mutation annotations were directly inherited from the PDB or literature.

      All corrections have been publicly archived in AACDB. We have added a detailed discussion of this issue in the section “2.3 Metadata” of revised manuscript.

      (2) I believe the splitting of the pdb files is a valuable contribution as it standardizes the distribution of antibody-antigen complexes. Indeed, there is great heterogeneity in how many copies of the same structure are present in the structure uploaded to the PDB, generating potential artifacts for machine learning applications to pick up on. That being said, I have two thoughts both for the authors and the broader community. First, in the case of multiple antibodies binding to different epitopes on the same antigen, one should not ignore the potentially stabilizing effect that the binding of one antibody has on the complex, thereby enabling the binding of the second antibody. In general, I urge the community to think about what is the most appropriate spatial context to consider when modeling the stability of interactions from crystal structure data. Second, and in a similar vein, some antigens occur naturally as homomultimers - e.g. influenza hemagglutinin is a homotrimer. Therefore, to analyze the stability of a full-antigen-antibody structure, I believe it would be necessary to consider the full homo-trimer, whereas, in the current curation of AACDB with the proposed data splitting, only the monomers are present.

      We sincerely appreciate the reviewer’s insightful comments regarding the splitting of PDB files and we appreciate the opportunity to address the reviewer’s thoughtful concerns.

      Firstly, when two antibodies bind to distinct epitopes on the same antigen, we would like to clarify that this scenario can be divided into two cases based on the experimental context.

      Case 1: The two antibody-antigen complexes are determined in separate structures. For example, SAR650984 (PDB: 4CMH) and daratumumab (PDB: 7DHA) target CD38 at non-overlapping epitopes. These two antibody-antigen complexes were determined independently, and their structures do not influence each other.

      Case 2: The crystal structure contains a ternary complex with two antibodies and an antigen, as in the example of 6OGE discussed in Section 2.2 of our manuscript. After reviewing the original literature, the experiment confirmed that the order of Fab binding does not affect the formation of the ternary complex, and the binding of one antibody does not enhance the binding of the other. This supports the rationale for splitting 6OGE into two separate structures. However, we acknowledge that not all ternary complexes in the PDB provide such detailed experimental descriptions in their original literature. We agree with the reviewer that in some cases, one antibody may stabilize the structure to facilitate the binding of a second antibody. For instance, in 3QUM, the 5D5A5 antibody stabilizes the structure, enabling the binding of the 5D3D11 antibody to human prostate-specific antigen. Such sandwich complexes are indeed valuable for identifying true epitopes and paratopes. Importantly, splitting the structure does not alter the interaction sites.

      Secondly, we fully agree with the reviewer that for antigens that naturally exist as homomultimers (e.g., influenza hemagglutinin as a homotrimer), the full multimeric structure should be considered when analyzing stability. In such cases, users can directly utilize the original PDB structures provided in their multimeric form. Our splitting approach is intended to provide an additional option for cases where monomeric analysis is sufficient or preferred, but it does not preclude the use of the original multimeric structures when necessary.

      (3) I think the manuscript is lacking in justification about the numbers used as cutoffs (1A^2 for change in SASA and 5A for maximum distance for contact) The authors just cite other papers applying these two types of cutoffs, but the underlying physico-chemical reasons are not explicit even in these papers. I think that, if the authors want AACDB to be used globally for benchmarks, they should provide direct sources of explanations of the cutoffs used, or provide multiple cutoffs. Indeed, different cutoffs are often used (e.g. ATOM3D uses 6A instead of 5A to determine contact between a protein and a small molecule https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c45147dee729311ef5b5c3003946c48f-Abstract-round1.html). I think the authors should provide a figure with statistics pertaining to the interface atoms. I think showing any distribution differences between interface atoms determined according to either strategy (number of atoms, correlation between change in SASA and distance...) would be fundamental to understanding the two strategies. I think other statistics would constitute an enhancement as well (e.g. proportion of heavy vs. light chain residues).

      Some obvious limitations of AACDB in its current form include:

      AACDB only contains entries with protein-based antigens of at least 50 amino acids in length. This excludes non-protein-based antigens, such as carbohydrate- and nucleotide-based, as well as short peptide antigens.

      AACDB does not include annotations of binding affinity, which are present in SAbDab and have been proven useful both for characterizing drivers of antibody-antigen interactions (cite https://www.sciencedirect.com/science/article/pii/S0969212624004362?via%3Dihub) and for benchmarking antigen-specific antibody-design algorithms (cite https://www.biorxiv.org/content/10.1101/2023.12.10.570461v1)).

      We thank the reviewer for raising this critical point about the cutoff values used in AACDB. The thresholds chosen in the manuscript follow conventions established in the existing literature, and we have provided additional supporting references in the manuscript. Established tools typically define interacting residues by requiring that ΔSASA exceed 1 Ų and that the interatomic distance not exceed 6 Å. While our manuscript emphasizes widely accepted thresholds for consistency with prior benchmarks, AACDB explicitly provides raw ΔSASA and distance values for all annotated residues. Users can dynamically filter the downloaded files by excluding entries exceeding their preferred thresholds (e.g., selecting 5 Å instead of 6 Å). This ensures adaptability to diverse research needs. In the revised version, we reset the distance threshold to 6 Å and recalculated the interacting residues in order to give users a wider range of choices. In the section “3.2 Database browse and search” of the revised manuscript, we describe the flexible choice of thresholds in practical use.
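      The user-side filtering described above can be sketched as follows. The column names and values here are hypothetical placeholders for illustration only; the actual downloaded AACDB files document each interface residue alongside its ΔSASA and interatomic distance, so the same idea applies with the real headers.

```python
import csv
import io

# Illustrative stand-in for a downloaded interface-residue file
sample = """chain,resnum,resname,delta_sasa,distance
H,31,TYR,12.4,3.8
H,52,SER,0.6,5.7
L,91,TRP,8.1,6.3
"""

def filter_interface(text, max_dist=6.0, min_dsasa=1.0):
    """Keep residues whose distance and ΔSASA pass user-chosen cutoffs."""
    rows = csv.DictReader(io.StringIO(text))
    return [r["resname"] + r["resnum"] for r in rows
            if float(r["distance"]) <= max_dist
            and float(r["delta_sasa"]) > min_dsasa]

print(filter_interface(sample))                  # → ['TYR31']
print(filter_interface(sample, min_dsasa=0.5))   # → ['TYR31', 'SER52']
```

Relaxing either cutoff admits more residues, which is the flexibility the raw ΔSASA and distance columns are meant to provide.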

      Furthermore, distance and ΔSASA are two distinct metrics for evaluating interactions. Distance directly quantifies spatial proximity between atoms, reflecting physical contacts such as van der Waals interactions or hydrogen bonds, and is ideal for identifying direct spatial adjacency. ΔSASA, on the other hand, measures changes in solvent accessibility of residues during binding, capturing the contribution of buried surfaces to binding free energy. Even for residues not in direct contact, reduced SASA due to conformational changes may indicate indirect functional roles.

      As demonstrated through comparisons on the detailed information pages, the sets of interacting amino acids defined by these two methods differ by only a few residues, with no significant variation in their overall distributions. However, since interaction patterns vary significantly across different complexes, analyzing residue distributions across all structures using both criteria is not feasible.

      We thank the reviewer for highlighting these limitations. AACDB currently focuses on protein-based antigens to prioritize structural consistency, which excludes non-protein antigens and shorter peptides. While affinity annotations are critical for benchmarking antibody design tools, these data were not integrated in this release because data verification could not be completed given internal team constraints. We acknowledge these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      Reviewer #2:

      Summary:

      Antibodies, thanks to their high binding affinity and specificity to cognate protein targets, are increasingly used as research and therapeutic tools. In this work, Zhou et al. have created, curated, and made publicly available a new database of antibody-antigen complexes to support research in the field of antibody modelling, development, and engineering.

      Strengths:

      The authors have performed a manual curation of antibody-antigen complexes from the Protein Data Bank, rectifying annotation errors; they have added two methods to estimate paratope-epitope interfaces; they have produced a web interface that is capable of both effective visualisation and of summarising the key useful information in one page. The database is also cross-linked to other databases that contain information relevant to antibody developability and therapeutic applications.

      Weaknesses:

      The database does not import all the experimental information from PDB and contains only complexes with large protein targets.

      Thank you for the valuable feedback. As previously responded to Reviewer 1, due to limitations within our team, comprehensive data integration from the PDB has not been achieved in the current version. We acknowledge the significance of expanding the database to encompass a broader range of experimental information and complexes with diverse target sizes. Regrettably, immediate updates to address these limitations are not feasible at this time. Nevertheless, we are committed to enhancing the database in upcoming upgrades to provide users with a more comprehensive and inclusive resource.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 194: "produce" → "produced"

      We thank the reviewer for the feedback. We have checked the grammar and spelling carefully in the revised manuscript.

      (2) As mentioned in the public review, I think adding binding affinity annotations would greatly enhance the use cases for the database.

      We thank the reviewer for the suggestion. As noted in our response to the Public review, due to team constraints these data are not integrated into this release but are being collated. We recognize these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      (3) I think adding a visualization of interface atoms and contacts on an entry's webpage would be useful for someone exploring specific entries. It also would be useful if the authors provided a pymol command to select interface residues since that's a procedure any structural biologist is likely to do.

      We sincerely appreciate the reviewer’s constructive suggestions. In response to the request for enhanced visualization and accessibility of interface residue information, we have implemented the following improvements: (1) Web Interface Visualization. On the entry-specific webpage, we have added an interactive visualization window that highlights the antigen-antibody interaction interface using distinct colors. The interaction interface visualization has been incorporated into Figure 5 of the revised manuscript, with a detailed description. (2) PyMOL Command Accessibility. The “Help” page now provides step-by-step PyMOL commands to select and visualize interface residues.
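      The kind of PyMOL selection command documented on the Help page can be generated programmatically from a downloaded interface-residue list. The sketch below is illustrative: the object name "complex" and the residues are hypothetical, and the resulting string would be pasted into the PyMOL command line.

```python
def pymol_select(name, residues):
    """Build a PyMOL 'select' command from (chain, residue-number) pairs."""
    parts = [f"(chain {ch} and resi {num})" for ch, num in residues]
    return f"select {name}, complex and ({' or '.join(parts)})"

# Hypothetical paratope residues on heavy (H) and light (L) chains
cmd = pymol_select("paratope", [("H", "31"), ("H", "52"), ("L", "91")])
print(cmd)
# → select paratope, complex and ((chain H and resi 31) or (chain H and resi 52) or (chain L and resi 91))
```

Generating the selection from the downloaded files avoids transcribing residue numbers by hand and keeps the visualization consistent with the chosen interface definition.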

      (4) I think the authors should provide headers to the files containing interface residues according to the change-in-SASA criterion, as they do for those computed according to contact. This would avoid unnecessary confusion - however slight - and make parsing easier. I was initially confused by the meaning of the last column, though after a minute I understood it to be the change in SASA.

      We thank the reviewer for providing such detailed feedback and for the suggestion. We have provided headers for the files of the interacting residues defined by ΔSASA.

      (5) Line 233: "AACDB's data processing pipeline supports mmCIF files" → The meaning and implications of this statement are not obvious to me, and are mentioned nowhere else in the paper. Do you mean that in AACDB there are structure entries that the RCSB PDB database only has in mmCIF file format, and not .pdb format? So, effectively, there are some entries in AACDB that are not in any other antibody-specific database?I checked and, as of Dec 3rd, 2024, there are 41 structures in AACDB that are NOT in SAbDab. Manually checking 5 of those 41 structures, none are mmCIF-only structures.

      We thank the reviewer for the valuable comment. Because of the size of the structures within certain entries, representing them in a single PDB-format data file is not feasible due to the excessive number of atoms and polymer chains they contain; the PDB therefore stores these structures in mmCIF-format files. In AACDB, 47 entries, such as 7SOF, 7NKT, 7B27, and 6T9D, are available from the PDB only in mmCIF format. The “.pdb” and “.cif” files contain atomic coordinates in distinct text formats, and the segmentation of these structure files is conducted automatically based on manually annotated antibody-antigen chains. To accommodate this, we have incorporated these considerations into our file processing pipeline, thereby enabling a fully automated file segmentation process. Additionally, we employed Naccess to calculate interatomic distances. However, since this software only accepts .pdb-format files as input, we also converted all split .cif files into .pdb format within our fully automated pipeline. We apologize for the lack of clarity in the original manuscript and have included a more detailed explanation in the "2.2 PDB Splitting" section of the revised manuscript.

      Reviewer #2:

      (1) In SabDab and PDB, experimental binding affinities are also reported: could the authors comment on whether they also imported this information and double-checked it against the original paper? If it wasn't imported, that might discourage some users and should be considered as an extension for the future.

      We thank the reviewer for the comment and the suggestion. As noted in our response to the Public review, due to current resource constraints, quantitative affinity data has not been incorporated into this release but is undergoing systematic curation. We explicitly recognize these limitations and propose a two-pronged strategy for future iterations: (1) broadening antigen diversity coverage through expanded structural sampling, and (2) integrating quantitative binding affinity measurements. In the Discussion section, we have included a description outlining the planned enhancements.

      (2) Line 49-50: the references mentioned in connection to deep learning methods for antibody-antigen predictions seem a bit limited given the amount of articles in this field, with 3 of 4 references on one method only (SEPPA), could the authors expand this list to reflect a bit more the state of the art?

      We thank the reviewer for the suggestion. We agree that more relevant studies should be listed and therefore more references are provided in the revised manuscript.

      When mentioning the limitations of the existing databases, it feels a bit that the criticism is not fully justified. For instance:

      Line 52-53: could the authors elaborate on the reasons why such an identification is challenging? (Isn't it possible to make an efficient database-filtered search? Or rather, should one highlight that a more focussed resource is convenient and why?)

      Thank you for the feedback. In this study, the keywords "antibody complex," "antigen complex," and "immunoglobulin complex" were employed during data collection. The PDB returned over 30,000 results, of which only one-tenth met our criteria after rigorous filtering. This demonstrates that keyword searches, while useful, inherently limit result precision and introduce substantial redundancy, likely due to the PDB's search mechanism. This is why we highlighted the significant challenge of identifying antibody-antigen complexes among general protein structures in the PDB.

      Line 55: reading the website http://www.abybank.org/abdb/, it would be fairer to say that the web interface lacks updates, as the database and the code have gone through some updates. Could the authors provide a concrete example of the reason why: 'The AbDb database currently lacks proper organization and management of this valuable data.'?

      We thank the reviewer for highlighting this issue. In our original manuscript, the statement that the AbDb database "lacks proper organization and management" was based on the absence of an explicit statement regarding data updates on its official website at the time of submission, even though internal updates to its content may have occurred. We fully respect the long-standing contributions of AbDb to antibody structural research, and our comments were solely directed at the specific state of the database at that time. As the reviewer noted, following the release of our preprint, we have also taken note of AbDb's recent updates. To reflect the latest developments and avoid potential misinterpretation, we have revised the original statement in the revised manuscript.

      Also 'this rapid updating process may inadvertently overlook a significant amount of information that requires thorough verification,': it's difficult for me to understand what this means in practice. Could the authors clarify if they simply mean that SabDab collects information from PDB and therefore tends to propagate annotation errors from there? If yes, I think it's enough to state it in these terms, and for sure I agree that the reason is that correcting these annotation errors requires a substantial amount of work.

      We thank the reviewer for providing such detailed feedback on the manuscript. We acknowledge that SabDab represents a highly valuable contribution to the field, and its rapid update mechanism has significantly advanced related research areas. However, as stated by the reviewer, we aim to clarify that SabDab primarily relies on automated metadata extraction from the PDB for annotation, and its rapid update process inherently inherits raw data from upstream sources. According to their paper, manual curation is only applied when the automated pipeline fails to resolve structural ambiguities. This workflow—dependent on PDB annotations with limited manual verification—may propagate errors provided by PDB. Examples include species misannotation and mutation status misinterpretation. We fully agree with the reviewer's observation that correcting errors in such cases necessitates labor-intensive manual curation, which is a core motivation for our study.

      Line 86: why 'Structures that consisted solely of one type of antibody were excluded'? Why exclude complexes with antigens shorter than 50 amino acids? These complexes are genuine antibody-antigen complexes.

      We thank the reviewer for the valuable question. The AACBD database is dedicated to curating structural data of antigen-antibody complexes. Structures featuring only a single antibody type are classified as free antibodies and are systematically excluded from the database due to the absence of protein-bound partners. During data screening, we retained sequences shorter than 50 amino acids by categorizing them as peptides rather than eliminating them outright. The current release exclusively encompasses complexes with protein-based antigens. Meanwhile, complexes involving peptides, haptens, and nucleic acids are undergoing systematic curation, with planned inclusion in future updates to broaden antigen category representation.

      Line 96 needs a capital letter at the beginning.

      Line 107: 'this would generate' → 'this generates' (given it is something that has been implemented, correct?).

      Line 124: missing an 'of'.

      Line 163: inspiring by -> inspired by.

      Thank you for the feedback. All of the above grammatical and spelling errors have been corrected in the manuscript.

      Line 109-111: apart from the example, it would be good to spell out the general rule applied to anti-idiotypic antibodies.

      We thank the reviewer for the valuable feedback. For anti-idiotypic antibody complexes, the partner antibody is treated as a dual-chain antigen, necessitating individual evaluation of the heavy-chain and light-chain interactions with the anti-idiotypic component. We have stated this general rule for anti-idiotypic antibodies in section "2.2 PDB splitting" of the revised manuscript.

      Line 155-159: could the authors provide references for the two choices (based on sasa and any-atom distance) that they adopted to define interacting residues?

      We thank the reviewer for the comment and the suggestion. As in our response to Reviewer #1 in the Public Review, the definition of interacting residues and the thresholds chosen in the manuscript are based on the existing literature. We have added additional supporting references in section "1. Introduction". Our resource does not provide a fixed amino acid list. Instead, all interacting residues are explicitly documented alongside their corresponding ΔSASA (change in solvent-accessible surface area) and intermolecular distances, allowing researchers to flexibly select residue pairs using customized thresholds applied to the downloadable datasets. Furthermore, aligning with widely adopted criteria in the current literature, where interactions are defined by ΔSASA > 1 Ų and atomic distances < 6 Å, we have recalibrated our analysis in the revised version. Specifically, we replaced the previous 5 Å distance threshold with a 6 Å cutoff and recalculated the interacting residues.
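
      As an illustration of how the two criteria can be applied together, the sketch below filters residue pairs by ΔSASA and minimum any-atom distance. The residue names and numeric values are invented for the example; this is not code from the AACBD pipeline.

```python
# Hypothetical sketch: select interface residue pairs using the two criteria
# discussed above (dSASA > 1 A^2 and any-atom distance < 6 A). The input
# tuples are invented example values, not data from the resource.

def interacting_residues(pairs, dsasa_min=1.0, dist_max=6.0):
    """pairs: iterable of (antibody_res, antigen_res, dsasa, min_distance)."""
    return [(ab, ag) for ab, ag, dsasa, dist in pairs
            if dsasa > dsasa_min and dist < dist_max]

example = [
    ("H:TYR32", "A:GLU45", 12.5, 3.8),  # buried and close: interacting
    ("H:SER55", "A:LYS60", 0.4, 5.1),   # close, but too little buried area
    ("L:TRP91", "A:ASP12", 8.0, 7.2),   # buried elsewhere, but too distant
]
print(interacting_residues(example))  # [('H:TYR32', 'A:GLU45')]
```

      Setting dist_max=5.0 reproduces the stricter cutoff of the original submission, which is the kind of re-thresholding the downloadable per-pair tables allow.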

      Line 176-178: could the authors re-phrase this sentence to clarify what they mean by 'change in the distribution'?

      We thank the reviewer for the suggestion. Our search was conducted with an end date of November 2023; however, Figure 3B includes an entry dated 2024. Upon reviewing this record, we identified that the discrepancy arises because the 7SIX entry (originally released in December 2022) was superseded by the 8TM1 version in January 2024. This version update explains the apparent chronological inconsistency. We regret any lack of clarity in our original description and have revised the corresponding section of the manuscript to state this database change explicitly.

      Caption Figure 3: please spell out all the acronyms in the figure. Provide the date when the last search was performed (i.e., the date of the last update of these statistics).

      We thank the reviewer for the comment. We have systematically expanded all acronyms and included update dates for statistics in the legend of Figure 3. Corresponding changes have also been made to the statistical pages on the website.

      Finally, it would be advisable to do a general check on the use of the English language (e.g. I noted a few missing articles). In Figure 5 DrugBank contains typos.

      We sincerely appreciate the reviewer's meticulous attention to linguistic precision. We have corrected the typographical error in Figure 5 and conducted a comprehensive review of the entire manuscript to ensure accuracy and clarity.

    1. It isn't strictly necessary, but set -euxo pipefail turns on a few useful features that make bash shebang recipes behave more like normal, linewise just recipes: set -e makes bash exit if a command fails; set -u makes bash exit if a variable is undefined; set -x makes bash print each script line before it's run; and set -o pipefail makes bash exit if any command in a pipeline fails. pipefail is bash-specific, so it isn't turned on in normal linewise just recipes.
    1. Author response:

      The following is the authors’ response to the original reviews

      ANALYTICAL

      (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are on log-log axes and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figure 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

      ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric.

      Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We have used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. In the revised Figure 3, specifically 3C and 3D, we have plotted trials to acquisition using decision criteria equivalent to those used by Gibbon and Balsam. The criterion they used—at least one peck at the response key on at least 3 out of 4 consecutive trials—cannot be directly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during the CS and the baseline response rate, rather than just counting responses during the CS. We have used two approaches to adapt the Gibbon and Balsam criterion to our data. One approach, plotted in Figure 3C, uses a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, adopting a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p<.3125). The second method (Figure 3D) estimates the nDkl for the criterion used by Gibbon and Balsam and then applies this criterion to the nDkl for our data. To estimate the nDkl of Gibbon and Balsam’s data, we have assumed there are no responses in the inter-trial interval and the response probability during the CS must be at least 0.75 (their criterion of at least 3 responses out of 4 trials). The nDkl for this difference is 2.2 (odds ratio 27:1).
We have then applied this criterion to the nDkl obtained from our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates. These two analyses have been added to the manuscript to replace those previously shown in Figures 3B and 3C.
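
      For readers who wish to check where the p<.3125 threshold comes from, it is (on our reading) the chance probability of meeting Gibbon and Balsam's criterion: if the CS and pre-CS rates do not differ, the CS rate exceeds the pre-CS rate on any given trial with probability 0.5, so at least 3 such trials out of 4 occur with binomial probability 5/16 = 0.3125. A quick check:

```python
from math import comb

# Chance probability of "at least 3 out of 4 consecutive trials" when each
# trial meets the criterion with probability 0.5 under the null hypothesis.
n, p = 4, 0.5
p_criterion = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3, n + 1))
print(p_criterion)  # 0.3125
```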

      (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

      If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ from pre and post-cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

      A single regression line, as shown in Figure 5, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
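
      In concrete terms, fixing the log-log slope at 1 leaves only the scale factor k to estimate, and in the log domain its least-squares estimate is just the mean log ratio of response rate to reinforcement rate. The sketch below fits this one-parameter model to synthetic data (not the rats' rates) generated with multiplicative noise; the rate range, scale factor, and noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic reinforcement rates (per hour) and scalar responding with
# multiplicative (log-normal) noise, mimicking the assumed data structure.
rate = rng.uniform(0.5, 60.0, size=100)
k_true = 2.0
resp = k_true * rate * rng.lognormal(mean=0.0, sigma=0.3, size=100)

x, y = np.log(rate), np.log(resp)

# 1-parameter model: slope fixed at 1, fit only the intercept (log k).
log_k = np.mean(y - x)
resid1 = y - (x + log_k)
r2_scalar = 1 - resid1.var() / y.var()

print(f"estimated k = {np.exp(log_k):.2f}")
print(f"variance explained by the slope-1 model: {r2_scalar:.2f}")
```

      Because the model is scalar, exp(log_k) estimates the proportionality constant directly, and fitting in the log domain matches the homoscedasticity rationale given above.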

      If separate regression lines are fitted to the CS and ITI data, there is a small increase in explained variance (R<sup>2</sup> = 0.82). These regression lines have been added to the plot in the revised manuscript (Figure 5). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum-likelihoods of the model’s fits to all possible sets of N data with B precision per datum. The greater the sum over the maximum likelihoods, the more complex the model: that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post-hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between the logarithmic model and the power-function model. Log regressions must be concave, whereas power-function regressions can be concave, linear, or convex—yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit).
The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits—high maximum likelihood—to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y.  Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
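
      The model comparison at issue in these points can be made concrete with simulated data. The sketch below generates truly scalar data, fits the 1-parameter unilinear model (slope fixed at 1, a single intercept in the log domain) and the 4-parameter bilinear model (a free slope and intercept for each set of points), and computes a Gaussian BIC for each; the groups, ranges, and noise level are invented for the illustration and are not the published rates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Synthetic log reinforcement rates; half "ITI" points, half "CS" points.
x = np.concatenate([rng.uniform(-1, 2, n // 2), rng.uniform(1, 4, n // 2)])
group = np.repeat([0, 1], n // 2)
y = x + np.log(2.0) + rng.normal(0, 0.3, n)  # data truly scalar, k = 2

def bic(rss, k):
    # Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)
    return n * np.log(rss / n) + k * np.log(n)

# 1-parameter model: slope fixed at 1, one fitted intercept (log k).
c = np.mean(y - x)
rss1 = np.sum((y - x - c) ** 2)

# 4-parameter model: separate slope and intercept for each group.
rss4 = 0.0
for g in (0, 1):
    slope, intercept = np.polyfit(x[group == g], y[group == g], 1)
    rss4 += np.sum((y[group == g] - (slope * x[group == g] + intercept)) ** 2)

print("BIC, 1-parameter model:", bic(rss1, 1))
print("BIC, 4-parameter model:", bic(rss4, 4))
```

      The bilinear fit can never have a larger residual sum of squares, since it nests the unilinear model; the question the BIC (and, differently, MDL) tries to answer is whether the improvement is worth the added wiggle room.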

      The data in Figures 4 and 5 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results. We now note this in the revised manuscript. The data for terminal responding by all rats, averaged over both the last 5 sessions and last 10 sessions, can be downloaded from https://osf.io/vmwzr/

      Finally, as the reviewers note, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

      One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R. Gallistel, Balsam, & Fairhurst, 2004). The new, more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8 (now 6 and 7), extending that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022), it may not distinguish between a gradual change and multiple step-like changes in responding and cannot account for decreases in response rate.
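
      A schematic version of this milestone analysis is sketched below; it is a simplification for illustration, not the algorithm of Harris (2022). The local slope of the cumulative response record (a moving average of the per-trial rates) is compared with the terminal rate, and the first trial at which it exceeds 10%, 50%, and 90% of that rate is reported. The response record here is fabricated.

```python
import numpy as np

def milestone_trials(rates, percents=(10, 50, 90), window=10):
    """Schematic milestone detection from a cumulative response record.

    rates: per-trial CS response rates. The local slope of the cumulative
    record in a sliding window is compared against the terminal slope
    (mean rate over the last `window` trials). Returns, for each
    percentage, the first trial whose local slope exceeds that fraction
    of the terminal rate. A sketch only, not the Harris (2022) method.
    """
    rates = np.asarray(rates, dtype=float)
    terminal = rates[-window:].mean()
    # Local slope of the cumulative record = moving average of the rates.
    kernel = np.ones(window) / window
    local = np.convolve(rates, kernel, mode="valid")
    out = {}
    for p in percents:
        hits = np.flatnonzero(local >= terminal * p / 100)
        out[p] = int(hits[0]) + window if hits.size else None
    return out

# A fabricated record: low responding, then a ramp to a stable rate.
record = [0.5] * 30 + list(np.linspace(0.5, 10, 40)) + [10.0] * 30
print(milestone_trials(record))
```

      On a monotonically rising record like this one the milestones come out in order; step-like or decreasing records are exactly where, as noted above, this style of analysis becomes ambiguous.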

      The other analytic method we used relies on the information theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterized by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. 
It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy tailed and have peculiar properties (as witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters) but we are unable at this time to propound an explanation of the third trend.

      (4) Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20 s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15 s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20 s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and if any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

      There was an error in the description provided in the original text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period. The statement about the pre-CS measure has been corrected in the revised manuscript.

      (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) that the model discussed in Figure 3 fits on the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very unlikely to be independent observations.

      Details about model fitting have been added to the revision. The question about fitting a single model or multiple models to the data in Figure 6 (now 5) is addressed in response 2 above. In Figure 5, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?***

      It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements—failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter—can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L.J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many—including the new Nobelist, Geoffrey Hinton—would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates.

      I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

      The seemingly contradictory evidence of unblocking and trans-reinforcer blocking by Wagner and by Pearce and Ganesan cited above will be hard for any theory to accommodate. It will likely depend on what features of the US are represented in the conditioned response.

      RET predicts one-trial overshadowing, as anyone may verify in a scientific programming language, because it has no free parameters and hence no wiggle room. Overtraining reversal effects appear to depend on aspects of the subjects’ experience other than the rate of reinforcement. It seems unlikely that RET can proffer an explanation of them.

      Various information-theoretic calculations give pretty good quantitative fits to the relatively few parametric studies of extinction and the partial-reinforcement extinction effect (see Gallistel (2012, Figs 3 & 4); Wilkes & Gallistel (2016, Fig 6) and Gallistel (2025, under review, Fig 6). It has not been applied to latent inhibition, in part for want of parametric data. However, clearly one should not attribute a negative rate to a context in which the subject had never been reinforced. An explanation, if it exists, would have to turn on the effect of that long period on initial rate estimates AND on evidence of a change in rate, as of the first reinforcement.

      Recommendations for authors:

      MINOR POINTS

      (1) It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4 to clarify the spread of the distribution of the data observed across the plots in Figure 3 uses the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears that the data plotted in Figure 4 corresponds to Figure 3A and 3B rather than the odds 10:1 data indicated in the text.

      Figure 3 has changed, as already described. The data previously plotted in Figure 4 are now shown in Figure 3B and correspond to the data plotted in Figure 3A.

      (2) Log(T) was not correlated with trials to criterion. If trials to criterion is inversely proportional to log(C/T) and C is uncorrelated with T, shouldn't trials to criterion be correlated with log(T)? Is this merely a matter of low statistical power?

      Yes. There is a small, but statistically non-significant, correlation between log(T) and trials to criterion, r = 0.35, p = .22. That correlation drops to .08 (p = .8) after factoring out log(C/T), which demonstrates that the weak correlation between log(T) and trials to criterion is based on the correlation between log(T) and log(C/T).
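      The partialling logic in this answer can be sketched in a few lines. The data below are synthetic stand-ins (the variable names and effect sizes are illustrative, not the study's values); partial correlation is computed as the correlation of ordinary least-squares residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 14  # one value per group, as in the dataset under discussion
log_C_over_T = rng.normal(size=n)
log_T = 0.4 * log_C_over_T + rng.normal(size=n)               # predictor sharing variance with log(C/T)
trials = -2.0 * log_C_over_T + rng.normal(scale=0.5, size=n)  # criterion driven mainly by log(C/T)

def residuals(y, x):
    # Residuals of y after regressing out x (with an intercept)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

raw = np.corrcoef(log_T, trials)[0, 1]
partial = np.corrcoef(residuals(log_T, log_C_over_T),
                      residuals(trials, log_C_over_T))[0, 1]
print(raw, partial)
```

      When the raw correlation is carried mostly by a shared dependence on log(C/T), the partial correlation collapses toward zero, which is the pattern reported above.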

      (3) The rationale for the removal of the high information condition samples in the Fig 8 "Slope" plot appears to be weak. Can the authors justify this choice better? If all data are included, the relationship is clearly different from that shown in the plot.

      We have now reported correlations that include those 3 groups but noted that the correlations are largely driven by the much lower slope values of those 3 groups, which is likely an artefact of their smaller number of trials. We use this to justify a second set of correlations that excludes those 3 groups.

      (4) The discussion states that there is at most one free parameter constrained by the data - the constant of proportionality for response rate. However, there is also another free parameter constrained by data: the informativeness at which expected trials to acquisition is 1.

      I think this comment is referring to two different sets of data. The constant of proportionality of the response rate refers to the scalar relationship between reinforcement rate and terminal response rate shown in Figure 5. The other parameter, the informativeness when trials to acquisition equals 1, describes the intercept of the regression line in Figure 1 (and 3).

      (5) The authors state that the measurement of available information is not often clear. Given this, how is contingency measurable based on the authors' framework?

      (6) Based on the variables provided in Supplementary File 3, containing the acquisition data, we were unable to reproduce the values reported in the analysis of Figure 3.

      Figure 3 has changed, using new criteria for trials to acquisition that attempt to match the criterion used by Gibbon and Balsam. The data on which these figures are based have been uploaded to OSF.

      GRAPHICAL AND TYPOGRAPHICAL

      (1) Y-axis labels in Figure 1 are not appropriately placed. 0 is sitting next to 0.1. 0 should sit at the bottom of the y-axis.

      If this comment refers to the 0 sitting above an arrow in the top right corner of the plot, this is not misaligned. The arrow pointing to zero is used to indicate that this axis approaches zero in the upward direction. 0 should not be aligned to a value on the axis since a learning rate of zero would indicate an infinite number of learning trials. The caption has been edited to explain this more clearly.

      (2) Typo, Page 6, Final Paragraph, line 4. "Fourteen groups of rats were trained with for 42 session"

      Corrected. Thank you.

      (3) Figure 3 caption: Typo, should probably be "Number of trials to acquisition"?

      This change has now been made. The axis shows reinforcements to acquisition to be consistent with Gibbon and Balsam, but trials and number of reinforcements are identical in our 100% reinforcement schedule.

      (4) Typo Page 17 Line 1: "Important pieces evidence about".

      Corrected. Thank you.

      (5) Consider consistent usage of symbols/terms throughout the manuscript (e.g. Page 22, final paragraph: "iota = 2" is used instead of the corresponding symbol that has been used throughout).

      Changed.

      (6) Typo Page 28, Paragraph 1, Line 9: "We used a one-sample t-test using to identify when this".

      This section of text has been changed to reflect the new analysis used for the data in Figure 3.

      (7) Typo Page 29, Paragraph 1, Line 2: "problematic in cases where one of both rates are undefined" either typo or unclear phrasing.

      “of” has been corrected to “or”

      (8) Typo Page 30: Equation 3 appears to have an error and is not consistent with the initial printing of Equation 3 in the manuscript.

      The typo in initial expression of Eq 3 (page 23) has been corrected.

      (9) Typo Page 33, Line 5: "Figures 12".

      Corrected.

      (10) Typo Page 34, Line 10: "and the 5 the increasingly"? Should this be "the 5 points that"?

      Corrected.

      (11) Typo Page 35, Paragraph 2: "estimate of the onset of conditioned is the trial after which".

      Corrected.

      (12) Clarify: Page 35, final paragraph: it is stated that four-panel figures are included for each subject in the Supplementary files, but each subject has a six-panel figure in the Supplementary file.

      The text now clarifies that the 4-panel figures are included within the 6-panel figures in the Supplementary materials.

      (13) It is hard to identify the different groups in Figure 2 (Plot 15).

      The figure is simply intended to show that responding across seconds within the trial is relatively flat for each group. Individuation of specific groups is not particularly important.

      (14) It appears that the numbering on the y-axis is misaligned in Figure 2 relative to the corresponding points on the scale (unless I have misunderstood these values and the response rate measure to the ITI can drop below 0?).

      The numbers on the Y axes had become misaligned. That has now been corrected.

      (15) Please include the data from Figure 3A in the spreadsheet supplementary file 3. If it has already been included as one of the columns of data, please consider a clearer/consistent description of the relevant column variable in Supplementary File 1.

      The data from Figure 3 are now available from the linked OSF site, referenced in the manuscript.

      (16) Errors in supplementary data spreadsheets such that the C/T values are not consistent with those provided in Table 1 (C/T values of 4.5, 54, 180, and 300 are slightly different values in these spreadsheets). A similar error/mismatch appears to have occurred in the C/T labels for Figures (e.g. Figure 10) and the individual supplementary figures.

      The C/T values on the figures in the supplementary materials have been corrected and are now consistent with those in Table 1.

      (17) Currently the analysis and code provided at https://osf.io/vmwzr/ are not accessible without requesting access from the author. Please consider making these openly available without requiring a request for authorization. As such, a number of recommendations made here may already have been addressed by the data and code deposited on OSF. Apologies for any redundant recommendations.

      Data and code are now available at the OSF site, which has been made public without requiring a request for access.

      (18) Please consider a clearer and more specific reference to supplementary materials. Currently, the reader is required to search through 4 separate supplementary files to identify what is being discussed/referenced in the text (e.g. Page 18, final line: "see Supplementary Materials" could simply be "see Figure S1").

      We have added specific page numbers in references to the Supplementary Materials.

    1. Reviewer #1 (Public review):

      Summary:

      Gruskin and colleagues use twin data from a movie-watching fMRI paradigm to show how genetic control of cortical function intersects with the processing of naturalistic audiovisual stimuli. They use hyperalignment to dissect heritability into the components that can be explained by local differences in cortical-functional topography and those that cannot. They show that heritability is strongest at slower-evolving neural time scales and is more evident in functional connectivity estimates than in response time series.

      Strengths:

      This is a very thorough paper that tackles this question from several different angles. I very much appreciate the use of hyperalignment to factor out topographic differences, and I found the relationship between heritability and neural time scales very interesting. The writing is clear, and the results are compelling.

      Weaknesses:

      The only "weaknesses" I identified were some points where I think the methods, interpretation, or visualization could be clarified.

      (1) On page 16, the authors compare heritability in functional connectivity (FC) and response time series, and find that the heritability effect is larger in FC. In general, I agree with your diagnosis that this is in large part due to the fact that FC captures the covariance structure across parcels, whereas response time series only diverge in terms of univariate time-point-by-time-point differences. Another important factor here is that (within-subject) FC can be driven by intrinsic fluctuations that occur with idiosyncratic timing across subjects and are unrelated to the stimulus (whereas time-locked metrics like ISC and time-series differences cannot, by definition). This makes me wonder how this connectivity result would change if the authors used intersubject functional connectivity (ISFC) analysis to specifically isolate the stimulus-driven components of functional connectivity (Simony et al., 2016). This, to me, would provide a closer comparison to the ISC and response time series results, and could allow the authors to quantify how much of the heritability in FC is intrinsic versus stimulus-driven. I'm not asking that the authors actually perform this analysis, as I don't think it's critical for the message of the manuscript, but it could be an interesting future direction. As the authors discuss on page 17, I also suspect there's something fundamentally shared between response time series and connectivity as they relate to functional topography (Busch et al., 2021) that drives part of the heritability effect.

      (2) The observation that regions with intermediate ISC have the largest differences between MZ, DZ, and UR is very interesting, but it's kind of hard to see in Figure 1B. Is there any other way to plot this that might make the effect more obvious? For example, I could imagine three scatter plots where the x- and y-axes are, e.g., MZ ISC and UR ISC, and each data point is a parcel. In this kind of plot, I would expect to see the middle values lifted visibly off the diagonal/unity line toward MZ. The authors could even color the data points according to networks, like in Figure 3C. (They also might not need to scale the ISC axis all the way to r = 1, which would make the differences more visible.)

      (3) On page 9, if I understand correctly, the authors regress the vector of ISC values across parcels out of the vector of heritability values across parcels, and then plot the residual heritability values. Do they center the heritability values (or include some kind of intercept) in the process? I'm trying to understand why the heritability values go from all positive (Figure 2A) to roughly balanced between positive and negative (Figure 2B). Important question for me: How should we interpret negative values in this plot? Can the authors explain this explicitly in the text? (I also wonder if there's a more intuitive way to control for ISC. For example, instead of regressing out ISC at the parcel/map level, could they go into a single parcel and then regress the subject-level pairwise ISC values out when computing the heritability score?).
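      On the centering question raised here, one arithmetic point is worth spelling out: if the regression across parcels includes an intercept, ordinary least-squares residuals are mean-zero by construction, so an all-positive heritability map necessarily becomes a residual map balanced between positive and negative values. A minimal sketch with synthetic vectors (not the study's data):

```python
import numpy as np

rng = np.random.default_rng(1)
isc = rng.uniform(0.1, 0.8, size=400)                    # per-parcel ISC, all positive
h2 = 0.2 + 0.3 * isc + rng.normal(scale=0.05, size=400)  # all-positive "heritability" map

X = np.column_stack([np.ones_like(isc), isc])            # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, h2, rcond=None)
resid = h2 - X @ beta                                    # residual heritability map

# With an intercept, residuals sum to zero: positive values mean "more heritable
# than ISC alone predicts", negative values mean less.
print(resid.mean(), resid.min(), resid.max())
```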

      (4) On page 4 (line 155), the authors say "we shuffled dyad labels"- is this equivalent to shuffling rows and columns of the pairwise subject-by-subject matrix combined across groups? I'm trying to make sure their approach here is consistent with recommendations by Chen et al., 2016. Is this the same kind of shuffling used for the kinship matrix mentioned in line 189?

      (5) I found panel A in Figure 4 to be a little bit misleading because their parcel-wise approach to hyperalignment won't actually resolve topographic idiosyncrasies across a large cortical distance like what's depicted in the illustration (at the scale of the parcels they are performing hyperalignment within). Maybe just move the green and purple brain areas a bit closer to each other so they could feasibly be "aligned" within a large parcel. Worth keeping in mind when writing that hyperalignment is also not actually going to yield a one-to-one mapping of functionally homologous voxels across individuals: it's effectively going to model any given voxel time series as a linear combination of time series across other voxels in the parcel.

      (6) I believe the subjects watched all different movies across the two days, however, for a moment I was wondering "are Day 1 and Day 2 repetitions of the same movies?" Given that Day 1 and Day 2 are an organizational feature of several figures, it might be worth making this very explicit in the Methods and reminding the reader in the Results section.

      References:

      Busch, E. L., Slipski, L., Feilong, M., Guntupalli, J. S., di Oleggio Castello, M. V., Huckins, J. F., Nastase, S. A., Gobbini, M. I., Wager, T. D., & Haxby, J. V. (2021). Hybrid hyperalignment: a single high-dimensional model of shared information embedded in cortical patterns of response and functional connectivity. NeuroImage, 233, 117975. https://doi.org/10.1016/j.neuroimage.2021.117975

      Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259. https://doi.org/10.1016/j.neuroimage.2016.05.023

      Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Communications, 7, 12141. https://doi.org/10.1038/ncomms12141

    2. Reviewer #2 (Public review):

      Summary:

      The authors attempt to estimate the heritability of brain activity evoked from a naturalistic fMRI paradigm. No new data were collected; the authors analyzed the publicly available and well-known data from the Human Connectome Project. The paper has 3 main pieces, as described in the Abstract:

      (1) Heritability of movie-evoked brain activity and connectivity patterns across the cortex.

      (2) Decomposition of this heritability into genetic similarity in "where" vs. "how" sensory information is processed.

      (3) Heritability of brain activity patterns, as partially explained by the heritability of neural timescales.

      Strengths:

      The authors investigate a very relevant topic that concerns how heritable patterns of brain activity among individuals subjected to the same kind of naturalistic stimulation are. Notably, the authors complement their analysis of movie-watching data with resting-state data.

      Weaknesses:

      The paper has numerous problems, most of which stem from the statistical analyses. I also note the lack of mapping between the subsections within the Methods section and the subsections within the Results section. We can only assess results after understanding and confirming the methods are valid; here, however, Methods and Results, as written, are not aligned, so we can't always be sure which results are coming from which analysis.

      (A) Intersubject correlation (ISC) (section that starts from line 143): "We used non-parametric permutation testing to quantify average differences in ISC for each parcel in the Schaefer 400 atlas for each day of data collection across three groups: MZ dyads, DZ dyads, and unrelated (UR) dyads, where all UR dyads were matched for gender and age in years." ... "some participants contributed to ISC values for multiple dyads (thus violating independence assumptions)"

      This is an indirect attempt to demonstrate heritability. And it's also incorrect since, as the authors themselves point out, some subjects contribute to more than one dyad.

      Permutation tests don't quantify "average differences", they provide a measure of evidence about whether differences observed are sufficient to reject a hypothesis of no difference.

      Matching subjects is also incorrect as it artificially alters the sample; covarying for age and sex, as done in standard analyses of heritability, would have been appropriate.

      It isn't clear why the authors went through the trouble of implementing their own non-parametric test if HCP recommends using PALM, which already contains the validated and documented methods for permutation tests developed precisely for HCP data.

      The results from this analysis, in their current form, are likely incorrect.

      (B) Functional connectivity (FC) (section that starts from line 159): Here the authors compute two 400x400 FC matrices for each subject, one for rest and one for movie-watching, then correlate the correlations within each dyad, then compare the average correlation of correlations for MZ, DZ, and UR dyads. In addition to the same problems as the previous analysis, here it is not clear what is meant by "averaging correlations [...] within a network combination". What is a "network combination"? Further, to average correlations, they need to be r-to-z transformed first. As with the above, the results from this analysis in its current form are likely incorrect.
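      The r-to-z point is standard practice: correlations are bounded at ±1 and non-additive, so the usual approach is to average them in Fisher z space (z = arctanh(r)) and transform the mean back. A sketch with illustrative values:

```python
import numpy as np

def average_correlations(rs):
    """Average correlation coefficients via Fisher r-to-z and back."""
    z = np.arctanh(np.asarray(rs, dtype=float))  # r-to-z transform
    return float(np.tanh(z.mean()))              # back-transform the mean

rs = [0.2, 0.5, 0.9]
naive = float(np.mean(rs))         # naive arithmetic mean of r values
fisher = average_correlations(rs)  # Fisher-z mean
print(naive, fisher)
```

      The two disagree whenever strong correlations are present, which is why averaging raw r values is problematic, as the reviewer notes.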

      (C) ISC and FC profile heritability analyses (section that starts from line 175): Here, the authors first use a valid method remarkably similar to the old Haseman-Elston approach to compute heritability, complemented by a permutation test. That is fine. But then they proceed with two novel, ill-described, and likely invalid methods to (1) "compare the heritability of movie and rest FC profiles" and (2) "determine the sample size necessary for stable multidimensional heritability results". For (1), they permute, seemingly under the alternative, rest and movie-watching timeseries; for (2), they drop subjects and estimate changes in the distribution.

      Method (1) might be correct, but there are items that are not clearly described, so the reader cannot be sure of what was done. What are the "153 unique network combinations"? Why do the authors separate by day here, whereas the previous analyses concatenated both days? Were the correlations r-to-z transformed before averaging?

      Method (2) is also not well described, and in any case, power can be computed analytically; it isn't clear why the authors needed to resort to this ad hoc approach, the validity of which is unknown. If the issue is the possibility that the multidimensional phenotypic correlation matrix is rank-deficient, it suffices that there are more independent measurements per subject than the number of subjects.

      (D) Frequency-dependent ISC heritability analysis (from line 216): Here, the authors decompose the timeseries into frequency bands, then repeat earlier analyses, thus bringing here the same earlier problems and questions of non-exchangeability in the permutations given the dyad pattern, r-to-z transforms, and sex/age covariates.

      (E) FC strength heritability analysis (from line 236): Here, the authors use the univariate FC to compute heritability using valid and well-established methods as implemented in SOLAR. There is no "linkage" being done here (thus, the statement in line 238 is incorrect in this application). SOLAR already produces SEs, so it's unclear why the authors went out of their way to obtain jackknife estimates. If the issue is non-normality, I note that the assumption of normality is present already at the stage in which the parameters themselves are estimated, not just the standard errors; for non-normal data, a rank-based inverse-normal transformation could have been used. Moreover, typically, r-to-z transformed values tend to be fairly normally distributed. So, while the heritabilities might be correct, the standard errors may not be (the authors don't demonstrate that their jackknife SE estimator is valid). The comparison of h2 between dyads raises the same questions about permutations, age/sex covariates, and r-to-z transforms as above.
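      For context, the leave-one-out jackknife SE under discussion has the generic form below. The sketch applies it to the sample mean, for which the jackknife SE provably equals the usual analytic standard error; whether the estimator is valid for SOLAR h2 estimates under the dyad structure is exactly what the reviewer says is undemonstrated.

```python
import numpy as np

def jackknife_se(data, statistic):
    """Leave-one-out jackknife standard error of `statistic` on 1-D data."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

rng = np.random.default_rng(2)
x = rng.normal(size=50)
se_jack = jackknife_se(x, np.mean)
se_exact = x.std(ddof=1) / np.sqrt(len(x))  # analytic SE of the mean
print(se_jack, se_exact)  # identical for the mean, by algebraic identity
```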

      (F) Hyperalignment (from line 245): It isn't clear at this point in the manuscript in what way hyperalignment would help to decompose heritability into "where vs. how" (from the Abstract). That information and references are only described much later, from around line 459. The description itself provides no references, and one cannot even try to reproduce what is described here in the Methods section. Regardless, it isn't entirely clear why this analysis was done: by matching functional areas, all heritabilities are going to be reduced because there will be less variance between subjects. Perhaps studying the parameters that drive the alignment (akin to what is done in tensor-based and deformation-based morphometry) could have been more informative. Plus, the alignment process itself may introduce errors, which could also reduce heritability. This could be an alternative explanation for the reduced heritability after hyperalignment and should be discussed. An investigation of hyperalignment parameters, their heritability, and their co-heritability with the BOLD phenotypes can inform on this.

      (G) Relationships between parcel area and heritability (from line 270): As under F), how much the results are distorted likely depends on the accuracy of the alignment, and the error variance (vs heritable variance) introduced by this.

      (H) Neural timescale analyses (from line 280): Here, a valid phenotype (NT) is assessed with statistical methods with the same limitations as those noted previously (exchangeability of dyads, age/sex covariates, and r-to-z transforms). NT values are combined across space and used as covariates in "some multivariate analyses". As a reader, I really wanted to see the results related to NT, something as simple as its heritability, but these aren't clearly shown, only differences between types of dyads.

      (I) Significance testing for autocorrelated brain maps and FC matrices (from line 310): Here, the authors suddenly bring up something entirely different: reliability of heritability maps, and then never return to the topic of reliability again. As a reader, I find this confusing. In any case, analyses with BrainSMASH with well-behaved, normally distributed data are ok. Whether their data are well behaved, or whether they ensured that the data would be well behaved so that BrainSMASH is valid, is not described. Why Spearman correlations are needed here rather than Mantel tests, and whether the 1000 "surrogate" maps are valid realizations of the data under the null, remain undemonstrated.

      (J) Global signal was removed, and the authors do not acknowledge that this could be a limitation in their analyses, nor offer a side analysis in which the global signal is preserved.

      (K) FDR is used to control the error rate, but in many cases, as it's applied to multiple sets of p-values, the amount of false discoveries is only controlled across all tests, but not within each set. The number of errors within any set remains unknown.

      (L) Generally, when studying the heritability of a trait, the trait must be defined first. Here, multiple traits are investigated, but are never rigorously defined. Worse, the trait being analyzed changes at every turn.

    3. Reviewer #3 (Public review):

      Strengths:

      It's somewhat novel to study the heritability of movie-watching fMRI data, and the methodology the authors used supports their findings. The figures are nicely organized and plotted. The authors found that sensory processing in the human brain is under genetic control through stable aspects of brain function (here, neural timescales and resting-state connectivity).

      Weaknesses:

      What I am worried about most is the sample size and interpretation of heritability.

      (1) Figure 1. I assumed that the authors just calculated the ISC within each group (MZ, DZ, and UR). Of course, you can get different variations between each group. Therefore, there is heritability. Why not calculate ISC across the whole sample, then separate MZ, DZ, and UR?

      (2) Heritability scores in the paper are sort of small. If the sample size is small, please consider p-values, which will tell more about the trustworthiness of your heritability.

      (3) I don't understand the high-frequency signals in fMRI data. They are always regarded as noise, band 1 here in particular.

      (4) The statement "we show that the heritability of brain activity patterns can be partially explained by the heritability of the neural timescale" should come from Figure 5. However, after controlling for NT, the heritability decreased by at most 0.025 in temporal areas. I am not sure this change supports the statement. If the visual cortex were outlined and the ISC changes in the visual cortex were taken into account, this would be at least partly answered. Instead of delta h2, showing the new model's h2 would be clearer to readers.

      (5) In Figures 7 and 8, when taking the difference of heritability, please also consider the standard errors of the heritability estimates. Then you can compare across networks/regions.

      (6) I think movie vs. resting state is a really important result in this paper. However, there is almost no discussion of it. Discussing this part would be more beneficial for understanding the genetic control over neuronal arousal and excitation circuits.

    1. A meme is a piece of culture that might reproduce in an evolutionary fashion, like a hummable tune that someone hears and starts humming to themselves, perhaps changing it, and then others overhearing next. In this view, any piece of human culture can be considered a meme that is spreading (or failing to spread) according to evolutionary forces. So we can use an evolutionary perspective to consider the spread of:

      It's rather interesting to think about memes in the context of evolutionary theory, especially given the dialogue I've heard circulating about the internet's "attention span" and how things don't typically last very long. This just seems to be an add-on to that notion, which is interesting to think about. In this way, it would be rather interesting to study the specific "traits" of a meme that make it more likely to last - especially age-old memes like the rick roll.

    1. It's interesting how different atmospheres and the presence of certain individuals in the class can really determine how someone acts and feels in the room. I see a lot of hurt in these halls ... kids struggling, being harassed, ridiculed, teased .... It appears that the courses in school aren't really the hardest part about it. And the material taught in classes is probably the least of what is learned within these walls. But what kids learn, is it helping them or pulling them apart? School is more of a war zone - a place to survive.

      This section hits hard—it really shows that the biggest challenges in school aren't about academics, but about emotional survival. The line “school is more of a war zone—a place to survive” really stuck with me. It’s a reminder that students often face way more than just tests and homework—they’re navigating bullying, pressure, and identity. I remember in middle school, there was a kid in our class who barely spoke, and only years later did we find out he was being bullied constantly. No one noticed because we were all too busy surviving our own stuff. This makes me think schools should focus more on emotional safety, not just grades.

    2. In the central drama of adolescence, high school is when girls and boys begin their quests to develop an adult identity. They experiment with different roles in ways that bewilder their parents. As one mother told us: My daughter comes home from school exhausted. She doesn't say much usually, but yesterday she couldn't stop talking. First she was so happy because the cast names were up for the play and she got a part.

      This part beautifully shows how high school is more than just academics—it’s really where teens start trying out who they want to become. The phrase “experiment with different roles” totally makes sense; one day you’re shy and quiet, the next you’re suddenly in the school play or trying out for student council. It reminds me of my friend who used to be super introverted, but after she joined the drama club, she started to shine. She told me it felt like she finally found a version of herself that made sense—and that’s what high school is often about: testing things out until something clicks.

    3. Today's teenagers, both girls and boys, report that although they have many friends, they lack intimate, close friends. Teenagers say that there is no one that they can really confide in, no one with whom to share their deepest thoughts. In the midst of a crowd, they feel alone. It is a disturbing admission.

      This is definitely how a lot of students today feel, especially during and after the Covid pandemic, which has left a gap for many young people. Social media adds to this, where perceptions of who and how cool people are come through a moderated lens. Also, truly finding someone to click with when everyone is figuring out who they are is hard, and even if you do find that, it can shift again post-graduation. From my own experience, even with the friends I connected with and still have today from high school, I didn't truly know them or feel like they knew me until we started college. I'm not really sure why that is, but maybe deep friendship that young is just rare? Or maybe it's a deeper issue.

    4. "Nothing" ranked higher than "classes I'm taking" and "teachers."

      It makes me sad that a majority would remember "nothing" over a teacher they had or a positive class experience. It makes it so clear how impactful mentors and curriculum could be - but are they falling short? Or is it just that students of that age would never focus on that part? I hope that we shift so school is seen as a positive instead of something forgotten.

    1. The governing ideology of the far right in our age of escalating disasters has become a monstrous, supremacist survivalism. It is terrifying in its wickedness, yes. But it also opens up powerful possibilities for resistance. To bet against the future on this scale – to bank on your bunker – is to betray, on the most basic level, our duties to one another, to the children we love, and to every other life form with whom we share a planetary home.

      To me, it's sad that it has even gotten this far. The future itself is honestly a mystery. What will happen next? The only ideology to have is to just sort of survive and find a way to stick through it.

    1. Of course, after all of this discussion of making, it’s important to reiterate: the purpose of a prototype isn’t the making of it, but the knowledge gained from making and testing it. This means that what you make has to be closely tied to how you test it. What aspect of the prototype is most critical to the test, and therefore must be high fidelity? What details can be low-fidelity, because they have less bearing on what you’re trying to learn? Who will you test it with, and are they in a position to give you meaningful feedback about the prototype’s ability to serve your stakeholders’ needs? As we will discuss in the coming chapters, these questions have their own complexity.

      I've never considered making just some aspects of a design high fidelity while leaving other details low. I think these questions are great for identifying the priorities of a design. I will definitely be using them to discern which aspects of a prototype need more detail.

    2. Building things takes a long time and is very expensive, and usually much more than anyone thinks. Don’t spend 6 months engineering something that isn’t useful.

      I really liked how Chapter 6 emphasized that prototyping is all about learning and making decisions, not just about quickly building something. I agree that it’s easy to fall into the trap of thinking you should just start coding, but prototyping first can save so much time and effort by helping you test your ideas early. The range of prototyping methods described, from sketches to Wizard of Oz, was especially useful, and it made me realize how flexible and creative the design process can be.

    3. This allows you to have someone pretend to use a real interface, but clicking and tapping on paper instead of a screen

      I think this is a very unique idea that I've never thought about doing. I can definitely see the value though. I think knowing exactly how a product is supposed to work in the first place, even if it doesn't work yet, can help a lot when designing. It's also just using pen and paper, which is a lot less expensive than producing an actual product. This is especially important when you're designing a solution that's in an earlier stage of design, as it prevents time and resources from being spent on something that may or may not work.

    4. As you can see, prototyping isn’t strictly about learning to make things, but also learning how to decide what prototype to make and what that prototype would teach you.

      This insight really resonates with me, especially thinking back to a hackathon I participated in recently. My team and I jumped into coding too quickly, thinking we had a solid solution. But halfway through, we realized we didn’t fully understand our users’ needs or the pain points we were trying to address. Looking back, a simple low-fidelity prototype, even just sketches, could have helped us clarify what problem we were solving and saved hours of backtracking. This quote reframes prototyping not just as a technical skill but as a thinking tool, one that helps guide decisions, not just execution. It reminds me that in any design process, it's just as important to ask why you're building something as it is to ask how you'll build it.

    5. Don’t spend 6 months engineering something that isn’t useful.

      This really makes me think about how easy it is to underestimate the time and cost of building a product. I can see how tempting it is to just jump in and start building. It’s a good reminder to slow down and make sure the idea is actually useful before investing a lot of effort.

    1. 12.1. Evolution and Memes

      The trio of replication, variation, and selection immediately made me think of TikTok trends like when someone copies a video (that's replication), gives it their own spin (that's variation), and then the algorithm pushes the versions that get the most likes (that's selection!!). It’s crazy how these cultural memes really do evolve just like living things. I wonder how the fitness of a meme changes when it jumps from one platform to another...
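      The replication/variation/selection loop described above is the core of any evolutionary algorithm, and it can be sketched in a few lines of Python. This is a toy illustration only (the meme strings, `TARGET`, and the fitness function are all invented for the example, not taken from the chapter):

      ```python
      import random

      random.seed(42)  # fixed seed so the toy run is repeatable

      def mutate(meme):
          """Variation: give a copied meme 'its own spin' by tweaking one character."""
          i = random.randrange(len(meme))
          return meme[:i] + random.choice("abcdefghijklmnopqrstuvwxyz ") + meme[i + 1:]

      def evolve_memes(population, fitness, generations=10, copies=3, mutation_rate=0.3):
          """Run replication -> variation -> selection for a number of generations."""
          size = len(population)
          for _ in range(generations):
              # Replication: every meme gets copied several times.
              offspring = [meme for meme in population for _ in range(copies)]
              # Variation: some copies get remixed.
              offspring = [mutate(m) if random.random() < mutation_rate else m
                           for m in offspring]
              # Selection: only the highest-fitness versions survive.
              population = sorted(offspring, key=fitness, reverse=True)[:size]
          return population

      # Hypothetical fitness: how closely a meme matches what the platform promotes.
      TARGET = "dance trend"

      def fitness(meme):
          return sum(a == b for a, b in zip(meme, TARGET))

      start = ["xxxxx xxxxx"] * 5
      result = evolve_memes(start, fitness, generations=40)
      print(result[0])  # the population drifts toward whatever the 'algorithm' rewards
      ```

      Changing `fitness` mid-run is one way to play with the question above: a meme that scores well on one platform can suddenly score poorly when the selection pressure changes.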

  11. social-media-ethics-automation.github.io
    1. Evolution of cetaceans. November 2023. Page Version ID: 1186568602. URL: https://en.wikipedia.org/w/index.php?title=Evolution_of_cetaceans&oldid=1186568602 (visited on 2023-12-08).

      It’s hard for me to imagine that whales and dolphins used to be land animals. Reading this, I learned that a long time ago they had legs and could walk around on the ground, just like dogs or cows do now. But over millions of years, the environment around them changed and they needed to adapt: they grew flippers and started to swim instead of walking. I was fascinated by reading about the process of how they slowly changed to live in the ocean. I’m really curious about how exactly they made that big change, and it makes me wonder what other amazing evolutions happened that I don’t know about!

    1. Fundamentally, no meme is an island. "A text that just spreads well, and a lot of people see it, is not a meme," says Shifman. "It's viral. But if a lot of people create their own versions then it becomes a group of texts and then it's a meme."

      Limor Shifman

    1. I have never seen any form of create generative model output (be that image, text, audio, or video) which I would rather see than the original prompt. The resulting output has less substance than the prompt and lacks any human vision in its creation. The whole point of making creative work is to share one’s own experience - if there’s no experience to share, why bother? If it’s not worth writing, it’s not worth reading.

      I haven't been using LLMs much—my ChatGPT history is scant and filled with the evidence of months going by between the experiments I perform—but I recently had an excellent experience with ChatGPT helping to work out a response that I was incredibly pleased with.

      It went like this: one commenter, let's label them Person A, posted some opinion, and then I, Person B, followed up with a concurrence. Person C then appeared and either totally misunderstood the entire premise and logical throughline in a way that left me at a loss for how to respond, or was intentionally subverting the fundamental, basic (though not unwritten) rules of conversation. This sort of thing happens every so often, and it breaks my brain, so I sought help from ChatGPT.

      First I prompted ChatGPT to tell me what was wrong with Person C's comment. Without my saying so, it discerned exactly what the issue was, and described it correctly: "[…] the issue lies in missing the point of the second commenter's critique […]". I was impressed but still felt like pushing for more detail; the specific issue I was looking for ChatGPT to help explain was how Person C was breaking the conversation (and my brain) by violating Grice's maxims.

      It didn't end up getting there on its own within the first few volleys, even with me pressing for more, so I eventually just outright said that "The third comment violates Grice's maxims". ChatGPT responded with a level of sycophancy (and emoji-punctuated bulleted statements) that was a little too high, but it also went on to prove itself capable of being a useful tool in assisting with crafting a response about the matter at hand that I would not have been able to produce on my own and that came across as a lot more substantive than the prompts alone.

    1. Probably not, given Google’s long history of interpreting multilingual as serial monolingual (see this 2007 presentation at Google by Stephanie Booth pointing this same stuff out), ignoring that multilingual people tend to change languages throughout their activities even for just a single word or short phrase. (I don’t have Dutch, English or German days or topics, in the case of books I may want to find the German original of an English translation, or want to search for a specific thing in French because I know it exists, while also interested in any Dutch translation that might be available or the Italian original. My notes are always in multiple languages.)

      Multilingualism is never serial monolingual. I use my languages all mixed up: it's a tapestry in which different languages can provide different nuances, associations, emotions. A tapestry. As I wrote here, Google has never understood that throughout its history.

    1. the second moon of pure Chaos came into being. It is not subject to the universal laws held in such high regard by the Slann, for it orbits according to no fathomable pattern —a source of unending consternation to the Slann and Skink Priests who still look to the stars to read the future. The Slann Mage-Priests of Tlaxdan, and their Skink Priest attendants, have long pondered the impossible conundrum that is the Chaos Moon. They have dedicated much energy towards pushing it out of the world’s orbit, directing meteorites to strike it and hundreds of other ploys, yet still the fell moon plagues them.

      That is actually hilarious. I'm imagining this Chaos moon also takes on the guise of the regular moon just to fuck with those astronomers even more. Maybe it has a leering gobbo face on one side that it constantly hides away until it's too late. Majora's Mask style.


    1. Elizabeth R. Gordon Interviewed by Lilia Bierman TranscriptElizabeth R. Gordon Interviewed by Lilia Bierman00:00:00:00 - 00:00:37:24LILIA: Okay. I'm recording. ERG: Okay. As I'm scratching my head. Please edit that out. (Laughs)LILIA: (laughs) I will. Okay, our topic is on the transition from VCR, VHS, and DVD rentals to online streaming. The first question is, how old were you when VCR, VHS, and DVD became a thing, and later, when digital became a big thing? 00:00:37:24 - 00:01:05:21ERGSo, VCR, I was 14. Okay. DVD, I think, is probably like college. So maybe 21, 22. So that would have been like in 1993, but they still weren't affordable. Yeah. And then streaming. We probably didn't start streaming anything till about five years ago. I was in my late forties. 00:01:05:21 - 00:01:31:15LILIAOkay. What was your experience adapting to the transition to digital away from VHS, DVD, and VCR? And what did you think about these social changes?00:01:32:15 - 00:01:58:12ERGLike, when you have DVDs, when they get scratched, you would have to deal with that. And that was problematic. A lot of my videos are still on videotape. So my wedding is on tape. Oh my son, all his first moments are also on videotape.So I've got to get those transitioned—and then streaming and digital stuff. I mean like I said, because I came in the generation where we did not have personal computers in college. Everything has had to be self-taught. Luckily, my husband is very good about this, and he helps me out. But now I feel very confident in streaming and doing things like that and having apps on my phone—stuff like that.00:01:58:28 - 00:02:19:10Unknown(LILIA) Okay. (ERG) And then what was the second part of that. (Lilia) And what did you think about these social changes. (ERG) What do you mean by that. (LILIA)I mean it's just like how it it kind of ties into the next question, how it kind of changed your everyday lifestyle, if at all. 
If you noticed any changes, was it more difficult to adapt to.00:02:19:12 - 00:02:36:24ERGI mean, you made it easier because you didn't have to carry all this technology around. You have this I can stream Netflix on my phone now. And you don't have to keep up with X, Y and Z. It, I thought it made it very, it made it much easier and I definitely would not want to go backwards.00:02:38:18 - 00:03:09:11ERGBut I like my parents who are in their 80s. There's no way that they, they like the idea of probably have a Netflix or Amazon Prime, but there's no way that my dad could handle that. Yeah. He has a smartphone that, you know, it's, tech support. Yeah. Smartphone. LILIA Yep. I get it. Were there any challenges that you or others that you know, faced while adapting to these new technologies, whether it was learning it or just kind of want to throw your computer at the wall?00:03:09:16 - 00:03:30:01ERGYou know, because we didn't have any computer classes in high school. Yeah. I think they had one section. But the computers that we had or what we did, especially when I was in college, like I wanted C plus programing, I had never it was never taught like word processing Microsoft Word I learned how to type on a typewriter.00:03:30:22 - 00:03:51:21ERGSo again everything was self-taught. It was very hard to begin with and made me kind of nervous. I know a lot of people, think that they can mess something up and can't get it back, and, and there was a lot of anxiety, with that transition. But I feel, you know, again, like, I don't know everything.00:03:51:23 - 00:04:11:10ERGAnd I have children that can help me out, but, you know, I've had to learn a lot. My generation has had to learn a lot. Yeah. And most of us have adapted well, I think. Yes. I'm in Gen X, so that's 1965 to about 1980. And and we've learned a lot and adapted. You know. Yeah. The generation before us.00:04:11:12 - 00:04:38:29ERGNo they're not going to do that. No they're not. 
In retrospect what were the pros and cons of these shifts in technology. You can get more data on things. So I remember when I was writing my thesis in graduate school, and I was still we we didn't have a lot of memory on computers and had to save it on disks, and it took like 6 or 7 deaths and it would be awful.00:04:38:29 - 00:05:01:07ERGAnd then I'd have to get another. So that was extremely frustrating. You know, being able to have things that are quicker and easier to access and knowing that I've got more space and understanding what a megabyte is, what a gigabyte is, and the storage, that is a lot, lot more helpful. But again, I, I, I've enjoyed the technology push.00:05:01:07 - 00:05:26:12ERGThe one thing I don't like about it is that, I'm glad that I raised my children before this. Because I think that kids that are now being raised, a lot of them, you know, this is, this is shoved in their direction in order to occupy them and they're missing out on reading books. They're missing out on dealing with time that you just have to entertain yourself.00:05:26:12 - 00:05:42:26ERGLike going to the doctor's office. We always read books, or we always did stories, or we always just talked about our day. And now I see, you know, like a two year old or one year old, the doctor's office and the parent says this. Yep, yep. And that is just. And then again, you know, my students, I say it's constant.00:05:42:28 - 00:06:08:01ERGYeah. They can't cut it all. No. Like you got to be professional and put it aside and make eye contact. So it's all like that. Yeah. No, I totally agree. Looking back, what are the biggest lasting impacts of this shift? I just like the fact that you have more information that's accessible. 
You do have to decipher what is true and what's not true.00:06:08:02 - 00:06:29:26ERG Yeah, but, you know, if I have a question, instead of having to go to a library and find the book or and I would have I mean, I've taken graduate classes since the shift and my papers, I can find so much more information to write about. Because it's more accessible than half in a way on interlibrary loan or going over there and looking something up.00:06:29:28 - 00:06:54:27ERGSo I do like that quick access to information. I do like the portability of it. And I think that has really changed. And then I mean, things like exposure, like medical records. And when I make a doctor's appointment, the reminder will shift through my cell phone, or I'll shift through the app and then I can find out my test, my blood test for that rather quickly, and have to rely on somebody to call me and tell.00:06:55:00 - 00:07:04:29LILIAYeah, I totally agree. So I love all that. Yeah, it is very helpful. How would you describe this shift in one word?00:07:05:24 - 00:07:10:15ERGOne word?00:07:11:18 - 00:07:35:04ERGI think it's exciting. Yeah, I think it really is. I mean, again, I've embraced it because I've been forced to embrace it as an educator. As a parent. So I've everything about I've like except for again that this is just steering people away from having relationships. Yeah. And learning how to deal with, you know just empty time.00:07:35:04 - 00:07:56:10ERGYou've, you've got to, I think, a lot of parents are missing out on that. They definitely are. LILIAYeah, I totally agree. Do you miss VCR, VHS or DVD? And if so, what aspects specifically do you miss?00:07:56:13 - 00:08:19:09ERGCan't miss it if it's never gone. And I still have all my children's Pixar stuff. We lived on it. They had portable DVD players that would hook into the car. Yeah. We had 13-hour (car) rides to go with it. 
LILIAI mean, you can't argue about that.00:08:19:15 - 00:08:40:27ERGNo, you cannot, but no, I don't miss this at all. You know, I need to get the one thing that I'm really concerned about, which is that I need to get all my son's videos transferred over, and I'm about to send them to somebody. Yeah. And then my wedding video. I need to get that transferred into something. So, no, I don't miss it.00:08:40:29 - 00:09:01:17ERGNo, I still have a bunch, and I still have a DVD player. We got rid of the VCR a couple of years ago. Oh, maybe we haven't. So I can't watch my wedding videos anymore. But now I don't miss this at all. Okay, well that's fair. I don't blame you, since it does, and there's nothing in your computer, so, like.00:09:01:23 - 00:09:37:29ERGNo, I can't know. And there used to be some laptops where you could plug in CD's. Yeah, I remember that. And then like, you know, in the cars when I was 16, you had just, you had a radio and then you had a tape. And then like if you're real fancy, you had a plug in DVD and you plug in a CD player, but like when you went over a bob it was and then came you know they installed and I think my car right now it's like a 2016 I think it has a cassette and a DVD player.00:09:38:12 - 00:09:54:03ERGMay not have the cassette probably then, but yeah, it's just and then all that trying to figure out your song that you want, I mean it's just so much easier. Yeah. Just to plug something in or auto-connect it. It's fantastic. LILIAYeah. Okay. Well, that was all of my questions.Steven Hawk Interviewed by Colby Hawk TranscriptDr. Steven Hawk Interviewed by Colby Hawk00:00:00:00 - 00:00:28:08 Steven: Okay. Go ahead. You can introduce yourself. Yes. My name is Doctor Steven Hawk and I am a licensed K through 12 English teacher. And I've been teaching in the public schools for eight years now. Colby:  Cool. So, about how old were you? 
When, you know, you grew up with the, you know, VHS, VCR and everything, what was it like with that being a big thing back in the day?  00:00:28:08 - 00:00:48:04 Colby: What was your experiences with everyday life and having it having this technology?  Steven: Yeah. From, from the age where I was able to really watch movies, I was watching VHS tapes. So, I had a very small collection of VHS tapes and pretty much just rewatched the same 2 or 3 movies again and again and again and again.  00:00:48:04 - 00:01:06:24 Steven: As my mom would tell you, she would say, I wore out Land Before Time on VHS and Home Alone. Those are my two movies that I pretty much would play ‘em rewind ‘em, play ‘em, rewind ‘em. So as a child, that was my experience was just VHS tapes. You could go to a blockbuster and rent a VHS tape at that point.  00:01:06:26 - 00:01:29:22 Steven: But you owned very few and you were able to rent very few. If you were able to rent, it was usually like once a week. So, you didn't watch a lot of movies. And when you did, hopefully it was something you really liked, and you just watched it again and again and again.  Colby: Cool. Yeah. And having the technology and everything and, you know, the, you know, VHS mainly for you.  00:01:29:24 - 00:01:53:16 Colby: what was it like transitioning, to this digital, you know, internet age when you have iPhones in your pocket, MacBooks and streaming and all of that?  Steven: Yeah. So, the, the, the chain for me, was we went from VHS to DVD probably when I was about 13 years old, around 13. We, we had DVDs and that was a big deal.  00:01:53:19 - 00:02:15:11 Steven: And then DVDs evolved into Blu rays. So, the quality of the DVD DVDs got better. I remember it was my sophomore year of high school when MP3's became a thing. So no longer do we have to carry Walkmans to listen to music, but which is like a DVD, right? we transitioned to MP3's, and so the digital age kind of came upon us.  
00:02:15:15 - 00:02:42:09 Steven: It wasn't until I was probably 22 that I had my first iPhone. So growing up, you know, we didn't have internet for the most part of my life. We didn't have any kind of apps or streaming until I was in my probably early 20s. And so that was a huge change because of the amount of things that you could be, I guess, exposed to through streaming.  00:02:42:12 - 00:03:07:12 Steven: It went from having to have a physical copy of a movie or a disc for music to being able to just choose from a vast digital library of different genres and different artists, to then seek out things which isn't something you were able to do. No more than just going to blockbuster and looking through the shelves, could you really seek out different genres and different types of things.  00:03:07:12 - 00:03:29:03 Steven: So, it in a lot of ways it was very freeing because it introduced you to a lot of new things, and you were able to discover a lot of new, tastes, genres, artists, things like that. So, yeah, I would say I was probably about 22 when streaming really caught on in the United States.  00:03:29:05 - 00:03:49:05 Colby: Now, if when you were 22, when you were 22, you would have just gotten out of college. So when you were still at UTK, what was that like, you know, going, you know, if you wanted to go watch something with your friends or, you know, catch up on the newest whatever, what what was that experience like before you had access to all this?  00:03:49:06 - 00:04:11:11 Steven: Yeah. So it was still DVDs were still the thing. You know, when I was in college, we hadn't moved to streaming quite yet. We had the internet age where you were streaming games online with friends and multiplayer and stuff like that. But not really movies. Movies and TV were not mainstream stream. They were not streamed to the mainstream yet.  
00:04:11:14 - 00:04:33:23 Steven: And so for me, it was still going to the movies, you know, my friends and I, we would go to the movie theater if there was a movie coming out. You knew the release date and you would you would set a date and a time to go see the movie with your friends physically at a theater. So it wasn't like we stayed in our dorms or apartments and were able to stream the newest movie or TV show.  00:04:33:25 - 00:05:03:12 Steven: So, for me, that was it was still kind of what you would consider an old school experience. I know I've told you Facebook came out in 2005 when I first went to college. And, you know, so social media and the evolution of all streaming from internet, computer platforms, to digital media, for movies, and games, and music, that all really, you know, came mainstream after my college experience. Not during.  00:05:03:15 - 00:05:25:03 Colby: Now, the one big thing I think, and most everybody knows about right is blockbuster.  Steven  Yeah.  Colby  So, can you tell me a little bit more about your experiences with blockbuster? You know, was there like a membership program? Was there like certain deals that they had? What was it like going into one of these stores and renting and picking out your favorite flicks?  00:05:25:05 - 00:05:51:07 Steven: Yeah. If there was a membership program, I'm not aware. As a small child, I don't remember if there was a membership program. But what I do remember, and I tell people often, it was always like Christmas morning for me. I loved blockbuster. I think everyone kind of had the same experience where it was 1 or 2 times a week that you might be fortunate enough to go to a blockbuster and get to rent a new movie that you had never seen.  
00:05:51:10 - 00:06:09:23 Steven: It was usually a Friday night, and you've been going to school all week and you're just looking forward to Friday night, because that's the one time your parents get to take you to blockbuster and you walk in the store, and it was like toys R us. You have all these movies, and it was just the covers of the movies with a DVD behind it.  00:06:09:25 - 00:06:32:09 Steven: And if you wanted to watch that movie, you had to take the cover out of the way and see if the DVD was still left. And if there was no DVD, then someone had already rented that movie. And if there were enough left, then you got to take one home. But very often they'd already been rented, and so some, some nights you would go for a certain movie, a new release, and it wasn't there.  00:06:32:14 - 00:06:50:03 Steven: And you'd be a little bummed, but you would just go pick out another movie and you would be excited because you didn't get to watch movies, but maybe once or twice a week. like, at all. You didn't get to watch any more than 1 or 2 movies a week. And so, it was a big deal to watch a movie back then, and it was a lot of fun.  00:06:50:04 - 00:07:15:08 Steven: It was something you really look forward to for Monday. You look forward to getting to Friday and Saturday so you could watch a movie and, and so yeah. It was really special back then. And, it had its. Looking back, you could say it had its difficulties. Like I said, you know, the movie may not be there for you to rent, but we dealt with that disappointment really well, I think, and just say, hey, maybe it'll be back by tomorrow.  00:07:15:08 - 00:07:36:02 Steven: Maybe we could rent it on Saturday night. If not, maybe next week. That'll be the movie. So, you know, we didn't get mad about it. It was part of the deal when you went to blockbuster. So I feel like, you know, movies were so much more special back then because they were so much more rare, and they're not rare anymore.  
00:07:36:05 - 00:07:56:08 Steven: And so, you know, I miss I miss blockbuster, I miss the excitement of going into the store and the excitement of seeing if the DVD is still there and the excitement of taking it home and watching it. In the VHSs, you had to be kind and rewind is what you had to do. You know, you rewound the tape for the next person to use it.  00:07:56:15 - 00:08:14:18 Steven: When DVDs came along, it was special because you no longer had to rewind the movie. You could just return the disc. So that was a big deal for us. And then of course, as it moved to streaming, you could watch whatever you wanted whenever, you know, whatever day of the week. You didn't have to worry about rewinding or anything.  00:08:14:18 - 00:08:37:21 Steven: So, it was definitely an evolution. But, for me, blockbuster was really special. And not just blockbuster, but, you know, even Redbox later and, you know, any form of renting a movie during the week was really special.  Colby: Yeah. And, you're talking about how, you know, now it's not as you know, it's not special. You know, it's not, you know, you have easy access to everything.  00:08:37:21 - 00:09:10:19 Colby: And, kind of on that note, like looking back at your experiences having, you know, dealt with DVDs, VHS, all this stuff, and then having Disney+ and Netflix, and, whatever, Hulu, whatever. You know, how has that changed, like your lifestyle or, you know, just society today and, and like what what would you say or like in some of the pros and cons with having this easy access through, you know, the internet or whatever, you know.  00:09:10:24 - 00:09:35:04 Steven: Yeah. Definitely, it's a double edged sword. To kind of go back to say, Netflix started as a DVD subscription process, and then that turned into a digital streaming process. I didn't jump into that process, probably for a couple of years into when Netflix became a digital subscription service. Netflix was the first one that I subscribed to.  
00:09:35:06 - 00:09:54:08 Steven: It was fairly cheap, and I thought, hey, this seems pretty neat, and I gave it a try. And that was my first foray into the digital streaming world. And I enjoyed it. You know, my first experience was, or my first thought was this, this is nice. This is a lot better than having to, you know, get out of my house and drive to a store and it may or may not be there.  00:09:54:08 - 00:10:20:06 Steven: And so, there were some pros there. There were some benefits to that process. But I think as time went on, and this is a year's process, right? As more and more things started to become, digital based, streaming based platforms, news, TV, movies, eventually, taking you out of the theater, even, and just leaving you in your living room.  00:10:20:08 - 00:10:50:07 Steven: Then the layers with Covid. You know, people not getting out of their house. They marketed streaming really heavily during the Covid years, and the years to follow Covid, as something to keep you safe. So it was a marketing ploy to really get you to binge watch and stream. So like I said, it became over time, I believe more of a negative thing had a negative impact on my life because it's so addictive.   00:10:50:09 - 00:11:27:02 Steven: Right? That word binge is probably not a positively connotated word in any other setting. If you binge on food per se, that would not be good. But to binge on Netflix has been marketed as a culturally positive thing. It's something that's good to do. And while it may seem good and may seem fun, and you may find a show or, you know, a series of shows that have five, seasons, and you can watch all of them in a matter of two weeks, I’m not sure that that’s healthy.   00:11:27:10 - 00:11:53:13 Steven: And, in my own life, personally, I think, I think it has had a negative impact to be totally honest. It’s much easier after a hard day of work to go to my bedroom and shut the door away from my kids and silence the house and just consume right? 
To not give anymore, but to just consume, to binge.   00:11:53:15 - 00:12:16:00 Steven: And that's not good. And I know that that's not good. And so, I feel like now I'm having to self-police. I'm having to say this much is okay, but this much is dangerous. This is not good, not healthy. And so, there's it's a fine line. I'm not exactly sure where the line is now because it's all an evolving process.    00:12:16:02 - 00:12:54:07 Steven: But for me personally, I know it's taking time from my kids, taking time from me reading books and things that I used to do more of, perhaps taking time away from, you know, talking to my wife and communicating. Giving myself a pass when things have been difficult to just sit there and binge and to stream. So, while there have been good things, I think you are, you're probably, kind of like the genres of music. You’re able to discover more through streaming, things that you didn't know existed or things that you didn't know perhaps you were interested in.  00:12:54:10 - 00:13:20:01 Steven: But the negative effect, I think, perhaps outweighs the positive. And that's just my experience. I know some people would disagree.  Colby: Yeah, there's a lot of differing opinions on, streaming and everything. And I think, I mean, I don't even have time to binge these days anymore, which is probably a good thing.  Steven: Yeah, I think so.  Colby: So we talked, you know, you touched on, like, the society and the shift and changes.  00:13:20:01 - 00:13:51:08 Colby: That was very good. With online and all that. Were there any, I guess, you kind of talked about this maybe a little bit, but like any challenges that you or any others that you observed or faced with this challenge of going away from, you know, more analog, whatever, to digital?   Steven: Yeah. I mean, nothing, nothing dramatic or drastic, but I think the first challenge was, of course, going from DVD to streaming because we were in an in-between stage there for a while.  
00:13:51:13 - 00:14:07:23 Steven: You had streaming apps out there, and you had Netflix and things that you could, you know, sign up for and partake of, but it's like you kind of had a toe in that world, but you were still stuck to DVDs and you rented from, you know, once blockbuster went out, it was Redbox or, you know, stuff like that.  00:14:07:23 - 00:14:30:20 Steven: And then when I went full into streaming, then, I guess the challenge is, you know, part of it's financial, to be totally honest. You're paying for things regularly that you didn't used to pay for, you know. Monthly, at a minimum, people are probably paying for one streaming app. Lots of people are paying for five or more streaming apps.  00:14:30:22 - 00:14:57:01 Steven: So what used to be free through cable is now charged through apps. So that's been a struggle. Just a financial struggle is like, where's the line between what's an appropriate amount to spend on this form of entertainment and what's not? What's healthy, what's not? I know this was not for me, but for some elderly people, there was a huge problem trying to transition to the digital streaming apps.  00:14:57:01 - 00:15:19:13 Steven: And, you know, they had their TVs that they liked, but they weren't smart TVs. So, you know, they had to figure out that they needed a new TV and how to work a new remote and how to download apps and work apps. And that wasn't a problem for me. But I did deal and try to help a lot of elderly people through that transition process to understand how to stream content.  00:15:19:16 - 00:15:40:17 Steven: But for me, you know, like I said, it was just kind of a learning phase, then followed by a self-policing phase of: what do I need and what do I not need? Because everyone who develops a streaming app tells you that you need it. And it's kind of hard to select the right service, you know? Do you go with Hulu?  
00:15:40:17 - 00:15:59:22 Steven: Do you go with, you know, Comcast? Which one do you go with? There are just so many to choose from that I had to do my research before I landed on the one that I would pay for. Yeah.  Colby: So I think we've already talked about, like, looking back, what were the big impacts on that.  00:15:59:22 - 00:16:29:29 Colby: I think we already touched that.  Steven: Yeah.  Colby: How would you describe that shift in one word? Or, like, actually three things. How do you describe the shift? The time before, like the VHS, DVDs, all that. And then the time now, after this shift. Like three, I know I upped it, but three.  Steven: Yeah. I would say for the time past, nostalgic. Nostalgic is my word because I miss it.  00:16:30:01 - 00:16:51:15 Steven: It's something you didn't know that you would miss when it went away. There was sadness when blockbuster went out of business, but there was also an acceptance that this is just the new way of things. And sometimes the more we get into the new way, the more I wish it could become the old way.  00:16:51:18 - 00:17:19:01 Steven: So nostalgic would be that one. For the transition, I would say exciting would be the word I would use for that. I can remember being the only high schooler, on the way to a baseball game, with a new iPod that streamed. Or not streamed, but, you know, had the MP3 downloaded music that I could just select from a playlist, while all my friends had a Walkman disc that would skip if, you know, they didn't hold it right.  00:17:19:01 - 00:17:47:03 Steven: And so for me, it was exciting. It was a new frontier. It was a new challenge to learn the technology of it. What was the last question? For now, I would say the word is dangerous. For the reasons I've stated already, you know, mainly the social reasons. What is marketed to us is that we, again, should binge these things.  
00:17:47:09 - 00:18:15:27 Steven: We need these things. We can't live without these things. There's a lot of clever marketing that goes into it, and a lot of people that are persuaded by that marketing, including me to some extent. Right. Because I stream. I do watch shows, and a lot of it, a lot more than I used to. What used to be one movie a week has turned into ten movies a week. And 20 episodes a week. And that's dangerous.  00:18:15:28 - 00:18:38:02 Steven: It's dangerous because it's taking me from things that are more important. And it's giving me a pass when I'm tired to say I don't have to struggle with difficult things. I can just. I deserve this. To just sit quietly in my room, away from my children, away from my wife, away from whomever, and reward myself. I think that's a dangerous notion.  00:18:38:04 - 00:18:50:15 Steven: So dangerous, I think, would be the word.  Colby: Cool. Yeah. And then. Yeah, my battery's giving me the warning. I think I've got 1 or 2. One more question.  00:18:50:15 - 00:19:10:24 Colby: Okay, so that two-part thing, I guess if you could give me one more comment, like, do you miss it? You know, do you miss the VHS? You know, rewinding and, you know, having, you know, all that, the blockbuster. And what do you. What, if anything, would you change today? And then what were your favorite, you know, tapes? Or your.  00:19:10:28 - 00:19:34:01 Steven: Yeah. Yeah. Yeah. So I mentioned earlier, my two favorites when I was young were Land Before Time. The original Land Before Time. The first one. Petrie, Longneck, and all the Sharptooth. That was, I've watched that on repeat, I think. And, and then later, when I was a little older, it was Home Alone, the original Home Alone with Macaulay Culkin. And I just thought that was hilarious.  00:19:34:04 - 00:19:53:05 Steven: It's kind of slapstick humor, you know? And so those are the two that were my favorites. As far as, you know, do I miss it? Absolutely. 
I miss the way things were, because I think I miss the way I was, and my family was, and other people were. That's what I miss. It's not that I miss blockbuster itself.  00:19:53:07 - 00:20:21:08 Steven: I miss the type of world that we lived in when we still had a blockbuster. When movies were still special. I didn't say earlier, but you know, I'm a ninth-grade high school teacher, and when I was young and we had a special movie day, that was like the best day ever. And so, as a teacher, I thought, hey, when they've really worked hard, I'm going to give them a special movie day occasionally, because I loved that when I was young. And I tried that.  00:20:21:11 - 00:20:45:07 Steven: And I've learned that you can't get these kids to focus on a movie anymore. They're so desensitized. They're so overstimulated. They won't even watch a movie anymore. They don't care about movies anymore. I miss how much people cared about movies. So, yeah, I miss it. It's not that I miss VHS again. It's just I miss the way people were.  00:20:45:10 - 00:21:03:00 Steven: And I don't think we can ever get that back. I think we're too far away from that. I don't think we get back to that. So as far as the second part, you know, what would I change if I could change something? What I would want to change, I don't think I have the power to change.  00:21:03:02 - 00:21:23:03 Steven: I want families to sit together on a couch on a Friday night, like I did, with a couple pizzas and a show, and watch it together, and laugh together, and have time together like family should. That's what I want to happen. But I can't make that happen for other people. I can try to make it happen in my home.  00:21:23:05 - 00:21:47:25 Steven: And, and I've been trying to do that more, you know? I've been consciously trying to do that more in my own home. But I can't do it for other people. 
And so, what I'm seeing in our culture is a shift away from loving one another, from spending time, quality time, together, and toward giving ourselves, as parents, a pass on spending time with our kids.  00:21:47:25 - 00:22:08:07 Steven: And sometimes, even on parenting our kids. Because it's easier just to put them in front of an iPad or a TV screen and just let them watch a movie than it is to discipline, or to ask them how their day was, or to troubleshoot things in their lives, or to help them with their math homework.  00:22:08:09 - 00:22:28:24 Steven: It's easier just to let them stream something. So I don't know how we fix that, Colby. That's something that I've thought about a lot lately. How do we, as a society, as a culture, get back to at least some part of what we used to be when blockbuster still existed? I don't know, I don't know the answer to that.  00:22:28:24 - 00:22:52:17 Steven: I think it's a question that people have to challenge themselves with personally. They have to know who they are, what they've become, what they want to be, and then find a way to find that middle ground between what's enough streaming and what's too much streaming for themselves as parents, as adults, and also for their children.  00:22:52:19 - 00:23:00:15 Steven: And I just don't have a good answer to that, even though I wish I could.  Colby: Sweet. That was a very good answer.

Paul Navis, Interviewed by Cole Kennedy, Transcript

      Good job running the interviews as conversations rather than spitting the questions out without any follow-up questions! I also appreciate that the transcripts were cleaned up and made easier to navigate.

    1. If run inside an activated virtual environment, pelican-quickstart will look for an associated project path inside $VIRTUAL_ENV/.project.

      I thought this was some file that'd tell Pelican how to structure the project during quick-start, but no -- it seems like it's just a file to tell Pelican where to put the project's files, when I invoke the pelican-quickstart command in some other directory. Since I'm using venv, this isn't that interesting.
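My rough mental model of that lookup, as a sketch (this mimics the behavior described in the docs, not Pelican's actual code; `suggested_project_path` is a name I made up for illustration):

```python
import os


def suggested_project_path(fallback):
    """Hypothetical sketch of the lookup described above: if a virtual
    environment is active and its directory contains a .project file,
    treat that file's first line as the place to put the project's
    files; otherwise fall back to the given directory. Not Pelican's
    actual implementation."""
    venv = os.environ.get("VIRTUAL_ENV")
    if venv:
        marker = os.path.join(venv, ".project")
        if os.path.isfile(marker):
            with open(marker) as fh:
                path = fh.readline().strip()
            if path:
                return path
    return fallback
```

So, with a venv active, something like `echo ~/sites/blog > "$VIRTUAL_ENV/.project"` before running `pelican-quickstart` should make it suggest that directory as the default instead of the current one.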

    1. And is it true that you enjoy taking typewriters apart and putting them back together? Doherty : Of course. It’s just that I’m so attracted to these objects that I want to look inside them, to understand how they work. It’s a form of learning, for me it would be a dream to have a shop specializing in repairing typewriters. Maybe one day… Among the many, I love the Valentine model by Olivetti, sooner or later I absolutely want to visit Ivrea.

      The Olivetti Valentine is one of Pete Doherty's favorites.

    1. Andrej’s original tweet in full (with my emphasis added): There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding—I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

      The intent was never serious software dev. Instead, quick weekend projects.

  12. Apr 2025
    1. When designers and programmers don’t think to take into account different groups of people, then they might make designs that don’t work for everyone.

      I think that this is incredibly important to think about and to highlight. I think the expectation in our current society is to create things that work for the majority of people and then later down the line develop something that works for minority groups. Most features aren't accessible right away due to constraints and also just the fact that minority groups are such a small percentage of users that it's not feasible due to time and budget to develop something that works for everyone. I think this can be tied back to some of the ethical frameworks and it makes you wonder if it's more beneficial to put out a product that can help people in the moment but excludes a group, or to prolong the wait for that product until it can include everyone. I wonder what other people value and what they think is the best way to go about situations like that.

  13. social-media-ethics-automation.github.io
    1. Ash. Autism is NOT A Disability. July 2022. URL: https://www.autism360.com/autism-is-not-a-disability/ (visited on 2023-12-07).

      I read the article “Autism is NOT a Disability” and thought it gave a really different view. The author talks about autism as a different way of thinking, not something that needs to be fixed. That stood out to me because it shows how our idea of “accessible” is shaped by what we think is normal. It connects back to the chapter because it shows how progress isn’t just about making new tools — it’s also about changing how we see people in the first place.

    2. Social model of disability. November 2023. Page Version ID: 1184222120. URL: https://en.wikipedia.org/w/index.php?title=Social_model_of_disability&oldid=1184222120#Social_construction_of_disability (visited on 2023-12-07).

      The article is fascinating. It not only introduces a much fairer way of looking at disability, but also shows that disability isn't just about a person's body, it's more about how society fails to adapt. For example, a building without ramps would naturally make it hard for wheelchair users, not because of their impairment. This model makes me realize we need to change our surroundings and attitudes, instead of expecting disabled people to fit into an unwelcoming world.

  14. social-media-ethics-automation.github.io
    1. disability is an ability that a person doesn’t have, but that their society expects them to have.[1] For example: If a building only has staircases to get up to the second floor (it was built assuming everyone could walk up stairs), then someone who cannot get up stairs has a disability in that situation. If a physical picture book was made with the assumption that people would be able to see the pictures, then someone who cannot see has a disability in that situation.

      One thing that was most salient to me in this chapter is that disability is not just about the body or mind of a person, but what society expects individuals to be able to do. That really changed my mind. For instance, the effectiveness of the stair example opened my eyes to how likely we are to design space with only one type of body in mind. Having never had to give a second thought to taking the stairs or looking at a picture book, I realize now how much privilege I’ve taken for granted. Along with that, it also causes me to think about all of the things in daily life that could be so easily altered to be more accessible—such as putting in ramps, captions, or auditory description. It’s not the body of the person that is the issue—it’s the system that doesn’t account for them. This was similar to the social model of disability we read about in class earlier in the semester that also points to how structures and environments can cause disability.

    2. Disabilities can be accepted as socially normal, like is sometimes the case for wearing glasses or contacts, or it can be stigmatized as socially unacceptable, inconvenient, or blamed on the disabled person.

      It's really disappointing that disabilities often face unfair conditions that are not caused by their fault. But when we see examples like glasses, where a form of disability is widely accepted. Hence, I believe that we need to work towards a world where every disability is regarded as a normal aspect of human variety, just like needing glasses. The different view attached to disabilities creates so much harm to them. We should focus on learning and teaching others to avoid these harmful ideas and build a community that welcomes and values everyone equally.

    1. This month, without socials, I’ve actually been able to devote time to thinking about the things I care about with some depth. I watched The Girl in the River, a documentary about an attempted honor killing in Pakistan, and it rocked me deeply. Without a timeline of competing horrors to scroll through in the background, I was able to fully devote my attention to the feelings the film brought up within me. I had capacity to do further research and free time to sit alone in my apartment, just thinking and feeling through the inhumanity of global femicide. I had space inside myself to really reflect on my own immense privilege, and it felt differently from when I would feel that pang while scrolling on socials — a pang that was immediately numbed by the next overload of information.

      This is an interesting point, but also, note that it's hard to see how the world benefited from her "really [reflecting] on [her] own immense privilege"

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given the fact that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along in the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark," but I think it undersells the new data).

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitively based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1). Also, it was significantly higher than stage 32 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies, and I don't think your list is exhaustive. You need to expand this list and history, which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While it is a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries, and so I think you'll benefit from looking back, especially because several of them have looked into the histology of these fishes. You even have one sentence citing Coates et al., 2018, Frey et al., 2019, and Ørvig, 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead, and it may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above. Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at additional taxa than what you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info your present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into using an argument of those data too early. Go back in and remove the references, or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about which other elasmobranchs are mineralizing their neural arches, etc., for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that the first sentence in each paragraph describes why you are doing a particular method and comparison, because it shows me (the reader) where you're sampling from. Something else: maybe as part of the first figure, rather than having just the graph, include a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would relate back, then, to clarifying other figures as well.

      Done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7, so it's hard to describe any state specifically as ancestral or derived. Generally, holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”. We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of both of those to the catshark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put that into the methods, and into the discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different micro-CT scanners were used for three different species? ScyScan 117 for the skate samples. Catshark: a different scanner, please provide full details. Chimera: synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods. (lines 486-488)

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). The first requirement is that all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extent of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters all influence TMD values. Likewise, the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting, but I suggest not linking this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, who actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characteristics of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on; see also L237, L289, L291.

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents an entire large group, all sharks, rays, or chimeras? Are the three selected species basal representatives with a generalist lifestyle?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates; Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebral centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development; that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chrondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that encodes Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva, and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures, but these structures are neither bone nor mineralised cartilage. Catfish (6% of all vertebrate species), on the other hand, have bone but do not have SCPP genes (Lui et al. 206). Other calcium-binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor; at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      No change; see above

      L53: down-tune language, remove "severely" and "major"

      Done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      No change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      No change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      Changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      Thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      All regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also, only one chimera is analysed, not several species.

      Sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      Added Arriata and Kolliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      References to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      Sorry, the line numbering in the reviewer's comments and in our original manuscript has been a little off for some reason, and we were unclear exactly which line of text this comment referred to. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers on other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      Done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Does this mean there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (lines 379)

      L379: down tune and change to: "we propose the term "non-tesseral trabecular mineralization. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      No change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests that so far palaeontologists have not been "careful" enough?

      Apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      Sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      No action; see above

      L436: remove paragraph

      No action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      Yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study. (lines 440-453)

    1. Create a short list of main comparison criteria before you start. You can always add more criteria if it makes sense. This will keep your research guided. Remember to add the product you’re designing to the analysis to see how your product compares to the competition. Know when to stop. Start with 3–5 main competitors. Once you uncover the information you need in order to inform your design decisions, it’s time to stop. Don’t simply copy the designs you find in your research. The competitors may not be using best practices. Instead, be inspired by the solutions found in your research and adapt the solutions to fit your brand, product, and users. Be tool agnostic. Choose the tool that helps you present your findings based on the information you are documenting and sharing. Know when to perform a “comparative analysis.” Study solutions from products that are not direct competitors.

      Last summer, I spent half my internship just on competitive analysis and SWOT analysis of our organization for business plan development. However, it wasn't design-centered. It was the most fun I've had researching. I am excited to do this for the second group project assignment, and have set a bookmark to revisit this page for the assignment.

    2. Don’t simply copy the designs you find in your research. The competitors may not be using best practices. Instead, be inspired by the solutions found in your research and adapt the solutions to fit your brand, product, and users

      I couldn’t agree more. It’s easy to fall into the trap of thinking that because a competitor uses a certain design, it must be the best option — but that’s not always the case. I find this advice very useful because it encourages critical thinking and creativity rather than imitation. It reminds me that competitive analysis should be about understanding the market while innovating based on the specific needs of users, rather than just copying what already exists. This has changed my perspective a bit because I used to think of competitor research mainly as a blueprint to follow, but now I think of it more as a source of inspiration to build better products.

    3. Don’t simply copy the designs you find in your research. The competitors may not be using best practices. Instead, be inspired by the solutions found in your research and adapt the solutions to fit your brand, product, and users.

      I agree with this idea because sometimes even big companies make design mistakes that users don't like, but people assume it's the "right" way just because it's popular. This made me realize that doing a competitive analysis should be about critical thinking, not just following trends, and that we should always focus on what’s best for our own users and goals.

    4. A competitive analysis provides strategic insights into the features, functions, flows, and feelings evoked by the design solutions of your competitors.

      I find it interesting that the article includes “feelings” as something to analyze, not just features and functions. This reminds me that user experience is about more than just what a product does; it’s also about how it makes people feel. I think it’s important to remember that emotions can be a big part of why someone chooses one product over another.

    5. Don’t simply copy the designs you find in your research.

      This advice stands out to me because it’s tempting to just imitate what seems to work for others. I appreciate the reminder that every product and audience is different, so copying might not actually lead to the best results. It encourages me to think more critically and creatively about how to use what I learn from competitors.

    6. There’s no need to reinvent the wheel. Learn from what has been tried and is currently in use, map it out in a competitive analysis, and leverage your findings to differentiate your solution from the competition. And if you are new to a particular vertical, i.e. financial technology, then a competitive analysis will be imperative to grow your understanding of the basic features and functions of a financial technology platform. Understanding the landscape of competitors not only helps inform your design decisions but it also helps inform the overall product strategy. A UX competitive analysis uncovers valuable opportunities to create a superior product and stand out from the competition.

      The Medium article on competitive analysis really shifted how I think about looking at competitors. I used to see competitive analysis as just "finding flaws" in other apps, but this reading emphasized learning what works, too — and identifying gaps where you can differentiate. I completely agree with the idea that competitive analysis isn’t about copying features, it’s about making smarter design decisions based on what users already expect or are missing. It made me realize that a great product often succeeds not just because it's new, but because it better fits user needs in ways others haven’t yet. I’m excited to use these strategies for my own projects!

    1. For many years, surveyors approached questionnaire design as an art, but substantial research over the past forty years has demonstrated that there is a lot of science involved in crafting a good survey questionnaire. Here, we discuss the pitfalls and best practices of designing questionnaires.

      I find this point particularly interesting because it challenges the traditional view that survey writing is purely intuitive or creative. I completely agree that crafting effective survey questions is deeply rooted in scientific research and cognitive psychology. This reading made me realize that writing survey questions is much more methodical than I previously thought — it’s not just about asking what you want to know, but how you ask it dramatically shapes the answers you get. It definitely changes my perspective and makes me appreciate how much testing and refinement go into designing reliable surveys.

    2. In general, questions that use simple and concrete language are more easily understood by respondents. It is especially important to consider the education level of the survey population when thinking about how easy it will be for respondents to interpret and answer a question.

      I agree with this statement; it's important to get to know the participants and adjust the problem space based on that. I think we should consider not only education level but also the demographic's cultural background, religious background, and socioeconomic status.

    3. An example of a contrast effect can be seen in a Pew Research Center poll conducted in October 2003, a dozen years before same-sex marriage was legalized in the U.S. That poll found that people were more likely to favor allowing gays and lesbians to enter into legal agreements that give them the same rights as married couples when this question was asked after one about whether they favored or opposed allowing gays and lesbians to marry (45% favored legal agreements when asked after the marriage question, but 37% favored legal agreements without the immediate preceding context of a question about same-sex marriage). Responses to the question about same-sex marriage, meanwhile, were not significantly affected by its placement before or after the legal agreements question.

      I agree this example shows how much the way a question is asked can shape how people respond. When people hear a more intense option first, such as same sex marriage, they may see legal agreements as a more acceptable or reasonable choice. It’s interesting how opinions can shift just based on the order of the questions, even if we do not notice it.

    4. In addition to the number and choice of response options offered, the order of answer categories can influence how people respond to closed-ended questions.

      This reminds me of ordering food at restaurants. If I were just asked open-endedly what food I want right now, I would answer something completely different than if I were choosing a dish from a menu. Especially if there are too many options on the menu, it's always very hard to choose "the best one" that I want to order. I imagine a similar psychology is involved in surveys.

    1. Other creativity strategies are more analytical. For example, if you want to think of something new, question assumptions. Einstein asked whether time is really uniform and absolute in space. That’s a pretty disruptive idea. Even questioning smaller assumptions can have big design implications. Consider several of the assumptions that recent software companies questioned:

      I really enjoyed reading Chapter 5 on creativity. I appreciated Amy J. Ko’s perspective that creativity isn't just a magical talent you’re born with, but a skill you can nurture through processes like questioning assumptions, combining ideas, and persisting through failure. I especially agreed with the idea that creative work requires time and persistence — it’s easy to romanticize creativity, but the reality is that it often looks like hard work and small improvements over time. This chapter reminded me that creativity isn’t passive; it’s an active practice, and that mindset shift feels empowering for my future projects.

    1. Private message. November 2023. Page Version ID: 1185376021. URL: https://en.wikipedia.org/w/index.php?title=Private_message&oldid=1185376021 (visited on 2023-12-05).

      The Wikipedia article about Private messages shows how far private communication has come due to the development of technology. It was interesting to me that even though private messaging platforms are intended for one-to-one or small group interactions, they're still at risk of being breached, leaked, or even surveilled by the platform. It made me think of how just because something feels private — such as a DM on Instagram or a Discord message — it's not always necessarily private because the platform may have access. It made me think more deeply about what I want to say on the internet even on places that feel "safe."

  15. social-media-ethics-automation.github.io
    1. There are many reasons, both good and bad, that we might want to keep information private. There might be some things that we just feel like aren’t for public sharing (like how most people wear clothes in public, hiding portions of their bodies). We might want to discuss something privately, avoiding embarrassment that might happen if it were shared publicly. We might want a conversation or action that happens in one context not to be shared in another (context collapse). We might want to avoid the consequences of something we’ve done (whether ethically good or bad), so we keep the action or our identity private. We might have done or said something we want to be forgotten or at least made less prominent. We might want to prevent people from stealing our identities or accounts, so we keep information (like passwords) private. We might want to avoid physical danger from a stalker, so we might keep our location private. We might not want to be surveilled by a company or government that could use our actions or words against us (whether what we did was ethically good or bad).

      Learning about all the varying reasons for keeping things private really made me understand just how complicated privacy really is. It's not simply a matter of "hiding something bad," as people assume. For instance, I keep my live location private on the web for security reasons and not because I am doing anything wrong. I also linked this to our previous discussions regarding context collapse — when two audiences collide (like colleagues seeing your Facebook posts about family vacations), it actually makes sense that people would want to control where and how information gets distributed. It's not about secrecy; it's about shielding various aspects of our identity in various places. I do wonder, though: could anything be completely under our control regarding privacy these days, considering how much information gets surveilled by default?

    1. He clarified that episodic future thinking doesn’t mean just daydreaming about the future. It’s imagining likely different outcomes—positive or negative—that might be influenced by your choices today. As long as the negativity doesn’t become so overwhelming as to be paralyzing, imagining a dark future can be motivating. It can make us think, What are the behaviors that I have to engage in to offset risk for this future scenario? Edmondson told me.

      It doesn't matter whether my visualization is positive or negative.

    1. Conclusions

      My general conclusion:

      Gambling is a tricky subject. At one point, if you understand the risks you are willing to take AND are of age, dare I say let them eat cake. I cannot say the same for those that are underage and not fiscally responsible for their gambling.

      I do gamble, both in the casino and on sports through various books. I have come to understand one major thing while doing this, which my roommates hate: I bet very small, like 20 cents. Why? It comes back to the bankroll. The bankroll is how much you are willing to bet in total, and each bet should be about 1% of it, which is called a unit. If I have 20 dollars (my bankroll), then 20 cents is about how much I will gamble per bet, because it is 1% of my active bankroll.

      You have to be responsible. When you are easily influenced by the lights and the money you could possibly win, it is just as easy to lose sight of what happens. Bet responsibly, understand the risks, and PLEASE wait until 21 years of age.
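
      The 1%-unit rule described above can be sketched as a tiny calculation. A minimal illustration (the function name, default percentage, and rounding are my own choices, not from the annotation):

```python
def unit_size(bankroll: float, unit_pct: float = 0.01) -> float:
    """One betting 'unit' is a fixed fraction of the current bankroll (here 1%)."""
    return round(bankroll * unit_pct, 2)

# A $20 bankroll gives 20-cent units, matching the example above.
print(unit_size(20.00))
```

      Recomputing the unit as the bankroll shrinks or grows keeps every individual bet proportionally small.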

    Annotators

    1. Have a plan for students to get help when you're busy with another student or group. Be sure students know when it's OK to come to you for help—and when it's not—and that there are several options for finding help when you are unavailable. For differentiated instruction to succeed, students must understand that it's never OK for them to just sit and wait for help to come to them or to disrupt someone else through off-task behavior, and that it's each student's responsibility to seek and to offer help in responsible ways as needed. You can help students learn to work collegially by suggesting that they ask a peer for clarification when they get stuck. Some classrooms have an "expert of the day" desk where one or more students especially skilled with the day's task serve as consultants. Astute teachers ensure that all students serve as "experts" at one time or another. (Students can assist by checking answers, proofreading, answering questions about directions or texts, and helping with art or construction tasks.) Or students may try to get themselves unstuck by "thinking on paper" in learning logs, for example.

      I had never thought about this, because most students always have their hands up or just sit and wait quietly until it is their turn. If you have something in place that eliminates the delay in work, you will keep the attention of your class.

  16. social-media-ethics-automation.github.io
    1. For example, the proper security practice for storing user passwords is to use a special individual encryption process [i6] for each individual password. This way the database can only confirm that a password was the right one, but it can’t independently look up what the password is or even tell if two people used the same password. Therefore if someone had access to the database, the only way to figure out the right password is to use “brute force,” that is, keep guessing passwords until they guess the right one (and each guess takes a lot of time [i7]).

      I think the password storage method mentioned in this chapter makes a lot of sense, because it doesn't assume the database will never be stolen; it is prepared for someone to steal it. Even if the password data is stolen, the bad guys can only guess slowly, one password at a time, and it's very difficult to learn the real password at once. This makes me think that many security protections are like this: they can't rely on just one layer of protection; there need to be several layers. For instance, when we log in to our accounts in daily life, some services even send verification codes to our mobile phones, which adds an extra layer of protection.
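
      The "special individual encryption process" the quote describes is commonly implemented as salted, slow password hashing: a unique random salt per user means two identical passwords produce different stored digests, and a deliberately slow hash makes every brute-force guess expensive. A minimal Python sketch (function names and the iteration count are my own choices, not from the reading):

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # deliberately slow; real systems tune this even higher

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); the unique random salt makes each stored hash individual."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """The database can confirm a guess, but cannot reverse the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))   # True
print(verify_password("guess123", salt, digest))  # False
```

      This matches the chapter's point: even with a copy of the database, an attacker must guess passwords one at a time, and each guess costs real computation.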

    1. Around the start of ski season this year, we talked about my plans to go skiing that weekend, and later that day he started seeing skiing-related ads. He thinks it's because his phone listened in on the conversation, but it could just as easily have been that it was spending more time near my phone.

      Or—get this—it was because despite the fact that he "hasn't been for several years", he used to "ski a lot", and it was the start of ski season.

      You don't have to assume any sophisticated conspiracy of ad companies listening in through your device's microphone or location-awareness and user correlation. This is an outcome that could be effected by even the dumbest targeted advertising endeavor with shoddy not-even-up-to-date user data. (Indeed, those would be more likely to produce this outcome.)

    1. I don’t have an easy time explaining “how” I generate a hypothesis while knowing so little - it feels like I just always have a “guess” at the answer to some topic, whether or not I even want to (though it often takes me a lot of effort to articulate the guess in words). The main thing I have to say about the “how” is that it just doesn’t matter: at this stage the hypothesis is more about setting the stage for more questions about investigation than about really trying to be right, so it seems sufficient to “just start rambling onto the page, and make any corrections/edits that my current state of knowledge already forces.”

      I'm the opposite-- I have a strong hypothesis based off of what I call a "sensical, logical" line of thinking. Then I have to work at finding the wording for the questions to disprove or prove my hypothesis. That is my favorite part, though, and I always start to get distracted by this fascination I have for language. Words are so weighted and so careful, though, it's wonderful.... sigh I digress.

    1. satisfactory antibacterial activity in vivo, and its effect was similar to that of the approved drug retapamulin,

      QUESTION: This article does clearly present data and analysis supporting the antibacterial capacity of compound 6j in comparison to retapamulin, but it doesn't answer the question: what are the pharmacokinetic properties of compound 6j, and how do they compare to those of the established drug, retapamulin? This is such a critical question when discussing a compound's potential for drug development because, while the antibacterial capacity may be great and the cytotoxicity data compelling as well, if the absorption, metabolism, distribution, and/or excretion of 6j are not good, then it is not a viable option for anti-MRSA drug development in humans. However, data from that type of study may show that compound 6j in fact has a better overall pharmacokinetic profile than retapamulin, and this would lend greater support to the idea of using 6j for further drug development. That key question just hasn't been answered yet, so it's difficult to say whether this compound is actually a good reference for the development of anti-MRSA drugs.

    1. I read this, and I felt it. I felt, or feel, me. I want this some times, others I want... Well, no I guess I want this all of the time. I have a voice and I love it. It may be ..too reliant(?) on weird punctuation and even weirder sentence structure. But it's me, so.. expect weird, you know?

      I know someone's going to like it. In fact, I'm sure lots of someones will like it. Or at least understand it. And they'll hear me. They'll see me. They'll understand me. Because I know I'm not the only one that gets blessed with this dang.. contemplative.?, expressive, ponderous, ability-to-show-in-writing, kind of voice.

      It just makes for some ugly writing, huh?

      I can make it prettier and more formal, too. That will come out a lot.

      I'm going to let it. I'm really going to let it, too!

      Thanks, [bloggers name, shit I can't remember and can't leave this page...], for being yourself.

      I've finally got the courage to practice on a platform.

      Watch out, world! Watch out, web! Watch out, wide!

    1. family diversity

      The heading is titled "What is a Family?"; however, the text then jumps to family diversity. I am unsure of the relevance of women entering the workforce and its impact on the definition of family. This impacts gender roles, but I don't know that it impacts the definition of family. Suggested edit: What is a Family? The concept of “family” has undergone significant transformation over time, reflecting broader societal, cultural, and legal changes. Traditionally, family was narrowly defined as a nuclear unit—typically consisting of a married heterosexual couple and their biological children. However, this definition has expanded in response to shifting social norms, economic factors, and evolving understandings of identity and relationships.

      According to Gonzalez-Mena (2017), the definition of family must be inclusive of diverse structures, such as single-parent households, same-sex parents, multigenerational families, blended families, and families formed through adoption or foster care. These variations reflect the real-life experiences of many children and caregivers today. Similarly, Gestwicki (2016) emphasizes that early childhood educators must adopt a broad and flexible understanding of family to build respectful, supportive partnerships with all caregivers, regardless of structure.

      This shift in perspective is not just about inclusion—it also aligns with research showing that a child’s sense of belonging, identity, and security is deeply rooted in their family experiences, whatever form those may take (Swick & Williams, 2006). By embracing an expanded definition, educators can better support children’s development and strengthen the home-school connection.

      Gonzalez-Mena, Janet (2016). Child, Family, and Community: Family-Centered Early Care and Education (7th ed.). Pearson.

      Gestwicki, Carol (2016). Developmentally Appropriate Practice: Curriculum and Development in Early Education (6th ed.). Cengage Learning.

      Swick, Kevin J., & Williams, Reginald D. (2006). "An Analysis of Bronfenbrenner’s Bio-Ecological Perspective for Early Childhood Educators: Implications for Working with Families Experiencing Stress." Early Childhood Education Journal, 33(5), 371–378.

  17. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. An ethnographic data set of white children and black children approximately 10 years old shows the effects of social class on interactions inside the home. Middle-class parents engage in concerted cultivation by attempting to foster children's talents through organized leisure activities and extensive reasoning. Working-class and poor parents engage in the accomplishment of natural growth, providing the conditions under which children can grow but leaving leisure activities to children themselves. These parents also use directives rather than reasoning. Middle-class children, both white and black, gain an emerging sense of entitlement from their family life. Race had much less impact than social class.

      This passage is actually quite interesting - it compares the way middle-class families and working-class/poor families raise children. Parents in middle-class families deliberately arrange interest classes and extracurricular activities, and use "reasoning" to guide their children, while working-class families are more likely to let their children grow up freely and don't arrange many "organized" activities. This reminds me of my childhood, when my parents always said "go play by yourself", and now I think that was quite freeing.

      But the article also points out that this difference makes children from middle-class families more likely to develop a feeling of "I deserve more", the so-called "entitlement", which may become a kind of self-confidence in adulthood. Children from working-class families do not develop this mentality as obviously. Therefore, the influence of social class on children's personality and opportunities is actually quite far-reaching, not just a matter of money.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their constructive and helpful comments, which led us to make major changes in the model and manuscript, including adding the results of new experiments and analyses. We believe that the revised manuscript is much better than the previous version and that it addresses all issued raised by the reviewers. 

      Summary of changes made in the revised manuscript:

      (1) We increased the training set size from 39 video clips to 97 video clips and the testing set size from 25 video clips to 60 video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.

      (2) We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy.

      (3) The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESPs). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      (4) In addition, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      (5) Analyzing urination and defecation dynamics in an additional strain of mice revealed interesting strain-specific features, as discussed in the revised manuscript.

      (6) Overall, we found DeePosit accuracy to be stable with no significant bias across stages of the experiment, types of the experiment, gender of the mice, strain of mice, and across experimental conditions.

      (7) We also compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy over both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm training set used just a single click inside each spot, allowing faster generation of training sets. 

      (8) As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C (Figure 3—Figure Supplement 2).

      (9) We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      (10) In the revised paper, we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript provides a novel method for the automated detection of scent marks from urine and feces in rodents. Given the importance of scent communication in these animals and their role as model organisms, this is a welcome tool.

      We thank the reviewer for the positive assessment of our tool

      Strengths:

      The method uses a single video stream (thermal video) to allow for the distinction between urine and feces. It is automated.

      Weaknesses:

      The accuracy level shown is lower than may be practically useful for many studies. The accuracy of urine is 80%. 

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      This is understandable given the variability of urine in its deposition, but makes it challenging to know if the data is accurate. If the same kinds of mistakes are maintained across many conditions it may be reasonable to use the software (i.e., if everyone is under/over counted to the same extent). Differences in deposition on the scale of 20% would be challenging to be confident in with the current method, though differences of the magnitude may be of biological interest. Understanding how well the data maintain the same relative ranking of individuals across various timing and spatial deposition metrics may help provide further evidence for the utility of the method.

      The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      Reviewer #2 (Public Review):

      Summary:

      The authors built a tool to extract the timing and location of mouse urine and fecal deposits in their laboratory set up. They indicate that they are happy with the results they achieved in this effort.

      Yes, we are.

      The authors note urine is thought to be an important piece of an animal's behavioral repertoire and communication toolkit so methods that make studying these dynamics easier would be impactful.

      We thank the reviewer for the positive assessment of our work.

      Strengths:

      With the proposed method, the authors are able to detect 79% of the urine that is present and 84% of the feces that is present in a mostly automated way.

      Weaknesses:

      The method proposed has a large number of design choices across two detection steps that aren't investigated. I.e. do other design choices make the performance better, worse, or the same? 

      We chose to use a heuristic preliminary detection algorithm for the detection of warm blobs, since warm blobs can be robustly detected with heuristic algorithms without the need for a training set. This design choice should allow easier adaptation of our algorithm to different types of arenas. Another advantage of using a heuristic preliminary detection is the easy control of its parameters, such as the minimum temperature difference for detecting a blob, the size limits of the detected blob, the cooldown rate, and so on, which may help in adapting it to new conditions. As for the classifier, we chose to feed it with a relatively small window surrounding each preliminary detection; hence, it is not affected by the arena’s appearance outside of its region of interest. This should allow lower sensitivity to the arena’s appearance.

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy, and we used this parameter for all of the experiments instead of the 1.1°C used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for thresholds between 1.1°C and 3°C.
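DeePosit's actual preliminary-detection code is not reproduced here, but the general idea described above (flag pixels that are warmer than the background by more than a threshold, then keep connected blobs within size limits) can be sketched as follows. Apart from the 1.6°C threshold mentioned in the text, all names and default values below are assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage

def detect_warm_blobs(frame, background, threshold_c=1.6,
                      min_area=4, max_area=400):
    """Return (row, col, area) for connected regions warmer than the
    background by more than `threshold_c` degrees, filtered by pixel area."""
    warm = (frame - background) > threshold_c   # boolean mask of warm pixels
    labels, n = ndimage.label(warm)             # connected-component labeling
    blobs = []
    for i in range(1, n + 1):
        mask = labels == i
        area = int(mask.sum())
        if min_area <= area <= max_area:        # discard noise and huge regions
            ys, xs = np.nonzero(mask)
            blobs.append((ys.mean(), xs.mean(), area))
    return blobs
```

In a pipeline of this shape, each surviving blob would then be cropped and passed to the downstream classifier, which decides between urine, feces, and background.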

      We also checked if changing the input length of the video clip fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.

      Are these choices robust across a range of laboratory environments?

      We tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      How much better are the demonstrated results compared to a simple object detection pipeline (i.e. FasterRCNN or YOLO on the raw heat images)?

      We compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy than both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm's training set used just a single click inside each spot, allowing faster generation of training sets.

      The method is implemented with a mix of MATLAB and Python.

      That is right.

      One proposed reason why this method is better than a human annotator is that it "is not biased." While they may mean it isn't influenced by what the researcher wants to see, the model they present is still statistically biased since each object class has a different recall score. This wasn't investigated. In general, there was little discussion of the quality of the model. 

      We tested the consistency of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results. Specifically, the detection accuracy is similar between urine and feces, hence should not impose a bias between the various object classes.

      Precision scores were not reported.

      In the revised paper we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.
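For readers less familiar with these metrics, precision, recall, and F1 are all derived from the true-positive, false-positive, and false-negative counts of a detector. The formulas below are the standard definitions; the example counts in the test are hypothetical and are not taken from the paper's data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute standard detection metrics from raw counts.

    precision = fraction of detections that are real events
    recall    = fraction of real events that were detected
    F1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because F1 is a harmonic mean, it penalizes a detector that trades one metric sharply against the other, which is why a single F1 score is a reasonable summary of detection accuracy here.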

      Is a recall value of 78.6% good for the types of studies they and others want to carry out? What are the implications of using the resulting data in a study?

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      How do these results compare to the data that would be generated by a "biased human?"

      We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy (Figure 3).

      5 out of the 6 figures in the paper relate not to the method but to results from a study whose data was generated from the method. This makes a paper, which, based on the title, is about the method, much longer and more complicated than if it focused on the method.

      We appreciate the reviewer's comment, but the analysis of this new dataset by DeePosit demonstrates how the algorithm may be used to reveal novel and distinguishable dynamics of urination and defecation activities during social interactions, which were not yet reported. 

      Also, even in the context of the experiments, there is no discussion of the implications of analyzing data that was generated from a method with precision and recall values of only 70-80%. Surely this noise has an effect on how to correctly calculate p-values etc. Instead, the authors seem to proceed like the generated data is simply correct.

      As mentioned above, the increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.  

      Reviewer #3 (Public Review):

      Summary:

      The authors introduce a tool that employs thermal cameras to automatically detect urine and feces deposits in rodents. The detection process involves a heuristic to identify potential thermal regions of interest, followed by a transformer network-based classifier to differentiate between urine, feces, and background noise. The tool's effectiveness is demonstrated through experiments analyzing social preference, stress response, and temporal dynamics of deposits, revealing differences between male and female mice.

      Strengths:

      The method effectively automates the identification of deposits

      The application of the tool in various behavioral tests demonstrates its robustness and versatility.

      The results highlight notable differences in behavior between male and female mice

      We thank the reviewer for the positive assessment of our work.

      Weaknesses:

      The definition of 'start' and 'end' periods for statistical analysis is arbitrary. A robustness check with varying time windows would strengthen the conclusions.

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis, since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      The paper could better address the generalizability of the tool to different experimental setups, environments, and potentially other species.

      As mentioned above, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      The results are based on tests of individual animals, and there is no discussion of how this method could be generalized to experiments tracking multiple animals simultaneously in the same arena (e.g., pair or collective behavior tests, where multiple animals may deposit urine or feces).

      At the moment, the algorithm cannot be applied to multiple animals freely moving in the same arena. However, in the revised manuscript we explicitly discussed what is needed for adapting the algorithm to perform such analyses.

      Recommendations for the authors: 

      -  Add a note and/or perform additional calculations to show that the results do not depend on the specific definitions of 'start' and 'end' periods. For instance, vary the time window thresholds and recalculate the statistics using different windows (e.g., 1-5 minutes instead of 1-4 minutes).

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      - Condense Figures 4, 5, and 6 to simplify the presentation. Focus on demonstrating the effectiveness of the tool rather than detailed experimental outcomes, as the primary contribution of this paper is methodological.

      We have added to the revised manuscript one technical figure (Figure 3) comparing the accuracy of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP) as well as comparing its performance to a second human annotator and to YOLOv8. One more partially technical figure (Figure 5) compares the results of the algorithm between white ICR mice in the black arena and black C57BL/6 mice in the white arena. Thus, only Figures 4 and 6 show detailed experimental outcomes.

      - Provide more detail on how the preliminary detection procedure and parameters might need adjustment for different experimental setups or conditions. Discuss potential adaptations for field settings or more complex environments.

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy, and we used this parameter for all of the experiments instead of the 1.1°C used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for thresholds between 1.1°C and 3°C.

      We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.

      Editor's note:

      Should you choose to revise your manuscript, please ensure your manuscript includes full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We have deposited the detailed statistics of each figure in https://github.com/davidpl2/DeePosit/tree/main/FigStat/PostRevision

    1. Reviewer #1 (Public review):

      I want to reiterate my comment from the first round of reviews: that I am insufficiently familiar with the intricacies of Maxwell's equations to assess the validity of the assumptions and the equations being used by WETCOW. The work ideally needs assessing by someone more versed in that area, especially given the potential impact of this method if valid.

      Effort has been made in these revisions to improve explanations of the proposed approach (a lot of new text has been added) and to add new simulations.

      However, the authors have still not compared their method on real data with existing standard approaches for reconstructing data from sensor to physical space. Refusing to do so because existing approaches are deemed inappropriate (i.e. they "are solving a different problem") is illogical.

      Similarly, refusing to compare their method with existing standard approaches for spatio-temporally describing brain activity, just because existing approaches are deemed inappropriate, is illogical.

      For example, the authors say that "it's not even clear what one would compare [between the new method and standard approaches]". How about:

      (1) Qualitatively: compare EEG activation maps. I.e. compare what you would report to a researcher about the brain activity found in a standard experimental task dataset (e.g. their gambling task). People simply want to be able to judge, at least qualitatively on the same data, what the most equivalent output would be from the two approaches. Note, both approaches do not need to be done at the same spatial resolution if there are constraints on this for the comparison to be useful.

      and

      (2) Quantitatively: compare the correlation scores between EEG activation maps and fMRI activation maps

      The abstract claims that there is a "direct comparison with standard state-of-the-art EEG analysis in a well-established attention paradigm", but no actual comparison appears to have been completed in the paper.

    1. definition of literacy in whatever form is inherently political.

      I hadn’t really thought about digital literacy from a political perspective. I’ve always focused on helping students use tools well and preparing them for the workforce, but not so much on who gets left out or why. This made me realise it’s not just about skills; it’s also about power, access, and feeling like you belong in digital spaces.

    1. If you are geographically separated from your prospective employer, you may be invited to participate in a phone interview or web conference interview, instead of meeting face-to-face. Technology, of course, is a good way to bridge distances. The fact that you’re not there in person doesn’t make it any less important to be fully prepared, though. In fact, you may wish to be all the more “on your toes” to compensate for the distance barrier. Make sure your phone or computer is fully charged and your internet works (if possible, use an ethernet connection instead of wifi). If you’re at home for the interview, make sure the environment is quiet and distraction-free.  If the interview is via web conference, try to make your background neat and tidy (ideally, the background should be a plain wall, but that isn’t always possible). Avoid using a simulated background, as they often look fake and the employer may feel that you are trying to hide something.

      That’s a great reminder—technology definitely helps bridge the gap, but it’s true that virtual interviews require just as much, if not more, preparation. I always try to test everything at least 30 minutes beforehand to avoid any last-minute issues.

      Have you ever had a virtual interview where something unexpected happened? How did you handle it?

    1. To most people, it may seem that family life is important, but so many other things—getting a good education, establishing a career and, in one way or another, making the world a better place—matter more. In fact, just 34 percent of respondents endorsed the first statement, and 64 percent endorsed the second. As we have seen, the rate of marriage in the United States recently hit a fifty-year low, and people who do marry are now doing so considerably later in life.

      Interestingly, most people believe it has to be one or the other. You can want to be fully present for your family, to give them everything you've got, and at the same time want to chase your own goals, get a solid education, and build a meaningful career. It's not selfish to enjoy both. Pursuing your growth can make you an even better partner, parent, or role model. Life isn’t always about choosing one path; it’s often about finding a way to walk both. It can't be so black and white in discussions like these.