Reviewer #2 (Public review):
Summary:
The manuscript by Foucault, Weber, and Hunt examines human learning behavior across change-point and continuously changing environments. The authors suggest that humans normatively adjust their learning dynamics to the current environmental dynamics. Moreover, they argue that humans track not only the mean of the outcome-generating process but also its variance, which extends recent work in this domain. The present results suggest that human learners can distinguish the two moments and adjust their behavior accordingly.
Strengths:
(1) The paper is clearly written, and the figures demonstrate the results well. The authors clearly explain the two key results and their implications for the field.
(2) The paper uses a common modeling framework for the two environments. This makes it less likely that differences in learning behavior between the two environments are driven by general model properties rather than the specific learning mechanisms.
Weaknesses:
(1) Interpretation in terms of normative learning
(1.1) Perseveration and paddle movement
The model presented in the main manuscript is equipped with a response-probability mechanism that controls whether the paddle is updated. Especially after smaller prediction errors, the paddle is often not updated (perseveration). I wonder whether this mechanism truly reflects normative updating behavior or rather a heuristic strategy. Not moving the paddle is non-normative: a fully Bayesian model would hardly ever show a learning rate of exactly zero (arguably only when the error itself is zero or after a very large number of trials). This is partly apparent in Supplementary Figure 1, where the lowest learning rates are around alpha = 0.2 (change-point environment) and 0.5 (random-walk environment).
Supplementary Figure 1 shows the learning rate for the normative model without the response-probability mechanism. Primarily in the random-walk environment, but to some extent also in the change-point condition, the shape of the learning-rate curve changes quite dramatically compared to Figure 4. In the random-walk environment, the learning rate appears relatively stable, with a value slightly larger than 0.5. In the change-point case, the learning rate is somewhat higher in the range of smaller prediction errors. Doesn't this speak against the interpretation that the model in the main manuscript behaves in a purely normative fashion? The tendency to perseverate might instead reflect a simplified strategy, sometimes described as "satisficing". That is, in line with the authors' description of the mechanism, perseveration occurs when the current prediction seems "good enough" (Simon, 1956), as has been demonstrated in belief-updating contexts before (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021).
Supplementary Figure 3 suggests that humans show quite a lot of this type of behavior. In the change-point condition, participants update their prediction in only 20% of the trials in the smallest prediction-error range (i.e., in 80% of these trials, they perseverate on the previous prediction). This update probability increases as a function of the prediction error. In the random-walk condition, update probabilities are higher, starting at around 40% and also increasing as a function of the error.
Indeed, Supplementary Figure 4 suggests that the shape of the learning rate for true update trials is much shallower for humans and the "perseverative" model compared to the model in Supplementary Figure 1. This suggests that the curve in Figure 4 (main manuscript), hinting at a continuous increase in the learning rate, could be the result of a mixture of perseveration (alpha = 0) and higher learning rates compared to the normative model without the response-probability mechanism.
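To make the mixture argument explicit, here is a minimal sketch in my own notation; the symbols p_upd (probability that the paddle is moved) and alpha_upd (learning rate on trials where it is moved) are my assumptions and are not taken from the manuscript. Averaging perseveration trials (learning rate zero) and update trials at a given prediction error delta gives:

```latex
\bar{\alpha}(\delta)
  = \bigl(1 - p_{\mathrm{upd}}(\delta)\bigr)\cdot 0
  + p_{\mathrm{upd}}(\delta)\,\alpha_{\mathrm{upd}}(\delta)
  = p_{\mathrm{upd}}(\delta)\,\alpha_{\mathrm{upd}}(\delta)
```

Under this sketch, the averaged curve rises with |delta| whenever p_upd does, even if alpha_upd is constant. Purely for illustration: with p_upd = 0.2 and a hypothetical alpha_upd = 0.5, the averaged apparent learning rate would be 0.1.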
(1.2) Control models
One might reply that the response-probability mechanism just adds noise, while the actual learning mechanism is still normative. However, a standard Rescorla-Wagner model with the same response-probability mechanism might also show increasing apparent learning rates as a function of prediction error (when perseveration trials and regular update trials are averaged as a function of the prediction error).
Therefore, I suggest adding a control analysis with two Rescorla-Wagner models: one with the same response mechanism that yields perseveration, and one standard Rescorla-Wagner model without this mechanism. This should help identify how well the present analyses can distinguish true learning-rate dynamics from averaging artifacts due to perseveration.
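The kind of control I have in mind could be sketched roughly as follows. This is a toy simulation under my own assumptions, not the authors' model or task: the environment parameters (hazard rate 0.1, outcome SD 10), the fixed learning rate of 0.4, and the logistic form of the update probability (with hypothetical `slope` and `intercept`) are all illustrative choices.

```python
import math
import random


def simulate_rw(n_trials=20000, alpha=0.4, perseverate=True,
                slope=0.1, intercept=-1.5, seed=0):
    """Simulate a fixed-learning-rate Rescorla-Wagner learner.

    Returns a list of (abs_prediction_error, apparent_learning_rate) pairs.
    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    mean, prediction = 0.0, 0.0
    records = []
    for _ in range(n_trials):
        # Assumed change-point environment: the mean jumps with p = 0.1.
        if rng.random() < 0.1:
            mean = rng.uniform(-100.0, 100.0)
        outcome = rng.gauss(mean, 10.0)
        delta = outcome - prediction
        # Assumed response mechanism: update probability grows with |delta|.
        p_update = 1.0 / (1.0 + math.exp(-(intercept + slope * abs(delta))))
        moved = (not perseverate) or (rng.random() < p_update)
        update = alpha * delta if moved else 0.0
        if delta != 0.0:
            # Apparent learning rate = update / prediction error.
            records.append((abs(delta), update / delta))
        prediction += update
    return records


def mean_lr_by_bin(records, edges=(0.0, 10.0, 30.0, float("inf"))):
    """Average the apparent learning rate within bins of |prediction error|."""
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = [lr for d, lr in records if lo <= d < hi]
        means.append(sum(vals) / len(vals))
    return means


with_persev = mean_lr_by_bin(simulate_rw(perseverate=True))
without_persev = mean_lr_by_bin(simulate_rw(perseverate=False))
print("with perseveration:   ", with_persev)
print("without perseveration:", without_persev)
```

In this toy setting, the learner without perseveration shows a flat apparent learning rate equal to `alpha`, whereas the perseverative learner shows a binned average that increases with the prediction error although its underlying learning rate is constant. Whether the same holds with the authors' actual response mechanism and task parameters is exactly what the control analysis would establish.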
(1.3) Discussion of the possibility of non-normative learning mechanisms
Given the considerations above, I suggest a more balanced discussion of potential non-normative influences on learning, in particular, perseveration. Several previous papers have similarly shown that perseveration prominently characterizes human learning and decision-making (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021), and in my opinion, it would be relevant to discuss how normative and non-normative mechanisms might jointly shape learning.
(2) Model description
The Bayesian model is quite central to the paper. However, the mathematical details are sparse, and I did not fully understand the differences between the model variants and how they were implemented. In particular, what approximations were used to make the model tractable? And how does the variance inference work? Is the learning rate directly computed, similar to the Nassar model, or is it derived from updates and prediction errors?
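To clarify the distinction I am asking about, the two options could be written as follows. This is a sketch based on my recollection of the reduced Bayesian model by Nassar and colleagues, not a description of the authors' model; the symbols Omega_t (change-point probability), tau_t (relative uncertainty), b_t (prediction), and x_t (outcome) are my notation.

```latex
% Option A: learning rate computed directly from inferred quantities
\alpha_t = \Omega_t + (1 - \Omega_t)\,\tau_t, \qquad
b_{t+1} = b_t + \alpha_t\,\delta_t, \qquad \delta_t = x_t - b_t

% Option B: learning rate derived post hoc from updates and prediction errors
\hat{\alpha}_t = \frac{b_{t+1} - b_t}{\delta_t}, \qquad \delta_t \neq 0
```

Stating which of these (or which alternative) applies to each model variant would make the Methods considerably easier to follow.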
(3) Apparent learning rates in humans
The main learning-rate analyses compute the ratio of updates to prediction errors. For quality assurance, it would be useful to see a few supplementary histograms of these apparent learning rates. It would be great to have one plot across all participants and a few example plots for single participants. These analyses will reveal the distribution of learning rates and the proportion of values at the boundaries, which can sometimes be a source of bias.
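As a rough illustration of the check I am suggesting, the following sketch computes apparent learning rates and the share of values at or beyond the boundaries. The function names, the tolerance, and the toy data are hypothetical; I assume that `predictions[t]` is the prediction made before `outcomes[t]` is shown, which may differ from the authors' data layout.

```python
def apparent_learning_rates(predictions, outcomes):
    """Return update / prediction-error ratios for consecutive trials.

    Trials with a prediction error of exactly zero are skipped because
    the ratio is undefined there.
    """
    rates = []
    for t in range(len(predictions) - 1):
        delta = outcomes[t] - predictions[t]
        if delta == 0:
            continue
        rates.append((predictions[t + 1] - predictions[t]) / delta)
    return rates


def boundary_summary(rates, tol=1e-9):
    """Proportion of apparent learning rates at 0, at 1, and outside [0, 1]."""
    n = len(rates)
    return {
        "at_zero": sum(abs(r) <= tol for r in rates) / n,
        "at_one": sum(abs(r - 1.0) <= tol for r in rates) / n,
        "outside_unit_interval": sum(r < -tol or r > 1.0 + tol for r in rates) / n,
    }


# Toy data for illustration only (not from the study).
predictions = [50.0, 50.0, 60.0, 80.0, 75.0]
outcomes = [60.0, 70.0, 80.0, 70.0, 90.0]

rates = apparent_learning_rates(predictions, outcomes)
summary = boundary_summary(rates)
print(rates)
print(summary)
```

The same `rates` list could then be passed to any histogram routine, once pooled across participants and once per example participant.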
References:
Bruckner, R., Nassar, M. R., Li, S.-C., & Eppinger, B. (2025). Differences in learning across the lifespan emerge via resource-rational computations. Psychological Review, 132(3), 556-580. https://doi.org/10.1037/rev0000526.
Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394. https://doi.org/10.1016/j.cognition.2020.104394.
Nassar, M. R., Waltz, J. A., Albrecht, M. A., Gold, J. M., & Frank, M. J. (2021). All or nothing belief updating in patients with schizophrenia reduces precision and flexibility of beliefs. Brain, 144(3), 1013-1029. https://doi.org/10.1093/brain/awaa453.
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129-138. https://doi.org/10.1037/h0042769.