Reviewer #1 (Public review):
Summary:
This manuscript by Harris and Gallistel investigates how the rate of learning and the strength of post-learning conditioned behavior depend on the temporal parameters of Pavlovian conditioning. They replicate results from Gibbon and Balsam (1981) in rats to show that the rate of learning is proportional to the ratio between the cycle duration and the cue duration. They further show that the strength of conditioned behavior post learning is proportional to the cue duration, and not the above ratio. The overall findings are interesting, provide context for many conflicting recent results on this topic, and are supported by reasonably strong evidence. Nevertheless, there are some major weaknesses in the evidence presented for some of the stronger claims in the manuscript.
Strengths:
This manuscript has many strengths, including a rigorous experimental design, several complementary approaches to data analysis, careful consideration of prior literature, and a thorough introduction and discussion. The central claim, that animals track the rates of events in their environment and that the ratio of two rates determines the rate of learning, is supported by solid evidence.
Weaknesses:
Despite these strengths, some key aspects of the paper need substantial improvement. These are listed below.
(1) A key claim made here is that the same relationship (including the same parameter) describes the pigeon data from Gibbon and Balsam (1981) and the rat data in this study. The evidence for this claim is weak as presented. First, the exact measure used to identify trials to criterion makes a big difference in Fig 3; as best I understand, the authors make no claim about which of these approaches is the "best" one. Second, the measure used to identify trials to criterion in Fig 1 appears to differ from all of the criteria used in Fig 3. If so, then to claim that the quantitative relationship is one and the same in both datasets, the authors need to apply the same measure of learning rate to both datasets and show that the resulting plots are statistically indistinguishable. Currently, the authors simply overlay the points from the current dataset on the plot in Fig 1 and ask readers to notice the visual similarity. That is not nearly enough to claim that the two relationships are the same. Beyond the dependence on the exact measure of learning rate, the plots are on log-log axes, where slight visual changes can correspond to large differences in actual numbers. For instance, between Fig 3B and C, the highest-information group moves up only "slightly" on the y-axis, but the difference is a factor of 5. The authors need to perform much more rigorous quantification to support the strong claim that the quantitative relationships obtained here and in Gibbon and Balsam (1981) are identical.
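To make the log-log point concrete, here is a two-line sketch; the numbers are made up purely to illustrate how compressed a factor-of-5 difference looks on a log10 axis, and are not taken from the manuscript:

```python
import numpy as np

# Hypothetical y-values (e.g., trials to criterion) a factor of 5 apart.
y_low, y_high = 20.0, 100.0
shift_decades = np.log10(y_high) - np.log10(y_low)
# A 5x change spans only ~0.7 decades, i.e., a "slight" visual shift
# on a log10 axis that typically spans several decades.
print(f"factor {y_high / y_low:.0f}x spans only {shift_decades:.2f} decades")
```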
(2) Another interesting claim here is that the rates of responding during the ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slopes for the ITI data and the cue data separately and then show that the two slopes are not statistically distinguishable from each other. Conceptually, I am confused about why the data used to test the ITI proportionality come from the last 5 sessions. Specifically, if the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before cue learning has occurred? Before cue learning, the animals would presumably attribute rewards in the context solely to the context and thus produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could in principle know that the rate of rewards during the ITI is zero. Why, then, not test the plotted ITI relationship before cue learning has occurred? Further, based on Fig 1, it seems that the overall ITI response rate decreases considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ pre and post cue learning? Finally, if the authors' conceptual framework predicts that the ITI response rate after cue learning should be proportional to the contextual reward rate, why should the cue response rate be proportional to the cue reward rate rather than to the cue reward rate plus the contextual reward rate?
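The slope comparison I have in mind can be sketched as follows; the data are synthetic stand-ins (the real test would use the ITI and cue rates from the manuscript), and the helper names are mine:

```python
import numpy as np

def slope_with_se(x, y):
    """OLS slope and its standard error for y = a + b*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    s2 = resid @ resid / (n - 2)            # residual variance
    se_b = np.sqrt(s2 / ((x - x.mean()) @ (x - x.mean())))
    return b, se_b

# Synthetic reward-rate vs response-rate data for ITI and cue conditions.
rng = np.random.default_rng(0)
x_iti = rng.uniform(0.5, 4, 30); y_iti = 2.0 * x_iti + rng.normal(0, 0.3, 30)
x_cue = rng.uniform(0.5, 4, 30); y_cue = 2.1 * x_cue + rng.normal(0, 0.3, 30)

b1, se1 = slope_with_se(x_iti, y_iti)
b2, se2 = slope_with_se(x_cue, y_cue)
z = (b1 - b2) / np.hypot(se1, se2)          # approx. z-test for equal slopes
print(f"slopes: {b1:.2f} vs {b2:.2f}, z = {z:.2f}")
```

A |z| well below 2 would support the claim of a shared proportionality constant; simply plotting both datasets on one regression line does not.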
(3) I think there is a major conceptual disconnect between the gradual nature of learning shown in Figs 7 and 8 and the information-theoretic model proposed by the authors. As I understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why is there a gradual component of learning, as shown in these figures? And if response rate is proportional to reward rate, why does that proportionality change as animals go from 10% to 90% of peak response? The manuscript would be much strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by that framework, please state this explicitly in the manuscript.
(4) I find the idea stated in the Conclusion section, that any model based on probability of reinforcement cannot be correct because probability has no temporal units, to be weak. I think the authors mean that existing probability-based models do not work, not that no such model possibly can. For any point process, the standard mathematical treatment of continuous time is to write the probability of an event in an infinitesimal time bin dt as λ*dt, where λ is the event rate; the expected count of events in the bin is the same quantity. There is thus a one-to-one mapping between the per-bin probability of an event in a point process and its rate. Existing models use an arbitrary time bin/trial, so I get the authors' argument in the discussion. However, I think their conclusion is overstated.
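The probability/rate equivalence can be checked numerically with a simple Bernoulli discretization of a Poisson process; the rate, bin width, and duration below are arbitrary illustration choices:

```python
import numpy as np

# For a homogeneous Poisson process with rate lam (events per unit time),
# P(event in a bin of width dt) = 1 - exp(-lam*dt) ≈ lam*dt for small dt,
# so fixing dt makes per-bin probability and rate interchangeable.
rng = np.random.default_rng(1)
lam, dt, T = 0.5, 0.01, 10_000.0
n_bins = int(T / dt)
events = rng.random(n_bins) < lam * dt   # Bernoulli approximation per bin
est_rate = events.sum() / T              # recover the rate from bin counts
print(f"true rate {lam}, estimated {est_rate:.3f}")
```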
(5) The discussion states that the mutual information defined in equation 1 does not change during partial reinforcement. I am confused by this. The mean delay between reinforcements increases in inverse proportion to the probability of reinforcement, but doesn't the mean delay between cue onset and the next reinforcement increase by more than this factor (on non-reinforced trials, the next reinforcement is at least a full cue-to-cue interval away from the cue)? Why is this ratio invariant to partial reinforcement?
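A toy calculation of the quantities I have in mind, under one (possibly naive) reading of a partial-reinforcement schedule: fixed cycles of length C, a cue of length T at the end of each cycle, and reinforcement at cue offset with probability p. C, T, and this reading of the schedule are my assumptions, not the authors':

```python
C, T = 60.0, 10.0  # illustrative cycle and cue durations

def delays(p):
    """Mean inter-reinforcement interval and mean cue-onset-to-next-
    reinforcement delay under reinforcement probability p."""
    mean_between_reinf = C / p
    # Geometric wait over trials: E = T + C*(1/p - 1)
    mean_cue_to_reinf = T + C * (1 - p) / p
    return mean_between_reinf, mean_cue_to_reinf

for p in (1.0, 0.5, 0.25):
    irt, cue = delays(p)
    print(f"p={p}: ratio = {irt / cue:.2f}")
```

Under this reading the ratio is not invariant to p, which is the source of my confusion; if the authors' definition uses different delays (e.g., cumulative exposure times), that should be spelled out.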
Comments on revisions:
Update following revision
(1) This point is discussed in more detail in the attached file, but there are some important details regarding the identification of the learned trial that require more clarification. For instance, isn't the original criterion by Gibbon et al. (1977) the first "sequence of three out of four trials in a row with at least one response"? The authors' provided code for the Wilcoxon signed rank test and nDKL thresholds looks for a permanent exceeding of the threshold. So, I am not yet convinced that the approaches used here and in prior papers are directly comparable. Also, there is still no regression line fitted to their data (the black line in Fig 3 is from Fig 1, according to the legends). Accordingly, I think the claim in the second paragraph of the Discussion that the old data and their data are explained by a model with "essentially the same parameter value" is not yet convincing without actually reporting the parameters of the regression. Relatedly, the regression for their data based on my analysis appears to have a slope closer to -0.6, which does not support strict timescale invariance. I think this point should be discussed as a caveat in the manuscript.
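To illustrate why the two criteria need not pick out the same trial, here is a sketch of both on a synthetic response sequence; these are my own readings of the two criteria, not the authors' code:

```python
import numpy as np

def first_3_of_4(responded):
    """First trial completing a window of 3-of-4 consecutive trials with at
    least one response (one reading of the Gibbon et al., 1977 criterion)."""
    r = np.asarray(responded, bool)
    for i in range(3, len(r)):
        if r[i - 3:i + 1].sum() >= 3:
            return i
    return None

def permanent_exceed(stat, threshold):
    """First trial after which a decision statistic stays above threshold
    for the rest of training (the 'permanent exceeding' reading)."""
    s = np.asarray(stat, float)
    above = s > threshold
    for i in range(len(s)):
        if above[i:].all():
            return i
    return None

# A noisy acquisition curve on which the two criteria disagree:
responded = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
stat      = [0, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2]
print(first_3_of_4(responded), permanent_exceed(stat, 1.0))
```

On sequences with an early burst of responding followed by a lapse, the 3-of-4 criterion fires early while the permanent-exceeding criterion waits for the lapse to end, so trials-to-criterion comparisons across papers depend on which reading is used.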
(2) The authors report in the response that the basis for the apparent gradual/multiple step-like increases after initial learning remains unclear within their framework. This would be important to point out in the actual manuscript. More generally, the responses acknowledge that some phenomena are not captured by the current model; this, too, should be stated in the manuscript itself.
(3) There are several mismatches between the results shown in the figures and those produced by the authors' code or other supplementary files. As one example, the rat 3 results in Fig 11 and in the Supplementary Materials do not match, and neither version is reproduced by the authors' code. There are further discrepancies of this kind, detailed in the attached review file.