Reviewer #1 (Public Review):
The manuscript is well written, clearly describes the scientific background and hypotheses, and provides a sound illustration of the results, which can advance our current understanding of the neural basis of decision-making processes. The main conclusion is that pallidal stimulation in patients with dystonia leads to an increased number of exploratory choices, i.e. choosing the option with a lower expected value instead of exploiting the option with the highest expected value. There are, however, some shortcomings that limit the interpretability of the data in its current form: the lack of a healthy control group, inconsistency between the frequentist and Bayesian statistics applied, and the limited specificity of the connectome correlation analysis. These shortcomings should be addressed by the authors in order to improve the paper.
Detailed description of comments:
(1) Generalizability:
Studying dystonia patients gives a unique opportunity to study the effects of electrical pallidal stimulation on decision-making in humans, and given that dystonia primarily affects movement rather than cognition or decision-making, the findings might well be representative of healthy people. This (i.e. the similarity between task performance of patients and healthy people) is, however, not demonstrated in this study. In the introduction, the authors state that reward prediction error is intact in dystonic patients, but the paper that they cite for this (ref 34) is titled '... abnormal reward learning in cervical dystonia'. Furthermore, cognitive problems, albeit clearly less pronounced than movement symptoms, are present in dystonia patients (see Jahanshahi 2017 Movement Disorders). I would therefore recommend enrolling a healthy control group so that DBS ON and DBS OFF can each be compared with healthy people.
(2) Statistics:
I understand that Bayesian statistics cannot always be compared directly to frequentist statistics. However, to me, the frequentist and Bayesian statistics are not consistent in this study. ANOVAs, etc. are applied to subject-averaged data using a p-value of 0.05 to distinguish between significant and non-significant results. In the Bayesian modelling analysis, the 95% HDI is computed. While this number is arbitrary (just as a p-value of 0.05 is), it still has a rationale, given that 95% is also the convention for frequentist confidence intervals in the scientific community. Therefore, I think that 95% would be the most consistent choice here. However, none of the model parameters differ between ON and OFF according to the 95% HDIs, since they overlap with 0 (see 'Contrast' in table 1). The HDIs of the decision threshold and drift rate scaling parameter in particular overlap substantially with 0, but these parameters are still interpreted as significant based on the Bayes factor. The Bayes factor, however, is not used for the behavioral analyses. For example, there are no effects of DBS on decision times, but at the computational level, several parameters (which predict the decision time) are affected. For the sake of consistency of analyses within the paper, I think the statistics of the Bayesian analyses should rely on the 95% HDI.
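To make the 95% HDI criterion concrete, the following is a minimal Python sketch of how it could be applied to a posterior contrast. It assumes that posterior samples of the ON minus OFF difference are available as a flat array; the function names `hdi` and `excludes_zero` and the synthetic samples are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def hdi(samples, cred_mass=0.95):
    """Return the narrowest interval containing `cred_mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(cred_mass * n))  # number of samples inside the interval
    if k < 2 or k > n:
        raise ValueError("need more samples or a cred_mass in (0, 1]")
    # Width of every candidate interval spanning k consecutive sorted samples.
    widths = x[k - 1:] - x[:n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def excludes_zero(samples, cred_mass=0.95):
    """True if the HDI of the contrast lies entirely on one side of zero."""
    lo, hi = hdi(samples, cred_mass)
    return lo > 0 or hi < 0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical posterior of an ON - OFF contrast (synthetic, not study data).
    contrast = rng.normal(loc=0.3, scale=0.3, size=20_000)
    lo, hi = hdi(contrast)
    print(f"95% HDI: [{lo:.3f}, {hi:.3f}]  excludes zero: {excludes_zero(contrast)}")
```

Under this criterion, a contrast would be reported as credibly different from zero only when `excludes_zero` returns true, which would parallel the 0.05 threshold used for the frequentist tests.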
(3) Connectome correlation analysis:
If I understand it correctly, the connectome analysis relates behavioral effects of stimulation to whole-brain networks rather than just local effects in the pallidum by testing whether patients who showed stronger effects of stimulation have electrodes that are closer to connections with different brain areas. In the abstract, the results of this analysis are reported as "... was predicted by the degree of functional connectivity between the stimulating electrode and prefrontal and sensorimotor cortices". In the discussion, it is stated that "...DBS-induced enhanced exploration correlated with the functional connectivity of the stimulation volume in the GPI to frontal cortical regions identified previously in functional imaging studies of explore-exploit decision making ... The exploration-enhancing effects of GPI-DBS in our study were predicted by functional connectivity to brain regions whose neurons encode uncertainty [27] and predict behavioural switching [29, 30]". However, figure 4 essentially shows that almost the whole brain correlates with inter-individual differences in behavior, with correlation coefficients as strong as -0.7 in, for example, the lower brain stem, cerebellum, and occipital cortex, none of which are mentioned in the paper. To me, it seems that there are correlations with very large and very distributed cortical areas rather than with specific areas in the prefrontal and sensorimotor cortex as stated in the paper.
Related to this point: The variable used for the connectomic correlation analysis is not the same variable that was affected by DBS in the statistical analysis. The statistical analysis found that P(explore) differed between DBS ON and DBS OFF irrespective of the session. Instead, the "maximum within-session increase in P(Explore) DBS-ON - P(Explore) DBS-OFF" was used.
In general, could you please explain this analysis in more detail? If I understand it correctly each voxel had a value for 'connectivity' to the stimulation field and a value for 'behavioral effect' and across patients, this then gave an R-map. How was figure 4 thresholded (only the maximum positive and negative Rs are given in the color bar)? Then p-values are listed. One is 0.04 and another one is 0.009. What is the difference between the two? These values seem to reflect the correlation of similarity between the individual map with the group map and the behavioral variable, but was the correlation with the behavioral variable not already used for creating the R-map? Describing the analysis in more detail might help make it more understandable to the audience not familiar with the analysis (including me).
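To illustrate why the last question matters, here is a minimal Python sketch of an R-map analysis as I understand it, run on pure noise. It is not the authors' pipeline: the array shapes, the function names, and the synthetic data are all assumptions made for illustration. It compares two ways of scoring each patient: spatial similarity to an R-map built from all patients (which already contains that patient's behavioral value), and similarity to a leave-one-out R-map built without that patient.

```python
import numpy as np

def r_map(conn, behav):
    """Voxel-wise Pearson r across patients; conn is (n_patients, n_voxels)."""
    c = conn - conn.mean(axis=0)
    b = behav - behav.mean()
    return (b @ c) / np.sqrt((c ** 2).sum(axis=0) * (b ** 2).sum())

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def similarity_scores(conn, behav, leave_one_out):
    """Spatial similarity of each patient's connectivity map to the R-map."""
    n = conn.shape[0]
    full = r_map(conn, behav)
    scores = np.empty(n)
    for i in range(n):
        if leave_one_out:
            keep = np.arange(n) != i  # build the R-map without patient i
            ref = r_map(conn[keep], behav[keep])
        else:
            ref = full  # R-map already contains patient i's behavior
        scores[i] = pearson(conn[i], ref)
    return scores

def demo(seed=0, n_patients=15, n_voxels=2000):
    """Pure-noise data: connectivity and behavior are unrelated by construction."""
    rng = np.random.default_rng(seed)
    conn = rng.normal(size=(n_patients, n_voxels))
    behav = rng.normal(size=n_patients)
    in_sample = pearson(similarity_scores(conn, behav, False), behav)
    held_out = pearson(similarity_scores(conn, behav, True), behav)
    return in_sample, held_out

if __name__ == "__main__":
    in_sample, held_out = demo()
    print(f"in-sample r = {in_sample:.2f}, leave-one-out r = {held_out:.2f}")
```

In this sketch the in-sample similarity correlates strongly with behavior even though the data contain no true relationship, whereas the leave-one-out version does not carry this built-in dependence. Stating which of the two reported p-values corresponds to which procedure would help readers interpret them.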
(4) It is my understanding that high exploration (e.g. P(Explore) of 0.2) should be related to poorer task performance, since the optimal strategy would always use the high-value option and only switch rarely to identify the reversal(s). Why is it then that DBS can affect exploration but not the sum of rewards if the two are related? Would DBS affect the sum of rewards if, for example, its effect on P(explore) were more pronounced?
(5) Would the authors have predicted different effects for subthalamic deep brain stimulation? The DBS effects on the GPi are mainly interpreted in terms of reduced firing rate/activity. Since the STN exerts glutamatergic innervation of the GPi, should STN suppression lead to similar results? Conversely, GPe exerts GABAergic innervation of the STN. Should GPe suppression lead to the opposite behavioral effect? Were some of the electrodes localized within or close to the GPe rather than GPi and if so, did these patients show different behavioral effects?
(6) Was the OFF vs ON DBS order counterbalanced? Three patients did not complete the task OFF, and the ON dataset was not available in another patient. Did the authors check if the DBS order was relevant for the DBS effect on P(explore)?