Reviewer #1 (Public review):
Summary:
The question of whether and how "extensive memory training affects neocortical memory engrams" (to use the authors' words) is interesting, and I think there is room to advance current knowledge in this area. That said, I do not think the current paper succeeds in meaningfully addressing this question. At a conceptual level, I really struggled with the predictions and interpretations of the findings. Several elements of the experimental paradigm and several analysis decisions also feel incompatible with the claims that are made. While the manuscript does demonstrate that several measures of neural pattern similarity differ between the groups, it is difficult to draw clear conclusions from these findings.
Strengths:
(1) This is a unique dataset. Being able to recruit and enroll high-level memory athletes is impressive.
(2) In principle, comparing memory athletes to control subjects, active control subjects (who received working memory training), and trained subjects (who received method of loci training) is very appealing.
(3) In several ways, the authors were rigorous in their analyses.
(4) In principle, the question of how memory training influences neural similarity vs. dissimilarity is of potential interest.
Weaknesses:
(1) As far as I can tell, the training manipulation is fully confounded with instructions. That is, subjects were only instructed to use the method of loci if they had completed method of loci training (or if they were the memory athletes). For the training group, in the pre-training session, there was no strategy instruction (subjects could do whatever they wanted), but post-training, they were told to use the method of loci. I understand the argument, of course, that naïve subjects might not be very good at using the method of loci if they had no experience with it. But, it does seem entirely possible that some (or even many) of the observed fMRI results that are attributed to "extensive training" are better explained by strategy use. That is, maybe the effects can be explained by TRYING to use the method of loci as opposed to actual proficiency with the method of loci. It seems impossible to address this, given the design of the experiments. As such, any claims about the effects of memory training, per se, feel inappropriate. It feels equally plausible that the effects are due to the strategy instruction. If the same results could be obtained through a simple strategy manipulation without ANY training at all, that would radically alter the interpretation of the effects. I think the strategy use account is, in fact, quite viable because it is very easy to improve subjects' memories with a method of loci instruction (relative to no strategy instruction) without ANY practice at all. Obviously, practice does improve memory performance with the method of loci, but my point is that even without any meaningful practice, there is likely to be SOME immediate benefit to adopting the method of loci as a strategy. There is also the question of why the effects for the memory athletes weren't obviously stronger than for the trained group, given that the memory athletes have much more experience with the method of loci. 
Ultimately, the problem with the current design is that I don't see how one can tease apart the role of training, per se, vs. strategy use.
(2) There is no clear theoretical framework for the predictions or interpretations. The Results section is mostly a list of lots of different permutations of analyses (similarity within a group, between groups, between trials, across trials between subjects, during encoding vs. retrieval, frontal vs. hippocampal vs. parietal ROIs, etc). For each analysis, I did not have an intuition for what the prediction should be (e.g., should athletes have higher or lower pattern similarity?), and even after seeing all the results, I still do not have an intuition for how to interpret them. For the main results related to dissimilarity in prefrontal cortex, I would have, if anything, predicted the opposite: that when individuals are trained to use a common strategy, there would be MORE similarity between them. The Discussion acknowledges a very wide range of possible factors that might contribute to measures of similarity/dissimilarity, but I am ultimately left feeling that I have no idea how to interpret the results because the design and analyses were not structured such that any of these interpretations could be teased apart.
(3) Same theme: the analyses shift from frontal regions (when looking at encoding) to hippocampus and precuneus (when looking at temporal recency). This shift in ROIs is confusing. The analyses (encoding vs. recognition) are essentially confounded with the ROIs (frontal vs. hippocampal/precuneus), so it's hard to know whether different analyses yielded different patterns or different ROIs yielded different patterns. Why were the frontal regions that were important for encoding ignored for the temporal recency judgments? And the fact that medial temporal lobe regions showed opposite effects to the frontal regions during encoding did not get much attention. Given that there were opposing patterns (dissimilarity vs. similarity) across different brain regions, the framing of the paper (that "the method of loci may bolster uniqueness") feels like a very selective representation of the data.
(4) One of the more surprising analytic choices is that, in at least one analysis, representational similarity analysis (RSA) is used to compare the average activity pattern (averaged across all trials) between different individuals. At a conceptual level, this really just reduces to a univariate analysis. It is neither standard nor intuitive to think about an RSA that is essentially blind to the actual representational content: averaging across trials washes out the content, and what is left are process-level effects. For process-level questions, univariate analyses are far more common and more straightforward. However, these 'RSA' analyses are described as reflecting the "uniqueness of each word-location association" (an account that strongly implies content-level effects). This seems an inappropriate description of what the analyses actually reflect.
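To illustrate the concern, here is a minimal Python sketch on simulated data (the function names, array shapes, and signal model are hypothetical and do not reproduce the authors' pipeline). Because a trial average does not depend on trial order, the between-subject correlation of averaged patterns is unchanged when one subject's trials are shuffled, whereas a trial-level comparison is not:

```python
import numpy as np

def averaged_pattern_similarity(subj_a, subj_b):
    """Correlate two subjects' trial-averaged voxel patterns.

    subj_a, subj_b: arrays of shape (n_trials, n_voxels).
    Averaging over trials (axis 0) discards which pattern belonged to which item.
    """
    return np.corrcoef(subj_a.mean(axis=0), subj_b.mean(axis=0))[0, 1]

def trial_level_similarity(subj_a, subj_b):
    """Mean correlation between corresponding trials; depends on trial order."""
    return np.mean([np.corrcoef(a, b)[0, 1] for a, b in zip(subj_a, subj_b)])

rng = np.random.default_rng(0)
n_trials, n_voxels = 72, 200
process = rng.normal(size=n_voxels)            # hypothetical task-general component
items = rng.normal(size=(n_trials, n_voxels))  # hypothetical item-specific component
subj_a = process + items + rng.normal(scale=0.5, size=(n_trials, n_voxels))
subj_b = process + items + rng.normal(scale=0.5, size=(n_trials, n_voxels))

perm = rng.permutation(n_trials)  # break item correspondence for subject B
avg_intact = averaged_pattern_similarity(subj_a, subj_b)
avg_shuffled = averaged_pattern_similarity(subj_a, subj_b[perm])
trial_intact = trial_level_similarity(subj_a, subj_b)
trial_shuffled = trial_level_similarity(subj_a, subj_b[perm])

# Trial-averaged values are equal up to floating-point rounding; trial-level values drop.
print(f"trial-averaged: intact={avg_intact:.3f}, shuffled={avg_shuffled:.3f}")
print(f"trial-level:    intact={trial_intact:.3f}, shuffled={trial_shuffled:.3f}")
```

Any similarity measure that survives this kind of shuffle is, by construction, a process-level measure rather than a content-level one.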
(5) I think the analysis looking at trial-by-trial similarity during word encoding (showing greater dissimilarity among the experienced individuals) is a somewhat interesting result, but again, I think the interpretation is very difficult. It is hard (or, I think, impossible) to get a clear sense of what is driving those differences. Is it the association of a unique spatial context? Is it somehow a product of better encoding, per se (as opposed to distinct spatial contexts)? These things could be tested by actually manipulating the spatial contexts in a more controlled way. For example, the paper by Liu et al. that is cited several times, and also a just-published paper by Christopher Baldassano (Nature Human Behaviour), each used a very controlled paradigm where the (imagined) spatial location associated with each item was known/manipulated. However, the design of the current study does not allow for these things to be teased apart.
(6) Relatedly, the training group seemed to receive instruction on a common spatial route, but, surprisingly, "Participants were free to choose which route and how many they would use to anchor the 72 items." Thus, if I understand correctly, we don't know whether the trained individuals were using common or distinct locations. And the fact that they learned a 50-location route but then studied a 72-word list is also a bit strange. Not having control or knowledge of the location that was associated with each word (sequence position) is a major limitation and also a major difference between the current study and other recent studies. For that matter, the word order was also randomized per subject, so there was no control over whether the words and/or locations matched across subjects. These issues really complicate interpretation.
(7) Again, same theme: for the result showing lower trial-by-trial similarity (within-subject similarity), the question is why, exactly, training/experience is associated with lower trial-by-trial similarity. Does training specifically or preferentially lead to greater differentiation between temporally-adjacent trials (as in Liu et al)? Does it lead to greater differentiation IF subjects associate each word with a unique location? Or maybe there is a more abstract effect of sequence/position that is independent of spatial location? Importantly, each of these three possibilities that I mention here has a precedent in prior studies that were more tightly controlled. But here, there is no way to tease these apart because of the experimental design, limiting the conclusions.
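One way to begin separating these possibilities, at least the adjacency question, is to break within-subject similarity down by trial lag. The following is a minimal Python sketch on simulated data; the AR(1) "drifting context" model, the parameter values, and the function name are assumptions for illustration only, not a reanalysis of the authors' data:

```python
import numpy as np

def similarity_by_lag(patterns, max_lag=5):
    """Mean within-subject correlation between trials i and i + lag.

    patterns: array of shape (n_trials, n_voxels), rows in presentation order.
    """
    r = np.corrcoef(patterns)  # (n_trials, n_trials) trial-by-trial correlation matrix
    return {lag: np.diagonal(r, offset=lag).mean() for lag in range(1, max_lag + 1)}

rng = np.random.default_rng(2)
n_trials, n_voxels, rho = 72, 200, 0.8

# Hypothetical slowly drifting context signal: AR(1) across trials.
context = np.empty((n_trials, n_voxels))
context[0] = rng.normal(size=n_voxels)
for t in range(1, n_trials):
    context[t] = rho * context[t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=n_voxels)

patterns = context + rng.normal(size=(n_trials, n_voxels))  # add item-specific noise
by_lag = similarity_by_lag(patterns)
for lag, value in by_lag.items():
    print(f"lag {lag}: mean r = {value:.3f}")
```

Comparing such lag profiles between groups would indicate whether lower overall trial-by-trial similarity reflects stronger differentiation of adjacent trials specifically or a uniform shift across all lags, although it would still not resolve the spatial-location account without knowing which location each word was assigned to.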
(8) The ISC analysis described on p. 9 (line 328) is confusing. If I understand correctly, correlations between different trials were not computed (e.g., subject 1 trial 1 was not correlated with subject 2 trial 2). Rather, trial 1 was always correlated with trial 1 (in other subjects). Thus, it is not clear whether trial-level alignment matters at all. Maybe the same results would be obtained if there were no correspondence across subjects in trial number. Or if the trial order was shuffled within the subject. Given this, I simply don't know how to think about the data. And why did memory athletes show higher pattern similarity in this analysis as opposed to lower pattern similarity (as in some other analyses)? And why was this analysis performed by comparing memory athletes to each other as opposed to memory athletes to non-athletes? And, conceptually, why was this selective to the memory athletes or to the precuneus? And why was it selective to the temporal order test and not encoding? I am not asking the authors to answer each of these questions; rather, the point I am trying to make is that this analysis, and many of the analyses, seem to raise more questions than they answer.
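The control I have in mind can be sketched as follows (Python, simulated data; the function names, signal models, and permutation count are hypothetical and are not the authors' procedure). Trial-matched ISC is compared against a null in which one subject's trial order is permuted; a high ISC that does not exceed this null would indicate that trial-level alignment contributes nothing:

```python
import numpy as np

def trial_matched_isc(subj_a, subj_b):
    """Mean Pearson r between same-numbered trials of two subjects.

    subj_a, subj_b: arrays of shape (n_trials, n_voxels).
    """
    # z-score each trial's pattern across voxels (ddof=0), so that the
    # row-wise mean of products equals the Pearson correlation.
    za = (subj_a - subj_a.mean(axis=1, keepdims=True)) / subj_a.std(axis=1, keepdims=True)
    zb = (subj_b - subj_b.mean(axis=1, keepdims=True)) / subj_b.std(axis=1, keepdims=True)
    return np.mean(np.mean(za * zb, axis=1))

def trial_shuffle_test(subj_a, subj_b, n_perm=1000, seed=0):
    """Compare trial-matched ISC against a null with subject B's trial order permuted."""
    rng = np.random.default_rng(seed)
    observed = trial_matched_isc(subj_a, subj_b)
    null = np.array([
        trial_matched_isc(subj_a, subj_b[rng.permutation(subj_b.shape[0])])
        for _ in range(n_perm)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p_value

rng = np.random.default_rng(1)
n_trials, n_voxels = 72, 200

# Scenario 1 (hypothetical): subjects share only a trial-invariant, process-level pattern.
generic = rng.normal(size=n_voxels)
a1 = generic + rng.normal(size=(n_trials, n_voxels))
b1 = generic + rng.normal(size=(n_trials, n_voxels))
obs1, null1, p1 = trial_shuffle_test(a1, b1)

# Scenario 2 (hypothetical): subjects share a pattern specific to each trial position.
per_trial = rng.normal(size=(n_trials, n_voxels))
a2 = per_trial + rng.normal(scale=0.5, size=(n_trials, n_voxels))
b2 = per_trial + rng.normal(scale=0.5, size=(n_trials, n_voxels))
obs2, null2, p2 = trial_shuffle_test(a2, b2)

print(f"generic only:   observed={obs1:.3f}, null mean={null1.mean():.3f}, p={p1:.3f}")
print(f"trial-specific: observed={obs2:.3f}, null mean={null2.mean():.3f}, p={p2:.3f}")
```

In the first simulated scenario the ISC is substantial yet indistinguishable from the shuffled null, which is exactly the ambiguity described above; only the second scenario depends on trial correspondence.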
(9) The ISC analyses are interpreted in terms of scene construction and context reinstatement, but these conclusions go (very) far beyond what the data actually shows. Again, I don't see how this analysis lends itself to a meaningful conclusion. And this general critique applies to many of the analyses reported in this paper.
(10) The fact that words were in random order per subject also makes the ISC analysis even more confusing to think about. The memory athletes had unique spatial routes (that they used for the method of loci) and unique word lists. So, why would it make sense to look at trial-level ISC? At a conceptual level, I simply don't understand what this is intended to capture.
(11) Differences in the pattern of results between the encoding and temporal memory recognition task are hard to make sense of and are not addressed in much detail. Why would it make more sense to have across-trial similarity during recognition than during encoding? I think any account of this is very speculative.