Reviewer #3 (Public Review):
Original review
This study investigates the hypothesis that humans (but not non-human primates) spontaneously learn reversible temporal associations (i.e., learning a B-A association after only being exposed to A-B sequences), which the authors consider to be a foundational property of symbolic cognition. To do so, they expose humans and macaques to 2-item sequences (in a visual-auditory experiment, pairs of images and spoken nonwords, and in a visual-visual experiment, pairs of images and abstract geometric shapes) in a fixed temporal order, then measure the brain response during a test phase to congruent vs. incongruent pairs (relative to the trained associations) in canonical vs. reversed order (relative to the presentation order used in training). The advantage of neuroimaging for this question is that it removes the need for a behavioral test, which non-human primates can fail for reasons unrelated to the cognitive construct being investigated. In humans, the researchers find statistically indistinguishable incongruity effects in both directions (supporting a spontaneous reversible association), whereas in monkeys they only find incongruity effects in the canonical direction (supporting an association but a lack of spontaneous reversal). Although the precise pattern of activation varies by experiment type (visual-auditory vs. visual-visual) in both species, the authors point out that some of the regions involved are also those that are most anatomically different between humans and other primates. The authors interpret their findings to support the hypothesis that reversible associations, and by extension symbolic cognition, are uniquely human.
This study is a valuable complement to prior behavioral work on this question. However, I have some concerns about methods and framing.
Methods - Design issues:
(1) The authors originally planned to use the same training/testing protocol for both species but the monkeys did not learn anything, so they dramatically increased the amount of training and evaluation. By my calculation from the methods section, humans were trained on 96 trials and tested on 176, whereas the monkeys got an additional 3,840 training trials and 1,408 testing trials. The authors are explicit that they continued training the monkeys until they got a congruity effect. On the one hand, it is commendable that they are honest about this in their write-up, given that this detail could easily be framed as deliberate after the fact. On the other hand, it is still a form of p-hacking, given that it's critical for their result that the monkeys learn the canonical association (otherwise, the critical comparison to the non-canonical association is meaningless).
(2) Between-species comparisons are challenging. In addition to having differences in their DNA, human participants have spent many years living in a very different culture from that of non-human primates (NHPs), including years of formal education. As a result, attributing the observed differences to biology is challenging. One approach that has been adopted in some past studies is to examine either young children or adults from cultures that don't have formal educational structures. This is not the approach the authors take. At a minimum, this major confound needs to be explicitly acknowledged up front.
(3) Humans have big advantages in processing and discriminating spoken stimuli and associating them with visual stimuli (after all, this is what words are in spoken human languages). Experiment 2 ameliorates these concerns to some degree, but it is still difficult to attribute the failure of NHPs to show reversible associations in Experiment 1 to cognitive differences rather than to the relative importance of sound-string-to-meaning associations in human vs. NHP experience.
(4) More minor: The localizer task (math sentences vs. other sentences) makes sense for math but seems to make less sense for language: why would a language region respond more to sentences that don't describe math vs. ones that do?
Methods - Analysis issues:
(5) The analyses appear to "double dip" by using the same data to define the clusters and to statistically test the average cluster activation (Kriegeskorte et al., 2009). The resulting effect sizes are therefore likely inflated, and the p-values are anticonservative.
Framing:
(6) The framing ("Brain mechanisms of reversible symbolic reference: A potential singularity of the human brain") is bigger than the finding (monkeys don't spontaneously reverse a temporal association but humans do). The title and discussion are full of buzzy terms ("brain mechanisms", "symbolic", and "singularity") that are only connected to the experiments by a debatable chain of assumptions.
First, this study shows relatively little about brain "mechanisms" of reversible symbolic associations, which implies insights about how these associations are learned, recognized, and represented. But we're only given standard fMRI analyses that are quite inconsistent across similar experimental paradigms, with purely suggestive connections between these spatial patterns and prior work on comparative brain anatomy.
Second, it's not clear what the relationship is between symbolic cognition and a propensity to spontaneously reverse a temporal association. Certainly if there are inter-species differences in learning preferences this is important to know about, but why is this construed as a difference in the presence or absence of symbols? Because the associations aren't used in any downstream computation, there is not even any way for participants to know which is the sign and which is the signified: these are merely labels imposed by the researchers on a sequential task.
Third, the word "singularity" is both problematically ambiguous and not well supported by the results. "Singularity" is a highly loaded word that the authors are simply using to mean "that which is uniquely human". Rather than picking a term with diverse technical meanings across fields and then trying to restrict the definition, it would be better to use a different term. Furthermore, even under the stated definition, this study performed a single pairwise comparison between humans and one other species (macaques), so it is a stretch to then conclude (or insinuate) that the "singularity" has been found (see also pt. 2 above).
(7) Related to pt. 6, there is circularity in the framing whereby the authors say they are setting out to find out what is uniquely human, hypothesizing that the uniquely human thing is symbols, and then selecting a defining trait of symbols (spontaneous reversible association) *because* it seems to be uniquely human (see e.g., "Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009).", line 335). They can't have it both ways. Either "symbol" is an independently motivated construct whose presence can be independently tested in humans and other species, or it is by fiat synonymous with the "singularity". This circularity can be broken by a more modest framing that focuses on the core research question (e.g., "What is uniquely human? One possibility is spontaneous reversal of temporal associations.") and then connects (speculatively) to the bigger conceptual landscape in the discussion ("Spontaneous reversal of temporal associations may be a core ability underlying the acquisition of mental symbols").
Comments on revised version:
I thank the authors for engaging constructively with my comments. I'm convinced by the responses to my original points 1, 2, 3, and 4. I'm also partially convinced by the response to point 6 (with qualifications discussed below). I do want to clear the record on points 1 and 6 (about which the authors expressed offense at aspects of my original comments), and to press on points 5 and 7.
(1) It's very helpful to know that the plan was always to extend training in Expt 1. The rationale is now clear in the methods, although I'd encourage the authors to also emphasize this if space permits in the vicinity of lines 211-216, which still read as if the extended training was a post hoc decision ("the canonical congruity effect... was not significant... after 3 days of exposure... Thus... monkeys were further exposed..."). The authors have objected to my original use of "p-hacking", which I agree was too strong (my apologies). My intention was only to point out that *if it were the case that training duration was conditional on the monkeys' success at learning the canonical association* (which the authors have now clarified was not the case), then this would be steering the study post hoc to achieve a desired outcome. I recognize the authors' point that the canonical direction was a sanity check, not the effect of interest (reversed association), but it's still true that they needed to achieve this sanity check in order for the absence of a reversed effect to be meaningful. This was the source of my original concern. This point is only a clarification (no action is recommended).
(5) The authors have said they don't understand my concern about "double-dipping" in the statistical analyses, so I will attempt to clarify. First, I should stress that this concern applies only to the whole-brain results (Tables 1-4), not the fROI results. As the authors point out, this was indeed unclear, and I apologize. My concern about Tables 1-4 is that they seem to be derived using the classical technique of thresholding contrasts at some significance level to define clusters and then reporting cluster statistics (in this case, t-values) derived from *the same contrast in the same activation maps*. If this is not what was done (i.e., if orthogonal data and/or contrasts were used to define clusters and quantify contrasts within clusters, as in the fROI analyses), then this point is moot (and clarification in the paper would be helpful). But if this is what was done, then this procedure is known to be distortionary (e.g., Kriegeskorte et al., 2009, "Nonindependent selective analysis is incorrect and should not be acceptable in neuroscientific publications").
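To make the concern in point 5 concrete, here is a minimal simulation sketch. It is not the authors' pipeline: the function name, the subject and voxel counts, and the threshold are all illustrative assumptions, and voxels are treated as independent noise with no spatial clustering. It selects voxels by thresholding a t-map and then compares the mean t-value of the selected voxels in the same data (circular) against the mean t-value in an independent half of the data.

```python
import numpy as np


def simulate_selection_bias(n_subjects=20, n_voxels=5000, t_threshold=3.0, seed=0):
    """Compare circular vs. independent statistics on pure-noise data.

    All parameters are hypothetical; the true effect is zero at every voxel.
    """
    rng = np.random.default_rng(seed)
    # Two independent halves of the data (e.g., separate runs), both pure noise.
    half_a = rng.standard_normal((n_subjects, n_voxels))
    half_b = rng.standard_normal((n_subjects, n_voxels))

    def t_map(data):
        # One-sample t-statistic per voxel, computed across subjects.
        sem = data.std(axis=0, ddof=1) / np.sqrt(data.shape[0])
        return data.mean(axis=0) / sem

    t_a = t_map(half_a)
    t_b = t_map(half_b)

    # "Cluster" definition: voxels that pass the threshold in half A.
    selected = t_a > t_threshold

    # Circular: the statistic is read out from the data used for selection.
    circular_mean_t = float(t_a[selected].mean())
    # Independent: the statistic is read out from held-out data.
    independent_mean_t = float(t_b[selected].mean())
    return int(selected.sum()), circular_mean_t, independent_mean_t


if __name__ == "__main__":
    n_selected, circular, independent = simulate_selection_bias()
    print(f"voxels selected: {n_selected}")
    print(f"mean t, same data used for selection: {circular:.2f}")
    print(f"mean t, independent data: {independent:.2f}")
```

Under these assumptions, the circular estimate must exceed the threshold by construction even though no effect exists, whereas the held-out estimate should hover near zero. This is the sense in which statistics reported from the selecting contrast are inflated.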
(6) The authors have objected to my use of the term "insinuate" as pejorative. I don't share this impression (and insult was certainly not my intent) but I'm happy to concede that a less loaded term (e.g., "suggest") would have been a better choice. I apologize. In any case, I stand by my original concern that a key idea in this piece (that reversible symbolic inference is a singularity of the human brain) is being advanced rhetorically rather than empirically, by repeatedly supplying it to readers (albeit with qualifiers like "potential") as an interpretive lens through which to view empirical results that only directly support a more modest claim (that macaques spontaneously reverse sequential associations less readily than humans do). To be clear, it is good that the authors don't make this stronger claim outright, and it is fine to motivate a more modest research question (e.g., do species differ in spontaneous reversal of associations) on the grounds that it is a stepping stone to a bigger one (what is the singularity). But by placing the bigger framing front and center in this way, there's a risk that this paper will be received by the community as establishing a conclusion that it does not actually establish.
(7) The authors have said they don't understand the circularity I'm alleging. Having read the revision, I believe the issue is still there, so I'll make another attempt. The problem is most clearly apparent in the Discussion text quoted in my original comment (lines 347-350 of the revision, emphasis mine): "Several studies previously found behavioural evidence for a *uniquely human* ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was *therefore* proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009)." In other words, reversal of associations is selected as a defining feature of symbols and targeted by this study *because* it is thought to be uniquely human. This is fine, but it prohibits you from then advocating the hypothesis that symbolic cognition is the singularity (lines 49-52), because "symbol" is being defined such that this is necessarily the case. To minimally paraphrase what I perceive to be the circular logic in the framing, the argument seems to go: "What is uniquely human? Symbols. What are symbols? That which is uniquely human." In my original comment, I suggested a reframing that would fix this issue, namely: "What is uniquely human? Spontaneous reversal of temporal associations." The authors say they don't see the difference between this framing and their own, so I'll try to clarify: the difference is that it sidesteps the notion of "symbol", and in so doing removes the circular definitions of "symbol" and "singularity" in terms of each other. This suggestion was given not as a prescription but as an example to show that the issue can be remedied by revisions to the framing without doing damage to the empirical claims. If the authors prefer a different remedy that avoids circular definitions of terms, that's fine.