Reviewer #1 (Public review):
Summary:
Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. While faster frequency rates broadly provide a higher probability to detect 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model provides that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.
Strengths:
The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.
Weaknesses:
Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.
Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.
Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.
Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time). Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms. Relatedly, I would also suggest the authors to throw in a few more simulations to explore the parameter space and assay, to which quantitative extent the model still holds (e.g. allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms). As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney, 2022; as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as a too small sample size (Samaha & Romei, 2024).