Reviewer #2 (Public Review):
General linear modelling (GLM) forms a cornerstone of analyses of task-based functional magnetic resonance imaging (fMRI) data. Obtaining robust single-trial fMRI beta estimates is a difficult problem given the relatively high levels of noise in fMRI data. The introduced toolbox significantly improves such estimates using a few key features: 1) estimating a separate hemodynamic response function (HRF) for each voxel, 2) including noise regressors, selected with a cross-validated approach, to improve the reliability of the betas across repetitions, and 3) applying ridge regression to the beta estimates. The authors explain these steps well and compare the results obtained on subsequent metrics when choosing to include, or not, the different features along this procedure. They also compare their new approach to the Least-Squares Separate technique for beta estimation. For their demonstrations, they use two condition-rich datasets (NSD and BOLD5000) to show the improvements that the different components of GLMsingle afford.

The metrics used for the comparisons are well chosen and relevant. In particular, test-retest reliability of GLM beta profiles is often a prerequisite for subsequent analyses. Additionally, the authors consider temporal autocorrelation between beta estimates of neighbouring trials, as well as a few potential downstream analyses, namely representational similarity analysis and condition classification. Thus, they consider a broad range of possible applications and provide the reader with useful pointers for inspecting what is most relevant for a given application.

This manuscript and toolbox present a major advancement for the field of neuroimaging and should be of interest to essentially any researcher working with task-based fMRI data.
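To make the reliability criterion concrete, the kind of split-half metric referred to here can be computed per voxel from the single-trial betas. The sketch below is purely illustrative (function and variable names are not from the toolbox or the manuscript) and assumes each condition was presented exactly twice.

```python
# Illustrative sketch (not from the manuscript or toolbox): split-half test-retest
# reliability of single-trial betas, assuming each condition was presented twice.
import numpy as np

def split_half_reliability(betas, condition_ids):
    """betas: (n_voxels, n_trials) single-trial estimates.
    condition_ids: (n_trials,) integer label per trial; each label occurs twice."""
    conditions = np.unique(condition_ids)
    half1 = np.empty((betas.shape[0], conditions.size))
    half2 = np.empty((betas.shape[0], conditions.size))
    for j, c in enumerate(conditions):
        idx = np.flatnonzero(condition_ids == c)
        half1[:, j] = betas[:, idx[0]]   # first presentation of condition c
        half2[:, j] = betas[:, idx[1]]   # second presentation of condition c
    # Pearson correlation across conditions, computed per voxel
    z1 = (half1 - half1.mean(1, keepdims=True)) / half1.std(1, keepdims=True)
    z2 = (half2 - half2.mean(1, keepdims=True)) / half2.std(1, keepdims=True)
    return (z1 * z2).mean(axis=1)        # (n_voxels,) reliability values
```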
The strengths of the manuscript and toolbox are numerous, and the presented results are convincing. To further the impact of the toolbox and paper, the authors could provide more guidelines on implementation for various common uses of the toolbox, and/or on factors to consider when deciding which steps to implement in one's analysis pipeline (FitHRF, GLMdenoise, RR).
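As an illustration of how these three steps are exposed to the user, the sketch below follows the option names in the toolbox's public Python examples (wantlibrary, wantglmdenoise, wantfracridge); these names are taken here as assumptions rather than quotations from the manuscript, and switching an option off yields the corresponding simpler model variant.

```python
# A hedged sketch (option names follow the toolbox's public Python examples,
# not the manuscript): the three GLMsingle steps map onto three toggles.
from glmsingle.glmsingle import GLM_single

def run_glmsingle(design, data, stimdur, tr):
    """design: list of (time x conditions) design matrices, one per run.
    data:   list of 4-D fMRI arrays (X x Y x Z x time), one per run.
    stimdur, tr: stimulus duration and repetition time in seconds."""
    opt = {
        'wantlibrary': 1,     # FitHRF: pick a per-voxel HRF from a library
        'wantglmdenoise': 1,  # GLMdenoise: cross-validated nuisance regressors
        'wantfracridge': 1,   # RR: fractional ridge-regression shrinkage of betas
    }
    return GLM_single(opt).fit(design, data, stimdur, tr)
```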
Additionally, there are a few considerations that could be addressed directly:

1) The authors use cross-validation to determine the number of nuisance regressors to add to the model. Thus, any variability in responses to a single condition is considered to be 'noise'. How might this influence a potential use of single-trial estimates to assess brain-behaviour correlations (e.g. differences in behavioural responses to a single condition), or within-session learning effects? For such uses, would the authors suggest instead using LSS (see the sketch below) or a subset of the features in GLMsingle (i.e. not using GLMdenoise)?

2) In the results, using a fixed HRF leads to drastically lower performance on a variety of subsequent measures compared to fitting an HRF to each voxel, especially with regard to beta-map test-retest reliability (Fig. 2-3). Have the authors ensured that the chosen HRF is the most appropriate one for the region of interest? In other words, is the chosen fixed HRF also the one assigned to most voxels under the flexible option?
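For reference, the LSS approach raised in point 1 fits one GLM per trial, with that trial as its own regressor and all remaining trials collapsed into a single regressor. The sketch below is a minimal illustration using numpy; the names are hypothetical, and nuisance terms beyond an intercept are omitted for brevity.

```python
# Minimal sketch of the Least-Squares Separate (LSS) idea: one GLM per trial,
# with that trial as its own regressor and all other trials summed into one.
import numpy as np

def lss_betas(trial_regressors, data):
    """trial_regressors: (n_timepoints, n_trials) HRF-convolved single-trial regressors.
    data: (n_timepoints, n_voxels) time series.
    Returns (n_trials, n_voxels) LSS beta estimates."""
    n_time, n_trials = trial_regressors.shape
    betas = np.empty((n_trials, data.shape[1]))
    for t in range(n_trials):
        other = np.delete(trial_regressors, t, axis=1).sum(axis=1)   # all other trials
        X = np.column_stack([trial_regressors[:, t], other, np.ones(n_time)])
        coef, *_ = np.linalg.lstsq(X, data, rcond=None)
        betas[t] = coef[0]                                           # trial-of-interest beta
    return betas
```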