Reviewer #3 (Public Review):
The comments below focus mainly on ways that the data and analysis, as currently presented, do not, to this reviewer, compel the conclusions the authors wish to draw. It is possible that further analysis and/or clarification in the presentation would more persuasively bolster the authors' position. It also seems possible that a presentation with more limited conclusions, but clarity about exactly what has been demonstrated and where additional future work is needed, would make a strong contribution to the literature.
* Fig 3A. It might be worth emphasizing more explicitly that the x-axis (delta S) is the result of a model fit to the data being shown, which means that if the RNL model fit the data perfectly, all of the thresholds would fall at delta S = 1. They don't, so I would like to see some evaluation, based on the authors' experience with this model, of whether they think the deviations (the delta S range looks to be ~0.4 to ~1.6 in Figure 4B) represent important deviations of the data from the model, the non-significant ANOVA notwithstanding. For example, Figure 4B suggests that the sign of the fit deviations is driven by the sign of the UV contrast and that this is systematic, something that would not be picked up by the ANOVA. Quite a bit is made of the deviations later in the paper, but the fact that the model does not fully account for the data should be brought out here, I think. As the authors note elsewhere, deviations of the data from the RNL model indicate that factors other than receptor noise are at play, and reminding the reader of this here, at the first point where it becomes clear, would be helpful.
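One simple way to check the systematic dependence on UV contrast sign, if per-direction thresholds and model predictions are tabulated, would be to regress the fit residuals on the signed UV contrast of each direction. The sketch below is only meant to illustrate the idea; the names in it are hypothetical rather than taken from the authors' code.

```python
import numpy as np
from scipy import stats

def uv_sign_dependence(measured_threshold, predicted_threshold, uv_contrast):
    """Regress log(measured/predicted threshold) on signed UV cone contrast.

    A reliably nonzero slope would indicate that deviations from the RNL
    prediction are systematic in the sign of the UV contrast, something a
    factorial ANOVA over directions and fish would not reveal.
    """
    residual = np.log(np.asarray(measured_threshold) / np.asarray(predicted_threshold))
    return stats.linregress(np.asarray(uv_contrast), residual)
```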
* Line 217 ff, Figure 4, Supplemental Figure 4. If I'm understanding what the ANOVA is telling us, it is that, across color directions and fish (I think these are the two factors, based on line 649), the predictions deviate significantly from the data, relative to the inter-fish variability, for the trichromatic models but not for the tetrachromatic model. If that's not correct, please interpret this comment to mean that more explanation of the logic of the test would be helpful.
Assuming that the above is right about the nature of the test, I don't think the fact that the tetrachromatic model has an additional parameter (a noise level for the added receptor type) is being taken into account in the model comparison. That is, the trichromatic models are all subsets of the tetrachromatic model, and so cannot fit the data better than it does. What we want to know is whether the tetrachromatic model fits better because its extra parameter allows it to absorb measurement noise (overfitting), or whether it is really doing a better job of accounting for systematic features of the data. This comparison requires some method of taking the different numbers of parameters into account, and I don't think the ANOVA is doing that work. If the models being compared were nested linear models, then an F-ratio test could be deployed, but even that doesn't seem to be what is being done. And the RNL model is not linear in its parameters, so I don't think that would be the right model comparison test in any case.
Typical model comparison approaches would include a likelihood ratio test, AIC/BIC-style comparisons, or a cross-validation approach (a minimal sketch of a penalized comparison is given below).
If the authors feel their current method does persuasively handle the model comparison, how it does so needs to be brought out more carefully in the manuscript, since one of the central conclusions of the work hinges at least in part on the appropriateness of such a statistical comparison.
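To be concrete about the kind of comparison I have in mind, an AIC-style sketch follows; it assumes only that each fit returns a maximized log-likelihood and a parameter count, and the function names are mine, not the authors'.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better; extra parameters are penalized."""
    return 2.0 * n_params - 2.0 * log_likelihood

def compare_rnl_models(ll_tri, k_tri, ll_tetra, k_tetra):
    """Compare trichromatic and tetrachromatic RNL fits on an equal footing.

    Because the trichromatic models are nested in the tetrachromatic one, the
    tetrachromatic log-likelihood can never be lower; the question is whether
    the improvement exceeds the penalty for the extra noise parameter.
    (Cross-validated prediction error would be an alternative to AIC/BIC here.)
    """
    return {"AIC_trichromatic": aic(ll_tri, k_tri),
            "AIC_tetrachromatic": aic(ll_tetra, k_tetra)}
```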
* Also on the general point on conclusions drawn from the model fits, it seems important to note that rejecting a trichromatic version of the RNL model is not the same as rejecting all trichromatic models. For example, a trichromatic model that postulates limiting noise added after a set of opponent transformations will make predictions that are not nested within those of RNL trichromatic models. This point seems particularly important given the systematic failures of even the tetrachromatic version of the RNL model.
* More generally, attempts to decide whether some human observers exhibit tetrachromacy have taught us how hard this is to do. Two issues, beyond the above, are the following. 1) If the properties of a trichromatic visual system vary across the retina, then by imaging stimuli on different parts of the visual field an observer can in principle make tetrachromatic discriminations even though the visual system is locally trichromatic at each retinal location. 2) When trying to show that there is no direction in a tetrachromatic receptor space to which the observer is blind, a lot of color directions need to be sampled. Here, 9 directions are studied. Is that enough? How would we know? The following paper may be of interest in this regard: Horiguchi, Hiroshi, Jonathan Winawer, Robert F. Dougherty, and Brian A. Wandell. "Human trichromacy revisited." Proceedings of the National Academy of Sciences 110, no. 3 (2013): E260-E269. Although I'm not suggesting that the authors conduct additional experiments to try to address these points, I do think they need to be discussed.
* Line 277 ff. After reading through the paper several times, I remain unsure about what the authors regard as their compelling evidence that the UV cone has a higher sensitivity or makes an omnibus higher contribution to sensitivity than other cones (as stated in various forms in the title, Lines 37-41, 56-57, 125, 313, 352 and perhaps elsewhere).
At first, I thought the key point was that the receptor noise inferred via the RNL model was slightly lower (0.11) for the UV cone than for the double cones (0.14). And this is the argument made explicitly at line 326 of the discussion. But if this is the argument, what needs to be shown is that the data reject a tetrachromatic version of the RNL model in which the noise value of all the cones is locked to be the same (or something similar), with the analysis taking into account the fewer parametric degrees of freedom when the noise parameters are so constrained. That is, a careful model comparison analysis would be needed (a minimal sketch of such a nested comparison is given at the end of this comment). Such an analysis is not presented that I can see, and I need more convincing that the difference between 0.11 and 0.14 is a real effect driven by the data. Also, I am not sanguine that the parameters of a model that fails in some systematic ways to fit the data should be taken as characterizing properties of the receptors themselves (as sometimes seems to be stated as the conclusion we should draw).
Then, I thought maybe the argument is not that the noise levels differ, but rather that the failures of the model are in the direction of thresholds being under-predicted for discriminations that involve UV cone signals. That is what seems to be argued here at lines 277 ff, and then again at lines 328 ff of the discussion. But then the argument, as I read it in more detail in both places, switches from being about the UV cones per se to being about positive versus negative UV contrast. That's fine, but it is distinct from an argument that favors omnibus enhanced UV sensitivity, since both UV increments and decrements are conveyed by the UV cone; it is an argument for differential sensitivity to increments versus decrements in UV-mediated discriminations. The authors get to this at line 334 of the discussion, but if the point is an increment/decrement asymmetry, the title and many of the terser earlier assertions should be reworked to be consistent with what is shown.
Perhaps the argument about model deviations and UV contrast, independent of sign, could be elaborated to show more systematically that, given how the deviations covary with the contrasts of the other cone classes in the stimulus set, the data do favor deviations from the RNL model in the direction of enhanced sensitivity to UV cone signals. But if this is the intent, I think the authors need to consider how to present the data in a manner that makes it more compelling than it currently is, and to walk the reader carefully through the argument.
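For the specific question of whether the UV noise value really differs from that of the other cones, the natural nested comparison would be something like the sketch below (a hypothetical function, assuming maximized log-likelihoods are available for the constrained and unconstrained fits):

```python
from scipy import stats

def equal_noise_test(ll_equal_noise, ll_free_uv_noise, df_diff=1):
    """Likelihood-ratio test of an RNL fit with all receptor noise values locked
    equal against one in which the UV noise value is free to differ.

    df_diff is the number of extra free parameters in the unconstrained model.
    A small p-value would be needed before treating 0.11 vs 0.14 as a real effect.
    """
    lr = 2.0 * (ll_free_uv_noise - ll_equal_noise)
    return lr, stats.chi2.sf(lr, df_diff)
```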
* On this point, if the authors decide to stick with the enhanced UV sensitivity argument in the revision, what is meant by "the UV cone has a comparatively high sensitivity" (line 313 and throughout) needs more unpacking. If it is that these cones have lower inferred noise (in the context of a model that does not account for at least some aspects of the data), is this because of properties of the UV cones, or because of the way that post-receptoral processing handles the signals from these cones, mimicking a receptor-level effect in the model? And if it is thought to be because of properties of the cones, some discussion of what those properties might be would be helpful. As I understand the RNL model, the relative numbers of cones of each type are taken into account, so it isn't that. But could it be something as simple as higher photopigment density or a larger entrance aperture (thus more quantum catches and higher SNR)?
* Line 288 ff. The fact that the slopes of the psychometric functions differed across color directions is, I think, a failure of the RNL model to describe this aspect of the data, and it tells us that a simple summary of what happens for thresholds at delta S = 1 does not generalize across color directions to other performance levels. Since one of the directions where the slope is shallower is the UV direction, this fact would seem to place serious limits on the claim that discrimination in the UV direction is enhanced relative to other directions, but it goes by here without comment along those lines. Some comment here, both about the implications for the fit of the RNL model and about the implications for generalizations about the efficacy of UV-receptor-mediated discrimination and UV increment/decrement asymmetries, seems important.
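If helpful, slope differences can be made explicit by fitting a standard psychometric function per color direction and comparing the slope parameters; a sketch under assumed conventions (a Weibull form, with a guess rate I have picked arbitrarily rather than taken from the actual task) is below.

```python
import numpy as np
from scipy.optimize import curve_fit

GUESS_RATE = 1.0 / 3.0   # assumed chance level; the actual task design would set this

def weibull(delta_s, alpha, beta):
    """Weibull psychometric function: alpha ~ threshold, beta ~ slope."""
    return GUESS_RATE + (1.0 - GUESS_RATE) * (1.0 - np.exp(-(delta_s / alpha) ** beta))

def fit_direction(delta_s, proportion_correct):
    """Fit threshold and slope for one color direction; compare beta across directions."""
    (alpha, beta), _ = curve_fit(weibull, delta_s, proportion_correct,
                                 p0=[1.0, 2.0], bounds=([1e-3, 0.1], [10.0, 20.0]))
    return alpha, beta
```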
* Line 357 ff. Up until this point, all of the discussion of differences in threshold across stimulus sets has been in terms of sensitivity. Here the authors (correctly) raise the possibility that a difference in "preference" across stimulus sets could drive the differences in thresholds as measured. Although the discussion is interesting and germane, it does to some extent further undercut the security of the conclusions about differential sensitivity across color directions relative to the RNL model predictions, and that should be brought out for the reader here. The authors might also discuss how a future experiment could differentiate between a preference explanation and a sensitivity explanation of the threshold differences.
* RNL model. The paper cites a lot of earlier work that used the RNL model, but I think many readers will not be familiar with it. A bit more descriptive prose would be helpful, particularly noting that in the full-dimensional receptor space, if the limiting noise at the photoreceptors is Gaussian, then the isothreshold contour will be a hyper-ellipsoid with its axes aligned with the receptor directions.
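For example, a one-equation statement of (say) the trichromatic case, as I understand the standard Vorobyev-Osorio formulation, would give readers the needed intuition (the tetrachromatic form is analogous):

$$
\Delta S^{2} \;=\; \frac{e_{1}^{2}\,(\Delta f_{2}-\Delta f_{3})^{2} \;+\; e_{2}^{2}\,(\Delta f_{1}-\Delta f_{3})^{2} \;+\; e_{3}^{2}\,(\Delta f_{1}-\Delta f_{2})^{2}}{(e_{1}e_{2})^{2}+(e_{1}e_{3})^{2}+(e_{2}e_{3})^{2}},
$$

where $\Delta f_{i}$ is the receptor-specific contrast (difference in log quantum catch) for receptor class $i$ and $e_{i}$ is the noise in that channel.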
* Use of cone-isolating stimuli? For showing that all four cone classes contribute to what the authors call color discrimination, a more direct approach would seem to be to use stimuli that target stimulation of only one class of cone at a time. This might require a modified design in which the distractors and target were shown against a uniform background and approximately matched in their estimated effect on a putative achromatic mechanism. Did the authors consider this approach, and more generally, could they discuss what they see as its advantages and disadvantages for future work?
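For what it is worth, the usual silent-substitution computation for such stimuli is straightforward; the sketch below is generic and hypothetical (the capture matrix would have to come from the measured receptor sensitivities and the stimulus primaries actually used).

```python
import numpy as np

def cone_isolating_modulation(capture_matrix, target_receptor, contrast=0.1):
    """Solve for a stimulus modulation that changes the catch of one receptor
    class while nominally silencing the others (silent substitution).

    capture_matrix : (n_receptors, n_primaries) quantum catch per unit of each primary
    target_receptor: index of the receptor class to modulate
    contrast       : desired contrast in the targeted receptor; others held at zero
    """
    target = np.zeros(capture_matrix.shape[0])
    target[target_receptor] = contrast
    # Least-squares solution; with at least as many primaries as receptor classes,
    # the non-targeted classes are nulled exactly (up to calibration error).
    modulation, *_ = np.linalg.lstsq(capture_matrix, target, rcond=None)
    return modulation
```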