On 2017 Apr 24, Lydia Maniatis commented:
This article’s casual approach to theory is evident in the first few sentences. After noting, irrelevantly, that “Since their introduction (Wilkinson, Wilson, & Habak, 1998), RF patterns have become a popular class of stimuli in vision science, commonly used to study various aspects of shape perception,” the authors immediately go on to say that “Theoretically, RF pattern detection (discrimination against a circle) could be realized either by local filters matched to the parts of the pattern, or by a global mechanism that integrates local parts operating on the scale of the entire pattern.” No citation is offered for this vague and breezy assertion, which begs a number of questions.
How did we jump from “shape perception” to “RF detection against a circle”? How is the latter related to the former?
Is the popularity of a pattern sufficient reason to assume that there exist special mechanisms – special detectors, or filters – tailored to its characteristics? Is there any basis whatsoever for this assertion?
Given that we know that the whole does determine the parts perceived, why are we talking about integration of “local” elements? And how do we define local? Doesn’t a piece of a shape also consist of smaller pieces, etc? What is the criterion for designating part and whole in a stimulus pattern (as opposed to the fully-formed percept)?
Apparently, there have been many ‘models’ proposed for special mechanisms for “RF detection against a circle,” addressing the question in these local/local-to-global terms. Could the mechanism involve maximum curvature integration, tangent orientations at inflection points, etc.? These simply take for granted the underlying assumption that there are special “filters” for “RF discrimination against a circle.” The only question is to which details of the figure these mechanisms are attuned.
What if we were dealing with different types of shapes? What if the RF boundary shape were formed by different sized dots, or dashes, or rays of different lengths radiating from a center? Would we be talking about dot filters, or line length filters? Why put RF patterns in general, and RF patterns of this type in particular, on such an explanatory pedestal?
More critically, how is it possible to leverage such patterns to dissect the neural processes underlying perception? When I look at one of these patterns, I don’t have any trouble distinguishing it from a circle. What can this tell me about the underlying process?
A subculture of vision science has opted to uncritically embrace the view that underlying processes can be inferred quite straightforwardly on the basis of certain procedures that mimic the general framework of signal detection. This view is labeled “signal detection theory” or SDT, but “theory” is overstating it. As noted in my earlier comment, Schmidtmann and Kingdom (2017) never explain why they make what, to a naïve observer, must seem very arbitrary methodological choices, nor does their main reference, Wilkinson, Wilson and Habak (1998). So we have to go back further to find some suggestion of a rationale.
The founding fathers of the aforementioned subculture include Swets, Tanner and Birdsall (e.g. 1961). As may be seen from a quote from that article (below), the framing of the problem is artificial; major assumptions are adopted wholesale; “perception” is casually converted to “detection” (in order to fit the analogy of a radar observer attempting to guess which blip is the object of interest).
“In the fundamental detection problem, an observation is made of events occurring in a fixed interval of time and a decision is made; based on this observation, whether the interval contained only the background interference or a signal as well. The interference, which is random, we shall refer to as noise and denote as N; the other alternative we shall term signal plus noise, SN. In the fundamental problem, only these two alternatives exist…We shall, in the following, use the term observation to refer to the sensory datum on which the decision is based. We assume that this observation may be represented as varying continuously along a single dimension…it may be helpful to think of the observation as…the number of impulses arriving at a given point in the cortex within a given time.” Also “We imagine the process of signal detection to be a choice between Gaussian variables….The particular decision that is made depends on whether or not the observation exceeds a criterion value….This description of the detection process is an almost direct translation of the theory of statistical decision.”
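For readers unfamiliar with the model being quoted, here is a minimal sketch of the decision rule it describes: one observation on a single continuous dimension, drawn from either a noise (N) or a signal-plus-noise (SN) Gaussian, and a "yes" response whenever the observation exceeds a criterion. The equal, unit variances and the function names are assumptions made for illustration; the quoted passage specifies only Gaussian variables and a criterion.

```python
from statistics import NormalDist


def yes_no_rates(d_prime, criterion):
    """Hit and false-alarm rates under an equal-variance Gaussian model.

    Assumption: N ~ Normal(0, 1) and SN ~ Normal(d_prime, 1). The observer
    says "signal" whenever the observation exceeds `criterion`.
    """
    noise = NormalDist(mu=0.0, sigma=1.0)        # N: background only
    signal = NormalDist(mu=d_prime, sigma=1.0)   # SN: signal plus noise
    hit = 1.0 - signal.cdf(criterion)            # P("yes" | SN)
    false_alarm = 1.0 - noise.cdf(criterion)     # P("yes" | N)
    return hit, false_alarm


def d_prime_from_rates(hit, false_alarm):
    """Recover the separation of the two Gaussians from observed rates."""
    z = NormalDist().inv_cdf                     # inverse standard normal CDF
    return z(hit) - z(false_alarm)


if __name__ == "__main__":
    hit, fa = yes_no_rates(d_prime=1.0, criterion=0.5)
    print(f"hit={hit:.3f} false_alarm={fa:.3f}")
    print(f"recovered d'={d_prime_from_rates(hit, fa):.3f}")
```

Note that the recovered sensitivity index means what it claims to mean only if the Gaussian, single-dimension, and fixed-criterion assumptions actually hold for the observer.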
In what sense does the above framework relate to visual perception? I think we can easily show that, in concept and application, it is wholly incoherent and irrational.
I submit, first, that when I look around me, I don’t see any noise, I just see things. I’m also not conscious of looking for a signal to compare to noise; I just see whatever comes up. I don’t have a criterion for spotting what I don’t know will come up, and I don’t feel uncertain of – I certainly hardly ever have to guess at – what I’m seeing. The very effortlessness of perception is what made it so difficult to discern the fundamental theoretical problems. This is not, of course, to say that what the visual system does in constructing the visual percept from the retinal stimulation isn’t guesswork; but the actual process is light years more complex and subtle than a clumsy and artificial “signal detection” framework.
Given the psychological certainty of normal perceptual experience, it’s hard to see how to apply this SDT framework. The key seems to be to make conditions of observation so poor as to impede normal perception, making the observer so unsure of what they saw or didn’t see that they must be forced to choose a response, i.e. to guess. One way to degrade viewing conditions is to make the image of interest very low contrast, so that it is barely discernible; another way is to flash it for very brief intervals. Now, in these presentations, the observer presumably sees something; so these manipulations don’t necessarily produce an uncertain perceptual situation (though the brevity of the presentation may make the recollection of that impression mnemonically challenging). Where the uncertainty comes in is in the demand by investigators that observers decide whether the impression is consistent with a quick, degraded glimpse of a particular figure, in this case an RF of a certain type or a circle. I don’t see how one can defend the notion put forth by Swets et al (1961) that this decision, which is more a conscious, cognitive one than a spontaneous perceptual one, is based on a continuously varying criterion. The decision, for example, may be based on a glimpse of one diagnostic feature or another, or on where, by chance, the fovea happens to fall in the 180ms (Schmidtmann and Kingdom, 2017) or 167ms (Wilkinson et al, 1998) interval allowed. But the forced noisiness (due to the poor conditions), the Gaussian presumptions, the continuous variable assumption, and the binary forced choice outputs are needed for the SDT framework to be laid on top of the data.
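To make concrete how much work those assumptions do, the following sketch shows the textbook conversion between proportion correct in a two-alternative forced-choice task and the sensitivity index d'. It assumes independent, equal-variance Gaussian observations for the two alternatives and an observer who simply picks the larger one; the function names are illustrative, and nothing here is taken from the analysis code of the papers under discussion.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def pc_2afc(d_prime):
    """Predicted proportion correct in 2AFC.

    Assumption: the observer compares two independent unit-variance Gaussian
    observations and chooses the larger, which gives Pc = Phi(d' / sqrt(2)).
    """
    return _STD_NORMAL.cdf(d_prime / sqrt(2.0))


def d_prime_from_pc(pc):
    """Invert the relation above; valid only for 0 < pc < 1."""
    return sqrt(2.0) * _STD_NORMAL.inv_cdf(pc)


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 2.0):
        print(f"d'={d:.1f} -> predicted Pc={pc_2afc(d):.3f}")
```

If the response instead depends on a chance glimpse of one diagnostic feature, the same proportion correct can still be pushed through `d_prime_from_pc`, but the number that comes out no longer describes two overlapping Gaussians.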
For the rest of this comment (cut short here by the comment length limit), please see PubPeer.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.