On 2017 May 22, Lydia Maniatis commented:
Part 1

This publication is burdened with an unproductive theoretical approach as well as methodological problems (including intractable sampling problems). Conclusions range from trivial to doubtful.
Contemporary vision science seems determined to take the organization of the retinal stimulation out of the picture and replace it with raw numbers, whether neural firing rates or statistics. This is a fundamental error. A statistical heuristic strategy doesn’t work in any discipline, including physics. For example, a histogram of the relative heights of all the point masses in a particular patch of the world would tell us nothing about the mechanical properties of the objects in that scene, because it would say nothing about the distribution and cohesiveness of the masses. (Would it tell us anything of interest?)
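To make the point concrete, here is a minimal, hypothetical sketch of my own (not drawn from the paper; the toy images and parameters are assumptions for illustration) showing that two patches with identical pixel histograms can have entirely different spatial organization, which is precisely the information on which perceived structure depends:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "coherent" patch: one bright square on a dark background.
coherent = np.zeros((64, 64))
coherent[16:48, 16:48] = 1.0

# A "scattered" patch: the very same pixel values, randomly shuffled.
scattered = coherent.flatten()
rng.shuffle(scattered)
scattered = scattered.reshape(64, 64)

# The histograms (and every other pointwise statistic) are identical...
assert np.array_equal(np.sort(coherent.ravel()), np.sort(scattered.ravel()))

# ...yet the spatial organization, which determines what is seen
# (a surface, an edge, an object), is entirely different.
```

The identical histograms are the point: a purely statistical description cannot distinguish the coherent surface from the noise.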
In perception, it is more than well established that the appearance of any point in the visual field (with respect to lightness, color, shape, etc.) is intimately dependent on the intensities/spectral compositions of the points in the surrounding (indeed the entire) field (specifically their effects on the retina) and on the principles of organization that the visual process effectively applies to the stimulation. Thus, a compilation of, for example, the spectral statistics of Purves’ colored cube would not allow us either to explain or to predict the appearance of colored illumination or transparent overlays. Or, rather, it wouldn’t allow us to predict these things unless we employed a very special sample of images, all of which produced such impressions of colored illumination. Then we might get a relatively weak correlation. This is because, within this sample, a preponderance of certain wavelengths would tend to correlate with, e.g., a yellow-illumination impression, rather than being due, as might be true for the general case, to the presence of a number of unified, apparently yellow and opaque surfaces. Thus, we see how improper sampling can allow us to make better (and, I would add, predictably better) predictions without implying explanatory power. In perception, explanatory power strictly requires that we take principles of organization into account.
In contrast, the authors here take the statistics route. They want to show, or at least do not completely fail to corroborate, the observation that when surfaces are wet, their colors look deeper and more vivid, and also to corroborate the fact that changes in perception are linked to changes in the retinal stimulation. Using a set of ready-made images (criteria for the selection of which are not provided), they apply to them a manipulation (among others) that has the general effect of increasing the saturation of the perceived colors. One way to ascertain whether this manipulation causes a surface to appear wet would be simply to ask observers to describe the surface, without any clues to what was expected. Would the surface spontaneously be described as “wet” or “moist”? This would be the more challenging test, but it is not the approach taken.
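For concreteness, a generic saturation-boosting manipulation of this kind might be sketched as follows. This is a hypothetical stand-in written for illustration, not the authors’ actual WET transformation; the function, the HSV route, and the scaling factor are my own assumptions:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def boost_saturation(rgb, factor=1.5):
    """Scale the saturation channel of an RGB image in HSV space.
    A generic illustration, not the paper's WET transformation."""
    hsv = rgb_to_hsv(np.clip(rgb, 0.0, 1.0))
    hsv[..., 1] = np.clip(hsv[..., 1] * factor, 0.0, 1.0)  # more vivid colors
    return hsv_to_rgb(hsv)

# Example: a random "image" whose colors become more saturated after the boost.
image = np.random.default_rng(0).random((32, 32, 3))
vivid = boost_saturation(image, factor=1.5)
```

Whatever the authors’ actual implementation, the relevant feature is that such a transformation operates pointwise on color values and carries no information about shape-based cues such as drops or puddles.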
Instead, observers are first trained on images (examples of which are not provided; I have requested examples) that we are told appear very wet, along with their dry versions, and that include shape-based cues such as drops of water or puddles. They are told to use these as a guide to what counts as very wet, i.e. a rating of 5. They are then shown a series containing both original and manipulated images (with more saturated colors, but lacking any shape-based cues) and asked to rate wetness from 1 to 5.
The results are messy, with some transformed images getting higher ratings than the originals and others not, though on average they are rated more highly. But the ratings for all the images are relatively low, and we have to ask: how have the observers understood their task? Are they reporting an authentic perception of wetness or moistness, or are they trying to guess how wet a surface actually is, based on a rule of thumb adopted during the training phase, in which, presumably, the wet images were also more color-saturated? (In other words, is the task authentically perceptual, or is it more cognitive guesswork?) What does it mean to rate the wetness of a surface at, e.g., the “2” level?
The cost of ignoring the factor of shape/structure is evident in the authors’ attempt to explain why the ratings for all images were so low, reaching 4 in only one case. They explain that it may be because their manipulation didn’t include areas that looked like drops or puddles. Does this mean that the presence of drops or puddles actually changes the appearance of the surrounding areas, and/or that those very different training images perhaps included other organized features that were overlooked and that affected perception? Did the training teach observers to apply, in practice, a cue that by itself produces somewhat different perceptual outcomes? I suppose we could ask the observers about their strategy, but this would muddy the facade of quantitative purity.
At any rate, the manipulation (like most ad hoc assumptions) fails as a tool for prediction, leading the authors to acknowledge that “The image transformation greatly increased the wetness rating for some images but not for others…” (Again, it isn’t clear that the “wetness rating” corresponds to an authentically perceptual scale.) The relative success or failure of the transformation is thus image-specific, and therefore sample-specific; some samples and sample sets would very likely not reach statistical significance. The decision to investigate further (Experiment 1b) using, if I’m reading this correctly, only a single custom-made image that was not part of the original set (on what basis was this chosen?) therefore seems unwise. (This might seem to worsen the sampling problem, but the problem is intractable anyway: since there is no possible sample that would allow the researchers to generate reliable statistics-based predictions for the individual case, any generalization would be instantly falsifiable, and would thus lack explanatory power.)
The degree to which any conclusions are tied to the specific (and unrationalized) sample is illustrated by the fact that the technical manipulations were tailored to it (from Experiment 1a): “In deciding [the] parameters of the WET transformation, we preliminarily explored a range of parameters and chose ones that did not disturb the apparent naturalness of all the images used in Experiment 1a.” (Note the lack of objective criteria for “naturalness.”) (We’re not told on what basis the parameters in Experiment 1b were chosen.) In short, I don’t think this numbers game can tell us anything more, from a theoretical point of view, than casual observation and, e.g., trial and error by artists already have.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.