Hypothesis

8 Matching Annotations

Jul 2018
europepmc.org

https://www.ncbi.nlm.nih.gov/pubmed/26790845

4
1. PubMedCommonsArchive 02 Jul 2018
  
  in Europe PMC
  
  On 2016 Mar 08, Lydia Maniatis commented:
  
  Thanks to the authors for taking the trouble to respond to my comments. However, I feel they're missing the point. Below, I've copy-pasted their reply and broken it up into statements by them and replies by me.
  Authors: Our conclusions do not depend on any model, falsifiable or otherwise. They were based on the mathematical incompatibility between certain visual computations and empirical measurements reported in our paper.
  Me: What visual computations?
  Authors: (It should also be noted that we also described a model consistent with our measurements: The Markovian Subsampler. However, that model was ignored by the commentator.)
  Me: Ad hoc models will usually be consistent with the data, that's their purpose. This doesn't make them relevant.
  Authors: Our conclusions follow logically from our measurements of efficiency.
  Me: Conclusions with respect to what?
  Authors: These measurements do not rest on any unproven theoretical constructs. Efficiency is a purely descriptive measure of performance in relation to the available information.
  Me: What does this measure tell us about the “visual computations” involved in the “performance?”
  Authors: Given any sample size N, efficiency is the ratio of M to N, where M is the sample size that the ideal observer would need in order to estimate a statistic with the same precision as a human observer. Since the ideal observer is, by definition, at least as good as any other observer, human decisions are necessarily based on M or more elements.
  Me: Why, as vision scientists, should we care about your measure of performance? What insights does it provide with respect to visual perception and related processes?
  Authors: Estimates of efficiency tell us nothing about visual appearance.
  Me: So the “visual computation” referred to in the first paragraph is not connected to “visual appearance.” What, then, is “visual” about the computation?
  Authors: The relationship between appearance and performance would be something interesting to study, but our study was about performance alone.
  Me: The text refers to a “visual computation,” “visual statistics,” "visual information." What is the meaning of the term “visual” if not “pertaining to appearance?” Furthermore, your abstract clearly implies that you are interested in appearance, e.g. when you say that "With orientation...it is relatively safe to use an item's physical value as an approximation for its average perceived value." Your stimuli are contrasted with those used in studies where "observers are asked to make decisions on perceived average size," the idea being that in these circumstances the percept is not as reliable as in the case of orientation.
  Authors: Our observers were asked to estimate expected values, and they were pressed for time. Inferring the minimum sample sizes used by our observers is a purely mathematical exercise.
  Me: How is this exercise of interest to readers of a journal on visual perception?
  Authors: On average (i.e. across observers) we found that this minimum increased from approximately 2 to approximately 3 (average efficiencies increased from approximately 2/8 to approximately 3/8), as we eased the time constraint by providing longer presentations. Of course these numbers are subject to measurement error, so we performed statistical tests to see whether the 3 was significantly greater than the 2 and whether the 2 was significantly greater than 1. Both differences proved significant.
  Me: You might get similar results asking viewers the following question: Is the average of the numbers to the right of the colon smaller or larger than the number to the left of the colon?
  8: 5, 11, 3, 9, 13, 2, 9, 6.
  What would this tell us about visual perception (that we didn't already know)?
  Authors: Our conclusion against a purely parallel computation is valid, because our data unequivocally support an increase in efficiency with time.
  Me: Most tasks are easier given more time. However, saying that I'm able to perform better on a task given more time doesn't actually make me more efficient at the task, in the ordinary meaning of the term. If one becomes more efficient at a task, they can do it better in the same amount of time. I think at best you're measuring improvement in “accuracy.” People's answers become more accurate given more time, especially when some durations are very brief. This is a pretty sure thing regardless of the task. How did your experimental conditions give such a finding added value for visual perception?
  References to “parallel computation” are empty of content if, as you said above, you're not interested in process, but only in performance. “Computation” refers to process. There are obviously different types of processes involved in the task, and all mental processes involve both serial and parallel neural processes. So unless you're more specific about the type of computations that you're referring to, and how your method allowed you to isolate them, you aren't saying anything interesting.
  Authors: However, as noted in the published paper, our conclusion against a purely serial computation isn’t as strong. It is based on the second of the aforementioned significant differences, but there remains the possibility that our most stringent time constraint (0.1 s) wasn’t sufficiently stringent.
  Me: a) Why didn't you make conditions sufficiently stringent to achieve your goals? b) Similarly to what I said above, the phrase “a purely serial computation” is lumping observers' experience and mental activity and decision-making strategy together into an undifferentiated and wholly uninformative reference to “a computation.”
  All in all, you seem to be saying that you devised a task that observers are bad at (as ascertained by your special measure but as would be obvious using any measure) and do better at if given more time (as would be expected for most tasks), and that you don't care how they were doing it or how it aids in the understanding of visual perception.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
2. PubMedCommonsArchive 02 Jul 2018
  
  in Europe PMC
  
  On 2016 Mar 01, Joshua A Solomon commented:
  
  Our conclusions do not depend on any model, falsifiable or otherwise. They were based on the mathematical incompatibility between certain visual computations and empirical measurements reported in our paper. (It should also be noted that we also described a model consistent with our measurements: The Markovian Subsampler. However, that model was ignored by the commentator.)
  Our conclusions follow logically from our measurements of efficiency. These measurements do not rest on any unproven theoretical constructs. Efficiency is a purely descriptive measure of performance in relation to the available information. Given any sample size N, efficiency is the ratio of M to N, where M is the sample size that the ideal observer would need in order to estimate a statistic with the same precision as a human observer. Since the ideal observer is, by definition, at least as good as any other observer, human decisions are necessarily based on M or more elements.
  Estimates of efficiency tell us nothing about visual appearance. The relationship between appearance and performance would be something interesting to study, but our study was about performance alone.
  Our observers were asked to estimate expected values, and they were pressed for time. Inferring the minimum sample sizes used by our observers is a purely mathematical exercise. On average (i.e. across observers) we found that this minimum increased from approximately 2 to approximately 3 (average efficiencies increased from approximately 2/8 to approximately 3/8), as we eased the time constraint by providing longer presentations. Of course these numbers are subject to measurement error, so we performed statistical tests to see whether the 3 was significantly greater than the 2 and whether the 2 was significantly greater than 1. Both differences proved significant.
  Our conclusion against a purely parallel computation is valid, because our data unequivocally support an increase in efficiency with time. However, as noted in the published paper, our conclusion against a purely serial computation isn’t as strong. It is based on the second of the aforementioned significant differences, but there remains the possibility that our most stringent time constraint (0.1 s) wasn’t sufficiently stringent.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
3. PubMedCommonsArchive 02 Jul 2018
  
  in Europe PMC
  
  On 2016 Feb 10, Lydia Maniatis commented:
  
  Part 2 DATA, MODEL, AND MODEL-FITTING
  The authors propose to “quantify how well summary statistics like averages are calculated [using] an Equivalent Noise (Nagaraja, 1964; Pelli, 1990; Dakin, 2001) framework...” (p. 1). The first two references discuss “luminance noise” and contrast thresholds. The mathematical framework and supporting arguments seem to be chiefly provided by the Pelli (1990) reference (Dakin, 2001, takes the applicability of the Equivalent Noise paradigm to orientation for granted). However, the conclusion of the Pelli chapter includes the following statements: “It will be important to test the model of Fig. 1.7 [the most general, schematic expression of the proposed model – which nevertheless refers specifically to contrast-squared]. For gratings in dynamic white noise, [the main prediction of the model] has been confirmed by Pelli (1981), disconfirmed by Kersten (1984) and reconfirmed by Thomas (1985). More work is warranted.” (p. 18).
  Also, Pelli's arguments seem to overlook basic facts of vision, such as the inhibitory mechanisms at the retinal level. Has his model actually been tested in the 25 years since the chapter was written, with respect to contrast, with respect to orientation? Where are the supporting references? (It is worth noting that Pelli seems to be unfamiliar with the special significance of “disconfirmations,” i.e. falsifications, in the testing of scientific hypotheses. Newton's theory has been confirmed many times, and can continue to be confirmed indefinitely, but it stopped being an acceptable theory after the falsification of a necessary prediction).
  Agnostic as to the perceptual abilities, processes or functional mechanisms underlying observer performance (the method confounds perception, attention and cognition), and assuming that a “just-noticeable contrast level” is computationally interchangeable with a “just-comparable angle” (via “averaging”), the authors proceed to fit the data to a mathematical model.
  From data points at two locations on the x-axis, they construct non-linear curves, which differ significantly from observer to observer. If the curves mean anything at all, they predict performance at intermediate levels of x-axis values - unless we are required to assume a priori that the model makes accurate predictions (in which case it is a metaphysical, not an empirical, model). The problem, as mentioned above, is that there is high inter-observer variability, such that the curves differ significantly from one observer to the next. (I also suspect that there was high intra-observer variability, though this statistic is not reported.) Thus, a test of the model predictions for intermediate x-values would seem to require that we retest the same observers at new levels of the independent variable. (Why weren't observers tested with at least one more x-value?) I'm not at all sure that the results would confirm the predictions, but even if they did, this is supposed to be a general model. So what if we wanted to test it on new observers at new, intermediate levels of the independent variable? How would the investigators arrive at their predictions for this case?
  If there are no criteria for testing (i.e. potentially rejecting) the model - if any two data points can always be - can ONLY be - fitted post hoc - then this type of model-fitting exercise lies outside the domain of empirical science.
  It is always possible to compare “models” purporting to answer the wrong question, to investigate a nonexistent phenomenon. To use a rough example, we could ask, “Is the Sun's orbit around the Earth more consistent with a circular or an elliptical model?” Using the apparent movements of the Sun in relation to the Earth and other cosmic landmarks, we could compare models and conclude that one of them “better fits” the data or that it “fits the data well” (it's worth noting that the model being fitted here has dozens of free parameters: “The model with the fewest parameters had 55 free parameters” (p. 5)). But this wouldn't amount to a theoretical advance. I think that this is the kind of thing going on here.
  Asking what later turns out to be the wrong question is par for the course in science, and excusable if you have a solid rationale consistent with knowledge at the time. Here, this does not seem to be the case.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
4. PubMedCommonsArchive 02 Jul 2018
  
  in Europe PMC
  
  On 2016 Feb 10, Lydia Maniatis commented:
  
  Part 1 The authors are drawing conclusions about a non-existent “visual computation” using an uncorroborated, unfalsifiable model.
  In this study, Solomon, May and Tyler (2016) investigate how observers arrive at a “statistical summary of visual input” (p. 1) which they refer to as “orientation averaging.” They ask “whether observers...can process the feature content of multiple items in parallel, or...cognitively combine serial estimates from individual items in order to attain an estimate for the desired statistic (in our case, the average orientation of an array of [striped discs]).” They propose to quantify “how well summary statistics like average orientation are calculated [using] an Equivalent Noise (Nagaraja, 1964; Pelli, 1990; Dakin, 2001) framework...” (p. 1).
  There are two major problems with such a project. First, the authors offer no theoretical or empirical evidence in support of the notion that observers can or do actually calculate average orientations. On the contrary, studies cited by the authors seem to indicate that they cannot: “Solomon (2010) describes one professional psychophysicist who completed 2,000 trials, yet achieved an effective set size no greater than 1 [i.e. only a single item was “averaged.”]” The authors speculate that this poor result may have been due to the memory challenges involved in that particular study task, but here they claim that the results of the present study falsify this hypothesis.
  Readers may judge for themselves whether they personally possess the ability to estimate the average orientation of the discs in the stimuli presented by Solomon, May and Tyler (2016) by inspecting their Figure 1. Do you perceive an average orientation of the striped discs? If you were asked to decide whether the “average orientation” of the eight surrounding discs is clockwise or counterclockwise to the orientation of the “probe” in the center, how would you go about it?
  The method used by the investigators does not allow a decision as to whether observers are actually averaging anything, or whether they are using a rule of thumb such as: “Look at the first comparison disc, and if it's clockwise say clockwise”; or: “Look at the first disc, and then at a second, and if they're both clockwise say clockwise, otherwise look at a third tie-breaker disc.” Such a strategy is consistent with the authors' conclusions that observers are using a very small subsample of discs (“an effective set size of 2” (p. 6)) to generate their responses.
  Given that there does not seem to be something in our perceptual experience corresponding to an “average orientation of a set of striped discs,” perhaps we are supposed to be dealing with a kind of blindsight, where what feels like guessing yields an uncannily high percentage of correct responses. This also does not seem to be the case, given the “inefficiency” of the performance.
  Interestingly, the authors' own description of the problem, quoted in the first paragraph of this comment, doesn't necessarily imply actual averaging. When I look at Figure 1, I am perceiving multiple items in parallel (simultaneously, at least in experience), but I am still not seeing averages. Likewise, serial attention to the orientation of individual items does not equal generating an average.
  Thus, the potential “mechanisms” the authors claim to be evaluating are not necessarily “averaging mechanisms.” In other words, in stating that it is possible that “the same mechanism is responsible for computing the average orientation of crowded and uncrowded Gabors (p. 6)” the authors may well be referring to a mechanism underlying a process that doesn't exist.
  Given that the task involves perception, attention, and cognition, there can be no question that both “serial and parallel” (neural) processes underlie it, so it's not clear why the authors suggest either “purely serial” or “purely parallel” as possibilities, especially since they appear to be unconcerned with distinguishing among the various processes/functions that are engaged between the stimulus “input” and the response “output.” Confusingly, however, and in contradiction to their titular conclusions, they acknowledge that: “It is conceivable [that there was enough time, even at the smallest stimulus duration] for a serial mechanism to utilize two items” (p. 6). If the data are to be described as compatible with a “purely serial” mechanism, then we are probably talking about the known-to-be-serial inspection of items via the conscious shifting of attention, which, again, does not necessarily imply any averaging.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
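
The two positions in this thread can be made concrete with a small numerical sketch. It is an illustration only, not the analysis reported in the paper. It assumes that item orientations are independent Gaussian draws around a true mean, that observers add no internal noise, and that the task is to report the sign of the true mean relative to the probe. The function names and the example numbers are hypothetical. The first function computes efficiency as M/N in the sense of the author's reply; the last two compare an observer who averages k items with the "look at two discs, then a tie-breaker" rule of thumb from the Part 1 comment, read here as: answer with the first two discs if they agree, otherwise let the third decide.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def efficiency(item_sd: float, observer_sd: float, n_items: int) -> float:
    """Efficiency M/N under the stated assumptions.

    An ideal observer averaging M independent items estimates the mean with
    standard deviation item_sd / sqrt(M), so matching the observer's
    precision requires M = (item_sd / observer_sd) ** 2.
    """
    m = (item_sd / observer_sd) ** 2
    return m / n_items


def p_correct_averaging(k: int, offset_in_sd: float) -> float:
    """Probability that the average of k items has the correct sign.

    offset_in_sd is the true mean offset from the probe, in units of the
    item standard deviation (a hypothetical stimulus parameter).
    """
    return phi(offset_in_sd * sqrt(k))


def p_correct_tiebreak_rule(offset_in_sd: float) -> float:
    """Probability correct for the non-averaging rule of thumb.

    Each disc is individually on the correct side with probability p.
    Correct if the first two agree and are correct (p * p), or if they
    disagree (2 * p * (1 - p)) and the third disc is correct (p).
    """
    p = phi(offset_in_sd)
    return p * p + 2.0 * p * (1.0 - p) * p


if __name__ == "__main__":
    # Hypothetical observer whose precision matches an ideal observer using
    # 2 of 8 items: efficiency is then about 2/8.
    print(efficiency(item_sd=10.0, observer_sd=10.0 / sqrt(2.0), n_items=8))

    # Compare the rule of thumb with averaging 2 or 3 items at one
    # illustrative offset of half an item standard deviation.
    for label, value in [
        ("average of 2", p_correct_averaging(2, 0.5)),
        ("tie-break rule", p_correct_tiebreak_rule(0.5)),
        ("average of 3", p_correct_averaging(3, 0.5)),
    ]:
        print(f"{label}: {value:.4f}")
```

At the single offset used here, the tie-break rule's accuracy falls between that of averaging two items and averaging three. Under these assumptions, an effective set size of about 2 to 3 therefore does not by itself distinguish averaging from a sign-counting heuristic; other offsets or noise assumptions may behave differently.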
URL

europepmc.org/abstract/MED/26790845
Feb 2018
europepmc.org europepmc.org

https://www.ncbi.nlm.nih.gov/pubmed/26790845

4
1. PubMedCommonsArchive 09 Feb 2018
  
  in Public
  
  On 2016 Feb 10, Lydia Maniatis commented:
  
  Part 1 The authors are drawing conclusions about a non-existent “visual computation” using an uncorroborated, unfalsifiable model.
  In this study, Solomon, May and Tyler (2016) investigate how observers arrive at a “statistical summary of visual input” (p. 1) which they refer to as “orientation averaging.” They ask “whether observers...can process the feature content of multiple items in parallel, or...cognitively combine serial estimates from individual items in order to attain an estimate for the desired statistic (in our case, the average orientation of an array of [striped discs].” They propose to quantify “how well summary statistics like average orientation are calculated [using] an Equivalent noise (Nagaraja, 1964; Pelli, 1990; Dakin, 2001).
  There two major problems with such a project. First, the authors offer no theoretical or empirical evidence in support of the notion that observers can or do actually calculate average orientations. On the contrary, studies cited by the authors seem to indicate that they cannot: “Solomon (2010) describes one professional psychophysicist who completed 2,000 trials, yet achieved an effective set size no greater than 1 [i.e. only a single item was “averaged.”]” The authors speculation that this poor result may have been due to the memory challenges involved in that particular study task, but here they claim that the results of the present study falsify this hypothesis.
  Readers may judge for themselves whether they personally possess the ability to estimate the average orientation of the discs in the stimuli presented by Solomon, May and Tyler (2016) by inspecting their Figure 1. Do you perceive an average orientation of the striped discs? If you were asked to decide whether the “average orientation” of the eight surrounding discs is clockwise or counterclockwise to the orientation of the “probe” in the center, how would you go about it?
  The method used by the investigators does not allow a decision as to whether observers are actually averaging anything, or whether they are using a rule of thumb such as: “Look at the first comparison disc, and if it's clockwise say clockwise”; or: “Look at the first disc, and then at a second, and if they're both clockwise say clockwise, otherwise look at a third tie-breaker disc.” Such a strategy is consistent with the authors' conclusions that observers are using a very small subsample of discs (“an effective set size of 2” (p. 6)) to generate their responses.
  Given that there does not seem to be something in our perceptual experience corresponding to an “average orientation of a set of striped discs,” perhaps we are supposed to be dealing with a kind of blindsight, where what feels like guessing yields an uncannily high percentage of correct responses. This also does not seem to be the case, given the “inefficiency” of the performance.
  Interestingly, the authors' own description of the problem, quoted in the first paragraph of this comment, doesn't necessarily imply actual averaging. When I look at Figure 1, I am perceiving multiple items in parallel (simultaneously, at least in experience), but I am still not seeing averages. Likewise, serial attention to the orientation of individual items does not equal generating an average.
  Thus, the potential “mechanisms” the authors claim to be evaluating are not necessarily “averaging mechanisms.” In other words, in stating that it is possible that “the same mechanism is responsible for computing the average orientation of crowded and uncrowded Gabors (p. 6)” the authors may well be referring to a mechanism underlying a process that doesn't exist.
  Given that the task involves perception, attention, and cognition, there can be no question that both “serial and parallel” (neural) processes underlie it, so it's not clear how why the authors suggest either “purely serial” or “purely parallel” as possibilities, especially since they appear to be unconcerned with distinguishing among the various processes/functions that are engaged between the stimulus “input” and the response “output.” Confusingly, however, and in contradiction to their titular conclusions, they acknowledge that: “It is conceivable [that there was enough time, even at the smallest stimulus duration] for a serial mechanism to utilize two items (p. 6)” If the data are to be described as compatible with a “purely serial” mechanism, then we are probably talking about the known-to-be-serial inspection of items via the conscious shifting of attention, which, again, does not necessarily imply any averaging.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
2. PubMedCommonsArchive 09 Feb 2018
  
  in Public
  
  On 2016 Feb 10, Lydia Maniatis commented:
  
  Part 2 DATA, MODEL, AND MODEL-FITTING
  The authors propose to “quantify how well summary statistics like averages are calculated [using] an Equivalent Noise (Nagaraja, 1964; Pelli, 1990; Dakin, 2001) framework...” (p. 1). The first two references discuss “luminance noise” and contrast thresholds. The mathematical framework and supporting arguments seems chiefly provided by the Pelli, (1990) reference (Dakin, 2001, takes the applicability of the Equivalent Noise paradigm to orientation for granted). However, the conclusion of the Pelli chapter includes the following statements: "It will be important to test the model of Fig. 1.7 [the most general, schematic expression of the proposed model – which nevertheless refers specifically to contrast-squared)]. For gratings in dynamic white noise, [the main prediction of the model] has been confirmed by Pelli (1981), disconfirmed by Kersten (1984) and reconfirmed by Thomas (1985). More work is warranted.” (p. 18).
  Also, Pelli's arguments seem to overlook basic facts of vision, such as the inhibitory mechanisms at the retinal level. Has his model actually been tested in the 25 years since the chapter was written, with respect to contrast, with respect to orientation? Where are the supporting references? (It is worth noting that Pelli seems to be unfamiliar with the special significance of “disconfirmations,” i.e. falsifications, in the testing of scientific hypotheses. Newton's theory has been confirmed many times, and can continue to be confirmed indefinitely, but it stopped being an acceptable theory after the falsification of a necessary prediction).
  Agnostic as to the perceptual abilities, processes or functional mechanisms underlying observer performance (the method confounds perception, attention and cognition), and assuming that a “just-noticeable contrast level” is computationally interchangable with a “just-comparable angle (via "averaging")," the authors proceed to fit the data to a mathematical model.
  From data points at two locations on the x-axis, they construct non-linear curves, which differ significantly from observer to observer. If the curves mean anything at all, they predict performance at intermediate levels of x-axis values - unless we are required to assume a priori that the model makes accurate predictions (in which case it is a metaphysical, not an empirical, model). The problem, as mentioned above, is that there is high inter-observer variability, such that the curves differ significantly from one observer to the next. (I also suspect that there was high intra-observer variability, though this statistic is not reported. ). Thus, a test of the model predictions for intermediate x-values would seem to require that we retest the same observers at new levels of the independent variable. (Why weren't observers tested with at least one more x-value?) I'm not at all sure that the results would confirm the predictions, but even if they did, this is supposed to be a general model. So what if we wanted to test it on new observers at new, intermediate levels of the independent variable? How would the investigators arrive at their predictions for this case?
  If there are no criteria for testing (i.e. potentially rejecting) the model - if any two data points can always be - can ONLY be - fitted post hoc - then this type of model-fitting exercise lies outside the domain of empirical science.
  It is always possible to compare “models” purporting to answer the wrong question, to investigate a nonexistent phenomenon. To use a rough example, we could ask,”Is the Sun's orbit around the Earth more consistent with a circular or an elliptical model?” Using the apparent movements of the Sun in relation to the Earth and other cosmic landmarks, we could compare models and conclude that one of them “better fits” the data or that it "fits the data well" (it's worth noting that the model being fitted here has dozens of free parameters: "The model with the fewest parameters had 55 free parameters" (p. 5)). But this wouldn't amount to a theoretical advance. I think that this is the kind of thing going on here.
  Asking what later turns out to be the wrong question is par for the course in science, and excusable if you have a solid rationale consistent with knowledge at the time. Here, this does not seem to be the case.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
3. PubMedCommonsArchive 09 Feb 2018
  
  in Public
  
  On 2016 Mar 01, Joshua A Solomon commented:
  
  Our conclusions do not depend on any model, falsifiable or otherwise. They were based on the mathematical incompatibility between certain visual computations and empirical measurements reported in our paper. (It should also be noted that we also described a model consistent with our measurements: The Markovian Subsampler. However, that model was ignored by the commentator.)
  Our conclusions follow logically from our measurements of efficiency. These measurements do not rest on any unproven theoretical constructs. Efficiency is a purely descriptive measure of performance in relation to the available information. Given any sample size N, efficiency is the ratio of M to N, where M is the sample size that the ideal observer would need in order to estimate a statistic with the same precision as a human observer. Since the ideal observer is, by definition, at least as good as any other observer, human decisions are necessarily based on M or more elements.
  Estimates of efficiency tell us nothing about visual appearance. The relationship between appearance and performance would be something interesting to study, but our study was about performance alone.
  Our observers were asked to estimate expected values, and they were pressed for time. Inferring the minimum sample sizes used by our observers is a purely mathematical exercise. On average (i.e. across observers) we found that this minimum increased from approximately 2 to approximately 3 (average efficiencies increased from approximately 2/8 to approximately 3/8), as we eased the time constraint by providing longer presentations. Of course these numbers are subject to measurement error, so we performed statistical tests to see whether the 3 was significantly greater than the 2 and whether the 2 was significantly greater than 1. Both differences proved significant.
  Our conclusion against a purely parallel computation is valid, because our data unequivocally support an increase in efficiency with time. However, as noted in the published paper, our conclusion against a purely serial computation isn’t as strong. It is based on the second of the aforementioned significant differences, but there remains the possibiility that our most stringent time constraint (0.1 s) wasn’t sufficiently stringent.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
4. PubMedCommonsArchive 09 Feb 2018
  
  in Public
  
  On 2016 Mar 08, Lydia Maniatis commented:
  
  Thanks to the authors for taking the trouble to respond to my comments. However, I feel they're missing the point. Below, I've copy-pasted their reply and broken it up into statements by them and replies by me.
  Authors: Our conclusions do not depend on any model, falsifiable or otherwise. They were based on the mathematical incompatibility between certain visual computations and empirical measurements reported in our paper.
  Me: What visual computations?
  Authors: (It should also be noted that we also described a model consistent with our measurements: The Markovian Subsampler. However, that model was ignored by the commentator.)
  Me: Ad hoc models will usually be consistent with the data, that's their purpose. This doesn't make them relevant.
  Authors: Our conclusions follow logically from our measurements of efficiency.
  Me: Conclusions with respect to what?
  Authors: These measurements do not rest on any unproven theoretical constructs. Efficiency is a purely descriptive measure of performance in relation to the available information.
  Me: What does this measure tell us about the “visual computations” involved in the “performance?”
  Authors: Given any sample size N, efficiency is the ratio of M to N, where M is the sample size that the ideal observer would need in order to estimate a statistic with the same precision as a human observer. Since the ideal observer is, by definition, at least as good as any other observer, human decisions are necessarily based on M or more elements.
  Me: Why, as vision scientists, should we care about your measure of performance? What insights does it provide with respect to visual perception and related processes?
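  To make the quoted definition concrete, here is a minimal Python sketch of efficiency as the ratio M/N. It rests on an assumption that is not stated in the thread: the ideal observer averages independent items of equal variance, so the variance of its estimate from M items is the item variance divided by M. The function names and the numbers are hypothetical and are not taken from the paper.

```python
def equivalent_sample_size(item_variance, human_estimate_variance):
    """Return M, the number of items an ideal averager would need to
    match the human observer's precision.

    Assumption (not from the source): items are independent with equal
    variance, so the ideal estimate from M items has variance
    item_variance / M.
    """
    if item_variance <= 0 or human_estimate_variance <= 0:
        raise ValueError("variances must be positive")
    return item_variance / human_estimate_variance


def efficiency(n_items, item_variance, human_estimate_variance):
    """Return efficiency M / N for a display of n_items elements."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    m = equivalent_sample_size(item_variance, human_estimate_variance)
    return m / n_items


# Hypothetical numbers: 8 items, item variance 16, human estimate variance 8.
print(equivalent_sample_size(16.0, 8.0))  # 2.0
print(efficiency(8, 16.0, 8.0))           # 0.25
```

  With these made-up inputs, M is 2 and efficiency is 2/8. The sketch describes performance only; it says nothing about which items an observer actually used or how.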
  Authors: Estimates of efficiency tell us nothing about visual appearance.
  Me: So the “visual computation” referred to in the first paragraph is not connected to “visual appearance.” What, then, is “visual” about the computation?
  Authors: The relationship between appearance and performance would be something interesting to study, but our study was about performance alone.
  Me: The text refers to a “visual computation,” “visual statistics,” and “visual information.” What is the meaning of the term “visual” if not “pertaining to appearance”? Furthermore, your abstract clearly implies that you are interested in appearance, e.g. when you say that “With orientation...it is relatively safe to use an item's physical value as an approximation for its average perceived value.” Your stimuli are contrasted with those used in studies where “observers are asked to make decisions on perceived average size,” the idea being that in these circumstances the percept is not as reliable as in the case of orientation.
  Authors: Our observers were asked to estimate expected values, and they were pressed for time. Inferring the minimum sample sizes used by our observers is a purely mathematical exercise.
  Me: How is this exercise of interest to readers of a journal on visual perception?
  Authors: On average (i.e. across observers) we found that this minimum increased from approximately 2 to approximately 3 (average efficiencies increased from approximately 2/8 to approximately 3/8), as we eased the time constraint by providing longer presentations. Of course these numbers are subject to measurement error, so we performed statistical tests to see whether the 3 was significantly greater than the 2 and whether the 2 was significantly greater than 1. Both differences proved significant.
  Me: You might get similar results asking viewers the following question: Is the average of the numbers to the right of the colon smaller or larger than the number to the left of the colon?
  8: 5, 11, 3, 9, 13, 2, 9, 6.
  What would this tell us about visual perception (that we didn't already know)?
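  For reference, the number task above can be worked through directly. This is only an illustrative Python sketch; the function name is hypothetical.

```python
from statistics import mean


def compare_average(reference, values):
    """Report whether the mean of values is smaller than, larger than,
    or equal to the reference number."""
    average = mean(values)
    if average < reference:
        verdict = "smaller"
    elif average > reference:
        verdict = "larger"
    else:
        verdict = "equal"
    return average, verdict


# The example from the comment: 8 on the left, eight numbers on the right.
print(compare_average(8, [5, 11, 3, 9, 13, 2, 9, 6]))  # (7.25, 'smaller')
```

  The eight numbers sum to 58, so their average is 7.25, which is smaller than 8.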
  Authors: Our conclusion against a purely parallel computation is valid, because our data unequivocally support an increase in efficiency with time.
  Me: Most tasks are easier given more time. However, saying that I'm able to perform better on a task given more time doesn't actually make me more efficient at the task, in the ordinary meaning of the term. If one becomes more efficient at a task, they can do it better in the same amount of time. I think at best you're measuring improvement in “accuracy.” People's answers become more accurate given more time, especially when some durations are very brief. This is a pretty sure thing regardless of the task. How did your experimental conditions give such a finding added value for visual perception?
  References to “parallel computation” are empty of content if, as you said above, you're not interested in process, but only in performance. “Computation” refers to process. There are obviously different types of processes involved in the task, and all mental processes involve both serial and parallel neural processes. So unless you're more specific about the type of computations that you're referring to, and how your method allowed you to isolate them, you aren't saying anything interesting.
  Authors: However, as noted in the published paper, our conclusion against a purely serial computation isn’t as strong. It is based on the second of the aforementioned significant differences, but there remains the possibility that our most stringent time constraint (0.1 s) wasn’t sufficiently stringent.
  Me: a) Why didn't you make conditions sufficiently stringent to achieve your goals? b) As with “parallel computation” above, the phrase “a purely serial computation” lumps observers' experience, mental activity, and decision-making strategy together into an undifferentiated and wholly uninformative reference to “a computation.”
  All in all, you seem to be saying that you devised a task that observers are bad at (as ascertained by your special measure but as would be obvious using any measure) and do better at if given more time (as would be expected for most tasks), and that you don't care how they were doing it or how it aids in the understanding of visual perception.
  This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
  
  PubMedCommonsArchive PMID:26790845
Tags

PubMedCommonsArchive

PMID:26790845

Annotators

PubMedCommonsArchive

URL

europepmc.org/abstract/MED/26790845
