Reviewer #3:
This manuscript reports results from an eye tracking study of humans walking in natural terrain. These eye movements together with images simultaneously obtained by a head-fixed camera are used to calculate optic flow fields as seen by the retina and as seen by the head-fixed camera. Next, the structure of these flow fields is described. It is noted that this structure is somewhat stable in the retinal image, due to compensatory gaze stabilisation reflexes, but varies wildly in the head-centric image. Then, the authors estimate the focus of expansion in the head-centric flow and argue that it cannot be used for locomotor control, because it also varies wildly during walking. In a second, more theoretical section of the manuscript, they calculate retinal flow for a movement over an artificial ground plane, given the locomotor and eye movements recorded previously. They describe the structure of the retinal flow and compute the distribution of curl and divergence across the retina as well as in a projection onto the ground plane. They argue that curl around the fovea and the location of the maximum of divergence can be used to estimate the direction of walking relative to the direction of gaze and in relation to the ground plane.
I really like the experimental part of the study. However, I see fundamental issues in the theoretical part, in the general framing of the presentation, and in misrepresentations of previous literature.
The simultaneous measurement of the head-centric image and gaze with sufficient temporal resolution to calculate retinal flow during natural walking provides a beautiful demonstration of retinal flow fields, and confirms many known aspects of retinal flow. The calculation of head-centric flow from the head camera images provides a compelling, though not unexpected, demonstration that the FoE in head-centric flow is not useful for locomotor control. It is not unexpected, since one of the best-known issues in optic flow is that the FoE is destroyed when self-motion contains rotational components (Regan and Beverley, 1982; Warren and Hannon, 1990; Lappe et al., 1999). Although this is often presented as an issue of eye movements in retinal flow, it applies to all rotations and combinations of rotations that exist on top of any translational motion of the observer. Thus, the oscillatory bounce and sway motion of the head during walking is expected to render any use of the FoE in a head-centric image futile.
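The geometric reason is easy to demonstrate numerically. Under the instantaneous-flow equations of Longuet-Higgins and Prazdny (1980), even a small added rotation displaces the singular point of the flow field away from the true heading. A minimal sketch (hypothetical parameter values, pinhole camera with unit focal length, not the authors' data):

```python
import numpy as np

def flow_field(xy, Z, T, Omega):
    """Instantaneous image motion for observer translation T and rotation
    Omega (Longuet-Higgins and Prazdny, 1980), unit focal length."""
    x, y = xy[:, 0], xy[:, 1]
    Tx, Ty, Tz = T
    wx, wy, wz = Omega
    u = (-Tx + x * Tz) / Z + x * y * wx - (1 + x**2) * wy + y * wz
    v = (-Ty + y * Tz) / Z + (1 + y**2) * wx - x * y * wy - x * wz
    return np.stack([u, v], axis=1)

# image points sampling a fronto-parallel surface at depth 8
g = np.linspace(-1.0, 1.0, 81)
gx, gy = np.meshgrid(g, g)
xy = np.stack([gx.ravel(), gy.ravel()], axis=1)
Z = 8.0

T = np.array([0.0, 0.0, 1.0])        # heading straight ahead
foe_true = T[:2] / T[2]              # FoE at (0, 0)

pure = flow_field(xy, Z, T, np.zeros(3))
rot = flow_field(xy, Z, T, np.array([0.0, 0.1, 0.0]))  # small yaw rotation

# the singular point: the image location where flow speed is minimal
sp_pure = xy[np.argmin(np.linalg.norm(pure, axis=1))]
sp_rot = xy[np.argmin(np.linalg.norm(rot, axis=1))]

print(sp_pure)   # coincides with the true FoE
print(sp_rot)    # displaced well away from the heading direction
```

The same displacement follows from any rotational component, whether it comes from an eye movement or from head bounce and sway.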
Yet, the first part of the manuscript is very much framed as a critique of the idea of a stable FoE in head-centric flow, presuming that this is what previous researchers commonly believed. This argument contains a logical fallacy. Previous research argued that there is no FoE in retinal flow because of eye rotations (e.g. Warren and Hannon, 1990). This does not predict, inversely, that there is an FoE in head-centric flow. In fact, it does not provide any prediction about head-centric flow. The authors often suggest that a stable FoE in head-centric flow is tacitly implied, commonly believed, etc., without providing a reference. In fact, the only paper I know of that specifically proposed a head-centric representation of heading is by van den Berg and Beintema (1997).
Instead, the fundamental problem of heading perception is to estimate self-motion from retinal flow when the self-motion that generates retinal flow combines all kinds of translations and rotations. The present study shows, consistent with much of the prior literature, that the patterns of retinal flow are sufficiently stable and informative to obtain the direction of one's travel in a retinal frame of reference, and, via projection, with respect to the ground plane. This is due to the stabilising gaze reflexes that keep motion small near the fovea and produce (in the case of a ground plane) a spiralling pattern of retinal flow. This is well known from theoretical and lab studies (e.g. Warren and Hannon, 1990; Lappe et al., 1998; Niemann et al., 1999; Lappe et al., 1999) and, to repeat, beautifully shown for the natural situation in the present data. The presentation should link back to this work rather than trying to shoot down purported mechanisms that are obviously invalid.
The second part of the manuscript presents a theoretical analysis of the retinal flow for locomotion across a ground plane under gaze stabilisation. This has two components: (a) the structure of the retinal flow and the utility of gaze stabilisation, and (b) ways to recover information about self-motion from the retinal flow. Both aspects have a long history of research that is neglected in the present manuscript. The essential circular structure of the retinal flow during gaze stabilisation has long been known (Warren and Hannon, 1990; van den Berg, 1996; Lappe et al., 1998; Lappe et al., 1999). Detailed analyses of the statistical structure of retinal flow during gaze stabilisation have shown the impact and utility of gaze stabilisation (Calow et al., 2004; Calow and Lappe, 2007; Roth and Black, 2007) and provided links to properties of neurons in the visual system (Calow and Lappe, 2008). These studies included simulated motions of the head during walking, as in the current manuscript, and extended to natural scenes other than a simple ground plane.
Given the structure of the retinal flow during gaze stabilisation the central question is how to recover information about self-motion from it. The authors investigate a proposal originally made by Koenderink and van Doorn (1976, 1984) that relies on estimates of curl and divergence in the visual field. They propose that locomotor heading may be determined directly in retinotopic coordinates (l. 314). This is true, but it fails to mention that other models of heading perception during gaze stabilisation similarly determine heading in retinotopic coordinates (e.g. Lappe and Rauschecker, 1993; Perrone and Stone, 1994; Royden, 1997). In fact, as outlined above, the mathematical problem of self-motion estimation is typically presented in retinal (or camera) coordinates (e.g. Longuet-Higgins and Prazdny, 1980). The problem with the divergence model in comparison to the other models above is threefold. First, it really only works for a plane, not in other environments. Second, it requires a local estimate of divergence at each position in the visual field. The alternative models above combine information across the visual field and are therefore much more robust against noise in the flow. One would need to see whether the estimate of the divergence distribution is sufficient to work with the natural flow fields. Third, being a local measure it requires a dense flow field, while heading estimation from retinal flow is known to work with sparse flow fields (Warren and Hannon, 1990). Thus, the theoretical part of the manuscript should either provide proof that the maximum of divergence is superior to these other models or broaden the view to include these models as possibilities to estimate self-motion from retinal flow.
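The robustness point can be made concrete with a toy computation (made-up numbers, not the authors' data): differentiating a sampled flow field amplifies flow noise by roughly the inverse of the grid spacing, whereas an estimator that pools over the whole field averages the noise away.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x)
d = x[1] - x[0]                           # grid spacing

# toy flow field (u, v) = (X, Y): its divergence is exactly 2 everywhere
U, V = X.copy(), Y.copy()
Un = U + rng.normal(0.0, 0.05, U.shape)   # add 5% flow noise
Vn = V + rng.normal(0.0, 0.05, V.shape)

def divergence(u, v, d):
    """Finite-difference divergence of a sampled flow field."""
    return np.gradient(u, d, axis=1) + np.gradient(v, d, axis=0)

# local estimate: per-point divergence, as the divergence model needs
local_err = np.abs(divergence(Un, Vn, d) - 2.0).mean()
# global estimate: pooling the same noisy derivatives over the whole field
global_err = abs(divergence(Un, Vn, d).mean() - 2.0)

print(local_err)    # order 1: local estimates are swamped by noise
print(global_err)   # orders of magnitude smaller
```

This is only a caricature of the difference between local differential measures and models that integrate flow across the visual field, but it shows why the divergence-maximum proposal needs to be tested against the noisy natural flow fields recorded here.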
The case is similar for the use of curl. It is true that the rotational or spiral pattern around the fovea in retinal flow provides information about the direction of self-motion with respect to the direction of gaze, as has been noted many times before. This structure is used by many models of heading estimation. However, curl is, like divergence, a local property and thus not as robust as models that use the entire flow field. It may be interesting to note that neurons in optic flow responsive areas of the monkey brain can pick up this rotational pattern and respond to it in a manner consistent with their preference for self-motion across a plane (Bremmer et al., 2010; Kaminiarz et al., 2014).
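The informativeness of foveal curl is easy to verify numerically. In the sketch below (made-up walking parameters, not the authors' recordings), a walker translates over a ground plane with gaze pitched down onto a fixation point and a pursuit rotation that nulls the flow at the fovea, following the instantaneous-flow equations of Longuet-Higgins and Prazdny (1980). The curl at the fovea is zero when heading is aligned with gaze and flips sign with the side of the heading:

```python
import numpy as np

def stabilised_flow(X, Y, T, eye_height, theta):
    """Retinal flow for translation T (camera coordinates) over a ground
    plane, gaze pitched down by theta onto a fixation point on the plane,
    with a pursuit rotation that nulls the flow at the fovea (0, 0).
    Unit focal length; Y increases upward in the image."""
    Tx, Ty, Tz = T
    Z = eye_height / (np.sin(theta) - Y * np.cos(theta))  # plane depth
    Z0 = eye_height / np.sin(theta)                       # fixation depth
    wx, wy = Ty / Z0, -Tx / Z0          # stabilising rotation rates
    u = (-Tx + X * Tz) / Z + X * Y * wx - (1 + X**2) * wy
    v = (-Ty + Y * Tz) / Z + (1 + Y**2) * wx - X * Y * wy
    return u, v

def curl_at_fovea(azimuth):
    """Numerical curl of the stabilised flow at the fovea for a walker
    heading at the given azimuth (radians) relative to the gaze line."""
    theta, eye_height, speed = np.radians(20.0), 1.2, 1.4  # made-up values
    # horizontal walking direction expressed in camera coordinates
    T = speed * np.array([np.sin(azimuth),
                          np.cos(azimuth) * np.sin(theta),
                          np.cos(azimuth) * np.cos(theta)])
    s = np.linspace(-0.01, 0.01, 3)     # small patch around the fovea
    X, Y = np.meshgrid(s, s)
    u, v = stabilised_flow(X, Y, T, eye_height, theta)
    d = s[1] - s[0]
    curl = np.gradient(v, d, axis=1) - np.gradient(u, d, axis=0)
    return curl[1, 1]

print(curl_at_fovea(0.0))    # zero: heading aligned with gaze
print(curl_at_fovea(0.2))    # nonzero, one sign ...
print(curl_at_fovea(-0.2))   # ... the opposite sign for the other side
```

The same computation also makes the locality problem visible: the curl estimate rests on flow differences within a tiny foveal patch, so any noise in that patch enters the estimate directly.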
I think what the authors may want to draw more attention to is the dynamics of the retinal flow and the associated self-motion in retinal (or plane projection) coordinates. The movies provide compelling illustrations of how the direction of heading (or the divergence maximum, if you want to focus on that) sways back and forth on the retina and on the plane with each step. This requires that the analysis of retinal flow (and the estimation of self-motion) has to be fast and dynamic, or maybe should include some form of temporal prediction or filtering. Work on the dynamics of retinal flow perception has indeed shown that heading estimation can work with very brief flow fields (Bremmer et al., 2017), that the brain focuses on instantaneous flow fields (Paolini et al., 2000) and that short presentations sometimes provide better heading estimates than long presentations (Grigo and Lappe, 1999). The temporal dynamics of retinal flow is an underappreciated problem that could be given more focus in the present study.
Additional specific comments:
Footnote on page 2: It is not only VOR but also OKN (Lappe et al., 1998, Niemann et al., 1999) that stabilises gaze in optic flow fields.
Line 55: Natural translation and acceleration patterns of the head have been considered by Cutting et al. (1992), Palmisano et al. (2000), Calow and Lappe (2007, 2008), and Bossard et al. (2016).
Line 59: The statement is misleading that the key assumption behind work on the rotation problem is that the removal of the rotational component of flow will return a translational flow field with a stable FoE. Only one class of models, those using differential motion parallax (Rieger and Lawton, 1985; Royden, 1997), explicitly constructs a translational flow field and aims to locate the FoE in that field. Other models (Koenderink and van Doorn, 1976, 1984; Lappe and Rauschecker, 1993; Perrone and Stone, 1994) do not subtract the rotation but estimate heading in retinal coordinates from the combined retinal flow. This also applies to line 109.
Last paragraph on page 5: Measures of eye movement during walking in natural terrain were also taken by Calow and Lappe (2008) and 't Hart and Einhäuser (2012).
Lines 140 to 163: This paragraph is problematic and misleading as pointed out before.
Line 193: The lack of stability is expected, as outlined above. The use of a straight line motion in psychophysical experiments reflects an experimental choice to investigate the rotation problem in retinal flow, not an implicit assumption that bodily motion is usually along a straight line.
Line 200: That gaze stabilization may be an important component in understanding the use of optic flow patterns has also long been assumed (Lappe and Rauschecker, 1993; 1994; 1995; Perrone and Stone, 1994; Glennerster et al. 2001; Angelaki and Hess, 2005; Pauwels et al., 2007).
Line 314: Locomotor heading may be determined directly in retinotopic coordinates. Yes, and this is precisely what the above mentioned models do.
Line 334: What is meant by "robust" here? The videos seem to show simulated flow for a ground plane, not the real flow from any of the terrains. It is not clear whether the features can be extracted from the real terrain retinal flow.
First paragraph on page 15: This is an important discussion about the dynamics of retinal flow in conjunction with the dynamics of the gait cycle. It should be expanded and better balanced with respect to previous work and other models. It is true that any simple inference of an FoE would not work. However, models that estimate heading (not FoE) in the retinal reference frame would be consistent with the discussion. Oscillations of the head during walking affect the location of the divergence maximum and curl as much as the direction of heading in retinal coordinates. In fact, the videos nicely show how these variables oscillate with each step. This applies to all retinal flow analyses, and is a problem for any model. It requires a dynamical analysis.
The speed of neural computations is an issue, of course, but it applies to divergence and curl in the same way as to other models. There is some indication, however, that neural computations on optic flow are fast, deal with instantaneous flow fields, and respond consistently to natural (spiral) retinal flow, as described above.
Line 393: This paragraph is misleading in suggesting that naturally occurring flow fields have not been used in psychophysical and electrophysiological experiments.
Line 516: This has been done by Bremmer et al. (2010) and Kaminiarz et al. (2014). Their results are consistent with computing heading directly in a retinal reference frame as predicted by several models of retinal flow analysis (e.g. Lappe et al. 1999).
References:
Angelaki, D. E. and Hess, B. J. M. (2005). Self-motion-induced eye movements: effects on visual acuity and navigation. Nat. Rev. Neurosci., 6:966-976.
Bossard, M., Goulon, C., and Mestre, D. R. (2016). Viewpoint oscillation improves the perception of distance travelled based on optic flow. J. Vis., 16(15):4.
Bremmer, F., Kubischik, M., Pekel, M., Hoffmann, K. P., and Lappe, M. (2010). Visual selectivity for heading in monkey area MST. Exp. Brain Res., 200(1):51-60.
Calow, D., Krüger, N., Wörgötter, F., and Lappe, M. (2004). Statistics of optic flow for self-motion through natural scenes. In Ilg, U., Bülthoff, H. H., and Mallot, H. A., editors, Dynamic Perception, Workshop of the GI Section 'Computer Vision', pages 133-138, Berlin. Akademische Verlagsgesellschaft Aka GmbH.
Calow, D. and Lappe, M. (2007). Local statistics of retinal optic flow for self- motion through natural sceneries. Network, 18(4):343-374.
Calow, D. and Lappe, M. (2008). Efficient encoding of natural optic flow. Network Comput. Neural Syst., 19(3):183-212.
Cutting, J. E., Springer, K., Braren, P. A., and Johnson, S. H. (1992). Wayfinding on foot from information in retinal, not optical, flow. J. Exp. Psychol. Gen., 121(1):41-72.
Grigo, A. and Lappe, M. (1999). Dynamical use of different sources of information in heading judgments from retinal flow. JOSA A, 16(9):2079-2091.
't Hart, B. M. and Einhäuser, W. (2012). Mind the step: complementary effects of an implicit task on eye and head movements in real-life gaze allocation. Exp. Brain Res., 223(2):233-249.
Kaminiarz, A., Schlack, A., Hoffmann, K.-P., Lappe, M., and Bremmer, F. (2014). Visual selectivity for heading in the macaque ventral intraparietal area. J. Neurophysiol., 112(10):2470-2480.
Lappe, M., Pekel, M., and Hoffmann, K. P. (1998). Optokinetic eye movements elicited by radial optic flow in the macaque monkey. J. Neurophysiol., 79(3):1461-1480.
Lappe, M. and Rauschecker, J. P. (1993). A neural network for the processing of optic flow from ego-motion in man and higher mammals. Neural Comp., 5(3):374-391.
Lappe, M. and Rauschecker, J. P. (1994). Heading detection from optic flow. Nature, 369(6483):712-713.
Lappe, M. and Rauschecker, J. P. (1995). Motion anisotropies and heading detection. Biol. Cybern., 72(3):261-277.
Niemann, T., Lappe, M., Büscher, A., and Hoffmann, K. P. (1999). Ocular responses to radial optic flow and single accelerated targets in humans. Vision Res., 39(7):1359-1371.
Pauwels, K., Lappe, M., and Van Hulle, M. M. (2007). Fixation as a mechanism for stabilization of short image sequences. Int. J. Comp. Vis., 72(1):67-78.
Perrone, J. A. and Stone, L. S. (1994). A model of self-motion estimation within primate extrastriate visual cortex. Vision Res., 34(21):2917-2938.
Regan, D. and Beverley, K. I. (1982). How do we avoid confounding the direction we are looking and the direction we are moving? Science, 215:194-196.
Rieger, J. H. and Lawton, D. T. (1985). Processing differential image motion. J. Opt. Soc. Am. A, 2(2):354-360.
Roth, S. and Black, M. J. (2007). On the spatial statistics of optical flow. Int. J. Comp. Vis., 74(1):33-50.
Royden, C. S. (1997). Mathematical analysis of motion-opponent mechanisms used in the determination of heading and depth. J. Opt. Soc. Am. A, 14(9):2128-2143.
van den Berg, A. V. (1996). Judgements of heading. Vision Res., 36(15):2337-2350.
van den Berg, A. V. and Beintema, J. A. (1997). Motion templates with eye velocity gain fields for transformation of retinal to head centric flow. NeuroReport, 8(4):835-840.