By Geoff Walker
The question of how you know how far away something is may not grab your attention in everyday life, but it’s very important to engineers trying to create perceptually correct 3D displays. Dr. Kurt Akeley from Lytro spent the first third of his Monday Seminar, Stereo 3D, Light Fields, and Perception, on this question.
The answer that’s commonly assumed is “binocular depth cues,” including vergence (rotation of the eyes toward a fixation point), accommodation (adjustment of the focal power of the lens in the eye to match the fixation distance), retinal blur (the out-of-focus retinal images of objects closer to or further from the fixation point), and retinal disparity (the difference between the images sensed by the two eyes, also known as binocular parallax).
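The oculomotor cues in this list reduce to simple geometry. As a minimal sketch (illustrative, not from the talk), the following assumes a typical adult interpupillary distance of 63 mm and uses the standard thin-lens convention that accommodation demand in diopters is the reciprocal of the fixation distance in meters:

```python
import math

IPD_M = 0.063  # assumed interpupillary distance (meters); a typical adult value


def vergence_angle_deg(distance_m):
    """Angle between the two eyes' lines of sight when fixating at distance_m."""
    return math.degrees(2 * math.atan(IPD_M / (2 * distance_m)))


def accommodation_diopters(distance_m):
    """Focal power the eye must add to focus at distance_m (diopters = 1/m)."""
    return 1.0 / distance_m


for d in (0.25, 0.5, 1.0, 10.0):
    print(f"{d:5.2f} m: vergence {vergence_angle_deg(d):5.2f} deg, "
          f"accommodation {accommodation_diopters(d):4.1f} D")
```

Both quantities fall off rapidly with distance, which is one reason these cues matter most within arm's reach and are a central challenge for near-eye 3D displays.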
Everybody automatically uses retinal blur as a depth cue without thinking about it. When you look at an object some distance away, the relative blurriness of objects closer and further away gives your visual system a “context” that helps it judge where the object is in space.
Image blur also affects perceived scale. In the left-hand photo below, the city looks normal. However, in the right-hand photo, the background and foreground have been blurred. Since we tend to assume that blurred objects are close to us, the city suddenly looks like a miniature model.
While binocular depth cues are important, and depth can be perceived using binocular parallax alone even when all other depth cues are eliminated, there are many other depth cues, including the following:
· Retinal Image Size: Since you know that people are generally between five and six feet tall, your brain compares the sensed size of the people on the hiking trail in the photo below with what you know, producing an automatic estimate of how far away they are. This is also an example of the fact that most depth cues involve a “prior” – that is, some prior knowledge of something that’s related.
· Texture Gradient: Because we tend to assume that a texture gradient is uniform (another prior), a difference in texture such as in the two sets of rocks circled in the photo above also becomes a depth cue.
· Lighting: The illumination of an object produces shading and shadows; this can make it easier to identify the object, which in turn can make it easier to determine how far away it is. When multiple objects are involved, since we know that an object casting a shadow on another object is closer to the light source, we can determine which object is closer.
· Linear Perspective: Depending on the prior, a scene shown in linear perspective can be identified as something with depth (e.g., a straight and level road going to a point at the horizon) or an abstract grouping of shapes and lines that provide no depth cues.
· Aerial perspective: People who live in mountainous regions are used to judging distance based on the haziness of distant mountains (the haze is caused by small water and dust particles in the air). The further away the distant mountains are, the hazier they look. However, if these people go to a different mountainous region with a different amount of haze, their prior no longer works.
· Motion Parallax: We tend to judge an object’s distance by how quickly it appears to move as we move. The closer an object is to us, the faster it sweeps across our visual field; the further away it is, the slower it appears to move and the longer it stays in view. This is a monocular depth cue, that is, it is perceivable through the use of one eye.
· Monocular Movement Parallax: Closing one of our eyes and moving our head produces a depth cue because the human visual system can extract depth data from two similar images sensed serially in the same way that it can combine two images from different eyes.
· Occlusion: When one object partially blocks the view of another, we instinctively know which object is in front, i.e., which is closer. In one of his demos, Dr. Akeley showed the audience a short 3D video of three rectangular objects rotating around each other in space and asked the audience to identify what they were seeing. In the first viewing, occlusion was turned off in the graphics software so that one object never blocked the view of another. In this situation it was extremely difficult to determine what we were seeing. When Dr. Akeley turned on occlusion, it was immediately obvious what the objects were. Interestingly, even though we then knew what the objects were, when occlusion was turned off, it once again became very difficult to identify them. The human visual system expects and uses occlusion as a fundamental depth cue – in fact, occlusion is believed to be the strongest depth cue of all, even stronger than binocular effects.
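Two of the cues above, retinal image size and motion parallax, come down to simple geometry. The sketch below is illustrative only (the values and function names are not from the talk): distance is recovered from a known size and a measured angular size, and the angular speed of a passing object falls off inversely with its distance.

```python
import math


def distance_from_size(known_size_m, angular_size_rad):
    """Retinal image size: given an object's real size (the 'prior'),
    its angular size yields an estimate of its distance."""
    return known_size_m / (2 * math.tan(angular_size_rad / 2))


def angular_speed_rad_s(observer_speed_m_s, distance_m):
    """Motion parallax: for an object directly abeam of a moving observer,
    angular speed is inversely proportional to distance."""
    return observer_speed_m_s / distance_m


# A hiker assumed to be 1.7 m tall who subtends 1 degree is roughly 97 m away.
print(f"{distance_from_size(1.7, math.radians(1.0)):.0f} m")

# Walking at 1.4 m/s past objects 2 m and 100 m away:
# the near object sweeps across the visual field 50x faster.
print(angular_speed_rad_s(1.4, 2.0) / angular_speed_rad_s(1.4, 100.0))
```

This is also why both cues need a prior: without the assumed height of the hiker, or the observer's own speed, the geometry alone cannot pin down an absolute distance.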