By Geoff Walker
The question of how you
know how far away something is may not grab your attention in everyday life,
but it’s very important to engineers trying to create perceptually correct 3D displays.
Dr. Kurt Akeley from Lytro spent the first third of his Monday Seminar, “Stereo 3D, Light Fields, and Perception,” on
this question.
The answer that’s
commonly assumed is “binocular depth cues,” including vergence (rotation of the
eyes towards a fixation point), accommodation (adjustment of the focal length
of the lens in the eye to match the fixation distance), retinal blur (the
out-of-focus retinal images of objects closer or further away than the fixation
point), and binocular parallax (the difference in the images sensed by each
eye).
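The vergence cue has simple geometry, which a short sketch can make concrete. The function name and the 63 mm interpupillary distance below are illustrative assumptions, not figures from the seminar. The angle between the two lines of sight shrinks quickly with distance, which suggests why vergence is mostly informative at close range.

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at distance_m. ipd_m is an assumed typical eye separation."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

# Illustrative fixation distances in meters
for d in (0.25, 1.0, 10.0, 100.0):
    print(f"{d:7.2f} m -> {vergence_angle_deg(d):.3f} degrees")
```

Under these assumptions the angle at 1 m is a few degrees, while at 100 m it is only a few hundredths of a degree.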
Everybody automatically
uses retinal blur as a depth cue without
thinking about it. When you look at an
object at some distance away, the relative blurriness of objects closer and
further away gives your vision system a “context” that helps it judge where the
object is in space.
Image blur also affects
perceived scale. In the left-hand photo
below, the city looks normal. However,
in the right-hand photo, the background and foreground have been blurred. Because strong blur in front of and behind
the point of focus normally occurs only when we look at something close to us, the city suddenly looks like a miniature model.
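The link between blur and perceived scale can be illustrated with a thin-lens, small-angle approximation, in which the angular size of the blur circle is roughly the pupil diameter times the difference between the inverse focus distance and the inverse object distance. This is a minimal sketch, not a model presented in the seminar; the 4 mm pupil and the example distances are assumptions chosen for illustration.

```python
import math

def blur_angle_deg(focus_m, object_m, pupil_m=0.004):
    """Approximate angular diameter of the blur circle for an object at
    object_m when the eye is focused at focus_m (small-angle, thin-lens
    approximation; pupil_m is an assumed pupil diameter)."""
    return math.degrees(pupil_m * abs(1.0 / focus_m - 1.0 / object_m))

# A real city: focused at 1 km, background at 2 km
city = blur_angle_deg(focus_m=1000.0, object_m=2000.0)
# A tabletop model: focused at 0.5 m, background at 1 m
model = blur_angle_deg(focus_m=0.5, object_m=1.0)

print(f"city blur:  {city:.6f} degrees")
print(f"model blur: {model:.6f} degrees")
```

With the same two-to-one depth ratio, the tabletop case produces blur thousands of times larger than the city case, so noticeable blur is consistent with a small, nearby scene.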
While binocular depth
cues are important, and depth-sensing can be achieved using binocular parallax
even if all other depth cues are eliminated, there are many other depth cues. Some of the others include the following:
· Retinal Image Size:
Since you know that people are generally between five and six feet tall,
your brain compares the sensed size of the people on the hiking trail in the
photo below with what you know, producing an automatic estimate of how far away
they are. This is also an example of the
fact that most depth cues involve a “prior” – that is, some prior knowledge of
something that’s related.
· Texture Gradient:
Because we tend to assume that a surface’s texture is uniform (another prior), a
difference in apparent texture, such as between the two
sets of rocks circled in the photo above, also becomes a depth cue.
· Lighting:
The illumination of an object with light produces shades and shadows;
this can make it easier to identify an object, which in turn can make it easier
to determine how far away it is. If multiple objects are involved, since we
know that an object that casts a shadow on another object is closer to the
light source, we can determine which object is closer.
· Linear Perspective:
Depending on the prior, a scene shown in linear perspective can be
identified as something with depth (e.g., a straight and level road going to a
point at the horizon) or an abstract grouping of shapes and lines that provide
no depth cues.
· Aerial Perspective:
People who live in mountainous regions are used to judging distance
based on the haziness of distant mountains (the haze is caused by small water
and dust particles in the air). The further away the distant mountains are, the
hazier they look. However, if these people go to a different mountainous region
with a different amount of haze, their prior no longer works.
· Motion Parallax:
We tend to judge an object’s distance based on how quickly it appears to move
across our visual field. The closer an object is to us, the quicker it appears
to move; the further an object is from us, the slower it appears to move. Because
objects that are further away stay in our visual field longer, we perceive them
as moving more slowly. This is a monocular depth cue; that is, it can be perceived
with just one eye.
· Monocular Movement Parallax:
Closing one of our eyes and moving our head produces a depth cue because
the human visual system can extract depth data from two similar images sensed
serially in the same way that it can combine two images from different eyes.
· Occlusion:
When one object partially blocks the view of another, we instinctively
know which object is in front, i.e., which is closer. In one of his demos, Dr.
Akeley showed the audience a short 3D video of three rectangular objects rotating
around each other in space and asked the audience to identify what they were
seeing. In the first viewing, occlusion was turned off in the graphics software
so that one object never blocked the view of another. In this situation it was extremely difficult
to determine what we were seeing. When Dr. Akeley turned on occlusion, it was
immediately obvious what the objects were. Interestingly, even though we then
knew what the objects were, when occlusion was turned off, it once again became
very difficult to identify them. The human
visual system expects and uses occlusion as a fundamental depth cue – in fact,
occlusion is believed to be the strongest depth cue of all, even stronger than
binocular effects.
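Two of the cues in the list above, retinal image size and motion parallax, reduce to simple geometry. The sketch below is only an illustration under stated assumptions: the function names, the 1.7 m person height, and the 1.5 m/s sideways speed are hypothetical values, not data from the seminar.

```python
import math

def distance_from_angular_size(known_height_m, angular_height_deg):
    """Distance to an object of known height (the "prior") that subtends
    angular_height_deg at the eye."""
    half_angle = math.radians(angular_height_deg) / 2.0
    return known_height_m / (2.0 * math.tan(half_angle))

def angular_speed_deg_per_s(lateral_speed_m_s, distance_m):
    """Instantaneous angular speed of a stationary point directly to the
    side of an observer who moves sideways at lateral_speed_m_s."""
    return math.degrees(lateral_speed_m_s / distance_m)

# Retinal image size: a hiker assumed to be 1.7 m tall subtends 1 degree
print(f"hiker distance: {distance_from_angular_size(1.7, 1.0):.1f} m")

# Motion parallax: observer moving at 1.5 m/s past near and far objects
for d in (3.0, 30.0, 300.0):
    print(f"{d:6.1f} m -> {angular_speed_deg_per_s(1.5, d):.3f} degrees/s")
```

In both cases the estimate is only as good as the prior: a wrong assumed height or a wrong assumed observer speed shifts the inferred distance proportionally.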