Figure 1: Progress in Robot Spatial Awareness: By 1980 the Stanford Cart had (sometimes, slowly) managed to negotiate obstacle courses by tracking and avoiding the 3D locations of a few dozen object corners in the route ahead. The top panel shows the Cart's view of a room, superimposed with red dots marking points its program has selected and stereoscopically ranged. The consequent 3D map at the right shows the same points, with diagonal stalks indicating height, and a planned obstacle-avoiding path. (Labels were added by hand.) The program updated map and plan each meter of travel. The sparse maps were barely adequate, and blunders occurred every few tens of meters.

The second panel shows a dense 2D grid map of 150 meters of corridor produced in 1993 by a program by Barry Brummitt controlling Carnegie Mellon's Xavier robot via a remote Sparc 2 workstation. The sensor was a ring of sonar rangefinders, whose interpretation was automatically learned. In the map image, evidence of occupancy ranges from empty (black) through unknown (grey) to occupied (white). Regular indentations marking doors are evident, as are bumps where cans, water coolers, fire extinguishers, poster displays, and the like protrude. The corridor's apparent curvature is dead-reckoning error.
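The core bookkeeping behind such a sonar grid map can be sketched as follows. This is a minimal, hypothetical log-odds update along a single range beam, not the learned sensor model the Xavier program actually used: cells the echo passed through accumulate "empty" evidence, the cell at the measured range accumulates "occupied" evidence, and cells beyond stay unknown. The cell size and evidence increments are invented for illustration.

```python
import math

CELL = 0.1     # meters per grid cell (assumed)
L_OCC = 0.9    # log-odds increment for an occupied reading (assumed)
L_EMP = -0.4   # log-odds increment for traversed empty space (assumed)

def update_beam(grid, start_cell, range_m):
    """Fold one sonar range reading into a 1D log-odds occupancy grid."""
    hit = start_cell + round(range_m / CELL)
    for c in range(start_cell, min(hit, len(grid))):
        grid[c] += L_EMP           # space the echo crossed: likely empty
    if hit < len(grid):
        grid[hit] += L_OCC         # echo origin: likely occupied

def occupancy(grid, c):
    """Convert a cell's log odds back to a probability in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-grid[c]))

grid = [0.0] * 50                  # 5 m of corridor; log odds 0 = unknown
for r in (2.0, 2.05, 1.95):        # three noisy sonar ranges (made up)
    update_beam(grid, 0, r)
```

Repeated noisy readings reinforce one another: cells near the 2-meter mark drift toward "occupied", the traversed cells toward "empty", and cells past the wall remain at the unknown midpoint.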

The last panel shows work in progress. As with the Cart, the left image is a robot's eye view of a scene. The right image, though resembling a fuzzy photograph, is actually a perspective view of the occupied cells of a 3D map of the scene, built from about 100,000 range measurements extracted from 20 stereoscopic views similar to the one on the left. The grid is 256 cells wide by 256 deep by 128 high, covering 6x6x3 meters. Of the eight million total cells, about 100,000 are occupied. The realistic occupied cell colors are a side effect of a learning process. The shapes of the evidence patterns corresponding to stereoscopic range values, among other system parameters, are tuned automatically to make the best grids. A candidate grid is evaluated by "projecting" colors from the original images onto the grid's occupied cells from the appropriate directions. Each cell in a perfect grid would collect colors from different views of the same thing in real space. Since most objects show the same color from different viewpoints, the various colorings of each single cell would agree with one another. Incorrect extra cells, however, would intercept many disparate background colors from different points of view. Conversely, colors of incorrectly missing cells would be "sprayed" across various background cells, spoiling their uniformity. The learning program tunes the system to minimize total color variance. The maps so far are ragged around the edges, and many promising improvements remain to be tried, but the results are very encouraging nevertheless.

Compare the richness of the 3D maps in the first and third panels. Both were produced by processing about 20 stereoscopic image sets, the 1980 result on a 1 MIPS DEC KL-10 mainframe computer with 500 kilobytes of memory, the 2000 result on a 1,000 MIPS Macintosh G4 with 500 megabytes of memory.
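The color-variance score described above can be sketched in a few lines. This is an illustrative toy, with invented names and data, not the actual implementation: each candidate occupied cell collects the colors the source images project onto it from their respective viewpoints; a cell on a real surface sees nearly the same color in every view, while a spurious cell intercepts unrelated background colors. The grid's score is the total variance over all occupied cells, which the learning process drives down.

```python
def color_variance(samples):
    """Variance of one cell's projected color samples (grayscale, 0-255)."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def grid_score(cell_colors):
    """Total variance over all occupied cells: lower = more consistent grid."""
    return sum(color_variance(samples) for samples in cell_colors.values())

# A correct cell sees roughly the same wall color from every viewpoint;
# a spurious cell catches sky, carpet, and wall from different views.
good_grid = {"cell_a": [120, 122, 119], "cell_b": [60, 61, 59]}
bad_grid  = {"cell_a": [120, 230, 40],  "cell_b": [60, 200, 10]}
```

Here `grid_score(good_grid)` comes out far lower than `grid_score(bad_grid)`, so a tuner that adjusts evidence-pattern shapes to minimize this score is pushed toward grids whose occupied cells correspond to real, consistently colored surfaces.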