Researchers at Carnegie Mellon University have shown that they can significantly improve the accuracy of autonomous vehicles’ detection of objects such as other cars or pedestrians by helping the vehicle also recognize what it doesn’t see.
CMU research shows that what a self-driving car doesn’t see (in green) is as important to navigation as what it actually sees (in red). Credit: Carnegie Mellon University
It's obvious to people that objects in view can block their sight of things that lie farther ahead. But Peiyun Hu, a Ph.D. student in CMU's Robotics Institute, said that's not how self-driving cars typically reason about the objects around them.
Rather, they use 3D data from lidar to represent objects as a point cloud and then try to match those point clouds to a library of 3D representations of objects. The problem, Hu said, is that the 3D data from the vehicle’s lidar isn’t really 3D—the sensor can’t see the occluded parts of an object, and current algorithms don’t reason about such occlusions.
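The "not really 3D" point can be illustrated with a toy model: a lidar returns at most one point per bearing, so anything farther along the same ray never appears in the scan. The following sketch is illustrative only and not from the paper; the function name and the flat 2D setup are my own simplification.

```python
import math

def visible_points(sensor, points, angle_bins=360):
    """Keep only the nearest point along each bearing from the sensor.
    A lidar return occludes everything farther along the same ray, so
    the resulting 'point cloud' captures only front-facing surfaces."""
    nearest = {}  # bearing bin -> (range, point)
    for p in points:
        dx, dy = p[0] - sensor[0], p[1] - sensor[1]
        bearing = int(math.degrees(math.atan2(dy, dx))) % 360 * angle_bins // 360
        rng = math.hypot(dx, dy)
        if bearing not in nearest or rng < nearest[bearing][0]:
            nearest[bearing] = (rng, p)
    return [p for _, p in nearest.values()]
```

With a sensor at the origin, a point at (1, 0) hides a second point at (2, 0) on the same bearing, while a point at (0, 3) on a different bearing remains visible.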
Hu’s work enables a self-driving car’s perception systems to consider visibility as it reasons about what its sensors are seeing. Reasoning about visibility is already used when companies build digital maps.
"Map-building fundamentally reasons about what's empty space and what's occupied," said Deva Ramanan, an associate professor of robotics and director of the CMU Argo AI Center for Autonomous Vehicle Research. "But that doesn't always occur for live, on-the-fly processing of obstacles moving at traffic speeds."
In research to be presented at the Computer Vision and Pattern Recognition (CVPR) conference, being held virtually 13–19 June, Hu and his colleagues borrow techniques from map-making to help the system reason about visibility when trying to recognize objects.
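The map-making idea being borrowed is raycasting over an occupancy grid: cells a lidar beam passes through are known to be empty, the cell at the return is occupied, and cells beyond the return are unknown because they are occluded. The sketch below is a minimal 2D illustration of that general technique, not the authors' implementation; the grid layout and cell labels are assumptions.

```python
import numpy as np

def raycast_visibility(grid_size, sensor, hits):
    """Classify cells of a square grid as FREE (0), OCCUPIED (1), or
    UNKNOWN (-1) by marching along each lidar ray from the sensor to
    its return. Cells beyond a return stay UNKNOWN: they are occluded."""
    UNKNOWN, FREE, OCCUPIED = -1, 0, 1
    grid = np.full((grid_size, grid_size), UNKNOWN, dtype=int)
    sx, sy = sensor
    for hx, hy in hits:
        steps = max(abs(hx - sx), abs(hy - sy))
        for t in range(steps):  # cells strictly before the hit are free
            x = sx + round((hx - sx) * t / steps)
            y = sy + round((hy - sy) * t / steps)
            grid[x, y] = FREE
        grid[hx, hy] = OCCUPIED  # the return itself marks an occupied cell
    return grid
```

For a sensor at (0, 0) and a single return at (5, 0), cells between sensor and return come out free, the return cell occupied, and cells behind it remain unknown, which is exactly the free/occupied/occluded distinction the detector can exploit.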
When tested against a standard benchmark, the CMU method outperformed the previous top-performing technique, improving detection by 10.7% for cars, 5.3% for pedestrians, 7.4% for trucks, 18.4% for buses and 16.7% for trailers.
One reason previous systems may not have taken visibility into account is a concern about computation time. But Hu said his team found that was not a problem: their method takes just 24 milliseconds to run. (For comparison, each sweep of the lidar is 100 milliseconds.)
In addition to Hu and Ramanan, the research team included Jason Ziglar of Argo AI and David Held, assistant professor of robotics. The Argo AI Center supported this research.