by Bill Cooke
VisLab 3D modeled view (foreground), bird’s-eye view (right) and photograph of the scenario (back).
As inventors across the world work to make autonomous driving a reality, one of the most basic problems is enabling the vehicle to perceive its surroundings. Google’s vehicles rely on a lidar (light detection and ranging) system. Dr. Alberto Broggi of the University of Parma, whose spinoff company VisLab is based in Northern Italy, believes 3D computer vision is an affordable and aesthetically acceptable way to capture much of the same data.
A pioneer of machine vision applied to driverless cars and unmanned vehicles, Dr. Broggi is the principal investigator of multiple projects involving autonomous vehicles, such as the ARGO prototype vehicle, the TerraMax entries in the DARPA Grand Challenge and Urban Challenge, and BRAiVE. Under his leadership, VisLab organized the first intercontinental driverless trip in history: VIAC, the VisLab Intercontinental Autonomous Challenge. VisLab conducts basic and applied research, developing machine vision algorithms and intelligent systems for a range of applications, primarily in the automotive field.
The DARPA Grand Challenge for autonomous vehicles in 2005 dealt with an extreme environment (the desert of the American Southwest), but only static objects were encountered. When two vehicles interacted, the slower vehicle was stopped so it could be treated as a static object.
The DARPA Urban Challenge of 2007 dealt with on-road scenarios in which the autonomous vehicles had to interact with moving traffic. Traffic was simulated by 50 professional drivers who behaved rationally and followed the appropriate rules.
Their behavior allowed driverless vehicles to be tested alongside vehicles whose trajectories were predictable by common-sense rules, Dr. Broggi said, noting that in recent history the field of autonomous driving has produced three representative cases:
Case 1: Fast autonomous driving in a static and predictable environment. An example would be a vehicle doing a timed lap on an empty racetrack.
Case 2: Autonomous driving in a complex known environment. Vehicles are programmed to drive a specific set of routes and are equipped with detailed maps, GPS, and inertial systems. Google’s self-driving cars are an example: “The Google system has really precise maps,” he notes.
Case 3: Autonomous driving in extreme and unknown environments. In 2010 VisLab undertook a drive from Parma, Italy, to Shanghai, China, using four autonomous test vehicles and a caravan of support vehicles. “It focused on perception, control of the vehicle, trajectory planning and route planning. We continuously improved the system during the trip,” Broggi said. The trip took three months and produced 25 terabytes of data.
VisLab has a more affordable solution for 3D perception that is more easily integrated into the vehicle’s design (Parma is in Northern Italy, and they take aesthetics seriously). “Our approach is based upon low cost and highly integrated sensors,” according to Dr. Broggi. He admits the VisLab solution achieves lower performance than the lidar system, but he expects performance to improve as the image-processing techniques they are developing mature.
Broggi notes that “A lidar system (Velodyne HDL-64E S2) spinning at 10 Hz can process 2.5 million distance estimations per second...the resolution is very good up to 60m and still effective up to 120m...the sensor needs to be mounted on the roof of the car, it is not something that can be avoided...At this point, the cost is very expensive, perhaps twice the cost of the car.”
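The figures Broggi quotes can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted above plus the laser count implied by the sensor’s name (the “64” in HDL-64E); it is an illustrative back-of-the-envelope calculation, not a spec sheet:

```python
# Back-of-the-envelope check on the quoted Velodyne HDL-64E S2 figures.
POINTS_PER_SEC = 2.5e6   # distance estimations per second, as quoted
SPIN_RATE_HZ = 10        # spin rate, as quoted
NUM_LASERS = 64          # the "64" in HDL-64E

points_per_rev = POINTS_PER_SEC / SPIN_RATE_HZ            # 250,000 per revolution
points_per_laser_per_rev = points_per_rev / NUM_LASERS    # ~3,906 per laser
azimuth_resolution_deg = 360 / points_per_laser_per_rev   # angular spacing of returns

print(round(azimuth_resolution_deg, 3))  # 0.092
```

A roughly 0.09-degree azimuth spacing across 64 vertically fanned lasers is what gives the dense 360-degree point cloud the roof mounting enables.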
In June 2012 USA Today reported that a prototype lidar system such as Google’s cost $70,000, but German supplier Ibeo planned to offer systems for $250 per vehicle in 2014 at automotive volumes. The $250 system scans only about 100 degrees, while the Velodyne scans 360 degrees around the vehicle.
On the other hand, two stereo cameras (1024 x 768 pixels) can process 5 million distance estimations per second at 10 Hz. “As an indirect system, the stereo cameras have more noise and a lower accuracy at long distances...For almost every single pixel you have color and distance...if you compare the two streams, one with color and another with distance, you can get the 3D image... the system provides very good processing up to 50m away,” according to Broggi.
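Broggi’s point about noise and lower accuracy at long distances follows directly from stereo triangulation: depth error grows with the square of distance. A minimal sketch, with an assumed focal length and camera baseline (illustrative values, not VisLab’s actual calibration):

```python
# Stereo triangulation sketch. The focal length and baseline are
# illustrative assumptions, not VisLab's actual camera calibration.
FOCAL_PX = 1000.0    # focal length in pixels (assumed)
BASELINE_M = 0.3     # separation between the two cameras in metres (assumed)

def depth_from_disparity(disparity_px: float) -> float:
    """Depth of a matched pixel pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return FOCAL_PX * BASELINE_M / disparity_px

def depth_error(depth_m: float, disparity_error_px: float = 1.0) -> float:
    """Approximate depth uncertainty: dZ ~ Z**2 / (f * B) * dd.
    Error grows with the *square* of distance, which is why stereo
    accuracy falls off beyond ~50 m while lidar stays effective."""
    return depth_m ** 2 / (FOCAL_PX * BASELINE_M) * disparity_error_px

print(depth_from_disparity(6.0))    # 50.0 m -- near the quoted useful range
print(round(depth_error(50.0), 1))  # ~8.3 m per pixel of matching error
print(round(depth_error(10.0), 1))  # ~0.3 m at close range
```

The contrast between the 10 m and 50 m error figures is the quadratic fall-off in action: a one-pixel matching error that is negligible nearby becomes several metres of uncertainty at the edge of the working range.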
The images serve two main tasks: they can be used to calculate the position and velocity of each point, and stereo images are easier to classify than lidar readings, with patterns that are easier to recognize as well.
Once we create our 3D map, we label each point. If the point belongs to the road, we treat it differently than if it belongs to an obstacle. The next step is to take all of the obstacles together, create objects, and then track the objects. You can even track very thin objects like poles on the side of the road. Detection, labeling, and then tracking. …All of the processing is currently done on a top-of-the-line PC, but we’re moving the processing to an FPGA (field-programmable gate array), and then the PC will provide just the high-level control.
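The detect/label/track sequence described above can be sketched at toy scale. The point format, height threshold, and gap-based clustering below are assumptions for illustration, not VisLab’s actual algorithms:

```python
# Toy sketch of the detect / label / track pipeline described above.
# Threshold and clustering rule are illustrative assumptions.

GROUND_MAX_HEIGHT_M = 0.2  # points at or below this height count as "road"

def label_points(points):
    """Split 3D points (x, y, z) into road points and obstacle points."""
    road = [p for p in points if p[2] <= GROUND_MAX_HEIGHT_M]
    obstacles = [p for p in points if p[2] > GROUND_MAX_HEIGHT_M]
    return road, obstacles

def cluster_obstacles(obstacles, gap_m=0.5):
    """Group obstacle points into objects: a point within gap_m (along x)
    of the previous point joins the same cluster."""
    clusters = []
    for p in sorted(obstacles):
        if clusters and p[0] - clusters[-1][-1][0] <= gap_m:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(c[i] for c in cluster) / n for i in range(3))

# Detection -> labeling -> objects; tracking would then associate each
# centroid with the nearest centroid from the previous frame.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1),   # road surface
          (5.0, 1.0, 1.2), (5.2, 1.0, 1.5),   # one thin object (e.g. a pole)
          (10.0, -1.0, 1.0)]                  # a second object
road, obstacles = label_points(points)
objects = [centroid(c) for c in cluster_obstacles(obstacles)]
print(len(road), len(objects))  # 2 2
```

Even this toy version shows why thin objects like poles are trackable: two nearby high points survive the road filter and cluster into a stable object whose centroid can be followed frame to frame.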
Challenges for 3D vision include nighttime driving and difficult lighting conditions. The field of view and measuring precision of a stereo system are lower than those of a lidar, which can create resolution challenges, but results so far are promising. Advantages include lower cost, cleaner aesthetic integration, and improved robustness, since there are no moving parts. There is also no interference when many systems operate close together.—Alberto Broggi