The Toyota Research Institute (TRI) announced the acceptance of six research papers in the field of machine learning at the International Conference on Computer Vision (ICCV). The research advances understanding across various tasks crucial for robotic perception, including semantic segmentation, 3D object detection and multi-object tracking.
Over the last six years, TRI’s researchers have made significant strides in robotics, automated driving and materials science in large part due to machine learning—the application of computer algorithms that constantly improve with experience and data.
“Machine learning is the foundation of our research,” said Dr. Gill Pratt, CEO of TRI. “We are working to create scientific breakthroughs in the discipline of machine learning itself and then apply those breakthroughs to accelerate discoveries in robotics, automated driving, and battery testing and development.”
As ICCV began, TRI shared six papers demonstrating its robust research in machine learning, including geometric deep learning for 3D vision, self-supervised learning and simulation-to-real (“sim-to-real”) transfer.
“Within the field of machine learning, scalable supervision is our focus,” said Adrien Gaidon, head of TRI’s Machine Learning team. “It is impossible to manually label everything you need at Toyota’s scale, yet this is the state-of-the-art approach, especially for Deep Learning and Computer Vision. Thankfully, we can leverage Toyota’s domain expertise in vehicles, robots or batteries to invent alternative forms of scalable supervision, whether via simulation or self-supervised learning from raw data. This approach can boost performance in a wide array of tasks important for automated cars to be safer everywhere anytime, robots to learn faster and battery development to speed up lengthy testing cycles.”
In the six papers accepted at ICCV, TRI researchers report several key findings. Notably, they show that geometric self-supervised learning significantly improves sim-to-real transfer for scene understanding. The resulting unsupervised domain adaptation algorithm enables recognizing real-world categories without requiring any expensive manual real-world labels.
In addition, TRI’s research on multi-object tracking reveals that synthetic data could endow machines with fundamental human cognitive abilities, like object permanence, that are historically hard for machine learning models but second nature for humans. This new development increases the robustness of computer vision algorithms, making them more aligned with people’s visual common sense.
Finally, TRI’s research on pseudo-lidar shows that large-scale self-supervised pre-training considerably boosts the performance of image-based 3D object detectors. The proposed geometric pre-training enables training powerful 3D deep learning models from limited 3D labels, which are expensive or sometimes impossible to obtain from images alone.