BMW Group publishes SORDI, largest open-source dataset for super-efficient AI applications in production
The BMW Group is publishing the world’s largest dataset of its kind to significantly streamline and accelerate the training of artificial intelligence in production. The synthesized AI dataset, known as SORDI (Synthetic Object Recognition Dataset for Industries), consists of more than 800,000 photorealistic images. These are divided into 80 categories of production resources, from pallets and pallet cages to forklifts, and include objects of particular relevance to the core technologies of automotive engineering and logistics.
By publishing SORDI, the BMW Group together with its partners Microsoft, NVIDIA and idealworks is making available the world’s largest reference dataset for artificial intelligence in the field of manufacturing.
The visual data is of particularly high quality, and the integrated digital labels enable basic image processing tasks to be carried out, such as classification, object detection or segmentation for relevant areas of production in general.
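To illustrate how one set of labels can serve several of these tasks, here is a minimal Python sketch. It assumes a hypothetical annotation layout: the field names `image`, `objects`, `category` and `bbox`, the file names and the box coordinates are all invented for illustration and are not SORDI’s documented schema.

```python
import json
from collections import Counter

# Hypothetical annotation records for two synthetic images. The field names
# ("image", "objects", "category", "bbox") and all values are illustrative
# assumptions, not SORDI's documented label format.
SAMPLE_LABELS = json.loads("""
[
  {"image": "scene_0001.png",
   "objects": [
     {"category": "pallet", "bbox": [10, 20, 110, 80]},
     {"category": "forklift", "bbox": [150, 40, 300, 220]}
   ]},
  {"image": "scene_0002.png",
   "objects": [
     {"category": "pallet", "bbox": [5, 5, 95, 65]}
   ]}
]
""")


def classification_targets(records):
    """Map each image to the sorted list of categories it contains."""
    return {
        r["image"]: sorted({o["category"] for o in r["objects"]})
        for r in records
    }


def detection_targets(records):
    """Flatten records into (image, category, box area in pixels) tuples.

    Boxes are assumed to be [x_min, y_min, x_max, y_max].
    """
    targets = []
    for r in records:
        for o in r["objects"]:
            x_min, y_min, x_max, y_max = o["bbox"]
            area = (x_max - x_min) * (y_max - y_min)
            targets.append((r["image"], o["category"], area))
    return targets


def category_counts(records):
    """Count labeled objects per category across all records."""
    return Counter(o["category"] for r in records for o in r["objects"])
```

In this sketch the same per-object records yield image-level targets for classification and box-level targets for object detection; segmentation would additionally need per-pixel masks, which are left out here.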
“The BMW Group has been using artificial intelligence since 2019. AI has already been utilized in various quality assurance applications in production at the plants. SORDI, the new synthetic dataset, makes AI models much faster to train and AI considerably more cost-efficient in production,” said Michele Melchiorre, Senior Vice President of BMW Group Production System, Planning, Tool and Plant Engineering.
To create the synthesized AI training data automatically rather than by hand, the simulated environment for robotics, the digital twin of the production system and the AI training environment were all fused within NVIDIA Omniverse. NVIDIA Omniverse is a scalable, multi-GPU real-time reference development platform for 3D simulation and design collaboration, based on Pixar’s Universal Scene Description and NVIDIA RTX technology.
The fundamental representation of assets in Omniverse is Pixar’s open-source Universal Scene Description (USD), a powerful scene representation and interchange framework that enables complex property inheritance, instancing, layering, lazy loading, and a wide variety of other key features.
The rendering pipeline from the BMW Tech Office in Munich allows any number of images, including their labels, to be synthesized in HD quality that is photorealistic enough for them to be used in the creation of highly robust AI models.
SORDI can be utilized by IT professionals to develop and tailor AI solutions for manufacturing, and by production employees to maintain mature AI systems for validation purposes ready for the start of production.
The dataset is freely available to software developers, and its publication represents the next targeted step in the BMW Group’s systematic expansion of activities to democratize artificial intelligence. The publications of no-code AI and SORDI complement each other: on the one hand, the BMW Labelling Tool Lite and published AI training tools explicitly allow users to use AI intuitively, even if they lack sound IT expertise. On the other, SORDI’s synthetic data significantly accelerates and simplifies the training of AI models for production applications.