
IBM scientists demonstrate 10x faster large-scale machine learning using GPUs

Scientists at IBM Research, together with their colleagues at EPFL, have developed a scheme for quickly training machine learning models on big data sets. It can process a 30 Gigabyte training dataset in less than one minute using a single graphics processing unit (GPU), a 10x speedup over existing methods for limited-memory training. The scheme, which efficiently utilizes the full potential of the GPU, is being presented at the 2017 NIPS Conference in Long Beach, California.

Although specialized hardware devices such as GPUs have been gaining traction in many fields for accelerating compute-intensive workloads, it’s difficult to extend this to very data-intensive workloads. To exploit the massive compute power of GPUs, data needs to be stored inside the GPU memory in order to access and process it. However, GPUs have a limited memory capacity (currently up to 16GB) so this is not practical for very large data.

The IBM team set out to create a technique that determines which smaller part of the data is most important to the training algorithm at any given time. For most datasets of interest, the importance of each data-point to the training algorithm is highly non-uniform, and also changes during the training process. Processing the data-points in the right order enables faster learning.

The researchers developed a new, reusable component for training machine learning models on heterogeneous compute platforms: Duality-gap based Heterogeneous Learning (DuHL). In addition to an application involving GPUs, the scheme can be applied to other limited-memory accelerators (e.g., systems that use FPGAs instead of GPUs). It has many applications, including training on large data sets from social media and online marketing, which can be used to predict which ads to show users. Additional applications include finding patterns in telecom data and fraud detection.
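To make the idea of ranking data points by importance concrete, here is a minimal sketch in Python. It is not IBM's implementation. It assumes a hinge-loss SVM in its standard dual form, where each data point's contribution to the duality gap serves as its importance score, and it keeps only the highest-scoring points in a fixed-size working set that stands in for GPU memory. The function names, the toy data, and the `capacity` parameter are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def primal_weights(X, y, alpha, lam):
    """Map dual variables to primal weights: w = 1/(lam*n) * sum_i alpha_i*y_i*x_i."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for xi, yi, ai in zip(X, y, alpha):
        for j in range(d):
            w[j] += ai * yi * xi[j] / (lam * n)
    return w


def pointwise_gaps(X, y, alpha, lam):
    """Per-point duality-gap contribution for a hinge-loss SVM (assumed model).

    For alpha_i in [0, 1] each term is non-negative, and the mean of the
    terms equals the total primal-dual gap.
    """
    w = primal_weights(X, y, alpha, lam)
    gaps = []
    for xi, yi, ai in zip(X, y, alpha):
        margin = yi * dot(xi, w)
        gaps.append(max(0.0, 1.0 - margin) - ai * (1.0 - margin))
    return gaps


def select_working_set(gaps, capacity):
    """Indices of the `capacity` points with the largest gap contribution."""
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return order[:capacity]


# Toy data (hypothetical): four 2-D points, two per class.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1, 1, -1, -1]
alpha = [1.0, 0.0, 0.5, 0.0]  # current dual variables, each in [0, 1]
lam = 1.0                     # regularization strength

gaps = pointwise_gaps(X, y, alpha, lam)
chosen = select_working_set(gaps, capacity=2)
```

In this toy run, the two points whose dual variables are still zero have the largest gap contributions, so they are the ones selected for the working set. In a full training loop the scores would be recomputed as the model changes, which matches the observation that the importance of each data point shifts during training.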

DuHL in action, training large-scale Support Vector Machines on an extended, 30GB version of the ImageNet database using an NVIDIA Quadro M4000 GPU with 8GB of memory. The scheme that uses sequential batching actually performs worse than the CPU alone, whereas the new approach using DuHL achieves a 10x speed-up over the CPU. Source: IBM.

IBM’s goal is to offer DuHL as a service in the cloud.


