Scientists at IBM Research, together with their colleagues at EPFL, have developed a scheme for training machine learning models on big data sets quickly. It can process a 30-gigabyte training dataset in less than one minute using a single graphics processing unit (GPU), a 10x speedup over existing methods for limited-memory training. The results, which efficiently utilize the full potential of the GPU, are being presented at the 2017 NIPS Conference in Long Beach, California.
Although specialized hardware devices such as GPUs have been gaining traction in many fields for accelerating compute-intensive workloads, it is difficult to extend this to very data-intensive workloads. To exploit the massive compute power of GPUs, the data needs to be stored inside the GPU memory before it can be accessed and processed. However, GPUs have a limited memory capacity (currently up to 16GB), so this is not practical for very large datasets.
The IBM team set out to create a technique that determines which smaller subset of the data is most important to the training algorithm at any given time. For most datasets of interest, the importance of each data point to the training algorithm is highly non-uniform and changes during the training process. Processing the data points in the right order enables faster learning.
The researchers developed a new, reusable component for training machine learning models on heterogeneous compute platforms: Duality-gap based Heterogeneous Learning (DuHL). Beyond GPUs, the scheme can be applied to other limited-memory accelerators (e.g. systems that use FPGAs instead of GPUs). It has many applications, including large data sets from social media and online marketing, which can be used to predict which ads to show users. Additional applications include finding patterns in telecom data and fraud detection.
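The core idea of duality-gap based selection can be illustrated with a small sketch. This is not the authors' GPU implementation; it is a toy example under assumed choices (ridge regression solved by coordinate descent, a simulated "device memory" budget of k columns), where each round only the coordinates with the largest per-coordinate duality gap are kept in the fast memory and updated:

```python
import numpy as np

# Illustrative sketch in the spirit of DuHL (assumptions: ridge regression,
# coordinate descent, k = simulated accelerator-memory budget). For ridge,
# the coordinate-wise duality gap works out to (grad_i)^2 / (2*lambda).
rng = np.random.default_rng(0)
n, d, k, lam = 200, 50, 10, 1.0
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)
v = X @ w - y                       # residual, which also plays the dual role

def coordinate_gaps(w, v):
    # grad_i = lam * w_i + x_i^T v; gap_i = grad_i^2 / (2 * lam)
    g = lam * w + X.T @ v
    return g**2 / (2 * lam)

for outer in range(200):
    gaps = coordinate_gaps(w, v)
    if gaps.sum() < 1e-10:          # duality gap certifies near-optimality
        break
    active = np.argsort(gaps)[-k:]  # swap the most important columns "in"
    for _ in range(5):              # cheap inner passes on the small subset
        for i in active:
            grad_i = lam * w[i] + X[:, i] @ v
            delta = -grad_i / (X[:, i] @ X[:, i] + lam)
            w[i] += delta
            v += delta * X[:, i]    # keep the residual consistent

# compare against the closed-form ridge solution
w_exact = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("total duality gap:", coordinate_gaps(w, v).sum())
print("max |w - w_exact|:", np.abs(w - w_exact).max())
```

Although only k of the d coordinates ever occupy the simulated device memory at once, the gap-driven swapping converges to the same solution as training on the full problem, which is the behavior DuHL exploits at GPU scale.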
IBM's goal is to offer DuHL as a service in the cloud.
C. Dünner, S. Forte, M. Takac, M. Jaggi. 2016. Primal-Dual Rates and Certificates. In Proceedings of the 33rd International Conference on Machine Learning – Volume 48 (ICML 2016).
C. Dünner, T. Parnell, M. Jaggi. 2017. Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems. In Advances in Neural Information Processing Systems 30 (NIPS 2017).