An efficient algorithm for data parallelism based on stochastic optimization

Khalid Abdulaziz Alnowibet, The American University in Cairo (AUC)
Imran Khan
Karam M. Sallam
Ali Wagdy Mohamed, The American University in Cairo (AUC)

Abstract

Deep neural network models can achieve greater performance in numerous machine learning tasks by raising the depth of the model and the amount of training data samples. However, these essential procedures will proportionally raise the cost of training deep neural network models. Accelerating the training process of deep neural network models in a distributed computing environment has become the most often utilized strategy for developers in order to better cope with a huge quantity of training overhead. The current deep neural network model is the stochastic gradient descent (SGD) technique. It is one of the most widely used training techniques in network models, although it is prone to gradient obsolescence during parallelization, which impacts the overall convergence. The majority of present solutions are geared at high-performance nodes with minor performance changes. Few studies have taken into account the cluster environment in high-performance computing (HPC), where the performance of each node varies substantially. A dynamic batch size stochastic gradient descent approach based on performance-aware technology is suggested to address the aforesaid difficulties (DBS-SGD). By assessing the processing capacity of each node, this method dynamically allocates the minibatch of each node, guaranteeing that the update time of each iteration between nodes is essentially the same, lowering the average gradient of the node. The suggested approach may successfully solve the asynchronous update strategy’s gradient outdated problem. The Mnist and cifar10 are two widely used image classification benchmarks, that are employed as training data sets, and the approach is compared with the asynchronous stochastic gradient descent (ASGD) technique. The experimental findings demonstrate that the proposed algorithm has better performance as compared with existing algorithms.