Xianqiang BAO Nong XIAO Yutong LU Zhiguang CHEN
NoSQL systems have become vital components for delivering big data services due to their high horizontal scalability. However, existing NoSQL systems rely on experienced administrators to configure and tune a wide range of configurable parameters for optimized performance. In this work, we present a configuration management framework for NoSQL systems, called xConfig. With xConfig, users can first identify performance-sensitive parameters and capture the tuned parameters for different workloads as configuration policies. Then, based on the tuned policies, xConfig can be implemented as the corresponding configuration optimization system for a specific NoSQL system. It can also be used to analyze the range of configurable parameters that may affect the runtime performance of NoSQL systems. We implement a prototype called HConfig based on HBase; its parameter tuning strategies generate tuned policies and enable HBase to run much more efficiently on both individual worker nodes and the entire cluster. The evaluation results under massive-write workloads show that HBase with the write-intensive policies significantly outperforms both the default configuration and several existing configurations in throughput.
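As a rough illustration of what a configuration policy could look like in practice, the sketch below encodes a hypothetical write-intensive policy as a set of real HBase parameter names and renders it as an hbase-site.xml overlay. The chosen values are illustrative assumptions for a heavy-write workload, not the tuned values produced by HConfig, and the rendering helper is hypothetical.

```python
# Minimal sketch: represent a "write-intensive" configuration policy and
# emit it as an hbase-site.xml fragment. Parameter names are real HBase
# settings; the values are illustrative placeholders, not HConfig's output.
from xml.sax.saxutils import escape

WRITE_INTENSIVE_POLICY = {
    "hbase.regionserver.handler.count": "100",          # more RPC handlers for concurrent writers
    "hbase.hregion.memstore.flush.size": "268435456",   # 256 MB memstore flush threshold
    "hbase.regionserver.global.memstore.size": "0.45",  # larger share of heap for write buffers
    "hbase.hstore.blockingStoreFiles": "20",            # delay write stalls from compaction pressure
    "hfile.block.cache.size": "0.2",                     # shrink read cache in favor of memstores
}

def render_hbase_site(policy: dict) -> str:
    """Render a policy as an hbase-site.xml overlay to distribute to worker nodes."""
    props = "\n".join(
        f"  <property>\n    <name>{escape(k)}</name>\n"
        f"    <value>{escape(v)}</value>\n  </property>"
        for k, v in policy.items()
    )
    return f"<configuration>\n{props}\n</configuration>"

if __name__ == "__main__":
    print(render_hbase_site(WRITE_INTENSIVE_POLICY))
```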
Chenxu WANG Yutong LU Zhiguang CHEN Junnan LI
Training deep learning (DL) models is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, are equipped with abundant computing and storage resources and efficient interconnects, which allow DL networks to be trained better and faster. In this paper, we propose a method for training DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme, which reduces communication overhead by transmitting the parameters of the first-level models through shared memory. Third, we optimize parallel I/O by making each reader read data as contiguously as possible to avoid the high overhead of discontinuous reads. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach offers substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) increases computing efficiency by about 20 times.
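To illustrate the general idea of hierarchical synchronous gradient aggregation, the sketch below performs a two-level reduction with PyTorch's torch.distributed: gradients are first reduced onto a per-node leader rank, then averaged across node leaders, and finally broadcast back within each node. This is a generic reconstruction under assumed process groups, not the paper's HSGD implementation (which additionally exchanges first-level parameters through shared memory).

```python
# Minimal sketch of two-level (hierarchical) synchronous gradient aggregation.
# Assumes dist.init_process_group(...) has already been called with one
# process per GPU and `gpus_per_node` processes on each node.
import torch
import torch.distributed as dist

def build_groups(world_size: int, gpus_per_node: int):
    """Create one intra-node group per node and one inter-node group of node leaders."""
    intra_groups = []
    for node in range(world_size // gpus_per_node):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        intra_groups.append(dist.new_group(ranks=ranks))
    leader_ranks = list(range(0, world_size, gpus_per_node))
    inter_group = dist.new_group(ranks=leader_ranks)
    return intra_groups, inter_group

def hierarchical_allreduce(grad: torch.Tensor, rank: int, gpus_per_node: int,
                           intra_groups, inter_group) -> torch.Tensor:
    node = rank // gpus_per_node
    leader = node * gpus_per_node
    # Level 1: sum gradients onto the node leader over fast intra-node links.
    dist.reduce(grad, dst=leader, group=intra_groups[node])
    if rank == leader:
        # Level 2: leaders sum across nodes, then average over all workers.
        dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=inter_group)
        grad /= dist.get_world_size()
    # Level 1 again: leader broadcasts the averaged gradient within its node.
    dist.broadcast(grad, src=leader, group=intra_groups[node])
    return grad
```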