Chenxu WANG Yutong LU Zhiguang CHEN Junnan LI
Training deep learning (DL) networks is computationally intensive; training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, offer abundant computing and storage resources and efficient interconnects, so they can train DL networks better and faster. In this paper, we propose a method for highly efficient distributed training of DL networks. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first-layer models through shared memory. Third, we optimize parallel I/O by having each reader read data as contiguously as possible, avoiding the high overhead of discontinuous reads. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has significant performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) increases computing efficiency by about 20 times.
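The two-level synchronization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `local_average` and `hierarchical_average` are hypothetical, gradients are plain Python lists, and every node is assumed to host the same number of workers. Under that assumption, averaging inside each node first and then across nodes gives the same result as one flat average over all workers.

```python
# Illustrative sketch only: two-level (hierarchical) gradient averaging.
# The names below are hypothetical and do not come from the paper.

def local_average(grads):
    """Element-wise mean of a list of equally sized gradient vectors."""
    n = len(grads)
    return [sum(vals) / n for vals in zip(*grads)]

def hierarchical_average(nodes):
    """Level 1: average the workers inside each node (cheap, e.g. shared memory).
    Level 2: average the per-node results across nodes (e.g. over the network).
    Assumes every node hosts the same number of workers."""
    per_node = [local_average(worker_grads) for worker_grads in nodes]
    return local_average(per_node)

# Two nodes with two workers each; every gradient has three components.
nodes = [
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    [[5.0, 6.0, 7.0], [7.0, 8.0, 9.0]],
]
flat = [g for node in nodes for g in node]

result = hierarchical_average(nodes)
print(result)                         # [4.0, 5.0, 6.0]
print(result == local_average(flat))  # True
```

With unequal worker counts per node, the second level would need a weighted mean to stay equivalent to the flat average.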
Tao QIN Wei LI Chenxu WANG Xingjun ZHANG
With the ever-growing prevalence of Web 2.0, users can access information and resources easily and ubiquitously. Understanding the characteristics of users' complex behavior is increasingly important for efficient network management and security monitoring. In this paper, we develop a novel method to visualize and measure the characteristics of users' web communication behavior in large-scale networks. First, we employ active and passive monitoring methods to collect more than 20,000 IP addresses providing web services, which are divided into 12 types according to the content they provide (e.g., news, music, and movies); the IP address library is then established with elements of the form (servicetype, IPaddress). Because users' behavior is complex, in that they use multiple service types during any given time period, we propose the behavior spectrum to model these behavior characteristics in an easily understandable way. Second, two kinds of user behavior characteristics are analyzed: the characteristics at particular time instants and the dynamic changes across consecutive time points. We then employ Renyi cross entropy to classify users into groups, with the expectation that users in the same group have similar behavior profiles. Finally, we demonstrate the application of the behavior spectrum in profiling network traffic patterns and finding illegal users. The efficiency and correctness of the proposed methods are verified by experimental results using actual traffic traces collected from the Northwest Regional Center of China Education and Research Network (CERNET).
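As a rough illustration of comparing two users' behavior profiles, the sketch below normalizes per-service-type counts into a distribution and compares two such distributions with one common form of Renyi cross entropy, I_α(p, q) = 1/(1-α) · log2 Σ p_i^α q_i^(1-α). The abstract does not give the exact definition or the spectrum construction, so the formula choice, the function names, the four service types, and the counts are all assumptions made for illustration. With this form the value is 0 for identical distributions and negative otherwise when 0 < α < 1.

```python
import math

# Illustrative sketch only; names, service types and counts are hypothetical.

def behavior_spectrum(counts):
    """Normalize per-service-type activity counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]

def renyi_cross_entropy(p, q, alpha=0.5):
    """Assumed form: 1/(1-alpha) * log2(sum_i p_i^alpha * q_i^(1-alpha)).
    Returns -inf when the two distributions share no support."""
    s = sum((pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q))
    if s == 0.0:
        return float("-inf")
    return math.log2(s) / (1.0 - alpha)

# Counts over four made-up service types: news, music, movie, other.
user_a = behavior_spectrum([8, 1, 1, 0])
user_b = behavior_spectrum([7, 2, 1, 0])
user_c = behavior_spectrum([0, 1, 1, 8])

same = renyi_cross_entropy(user_a, user_a)
near = renyi_cross_entropy(user_a, user_b)
far = renyi_cross_entropy(user_a, user_c)
print(same, near, far)
```

Values closer to 0 indicate more similar profiles, so a grouping step could threshold or cluster on this score.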
Chenxu WANG Hideki KAWAGUCHI Kota WATANABE
This study discusses dedicated computers as a possible route to portable, low-cost, low-power high-performance computing technologies. In particular, a dataflow-architecture dedicated computer for the finite integration technique (FIT) for 2D magnetostatic field simulation is considered for use in industrial applications. The dataflow architecture circuit of the BiCG-Stab matrix solver for the FIT matrix calculation is designed in the very high-speed integrated circuit hardware description language (VHDL). The operation of the designed circuit is examined by VHDL logic circuit simulation.
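For readers unfamiliar with the solver named in this abstract, a software reference version of unpreconditioned BiCG-Stab might look like the following sketch. It is plain Python for illustration, not the VHDL dataflow circuit described in the study. The helper names and the 2x2 test system are hypothetical, the FIT matrix itself is not reproduced, and breakdown cases (zero denominators) are not handled.

```python
# Software reference sketch of unpreconditioned BiCG-Stab (pure Python).
# Helper names and the test system are hypothetical; breakdown is not handled.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y element-wise."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def bicgstab(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = list(b)                # residual b - A x for the initial guess x = 0
    if dot(r, r) ** 0.5 <= tol:
        return x
    r_hat = list(r)            # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)   # p = r + beta*(p - omega*v)
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)                  # s = r - alpha*v
        if dot(s, s) ** 0.5 <= tol:             # converged after the half step
            return axpy(alpha, p, x)
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = axpy(omega, s, axpy(alpha, p, x))   # x = x + alpha*p + omega*s
        r = axpy(-omega, t, s)                  # r = s - omega*t
        if dot(r, r) ** 0.5 <= tol:
            return x
        rho = rho_new
    return x

# Small made-up system with exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = bicgstab(A, b)
print(x)
```

Each iteration needs two matrix-vector products and a handful of dot products and vector updates, which is the kind of regular arithmetic a dataflow circuit can pipeline.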