Daisuke HIBINO Tomoharu SHIBUYA
Distributed computing is a powerful solution for computational tasks that require massive datasets. Lagrange coded computing (LCC), proposed by Yu et al. [15], realizes private and secure distributed computing in the presence of stragglers, malicious workers, and colluding workers by using an encoding polynomial. Since the encoding polynomial depends on the dataset, it must be updated whenever new data arrive. It is therefore necessary to employ an efficient algorithm for constructing the encoding polynomial. In this paper, we propose Newton coded computing (NCC), which constructs the encoding polynomial by Newton interpolation. Let K, L, and T be the number of data items, the length of each data item, and the number of colluding workers, respectively. The computational complexity of constructing an encoding polynomial is then improved from O(L(K+T)log^2(K+T)log log(K+T)) for LCC to O(L(K+T)log(K+T)) for the proposed method. Furthermore, the computational complexity of updating the encoding polynomial is improved from O(L(K+T)log^2(K+T)log log(K+T)) for LCC to O(L) for the proposed method.
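The abstract does not reproduce the construction itself; as a hedged illustration of why Newton interpolation supports cheap incremental construction, the following minimal divided-differences sketch works over the reals (the actual LCC/NCC encodings operate over a finite field with random padding for privacy, and the paper's exact update rule may differ). Adding one interpolation point extends the Newton form without recomputing earlier coefficients.

```python
# Illustrative sketch (not the authors' NCC construction): incremental Newton
# interpolation via divided differences. Adding a new point (x_new, y_new)
# extends the Newton-form coefficients without recomputing earlier ones.

class NewtonInterpolator:
    def __init__(self):
        self.xs = []      # interpolation points x_0, x_1, ...
        self.coeffs = []  # Newton coefficients c_0, c_1, ... (divided differences)
        self.diag = []    # last anti-diagonal of the divided-difference table

    def add_point(self, x_new, y_new):
        """Incorporate one new point; costs O(K) for K existing points."""
        new_diag = [y_new]
        # Extend the divided-difference diagonal from order 1 up to order K.
        for j, prev in enumerate(self.diag):
            denom = x_new - self.xs[len(self.xs) - 1 - j]
            new_diag.append((new_diag[-1] - prev) / denom)
        self.xs.append(x_new)
        self.diag = new_diag
        self.coeffs.append(new_diag[-1])

    def evaluate(self, x):
        """Evaluate the Newton form with Horner-like nesting."""
        result = 0.0
        for xi, ci in zip(reversed(self.xs), reversed(self.coeffs)):
            result = result * (x - xi) + ci
        return result

# usage: interpolate y = x^2 at 0, 1, 2 and then add a fourth point cheaply
itp = NewtonInterpolator()
for x in (0.0, 1.0, 2.0):
    itp.add_point(x, x * x)
itp.add_point(3.0, 9.0)
print(itp.evaluate(1.5))  # 2.25
```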
Akio KAWABATA Bijoy CHAND CHATTERJEE Eiji OKI
When mission-critical applications are provided over a network, high availability is required in addition to low delay. This paper proposes a multi-homing network design model, named MHND, that achieves low delay, high availability, and guaranteed ordering of events. MHND maintains the order of event occurrence in a multi-homing configuration using conservative synchronization. We formulate MHND as an integer linear programming problem that minimizes delay, and we prove that the distributed server allocation problem with MHND is NP-complete. Numerical results indicate that, as the multi-homing number, which is the number of servers to which each user belongs, increases, availability increases at the cost of increased delay. Notably, a multi-homing number of two or more achieves approximately an order of magnitude higher availability than conventional single homing, at the expense of a delay increase of up to two times. MHND enables flexible network design based on the delay acceptable to the service and the required availability.
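As a hedged sketch of the conservative synchronization idea the abstract relies on (not the MHND formulation itself, and with names chosen here for illustration), a server can merge timestamped event streams from several FIFO channels and deliver an event only once every channel has a pending event, which guarantees delivery in event-occurrence order.

```python
from collections import deque

# Illustrative conservative-synchronization merger: assumes each channel is
# FIFO and delivers its events with nondecreasing timestamps. In practice,
# null messages or timeouts keep a silent channel from blocking progress.

class ConservativeMerger:
    def __init__(self, channel_ids):
        self.queues = {cid: deque() for cid in channel_ids}

    def receive(self, channel_id, timestamp, event):
        self.queues[channel_id].append((timestamp, event))

    def deliverable(self):
        """Pop and return events that are now safe to deliver, in timestamp order."""
        delivered = []
        while all(self.queues[c] for c in self.queues):
            # Every channel has a pending event, so the globally smallest
            # timestamp can no longer be preceded by a late arrival.
            cid = min(self.queues, key=lambda c: self.queues[c][0][0])
            delivered.append(self.queues[cid].popleft())
        return delivered
```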
Marcus WALLDEN Stefano MARKIDIS Masao OKITA Fumihiko INO
We propose a novel compositing pipeline and a dynamic load balancing technique for volume rendering that utilize a two-layered group structure to achieve effective and scalable load balancing. The technique enables each process to render data from non-contiguous regions of the volume with minimal impact on the total render time. We demonstrate the effectiveness of the proposed technique through a set of experiments on a modern GPU cluster. The experiments show that the technique yields up to 35.7% lower worst-case memory usage than a dynamic k-d tree load balancing technique, while achieving similar or higher render performance. The proposed technique also lowered the amount of data transferred during the load balancing stage by up to 72.2%. The technique has the potential to be used in many scenarios where other dynamic load balancing techniques have proved inadequate, such as large-scale visualization.
Yiqiang SHENG Jinlin WANG Haojiang DENG Chaopeng LI
In this paper, we propose a novel architecture for a deep learning system, named the k-degree layer-wise network, to realize efficient geo-distributed computing between the Cloud and the Internet of Things (IoT). Geo-distributed computing extends the Cloud to the geographical edge of the network, in the neighborhood of IoT devices. The basic ideas of the proposal are a k-degree constraint and a layer-wise constraint. The k-degree constraint requires that the degree of each vertex on the h-th layer be exactly k(h), extending existing deep belief networks and controlling the communication cost. The layer-wise constraint requires that the layer-wise degrees be monotonically decreasing toward deeper layers, gradually reducing the dimension of the data. We prove that the k-degree layer-wise network is sparse, whereas a typical deep neural network is dense. In an evaluation on the M-distributed MNIST database, the proposal outperforms a state-of-the-art model in terms of communication cost and learning time, with scalability.
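A minimal sketch of one way to realize the two constraints, under the assumed interpretation that k(h) is the in-degree of each unit on layer h (the paper's exact definition may differ): build per-layer sparse connectivity masks whose degrees shrink layer by layer.

```python
import numpy as np

# Assumed interpretation of the k-degree, layer-wise constraints: each unit of
# the next layer draws exactly k[h] incoming connections, and k[h] is
# monotonically decreasing, yielding increasingly sparse weight masks.

def build_layerwise_masks(layer_sizes, k, seed=0):
    """layer_sizes: units per layer; k[h]: in-degree for units on layer h+1."""
    assert all(k[h] >= k[h + 1] for h in range(len(k) - 1)), "degrees must not increase"
    rng = np.random.default_rng(seed)
    masks = []
    for h in range(len(layer_sizes) - 1):
        n_in, n_out = layer_sizes[h], layer_sizes[h + 1]
        mask = np.zeros((n_in, n_out), dtype=np.float32)
        for j in range(n_out):
            # choose exactly k[h] distinct inputs for unit j of the next layer
            mask[rng.choice(n_in, size=k[h], replace=False), j] = 1.0
        masks.append(mask)
    return masks

# Example: a 784-input network whose layer-wise in-degrees shrink 32 -> 16 -> 8.
masks = build_layerwise_masks([784, 256, 64, 10], k=[32, 16, 8])
print([f"{m.mean():.3f}" for m in masks])  # density of each weight mask
```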
Masatoshi KAWARASAKI Hyuma WATANABE
MapReduce and its open-source implementation Hadoop are now widely deployed for big data analysis. As MapReduce runs over a cluster of many machines, data transfer often becomes a bottleneck in job processing. In this paper, we explore the influence of data transfer on job processing performance and analyze the mechanism of job performance deterioration caused by data-transfer-induced congestion at disk I/O and/or network I/O. Based on this analysis, we extend Hadoop's Heartbeat messages to carry the real-time system status of each machine, such as disk I/O and link usage rates. This enhancement makes Hadoop's scheduler aware of each machine's workload so that it can make more accurate scheduling decisions. Experiments were conducted to evaluate the effectiveness of the enhanced scheduling methods, and discussions are provided comparing the several proposed scheduling policies.
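A hedged sketch of the idea, not Hadoop's actual Heartbeat API: the payload is extended with per-machine disk-I/O and link-usage rates (the field names below are hypothetical), and the scheduler avoids nodes whose I/O is already saturated.

```python
from dataclasses import dataclass

# Hypothetical heartbeat payload with load-awareness fields added; the real
# Hadoop message format and scheduler interface are not reproduced here.

@dataclass
class Heartbeat:
    node_id: str
    free_map_slots: int
    disk_io_util: float   # 0.0 - 1.0, hypothetical added field
    link_util: float      # 0.0 - 1.0, hypothetical added field

def pick_node(heartbeats, io_threshold=0.8):
    """Prefer nodes with free slots whose disk and link utilization are low."""
    candidates = [hb for hb in heartbeats
                  if hb.free_map_slots > 0
                  and hb.disk_io_util < io_threshold
                  and hb.link_util < io_threshold]
    if not candidates:
        return None  # fall back to default scheduling
    return min(candidates, key=lambda hb: max(hb.disk_io_util, hb.link_util))
```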
Jiang LI Yusuke ATSUMARI Hiromasa KUBO Yuichi OGISHIMA Satoru YOKOTA Hakaru TAMUKOH Masatoshi SEKINE
A processing system with multiple field-programmable gate array (FPGA) cards is described. Each FPGA card interconnects with others through six I/O terminals (up, down, left, right, front, and back). The communication network among the FPGAs is scalable according to the user's design. When the system runs multi-dimensional applications, transmission efficiency among the FPGAs is improved by letting the user adjust the dimensionality and network topology for each application. We also provide a fast and flexible circuit configuration method for the FPGAs of a multi-dimensional FPGA array. To demonstrate the effectiveness of the proposed method, we assess the performance and power consumption of a circuit that solves 3D Poisson equations using the finite difference method.
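For reference, the computation mapped onto the FPGA array is the standard 7-point finite-difference stencil for the 3D Poisson equation; the sketch below is a plain software version of one Jacobi sweep (the hardware circuit in the paper evaluates the same update).

```python
import numpy as np

def jacobi_sweep(u, f, h):
    """One Jacobi update for the 3D Poisson equation laplacian(u) = f,
    discretized with the 7-point stencil on a grid of spacing h."""
    new = u.copy()
    new[1:-1, 1:-1, 1:-1] = (
        u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
        u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
        u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:] -
        h * h * f[1:-1, 1:-1, 1:-1]
    ) / 6.0
    return new
```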
Tomohiro WARASHINA Kazuo AOYAMA Hiroshi SAWADA Takashi HATTORI
This paper presents an efficient method using Hadoop MapReduce for constructing a K-nearest neighbor graph (K-NNG) from a large-scale data set. The K-NNG is used as a data structure by data analysis techniques in various applications, and applying these techniques to large-scale data sets requires an efficient K-NNG construction method. We focus on NN-Descent, a recently proposed method that efficiently constructs an approximate K-NNG. NN-Descent is implemented on a shared-memory system with OpenMP-based parallelization, and its extension to the Hadoop MapReduce framework has been suggested for data sets too large for a shared-memory system to handle. However, a simple extension to the Hadoop MapReduce framework is impractical, since it requires extremely high system performance owing to the high memory consumption and low data transmission efficiency of MapReduce jobs. The proposed method relaxes this requirement by improving the MapReduce jobs with an appropriate key-value pair format and an efficient sampling strategy. Experiments on large-scale data sets demonstrate that the proposed method is both efficient and scalable in terms of data size, the number of machine nodes, and the graph structural parameter K.
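A simplified sketch of one NN-Descent-style iteration expressed as map and reduce functions (plain Python, not the paper's Hadoop key-value format or sampling strategy): nodes that share a neighbor become candidate pairs, and each node keeps the K closest candidates found so far.

```python
from itertools import combinations
import heapq

def nnd_map(node, neighbor_ids):
    """Local join: two neighbors of the same node may themselves be close."""
    for v, w in combinations(neighbor_ids, 2):
        yield (v, w)
        yield (w, v)

def nnd_reduce(node, candidate_ids, data, K, dist):
    """Keep the K candidates closest to `node` under distance function `dist`."""
    scored = ((dist(data[node], data[c]), c) for c in set(candidate_ids) if c != node)
    return heapq.nsmallest(K, scored)
```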
This paper presents a new scalable method that considerably reduces the rollback propagation of conventional optimistic message logging by exploiting the properties of reliable FIFO group communication links. To this end, the proposed method forces each group member to replicate, in its volatile memory, the receive sequence numbers (RSNs) that the other members assigned to each identical message delivered to the group. As the degree of redundancy of the RSNs increases, the possibility of local recovery for a crashed process becomes significantly higher. Experimental results show that our method outperforms the previous one in terms of the rollback distance of non-faulty processes, with only a small overhead during failure-free operation.
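An illustrative data-structure sketch of RSN replication (details beyond the abstract are assumptions): each member records both the RSN it assigned to a delivered group message and the RSNs its peers assigned, so a crashed member's delivery order can be rebuilt locally from any surviving replica.

```python
class RsnStore:
    def __init__(self, my_id):
        self.my_id = my_id
        self.rsn_table = {}   # message_id -> {member_id: rsn}
        self.next_rsn = 0

    def deliver(self, message_id):
        """Assign the local RSN upon delivery of a group message."""
        self.next_rsn += 1
        self.rsn_table.setdefault(message_id, {})[self.my_id] = self.next_rsn
        return self.next_rsn

    def replicate(self, message_id, member_id, rsn):
        """Record the RSN a peer assigned to the same message (e.g., piggybacked)."""
        self.rsn_table.setdefault(message_id, {})[member_id] = rsn

    def recover_order(self, crashed_member):
        """Rebuild the crashed member's delivery order from replicated RSNs."""
        known = [(rsns[crashed_member], mid)
                 for mid, rsns in self.rsn_table.items() if crashed_member in rsns]
        return [mid for _, mid in sorted(known)]
```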
Xu ZHOU Kai LU Xiaoping WANG Wenzhe ZHANG Kai ZHANG Xu LI Gen LI
The nondeterminism of message-passing communication complicates program debugging, testing, and fault tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in a distributed environment. DMPI is compatible with standard MPI at the user interface, and it guarantees reproducible message delivery with high performance. The basic idea of DMPI is to use logical time to resolve message races and control asynchronous transmissions, thereby eliminating the nondeterministic behavior of the existing message-passing mechanism. A buffering strategy alleviates the slowdown caused by the mismatch between logical time and physical time. To avoid the deadlocks introduced by deterministic mechanisms, we also integrate DMPI with a lightweight deadlock checker that dynamically detects and resolves such deadlocks. We have implemented DMPI and evaluated it using the NPB benchmarks. The results show that DMPI guarantees determinism while incurring only modest runtime overhead (14% on average).
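A hedged sketch of the core race-resolution idea, not DMPI's actual implementation: arriving messages are stamped with logical times, buffered, and handed to the application in (logical time, sender rank) order, so a wildcard receive that could match several senders resolves the same way on every run.

```python
import heapq

class DeterministicInbox:
    def __init__(self):
        self.buffer = []  # min-heap of (logical_time, sender_rank, payload)

    def on_arrival(self, logical_time, sender_rank, payload):
        heapq.heappush(self.buffer, (logical_time, sender_rank, payload))

    def receive(self, up_to_time):
        """Deliver all buffered messages with logical time <= up_to_time.

        `up_to_time` stands for the point up to which the logical clocks of all
        senders have provably advanced, so no earlier message can still arrive.
        """
        delivered = []
        while self.buffer and self.buffer[0][0] <= up_to_time:
            delivered.append(heapq.heappop(self.buffer))
        return delivered
```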
Guangchun LUO Hao CHEN Caihui QU Yuhai LIU Ke QIN
Tree partitioning arises in many parallel and distributed computing applications and in storage systems. Some operator scheduling problems require partitioning a tree into a number of vertex-disjoint subtrees such that given constraints are satisfied and certain criteria are optimized. Given a tree T in which each vertex (node) is assigned a nonnegative integer weight, two nonnegative integers l and u (l < u), and a positive integer p, we consider the following tree partitioning problems: partition T into a minimum number of subtrees, or into p subtrees, such that the sum of the node weights in each subtree is at least l and at most u. To solve these two problems, we provide a fast polynomial-time algorithm consisting of a pre-processing method and a bottom-up dynamic-programming scheme. Experimental studies show that our algorithm greatly outperforms a prior algorithm presented by Ito et al.
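To convey the bottom-up flavor of such algorithms, here is a greedy illustration only, not the authors' optimal dynamic program: traverse the tree in post-order, accumulate subtree weights, and cut off a piece as soon as its accumulated weight falls inside the window [l, u].

```python
# Greedy bottom-up illustration; the leftover piece containing the root may
# violate the bounds, which is exactly what the paper's exact DP avoids.

def greedy_partition(children, weight, root, l, u):
    parts = []  # weights of the cut-off pieces

    def visit(v):
        acc = weight[v]
        for c in children.get(v, []):
            acc += visit(c)
        if l <= acc <= u:
            parts.append(acc)   # cut the edge above v: this piece is done
            return 0            # nothing propagates to the parent
        return acc              # too light (or over u): pass the weight up

    rest = visit(root)
    if rest > 0:
        parts.append(rest)      # leftover piece containing the root
    return parts

# Tiny example: a path 1-2-3-4 with unit weights and window [2, 3]
children = {1: [2], 2: [3], 3: [4]}
print(greedy_partition(children, {1: 1, 2: 1, 3: 1, 4: 1}, root=1, l=2, u=3))  # [2, 2]
```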
Yukio OGAWA Go HASEGAWA Masayuki MURATA
When computing resources are consolidated in a few huge data centers, a massive amount of data is transferred to each data center over a wide area network (WAN), which increases the power consumption of the WAN. A distributed computing network (DCN), such as a content delivery network, can reduce the traffic from and to the data centers, thereby decreasing the power consumed in the WAN. In this paper, we focus on the energy-saving aspect of the DCN and evaluate its effectiveness, especially with respect to traffic locality, i.e., the amount of traffic exchanged within a geographical vicinity. We first formulate the problem of optimizing DCN power consumption and describe the DCN in detail. Numerical evaluations then show that, when traffic locality is strong and the routers have ideal energy proportionality, the system's power consumption is reduced to about 50% of that without a DCN; this advantage becomes even larger (to about 30%) when the data center is located farthest from the center of the network topology.
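For concreteness, one common way to model the "ideal energy proportionality" assumption is sketched below; the paper's exact power model and parameter values may differ.

```python
def router_power(utilization, p_max, idle_fraction=0.0):
    """Router power as a function of utilization in [0, 1].

    idle_fraction = 0.0 gives ideal energy proportionality (zero idle power);
    a larger idle_fraction models a router that pays a fixed cost even when idle.
    """
    p_idle = idle_fraction * p_max
    return p_idle + (p_max - p_idle) * utilization
```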
Masahiro KIMOTO Tatsuhiro TSUCHIYA Tohru KIKUNO
The exact time complexity of Hsu and Huang's self-stabilizing maximal matching algorithm is provided. It is n^2 + n - 2 if the number of nodes n is even and n^2 + n - if n is odd.
Masahiro KIMOTO Tatsuhiro TSUCHIYA Tohru KIKUNO
In this letter, we give a lower bound on the worst-case time complexity of Dijkstra's three-state mutual exclusion algorithm by specifying a concrete behavior of the algorithm. We also show that our result is more accurate than the best previously known bound.
Seiichiro TANI Masaki NAKANISHI Shigeru YAMASHITA
This paper describes a general technique for lower-bounding the quantum communication complexity of a function that depends on the inputs given to two parties connected via paths, which may be shared with other parties, on a network of any topology. The technique can also be employed to obtain lower bounds on the quantum communication complexity of some functions that depend on the inputs distributed over all parties on the network. As a typical application, we apply our technique to the distinctness problem of deciding whether there is a pair of parties with identical inputs on a k-party ring; almost matching upper bounds are also given.
Hiroyuki MIYAGI Yusuke OKAZAKI Ryota USUI Yutaka ARAKAWA Satoru OKAMOTO Naoaki YAMANAKA
In a grid computing environment, network characteristics such as bandwidth and latency affect task performance. The bandwidth demands on wide-area networks are growing and now exceed 100 Gbps. In this article, we focus on parallel-route transmission, such as link aggregation, to realize a large-bandwidth network. The performance of grid computing with parallel-route transmission is evaluated on an emulated wide-area network.
The steady approach of advanced nations toward the realization of ubiquitous computing societies has given birth to rapidly growing demands for new-generation distributed computing (DC) applications. Consequently, economic and reliable construction of new-generation DC applications is currently a major issue faced by the software technology research community. What is needed is a new-generation DC software engineering technology that is at least several times more effective in constructing new-generation DC applications than the currently practiced technologies. In particular, this author believes that a new-generation building block (BB) is needed, one that is much more advanced than the current-generation DC object, which is a small extension of the object model embedded in the languages C++, Java, and C#. Such a BB should enable systematic and economic construction of DC applications that are capable of taking critical actions with 100-microsecond-level or even 10-microsecond-level timing accuracy, fault tolerance, and security enforcement, while being easily expandable and taking advantage of all sorts of network connectivity. Some directions considered worth pursuing in the search for such BBs are discussed.
Wei SUN Yuanyuan ZHANG Yasushi INOGUCHI
Heterogeneous distributed computing environments are well suited to meet fast-increasing computational demands, and task scheduling is essential for such a system to satisfy the large computational demands of applications. The performance of a scheduler in a heterogeneous distributed system normally depends on the dynamic task flow; that is, the scheduler suffers from the heterogeneity of task sizes and the variability of task arrivals. From a long-term viewpoint, it is both necessary and possible to improve the performance of the scheduler serving a dynamic task flow. In this paper, we propose a task scheduling method comprising a scheduling strategy that adapts to the dynamic task flow and a genetic algorithm that achieves a short completion time for a batch of tasks. The strategy and the genetic algorithm work together to enhance the scheduler's efficiency and performance. We simulated a task flow with a sufficient number of tasks and compared the scheduler using our strategy and algorithm with schedulers using other strategies and algorithms. We also simulated a complex scenario with varying task arrival rates and heterogeneous computational nodes. The simulation results show that our scheduler achieves much better scheduling results than the others in terms of average waiting time, average response time, and the finish time of all tasks.
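A hedged sketch of a genetic algorithm for mapping a batch of tasks onto heterogeneous nodes: a chromosome assigns each task to a node and fitness is the makespan. The encoding and operators below are generic illustrations, not necessarily those of the proposed scheduler.

```python
import random

def makespan(assignment, task_sizes, node_speeds):
    """Finish time of the slowest node under a task-to-node assignment."""
    load = [0.0] * len(node_speeds)
    for task, node in enumerate(assignment):
        load[node] += task_sizes[task] / node_speeds[node]
    return max(load)

def ga_schedule(task_sizes, node_speeds, pop_size=40, generations=200):
    n_tasks, n_nodes = len(task_sizes), len(node_speeds)
    pop = [[random.randrange(n_nodes) for _ in range(n_tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: makespan(ind, task_sizes, node_speeds))
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_tasks)    # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:             # mutation
                child[random.randrange(n_tasks)] = random.randrange(n_nodes)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda ind: makespan(ind, task_sizes, node_speeds))
```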
Md. Nurul HUDA Eiji KAMIOKA Shigeki YAMADA
Privacy is increasingly viewed as a key concern in multi-agent algorithms for Distributed Constraint Satisfaction Problems (DCSPs) such as the Meeting Scheduling (MS) problem. Many algorithms aim at a global objective function and, as a result, incur penalties in computational complexity and personal privacy. This paper describes a mobile-agent-based scheduling scheme called Efficient and Privacy-aware Meeting Scheduling (EPMS), which achieves a tradeoff among complexity, privacy, and global utility. It also introduces a privacy loss model for collaborative computation, multiple criteria for evaluating privacy in the MS problem, and a privacy measurement metric called the Min privacy metric. EPMS uses a common computational space to reduce both the complexity and the privacy loss in the MS problem. Analytical results show that EPMS has polynomial-time computational complexity. In addition, simulation results show that the global utility obtained when scheduling multiple meetings with EPMS is close to the optimal level and that the resulting privacy loss is lower than that of existing algorithms.
Jiahong WANG Takuya FUKASAWA Shintaro URABE Toyoo TAKATA Masatoshi MIYAZAKI
Data mining across different companies, organizations, online shops, and the like is necessary to discover valuable shared patterns, associations, trends, or dependencies in their shared data. Privacy, however, is a concern: in many situations, data mining must be conducted without violating privacy. In response to this requirement, we propose an effective distributed privacy-preserving data mining approach called SDDM. SDDM is characterized by its ability to resist collusion: privacy cannot be violated unless four or more sites in the distributed system collude. The results of a performance study demonstrate the effectiveness of SDDM.
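SDDM's exact protocol is not reproduced in the abstract; as an illustration of the kind of collusion-resistant aggregation used in distributed privacy-preserving mining, the sketch below splits each site's local value into additive random shares, so recovering any single site's value requires all the sites holding its shares to collude.

```python
import random

def make_shares(value, n_shares, modulus=2**31):
    """Split `value` into n_shares additive shares modulo `modulus`."""
    shares = [random.randrange(modulus) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def secure_sum(site_values, modulus=2**31):
    n = len(site_values)
    # Each site sends one share of its value to every site (including itself).
    received = [[] for _ in range(n)]
    for value in site_values:
        for site, share in enumerate(make_shares(value, n, modulus)):
            received[site].append(share)
    # Sites publish only the sums of the shares they received.
    partial = [sum(r) % modulus for r in received]
    return sum(partial) % modulus

print(secure_sum([12, 7, 30, 5]))  # 54: the total is revealed, the addends are not
```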
Cho-chin LIN Da-wei WANG Tsan-sheng HSU
We discuss the problem of finding a dominant sequence for sending input data items from a low-end client to a server for computation-intensive tasks under the realistic assumption of unpredictable communication behavior. Under this assumption, the client has to send the input data items in a specified sequence so as to maximize the number of computations the server can perform at any time. The sequence-finding problem is NP-hard in the general case. In this paper, we address three fundamental and useful applications: the product of two polynomials, matrix multiplication, and the Fast Fourier Transform. We show that the sequence-finding problems for these three applications can be solved optimally in linear time. However, we also give counterexamples that rule out any possibility of finding a dominant sequence for sparse instances of the three applications. Finally, a simulation is conducted to show the usefulness of our method.
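A small helper illustrating the quantity a dominant sequence must maximize at every prefix for the polynomial-product application, under the assumed model that the server's elementary operations are the pairwise coefficient products a_i * b_j: after each sent item, the number of computable products is (#a received) * (#b received).

```python
def computable_products(send_order):
    """send_order: list of ('a', i) or ('b', j) identifying which coefficient is sent."""
    counts = {'a': 0, 'b': 0}
    prefix_totals = []
    for poly, _ in send_order:
        counts[poly] += 1
        prefix_totals.append(counts['a'] * counts['b'])
    return prefix_totals

# Alternating the two operands dominates sending one polynomial first:
print(computable_products([('a', 0), ('b', 0), ('a', 1), ('b', 1)]))  # [0, 1, 2, 4]
print(computable_products([('a', 0), ('a', 1), ('b', 0), ('b', 1)]))  # [0, 0, 2, 4]
```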