Daisuke AMAYA Shunsuke HOMMA Takuji TACHIBANA
In resource-constrained network function virtualization (NFV) environments, it is expected that data throughput for service chains is maintained by using virtual network functions (VNFs) effectively. In this paper, we formulate an optimization problem for maximizing the total data throughput in resource-constrained NFV environments. Moreover, based on our formulated optimization problem, we propose a heuristic service chain construction algorithm for maximizing the total data throughput. This algorithm also determines the placement of VNFs, the amount of resources for each VNF, and the transmission route for each service chain. It is expected that the heuristic algorithm can construct service chains more quickly than the meta-heuristic algorithm. We evaluate the performance of the proposed methods with simulations, and we investigate the effectiveness of our proposed heuristic algorithm through a performance comparison. Numerical examples show that our proposed methods can construct service chains so as to maximize the total data throughput regardless of the number of service chains, the amount of traffic, and network topologies.
Skip Graph is a promising distributed data structure for large scale systems and known for its capability of range queries. Although several methods of routing range queries in Skip Graph have been proposed, they have inefficiencies such as a long path length or a large number of messages. In this paper, we propose a novel routing method for range queries named Split-Forward Broadcasting (SFB). SFB introduces a divide-and-conquer approach, enabling nodes to make full use of their routing tables to forward a range query. It brings about a shorter average path length than existing methods, as well as a smaller number of messages by avoiding duplicate transmission. We clarify the characteristics and effectiveness of SFB through both analytical and experimental comparisons. The results show that SFB can reduce the average path length roughly 30% or more compared with a state-of-the-art method.
Yoshiyuki MIHARA Shuichi MIYAZAKI Yasuo OKABE Tetsuya YAMAGUCHI Manabu OKAMOTO
In this article, we propose a method to identify the link layer home network topology, motivated by applications to cost reduction of support centers. If the topology of home networks can be identified automatically and efficiently, it is easier for operators of support centers to identify fault points. We use MAC address forwarding tables (AFTs) which can be collected from network devices. There are a couple of existing methods for identifying a network topology using AFTs, but they are insufficient for our purpose; they are not applicable to some specific network topologies that are typical in home networks. The advantage of our method is that it can handle such topologies. We also implemented these three methods and compared their running times. The result showed that, despite its wide applicability, our method is the fastest among the three.
Hiroshi FUJIWARA Kei SHIBUSAWA Kouki YAMAMOTO Hiroaki YAMAMOTO
The multislope ski-rental problem is an online optimization problem that generalizes the classical ski-rental problem. The player is offered not only buy and rent options but also other options that charge both initial and per-time fees. The competitive ratio of the classical ski-rental problem is known to be 2. In contrast, the best bounds known so far on the competitive ratio of the multislope ski-rental problem are an upper bound of 4 and a lower bound of 3.62. In this paper we consider a parametric version of the multislope ski-rental problem, regarding the number of options as a parameter. We prove an upper bound for the parametric problem which is strictly less than 4. Moreover, we give a simple recurrence relation that yields an equation having a lower bound value as its root.
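As a point of reference for the classical two-option case mentioned above, the following minimal sketch simulates the textbook break-even strategy (rent until the next rental would bring total rent to the purchase price, then buy). It does not implement the multislope or parametric strategies of the paper. The function names and the integer-price assumption are illustrative only.

```python
def break_even_cost(buy_price: int, rent_price: int, days_skied: int) -> int:
    """Online cost of the break-even strategy (illustrative, integer prices)."""
    # Buy on day k, where k is the first day on which total rent would reach buy_price.
    buy_day = -(-buy_price // rent_price)  # ceiling division
    if days_skied < buy_day:
        return days_skied * rent_price  # season ended before we bought
    return (buy_day - 1) * rent_price + buy_price


def offline_cost(buy_price: int, rent_price: int, days_skied: int) -> int:
    """Cost of the optimal strategy that knows days_skied in advance."""
    return min(buy_price, days_skied * rent_price)


def worst_ratio(buy_price: int, rent_price: int, horizon: int) -> float:
    """Largest online/offline cost ratio over all season lengths up to horizon."""
    return max(
        break_even_cost(buy_price, rent_price, d) / offline_cost(buy_price, rent_price, d)
        for d in range(1, horizon + 1)
    )
```

For any positive integer prices the ratio computed this way stays below 2, which is consistent with the competitive ratio of 2 quoted for the classical problem.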
Kazuyuki AMANO Shin-ichi NAKANO
Let P be a set of points on the plane, and d(p, q) be the distance between a pair of points p, q in P. For a point p∈P and a subset S ⊂ P with |S|≥3, the 2-dispersion cost, denoted by cost2(p, S), of p with respect to S is the sum of (1) the distance from p to the nearest point in S∖{p} and (2) the distance from p to the second nearest point in S∖{p}. The 2-dispersion cost cost2(S) of S ⊂ P with |S|≥3 is min_{p∈S}{cost2(p, S)}. Given a set P of n points and an integer k, we wish to compute a k-point subset S of P with maximum cost2(S). In this paper we give a simple $1/(4\sqrt{3})$ approximation algorithm for the problem.
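The definitions above translate directly into code. The sketch below evaluates cost2(p, S) and cost2(S) for points given as coordinate tuples, and adds an exhaustive search over all k-point subsets as a baseline. The exhaustive search is exponential and is not the approximation algorithm of the paper. All function names are hypothetical, and points are assumed to be distinct.

```python
import math
from itertools import combinations


def cost2_point(p, S):
    """Sum of the distances from p to its nearest and second-nearest points in S minus {p}."""
    dists = sorted(math.dist(p, q) for q in S if q != p)
    return dists[0] + dists[1]


def cost2(S):
    """2-dispersion cost of S: the minimum of cost2_point over all p in S (needs |S| >= 3)."""
    return min(cost2_point(p, S) for p in S)


def brute_force_best_subset(P, k):
    """Exact but exponential baseline: try every k-point subset of P."""
    return max(combinations(P, k), key=cost2)
```

For example, with P consisting of the four corners of a 4×4 square plus its center and k = 4, the baseline selects the four corners, whose cost is 8.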
Xiaobo ZHANG Wenbo XU Yan TIAN Jiaru LIN Wenjun XU
In the context of compressed sensing (CS), simultaneous orthogonal matching pursuit (SOMP) algorithm is an important iterative greedy algorithm for multiple measurement matrix vectors sharing the same non-zero locations. Restricted isometry property (RIP) of measurement matrix is an effective tool for analyzing the convergence of CS algorithms. Based on the RIP of measurement matrix, this paper shows that for the K-row sparse recovery, the restricted isometry constant (RIC) is improved to $\delta_{K+1}<\frac{\sqrt{4K+1}-1}{2K}$ for SOMP algorithm. In addition, based on this RIC, this paper obtains sufficient conditions that ensure the convergence of SOMP algorithm in noisy case.
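To get a feel for the size of the stated bound, the short sketch below evaluates the right-hand side of the condition for a few sparsity levels K. It only evaluates the formula quoted in the abstract; the function name is hypothetical.

```python
import math


def somp_ric_bound(K: int) -> float:
    """Right-hand side of the stated condition: (sqrt(4K + 1) - 1) / (2K)."""
    return (math.sqrt(4 * K + 1) - 1) / (2 * K)


if __name__ == "__main__":
    for K in (1, 2, 6, 12, 20):
        print(K, round(somp_ric_bound(K), 4))
```

The bound tightens as K grows: it equals 0.5 at K = 2 and 1/3 at K = 6.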
Hiroki OKADA Atsushi TAKAYASU Kazuhide FUKUSHIMA Shinsaku KIYOMOTO Tsuyoshi TAKAGI
The Blum-Kalai-Wasserman algorithm (BKW) is an algorithm for solving the learning parity with noise problem (LPN), which was then adapted for solving the learning with errors problem (LWE) by Albrecht et al. Duc et al. also applied BKW to the learning with rounding problem (LWR). The number of blocks is a parameter of BKW. By optimizing the number of blocks, we can minimize the time complexity of BKW. However, Duc et al. did not derive the optimal number of blocks theoretically; instead, they searched for it numerically. Duc et al. also showed that the required number of samples for BKW for solving LWE can be dramatically decreased using Lyubashevsky's idea. However, it has not been shown that his idea is also applicable to LWR. In this paper, we theoretically derive the asymptotically optimal number of blocks, and then analyze the minimum asymptotic time complexity of the algorithm. We also show that Lyubashevsky's idea can be applied to LWR-solving BKW, under a heuristic assumption that is regularly used in the analysis of LPN-solving BKW. Furthermore, we derive an equation that relates the Gaussian parameter σ of LWE and the modulus p of LWR. When σ and p satisfy the equation, the asymptotic time complexities of BKW for solving LWE and LWR are the same.
Ryuta KAWANO Ryota YASUDO Hiroki MATSUTANI Michihiro KOIBUCHI Hideharu AMANO
Recently proposed irregular networks can reduce the latency for both on-chip and off-chip systems with a large number of computing nodes and thus can improve the performance of parallel applications. However, these networks usually suffer from deadlocks in routing packets when using a naive minimal path routing algorithm. To solve this problem, we focus on a recently proposed theory that generalizes the turn model to maintain the network performance with deadlock-freedom. However, applying the theorems of this theory to arbitrary topologies, including fully irregular networks, remains a challenge. In this paper, we advance the theorems to completely general ones. Moreover, we provide a feasible implementation of a deadlock-free routing method based on our advanced theorem. Experimental results show that the routing method based on our proposed theorem can improve the network throughput by up to 138% compared to a conventional deterministic minimal routing method. Moreover, when utilized as the escape path in Duato's protocol, it can improve the throughput by up to 26.3% compared with the conventional up*/down* routing.
Takashi YOKOTA Kanemitsu OOTSU Takeshi OHKAWA
Inter-node communication is essential in parallel computation. The performance of parallel processing depends on the efficiency of both computation and communication; thus, the communication cost is not negligible. A parallel application program involves a logical communication structure that is determined by the interchange of data between computation nodes. Sometimes the logical communication structure does not match the communication structure of the real parallel machine. This mismatch results in large communication costs. This paper addresses the node-mapping problem, which rearranges the logical positions of nodes so that the degree of mismatch is decreased. This paper assumes that parallel programs execute one or more collective communications that follow specific traffic patterns. An appropriate node-mapping achieves high communication performance. This paper proposes a strong heuristic method for solving the node-mapping problem and adapts the method to a genetic algorithm. Evaluation results reveal that the proposed method achieves considerably high performance; it achieves 8.9 (4.9) times speed-up on average in single-(two-)traffic-pattern cases in 32×32 torus networks. Specifically, for some traffic patterns in small-scale networks, the proposed method finds theoretically optimized solutions. Furthermore, this paper discusses in depth various issues in the proposed method that employs a genetic algorithm, such as the population of genes, the number of generations, and traffic patterns. This paper also discusses applicability to large-scale systems for future practical use.
Shanshan JIAO Zhisong PAN Yutian CHEN Yunbo LI
As one of the most popular intelligent optimization algorithms, Simulated Annealing (SA) faces two key problems, the generation of perturbation solutions and the control strategy of the outer loop (cooling schedule). In this paper, we introduce the Gaussian Cloud model to solve both problems and propose a novel cloud annealing algorithm. Its basic idea is to use the Gaussian Cloud model with decreasing numerical character He (Hyper-entropy) to generate new solutions in the inner loop, while He essentially indicates a heuristic control strategy to combine global random search of the outer loop and local tuning search of the inner loop. Experimental results in function optimization problems (i.e. single-peak, multi-peak and high dimensional functions) show that, compared with the simple SA algorithm, the proposed cloud annealing algorithm will lead to significant improvement on convergence and the average value of obtained solutions is usually closer to the optimal solution.
Benhong ZHANG Yiming WANG Jianjun ZHANG Juan XU
The flexibility of wireless communication makes it more and more widely used in industrial scenarios. To satisfy the strict real-time requirements of industry, various wireless methods especially based on the time division multiple access protocol have been introduced. In this work, we first conduct a mathematical analysis of the network model and the problem of minimum packet loss. Then, an optimal Real-time Scheduling algorithm based on Backtracking method (RSBT) for industrial wireless sensor networks is proposed; this yields a scheduling scheme that can achieve the lowest network packet loss rate. We also propose a suboptimal Real-time Scheduling algorithm based on Urgency and Concurrency (RSUC). Simulation results show that the proposed algorithms effectively reduce the rate of the network packet loss and the average response time of data flows. The real-time performance of the RSUC algorithm is close to optimal, which confirms the computation efficiency of the algorithm.
Takahiro NISHIMURA Jacir Luiz BORDIM Yasuaki ITO Koji NAKANO
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution of this work is to present a Bitwise Parallel Bulk Computation (BPBC) to accelerate the Smith-Waterman Algorithm (SWA) using the affine gap penalty. Thus, our idea is to convert this computation into a circuit simulation using the BPBC technique to compute multiple instances simultaneously. The proposed BPBC technique for the SWA has been implemented on the GPU and CPU. Experimental results show that the proposed BPBC for the SWA accelerates the computation by over 646 times as compared to a single CPU implementation and by 6.9 times as compared to a multi-core CPU implementation with 160 threads.
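For orientation, here is a minimal sequential reference for Smith-Waterman local alignment scoring with an affine gap penalty, written in the usual three-matrix (Gotoh-style) form. It is a plain sketch of the computation being accelerated, not the BPBC or GPU implementation described above. The scoring parameters and the function name are assumptions chosen for illustration; a gap of length L costs gap_open + (L - 1) * gap_extend here.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of strings a and b under an affine gap penalty."""
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    # H: best score of an alignment ending at (i, j).
    # E: best score ending at (i, j) with a gap that consumes characters of b.
    # F: best score ending at (i, j) with a gap that consumes characters of a.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: never let the score drop below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these illustrative parameters, two identical strings of length 4 score 8, and inserting "CC" into the middle of "AAAAAATTTTTT" yields a score of 20 (twelve matches minus one gap of length two).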
Kazuro KIMURA Shinya HIGA Masao OKITA Fumihiko INO
In this paper, we propose an acceleration method for the Held-Karp algorithm that solves the symmetric traveling salesman problem by dynamic programming. The proposed method achieves acceleration with two techniques. First, we locate data-independent subproblems so that the subproblems can be solved in parallel. Second, we reduce the number of subproblems by a meet in the middle (MITM) technique, which computes the optimal path from both clockwise and counterclockwise directions. We show theoretical analysis on the impact of MITM in terms of the time and space complexities. In experiments, we compared the proposed method with a previous method running on a single-core CPU. Experimental results show that the proposed method on an 8-core CPU was 9.5-10.5 times faster than the previous method on a single-core CPU. Moreover, the proposed method on a graphics processing unit (GPU) was 30-40 times faster than that on an 8-core CPU. As a side effect, the proposed method reduced the memory usage by 48%.
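As background, the sketch below is a minimal sequential version of the standard Held-Karp dynamic program on a distance matrix. It does not include the parallelization or the meet-in-the-middle technique proposed above. The function name and the bitmask encoding of subsets are illustrative choices.

```python
from itertools import combinations


def held_karp(dist):
    """Length of the shortest tour that starts and ends at city 0."""
    n = len(dist)
    if n == 1:
        return 0
    # dp[(mask, j)]: shortest path that starts at city 0, visits exactly
    # the cities in mask (a bitmask over cities 1..n-1), and ends at city j.
    dp = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        # Subproblems of the same size depend only on smaller subsets.
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)
                dp[(mask, j)] = min(
                    dp[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))
```

In this formulation, subproblems with the same subset size do not depend on one another, which is one natural place to look for parallelism. Whether this matches the decomposition used by the authors is not stated in the abstract.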
Kota ANDO Kodai UEYOSHI Yuka OBA Kazutoshi HIROSE Ryota UEMATSU Takumi KUDO Masayuki IKEBE Tetsuya ASAI Shinya TAKAMAEDA-YAMAZAKI Masato MOTOMURA
Deep neural networks (NNs) have been widely adopted for enabling various AI applications; however, limited computational and memory resources are a major problem on mobile devices. A quantized NN with reduced bit precision is an effective solution that relaxes the resource requirements, but the accuracy degradation caused by its numerical approximation is another problem. We propose a novel quantized NN model that employs the “dithering” technique to improve accuracy with minimal additional hardware, from the viewpoint of hardware-algorithm co-design. Dithering distributes the quantization error occurring at each pixel (neuron) spatially so that the total information loss of the plane is minimized. Experiments using software-based accuracy evaluation and FPGA-based hardware resource estimation demonstrated the effectiveness and efficiency of the concept of an NN model with dithering.
Shoya OOHARA Mitsuji MUNEYASU Soh YOSHIDA Makoto NAKASHIZUKA
For image restoration, an image prior that is obtained from the morphological gradient has been proposed. In the field of mathematical morphology, the optimization of the structuring element (SE) used for this morphological gradient using a genetic algorithm (GA) has also been proposed. In this paper, we introduce a new image prior that is the sum of the morphological gradients and total variation for an image restoration problem to improve the restoration accuracy. The proposed image prior makes it possible to almost match the fitness to a quantitative evaluation such as the mean square error. It also solves the problem of the artifact due to the unsuitability of the SE for the image. An experiment shows the effectiveness of the proposed image restoration method.
This paper deals with the problem of minimizing roundoff noise and pole sensitivity simultaneously subject to l2-scaling constraints for state-space digital filters. A novel measure for evaluating roundoff noise and pole sensitivity is proposed, and an efficient technique for minimizing this measure by jointly optimizing state-space realization and error feedback is explored, namely, the constrained optimization problem at hand is converted into an unconstrained problem and then the resultant problem is solved by employing a quasi-Newton algorithm. A numerical example is presented to demonstrate the validity and effectiveness of the proposed technique.
Kiyoshi NISHIYAMA Masahiro SUNOHARA Nobuhiko HIRUMA
The least mean squares (LMS) algorithm has been widely used for adaptive filtering because it is easy to implement at a computational complexity of O(2N), where N is the number of taps. The drawback of the LMS algorithm is that its performance is sensitive to the scaling of the input. The normalized LMS (NLMS) algorithm solves this problem of the LMS algorithm by normalizing with the sliding-window power of the input; however, this normalization increases the computational cost to O(3N) per iteration. In this work, we derive a new formula that strictly performs the NLMS algorithm at a computational complexity of O(2N); the resulting algorithm is referred to as the C-NLMS algorithm. The derivation of the C-NLMS algorithm uses the H∞ framework presented previously by one of the authors for creating a unified view of adaptive filtering algorithms. The validity of the C-NLMS algorithm is verified using simulations.
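The following minimal sketch shows the conventional NLMS update that the abstract takes as its starting point, with the sliding-window input power recomputed naively at every step. It is not the C-NLMS formula, which the abstract does not spell out. The function name, the step size mu, and the regularizer eps are illustrative assumptions.

```python
def nlms_identify(x, d, num_taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights so that the filter output tracks d given input x."""
    w = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        window = [x_n] + window[:-1]
        y_n = sum(w_i * u_i for w_i, u_i in zip(w, window))  # filter output
        e_n = d_n - y_n                                      # a priori error
        power = sum(u_i * u_i for u_i in window)             # sliding-window power
        step = mu * e_n / (eps + power)                      # normalized step
        w = [w_i + step * u_i for w_i, u_i in zip(w, window)]
    return w
```

Dropping the division by the window power gives the plain LMS update, whose behavior then depends on the scale of x. Feeding the function a white input and the output of a short unknown FIR system recovers that system's coefficients in the noiseless case.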
Naoto KIDO Sumio MASUDA Kazuaki YAMAGUCHI
We consider the problem of placing arrows, which indicate the direction of each edge in directed graph drawings, without making them overlap other arrows, vertices and edges as much as possible. The following two methods have been proposed for this problem. One is an exact algorithm for the case in which the position of each arrow is restricted to some discrete points. The other is a heuristic algorithm for the case in which the arrow is allowed to move continuously on each edge. In this paper, we assume that the arrow positions are not restricted to discrete points and propose an exact algorithm for the problem of finding an arrow placement such that (a) the weighted sum of the numbers of overlaps with edges, vertices and other arrows is minimized and (b) the sum of the distances between the arrows and their edges' terminal vertices is minimized as a secondary objective. The proposed method solves this problem by reducing it to a mixed integer linear programming problem. Since this is an exponential time algorithm, we add a simple procedure as preprocessing to reduce the running time. Experimental results show that the proposed method can find a better arrow placement than the previous methods and the procedure for reducing the running time is effective.
MeiJun DUAN HongYu YANG Bo YANG XiPing WU HaiJun LIANG
Due to its simplicity and efficiency, differential evolution (DE) has gained the interest of researchers from various fields for solving global optimization problems. However, it is prone to premature convergence at local minima. To overcome this drawback, a novel hybrid dragonfly algorithm with differential evolution (Hybrid DA-DE) for solving global optimization problems is proposed. Firstly, a novel mutation operator is introduced based on the dragonfly algorithm (DA). Secondly, the scaling factor (F) is adjusted in a self-adaptive and individual-dependent way without extra parameters. The proposed algorithm combines the exploitation capability of DE and exploration capability of DA to achieve optimal global solutions. The effectiveness of this algorithm is evaluated using 30 classical benchmark functions with sixteen state-of-the-art meta-heuristic algorithms. A series of experimental results show that Hybrid DA-DE outperforms other algorithms significantly. Meanwhile, Hybrid DA-DE has the best adaptability to high-dimensional problems.
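To make the role of the scaling factor F concrete, here is a minimal sketch of the classical DE/rand/1/bin baseline with a fixed F. It contains neither the dragonfly-based mutation operator nor the self-adaptive F of Hybrid DA-DE. The function name, the default parameter values, and the clamping of trial vectors to the bounds are assumptions made for illustration.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over box bounds with classical DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for j in range(dim):
                if j == forced or rng.random() < CR:
                    # Mutation: base vector plus F times a difference vector.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clamp to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fit = f(trial)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

A call such as `differential_evolution(lambda v: sum(t * t for t in v), [(-5.0, 5.0)] * 3)` minimizes a three-dimensional sphere function and returns the best vector found together with its objective value.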
We design a new oblivious routing algorithm for two-dimensional mesh-based Networks-on-Chip (NoCs) called LEF (Long Edge First) which offers high throughput with low design complexity. LEF's basic idea comes from conventional wisdom in choosing the appropriate dimension-order routing (DOR) algorithm for supercomputers with asymmetric mesh or torus interconnects: routing longest dimensions first provides better performance than other strategies. In LEF, we combine the XY DOR and the YX DOR. When routing a packet, which DOR algorithm is chosen depends on the relative position between the source node and the destination node. Decisions of selecting the appropriate DOR algorithm are not fixed to the network shape but instead made on a per-packet basis. We also propose an efficient deadlock avoidance method for LEF in which the use of virtual channels is more flexible than in the conventional method. We evaluate LEF against O1TURN, another effective oblivious routing algorithm, and a minimal adaptive routing algorithm based on the odd-even turn model. The evaluation results show that LEF is particularly effective when the communication is within an asymmetric mesh. In a 16×8 NoC, LEF even outperforms the adaptive routing algorithm in some cases and delivers from around 4% up to around 64.5% higher throughput than O1TURN. Our results also show that the proposed deadlock avoidance method helps to improve LEF's performance significantly and can be used to improve O1TURN's performance. We also examine LEF in large-scale NoCs with thousands of nodes. Our results show that, as the NoC size increases, the performance of the routing algorithms becomes more strongly influenced by the resource allocation policy in the network and the effect is different for each algorithm. This is evident in that results of middle-scale NoCs with around 100 nodes cannot be applied directly to large-scale NoCs.
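The per-packet choice between XY and YX dimension-order routing can be sketched as follows. The abstract does not give the exact selection rule, so this sketch assumes one reading of "long edge first": route first along the dimension with the larger source-to-destination offset, with ties resolved in favor of XY. Virtual channels and deadlock avoidance are not modeled, and all names are hypothetical.

```python
def dimension_order_path(src, dst, x_first):
    """Hops of a dimension-order route on a 2D mesh, as (x, y) coordinates."""
    x, y = src
    path = [(x, y)]
    for axis in ("x", "y") if x_first else ("y", "x"):
        if axis == "x":
            step = 1 if dst[0] > x else -1
            while x != dst[0]:
                x += step
                path.append((x, y))
        else:
            step = 1 if dst[1] > y else -1
            while y != dst[1]:
                y += step
                path.append((x, y))
    return path


def lef_path(src, dst):
    """Assumed LEF rule: travel the longer offset first (ties go to XY)."""
    x_first = abs(dst[0] - src[0]) >= abs(dst[1] - src[1])
    return dimension_order_path(src, dst, x_first)
```

Under this assumed rule, a packet from (0, 0) to (3, 1) is routed XY, while a packet from (0, 0) to (1, 2) is routed YX.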