1-10hit |
This paper addresses the problem of developing an efficient fault-tolerant routing method for 2D mesh Network-on-Chips (NoCs) to realize dependable and high performance many core systems. Existing fault-tolerant routing methods have two critical problems of high communication latency and low node utilization. Unlike almost all existing methods where packets always detour faulty nodes, we propose a novel and unique approach that packets can pass through faulty nodes. For this approach, we enhance the common NoC architecture by adding switches and links around each node and propose a fault-tolerant routing method with no virtual channels based on the well-known simple XY routing method. Simulation results show that the proposed method reduces average communication latency by about 97.1% compared with the existing method, without sacrificing fault-free nodes.
Zewen SHI Xiaoyang ZENG Zhiyi YU
Manufacturing defects in the deep sub-micron VLSI process and aging resulted problems of devices during lifecycle are inevitable, and fault-tolerant routing algorithms are important to provide the required communication for NoCs in spite of failures. The proposed algorithm, referred to as scalable and reconfigurable fault-tolerant distributed routing (RFDR), partitions the system into nine regions using the concept of divide-and-conquer. It is a distributed algorithm, and each router guarantees fault-tolerance within one's own region and the system can be still sustained with multiple fault areas. The proposed RFDR has excellent scalability with hardware cost keeping constant independent of system size. Also it is completely reconfigurable when new nodes fail. Simulations under various synthetic traffic patterns show its better performance compared to Extended-XY routing algorithm. Moreover, there is almost no hardware overhead compared to Logic-Based Distributed Routing (LBDR), but the fault-tolerance capacity is enhanced in the proposed algorithm. Hardware cost is reduced 37% compared to Reconfigurable Distributed Scalable Predictable Interconnect Network (R-DSPIN) which only supports single fault region.
Binzhang FU Yinhe HAN Huawei LI Xiaowei LI
The Network-on-Chip (NoC) is limited by the reliability constraint, which impels us to exploit the fault-tolerant routing. Generally, there are two main design objectives: tolerating more faults and achieving high network performance. To this end, we propose a new multiple-round dimension-order routing (NMR-DOR). Unlike existing solutions, besides the intermediate nodes inter virtual channels (VCs), some turn-legally intermediate nodes inside each VC are also utilized. Hence, more faults are tolerated by those new introduced intermediate nodes without adding extra VCs. Furthermore, unlike the previous solutions where some VCs are prioritized, the NMR-DOR provides a more flexible manner to evenly distribute packets among different VCs. With extensive simulations, we prove that the NMR-DOR maximally saves more than 90% unreachable node pairs blocked by faults in previous solutions, and significantly reduces the packet latency compared with existing solutions.
In this paper we study the problem of how to identify multiple disjoint paths that have the minimum total cost OPT and satisfy a delay bound D in a graph G. This problem has lots of applications in networking such as fault-tolerant quality of service (QoS) routing and network-flow load balancing. Recently, several approximation algorithms have been developed for this problem. Here, we propose a new approximation algorithm for it by using the Lagrangian Relaxation method. We then present a simple approximation algorithm for finding multiple link-disjoint paths that satisfy the delay constraints at a reasonable total cost. If the optimal solution under delay-bound D has a cost OPT, then our algorithm can find a solution whose delay is bounded by (1+)D and the cost is no more than (1+k)OPT. The time complexity of our algorithm is much better than the previous algorithms.
Paola FLOCCHINI Antonio Mesa ENRIQUES Linda PAGLI Giuseppe PRENCIPE Nicola SANTORO
We consider the problem of computing the optimal swap edges of a shortest-path tree. This problem arises in designing systems that offer point-of-failure shortest-path rerouting service in presence of a single link failure: if the shortest path is not affected by the failed link, then the message will be delivered through that path; otherwise, the system will guarantee that, when the message reaches the node where the failure has occurred, the message will then be re-routed through the shortest detour to its destination. There exist highly efficient serial solutions for the problem, but unfortunately because of the structures they use, there is no known (nor foreseeable) efficient distributed implementation for them. A distributed protocol exists only for finding swap edges, not necessarily optimal ones. We present two simple and efficient distributed algorithms for computing the optimal swap edges of a shortest-path tree. One algorithm uses messages containing a constant amount of information, while the other is tailored for systems that allow long messages. The amount of data transferred by the protocols is the same and depends on the structure of the shortest-path spanning-tree; it is no more, and sometimes significantly less, than the cost of constructing the shortest-path tree.
Giscard WEPIWE Plamen L. SIMEONOV
The paper presents HiPeer, a robust resource distribution and discovery algorithm that can be used for fast and fault-tolerant location of resources in P2P network environments. HiPeer defines a concentric multi-ring overlay networking topology, whereon dynamic network management methods are deployed. In terms of performance, HiPeer delivers of number of lowest bounds. We demonstrate that for any De Bruijn digraph of degree d 2 and diameter DDB HiPeer constructs a highly reliable network, where each node maintains a routing table with at most 2d+2 entries independent of the number N of nodes in the system. Further, we show that any existing resource in the network with at most d nodes can be found within at most DHiPeer = log d(N(d-1)+d)-1 overlay hops. This result is as close to the Moore bound [1] as the query path length in other outstanding P2P proposals based on the De Bruijn digraphs. Thus, we argue that HiPeer defines a highly connected network with connectivity d and the lowest yet known lookup bound DHiPeer. Moreover, we show that any node's "join or leave" operation in HiPeer implies a constant expected reorganization cost of the magnitude order of O(d) control messages.
Toshinori TAKABATAKE Masato KITAKAMI Hideo ITO
In interconnection networks, deadlock recovery has been studied in routing strategy. The routing strategy for the deadlock recovery is intended to optimize the routing performance when deadlocks do not occur. On the other hand, it is important to improve the routing performance by handling deadlocks if they occur. In this paper, a routing strategy for suspensive deadlock recovery called an escape-restoration routing is proposed and its performance is evaluated. In the principle of the proposed techniques, a small amount of exclusive buffer (escape-buffer) at each router is prepared for handling one of deadlocked packets. The transmission of the packet is suspended by temporarily escaping it to the escape-buffer. After the other deadlocked packets were sent, the suspended transmission resumes by restoring the escaped packet. Evaluation results show that the proposed techniques can improve the routing performance more than that of the previous recovery-based techniques in handling deadlocks.
Deogkyoo LEE Daekeun MOON Ilgu YUN Hagbae KIM
Since components faults occurring at arbitrary places (primarily on the links) affect seriously network performance and reliability, the multicomputers operating in harsh environments should be designed to guarantee normal network-missions in presence of those faults. One solution to the end is a fault-tolerant routing scheme, which enables messages to safely reach their destinations avoiding failed links when transmission of messages is blocked by certain faults. In the paper, we develop a fault-tolerant routing algorithm with deadlock freedom in an n-dimensional meshed network, and validate its efficiency and effectiveness through proper simulations. The aspects of fault-tolerance is adopted by appending partial-adaptiveness and detouring to the e-cube algorithm, while using a wormhole routing for the backbone routing method. The phenomenon of deadlock incurred due to its adaptiveness is eliminated by classifying a physical channel into a couple of virtual channels.
Many researchers have used hypercube interconnection networks for their good properties to construct many parallel processing systems. However, as the number of processors increases, the probability of occurrences of faulty nodes also increases. Hence, for hypercube interconnection networks which have faulty nodes, several efficient dynamic routing algorithms have been proposed which allow each node to hold status information of its neighbor nodes. In this paper, we propose an improved version of the algorithm proposed by Chiu and Wu by introducing the notion of full reachability. A fully reachable node is a node that can reach all nonfaulty nodes which have Hamming distance l from the node via paths of length l. In addition, we further improve the algorithm by classifying the possibilities of detours with respect to each Hamming distance between current and target nodes. We propose an initialization procedure which makes use of an equivalent condition to perform this classification efficiently. Moreover, we conduct a simulation to measure the improvement ratio and to compare our algorithms with others. The simulation results show that the algorithms are effective when they are applied to low-dimensional hypercube interconnection networks.
Jinsoo KIM Ji-Yun KIM Hyunsoo YOON Seung Ryoul MAENG Jung Wan CHO
We propose a fault-tolerant routing algorithm for 2D meshes. Our routing algorithm can tolerate any number of concave fault regions. It is based on xy-routing and uses the concept of the fault ring/chain composed of fault-free elements surrounding faults. Three virtual channels per physical link are used for deadlock-free routing on a fault ring. Four virtual channels are needed for a fault chain. For a concave fault ring, fault-free nodes in the concave region have been deactivated to avoid deadlock in the previous algorithms, which results in excessive loss of the computational power. Our algorithm ensures deadlock-freedom by restricting the virtual channel usage in the concave region, and it minimizes the loss of the computational power. We also extend the proposed routing scheme for adaptive fault-tolerant routing. The adaptive version requires the same number of virtual channels as the deterministic one.