Hiroyuki OKAMURA Satoshi MIYAHARA Tadashi DOHI
This paper considers a transaction-based multi-server system with rejuvenation and derives the optimal software rejuvenation policies under several system dependability measures: the steady-state availability, the probability of transaction loss, and the upper bound of the mean response time of transactions. We compare a single-server configuration with a multi-server configuration in terms of the software rejuvenation scheme. In numerical examples, we calculate the optimal software rejuvenation timing and its associated dependability measure, and discuss the effect of preventive maintenance in transaction-based multi-server software systems.
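To make the notion of an optimal rejuvenation timing concrete, the following is a minimal sketch and not the transaction-based multi-server model of the paper. It assumes a generic single-server renewal model: the time to failure is Weibull distributed, the server is rejuvenated at age `t0` if it has not failed, and the mean repair and rejuvenation times are given constants. All names and parameter values here are hypothetical. Under these assumptions, the steady-state availability is the expected uptime per cycle divided by the expected cycle length.

```python
import math

def survival(t, shape, scale):
    """Assumed Weibull survival function: P(time to failure > t)."""
    return math.exp(-((t / scale) ** shape))

def availability(t0, shape, scale, mean_repair, mean_rejuv, steps=2000):
    """Steady-state availability when rejuvenation is triggered at age t0.

    Renewal-reward argument (illustrative model only):
      expected uptime   = integral of the survival function over [0, t0]
      expected downtime = mean_repair * P(fail before t0)
                        + mean_rejuv  * P(survive to t0)
    """
    h = t0 / steps
    # Trapezoidal rule for the expected uptime E[min(X, t0)].
    up = 0.5 * (survival(0.0, shape, scale) + survival(t0, shape, scale))
    for i in range(1, steps):
        up += survival(i * h, shape, scale)
    up *= h
    s = survival(t0, shape, scale)
    down = mean_repair * (1.0 - s) + mean_rejuv * s
    return up / (up + down)

def best_interval(candidates, **params):
    """Grid search: return the candidate t0 with the highest availability."""
    return max(candidates, key=lambda t0: availability(t0, **params))

# Hypothetical parameters: an aging system (shape > 1) whose repair is
# ten times slower than rejuvenation.
params = dict(shape=2.0, scale=100.0, mean_repair=10.0, mean_rejuv=1.0)
t_best = best_interval(range(5, 300, 5), **params)
```

With an increasing failure rate and rejuvenation cheaper than repair, a finite interval gives higher availability than almost never rejuvenating. With an exponential time to failure (`shape=1.0`), this particular model gains nothing from rejuvenation.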
Hiroyuki OKAMURA Satoshi MIYAHARA Tadashi DOHI Shunji OSAKI
Software rejuvenation is one of the most effective preventive maintenance techniques for operational software systems with high assurance requirements. In this paper, we propose a workload-based software rejuvenation scheme for server-type software systems and develop stochastic models to determine the optimal software rejuvenation schedules for several dependability measures. In numerical examples, we quantitatively evaluate the performance of the workload-based software rejuvenation scheme and compare it with the time-based rejuvenation scheme.
Eun Hye CHOI Tatsuhiro TSUCHIYA Tohru KIKUNO
We propose a two-level hierarchical method for dependability evaluation of distributed systems with replicated programs and data files. Because Markov modeling is applied only to individual components in this method, state explosion is avoided. Simulation results show that the method can complete the evaluation even for large systems for which Markov modeling is not feasible.
In a decentralised system the problems of fault tolerance, and in particular error recovery, vary greatly depending on the design assumptions. For example, in a distributed database system, if one disregards the possibility of undetected invalid inputs or outputs, the errors that have to be recovered from will just affect the database, and backward error recovery will be feasible and should suffice. Such a system is typically supporting a set of activities that are competing for access to a shared database, but which are otherwise essentially independent of each other--in such circumstances conventional database transaction processing and distributed protocols enable backward recovery to be provided very effectively. But in more general systems the multiple activities will often not simply be competing against each other, but rather will at times be attempting to co-operate with each other, in pursuit of some common goal. Moreover, the activities in decentralised systems typically involve not just computers, but also external entities that are not capable of backward error recovery. Such additional complications make the task of error recovery more challenging, and indeed more interesting. This paper provides a brief analysis of the consequences of various such complications, and outlines some recent work on advanced error recovery techniques that they have motivated.
Tatsuhiro TSUCHIYA Tomoya KAJIKAWA Tohru KIKUNO
The SDP (Sum of Disjoint Products) approach is a well-known technique for computing network reliability measures. So far several algorithms have been developed based on this approach. In this letter, we present a general framework for parallelization of these SDP algorithms. Based on the framework, we implemented a parallel version of an SDP algorithm called CAREL on a network of workstations. Experimental results show that it works fairly well with almost linear speedups.
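As a rough illustration of the SDP idea itself, the sketch below computes two-terminal reliability from a list of minimal paths by making each path's contribution disjoint from all earlier paths. It is a plain sequential sketch using single-variable inversion. It is not CAREL and not the parallel framework of this letter. The function names, the edge labels, and the assumption of independent edge failures are all illustrative.

```python
from math import prod

def exclude_path(term, path):
    """Split `term AND NOT(all edges of path up)` into disjoint product terms.

    A term maps edge -> True (up) or False (down).
    """
    if any(term.get(e) is False for e in path):
        return [term]  # the path is already down inside this term
    free = sorted(e for e in path if e not in term)
    pieces, prefix = [], dict(term)
    for e in free:
        # Earlier free edges are up and this one is down, so pieces are disjoint.
        pieces.append({**prefix, e: False})
        prefix[e] = True
    return pieces  # empty when the term already implies the whole path

def sdp_reliability(paths, p_up):
    """Probability that at least one path has all of its edges up."""
    total = 0.0
    for i, path in enumerate(paths):
        terms = [{e: True for e in path}]
        for earlier in paths[:i]:
            terms = [piece for t in terms for piece in exclude_path(t, earlier)]
        total += sum(
            prod(p_up[e] if up else 1.0 - p_up[e] for e, up in t.items())
            for t in terms
        )
    return total

# Hypothetical bridge network: a = s-1, b = s-2, c = 1-t, d = 2-t, e = 1-2.
bridge_paths = [{"a", "c"}, {"b", "d"}, {"a", "e", "d"}, {"b", "e", "c"}]
p_up = {edge: 0.9 for edge in "abcde"}
bridge_reliability = sdp_reliability(bridge_paths, p_up)
```

For this bridge with every edge up with probability 0.9, the result agrees with the closed-form value 2p^2 + 2p^3 - 5p^4 + 2p^5 = 0.97848.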
Raphael ROCHET Regis LEVEUGLE Gabriele SAUCIER
Synthesis tools are now extensively used in the VLSI circuit design process. They allow a much higher design productivity, but the designer often does not directly control the circuit structure. Thus, when circuits are intended for dependable applications, designers have difficulty manually implementing the devices needed to obtain fault detection or fault tolerance capabilities. The ASYL-SdF System has been developed over the last few years in order to avoid this break in the design flow, and to facilitate the designer's work when dependability is targeted. This paper gives an overview of the resulting tool, its synthesis flow for fault detection and fault tolerance in Finite State Machines, its limitations, and current developments. Actual circuit implementation results are given in terms of area overheads, expected reliability, and experimental fault detection coverage.