Toshiyuki YOROZUYA Koji OHASHI Mineo KANEKO
In this paper, we study loop pipeline scheduling problem under given resource assignment (operation to functional unit assignments and data to register assignments), which is one of the key tasks in data-path synthesis based on the assignment solution space exploration. We show an approach using a precedence constraint graph with parametric disjunctive arcs generated from the specified assignment information, and derive a scheduling method using branch-and-bound exploration of the parameter space. As an application of the proposed scheduling method, it is incorporated with Simulated-Annealing (SA) based exploration of assignment solution space, and it is demonstrated that data-paths of the fifth-order elliptic wave filter are successfully synthesized.
The impact of clock-skew on circuit timing increases rapidly as technology scales. As a result, it becomes important to deal with clock-skew at the early stages of circuit designs. This paper presents a novel datapath design that aims at mitigating the impact of clock-skew in high-level synthesis, by integrating margin (evaluated as the maximum number of clock-cycles to absorb clock-skew) and ordered clocking into high-level synthesis tasks. As a first attempt to the proposed datapath design, this paper presents a 0-1 integer linear programming formulation that focuses on register binding to achieve the minimum cost (the minimum number of registers) under given scheduling result. Experimental results show the optimal results can be obtained without increasing the latency, and with a few extra registers compared to traditional high-level synthesis design.
In this paper, we propose an effective asynchronous datapath synthesis system to optimize statistical performance of asynchronous systems. The proposed algorithm is a heuristic method which simultaneously performs scheduling and resource binding. During the design process, decisions will be made based on the statistical schedule length analysis. It is demonstrated that asynchronous datapaths with the reduced mean total computation time are successfully synthesized for some datapath synthesis benchmarks.
Mineo KANEKO Kimihiko KAZUI Hiroaki KUNIEDA
An optimum placement of capacitors in the layout of Switched Capacitor networks is presented in this paper. The performance of integrated circuits is generally degraded by perturbations of physical parameters of each device and parasitic strays. The optimality imposed in this paper is the minimum degradation of a transfer function with respect to the distribution of capacitance values. A capacitance value per unit area fabricated on a LSI chip is assumed to be perturbed linearly with its x and y coordinates. The capacitor placement is determined so that the effects of such perturbation of capacitances to the overall transfer-characteristics are canceled. As the result, input-output transfer function will stay nominal under the linear perturbation model with arbitrary gradients.
Mineo KANEKO Hiroyuki MIYAUCHI
In this paper, we present Branching Oriented System Equation based on-line error correction scheme for recursive digital signal processing. The target digital signal processing is linear and time-invariant, and the algorithm includes multiplications with constant coefficient, additions and delays. The difficulties of the algorithm-level fault tolerance for such algorithm without structural regularity include error distribution problem and right timing of error correction. To escape the error distribution problem, multiple fan-out nodes in an algorithm are specified as the nodes at which error corrections are performed. The Branching Oriented Graph and Branching Oriented System Equation are so introduced to formulate on-line correction schemes based on this strategy. The Branching Oriented Graph is treated as the collection of computation sub-blocks. Applying checksum code independently to each sub-block is our most trivial on-line error correction scheme, and it results in, with appropriate selection of error identification process, TMR in sub-block level. One of the advantages of our method is in the reduction of redundant operations performed by merging some computation sub-blocks. On the other hand, the schedulability of the system is an important issue for our method since our on-line error correction mechanism induces additional data dependencies. In this paper, the schedulability condition and some modifications on the scheme are also discussed.
Tomohiko OHTSUKA Hiroaki KUNIEDA Mineo KANEKO
This paper describes a new approach towards the performance-driven layout for analog LSIs. Based on our approach, we developed an automatic performance-driven layout system LIBRA. The performance-driven layout has an advantage that numerical evaluations of performance requirements may exactly specify layout requirements so that a better layout result will be expected with regard to both the size and the performances. As the first step to the final goal, we only concern with the DC characteristics of analog circuits affected by the placement and routing. First of all, LIBRA performs the sensitivity analysis with respect to process parameters and wire parasitics, which are major causes for DC performance deviations of analog LSIs, so as to describe every perfomance deviation by its first order approximation. Based on the estimations of those performance deviations, LIBRA designs the placement of devices. The placement approach here is the simulated annealing method driven by their circuit performance specification. The routing of inter-cell wires is performed according to the priority of the larger total wire sensitivities in the net by the maze router. Then, the simple compaction eliminates the empty space as much as possible. After that, the power lines optimization is performed so as to minimize the ferformance deviations. Finally, an advantage of the performance improvement by our approach is demonstrated by showing a layout result of a practical bipolar circuit and its excellent performance evaluations.
Tsuyoshi ISSHIKI Hiroaki KUNIEDA Mineo KANEKO
This paper proposes a designing algorithm for quadrilateral recursive filters which consist of four quarter-plane filters in the four quadrants. This can realize a perfect zero-phase filtering which is essential for image processing. Furthermore, several parallel processing algorithms capable of performing under very high parallel efficiency are developed on line-connected and mesh-connected processor arrays. By these proposals, the advantage of two-dimensional non-causal zero-phase recursive digital filters is made clear.
This paper treats the data routing problem for fault-tolerant systolic arrays based on Triple Modular Redundancy (TMR) in mixed spatial-temporal domain. The number of logical links required in TMR systolic array is basically 9 times larger than the one for corresponding non-fault-tolerant systolic array. The link sharing is a promising method for reducing the number of physical links, which may, however, degrade the fault tolerance of TMR system. This paper proposes several robust data-routing and resource-sharing (plural data transfers share a physical link, or a data transfer and a computational task share a PE as a relay node for the former and as a processor for the latter), by which certain classes of fault tolerant property will be guaranteed. A stage and a dominated set are introduced to characterize the features of routing/resource-sharing in TMR systems, and conditions on the dominated set and their resultant fault-tolerant properties are derived.
This paper treats a fault detection/location of multi-processor systems, and we present a checking scheme based on Modified Processor-Data (MPD) graph with considering an error generation/propagation model for Algorithm-Based Fault Tolerant (ABFT) systems. The error propagation model considered here allows a computation result with multiple (more than one) erroneous inputs to be either erroneous or error-free. Also a basic algorithm for constructing checks for single-fault locatable/two-fault detectable ABFT systems based on the checking scheme is described with design examples.
Ben CHEN Mahoki ONODA Mineo KANEKO
With the development of LSI technologies, conventional circuit simulation using only single type of method has become unsatisfactory, i.e. circuit-level analysis based on device model spends much simulation time and relaxation methods have the problems on their accuracy. Therefore, it is necessary to develop the better methods to realize both high-speed and good accuracy. In this paper, a mixed-mode circuit-timing simulation method has been studied. It uses a new kind of automatic circuit partition approach--dynamic circuit partition process based on checking coupling factors between circuit nodes at every time point for better convergence. This method is based on examining the characteristic of circuit equations rather than circuit topology or function blocks. A mixed-mode simulation program--MMAPC for transient analysis of CMOS large-scale circuit has been developed and some simulation examples have been performed. The results show that MMAPC can be more than two orders of magnitude faster than a standard" circuit-level simulator (SPICE-like) while providing comparable waveform accuracy, and has better convergence property than general timing-level simulators.
Keisuke INOUE Mineo KANEKO Tsuyoshi IWAGAKI
For recent and future nanometer-technology VLSIs, static and dynamic delay variations become a serious problem. In many cases, the hold timing constraint, as well as the setup timing constraint, becomes critical for latching a correct signal under delay variations. While the timing violation due to the fail of the setup timing constraint can be fixed by tuning a clock frequency or using a delayed latch, the timing violation due to the fail of the hold timing constraint cannot be fixed by those methods in general. Our approach to delay variations (in particular, the hold timing constraint) proposed in this paper is a novel register assignment strategy in high-level synthesis, which guarantees safe clocking by Backward-Data-Direction (BDD) clocking. One of the drawbacks of the proposed register assignment is the increase in the number of required registers. After the formulation of this new register minimization problem, we prove NP-hardness of the problem, and then derive an integer linear programming formulation for the problem. The proposed method receives a scheduled data flow graph, and generates a datapath having (1) robustness against delay variations, which is ensured by BDD-based register assignment, and (2) the minimum possible number of registers. Experimental results show the effectiveness of the proposed method for some benchmark circuits.
As well as the schedule affects system performance, the control skew, i.e., the arrival time difference of control signals between registers, can be utilized for improving the system performance, enhancing robustness against delay variations, etc. The simultaneous optimization of the control step assignment and the control skew assignment is more powerful technique in improving performance. In this paper, firstly, we prove that, even if the execution sequence of operations which are assigned to the same resource is fixed, the simultaneous optimization problem under a fixed clock period is NP-hard. Secondly, we propose a heuristic algorithm for the simultaneous control step and skew optimization under given clock period, and we show how much the simultaneous optimization improves system performance. This paper is the first one that uses the intentional skew to shorten control steps under a specified clock period. The proposed algorithm has the potential to play a central role in various scenarios of skew-aware high level synthesis.
Fernando Gil V. RESENDE Jr. Keiichi TOKUDA Mineo KANEKO
A new adaptive AR spectral estimation method is proposed. While conventional least-squares methods use a single windowing function to analyze the linear prediction error, the proposed method uses a different window for each frequency band of the linear prediction error to define a cost function to be meinemized. With this approach, since time and frequency resolutions can be traded off throughout the frequency spectrum, an improvement on the precision of the estimates is achieved. In this paper, a wavelet-like time-frequency resolution grid is used so that low-frequency components of the linear prediction error are analyzed through long windows and high-frequency components are analyzed through short ones. To solve the optimization problem for the new cost function, special properties of the correlation matrix are used to derive an RLS algorithm on the order of M2, where M is the number of parameters of the AR model. Computer simulations comparing the performance of conventional RLS and the proposed methods are shown. In particular, it can be observed that the wavelet-based spectral estimation method gives fine frequency resolution at low frequencies and sharp time resolution at high frequencies, while with conventional methods it is possible to obtain only one of these characteristics.
Mineo KANEKO Hiroyuki MIYAUCHI
A systematic procedure to configure faulttolerant systolic arrays based on Triplicated Triple Modular Redundancy is proposed. The design procedure consists of the triplication of the dependence graph which is formed from a target regular algorithm and the transformation onto physical time-processor domain. The resultant systolic arrays tolerate failures not only on processing elements but also on communication links. While it needs sophisticated connection scheme between processing elements to guarantee the fault-tolerance on communication links, the link complexity is possibly reduced by optimizing redundant operation scheme. Unconstrained and constrained link minimization problems are introduced, and the possibility and the constraints required for link complexity reduction are investigated.
A mixed storage-type design using flip-flops and latches (FF/latch-based design) has advantages on such as area and power compared to single storage-type design (only flip-flops or latches). Considering FF/latch-based design at high-level synthesis is necessary, because resource binding process significantly affects the quality of resulting circuits. One of the fundamental aspects in FF/latch-based design is that different resource binding solutions could lead to the different numbers of latch-replacable registers. Therefore, as a first step, this paper addresses a datapath design problem in which resource binding and selecting storage-types of registers are simultaneously optimized for datapath area minimization (i.e., latch replacement maximization). An efficient algorithm based on the compatibility path decomposition and an integer linear programming-based exact approach are presented. Experiments confirm the effectiveness of the proposed approaches.
Tsuyoshi IWAGAKI Eiri TAKEDA Mineo KANEKO
This paper proposes a test scheduling method for stuck-at faults in a CHAIN interconnect, which is an asynchronous on-chip interconnect architecture, with scan ability. Special data transfer which is permitted only during test, is exploited to realize a more flexible test schedule than that of a conventional approach. Integer linear programming (ILP) models considering such special data transfer are developed according to the types of modules under test in a CHAIN interconnect. The obtained models are processed by using an ILP solver. This framework can not only obtain optimal test schedules but also easily introduce additional constraints such as a test power budget. Experimental results using benchmark circuits show that the proposed method can reduce test application time compared to that achieved by the conventional method.
Keisuke INOUE Mineo KANEKO Tsuyoshi IWAGAKI
As the feature size of VLSI becomes smaller, delay variations become a serious problem in VLSI. In this paper, we propose a novel class of robustness for a datapath against delay variations, which is named structural robustness against delay variation (SRV), and propose sufficient conditions for a datapath to have SRV. A resultant circuit designed under these conditions has a larger timing margin to delay variations than previous designs without sacrificing effective computation time. In addition, under any degree of delay variations, we can always find an available clock frequency for a datapath having SRV property to operate correctly, which could be a preferable characteristic in IP-based design.
Fernando Gil V. RESENDE Jr. Keiichi TOKUDA Mineo KANEKO Akinori NISHIHARA
A new structure for adaptive AR spectral estimation based on multi-band decomposition of the linear prediction error is introduced and the mathematical background for the soulution of the related adaptive filtering problem is derived. The presented structure gives rise to AR spectral estimates that represent the true underlying spectrum with better fidelity than conventional LS methods by allowing an arbitrary trade-off between variance of spectral estimates and tracking ability of the estimator along the frequency spectrum. The linear prediction error is decomposed through a filter bank and components of each band are analyzed by different window lengths, allowing long windows to track slowly varying signals and short windows to observe fastly varying components. The correlation matrix of the input signal is shown to satisfy both time-update and order-update properties for rectangular windowing functions, and an RLS algorithm based on each property is presented. Adaptive forward and backward relations are used to derive a mathematical framework that serves as a basis for the design of fast RLS alogorithms. Also, computer experiments comparing the performance of conventional and the proposed multi-band methods are depicted and discussed.
This paper addresses a high-level synthesis (HLS) using dual-edge-triggered flip-flops (DETFFs) as memory elements. In DETFF-based HLS, the duty cycle becomes a manageable resource to improve the timing performance. To utilize the duty cycle radically, a programmable duty cycle (PDC) mechanism is built into this HLS, and captured by a new HLS task named PDC scheduling. As a first step toward DETFF-based HLS with PDC, the execution time minimization problem is formulated for given results of operation scheduling. A linear program is presented to solve this problem in polynomial time. As a next step, simultaneous operation scheduling and PDC scheduling problem for the same objective is tackled. A mixed integer linear programming-based (MILP) approach is presented to solve this problem. The experimental results show that the MILP can reduce the execution time for several benchmarks.