Makoto SUGIHARA Taiga TAKATA Kenta NAKAMURA Ryoichi INANAMI Hiroaki HAYASHI Katsumi KISHIMOTO Tetsuya HASEBE Yukihiro KAWANO Yusuke MATSUNAGA Kazuaki MURAKAMI Katsuya OKUMURA
We propose a cell library development methodology for throughput enhancement of character projection equipment. First, an ILP (Integer Linear Programming)-based cell selection is proposed for the equipment for which both of the CP (Character Projection) and VSB (Variable Shaped Beam) methods are available, in order to minimize the number of electron beam (EB) shots, that is, time to fabricate chips. Secondly, the influence of cell directions on area and delay time of chips is examined. The examination helps to reduce the number of EB shots with a little deterioration of area and delay time because unnecessary directions of cells can be removed. Finally, a case study is shown in which the numbers of EB shots are shown for several cases.
Tsuyoshi SADAKATA Yusuke MATSUNAGA
A Multi-Functional unit has several functions and these can be changed with a control signal. For High-Level Synthesis, using Multi-Functions units in operation chaining make it possible to obtaining the solution with the same number of control steps and less resources compared to that without them. This paper proposes an operation chaining method considering Multi-Functional units. The method formulates module selection, scheduling, and functional unit allocation with operation chaining as a 0/1 integer linear problem and obtains optimal solution with minimum number of control steps under area and clock-cycle type constraints. The first contribution of this paper is to propose the global search for operation chaining with Multi-Functional units having multiple outputs as well as with single output. The second contribution is to condier the area constraint as a resource constraint instead of the type and number of functional units. Experimental results show that chaining with Multi-Functional units is effective and the proposed method is useful to evaluate heuristic algorithms.
The index generation function is a multi-valued logic function which checks if the given input vector is a registered or not, and returns its index value if the vector is registered. If the latency of the operation is critical, dedicated hardware is used for implementing the index generation functions. This paper proposes a method implementing the index generation functions using parallel index generator. A novel and efficient algorithm called ‘conflict free partitioning’ is proposed to synthesize parallel index generators. Experimental results show the proposed method outperforms other existing methods. Also, A novel architecture of index generator which is suitable for parallelized implementation is introduced. A new architecture has advantages in the sense of both area and delay.
In this paper, an exact algorithm for finding minimum test set which detects all testable stuck-at faults of a given combinational circit is presented. So far several heuristic algorithms for this problem are proposed, but no efficient exact algorithms are known. To solve this exactly, minimum test set problem is formalized as a minimum set covering problem, and then implicit manipulation technique using binary decision diagrams(BDDs) is applied. The algorithm presented in the paper has two contributions. One is utilization of maximal compatible fault set, which can drastically reduce the number of candidates for minimum test set. A new BDD based algorithm for extracting all maximal compatible fault sets is shown. The other is a new implicit manipulation technique handling with huge covering matrix. Actually, the algorithm using this technique can handle minimum set covrering problem with over ten thousand columns in a few minutes. Experiments using ISCAS benchmark circuits show that the algorithm is quite efficient for small(100-300 gates) circuits. A computational complexith of minimum test set problen is much higher than that of ordinary test pattern generation problem, so that practical signifcance of this method is not high. But the algorithm is still useful for evaluation of other heuristic algorithms. furthermore, this implicit manipulation technique can also be applied to other minimumset covering problems.
Makoto SUGIHARA Kenta NAKAMURA Yusuke MATSUNAGA Kazuaki MURAKAMI
The character projection (CP) lithography is utilized for maskless lithography and is a potential for the future photomask fabrication. The drawback of the CP lithography is its low throughput and leads to a price rise of IC devices. This paper discusses a technology mapping technique for enhancing the throughput of the CP lithography. The number of electron beam (EB) shots to project an entire chip directly determines the fabrication time for the chip as well as the throughput of CP equipment. Our technology mapping technique maps EB shot count-effective cells to a circuit in order to increase the throughput of CP equipment. Our technique treats the number of EB shots as an objective to minimize. Comparing with a conventional technology mapping, our technology mapping technique has achieved 26.6% reduction of the number of EB shots for the front-end-of-the-line (FEOL) process without any performance degradation of ICs. Moreover, our technology mapping technique has achieved a 54.6% less number of EB shots under no performance constraints. It is easy for both IC designers and equipment developers to adopt our technique because our technique is a software approach with no additional modification on CP equipment.
Taeko MATSUNAGA Shinji KIMURA Yusuke MATSUNAGA
Multi-operand adders, which calculates the summation of more than two operands, usually consist of compressor trees which reduce the number of operands to two without any carry propagation, and a carry-propagate adder for the two operands in ASIC implementation. The former part is usually realized using full adders or (3;2) counters like Wallace-trees in ASIC, while adder trees or dedicated hardware are used in FPGA. In this paper, an approach to realize compression trees on FPGAs is proposed. In case of FPGA with m-input LUT, any counters with up to m inputs can be realized with one LUT per an output. Our approach utilizes generalized parallel counters (GPCs) with up to m inputs and synthesizes high-performance compressor trees by setting some intermediate height limits in the compression process like Dadda's multipliers. Experimental results show that the number of GPCs are reduced by up to 22% compared to the existing heuristic. Its effectivity on reduction of delay is also shown against existing approaches on Altera's Stratix III.
Recent technology mappers for LUT based FPGAs employ cut enumeration. Although many cuts are often needed to find a good network, enumerating all the cuts with large size consumes a lot of run-time. Existing algorithms employ the bottom-up merging which calculates Cartesian products of the fanins' cuts for each node. The number of cuts is much smaller than the size of the Cartesian products in most cases. Thus, the existing algorithms are inefficient. Furthermore, the number of cuts exponentially increases with the size of cuts, that makes the run-time much longer. Several algorithms to enumerate not all the cuts but partial cuts have been presented, but they tend to disturb the quality of networks. This paper presents two algorithms to enumerate cuts; an exhaustive enumeration and a partial enumeration. Both of them are efficient because they do not employ the bottom-up merging. The partial enumeration reduces the number of enumerated cuts with a guarantee that a depth-minimum network can be constructed. The experimental results show that the exhaustive enumeration runs about 5 and 13 times faster than the existing bottom-up algorithm for K=8, 9 respectively, while keeping the same results. On the other hand, the partial enumeration runs about 9 and 29 times faster than the existing algorithm for K = 8, 9, respectively. The average area of networks derived by the sets of cuts enumerated by the partial enumeration is only 4% larger than that derived with using all the cuts, and the depth is the same.
Makoto SUGIHARA Kazuaki MURAKAMI Yusuke MATSUNAGA
In this paper, a test architecture optimization for system-on-a-chip under floorplanning constraints is proposed. The models of previous test architecture optimizations were too ideal to be applied to industrial SOCs. To make matters worse, they couldn't treat topological locality of cores, that is, floorplanning constraints. The optimization proposed in this paper can avoid long wires for TAMs in consideration of floorplanning constraints and finish optimizing test architectures within reasonable computation time.
Tsuyoshi SADAKATA Yusuke MATSUNAGA
This paper proposes a novel Behavioral Synthesis method that tries to reduce the number of clock cycles under clock cycle time and total functional unit area constraints using special functional units efficiently. Special functional units are designed to have shorter delay and/or smaller area than the cascaded basic functional units for specific operation patterns. For example, a Multiply-Accumulator is one of them. However, special functional units may have less flexibility for resource sharing because intermediate operation results may not be able to be obtained. Hence, almost all conventional methods can not handle special functional units efficiently for the reduction of clock cycles in practical time, especially under a tight area constraint. The proposed method makes it possible to solve module selection, scheduling, and functional unit allocation problems using special functional units in practical time with some heuristics. Experimental results show that the proposed method has achieved maximally 33% reduction of the cycles for a small application and 14% reduction for a realistic application in practical time.
This paper presents a test pattern compaction algorithm applicable for large scale circuits. The proposed methods formalizes the test pattern compaction problem as a problem finding minimum set of compatible fault groups. Also, an efficient algorithm checking compatibility of fault group is proposed. The experimental results show that the proposed algorithm achieves similar or better results against a couple of existing methods, especially for middle circuits.
Makoto SUGIHARA Yusuke MATSUNAGA Kazuaki MURAKAMI
Character projection (CP) lithography is utilized for maskless lithography and is a potential for the future photomask manufacture because it can project ICs much faster than point beam projection or variable-shaped beam (VSB) projection. In this paper, we first present a projection mask set development methodology for multi-column-cell (MCC) systems, in which column-cells can project patterns in parallel with the CP and VSB lithographies. Next, we present an INLP (integer nonlinear programming) model as well as an ILP (integer linear programming) model for optimizing a CP mask set of an MCC projection system so that projection time is reduced. The experimental results show that our optimization has achieved 33.4% less projection time in the best case than a naive CP mask development approach. The experimental results indicate that our CP mask set optimization method has virtually increased cell pattern objects on CP masks and has decreased VSB projection so that it has achieved higher projection throughput than just parallelizing two column-cells with conventional CP masks.
Taeko MATSUNAGA Shinji KIMURA Yusuke MATSUNAGA
Multi-operand adders that calculate the summation of more than two operands usually consist of compressor trees, which reduce the number of operands to two without any carry propagation, and carry-propagate adders for the two operands in the ASIC implementation. Compressor trees that consist of full adders and half adders cannot be implemented efficiently on LUT-based FPGAs, and carry-chains or dedicated structures have been utilized to produce multi-operand adders on FPGAs. Recent studies indicate that compressor trees can be implemented efficiently on LUTs using Generalized Parallel Counters (GPCs) as the building blocks of compressor trees. This paper addresses the problem of synthesizing compressor trees based on GPCs. Based on the observation that characteristics such as the area, power, and delay correlate roughly to the total number and the maximum level of GPCs, the target problem can be regarded as a minimization problem for the total number of GPCs and the maximum levels of the GPCs, for which an ILP-based approach is proposed. The key point of our formulation is not to model the problem based on the structures of compressor trees like the existing approach, but instead the compression process itself is used to reduce the number of variables and constraints in the ILP formulation. The experimental results demonstrate the advantage of our formulation in terms of the quality and runtime.
The paper describes a new algorithm for Boolean matching, which is based on BDD structure manipulation. Pruning of the search space takes place after partial assignments if certain subgraphs of two BDD's become inequivalent. This pruning is different from existing techniques, so that the search space is further reduced. Another feature of this algorithm is topological filtering. Usually, many functions have no matchings and this is easily found by only counting the number of minterms. To check it quickly, upper and lower bounds of minterm count are calculated from topological information. Using these bounds, functions that have no matchings are discarded without building their BDD's.
Though tree covering is an efficient algorithm for technology mapping, phase assignments on tree boundaries are not taken into consideration. Several inverter minimization algorithms have been proposed so far, but they do phase optimization before or after technology mapping, and their cost function is not to minimize the total area but to minimize the number of inverters. This paper describes a new formulation of phase optimization problem aiming to minimize the total area during the technology mapping. Cost function representing area according to each phase assignment is introduced, and tree covering algorithm is modified to handle that cost function. Edge-Valued Binary Decision Diagram is used to represent the function implicitly. Experimental results show that proposed method reduces about 10% area on average compared with a state-of-the-art logic synthesis system sis.
This paper describes two speed-up techniques for Boolean matching of LUT-based circuits. One is one-hot encoding technique for variables representing input assignments. Though it requires more variables than existing binary encoding technique, almost all added clauses using one-hot encoding are binary clauses, which are suitable for efficient Boolean constraint propagation. The other is CEGAR (counter example guided abstraction refinement) technique which reduces the CPU time significantly. With both techniques, we can solve Boolean matching problem with 9 input function in 20 milliseconds on average, which is faster than the existing algorithms more than one order of magnitude.
Qiang ZHU Yusuke MATSUNAGA Shinji KIMURA Katsumasa WATANABE
Combinational logic circuits are usually implemented as multi-level networks of logic nodes. Multi-level logic simplification using the don't cares on each node is widely used. Large don't cares give good simplification results, but suffer from huge memory area and computation time. Extraction of useful don't cares and reduction of the size of the don't cares are important problems on the simplification using don't cares. In the paper, we propose a new robust heuristic method for the selection of don't cares. We consider an adaptive subnetwork for each simplified node in the network and introduce a stepwise enhancement method of the subnetwork considering the memory area and the network structure. The don't cares extracted from the adaptive subnetworks are called the local don't cares. We have implemented our method for satisfiability don't cares and observability don't cares. We have applied the method on MCNC89 benchmarks, and compared the experimental results with those of the SIS system. The results demonstrate the superiority of our method on the quality of the results and on the size of applicable circuits.
Functional decomposition is an essential technique of logic synthesis and is important especially for FPGA design. Bertacco and Damiani proposed an efficient algorithm finding simple disjoint decomposition using Binary Decision Diagrams (BDDs). However, their algorithm is not complete and does not find all the decompositions. This paper presents a complete theory of simple disjoint decomposition and describes an efficient algorithm using BDDs.
Hiroyuki HIGUCHI Yusuke MATSUNAGA
This paper proposes a heuristic algorithm for state minimization of incompletely specified finite state machines (FSMs). The strategy is similar to that in ESPRESSO, a wellknown heuristic algorithm for two-level logic minimization. It consists of generating an initial solution, the set of maximal compatibles, and attempting to apply a series of transformations to the solution. The main transformation is to reduce each compatible in the solution and delete unnecessary compatibles by iterative improvements. Other transformations, such as expansion and merging of compatibles, are also introduced for further reduction. When the number of compatibles is likely to be too large to handle explicitly, they are represented by a Binary Decision Diagram. Experimental results show that the proposed method finds better solutions in shorter CPU times for most of the examples than conventional methods.