Takumi WATANABE Yusuke OHTOMO Kimihiro YAMAKOSHI Yuichiro TAKEI
This paper presents a routing methodology and a routing algorithm for designing Gb/s LSIs in deep-submicron technology. Wire width and spacing are controlled for net groups classified according to wire length and maximum-allowable-delay constraints. A high-performance router using this method has been developed; it handles variable wire widths, variable spacing, wire shape control, and low-delay routing. For multi-terminal net routing, a modified variable-cost maze routing (GVMR) is effective in reducing wire capacitance (net length) and net delay. The methodology has been used to design an ATM-switch LSI in 0.25-µm CMOS/SIMOX technology. The LSI has a throughput of 40 Gb/s (2.5 Gb/s/pin) and an internal clock frequency of 312 MHz.
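The abstract does not spell out GVMR, so the following is only a generic sketch, assuming a routing grid whose per-cell cost could encode wire length, width-dependent capacitance, or congestion. It shows variable-cost maze routing as Dijkstra expansion, plus a common multi-terminal extension that repeatedly routes from the whole partial tree to the nearest unconnected terminal, so existing wiring is reused. The function names `maze_route` and `route_multi_terminal` are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (row, column) on the routing grid


def maze_route(cost: Sequence[Sequence[float]], sources: Set[Cell],
               target: Cell) -> Optional[List[Cell]]:
    """Dijkstra-style maze expansion from a set of source cells to one target.

    cost[r][c] is the (hypothetical) cost of entering cell (r, c);
    float("inf") marks a blocked cell. Returns the path source -> target.
    """
    rows, cols = len(cost), len(cost[0])
    dist: Dict[Cell, float] = {s: 0.0 for s in sources}
    prev: Dict[Cell, Cell] = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist[cell]:
            continue  # stale entry superseded by a cheaper one
        if cell == target:
            path = [cell]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = cell
                    heapq.heappush(heap, (nd, nxt))
    return None


def route_multi_terminal(cost: Sequence[Sequence[float]],
                         terminals: Sequence[Cell]) -> Tuple[Set[Cell], float]:
    """Grow a routing tree by connecting the nearest unconnected terminal each step.

    Cells already in the tree are zero-cost sources, so new branches reuse
    existing wiring instead of running parallel to it.
    """
    tree: Set[Cell] = {terminals[0]}
    remaining = list(terminals[1:])
    total = 0.0
    while remaining:
        best_path, best_cost = None, float("inf")
        for t in remaining:
            path = maze_route(cost, tree, t)
            if path is None:
                continue
            path_cost = sum(cost[r][c] for r, c in path[1:])  # path[0] is a tree cell
            if path_cost < best_cost:
                best_path, best_cost = path, path_cost
        if best_path is None:
            raise ValueError("a terminal cannot be reached")
        tree.update(best_path)
        remaining.remove(best_path[-1])
        total += best_cost
    return tree, total
```

On a uniform 3 x 3 grid with terminals at (0, 0), (0, 2) and (2, 0), the second branch starts from the existing tree rather than from a fixed pin, giving a 5-cell tree with total cost 4.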
Ashraf A. M. KHALAF Kenji NAKAYAMA
Time series prediction is an important technology in a wide variety of fields. Real time series contain both linear and nonlinear properties, and the amplitude of the series to be predicted usually takes continuous values. For these reasons, we combine nonlinear and linear predictors in a cascade form. The nonlinear prediction problem is reduced to pattern classification: a set of past samples x(n-1), ..., x(n-N) is transformed into the output, which is the prediction of the next sample x(n). We therefore employ a multilayer neural network with a sigmoidal hidden layer and a single linear output neuron for nonlinear prediction; it is called a Nonlinear Sub-Predictor (NSP). The NSP is trained by a supervised learning algorithm using the sample x(n) as the target. However, it is rather difficult for the NSP to generate continuous amplitudes and to predict linear properties, so we place a linear predictor after the NSP. An FIR filter, called a Linear Sub-Predictor (LSP), is used for this purpose; the LSP is also trained by supervised learning with x(n) as the target. To estimate the minimum size of the proposed predictor, we analyze the nonlinearity of the time series of interest. Prediction amounts to mapping a set of past samples to the next sample, and multilayer neural networks are good at this kind of pattern mapping. Difficult mappings may still exist, however, when several sets of very similar patterns are mapped onto very different samples. The degree of difficulty of the mapping is closely related to the nonlinearity, which determines the necessary number of past samples: a difficult mapping requires a large number of past samples. Computer simulations using sunspot data and artificially generated discrete-amplitude data have demonstrated the efficiency of the proposed predictor and of the nonlinearity analysis.
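One crude way to make the link between mapping difficulty and the number of past samples concrete is to count pairs of nearly identical past-sample patterns whose next samples differ strongly, and to take the smallest N for which no such conflict remains. The sketch below does exactly that; the conflict criterion, the thresholds `eps` and `delta`, and the function names are hypothetical and are not the authors' nonlinearity measure.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def patterns(x: Sequence[float], n_past: int) -> List[Tuple[Tuple[float, ...], float]]:
    """Pairs (past pattern, target): pattern (x(n-1), ..., x(n-N)), target x(n)."""
    return [(tuple(x[n - k] for k in range(1, n_past + 1)), x[n])
            for n in range(n_past, len(x))]


def count_conflicts(x: Sequence[float], n_past: int,
                    eps: float = 1e-9, delta: float = 0.5) -> int:
    """Count pattern pairs closer than eps whose targets differ by more than delta."""
    conflicts = 0
    for (p, t), (q, u) in combinations(patterns(x, n_past), 2):
        distance = max(abs(a - b) for a, b in zip(p, q))  # Chebyshev distance
        if distance <= eps and abs(t - u) > delta:
            conflicts += 1
    return conflicts


def min_past_samples(x: Sequence[float], max_n: int = 10,
                     eps: float = 1e-9, delta: float = 0.5) -> Optional[int]:
    """Smallest N for which no conflicting patterns remain (None if above max_n)."""
    for n_past in range(1, max_n + 1):
        if count_conflicts(x, n_past, eps, delta) == 0:
            return n_past
    return None
```

For the period-4 sequence 0, 1, 0, -1, ..., one past sample is ambiguous (0 is followed by both 1 and -1), whereas two past samples resolve every pattern, so `min_past_samples` returns 2.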
Adam Icarus IMORO Ippo AOKI Naoki INAGAKI Nobuyoshi KIKUMA
A more judicious choice of trial functions for applying the Improved Circuit Theory (ICT) to multi-element antennas is achieved. These new trial functions, based on Tai's modified variational implementation for single-element antennas, lead to an ICT implementation applicable to much longer coplanar dipole arrays. The generalized impedance formulas agree well with the method of moments. Moreover, all these generalized formulas, including the radiation pattern expressions, are in closed form. The resulting ICT implementation therefore still requires much less CPU time and computer storage than the method of moments. Thus, for coplanar dipole arrays, the proposed implementation is a very efficient method and should prove useful in applications such as CAD/CAE systems.
Iman TRIONO Naoya OHTA Kenichi KANATANI
We implement a graphical interface that automatically transforms a figure input with a mouse into the regular figure that the system infers to be closest to the input. The difficulty is that the classes into which the input is to be classified have inclusion relations, which prevent us from using a simple distance criterion. In this letter, we show that this problem can be resolved by introducing the geometric AIC.
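A small illustration of why the geometric AIC handles inclusion relations: a horizontal line is a special case of a general line, so the general fit always has a residual at least as small, and a pure distance criterion would never choose the horizontal class. The geometric AIC, J + 2(dN + n)ε², adds a penalty that grows with the number of model parameters n (here d = 1 for curves, N is the number of points, ε the noise level). The two classes, the assumption that the noise level is known, and the function names are choices made for this sketch; the letter's actual figure classes are not listed in the abstract.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def residual_horizontal(pts: Sequence[Point]) -> float:
    """Squared residuals for the best horizontal line y = c (1 parameter)."""
    my = sum(y for _, y in pts) / len(pts)
    return sum((y - my) ** 2 for _, y in pts)


def residual_general(pts: Sequence[Point]) -> float:
    """Squared perpendicular residuals for the best general line (2 parameters).

    This equals the smallest eigenvalue of the 2x2 scatter matrix of the points.
    """
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return (sxx + syy) / 2 - math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)


def geometric_aic(residual: float, n_points: int, dim: int,
                  n_params: int, noise: float) -> float:
    """Geometric AIC: J + 2 (d N + n) eps^2."""
    return residual + 2 * (dim * n_points + n_params) * noise ** 2


def classify_stroke(pts: Sequence[Point], noise: float) -> str:
    """Pick the class with the smaller geometric AIC (ties favour the special class)."""
    n = len(pts)
    aic_h = geometric_aic(residual_horizontal(pts), n, 1, 1, noise)
    aic_g = geometric_aic(residual_general(pts), n, 1, 2, noise)
    return "horizontal" if aic_h <= aic_g else "general"
```

A slightly wobbly horizontal stroke is accepted as horizontal because its extra residual is smaller than the 2ε² saved by the simpler model, while points on y = x are classified as a general line.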
Keiichi KOIKE Kenji KAWAI Akira ONOZAWA Yuichiro TAKEI Yoshiji KOBAYASHI Haruhiko ICHINO
A computer-aided low-power design methodology for very-high-speed Si bipolar standard-cell LSIs is described. To achieve Gbit/s operation, it features a pair of differential clock channels inside the cells and highly accurate static timing analysis for back annotation. A newly developed CAD-based power optimization scheme minimizes cell currents while maintaining circuit speed. A 5.6-kgate SDH signal-processing LSI operating at 1.6 Gbit/s with a power consumption of only 3.9 W demonstrates the effectiveness of this design technology.
Masaru KATAYAMA Atsushi TAKAHARA Toshiaki MIYAZAKI Kennosuke FUKAMI
We propose a propagation delay model for SRAM-based FPGAs. It is a simplified Elmore delay model with a linear fan-out function, so its computational complexity is small. To ensure calculation accuracy, the model parameters are extracted from real layout data. The average error of the model relative to actual delays is 4%. The model is applicable to delay estimation in a router and to static calculation of critical-path delay.
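A minimal sketch of the two ingredients named above, assuming a routing path modelled as an RC chain: the Elmore delay, which for a chain is the sum over each resistor of its resistance times all capacitance downstream of it, plus a term linear in fan-out. The parameters `t_intrinsic` and `k_fanout` are hypothetical placeholders for values that would be extracted from layout data; the exact form of the paper's simplified model is not given in the abstract.

```python
from typing import Sequence


def elmore_delay_chain(resistances: Sequence[float],
                       capacitances: Sequence[float]) -> float:
    """Elmore delay at the far end of an RC chain.

    Node i is reached through resistance R_i and loaded by C_i, so
    delay = sum_i R_i * (C_i + C_{i+1} + ... + C_last).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay, downstream = 0.0, float(sum(capacitances))
    for r, c in zip(resistances, capacitances):
        delay += r * downstream  # R_i charges every capacitance at or beyond node i
        downstream -= c
    return delay


def net_delay(t_intrinsic: float, k_fanout: float, fanout: int,
              resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Hypothetical simplified model: intrinsic delay + linear fan-out term + Elmore wire delay."""
    return t_intrinsic + k_fanout * fanout + elmore_delay_chain(resistances, capacitances)
```

With consistent units (for example kΩ and pF, giving ns), a two-segment chain with R = C = 1 has an Elmore delay of 1·2 + 1·1 = 3. Because each call is a single linear pass over the path, the model stays cheap enough to evaluate inside a router's inner loop.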
Ahmed Riadh BABA-ALI Ahcene FARAH
Signal flow determination in CMOS/VLSI digital circuits is a key issue for switch-level CAD tools such as timing and testability analysers, functional abstractors, ATPGs, and even some simulators. Signal flow determination is used to preprocess the MOS transistors of a circuit in order to improve both the accuracy and the running time of these CAD tools. Existing algorithms can be classified into two main categories, the rule-based approach and the algorithm-based approach, but both have several drawbacks. This paper presents an efficient algorithm based on a novel mixed algorithmic and rule-based approach that overcomes most of the drawbacks of the pure algorithmic and rule-based approaches. It is based on a set of "safe" topological rules rather than ad hoc or technology-dependent ones, while its algorithmic aspect is based on a recursive depth-first search (DFS). Because of this algorithmic aspect, some rules can take global circuit effects such as path information into account. Our approach thus provides the advantages of the rule-based approach (flexibility and adaptability to the great variety of CMOS design styles) as well as those of the algorithmic approach (fast processing and the ability to consider global circuit effects). The resulting software is very accurate: all unidirectional and bidirectional transistors are correctly identified in all the pathological benchmarks reported in the literature.
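The recursive DFS at the core of such an approach can be sketched under a deliberately simple, hypothetical rule set: signals enter at source nodes (supplies and primary inputs) and are observed at sink nodes (gate inputs and outputs). A transistor crossed in only one direction on simple source-to-sink paths is unidirectional, and one crossed both ways is bidirectional. These rules, the netlist encoding, and the function names are assumptions, not the paper's "safe" topological rules.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# A transistor is described only by its channel: (name, terminal_a, terminal_b).
Transistor = Tuple[str, str, str]


def signal_flow(transistors: Sequence[Transistor], sources: Set[str],
                sinks: Set[str]) -> Dict[str, Set[Tuple[str, str]]]:
    """Record each direction in which a transistor is crossed on a simple
    path from a source node to a sink node (hypothetical rule set)."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, a, b in transistors:
        adj[a].append((name, b))
        adj[b].append((name, a))
    directions: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def dfs(node: str, visited: Set[str], path: List[Tuple[str, str, str]]) -> None:
        for name, nxt in adj[node]:
            if nxt in visited or nxt in sources:
                continue  # simple paths only; never pass through a supply or input
            path.append((name, node, nxt))
            if nxt in sinks:
                for t, u, v in path:  # every step on this path carries signal u -> v
                    directions[t].add((u, v))
            visited.add(nxt)
            dfs(nxt, visited, path)
            visited.remove(nxt)
            path.pop()

    for s in sources:
        dfs(s, {s}, [])
    return dict(directions)


def classify_transistors(transistors: Sequence[Transistor], sources: Set[str],
                         sinks: Set[str]) -> Dict[str, str]:
    flow = signal_flow(transistors, sources, sinks)
    kinds = {}
    for name, _, _ in transistors:
        seen = flow.get(name, set())
        if len(seen) == 1:
            kinds[name] = "unidirectional"
        elif len(seen) == 2:
            kinds[name] = "bidirectional"
        else:
            kinds[name] = "undetermined"  # never on a source-to-sink path
    return kinds
```

In an inverter both transistors come out unidirectional, while a pass transistor joining two nodes that are each driven by an inverter comes out bidirectional. Path enumeration is exponential in the worst case, which is one reason practical tools prune the search with rules.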
Kenji TORAZAWA Satoshi SUMI Seiji YONEZAWA Naomi SUZUKI Yasuhito TANAKA Akira TAKAHASHI Yoshiteru MURAKAMI Norio OHTA
Recently, many high-density recording technologies for future MO (Magneto-Optical) storage have been reported. MSR (Magnetically Induced Super Resolution) technology is one of the most promising candidates, and over ten types of MSR technology have already been proposed. However, they have not been well discussed from the viewpoint of total recording technology, which includes the recording and readout methods, the pickup technology, and the signal processing technology. Key technologies for realizing MO storage of over 7 GBytes on a CD-sized disk using a red laser are proposed, and experimental results for each key technology are described. The write/read characteristics of a CAD (Center Aperture Detection)-MSR disk were examined. These characteristics, combined with laser-pumped magnetic field modulation recording, showed that land/groove (0.7 µm width) recording with a linear density of 0.27 µm/bit and a track pitch below 0.7 µm can be realized. It was also shown that the CAD-MSR disk combines well with an OSR (Optical Super Resolution) pickup, laser-pumped readout, and PRML (Partial Response Maximum Likelihood) technologies, all of which are very useful for achieving a high-density MO disk. By combining the CAD-MSR disk with these technologies, high-density write/read with a bit length of 0.2 µm and a track pitch of 0.6 µm should be realizable using a laser of 635 nm wavelength. Applying CAD-MSR disks to a CD-sized MO disk, the capacity exceeds 7 GBytes (format efficiency: 80%), which is 20 times that of a 5.25-inch MO disk and 1.5 times that of a DVD-ROM.
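The capacity claim can be checked with simple arithmetic: one bit occupies bit length × track pitch = 0.2 µm × 0.6 µm = 0.12 µm². The recording band radii below (25 mm to 58 mm, roughly the program area of a CD) are an assumption, since the abstract does not state them.

```python
import math


def mo_disk_capacity_bytes(bit_length_um: float, track_pitch_um: float,
                           r_inner_mm: float, r_outer_mm: float,
                           format_efficiency: float) -> float:
    """User capacity of an annular recording band, in (decimal) bytes."""
    area_um2 = math.pi * (r_outer_mm ** 2 - r_inner_mm ** 2) * 1e6  # 1 mm^2 = 1e6 um^2
    raw_bits = area_um2 / (bit_length_um * track_pitch_um)          # one bit per cell area
    return raw_bits / 8 * format_efficiency


# Assumed recording band: radius 25 mm to 58 mm (approximately a CD's program area).
capacity = mo_disk_capacity_bytes(0.2, 0.6, 25.0, 58.0, 0.80)
print(f"{capacity / 1e9:.1f} GBytes")  # prints 7.2 GBytes
```

Under these assumptions the estimate comes to about 7.2 GBytes, consistent with the "over 7 GBytes" figure and roughly 1.5 times the 4.7 GBytes of a single-layer DVD-ROM.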
Hidenori SATO Hiroaki MATSUDA Akira ONOZAWA
This paper presents a clock routing technique called the Balanced-Mesh Method (BMM), which incorporates the advantages of two well-known conventional clock routing techniques. One is the balanced-tree method (BTM), in which the clock net is routed as a tree so that the clock signal delays are balanced; the other is the fixed-mesh method (FMM), in which the clock net is routed as a fixed mesh driven by a large buffer. In BMM, the clock net is routed as a set of relatively small interconnect meshes driven by relatively small buffers. Each mesh covers an area called a Mesh-Routing Region (MR), within which its delay and skew can be kept within a certain range. These small meshes are connected by a balanced tree whose root is the chip clock source. To implement BMM, we developed an MR-partitioning program, which partitions the circuit into MRs according to a set of predetermined constraints on the number of flip-flops and the area of each MR, and a clock global routing program, which performs the routing of each mesh and of the tree connecting the meshes. We applied BMM to the design of an MPEG2-encoder LSI and achieved a skew of 210 ps. In addition, the experimental results show that BMM yields lower power dissipation than the conventional methods.
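The abstract does not describe how the MR-partitioning program works, so the following is only a hypothetical sketch: recursive bisection of the chip area, cutting across the longer side, until every region satisfies both the flip-flop-count limit and the area limit. The names and the midpoint-bisection strategy are assumptions; a real partitioner would also balance the resulting mesh loads.

```python
from typing import List, Sequence, Tuple

FlipFlop = Tuple[float, float]            # placed (x, y) position
Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1)


def partition_mrs(ffs: Sequence[FlipFlop], box: Box, max_ffs: int,
                  max_area: float, min_side: float = 1e-6
                  ) -> List[Tuple[Box, List[FlipFlop]]]:
    """Recursively bisect `box` until each region meets both MR constraints."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    fits = len(ffs) <= max_ffs and w * h <= max_area
    if fits or max(w, h) <= min_side:  # stop when feasible or too small to split
        return [(box, list(ffs))]
    if w >= h:  # cut across the longer side
        mid = (x0 + x1) / 2
        lo = [f for f in ffs if f[0] < mid]
        hi = [f for f in ffs if f[0] >= mid]
        halves = [(x0, y0, mid, y1), (mid, y0, x1, y1)]
    else:
        mid = (y0 + y1) / 2
        lo = [f for f in ffs if f[1] < mid]
        hi = [f for f in ffs if f[1] >= mid]
        halves = [(x0, y0, x1, mid), (x0, mid, x1, y1)]
    return (partition_mrs(lo, halves[0], max_ffs, max_area, min_side)
            + partition_mrs(hi, halves[1], max_ffs, max_area, min_side))
```

Each returned region would then receive its own small mesh and buffer, and the region centres would become the leaves of the balanced clock tree.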
This study was conducted to assess the relationship between fatigue and pupillary responses. Pupillary responses, ECG, and blood pressure were measured every 30 min for 24 hours in eight subjects. A questionnaire was used to rate the subjective feeling of fatigue. The 24 hours were divided into four 6-hour blocks. The subjective feeling of fatigue increased markedly in the fourth block, and the difference in subjective fatigue between the fourth and first blocks was significant. Of the nine pupillary response measures, pupil diameter was found to decrease with time. Of the autonomic nervous system indices (heart rate, systolic blood pressure, and diastolic blood pressure), only heart rate was sensitive to the increased subjective feeling of fatigue. Significant differences in mean pupil diameter and mean heart rate were found between the last and first blocks. These results indicate that pupil diameter is related to fatigue and can be used to assess it. Possible implications for fatigue assessment are discussed.
Shinichiro SHIRATAKE Daisaburo TAKASHIMA Takehiro HASEGAWA Hiroaki NAKANO Yukihito OOWAKI Shigeyoshi WATANABE Takashi OHSAWA Kazunori OHUCHI
A new memory cell arrangement for a gigabit-scale NAND DRAM is proposed. Although the conventional NAND DRAM, in which memory cells are connected in series, achieves a small die size, it faces a crucial array-noise problem in the 1-Gbit generation and beyond because of the noise inherent in its open bitline arrangement. Introducing the new cell arrangement into a NAND DRAM realizes the folded bitline scheme, resulting in good noise immunity. The basic operation of the proposed folded bitline scheme was successfully verified using a 64-kbit test chip. At the 1-Gbit generation, the die size of the proposed NAND DRAM with the folded bitline scheme (F-NAND DRAM) is reduced to 63% of that of the conventional 1-Gbit DRAM with the folded bitline scheme, assuming the bitlines and wordlines are fabricated at the same pitch. A new 4/4 bitline grouping scheme, in which cell data are read out to four neighboring bitlines, is also introduced to halve the bitline-to-bitline coupling noise relative to the conventional folded bitline scheme. At the 1-Gbit generation, the array noise of the proposed F-NAND DRAM with the 4/4 bitline grouping scheme is reduced to 10% of the read-out signal, compared with 29% for the conventional NAND DRAM with the open bitline scheme and 22% for the conventional DRAM with the folded bitline scheme.
Hiroshi SHIROTA Satoshi SHIBATANI Masayuki TERAI
A fast rip-up and reroute algorithm for very large-scale gate arrays is proposed. An automatic routing program for gate arrays usually consists of an initial routing process and a rip-up and reroute process, which eliminates the unconnects left by the initial routing. There are two main reasons for unconnects: routing-order dependency and local wire congestion. Existing rip-up and reroute algorithms efficiently resolve unconnects caused by routing-order dependency but cannot resolve those caused by local wire congestion. The proposed algorithm, in contrast, combines a 'global' and a 'local' rip-up and reroute process and efficiently resolves unconnects of both kinds. The 'global' process reduces local wire congestion by ripping up and rerouting global paths. The 'local' process eliminates unconnects, mainly those caused by routing-order dependency, by ripping up and rerouting local paths. The effectiveness of our method is demonstrated by experimental results on industrial sea-of-gates (SOG) circuits and a well-known benchmark circuit.
Katsuyoshi MIURA Koji NAKAMAE Hiromu FUJIOKA
A hierarchical fault tracing method for VLSIs with bidirectional buses, using CAD layout data in a CAD-linked electron beam (EB) test system, is described. When fault tracing reaches a cell connected to a bidirectional bus, our method can judge the direction of the signal flow (input or output) from waveforms acquired by an EB tester. It does so in a consistent manner, independently of circuit functions, as does a previously proposed tracing method for circuits without bidirectional buses.
Adam Icarus IMORO Yoshihisa KANI Naoki INAGAKI Nobuyoshi KIKUMA
The valid region for applying the conventional Improved Circuit Theory (ICT) to the analysis of wire antennas is established. To extend ICT to the analysis of much longer antennas, Tai's trial function is used to derive new formulas for the impedance matrix. Unlike the conventional ICT trial function, Tai's trial function leads to input impedances that are finite irrespective of antenna length. The new ICT impedance formulas are comparable in accuracy to the general method of moments. Moreover, since all elements of the new formulas are expressed in closed form, the resulting ICT algorithm still requires less computer running time and storage than conventional methods such as the method of moments. This should enhance ICT applications in CAD/CAE systems.
Ingrid KIRSCHNING Jun-Ichi AOE
The Time-Slicing paradigm is a newly developed method for training neural networks for speech recognition. The neural net is trained to spot syllables in a continuous stream of speech and generates a transcription of the utterance, be it a word, a phrase, etc. Combined with a simple error recovery method, the desired units (words or phrases) can be retrieved. The paradigm uses a recurrent neural network trained in a modular fashion with natural connectionist glue. It processes the input signal sequentially regardless of the input's length and immediately extracts the syllables spotted in the speech stream. This character string can then be compared, for example, with a set of possible words to pick out the five closest candidates. In this paper we describe the time-slicing paradigm and the training of the recurrent neural network, together with details of the training samples. We also introduce the concept of natural connectionist glue and the architecture of the recurrent neural network used for this purpose. Additionally, we explain the errors found in the output and the process used to reduce them and recover the correct words. The recognition rates of the network and the recovery rates for the words are also reported. The presented examples and recognition rates demonstrate the potential of the time-slicing method for continuous speech recognition.
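The abstract does not say how the spotted-syllable string is compared with the word set; a common choice, assumed here, is the Levenshtein edit distance, keeping the k closest words (k = 5 in the text). Because the function works on any sequence, a transcription can be either a character string or a list of syllable labels. The function names are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        cur = [i]
        for j, item_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # delete item_a
                           cur[j - 1] + 1,                     # insert item_b
                           prev[j - 1] + (item_a != item_b)))  # substitute or match
        prev = cur
    return prev[-1]


def closest_candidates(transcription: Sequence, lexicon: List, k: int = 5) -> List:
    """The k lexicon entries nearest to the transcription (ties broken by entry order)."""
    return sorted(lexicon, key=lambda w: (edit_distance(transcription, w), w))[:k]
```

Working on syllable lists rather than characters means that one mis-spotted syllable costs exactly one edit, which matches the unit the network actually outputs.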
Hyosig WON Yoshihiro HAYAKAWA Koji NAKAJIMA Yasuji SAWADA
We have fabricated a new analog memory for integrated artificial neural networks. Several attempts have been made to obtain linear characteristics in floating-gate analog memories by using feedback circuits. A learning chip must contain a large number of learning control circuits. In this paper, we propose a new analog memory, SDAM, with three cascaded TFTs. The new analog memory has a simple design, a small area, a fast switching speed, and accurate linearity. To improve linearity, we propose a new charge-transfer process. The device has a tunnel junction (a poly-Si/poly-Si oxide/poly-Si sandwich structure), a thin-film transistor, two capacitors, and a floating-gate MOSFET. The diffusion of the charges injected through the tunnel junction is controlled by the source-follower operation of a thin-film transistor (TFT). With the proposed operation, the amount of transferred charge is constant, independent of the charge in the storage capacitor.
Akira ONOZAWA Hitoshi KITAZAWA Kenji KAWAI
In this paper, a post-layout optimization technique for the power dissipation and timing of cell-based bipolar ECL LSIs is proposed. An ECL LSI can operate at frequencies of a few GHz, but its power dissipation is very high compared with CMOS LSIs, which makes systems using ECL quite expensive. It is therefore crucial to develop CAD techniques that minimize the power dissipation of an ECL LSI without degrading its performance. To begin with, power and delay models of an ECL gate are presented as functions of its switching current: the power dissipation is a linear function of the switching current, and the delay time is a hyperbolic function of it. These functions are obtained considering the post-layout interconnect capacitance and resistance so that the optimization results are sufficiently accurate. Using the delay model, a set of timing constraints specifying the max/min cell delays and the clock skew is extracted. This set of constraints is then given to a nonlinear programming package. The objective functions are the clock skew, the clock cycle time, and the power dissipation, which are optimized in this order. With the minimum-delay and hold constraints, the problem is not convex, so a conventional convex programming approach cannot be used. The optimization yields the switching currents for the cells, which are realized within the cells by regulating "programmable resistors", a special feature of our ECL cell library. Since the optimization is carried out after placement and routing, it can take accurate delay and power estimates into consideration. Experimental results on several circuits, including a real communication-system chip, show power reductions of more than 40% compared with the maximum-power versions. The clock cycle time was maintained or even shortened owing to the efficient clock skew optimization.
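For intuition about the power/delay trade-off, consider only a single path with one maximum-delay constraint, using the model forms stated above: cell power a_i·I_i (linear) and cell delay b_i + c_i/I_i (hyperbolic). Minimizing Σ a_i·I_i subject to Σ (b_i + c_i/I_i) ≤ T with a Lagrange multiplier gives I_i = sqrt(c_i/a_i) · Σ_j sqrt(a_j·c_j) / (T - Σ_j b_j). This simplified case is convex and has a closed form; it deliberately ignores the minimum-delay and hold constraints that make the paper's full problem non-convex, and the coefficient names are assumptions.

```python
import math
from typing import List, Sequence


def min_power_currents(a: Sequence[float], b: Sequence[float],
                       c: Sequence[float], t_max: float) -> List[float]:
    """Switching currents minimising sum(a_i * I_i) on one path subject to
    sum(b_i + c_i / I_i) <= t_max (closed-form Lagrange solution)."""
    slack = t_max - sum(b)
    if slack <= 0:
        raise ValueError("t_max must exceed sum(b), the delay limit as currents grow")
    scale = sum(math.sqrt(ai * ci) for ai, ci in zip(a, c)) / slack
    return [math.sqrt(ci / ai) * scale for ai, ci in zip(a, c)]


def path_delay(b: Sequence[float], c: Sequence[float], currents: Sequence[float]) -> float:
    """Hyperbolic delay model summed along the path."""
    return sum(bi + ci / i for bi, ci, i in zip(b, c, currents))


def path_power(a: Sequence[float], currents: Sequence[float]) -> float:
    """Linear power model summed along the path."""
    return sum(ai * i for ai, i in zip(a, currents))


currents = min_power_currents([1.0, 4.0], [0.5, 0.5], [4.0, 1.0], 3.0)
```

Here the optimum is I = [4.0, 1.0], which meets the delay budget of 3.0 exactly at a power of 8.0. Giving both cells the same current (2.5) also meets the budget but costs 12.5, which illustrates why per-cell current tuning pays off.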
Masayoshi TACHIBANA Sachiko KUROSAWA Reiko NOJIMA Naohito KOJIMA Masaaki YAMADA Takashi MITSUHASHI Nobuyuki GOTO
This paper proposes a method for achieving low-power control-logic modules using a combination of CMOS complex-gate reorganization, transistor size optimization, and transistor layout. Complex-gate reorganization minimizes the transistor count and net count without changing the functionality of the circuit. Transistor sizing and layout are interdependent: optimizing one also optimizes the other. The authors applied the reorganization method to a 10,846-transistor circuit and reduced the transistor count by 10% and the net count by 9%. Transistor sizing and layout compaction reduced the average transistor size by one-tenth while maintaining the same delay. Total circuit capacitance, which is strongly related to power dissipation, was cut to 36%, even when wiring capacitances were dominant.
A unified process flow management system (UPFMS) that combines a CIM system, a process/device simulator, a CAD system, and a manufacturing line scheduler has been developed. The system uses a new language called PDL to describe the process flow as information common to all the systems. The UPFMS consists of a flow edit section, a flow inspection section, and several types of interface programs that make it usable with the other systems. The process flow data described in PDL in the UPFMS provide the data for controlling lots in the CIM system. If the process flow data in the CIM system must be modified, they are returned to the UPFMS and modified with inspection using a knowledge database. After flow inspection, the error-free process flow data are sent back to the CIM system for processing. Moreover, using an interface program, the UPFMS generates equipment recipe data from PDL, and these recipe data are input to several types of equipment. The PDL process flow data can also be used, through another interface program, as input data for the manufacturing line scheduler. Mask and layout data in a CAD system can be exchanged among process/device simulators through the UPFMS, so that two-dimensional device characteristics and SPICE parameters can also be created. By using PDL as common information, the UPFMS links the CIM system, the process/device simulator, the CAD system, and the manufacturing line scheduler, and the process flow data created in the UPFMS can be used as common data to control all systems, from simulation to the CIM system.
Hisako SATO Katsumi TSUNENO Kimiko AOYAMA Takahide NAKAMURA Hisaaki KUNITOMO Hiroo MASUDA
A new methodology for simulation-based CMOS process design is proposed, using a Hierarchical Response Surface Method (HRSM) and an efficient experimental calibration. The methodology has been verified on a 0.4-µm CMOS process. The proposed HRSM reduced process and device design cost by 60% compared with conventional TCAD. The procedure was combined with an experimental calibration technique to provide reliable threshold-voltage prediction, including process-variation effects. The total CPU cost was 200 h on a SUN SPARC 10, and the error of the predicted threshold voltage was less than 0.02 V.
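A minimal one-dimensional sketch of the hierarchical idea, assuming each level is a quadratic response surface fitted by least squares to a few simulator runs, with the upper surface evaluated on the output of the lower one (for example, process parameter, then an intermediate device quantity, then threshold voltage). The abstract does not describe the form of the actual surfaces; the quadratic form, the two levels, and the function names here are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Quadratic = Tuple[float, float, float]  # (c0, c1, c2) for c0 + c1*x + c2*x**2


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Quadratic:
    """Least-squares quadratic response surface via the 3x3 normal equations (Cramer's rule)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("need at least three distinct sample points")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = t[r]  # replace one column with the right-hand side
        coeffs.append(_det3(mc) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def evaluate(q: Quadratic, x: float) -> float:
    return q[0] + q[1] * x + q[2] * x * x


def hierarchical_surface(lower: Quadratic, upper: Quadratic) -> Callable[[float], float]:
    """Chain two levels: x -> intermediate quantity -> final response."""
    return lambda x: evaluate(upper, evaluate(lower, x))
```

Once both surfaces are fitted, evaluating the chained function costs a few multiplications instead of a process/device simulation, which is where a response-surface approach saves CPU time.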