Makoto SUGIHARA Hiroshi DATE Hiroto YASUURA
In this paper, we propose a test methodology for core-based system LSIs. Our test methodology aims to decrease testing time for core-based system LSIs. In our method, every core is supplied with several sets of test vectors. Every set of test vectors guarantees sufficient fault coverage. Each set of test vectors consists of two parts. One is based on built-in self-test (BIST) and the other is based on external testing. These sets of test vectors are designed to have different ratio of BIST to external testing each other for every core. We can minimize testing time for core-based system LSIs by selecting from the given sets of test vectors for each core. The main contributions of this paper are summarized as follows. (i) BIST is efficiently combined with external testing to relax the limitation of the external primary inputs and outputs. (ii) External testing for one of cores and BISTs for the others are performed in parallel to reduce the total testing time. (iii) The testing time minimization problem for core-based system LSIs is formulated as a combinatorial optimization problem to select the optimal set of test vectors from given sets of test vectors for each core.
Hiroyuki TOMIYAMA Hiroto YASUURA
Since manufacturing processes inherently fluctuate, LSI chips which are produced from the same design have different propagation delays. However, the difference in delays caused by the process fluctuation has rarely been considered in most of existing high-level synthesis systems. This paper presents a new approach to module selection in high-level synthesis, which exploits the difference in functional unit delays. First, a module library model which assumes the probabilistic nature of functional unit delays is presented. Then, we propose a module selection problem and an algorithm which minimizes the cost per faultless chip. Experimental results demonstrate that the proposed algorithm finds optimal module selections which would not have been explored without manufacturing information.
As power consumption becoming a critical concern for System-On-a-Chip (SOC) design, accurate and efficient power analysis and estimation during the design phase at all levels of abstraction are becoming increasingly pressing in order to achieve low power without a costly redesign process. This paper surveys analysis and estimation techniques of dynamic power and leakage power for SOC design covering multiple design levels, which have been recently proposed, aiming to present a cohesive view of the power estimation techniques at all design levels of abstraction.
Yuji KUNITAKE Toshinori SATO Hiroto YASUURA
Negative Bias Temperature Instability (NBTI) is one of the major reliability problems in advanced technologies. NBTI causes threshold voltage shift in a PMOS transistor. When the PMOS transistor is biased to negative voltage, threshold voltage shifts to negatively. On the other hand, the threshold voltage recovers if the PMOS transistor is positively biased. In an SRAM cell, due to NBTI, threshold voltage degrades in the load PMOS transistors. The degradation has the impact on Static Noise Margin (SNM), which is a measure of read stability of a 6-T SRAM cell. In this paper, we discuss the relationship between NBTI degradation in an SRAM cell and the dynamic stress and recovery condition. There are two important characteristics. One is a stress probability, which is defined as the rate that the PMOS transistor is negatively biased. The other is a stress and recovery cycle, which is defined as the switching interval of an SRAM value. In our observations, in order to mitigate the NBTI degradation, the stress probability should be small and the stress and recovery cycle should be shorter than 10 msec. Based on the observations, we propose a novel cell-flipping technique, which makes the stress probability close to 50%. In addition, we show results of the case studies, which apply the cell-flipping technique to register file and cache memories.
Takashi YAMADA Shoji GOTO Norihisa TAKAYAMA Yoshifumi MATSUSHITA Yasoo HARADA Hiroto YASUURA
In wireless communication systems, low-power metrics is becoming a burdensome problem in the portable terminal design, because of portability constraints. This paper presents design architecture of a low-power Digital Matched Filter (DMF) for the direct-sequence spread-spectrum communication system such as WCDMA or wireless LAN. The proposed approach for power savings focuses on the architecture of the reception registers and the correlation-calculating unit, which dissipate the majority of the power in a DMF. The main features are asynchronous latch clock generation for the reception registers, parallelism of correlation calculation operations and bit manipulation for chip-correlation operations. A DMF is designed in compliance with the WCDMA specifications incorporating the proposed techniques, and its properties are evaluated by computer simulations at the gate level using 0.18-µm CMOS standard cell array technology. As a result, the power consumption of the proposed DMF is estimated to be 9.3 mW (@15.6 MHz, 1.6 V), which is below 40% of the power consumed by a general DMF.
Eko Fajar NURPRASETYO Akihiko INOUE Hiroyuki TOMIYAMA Hiroto YASUURA
In the design of an embedded system, an architecture of core processor strongly affects the performance and cost of the total system. This paper discusses a scalable processor architecture, called soft-core processor, which can be tuned for a target system. System designers can optimize several design parameters such as the datapath width and instruction set, and generate customized processors for their application. Design of Bung-DLX as a prototype of soft-core processor is presented in this paper. An experiment of system design using our processor has shown that the optimized processor chip area halves when the critical path delay is reduced to one third of the original one.
Yuji KUNITAKE Kazuhiro MIMA Toshinori SATO Hiroto YASUURA
A deep submicron semiconductor technology has increased process variations. This fact makes the estimate of the worst-case design margin difficult. In order to realize robust designs, we are investigating such a typical-case design methodology, which we call Constructive Timing Violation (CTV). In the CTV-based design, we can relax timing constraints. However, relaxing timing constraints might cause some timing errors. While we have applied the CTV-based design to a processor, unfortunately, the timing error recovery has serious impact on processor performance. In this paper, we investigate enhancement techniques of the CTV-based design. In addition, in order to accurately evaluate the CTV-based design, we build a co-simulation framework to consider circuit delay at the architectural level. From the co-simulation results, we find the performance penalty is significantly reduced by the enhancement techniques.
Kazutoshi KOBAYASHI Keikichi TAMARU Hiroto YASUURA Hidetoshi ONODERA
We propose a new architecture of Functional Memory type Parallel Processor (FMPP) architectures called bit-parallel block-parallel (BPBP) FMPP. Design details of a prototype BPBP FMPP chip are also shown. FMPP is a massively parallel processor architecture that has a memory-based simple two-dimensional regular array structure suitable for memory VLSI technology. Computation space increases as integration density of memory increases. Computation time does not depend on the number of processors. So far, a bit-serial word-parallel (BSWP) implementation based on a content addressable memory (CAM) is mainly investigated as one of promising architectures of FMPP. In a BSWP FMPP, each word of a CAM works as a processor, and the amount of hardware is minimized by abopting a bit-serial operation, thus maximizing integration scale. The BSWP FMPP, however, does not allow operations between two words, which restriction limits the applicability of the BSWP FMPP. On the other hand, the proposed BPBP FMPP is designed to execute logical and arithmetic operations on two words. These operations are performed simultaneously on every group of words called a block. BPBP FMPP hereby achieves a high performance while maintaining high integration density of the BSWP, and is suitable for various applications.
This paper addresses bitwidth optimization focusing on leakage power reduction for system-level low-power design. By means of tuning the design parameter, bitwidth tailored to a given application requirements, the datapath width of processors and size of memories are optimized resulting in significant leakage power reduction besides dynamic power reduction. Experimental results for several real embedded applications, show power reduction without performance penalty range from about 21.5% to 66.2% of leakage power, and 14.5% to 59.2% of dynamic power.
Farhad Fuad ISLAM Hiroto YASUURA Keikichi TAMARU
An architecture for radix-2, decimation in frequency fast Fourier transform butterfly computation unit is proposed. It operates on 16 bit inputs and uses 'merged core' type of multipliers which reduce hardware components and hence promise smaller chip area. This reduction in area is achieved with negligible increase in computation delay.
Hiroki AKABOSHI Hiroto YASUURA
A modern architect can not design high performance computer architecture without thinking all factors of performance from hardware level (logic/layout design) to system level (application programs, operating systems, and compilers). For computer architecture design, there are few practical CAD tools, which support design activities of the architect. In this paper, we propose a CAD tool, called COACH, for computer architecture design. COACH supports architecture design from hardware level to system level. To make a high-performance general purpose computer system, the architect evaluates system performance as well as hardware level performance. To evaluate hardware level performance accurately, logic/layout synthesis tools and simulator are used for evaluation. Logic/layout synthesis tools translate the architecture design into logic circuits and layout pattern and simulator is used to get accurate information on hardware level performance which consists of clock frequency, the number of transistors, power consumption, and so on. To evaluate system level performance, a compiler generator is introducd. The compiler generator generates a compiler of a programming language from the desripition of architecture design. The designed architecture is simulated in the behavior level with programs compiled by the compiler, and the architect can get information on system level performance which consists of program execution steps, etc. From both hardware level performance and system level performance, the architect can evaluate and revise his/her architecture, considering the architecture from hardware level to system level. In this paper, we propose a new design methodology which uses () logic/layout synthesis tools and simulators as tools for architecture design and () a compiler generator for system level evaluation. COACH, a CAD system based on the methodology, is discussed and a prototype of COACH is implemented. Using the design methodology, two processors are designed. The result of the designs shows that the proposed design methodology are effective in architecture design.
Shigeru ICHINOSE Mizuho IWAIHARA Hiroto YASUURA
Providing various assistances for design modifications on HDL source codes is important for design reuse and quick design cycle in VLSI CAD. Program slicing is a software-engineering technique for analyzing, abstracting, and transforming programs. We show algorithms for extracting/removing behaviors of specified signals in VHDL descriptions. We also describe a VHDL slicing system and show experimental results of efficiently extracting components from VHDL descriptions.
Masanori MUROYAMA Akihiko HYODO Takanori OKUMA Hiroto YASUURA
To transfer a small number, we inherently need a small number of bits. However all bit lines on a data bus change their status and redundant power is consumed. To reduce the redundant power consumption, we introduce a concept named active bit. In this paper, we propose a power reduction scheme for data buses using active bits. Suppressing switching activity of inactive bits, we can reduce redundant power consumption. We propose various power reduction techniques using active bits and the implementation methods. Experimental results illustrate up to 54.2% switching activity reduction.
Barry SHACKLEFORD Etsuko OKUSHI Mitsuhiro YASUDA Hisao KOIZUMI Katsuhiko SEO Hiroto YASUURA
The problem of synthesizing a minimum-cost logic network is formulated for a genetic algorithm (GA). When benchmarked against a commercial logic synthesis tool, an odd parity circuit required 24 basic cells (BCs) versus 28 BCs for the design produced by the commercial system. A magnitude comparator required 20 BCs versus 21 BCs for the commercial system's design. Poor temporal performance, however, is the main disadvantage of the GA-based approach. The design of a hardware-based cost function that would accelerate the GA by several thousand times is described.
Low Power design has emerged as a both practically and theoretically attractive theme in modern LSI system design. This paper presents system level power optimization techniques. A brief survey of system level low power design approaches and several examples in detail are described. It reviews some techniques that have been proposed to overcome the power issue and gives guideline for prospective system level solutions.
Makoto SUGIHARA Hiroto YASUURA
External pins for tests are precious hardware resources because this number is strongly restricted. Cores are tested via test access mechanisms (TAMs) such as a test bus architecture. When cores are tested via test buses which have constant bit widths, test stimuli and test responses for a particular core have to be transported over these test buses. The core might require more widths for input and output than test buses, and hence, for some part of the test, the TAMs are idle; this is a wasteful usage of the TAMs. In this paper, an optimization method of test accesses with a combined BIST and external test (CBET) scheme is proposed for eliminating the wasteful usage of test buses. This method can minimize the test time and eliminate the wasteful usage of external pins by considering the trade-off between test time and the number of external pins. Our idea consists of two parts. One is to determine the optimum groups, each of which consists of cores, to simultaneously share mechanisms for the external test. The other is to determine the optimum bandwidth of the external input and output for the external test. Our idea is basically formulated for the purpose of eliminating the wasteful external pin usage. We make the external test part to be under the full bandwidth of external pins by considering the trade-off between the test time and the number of external pins. This is achieved only with the CBET scheme because it permits test sets for both the BIST and the external test to be elastic. Taking test bus architecture as an example, a formulation for test access optimization and experimental results are shown. Experimental results reveal that our optimization can achieve a 51.9% reduction in the test time of conventional test scheduling and our proposals are confirmed to be effective in reducing the test time of system-on-a-chip.
Takashi YAMADA Takeshi SAKAMOTO Shinji FURUICHI Mamoru MUKUNO Yoshifumi MATSUSHITA Hiroto YASUURA
This paper proposes two techniques for improving the accuracy of gate-level power analysis for system-on-a-chip (SoC). (1) Creation of custom wire load models for clock nets. (2) Use of layout information (actual net capacitance and input signal transition time). The analysis time is reduced to less than one three-hundredth of the transistor-level power analysis time. Error is within 5% against a real chip, (the same level as that of the transistor-level power analysis), if technique (2) is used, and within 15% if technique (1) is used.
System LSI is a new principal product of semiconductor industry and also a key component of Information Technology (IT). Design of a system LSI contains two different characteristics, system design and LSI design. It is keen issue to establish a design methodology of system LSIs in which designers have much freedom on their design from system level to device level and also can control various design parameters to optimize their design. In this paper, considerations on markets of system LSIs and requirements from each application are summarized. Some proposals on new directions of design methodology are also surveyed.
Kosuke TARUMI Akihiko HYODO Masanori MUROYAMA Hiroto YASUURA
We propose a novel approach for designing a low power datapath in wireless communication systems. Especially, we focus on the digital FIR filter. Our proposed approach can reduce the power consumption and the circuit area of the digital FIR filter by optimizing the bitwidth of the each filter coefficient with keeping the filter calculation accuracy. At first, we formulate the constraints about keeping accuracy of the filter calculations. We define the problem to find the optimized bitwidth of each filter coefficient. Our defined problem can be solved by using the commercial optimization tool. We evaluate the effects of consuming power reduction by comparing the digital FIR filters designed in the same bitwidth of all coefficients. We confirm that our approach is effective for a low power digital FIR filter.