Takashi YAMADA Takeshi SAKAMOTO Shinji FURUICHI Mamoru MUKUNO Yoshifumi MATSUSHITA Hiroto YASUURA
This paper proposes two techniques for improving the accuracy of gate-level power analysis for system-on-a-chip (SoC) designs: (1) creation of custom wire load models for clock nets, and (2) use of layout information (actual net capacitance and input signal transition time). The analysis time is reduced to less than one three-hundredth of the transistor-level power analysis time. The error against a real chip is within 5% (the same level as that of transistor-level power analysis) if technique (2) is used, and within 15% if technique (1) is used.
System LSI is a new principal product of the semiconductor industry and also a key component of Information Technology (IT). The design of a system LSI combines two different characteristics: system design and LSI design. A key issue is to establish a design methodology for system LSIs in which designers have considerable freedom from the system level down to the device level and can control various design parameters to optimize their designs. In this paper, considerations on the markets for system LSIs and the requirements of each application are summarized. Some proposals on new directions in design methodology are also surveyed.
Kosuke TARUMI Akihiko HYODO Masanori MUROYAMA Hiroto YASUURA
We propose a novel approach for designing a low-power datapath in wireless communication systems, focusing in particular on the digital FIR filter. The proposed approach reduces the power consumption and the circuit area of the digital FIR filter by optimizing the bitwidth of each filter coefficient while maintaining the accuracy of the filter calculation. First, we formulate the constraints for maintaining the accuracy of the filter calculations. We then define the problem of finding the optimized bitwidth of each filter coefficient. The defined problem can be solved with a commercial optimization tool. We evaluate the power reduction by comparing against digital FIR filters designed with the same bitwidth for all coefficients. We confirm that our approach is effective for a low-power digital FIR filter.
This paper presents a novel system-level design methodology, called quality-driven design, by which application-specific optimization can be achieved; furthermore, the entire functionality can be shared to maximize design reuse. As a case study, this paper focuses on quality-driven design for video applications and introduces an output-quality-adaptive approach based on variable bitwidth optimization to explore a new design space. MPEG2 video is used as the driver application to illustrate the potential of the presented methodology. Experimental results show the effectiveness of the methodology.
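The idea of per-coefficient bitwidth selection under an accuracy constraint can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the worst-case error bound for inputs with |x| <= 1, and the greedy search, which stands in for the commercial optimization tool mentioned in the abstract and is not the authors' formulation.

```python
def quantize(c, bits):
    """Round c to a fixed-point value with `bits` fractional bits."""
    scale = 1 << bits
    return round(c * scale) / scale

def coefficient_error(coeffs, bitwidths):
    """Worst-case output error bound, assuming inputs bounded by |x| <= 1."""
    return sum(abs(c - quantize(c, b)) for c, b in zip(coeffs, bitwidths))

def shrink_bitwidths(coeffs, max_bits, budget):
    """Greedily drop fractional bits per coefficient while the error bound
    stays within `budget` (a hypothetical accuracy constraint)."""
    widths = [max_bits] * len(coeffs)
    changed = True
    while changed:
        changed = False
        for i in range(len(coeffs)):
            if widths[i] == 0:
                continue
            trial = widths[:i] + [widths[i] - 1] + widths[i + 1:]
            if coefficient_error(coeffs, trial) <= budget:
                widths = trial
                changed = True
    return widths

# Hypothetical coefficients that are exactly representable in 1, 2 and 3 bits.
widths = shrink_bitwidths([0.5, 0.25, 0.125], max_bits=8, budget=0.0)
print(widths)
```

A uniform design would keep 8 fractional bits for every coefficient; the sketch keeps only as many as each coefficient needs to stay inside the error budget.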
Barry SHACKLEFORD Mitsuhiro YASUDA Etsuko OKUSHI Hisao KOIZUMI Hiroyuki TOMIYAMA Hiroto YASUURA
Entire systems on a chip (SOCs) embodying a processor, memory, and system-specific peripheral hardware are now an everyday reality. The current generation of SOC designers is driven more than ever by the need to lower chip cost, while at the same time being faced with demands to get designs to market more quickly. It was to support this new community of designers that we developed Satsuki, an integrated processor synthesis and compiler generation system. By allowing the designer to tune the processor design to the bitwidth and performance required by the application, minimum cost designs are achieved. Using synthesis to implement the processor in the same technology as the rest of the chip allows for global chip optimization from the perspective of the system as a whole and assures design portability. The integral compiler generator, driven by the same parameters used for processor synthesis, promotes high-level expression of application algorithms while at the same time isolating the application software from the processor implementation. Synthesis experiments incorporating a 0.8 micron CMOS gate array have produced designs ranging from a 45 MHz, 1,500 gate, 8-bit processor with a 4-word register file to a 31 MHz, 9,800 gate, 32-bit processor with a 16-word register file.
Hiroto YASUURA Mitsumasa KOYANAGI
Hisao KOIZUMI Katsuhiko SEO Fumio SUZUKI Yoshisuke OHTSURU Hiroto YASUURA
In this paper we propose a co-design method for control systems using a combination of models. By "co-design," we mean a cooperative design method in which the behavior of the entire system is simulated as a single model while parameters of the system are being optimized. Our co-design method enables the various subsystems in the system, which have been designed independently as tasks assigned to different designers in the traditional design method, to be designed simultaneously in a unified cooperative way from the system-wide perspective of a system designer. Our proposed method combines models of controlling and controlled subsystems into a single model for the behavior of the entire control system. After the optimum control conditions are determined through simulation of the combined models, based on the corresponding algorithms and parameters, ASIC design proceeds quickly with accurate verification using iterative replacements of the behavior model by the electronic circuit model. To evaluate the proposed method, we implemented a design environment. We then applied our method to the design of ASICs in three test cases (in a control system and in audio-visual systems) to investigate its effectiveness. This paper introduces the concepts of the proposed co-design method, the design environment and the experimental results, and points out the new issues for system design.
Akihiko HYODO Masanori MUROYAMA Hiroto YASUURA
This paper presents a variable pipeline depth processor that can dynamically adjust its pipeline depth and operating voltage at run-time, depending on the workload characteristics under timing constraints. We call this technique dynamic pipeline and voltage scaling (DPVS). The advantage of adjusting the pipeline depth is that it can eliminate the useless energy dissipation of additional stalls (NOPs) and wrong-path instructions, which would increase as the pipeline grows deeper in excess of the inherent parallelism. Although dynamic voltage scaling (DVS) is a very effective technique in itself for reducing energy dissipation, lowering the supply voltage also causes performance degradation. By combining DVS with dynamic pipeline scaling (DPS), it is possible to retain performance at the required level while reducing energy dissipation much further. Experimental results show the effectiveness of our DPVS approach for a variety of benchmarks, reducing total energy dissipation by up to 64.90%, with an average of 27.42%, without any effect on performance, compared with a processor using only DVS.
This paper presents a novel low-energy memory design technique based on variable analysis for on-chip data memory (RAM) in application-specific systems, called the VAbM technique. It exploits both the data locality and the effective data width of variables to reduce the energy consumed by data transfer and storage. Variables with higher access frequency and smaller effective data width are assigned to a smaller low-energy memory with fewer bit lines and word lines, placed closer to the processor. Under a constraint on the number of memory banks, the VAbM technique uses variable analysis results to allocate and assign on-chip RAM to multiple banks, which have different sizes with different numbers of word lines and bit lines, tailored to each application's requirements. Experimental results with several real embedded applications demonstrate significant energy reduction of up to 64.8% over monolithic memory, and 27.7% compared to memory designed by the memory banking technique.
In this paper, we discuss the accuracy of power dissipation models for CMOS VLSI circuits. Researchers have proposed several efficient power estimation methods for CMOS circuits. However, we do not know how accurate they are, because no method has been established for comparing the estimated power consumption with the power consumption of actual VLSI chips. To evaluate the accuracy of several kinds of power dissipation models at the chip level, block level, gate level, and so on, we have been (i) measuring the power consumption of actual microprocessors, (ii) estimating power consumption with several kinds of power dissipation models, and (iii) comparing (i) with (ii). The experimental results show the following: (1) Power estimation at gate level is accurate enough. (2) Estimating the power of a clock tree independently makes estimation more accurate. (3) The area of each functional block is a good approximation of the load capacitance of the block.
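As a first-order reading of why voltage and pipeline depth interact, the standard CMOS dynamic-energy model can be written as below. This model and its symbols are assumptions for illustration and are not taken from the paper: N_cycles counts all executed cycles (including stalls and wrong-path work), C_eff is the effective switched capacitance per cycle, and V_DD is the supply voltage.

```latex
E_{\mathrm{dyn}} \approx N_{\mathrm{cycles}} \, C_{\mathrm{eff}} \, V_{DD}^{2}
```

Under this model, lowering V_DD reduces energy quadratically, while removing wasted cycles reduces N_cycles linearly, which is consistent with combining the two mechanisms.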
Akihiko INOUE Hiroyuki TOMIYAMA Takanori OKUMA Hiroyuki KANBARA Hiroto YASUURA
The datapath width of a core processor has a strong effect on the cost, power consumption, and performance of an embedded system integrated with memories into a single chip. However, it is difficult for designers to appropriately determine the datapath width for each application because of the limited reusability of software and the lack of compilation techniques. The purpose of this paper is to clarify the software support required for determining the optimal datapath width. As a solution, an embedded programming language, called Valen-C, and a retargetable Valen-C compiler are proposed. This paper also describes the syntax and semantics of Valen-C, the mechanism of the retargetable Valen-C compiler, and how the accuracy of computation in programs is preserved across various datapath widths. Experiments with practical applications show that the total cost of the system including a core processor, ROM, and RAM is drastically reduced with little performance loss by reducing the datapath width.
Hiroyuki TOMIYAMA Tohru ISHIHARA Akihiko INOUE Hiroto YASUURA
In many embedded systems, a significant amount of power is consumed for off-chip driving because off-chip capacitances are much larger than on-chip capacitances. This paper proposes instruction scheduling techniques to reduce power consumed for off-chip driving. The techniques minimize the switching activity of a data bus between an on-chip cache and a main memory when instruction cache misses occur. The scheduling problem is formulated and two scheduling algorithms are presented. Experimental results demonstrate the effectiveness and the efficiency of the proposed algorithms.
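The quantity being minimized, bus switching activity, can be sketched as the number of bit toggles between consecutive words on the bus. The function name and the 8-bit instruction words below are hypothetical, and the sketch only compares two given orderings; it does not reproduce the paper's scheduling algorithms, which must also respect data dependencies between instructions.

```python
def bus_transitions(words):
    """Count bit toggles between consecutive words driven on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

# Two orderings of the same three hypothetical 8-bit instruction words.
order_a = [0b00001111, 0b11110000, 0b00001110]
order_b = [0b00001111, 0b00001110, 0b11110000]

print(bus_transitions(order_a))  # 15 toggles
print(bus_transitions(order_b))  # 8 toggles
```

If the instructions are independent, the second ordering drives the bus with roughly half as many transitions.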
Mohammad Mesbah UDDIN Yasunobu NOHARA Daisuke IKEDA Hiroto YASUURA
A multi-application smart card system consists of an issuer, service vendors and cardholders, where cardholders are recipients of smart cards (from the issuer) to be used in connection with applications offered by service vendors. Authentic post-issuance program modification is necessary for a multi-application smart card system because applications in the system are realized after the issuance of a smart card. In this paper, we propose a system where only authentic modification is possible. In the proposed system, the smart card issuer stores a unique long bitstring called PID in a smart card. The smart card is then given to the cardholder. A unique substring of the PID (subPID) is shared between the cardholder and a corresponding service vendor. Another subPID is shared between the issuer and the cardholder. During program modification, a protocol using the subPIDs, a one-way hash function and a pseudorandom number generator function verifies the identity of the parties and the authenticity of the program.
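A shared-secret check of this general kind can be sketched as follows. This is not the paper's protocol: the message layout, the function names, and the use of HMAC-SHA-256 as the keyed one-way function are all assumptions, and `secrets.token_bytes` stands in for the pseudorandom number generator.

```python
import hashlib
import hmac
import secrets

def make_tag(sub_pid: bytes, nonce: bytes, program: bytes) -> bytes:
    """Vendor side: bind the program to the shared subPID and a fresh nonce."""
    return hmac.new(sub_pid, nonce + program, hashlib.sha256).digest()

def card_accepts(sub_pid: bytes, nonce: bytes, program: bytes, tag: bytes) -> bool:
    """Card side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(sub_pid, nonce, program), tag)

sub_pid = secrets.token_bytes(32)        # hypothetical shared substring of the PID
nonce = secrets.token_bytes(16)          # fixed-length challenge issued by the card
program = b"hypothetical applet bytes"   # program to be installed
tag = make_tag(sub_pid, nonce, program)

print(card_accepts(sub_pid, nonce, program, tag))
```

A modified program, or a party that does not hold the subPID, yields a tag mismatch, so the card rejects the update.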
Barry SHACKLEFORD Mitsuhiro YASUDA Etsuko OKUSHI Hisao KOIZUMI Hiroyuki TOMIYAMA Akihiko INOUE Hiroto YASUURA
Entire systems embedded in a chip and consisting of a processor, memory, and system-specific peripheral hardware are now commonly contained in commodity electronic devices. Cost minimization of these systems is of paramount economic importance to manufacturers of these devices. By employing a variable configuration processor in conjunction with a multi-precision compiler generator, we show that there are situations in which considerable system cost reduction can be obtained by synthesizing a CPU that is narrower than the largest variable in the application program.
Masahiko OHMURA Hiroto YASUURA Keikichi TAMARU
Behavioral extraction from circuit descriptions is a useful technique for logic design verification. We have previously proposed a technique for extraction from combinational circuits and developed a prototype system. To use this system in practice, it must also handle sequential circuits. In this paper, we present a new technique for extracting behavioral descriptions from synchronous sequential circuits that include flip-flops. Flip-flops are classified into two types: those that are part of control registers and those that are part of data registers. The behavior of a circuit with control registers is described by state transitions. The behavior of a circuit with data registers is described by the movement of data among registers. Many circuits, such as microprocessors, realize a function only after several state transitions have occurred. In such circuits, it is more important to abstract the function than to extract each state transition. We have extended our system to extract such behaviors.
This paper presents the Power-Pro architecture (Programmable Power Management Architecture), a novel processor architecture for power reduction. The Power-Pro architecture has two key functionalities: (i) the supply voltage and clock frequency of a microprocessor can be dynamically varied, and (ii) the active datapath width can be dynamically adjusted to the precision of each operation. The distinguishing feature of this architecture is that software programmers can directly specify the requirements of applications, such as real-time constraints and the precision of operations. To make programmable power management possible, the Power-Pro architecture provides special instructions. With these instructions, programmers can dynamically vary the supply voltage, the clock frequency, and the active datapath width. Experimental results show that power consumption for a variety of applications is dramatically reduced by the Power-Pro architecture.