In low-power embedded processors design, stand-by power constraint is also important with average power and operation frequency. Multi-threshold-voltage cells are used in the design and the ratio of low-Vth cells should be controlled. On the other hand, physical synthesis flow is indispensable to achieve high performance and short design time. This paper proposes a physical synthesis methodology under the restriction of maximum low-Vth cell ratio. The experimental results show that our method can achieve only 4 MHz slower logic within 5% margin of the target low-Vth ratio. We have applied this design flow in an application processor design and the designed processor demonstrates 360 MIPS at 200 MHz only with 80 mW at 1.0 V, namely 4500 MIPS/W and 4.2 mA leakage current without any power-cut mode.
This paper presents a Quiet-Noisy scan technique for low power delay fault testing. The novel scan cell design provides both the quiet and noisy scan modes. The toggling of scan cell outputs is suppressed in the quiet scan mode so the power is saved. Two-pattern tests are applied in the noisy scan mode so the delay fault testing is possible. The experimental data shows that the Quiet-Noisy scan technique effectively reduces the test power to 56% of that of the regular scan. The transition fault coverage is improved by 19.7% compared to an existing toggle suppression low power technique. The presented technique requires very minimal changes in the existing MUX-scan Design For Testability (DFT) methodology and needs virtually no computation. The penalties are area overhead, speed degradation, and one extra control in test mode.
Fumitoshi HATORI Hiroki ISHIKURO Mototsugu HAMADA Ken-ichi AGAWA Shouhei KOUSAI Hiroyuki KOBAYASHI Duc Minh NGUYEN
This paper describes a full-CMOS single-chip Bluetooth LSI fabricated using a 0.18 µm CMOS, triple-well, quad-metal technology. The chip integrates radio and baseband, which is compliant with Bluetooth Core Specification version 1.1. A direct modulation transmitter and a low-IF receiver architecture are employed for the low-power and low-cost implementation. To reduce the power consumption of the digital blocks, it uses a clock gating technique during the active modes and a power manager during the low power modes. The maximum power consumption is 75 mW for the transmission, 120 mW for the reception and 30 µW for the low power mode operation. These values are low enough for mobile applications. Sensitivity of -80 dBm has been achieved and the transmitter can deliver up to 4 dBm.
Masanori MUROYAMA Akihiko HYODO Takanori OKUMA Hiroto YASUURA
To transfer a small number, we inherently need a small number of bits. However all bit lines on a data bus change their status and redundant power is consumed. To reduce the redundant power consumption, we introduce a concept named active bit. In this paper, we propose a power reduction scheme for data buses using active bits. Suppressing switching activity of inactive bits, we can reduce redundant power consumption. We propose various power reduction techniques using active bits and the implementation methods. Experimental results illustrate up to 54.2% switching activity reduction.
Kwisung YOO Hoon LEE Gunhee HAN
The cable length in wired serial data communication is limited because the limited bandwidth of a long cable introduces ISI (Inter Symbol Interference). A line equalizer can be used at the receiver to extend the channel bandwidth. This paper proposes a low-power and small-area analog adaptive line equalizer for 100-Mb/s operation on UTP (Unshielded Twisted Pair) cable up to 100 m. The proposed adaptive line equalizer is fabricated with 0.35-µm CMOS process, consumes 19 mW and occupies only 0.07 mm2 Measurement results show that the prototype can operate at data rate of 100 Mb/s on a 100-m cable and 155 Mb/s on a 50-m cable.
Kun-Lin TSAI Shanq-Jang RUAN Chun-Ming HUANG Edwin NAROSKA Feipei LAI
Circuit partition, retiming and state reordering techniques are effective in reducing power consumption of circuits. In this paper, we propose a partition architecture and a methodology to reduce power consumption when designing low power IP, named PRC (Partition and Reordering Circuit). The circuit reordering synthesis flow consists of three phases: first, evenly partition the circuit based on the Shannon expansion; secondly encode the output vectors of each partition to build an equivalent functional logic. Finally, apply reordering algorithm to reorganize the logic function to reduce power consumption and decrease area cost. The validity of our architecture is proven by applying it to MCNC benchmark with simulation environment.
Hideo OHIRA Kentaro KAWAKAMI Miwako KANAMORI Yasuhiro MORITA Masayuki MIYAMA Masahiko YOSHIMOTO
In this paper, we describe a feed-forward dynamic voltage/clock-frequency control method enabling low power MPEG4 on multi-regulated voltage CPU with combining the characteristics of the CPU and the video encoding processing. This method theoretically achieves minimum low power consumption which is close to the hardware-level power consumption. Required processing performance for MPEG4 visual encoding totally depends on the activity of the sequence, and high motion sequence requires high performance and low motion sequence requires low performance. If required performance is predictable, lower power consumption can be achieved with controlling the adequate voltage and clock-frequency dynamically at every frame. The proposed method in this paper is predicting the required processing performance of a future frame using our unique feed-forward analysis method and controlling a voltage and frequency dynamically at every frame along with the forward analysis value. The simulation results indicate that the proposed feed-forward analysis method adequately predicts the required processing performance of every future frame, and enables to minimize power consumption on software basis MPEG4 visual encoding processing. In the case that CPU has Frequency-Voltage characteristics of 1.8 V @400 MHz to 1.0 V @189 MHz, the proposed method reduces the power consumption approximately 37% at high motion sequences or 65% at low motion sequences comparing with the conventional software video encoding method.
The paper covers techniques to cope with ever-increasing leakage power as well as dynamic power of CMOS VLSI's. The techniques to be presented range from software, system, circuit to device level. The novel trend is to look into the cooperative approaches between disciplines such as software-circuit cooperation and circuit-technology cooperation. The biggest challenge that System-on-a-Chip designers should meet in the future is the fact that transistors go more and more leaky in digital and memory circuits as generation advances. The topics to break through this stringent problem are described. Approaches to lower power at the system level are also discussed. The paper touches on new applications and markets which will be open up by the low-power VLSI's.
Yoshinobu HIGAMI Shin-ya KOBAYASHI Yuzo TAKAMATSU
When LSIs that are designed and manufactured for low power dissipation are tested, test vectors that make the power dissipation low should be applied. If test vectors that cause high power dissipation are applied, incorrect test results are obtained or circuits under test are permanently damaged. In this paper, we propose a method to generate test sequences with low power dissipation for sequential circuits. We assume test sequences generated by an ATPG tool are given, and modify them while keeping the original stuck-at fault coverages. The test sequence is modified by inverting the values of primary inputs of every test vector one by one. In order to keep the original fault coverage, fault simulation is conducted whenever one value of primary inputs is inverted. We introduce heuristics that perform fault simulation for a subset of faults during the modification of test vectors. This helps reduce the power dissipation of the modified test sequence. If the fault coverage by the modified test sequence is lower than that by the original test sequence, we generate a new short test sequence and add it to the modified test sequence.
Takaki YOSHIDA Masafumi WATARI
As semiconductor manufacturing technology advances, power dissipation and noise in scan testing have become critical problems. Our studies on practical LSI manufacturing show that power supply voltage drop causes testing problems during shift operations in scan testing. In this paper, we present a new testing method named MD-SCAN (Multi-Duty SCAN) which solves power supply voltage drop problems, as well as its experimental results applied to practical LSI chips.
As power consumption becoming a critical concern for System-On-a-Chip (SOC) design, accurate and efficient power analysis and estimation during the design phase at all levels of abstraction are becoming increasingly pressing in order to achieve low power without a costly redesign process. This paper surveys analysis and estimation techniques of dynamic power and leakage power for SOC design covering multiple design levels, which have been recently proposed, aiming to present a cohesive view of the power estimation techniques at all design levels of abstraction.
Satoshi KOMATSU Masahiro FUJITA
The power dissipation at the off-chip bus has become a significant part of the overall power dissipation in micro-processor based digital systems. This paper presents irredundant address bus encoding methods which reduce signal transitions on the instruction address buses by using adaptive codebook methods. These methods are based on the temporal locality and spatial locality of instruction address. Since applications tend to JUMP/BRANCH to limited sets of addresses, proposed encoding methods assign the least signal transition codes to the addresses of JUMP/BRANCH operations in the past. In addition, our methods can be easily applicable for conventional digital systems since they are irredundant encoding methods. Our encoding methods reduce the signal transitions on the instruction address buses, which results in the reduction of total power dissipation of digital systems. Experimental results show that our methods can reduce the signal transition by an average of 88%.
Takeshi FUJINO Akira YAMAZAKI Yasuhiko TAITO Mitsuya KINOSHITA Fukashi MORISHITA Teruhiko AMANO Masaru HARAGUCHI Makoto HATAKENAKA Atsushi AMO Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
A low power 16 Mb embedded DRAM (eDRAM) macro is fabricated using 0.15 µm logic -based embedded DRAM process technology. A 0.5 µm2 CUB (
Christos DROSOS Chrissavgi DRE Spyridon BLIONAS Dimitrios SOUDRIS
The architecture and implementation of a novel processor suitable wireless terminal applications, is introduced. The wireless terminal is based on the novel dual-mode baseband processor for DECT and GSM, which supports both heterodyne and direct conversion terminal architectures and is capable to undertake all baseband signal processing, and an innovative direct conversion low power modulator/demodulator for DECT and GSM. The state of the art design methodologies for embedded applications and innovative low-power design steps followed for a single chip solution. The performance of the implemented dual mode direct conversion wireless terminal was tested and measured for compliance to the standards. The developed innovative terminal fulfils all the requirements and specifications imposed by the DECT and GSM standards.
Takashi HASHIMOTO Shunichi KUROMARU Masayoshi TOUJIMA Yasuo KOHASHI Masatoshi MATSUO Toshihiro MORIIWA Masahiro OHASHI Tsuyoshi NAKAMURA Mana HAMADA Yuji SUGISAWA Miki KUROMARU Tomonori YONEZAWA Satoshi KAJITA Takahiro KONDO Hiroki OTSUKI Kohkichi HASHIMOTO Hiromasa NAKAJIMA Taro FUKUNAGA Hiroaki TOIDA Yasuo IIZUKA Hitoshi FUJIMOTO Junji MICHIYAMA
A low power MPEG-4 video codec LSI with the capability for core profile decoding is presented. A 16-b DSP with a vector pipeline architecture and a 32-b arithmetic unit, eight dedicated hardware engines to accelerate MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing, 20-Mb embedded DRAM, and three peripheral blocks are integrated together on a single chip. MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing are realized with a hybrid architecture consisting of a programmable DSP and dedicated hardware engines at low operating frequency. In order to reduce the power consumption, clock gating technique is fully adopted in each hardware block and embedded DRAM is employed. The chip is implemented using 0.18-µm quad-metal CMOS technology, and its die area is 8.8 mm 8.6 mm. The power consumption is 90 mW at a SP@L1 codec and 110 mW at a CP@L1 decoding.
Seongmo PARK Miyoung LEE KyoungSeon SHIN Hanjin CHO Jongdae KIM Dukdong LEE
In this paper, we present a design of MPEG-4 video codec chip to reduce the power consumption using frame level clock gating, macro block level and motion estimation skip scheme. It performs 30 frames/s of codec (encoding and decoding) mode with quarter-common intermediate format (QCIF) at 27 MHz. Power consumption is 290 mW at 27 MHz operation, which is achieving 35% power saving compared to a conventional CMOS. Motion Estimation skip method is employed to reduce 32% computation load. This chip performs MPEG-4 Simple Profile Level 2 (Simple@L2) and H. 263 base mode. Its contains 388,885 gates, 662 k bits memory, and the chip size was 9.7 mm9.7 mm which was fabricated using 0.35 micron 3-layers metal CMOS technology.
Hiroshi TAKAHASHI Rimon IKENO Yutaka TOYONOH Akihiro TAKEGAMA Yasumasa IKEZAKI Tohru URASAKI Hitoshi SATOH Masayasu ITOIGAWA Yoshinari MATSUMOTO
High-speed and low-power DSPs have been developed for versatile hand set applications. The DSP contains a 16-bit fixed point DSP core with multiple buses, highly tuned instruction sets and a low-power architecture, featuring CPU power with 404.5 µ W/MHz, chip power with 2.08 mW/MHz at peak and 200 µA stand-by current and 160 MHz/160 MIPS performance by a single DSP core, and also operates at 0.68 V within the temperature range from -40C to 125C in the worst case (Weak corner) even using much higher I-off current process compared to a conventional process to obtain a faster operating frequency. In this paper, we discuss circuit design techniques to continue scaling down valuable IP cores keeping the same functionality, better speed performance, and lower power dissipation with much lower voltage operation capability. For further power reduction by DSP software, Run-time Power Control (RPC) has been demonstrated in an MP3 player using 100 MHz/100 MIPS DSP at 1.8 V, which is a real-time application running on an Internet audio evaluation module experimentally and we obtained 32-60% power reduction on various music source data.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose an instruction encoding scheme to reduce power consumption of instruction ROMs. The power consumption of the instruction ROM strongly depends on the switching activity of bit-lines due to their large load capacitance. In our approach, the binary-patterns to be assigned as op-codes are determined based on the frequency of instructions in order to reduce the number of bit-line dis-charging. Simulation results show that our approach can reduce 40% of bit-line switchings from a conventional organization.
Hiromi NOTANI Masayuki KOYAMA Ryuji MANO Hiroshi MAKINO Yoshio MATSUDA Osamu TOMISAWA Shuhei IWADE
A 64-bit 100-MHz multimedia DSP core has been designed using 0.15-µ m CMOS technology. An improved Auto-Backgate-Controlled MT-CMOS (ABC-MT-CMOS) circuit with a charge pump is adopted to suppress the standby leakage current. The dynamic active current of whole chip was simulated to optimize the size of a switch for a power supply control. The DSP core chip, which integrates 300-kgate Logic, 64-kbyte SRAM and charge pump circuit, has 8-µ A standby leakage current. The reduction rate is 1/250.
In this paper, two low power hardware structures essential for MPEG-4 video codec are proposed for portable applications. First, an adaptive bit resolution control (ABRC) scheme is proposed for a processing element (PE) in a systolic-array type motion estimator (ME). By appropriately modifying the datapath of PE to exploit the correlations in pixel values, its structure is optimized in terms of both hardware cost and low power consumption. As a result, power is saved up to 29% compared with a conventional PE while the computation accuracy is preserved and the overhead is kept negligible. Second, a low power motion compensation (MC) accelerator is proposed. By embedding DRAM whose structure is optimized for low power consumption, the power consumption for external data I/Os is dramatically reduced. In addition, distributed nine-tiled block mapping (DNTBM) with partial activation scheme in the frame buffer reduces the power for accessing frame buffer up to 31% compared to a conventional 1-bank tiled mapping. With the proposed MC accelerator, MPEG-4 SP@L1 decoding system is fabricated using 0.18 µm embedded memory logic (EML) technology.