Jun TERADA Yasuyuki MATSUYA Fumiharu MORISAWA Yuichi KADO
A very low-power, high-speed flash A/D converter front-end composed of a new latched comparator was developed. We established a butterfly sorting technique to guarantee the monotonicity of the converter. The 6-bit A/D front-end achieves a speed of 100 Msps and dynamic range of 33 dB with power consumption of only 7 mW at the supply voltage of 1 V, and the butterfly sorter guarantees 6-bit monotonicity with an extra power consumption of 1 mW.
Kawori TAKAKUBO Hajime TAKAKUBO Yohei NAGATAKE Shigetaka TAKAGI Nobuo FUJII
A mapping circuit in order to have a wider input dynamic range is proposed. MOSFET's connecting between power supply lines are employed to construct the mapping circuit. SPICE simulation is shown to evaluate the proposed circuits. With the proposed mapping circuit, two-MOSFET subtractor has a rail-to-rail input voltage. As an application, an OTA consisting of subtractors is realized by employing the proposed mapping circuits to have a rail-to-rail input voltage range.
Takashi YAMADA Shoji GOTO Norihisa TAKAYAMA Yoshifumi MATSUSHITA Yasoo HARADA Hiroto YASUURA
In wireless communication systems, low-power metrics is becoming a burdensome problem in the portable terminal design, because of portability constraints. This paper presents design architecture of a low-power Digital Matched Filter (DMF) for the direct-sequence spread-spectrum communication system such as WCDMA or wireless LAN. The proposed approach for power savings focuses on the architecture of the reception registers and the correlation-calculating unit, which dissipate the majority of the power in a DMF. The main features are asynchronous latch clock generation for the reception registers, parallelism of correlation calculation operations and bit manipulation for chip-correlation operations. A DMF is designed in compliance with the WCDMA specifications incorporating the proposed techniques, and its properties are evaluated by computer simulations at the gate level using 0.18-µm CMOS standard cell array technology. As a result, the power consumption of the proposed DMF is estimated to be 9.3 mW (@15.6 MHz, 1.6 V), which is below 40% of the power consumed by a general DMF.
Keiichi KUROKAWA Takuya YASUI Yoichi MATSUMURA Masahiko TOYONAGA Atsushi TAKAHASHI
In several researches in recent years, it is shown that the circuit of a higher clock frequency can be obtained by controlling the clock-input timing of each register. However, the power consumption of the clock-tree obtained by them tends to be larger since the locations of registers are not well taken into account in clock scheduling. In this paper, we propose a novel clock tree synthesis that attains both the higher clock frequency and the lower power consumption. Our proposed algorithm determines the clock-input timings of registers step by step in constructing a clock tree structure. First, the clock period of a circuit is improved by controlling the clock-input timing of each register, and second, the clock-input timings are modified to construct a low power clock tree without deteriorating the obtained clock period. According to our experiments using several benchmark circuits, the power consumption of our clock trees attain about 9.5% smaller than previous methods.
Takahiro KAKIMOTO Hiroyuki OCHI Takao TSUDA
As a design flow for low-power FPGA implementation, Datapath-Layout-Driven Design (DLDD) has been proposed. This letter reports the effect of DLDD for standard-cell-based ASIC implementation, and proposes necessary improvements. Experimental results shows that about 8.3% reduction of power dissipation is achieved in the best case.
In this paper a new MOS charge pump architecture is presented, where a clock generator is used in each pump stage of the charge pump circuit to elevate voltage exponentially with stages. This charge pump with a clock level shifter is designed to run at an optimized operation frequency, which can make an excellent compromise between the rise time and the dynamic power dissipation. With less stages than the linear-cascade circuit, the power dissipation and the area of the novel charge pump circuit are markedly decreased. The simulating comparison results based on 1.2 µm CMOS, p-substrate double-poly double-metal process parameters show that the nonlinear charge pump with a high pumping efficiency can supply a steady 1 mA, 16 v output for portable LCDs.
Tatsuji MATSUURA Junya KUDOH Eiki IMAIZUMI
A low-power-consumption 6-bit pipelined analog-to-digital converter for use in a BluetoothTM RF transceiver has been developed. The RF transceiver chip was fabricated using a 0.35-µm BiCMOS process, and the A/D converter is based on CMOS technology for digital logic. To reduce the power consumption of the converter, we used a look-ahead pipeline architecture to reduce the required settling time of an amplifier in the critical path of the converter. We show that through this reduction, amplifier power consumption of 600 µA can be reduced to 250 µA to achieve a 13-MHz conversion rate. We have also developed a low-power two-capacitor switched-capacitor common-mode feedback circuit which enables an offset cancellation of an amplifier during the reset phase. Offset cancellation is used in each stage of the S/H amplifier to reduce the overall offset of the converter. It achieves an effective number of bits of 5.7 at a conversion rate of 13 Msps and 5.0 at 26 Msps. The residual offset of the converter is only 4 mV. It has a low total current consumption of 3.2 mA at 13 Msps and a supply voltage of 2.8 V.
In this letter a low-complexity and low-power realization of the 2D discrete-cosine-transform and its inverse (DCT/IDCT) is presented. A VLSI circuit based on the Chen algorithm with the distributed arithmetic approach is described. Furthermore low-power design techniques, based on clock gating and data driven switching activity reduction, are used to decrease the circuit power consumption. To this aim, input signal statistics have been extracted from H.263/MPEG verification models. Finally, circuit performance is compared to known software solutions and dedicated full-custom ones.
A lithium-ion (Li-ion) battery protection IC with an average current of 0.99 µA (at a battery voltage of 3.6 V) and a standby current (after detecting over-discharge) less than 0.01 µA is presented. This low power performance is achieved via a power-on duty-cycle technique. The protection circuit samples the voltage of the battery periodically and powers down during the rest of time. This Li-ion battery protector provides over-charge, over-discharge, excess-current and short circuits protection. This protection IC was implemented in a 0.6-µm CMOS technology and the active area is 880 µm 780 µm.
Due to large capacitance, high access ratio and wide access bitwidth, frame memory is one of the most energy consuming devices in modern video encoders. This paper proposes a new architectural technique to reduce energy dissipation of frame memory through adaptive bitwith compression. Unlike related approaches, the technique utilizes the fixed order of memory accesses and data correlation of video sequences, by dynamically adjusting the memory bitwidth to the number of bits changed per pixel. Instead of treating the data bits independently, we group the most significant bits together, activating the corresponding group of bit-lines adaptively to data variation. The approach is not restricted to the specific bit-patterns nor depends on the storage phase. It works equally well on read and write accesses, as well as during precharging. Simulations show that using this method we can reduce the total energy consumption of the frame memory cell array by 20% without affecting the picture quality. The implementation scheme is simple yet compact.
Shiro DOSHO Naoshi YANAGISAWA Seiji WATANABE Takahiro BOKUI Kazuhiko NISHIKAWA
In this paper, a CMOS data recovery PLL for DVD-ROM is described. Some techniques have been introduced to alleviate the specifications required to analog circuits. A new phase detector alleviates the timing specification of a delay line and a pulse generator. A new frequency detector increases the capture range up to 8% of the center frequency. We have achieved to realize the data recovery PLL that operates at DVD-ROMx14 speed.
Media processing has become one of the dominant computing workloads. In this context, SIMD instructions have been introduced in current processors to raise performance, often the main goal of microprocessor designers. Today, however, designers have become concerned with the power consumption, and in some cases low power is the main design goal (laptops). In this paper, we show that SIMD ISA extensions on a superscalar processor can be one solution to reduce power consumption and keeping a high performance level. We reduce the average power consumption by decreasing the number of instructions, the number of cache references, and using dynamic power management to transform the speedup in performance in power consumption reduction.
Hiroshi KAWAGUCHI Gang ZHANG Seongsoo LEE Youngsoo SHIN Takayasu SAKURAI
An LSI has been fabricated and measured to demonstrate feasibility of VDD-hopping scheme in an embedded system level by executing MPEG4 CODEC. In the VDD-hopping, supply voltage of a processor is dynamically controlled by a hardware-software cooperative mechanism depending on workload of the processor. When the workload is about a half, the VDD-hopping is shown to reduce power to less than a quarter compared to the conventional fixed-VDD scheme. The power saving is achieved without degrading real-time features of MPEG4 CODEC.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
In this paper, we propose a novel architecture for low-power direct-mapped instruction caches, called "history-based tag-comparison (HBTC) cache. " The cache attempts to reuse tag-comparison results for avoiding unnecessary tag checks. Execution footprints are recorded into an extended BTB (Branch Target Buffer). In our evaluation, it is observed that the energy for tag comparison can be reduced by more than 90% in many applications.
With increased size and issue-width, instruction issue queue becomes one of the most energy consuming units in today's superscalar microprocessors. This paper presents a novel architectural technique to reduce energy dissipation of adaptive issue queue, whose functionality is dynamically adjusted at runtime to match the changing computational demands of instruction stream. In contrast to existing schemes, the technique exploits a new freedom in queue design, namely the voltage per access. Since loading capacitance operated in the adaptive queue varies in time, the clock cycle budget becomes inefficiently exploited. We propose to trade-off the unused cycle time with supply voltage, lowering the voltage level when the queue functionality is reduced and increasing it with the activation of resources in the queue. Experiments show that the approach can save up to 39% of the issue queue energy without large performance and area overhead.
Koji INOUE Vasily G. MOSHNYAGA Kazuaki MURAKAMI
One of uncompromising requirements from portable computing is energy efficiency, because that affects directly the battery life. On the other hand, portable computing will target more demanding applications, for example moving pictures, so that higher performance is still required. Cache memories have been employed as one of the most important components of computer systems. In this paper, we briefly survey architectural techniques for high performance, low power cache memories.
Tetsuya YAMADA Makoto ISHIKAWA Yuji OGATA Takanobu TSUNODA Takahiro IRITA Saneaki TAMAKI Kunihiko NISHIYAMA Tatsuya KAMEI Ken TATEZAWA Fumio ARAKAWA Takuichiro NAKAZAWA Toshihiro HATTORI Kunio UCHIYAMA
A 32-bit embedded RISC microprocessor core integrating a DSP has been developed using a 0.18-µm five-layer-metal CMOS technology. The integrated DSP has a single-MAC and exploits CPU resources to reduce hardware. The DSP occupies only 0.5 mm2. The processor core includes a large on-chip 128 kB SRAM called U-memory. A large capacity on-chip memory decreases the amount of traffic with an external memory. And it is effective for low-power and high-performance operation. To realize low-power dissipation for the U-memory access, the active ratio of U-memory's access is reduced. The critical path is a load path from the U-memory, and we optimized the path through the whole chip. The chip achieves 0.79 mA/MHz executing Dhrystone 1.1 at 108 MHz, which is suitable for mobile applications.
This paper proposes constructive timing-violation (CTV) and evaluates its potential. It can be utilized both for increasing clock frequency and for reducing energy consumption. Increasing clock frequency over that determined by the critical paths causes timing violations. On the other hand, while supply voltage reduction can result in substantial power savings, it also causes larger gate delay and thus clock must be slow down in order not to violate timing constraints of critical paths. However, if any tolerant mechanisms are provided for the timing violations, it is not necessary to keep the constraints. Rather, the violations would be constructive for high clock frequency or for energy savings. From these observations, we propose the CTV, which is supported by the tolerant mechanism based on contemporary speculative execution mechanisms. We evaluate the CTV using a cycle-by-cycle simulator and present its considerably promising potential.
Kazuyuki WADA Shigetaka TAKAGI Nobuo FUJII
A building block for widening an input range under low power-supply voltages is proposed and the block is used in a popular linearization technique for voltage-to-current converters. The block employs two MOSFETs each of which actively works when and only when the other is in cutoff region. Accurate level shift circuits for the control of the MOSFETs enable such exclusive operation. Simulation results show that the complementary MOSFETs perform as an equivalent MOSFET without any cutoff region. It is also confirmed that the novel linear voltage-to-current converter is effective for not only a wide input range but also low-power consumption.
Sungjae KIM Hyungwoo LEE Juho KIM
We present an efficient heuristic algorithm to reduce glitch power dissipation in CMOS digital circuits. In this paper, gate sizing is classified into three types and the buffer insertion is classified into two types. The proposed algorithm combines three types of gate sizing and two types of buffer insertion into a single optimization process to maximize the glitch reduction. The efficiency of our algorithm has been verified on LGSynth91 benchmark circuits with a 0.5 µm standard cell library. Experimental results show an average of 69.98% glitch reduction and 28.69% power reduction that are much better than those of gate sizing and buffer insertion performed independently.