Masanori HASHIMOTO Hidetoshi ONODERA
We propose a transistor sizing method that downsizes MOSFETs inside a cell to eliminate redundancy of cell-based circuits as much as possible. Our method reduces power dissipation of detail-routed circuits while preserving interconnects. The effectiveness of our method is experimentally evaluated using 3 circuits. The power dissipation is reduced by 75% maximum and 60% on average without delay increase. Compared with discrete cell sizing, the proposed method reduces power dissipation furthermore by 30% on average.
A system level approach for a memory power reduction is proposed in this paper. The basic idea is allocating frequently executed object codes into a small subprogram memory and optimizing supply voltage and threshold voltage of the subprogram memory. Since large scale memory contains a lot of direct paths from power supply to ground, power dissipation caused by subthreshold leakage current is more serious than dynamic power dissipation. Our approach optimizes the size of subprogram memory, supply voltage, and threshold voltage so as to minimize memory power dissipation including static power dissipation caused by leakage current. A heuristic algorithm which determines code allocation, supply voltage, and threshold voltage simultaneously so as to minimize power dissipation of memories is proposed as well. Our experiments with some benchmark programs demonstrate significant energy reductions up to 80% over a program memory which does not employ our approach.
Technology scaling will become difficult due to power wall. On the other hand, future computer and communications technology will require further reduction in power dissipation. Since no new energy efficient device technology is on the horizon, low power CMOS design should be challenged. This paper discusses what and how much designers can do for CMOS power reduction.
Ayuko TAKAGI Kiyoshi NISHIKAWA Hitoshi KIYA
This paper propose a method for improving the image quality of motion estimation (ME) using low-bit images. By using edge-enhanced images for quantization, we can increase the accuracy of the ME and improve the image quality. It is known that using low-bit images for ME is effective for reducing power consumption but it slightly degrades image quality. The quality of the encoded image depends on the thresholds for data quantization, thus, algorithms for determining thresholds are studied. The proposed method uses linear quantization, which simply truncates the least significant bits. This method is simple without any complicated threshold calculations, and the resultant image quality is improved as much as the methods that use threshold calculations. To evaluate the effectiveness, we simulate results for image quality and estimate the power consumption using synthesis results from a VHDL model motion estimator.
This letter presents a method for reducing power dissipation in a protocol converter. The communication protocol of a VLSI chip hierarchically consists of several sub-protocols and only one of them can be actively working at any given time. In general, protocol converters are implemented by dual protocols of the initially given protocols which are to be interfaced. If the duals of those sub-protocols are implemented in separate modules, we can separate active modules and inactive modules on the fly since only one of the modules can be active at a time. The active/inactive state of a module can be monitored by the control signals that represent the execution of the protocol corresponding to the module. Power reduction can be achieved by dynamically suppressing the clock supply to inactive modules. To trade-off the power reduction rate against the area overhead, the module granularity must be properly chosen. For this purpose, we implement the duals of the atomic protocols in the same module if their state graphs share states except the initial state. Our experimental results show that this method provides significant savings in power consumption of between 18.4% and 92.1% with a 5.3% area overhead.
Wujian ZHANG Runde ZHOU Tsunehachi ISHITANI Ryota KASAI Toshio KONDO
This paper describes an improved multiresolution telescopic search algorithm (MRTlcSA) for block-matching motion estimation. The algorithm uses images with full and reduced bit resolution, and uses motion-track and adaptive-search-window strategies. Simulation results show that the proposed algorithm has low computational complexity and achieves good image quality. We have developed a systolic-architecture-based search engine that has split data paths. In the case of low bit-resolution, the throughput is increased by enhancing the operating parallelism. The new motion estimator works at a low clock frequency and a low supply voltage, and therefore has low power consumption.
Kawori TAKAKUBO Hajime TAKAKUBO Shigetaka TAKAGI Nobuo FUJII
Voltage follower is one of the most useful building blocks in analog circuits. This paper proposes a voltage follower composed of a complementary pair of p-channel MOS(PMOS) and n-channel MOS (NMOS) differential amplifiers which operates under low power supply. The proposed circuit has a rail-to-rail dynamic range by combining complementary differential amplifiers.
Kazuaki MURAKAMI Hidetaka MAGOSHI
This paper briefly surveys architectural technologies of recent or future high-performance, low-power processors for improving the performance and power/energy consumption simultaneously. Achieving both high performance and low power at the same time imposes a lot of challenges on processor design, and therefore gives us a lot of opportunities for devising new technologies. The paper also tries to provide some insights into the technology direction in future.
Hideo OHIRA Toshihisa KAMEMARU Hirokazu SUZUKI Ken-ichi ASANO Masahiko YOSHIMOTO
An architectural design of a media processor core optimized for MPEG4/H26x video codec targeted for use in mobile multimedia terminals is presented. The architecture consists of a maximum 6.4 GOPS SIMD (Single Instruction Multiple Data) processor, RISC-processor, VLC-processor, and intelligent DMA controller. The unique SIMD processor completes 2-D DCT processing in 132 clock cycles, or block matching (16 by 16 pixels) in 24 clock-cycles. VLC-processor allows the completion of 8 by 8 block run-level coding in average 10 clock cycles in the case of low bit-rates. The functions of transpose-registers in the SIMD processor, data sub-sampling technique in the DMA, or data-sliding technique between PEs (Processor Elements) in the SIMD processor eliminate a large amount of cycle loss for data handling, and extract the highest level of performance. Through the use of the above architecture and the lower power approach, CIF 30 frames/s MPEG4 Simple Profile video codec @ 100 MHz can be achieved. Estimated dissipation is as low as 280 mW. 300 kgates and 16 kBytes four port SRAM are contained on a 12 mm2 area by using 0.18 µm process technology. The combination of the RISC-processor and SIMD-processor can also operate MPEG4 core profile (shape coding) that requires flexibility and performance.
Fukashi MORISHITA Kazutami ARIMOTO Kazuyasu FUJISHIMA Hideyuki OZAKI Tsutomu YOSHIHARA
A novel body potential-controlling technique for floating SOI CMOS circuits is proposed and verified in this study. High-speed operation is realized with a small chip size by using body-floating SOI transistors. The use of this technique allows the threshold voltage of the body-floating transistors to be varied transitionally. Therefore, the standby current of SOI CMOS logic is reduced to less than 1/50th of that required by the non-controlled operation of the body potential, and the logic operates at a high speed during the active period. There is no speed penalty for the recovery operation from the standby mode. This technique supports sub-1 V operation, which will be required by future battery-operated devices with wide-range covering.
This paper reviews analog-circuit researches in the 1990's especially from an academic-side point of view with the aim of pursuing what becomes important in the 21st century. To achieve this aim a large number of articles are surveyed and more than 200 are listed in References.
Naoki KATO Yohei AKITA Mitsuru HIRAKI Takeo YAMASHITA Teruhisa SHIMIZU Fuyuhiko MAKI Kazuo YANO
Random modulation refers to the changing of the MOSFET threshold voltage cell by cell. This paper claims it is essential in sub-2-V CMOS design because it reduces the sub-threshold leakage current even in the active and sleep modes as well as in the stand-by mode. We found that a gradated modulation scheme, which gradually changes the ratio of low- Vth cells according to the path-delay, is the best approach. To achieve the minimal leakage current, the way of determining the optimum pair of threshold voltages is also described. Experimental results for microprocessor show that gradated modulation reduces sub-threshold leakage current by 75% to 90% compared to conventional single-low-threshold voltage design without degrading the performance of the circuits.
Koji INOUE Koji KAI Kazuaki MURAKAMI
This paper proposes an on-chip memory-path architecture employing the dynamically variable line-size (D-VLS) cache for high performance and low energy consumption. The D-VLS cache exploits the high on-chip memory bandwidth attainable on merged DRAM/logic LSIs by replacing a whole large cache line in one cycle. At the same time, it attempts to avoid frequent evictions by decreasing the cache-line size when programs have poor spatial locality. Activating only on-chip DRAM subarrays corresponding to a replaced cache-line size produces a significant energy reduction. In our simulation, it is observed that our proposed on-chip memory-path architecture, which employs a direct-mapped D-VLS cache, improves the ED (Energy Delay) product by more than 75% over a conventional memory-path model.
Tadahiro KURODA Tetsuya FUJITA Fumitoshi HATORI Takayasu SAKURAI
This paper describes a Variable Threshold-voltage CMOS technology (VTCMOS) which controls the threshold voltage (VTH) by means of substrate bias control. Circuit techniques to combine a switch circuit for an active mode and a pump circuit for a standby mode are presented. Design considerations, such as latch-up immunity and upper limit of reverse substrate bias, are discussed. Experimental results obtained from chips fabricated in a 0.3 µm VTCMOS technology are reported. VTH controllability including temperature dependence and influence on short channel effect, power penalty caused by the control circuit, substrate current dependence at low VTH, and substrate noise influence on circuit performance are investigated. A scaling theory is also presented for use in the discussion of future possibilities and problems involved in this technology.
Hae-Moon SEO Chang-Gene WOO Sang-Won OH Sung-Wook JUNG Pyung CHOI
This paper presents the implementation of a 3 V low power multi-rate of 156, 622, and 1244 Mbps clock and data recovery circuit (CDR) for optical communications tranceiver using new parallel clock recovery architecture based on dual charge-pump PLL. Designed circuit recovers eight-phase clock signals which are one-eighth frequency of the input signal. While the typical system uses the method that compares the input data with recovered clock, the proposed circuit compares a 1/2-bit delayed input data with the serial data generated by the recovered eight-phase clock signals. The advantage of the circuit is that the implementation is easy, since each sub blocks have one-eighth frequency of the input data signal. Morevover, since the circuit works at one-eighth frequency of the input data, it dissipates less power than conventional CMOS recovery circuit. Simulation results show that this recovery circuit can work with power dissipation of less than 40 mW with a single 3 V supply. All the simulations are based on HYUNDAI 0.65 µm N-Well CMOS double-poly double-metal technology.
This paper describes the formula for dynamic power dissipation of a track/hold circuit as a function of the input frequency, the input amplitude, the sampling frequency, the track/hold duty cycle, the power supply voltage and the hold capacitance for a sinusoidal input.
Ayuko TAKAGI Shogo MURAMATSU Hitoshi KIYA
In MPEG standard, motion estimation (ME) is used to eliminate the temporal redundancy of video frames. This ME is the most time-consuming task in the encoding of video sequences and is also the one using the most power. Using low-bit images can save power of ME and a conventional architecture fixed to a certain bit width is used for low-bit motion estimator. It is known that there is a trade-off between power and image quality. ME may be used in various situations, and the relation between demands for power or image quality will depend on those circumstances. We therefore developed an architecture for a low-bit motion estimator with adjustable power consumption. In this architecture, we can select the bit width for the input image and adjust the amount of power for ME. To evaluate its effectiveness, we designed the motion estimator by VHDL and used the synthesis results to estimate the performance.
Hiroshi TAKAHASHI Shintaro MIZUSHIMA
High-speed and low-power DSPs have been developed for versatile applications, especially for digital communications. These DSPs contain a 16-bit fixed point DSP core with multiple buses, highly tuned instruction set and low-power architecture, featuring 0.45 mA/MIPS, 100-120 MIPS performance by a single CPU core, 200 MIPS performance by dual CPU core architecture, respectively and also contain a 1.2 V low-voltage DSP core with 30 MIPS performance for super low-power applications. In this paper, new architecture VIA2 programming ROM for high-speed and new D flip-flop circuit considering the impact of pocket implantation process for low power are discussed, including key C-MOS process technology.
Kunihiro ASADA Makoto IKEDA Satoshi KOMATSU
This paper summarizes power reduction methods applicable for VLSI bus systems in terms of reduction of signal swing, effective capacitance reduction and reduction of signal transition, which have been studied in authors' research group. In each method the basic concept is reviewed quickly along with some examples of its application. A future perspective is also described in conclusion.
Low Power design has emerged as a both practically and theoretically attractive theme in modern LSI system design. This paper presents system level power optimization techniques. A brief survey of system level low power design approaches and several examples in detail are described. It reviews some techniques that have been proposed to overcome the power issue and gives guideline for prospective system level solutions.