Akira FUJIMAKI Daiki HASEGAWA Yuto TAKESHITA Feng LI Taro YAMASHITA Masamitsu TANAKA
Yihao WANG Jianguo XI Chengwei XIE
Feng TIAN Zhongyuan ZHOU Guihua WANG Lixiang WANG
Yukihiro SUZUKI Mana SAKAMOTO Taiyou NAGASHIMA Yosuke MIZUNO Heeyoung LEE
Yo KUMANO Tetsuya IIZUKA
Wisansaya JAIKEANDEE Chutiparn LERTVACHIRAPAIBOON Dechnarong PIMALAI Kazunari SHINBO Keizo KATO Akira BABA
Satomitsu Imai Shoya Ishii Nanako Itaya
Satomitsu Imai Takekusu Muraoka Kaito Tsujioka
Takahide Mizuno Hirokazu Ikeda Hiroki Senshu Toru Nakura Kazuhiro Umetani Akihiro Konishi Akihito Ogawa Kaito Kasai Kosuke Kawahara
Yongshan Hu Rong Jin Yukai Lin Shunmin Wu Tianting Zhao Yidong Yuan
Kewen He Kazuya Kobayashi
Tong Zhang Kazuya Kobayashi
Yuxuan PAN Dongzhu LI Mototsugu HAMADA Atsutake KOSUGE
Shigeyuki Miyajima Hirotaka Terai Shigehito Miki
Xiaoshu CHENG Yiwen WANG Hongfei LOU Weiran DING Ping LI
Akito MORITA Hirotsugu OKUNO
Chunlu WANG Yutaka MASUDA Tohru ISHIHARA
Dai TAGUCHI Takaaki MANAKA Mitsumasa IWAMOTO
Kento KOBAYASHI Riku IMAEDA Masahiro MORIMOTO Shigeki NAKA
Yoshinao MIZUGAKI Kenta SATO Hiroshi SHIMADA
Baoquan ZHONG Zhiqun CHENG Minshi JIA Bingxin LI Kun WANG Zhenghao YANG Zheming ZHU
Kazuya TADA
Suguru KURATOMI Satoshi USUI Yoko TATEWAKI Hiroaki USUI
Yoshihiro NAKA Masahiko NISHIMOTO Mitsuhiro YOKOTA
Tsuneki YAMASAKI
Kengo SUGAHARA
Cuong Manh BUI Hiroshi SHIRAI
Hiroyuki DEGUCHI Masataka OHIRA Mikio TSUJI
Yongzhe Wei Zhongyuan Zhou Zhicheng Xue Shunyu Yao Haichun Wang
Mio TANIGUCHI Akito IGUCHI Yasuhide TSUJI
Kouji SHIBATA Masaki KOBAYASHI
Zhi Earn TAN Kenjiro MATSUMOTO Masaya TAKAGI Hiromasa SAEKI Masaya TAMURA
Koya TANIKAWA Shun FUJII Soma KOGURE Shuya TANAKA Shun TASAKA Koshiro WADA Satoki KAWANISHI Takasumi TANABE
An ultra-small (0.4
Yasuki NAKAMURA Hiroshi OKANO Atsuhiro SUGA Hiromasa TAKAHASHI
A 12.8GOPS/2.1GFLOPS that operates at 533 MHz, and which is equipped with an 8-way embedded VLIW processor fabricated with 6-layer Cu and 1-layer Al metal 0.11 µm CMOS process technology is introduced in this paper. The processor is also equipped with a 4-way integer pipeline, a 4-way floating/media pipeline, and separate 32 KB 4-way set associative instruction and data caches. It is also equipped with instruction fetch prediction, and non-aligned dual data load/store mechanisms. The performance evaluation that was successfully conducted using the MPEG2 IDCT routine and JPEG decoding program shows that it offers twice the performance of the previous work . 10.4 M transistors are integrated on a 7.8 mm
Tadayoshi ENOMOTO Akira KOTABE
A fast-motion-estimation (ME) algorithm called a "breaking-off-search (BOS)" was developed. It can improve processing speed of the full-search (FS) method by a factor of 3.4. The BOS algorithm can not only sometimes achieve better visual quality than FS, but can also solve visual degradation problems associated with conventional fast-ME algorithms whenever picture patterns change (i. e. , presence of scene changes). The power dissipation of a 0.6-µ m CMOS parallel Wallace-tree motion estimator using BOS was reduced to about 281 mW which was 1/28.7 that of the 0.6-µ m CMOS binary-tree motion estimator using FS.
Salvatore M. CARTA Luigi RAFFO
A reconfigurable coprocessor for ETSI-GSM voice coding application domain is presented, synthesized and tested. An average overall reduction of more than 55% cycles with respect to standard RISC processors with DSP features is obtained. Such improvement together with locality and temporal correlation allows a reduction of power consumption, while standard interfacing technique ensures maximum flexibility.
In this paper, two low power hardware structures essential for MPEG-4 video codec are proposed for portable applications. First, an adaptive bit resolution control (ABRC) scheme is proposed for a processing element (PE) in a systolic-array type motion estimator (ME). By appropriately modifying the datapath of PE to exploit the correlations in pixel values, its structure is optimized in terms of both hardware cost and low power consumption. As a result, power is saved up to 29% compared with a conventional PE while the computation accuracy is preserved and the overhead is kept negligible. Second, a low power motion compensation (MC) accelerator is proposed. By embedding DRAM whose structure is optimized for low power consumption, the power consumption for external data I/Os is dramatically reduced. In addition, distributed nine-tiled block mapping (DNTBM) with partial activation scheme in the frame buffer reduces the power for accessing frame buffer up to 31% compared to a conventional 1-bank tiled mapping. With the proposed MC accelerator, MPEG-4 SP@L1 decoding system is fabricated using 0.18 µm embedded memory logic (EML) technology.
Masayuki MIYAMA Osamu TOOYAMA Naoki TAKAMATSU Tsuyoshi KODAKE Kazuo NAKAMURA Ai KATO Junichi MIYAKOSHI Kousuke IMAMURA Hideo HASHIMOTO Satoshi KOMATSU Mikio YAGI Masao MORIMOTO Kazuo TAKI Masahiko YOSHIMOTO
This paper describes an ultra low power, motion estimation (ME) processor for MPEG2 HDTV resolution video. It adopts a Gradient Descent Search (GDS) algorithm that drastically reduces required computational power to 6 GOPS. A SIMD datapath architecture optimized for the GDS algorithm decreases the clock frequency and operating voltage. A low power 3-port SRAM with a write-disturb-free cell array arrangement is newly designed for image data caches of the processor. The proposed ME processor contains 7-M transistors, integrated in 4.50 mm
Keiji KIMURA Takeshi KODAKA Motoki OBATA Hironori KASAHARA
This paper describes multigrain parallel processing on OSCAR (Optimally SCheduled Advanced multiprocessoR) chip multiprocessor architecture. OSCAR compiler cooperative chip multiprocessor architecture aims at development of scalable, high effective performance and cost effective chip multiprocessor with ease of use by compiler supports. OSCAR chip multiprocessor architecture integrates simple single issue processors having distributed shared data memory for optimal use of data locality over different loops and fine grain data transfer and synchronization, local data memory for private data recognized by compiler, and compiler controllable data transfer unit for overlapping data transfer to hide data transfer overhead. This OSCAR chip multiprocessor and OSCAR multigrain parallelizing compiler have been developed simultaneously. Performance of multigrain parallel processing on OSCAR chip multiprocessor architecture is evaluated using SPEC fp 2000/95 benchmark suite. When microSPARC like single issue core is used, OSCAR chip multiprocessor architecture gives us 2.36 times speedup in fpppp, 2.64 times in su2cor, 2.88 times in turb3d, 2.98 times in hydro2d, 3.84 times in tomcatv, 3.84 times in mgrid and 3.97 times in swim respectively for four processors against single processor.
Masaaki KONDO Hiroshi NAKAMURA
In recent computer systems, a large portion of energy is consumed by on-chip cache accesses and data movement between cache and off-chip main memory. Reducing these memory system energy is indispensable for future microprocessors because power and thermal issues certainly become a key factor of limiting processor performance. In this paper, we discuss and evaluate how our architecture called SCIMA contributes to energy saving. SCIMA integrates software-controllable memory (SCM) into processor chip. SCIMA can save total memory system energy by using SCM under the support of compiler. The evaluation results reveal that SCIMA can reduce 5-50% of memory system energy and still faster than conventional cache based architecture.
Hiroshi TAKAHASHI Rimon IKENO Yutaka TOYONOH Akihiro TAKEGAMA Yasumasa IKEZAKI Tohru URASAKI Hitoshi SATOH Masayasu ITOIGAWA Yoshinari MATSUMOTO
High-speed and low-power DSPs have been developed for versatile hand set applications. The DSP contains a 16-bit fixed point DSP core with multiple buses, highly tuned instruction sets and a low-power architecture, featuring CPU power with 404.5 µ W/MHz, chip power with 2.08 mW/MHz at peak and 200 µA stand-by current and 160 MHz/160 MIPS performance by a single DSP core, and also operates at 0.68 V within the temperature range from -40
Hiromi NOTANI Masayuki KOYAMA Ryuji MANO Hiroshi MAKINO Yoshio MATSUDA Osamu TOMISAWA Shuhei IWADE
A 64-bit 100-MHz multimedia DSP core has been designed using 0.15-µ m CMOS technology. An improved Auto-Backgate-Controlled MT-CMOS (ABC-MT-CMOS) circuit with a charge pump is adopted to suppress the standby leakage current. The dynamic active current of whole chip was simulated to optimize the size of a switch for a power supply control. The DSP core chip, which integrates 300-kgate Logic, 64-kbyte SRAM and charge pump circuit, has 8-µ A standby leakage current. The reduction rate is 1/250.
A pass-transistor logic is enhanced with a bootstrap configuration for sub-1 V operation at high speed and low power. The bootstrap configuration drives the output to full swing, which accelerates the signal transition and cuts off the short-circuit current of subsequent CMOS logic gates. The asynchronous or synchronous timing sequence of the input (drain) and the control (gate) signals ensures bootstrap operation. A 1-b arithmetic logic unit (ALU) and an EXNOR gate built with the bootstrap pass-transistor logic outperforms those built with other types of pass-transistor logic. An experimental 16-b pass-transistor adder operates down to 0.4 V with a delay time of 4.2 ns and a power dissipation of 2.8 µ W/MHz at 0.5 V.
Takeshi HONDA Noboru SAKIMURA Tadahiko SUGIBAYASHI Hideaki NUMATA Sadahiko MIURA Hiromitsu HADA Shuichi TAHARA
MRAM-writing circuitry to compensate for the thermal variation of the magnetization-reversal current is proposed. The writing current of the proposed circuitry is designed to decrease in proportion to an increase in temperature. This technique prevents multiple-write failures from degrading 1 Gb MRAM yield where the standard deviation of magnetization-reversal current variation from other origins is less than 5%.
An accurate, fast delay calculation method suitable for high-performance, low-power LSI design is proposed. The delay calculation is composed of two steps: (1) the gate delay is calculated by using an effective capacitance obtained from a simple model we propose; and (2) the interconnect delay is also calculated from the effective capacitance and modified by using the gate-output transition time. The proposed delay calculation halves the error of a conventional rough calculation, achieving a computational error within 10% per gate stage. The mathematical models are simple enough that the method is suitable for quick delay calculation and logic circuit optimization in the early stages of LSI design. A delay optimization tool using this delay calculation method reduced the worst path delay of a multiply-add module by 11.2% and decreased the sizes of 58.1% of the gates.
Naoya WATANABE Fukashi MORISHITA Yasuhiko TAITO Akira YAMAZAKI Tetsushi TANIZAKI Katsumi DOSAKA Yoshikazu MOROOKA Futoshi IGAUE Katsuya FURUE Yoshihiro NAGURA Tatsunori KOMOIKE Toshinori MORIHARA Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
This paper describes an Embedded DRAM Hybrid Macro, which supports various memory specifications. The eDRAM module generator with Hybrid Macro provides more than 120,000 eDRAM configurations. This eDRAM includes a new architecture called Auto Signal Management (ASM) architecture, which automatically adjusts the timing of the control signals for various eDRAM configurations, and reduces the design Turn Around Time. An Enhanced-on-chip Tester performs the maximum 512b I/O pass/fail simultaneous judgments and the real time repair analysis. The eDRAM testing time is reduced to about 1/64 of the time required using the conventional technique. A test chip is fabricated using a 0.18 µm 4-metal embedded DRAM technology, which utilizes the triple-well, dual-Tox, and Co salicide process technologies. This chip achieves a wide voltage range operation of 1.2 V at 100 MHz to 1.8 V at 200 MHz.
Akira YAMADA Yasuhiro NUNOMURA Hiroaki SUZUKI Hisakazu SATO Niichi ITOH Tetsuya KAGEMOTO Hironobu ITO Takashi KURAFUJI Nobuharu YOSHIOKA Jingo NAKANISHI Hiromi NOTANI Rei AKIYAMA Atsushi IWABU Tadao YAMANAKA Hidehiro TAKATA Takeshi SHIBAGAKI Takahiko ARAKAWA Hiroshi MAKINO Osamu TOMISAWA Shuhei IWADE
A high-speed 32-bit RISC microcontroller has been developed. In order to realize high-speed operation with minimum hardware resource, we have developed new design and analysis methods such as a clock distribution, a bus-line layout, and an IR drop analysis. As a result, high-speed operation of 400 MHz has been achieved with power dissipation of 0.96 W at 1.8 V.
Tsutomu YOSHIMURA Kimio UEDA Jun TAKASOH Harufusa KONDOH
In this paper, we present a 10 Gbase Ethernet Transceiver that is suitable for 10 Gb/s Ethernet applications. The 10 Gbase Ethernet Transceiver LSI, which contains the high-speed interface and the fully integrated IEEE 802.3ae compliant logics, is fabricated in a 0.18 µm SOI/CMOS process and dissipates 2.9 W at 1.8 V supply. By incorporating the monolithic approach and the use of the advance CMOS process, this 10 GbE transceiver realizes a low power, low cost and compact solution for the exponentially increasing need of broadband network applications.
Peilin LIU Li JIANG Hiroshi NAKAYAMA Toshiyuki YOSHITAKE Hiroshi KOMAZAKI Yasuhiro WATANABE Hisakatsu ARAKI Kiyonori MORIOKA Shinhaeng LEE Hajime KUBOSAWA Yukio OTOBE
We have developed a low-power, high-performance MPEG-4 codec LSI for mobile video applications. This codec LSI is capable of up to CIF 30-fps encoding, making it suitable for various visual applications. The measured power consumption of the codec core was 9 mW for QCIF 15-fps codec operation and 38 mW for CIF 30-fps encoding. To provide an error-robust MPEG-4 codec, we implemented an error-resilience function in the LSI. We describe the techniques that have enabled low power consumption and high performance and discuss our test results.
Wataru TAMAMURA Koji NAKAMAE Hiromu FUJIOKA
An automatic LSI package lead inspection system for backside lead specification is proposed. The proposed system inspects not only lead backside contamination but also the mechanical lead specification such as lead pitch, lead offset and lead overhangs (variations in lead lengths). The total inspection time of a UQFP package with a lead count of 256 is less than the required time of 1 second. Our proposed method is superior to the threshold method used usually, especially for the defect between leads.
Dong-Muk CHOI Che-Young KIM Kwang-Hee KWON
This letter presents a Monte-Carlo FDTD technique to determine the scattered field from a perfectly conducting fractal surface from which the useful information on the incoherent pattern tendency could be observed. A one-dimensional fractal surface was generated by the bandlimited Weierstrass function. In order to verify the numerical results by this technique, these results are compared with those of Kirchhoff approximations, which show a good match between them. To investigate the incoherent pattern tendency involved, the dependence of the fitting curve slope on the different D and
Ji Hoon KIM Joon Hyung KIM Youn Sub NOH Song Gang KIM Chul Soon PARK
A high efficient HBT MMIC power amplifier with a new on-chip bias control circuit was proposed for PCS applications. By adjusting the quiescent current in accordance with the output power levels, the average power usage efficiency of the power amplifier is improved by a factor of 1.4. The bias controlled power amplifier, depending on low (high) output power levels, shows 62(103) mA of quiescent current, 16(28) dBm output power with 7.5(35.4)% of power-added efficiency(PAE), -46(-45) dBc of adjacent-channel power ratio (ACPR), and 23.7(26.9) dB of gain
Seung-Chan HEO Young-Chan JANG Sang-Hune PARK Hong-June PARK
An 8-bit 200 MS/s CMOS 2-stage cascaded folding/interpolating ADC chip was implemented by applying a resistor averaging/interpolating scheme at the preamplifier outputs and the differential circuits for the encoder logic block, with a 0.35-µm double-poly CMOS process. The number of preamplifiers was reduced to half by using an averaging technique with a resistor array at the preamplifier outputs. The delay time of digital encoder block was reduced from 2.2 ns to 1.3 ns by replacing the standard CMOS logic with DCVSPG and CPL differential circuits. The measured SFDR was 42.5 dB at the sampling rate of 200 MS/s for the 10.072 MHz sinusoidal input signal.
Ki-Duck CHO Heung-Sik TAE Sung-Il CHIEN
A new multi-luminance-level subfield method is proposed to reduce the low gray-level contour of an alternate current plasma display panel (AC-PDP). The minimum or maximum luminance level per sustain-cycle can be altered by simultaneously applying the proper auxiliary short pulses. As a result, the multi-luminance levels per one or two sustain pulse pairs can be expressed by properly adjusting the auxiliary short pulses for the one or two sustain-cycle subfields, thereby suppressing a low gray-level contour of AC-PDP.