Mikio ASAKURA Kazutami ARIMOTO Hideto HIDAKA Kazuyasu FUJISHIMA
In low-voltage DRAM, one of the most serious problems is how to maintain sufficient charge in the memory cell, which affects both the operating margin and soft-error immunity. This paper proposes a new array architecture called the Cell-plate line Connecting Complementary bit-line (C3) architecture, which realizes a large signal voltage on the bit-line pair and a low soft error rate (SER) without degrading the reliability of the memory cell capacitor dielectric film. This architecture requires no unique process technology and no additional chip area. With a test device fabricated in a 16-Mb DRAM process, a 130-mV signal voltage is observed at a 1.5-V power supply with a 1.6 × 3.2-µm² cell size. This architecture will open the path for future battery-backup and/or battery-operated high-density DRAMs.
Yoshihiro NAGURA Yoshinori FUJIWARA Katsuya FURUE Ryuji OHMURA Tatsunori KOMOIKE Takenori OKITAKA Tetsushi TANIZAKI Katsumi DOSAKA Kazutami ARIMOTO Yukiyoshi KODA Tetsuo TADA
The increase in test time of embedded DRAMs (e-DRAM) is one of the key issues in System-on-chip (SOC) device test. This paper proposes putting the repair analysis function on chip as Built-In Self Repair (BISR). BISR is performed at 166 MHz, the at-speed rate of the e-DRAM, using low-cost automatic test equipment (ATE). The area of the BISR is 1.7 mm². Storing errors in table form keeps the area penalty of the repair analysis function small. At wafer-level testing, e-DRAM function test time with BISR was about 20% less than with the conventional method. Moreover, representative samples were produced to confirm the repair analysis capability. The results show that all of the samples were actually repaired using the repair information generated by BISR.
Hideyuki NODA Katsumi DOSAKA Hans Jurgen MATTAUSCH Tetsushi KOIDE Fukashi MORISHITA Kazutami ARIMOTO
This paper describes a novel TCAM architecture designed to enhance soft-error immunity. An associated embedded DRAM and ECC circuits are placed next to the TCAM macro to implement a unique methodology for recovering bits upset by soft errors. The proposed configuration improves soft-error immunity by 6 orders of magnitude compared with a conventional TCAM. We also propose a novel methodology for testing the soft-error rate with a fast parallel multi-bit test. In addition, the proposed architecture resolves the critical problem of look-up table maintenance in TCAM. The design techniques reported in this paper are especially attractive for realizing soft-error immune, high-performance TCAM chips.
Hiroki SHIMANO Fukashi MORISHITA Katsumi DOSAKA Kazutami ARIMOTO
The advanced-DFM (Design For Manufacturability) RAM provides a solution to the limits of SRAM voltage scaling and a countermeasure against process fluctuations. Its characteristics are voltage scalability (operation at 0.6 V) with a wide operating margin and reliable, long data retention time. The memory cell uses 2 cells/bit with complementary dynamic memory operation and has a 1 cell/bit test mode for accelerated screening of marginal cells. The GND bitline pre-charge sensing scheme and SSW (Sense Synchronized Write) peripheral circuit technologies are also adopted for low-voltage and DFV (Dynamic Frequency and Voltage) controllable SoCs, which will be strongly required by many kinds of applications. This RAM supports the DFM functions with both good cell/bit for advanced process technologies and a voltage-scalable SoC memory platform.
Akira YAMAZAKI Fukashi MORISHITA Naoya WATANABE Teruhiko AMANO Masaru HARAGUCHI Hideyuki NODA Atsushi HACHISUKA Katsumi DOSAKA Kazutami ARIMOTO Setsuo WAKE Hideyuki OZAKI Tsutomu YOSHIHARA
The voltage margin of an embedded DRAM's sense operation has been shrinking with the scaling of process technology. A method to estimate this margin would be a key to optimizing the memory array configuration and the size of the sense transistor. In this paper, the voltage margin of the sense operation is theoretically analyzed. The accuracy of the proposed voltage margin model was confirmed on a 0.13-µm eDRAM test chip, and the results of calculation were generally in agreement with the measured results.
Toru SHIMIZU Kazutami ARIMOTO Osamu NISHII Sugako OTANI Hiroyuki KONDO
Various low-power technologies have been developed and applied to LSIs at the device and circuit design levels. Many more CPU cores as well as function IPs are integrated on a single-chip LSI today. Therefore, not only device and circuit low-power technologies but also software power control technologies are becoming more important for reducing the active power of application systems. This paper gives an overview of low-power technologies and defines a power management platform as a combination of hardware functions and a software programming interface. It also discusses the importance of the power management platform and the direction of its development.
Kazunari INOUE Hideyuki NODA Kazutami ARIMOTO Hans Jurgen MATTAUSCH Tetsushi KOIDE
A signature-matching co-processor in 130 nm CMOS technology for application in the network-security field is presented. Two key search technologies, implemented with fully-parallel CAM-based search cores, enable the removal of misused packets from Giga-bit-per-second (G-bps) networks in real time without disturbing normal network traffic. The first technology is a thorough search through the packet header as well as the payload in a byte-shifting manner, and is capable of detecting viruses even if they are hidden at an arbitrary position within the packet. A 1.125 Mbit ternary CAM, operated at 125 Mega-searches per second (M-sps), integrates the primary lookup table for the thorough packet search. The second technology applies an additional relational search with programmable logical operations to detect recently appearing, more complicated misused packets. A small 192-bit binary CAM operated at 31.25 M-sps is also included for this purpose. Power dissipation, a major concern of CAM-based application-specific LSIs, is addressed in light of the signature-matching application, which has a high probability of multiple matches and does not require masking individual bits of the search word. Consequently, two application-driven power-reduction methods are implemented: an improved pipelined search that reduces power efficiently even when there are many multiple matches, and a search-line encoding that cuts search-line related power dissipation. As a result, the signature-matching co-processor features low power dissipation of 0.4 W and 1.1 W for the best-case and worst-case search configurations, respectively.
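The byte-shifting search described in the abstract above can be illustrated in software. The following is a minimal sketch only, not the co-processor's implementation: the function name find_signatures, the signature names, and the sample packet are all hypothetical, and the inner loop checks signatures one after another where a CAM would compare every stored entry in parallel.

```python
def find_signatures(packet: bytes, signatures: dict[str, bytes]) -> list[tuple[int, str]]:
    """Return (offset, name) for every signature found at any byte offset."""
    hits = []
    # Shift the comparison window one byte at a time over header and payload.
    for offset in range(len(packet)):
        # Sequential stand-in for the parallel CAM lookup (assumption).
        for name, pattern in signatures.items():
            if pattern and packet.startswith(pattern, offset):
                hits.append((offset, name))
    return hits


if __name__ == "__main__":
    # Hypothetical signatures and packet contents, for illustration only.
    sigs = {"nop_sled": b"\x90\x90", "marker": b"EVIL"}
    pkt = b"GET /index\x90\x90\xeb\x1fEVIL"
    for offset, name in find_signatures(pkt, sigs):
        print(f"signature {name!r} at byte offset {offset}")
```

Because every byte offset is tried, a pattern is found regardless of where it starts in the packet, which is the property the abstract attributes to the byte-shifting search.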
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the conventional DCT algorithm has been improved in arithmetic order, and the vertical/horizontal-space 1-Dimensional (1D) DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The resulting performance figures in MOPS and MOPS/mm² are better than those of a conventional DSP by factors of 8 and 5.6, respectively.
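The abstract above refers to vertical and horizontal 1D-DCT passes. As a rough illustration, the standard row-column decomposition of a 2D DCT can be sketched as follows. This is the textbook orthonormal DCT-II, not the reordered arithmetic of the paper, and the function names dct_1d and dct_2d are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """2D DCT as a horizontal 1D pass followed by a vertical 1D pass."""
    rows = [dct_1d(row) for row in block]              # horizontal pass
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # vertical pass on columns
    return [list(row) for row in zip(*cols)]           # transpose back


if __name__ == "__main__":
    flat = [[10.0] * 8 for _ in range(8)]  # constant 8x8 block
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # energy concentrates in the DC coefficient
```

Each 1D pass is independent across rows or columns, which is why this form maps naturally onto many processing elements working in parallel.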
Takeshi KUMAKI Yasuto KURODA Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables a fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is updated in real time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding and maintains a consistently high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than conventional architectures. The encoding time can be reduced by 95% in comparison to a conventional SRAM-based architecture, which makes it suitable even for the latest end-user devices requiring high frame rates. Furthermore, the proposed architecture provides the only encoder that can simultaneously realize small compressed data size and fast processing speed.
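To make the idea of a code word table optimized by symbol frequency concrete, here is a minimal software sketch of classic Huffman table construction. It is an assumption-laden illustration, not the paper's hardware: it rebuilds the table from complete symbol counts rather than updating it in real time, the dictionary lookup stands in for the CAM search, and the names build_code_table and encode are hypothetical.

```python
import heapq
from collections import Counter


def build_code_table(symbols):
    """Build a Huffman code table (symbol -> bit string) from observed symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table for the subtree).
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    """Encode symbols by table lookup (a dict here, a CAM search in hardware)."""
    return "".join(table[s] for s in symbols)


if __name__ == "__main__":
    data = "aaaabbc"
    table = build_code_table(data)
    print(table, encode(data, table))
```

Frequent symbols receive shorter code words, so keeping the table matched to the actual input statistics is what keeps the encoded size small.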
Akira YAMAZAKI Takeshi FUJINO Kazunari INOUE Isamu HAYASHI Hideyuki NODA Naoya WATANABE Fukashi MORISHITA Katsumi DOSAKA Yoshikazu MOROOKA Shinya SOEDA Kazutami ARIMOTO Setsuo WAKE Kazuyasu FUJISHIMA Hideyuki OZAKI
A 23.3 mm² 32 Mb embedded DRAM (eDRAM) macro has been fabricated using 0.18 µm triple-well 4-metal embedded DRAM process technology to realize an accelerated 3-D graphics controller. The array architecture, using a dual-port sense amplifier, achieves a column access latency of two cycles at 222 MHz and a peak data rate of 14.2 × 4 GB/s at 4 macros. The process cost has been kept low by using VT-MOS circuit technology and taking advantage of a characteristic of dual-gate oxide process technology. A tRAC of 11.6 ns at 2.0 V is achieved using a 'pre-detect redundancy' circuit.
Yoshifumi KAWAMURA Naoya OKADA Yoshio MATSUDA Tetsuya MATSUMURA Hiroshi MAKINO Kazutami ARIMOTO
A Field Programmable Sequencer and Memory (FPSM), which is a programmable unit exclusively optimized for peripherals on a microcontroller unit, is proposed. The FPSM functions not only as the peripherals but also as the standard built-in memory. The FPSM provides easier programmability with a smaller area overhead, especially when compared with an FPGA. The FPSM is implemented on an FPGA, and the programmability and performance for basic peripherals such as an 8-bit counter and 8-bit accuracy Pulse Width Modulation are emulated on the FPGA. Furthermore, the FPSM core with a 4-Kbit SRAM is fabricated in 0.18 µm 5-metal CMOS process technology. The FPSM occupies half the area of the FPGA, and its power consumption is less than one-fifth.
Masanori HAYASHIKOSHI Hideto HIDAKA Kazutami ARIMOTO Kazuyasu FUJISHIMA
This paper describes a dual-mode sensing (DMS) scheme for a capacitor-coupled EEPROM cell. A new memory cell structure and a new sensing scheme are proposed and estimated. The new memory cell combines an EEPROM cell with a DRAM cell. The DMS scheme utilizes the charge-mode sensing of the DRAM cell in addition to the current-mode sensing of the EEPROM cell. Using this DMS technique, the sensing speed can be enhanced by 36% at a cell current of 15 µA by virtue of the additional charge-mode sensing. Furthermore, the stress applied to the tunnel oxide of the memory transistor can be relieved by decreasing the programming voltage and shortening the programming time. Therefore, with this memory cell structure and sensing scheme, it is possible to realize high-speed sensing in low-voltage operation together with high endurance.
Tsukasa OOISHI Mikio ASAKURA Hideto HIDAKA Kazutami ARIMOTO Kazuyasu FUJISHIMA
A multi-valued addressing scheme is proposed for a high-speed, high-packing-density memory system. This scheme is a level-multiplex addressing scheme rather than the standard time-multiplex addressing scheme, and it provides all address signals to the DRAM at the same time without increasing the address pin count. This scheme makes the memory matrix stretchable and achieves low power dissipation using enhanced partial array activation. A 16 Mb stretchable memory matrix DRAM (16MbSTDRAM) is examined using this addressing design. A power dissipation of 121.5 mW, an access time of 30 ns, and 20 pins have been estimated for a 3.3 V 16MbSTDRAM with an X/Y=15/9 address configuration. Low-power battery-driven memory systems for applications such as notebook or handheld personal computers can be realized by STDRAMs with the multi-valued addressing scheme.
Kazutami ARIMOTO Toshihiro HATTORI Hidehiro TAKATA Atsushi HASEGAWA Toru SHIMIZU
Many embedded system applications in ubiquitous networks strongly require high-performance SoCs that overcome the physical limitations of advanced CMOS. Continuous design efforts have been made to develop these SoCs. The initial efforts were primitive-level circuit techniques and power switching control methods for suppressing standby currents. However, as additional physical limitations and system enhancements have become the main factors, new design efforts have been proposed. These design efforts are application-oriented technologies spanning the system level to the device level. This paper introduces a self voltage-controlled technique to cancel PVT (process, voltage, and temperature) variation, power distribution and power management for cellular phone applications, a parallel-algorithm and optimized-layout DSP, and a massively parallel fine-grained SIMD processor for next-generation multimedia applications. High-performance SoCs for embedded systems are achieved by providing system-level IPs as components and building an application-oriented SoC platform.
Tsukasa OOISHI Mikio ASAKURA Shigeki TOMISHIMA Hideto HIDAKA Kazutami ARIMOTO Kazuyasu FUJISHIMA
We propose an advanced DRAM array driving technique for low-voltage operation, which we call the well-synchronized sensing and equalizing method. This method frees the DRAM array from the body effect, reduces the influence of the short-channel effect, and reduces the leakage current. The sense and restore amplifier and the equalizer can operate rapidly under a low-voltage operating condition such as a Vcc of 1.0 V. Therefore, it becomes easy to determine a Vth that satisfies the requirements of high speed, low power dissipation, and a simple device structure. The well-synchronized sensing and equalizing method is applicable to low-voltage DRAMs with a capacity of 256 Mbits or more.
Tadaaki YAMAUCHI Lance HAMMOND Oyekunle A. OLUKOTUN Kazutami ARIMOTO
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. In this paper we evaluate the performance of a single chip multiprocessor integrated with DRAM when the DRAM is organized as on-chip main memory and as on-chip cache. We compare the performance of this architecture with that of a more conventional chip which only has SRAM-based on-chip cache. The DRAM-based architecture with four processors outperforms the SRAM-based architecture on floating point applications which are effectively parallelized and have large working sets. This performance difference is significantly better than that possible in a uniprocessor DRAM-based architecture, which performs only slightly faster than an SRAM-based architecture on the same applications. In addition, on multiprogrammed workloads, in which independent processes are assigned to every processor in a single chip multiprocessor, the large bandwidth of on-chip DRAM can handle the inter-access contention better. These results demonstrate that a multiprocessor takes better advantage of the large bandwidth provided by the on-chip DRAM than a uniprocessor.
Naoya WATANABE Fukashi MORISHITA Yasuhiko TAITO Akira YAMAZAKI Tetsushi TANIZAKI Katsumi DOSAKA Yoshikazu MOROOKA Futoshi IGAUE Katsuya FURUE Yoshihiro NAGURA Tatsunori KOMOIKE Toshinori MORIHARA Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
This paper describes an Embedded DRAM Hybrid Macro, which supports various memory specifications. The eDRAM module generator with the Hybrid Macro provides more than 120,000 eDRAM configurations. This eDRAM includes a new architecture called the Auto Signal Management (ASM) architecture, which automatically adjusts the timing of the control signals for various eDRAM configurations and reduces the design turnaround time. An Enhanced-on-chip Tester performs simultaneous pass/fail judgments on up to 512 I/O bits as well as real-time repair analysis. The eDRAM testing time is reduced to about 1/64 of the time required using the conventional technique. A test chip is fabricated using a 0.18 µm 4-metal embedded DRAM technology, which utilizes triple-well, dual-Tox, and Co salicide process technologies. This chip achieves a wide operating voltage range, from 1.2 V at 100 MHz to 1.8 V at 200 MHz.
Fukashi MORISHITA Kazutami ARIMOTO Kazuyasu FUJISHIMA Hideyuki OZAKI Tsutomu YOSHIHARA
A novel body potential-controlling technique for floating SOI CMOS circuits is proposed and verified in this study. High-speed operation is realized with a small chip size by using body-floating SOI transistors. The use of this technique allows the threshold voltage of the body-floating transistors to be varied transitionally. Therefore, the standby current of SOI CMOS logic is reduced to less than 1/50th of that required by the non-controlled operation of the body potential, and the logic operates at a high speed during the active period. There is no speed penalty for the recovery operation from the standby mode. This technique supports sub-1 V operation, which will be required by future battery-operated devices with wide-range covering.
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Takayuki GYOHTEN Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents an architecture that integrates content addressable memory (CAM) with a massive-parallel memory-embedded SIMD matrix to construct a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way of processing the repeated arithmetic operation types found in multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed architecture can realize efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the frequently used JPEG image-compression application show that the necessary number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The resulting performance in Mpixel/mm² is better by factors of 3.3 and 4.4 than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
Takeshi FUJINO Akira YAMAZAKI Yasuhiko TAITO Mitsuya KINOSHITA Fukashi MORISHITA Teruhiko AMANO Masaru HARAGUCHI Makoto HATAKENAKA Atsushi AMO Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
A low-power 16 Mb embedded DRAM (eDRAM) macro is fabricated using 0.15 µm logic-based embedded DRAM process technology. A 0.5 µm2 CUB (