Jun FURUTA Kazutoshi KOBAYASHI Hidetoshi ONODERA
We measure neutron-induced Single Event Upsets (SEUs) and Multiple Cell Upsets (MCUs) on Flip-Flops (FFs) in a 65-nm bulk CMOS process in order to evaluate dependence of MCUs on cell distance and well-contact density using four different shift registers. Measurement results by accelerated tests show that MCU/SEU is up to 23.4% and it is exponentially decreased by the distance between latches on FFs. MCU rates can be drastically reduced by inserting well-contact arrays between FFs. The number of MCUs is reduced from 110 to 1 by inserting well-contact arrays under power and ground rails.
Jun FURUTA Kazutoshi KOBAYASHI Hidetoshi ONODERA
According to the process scaling, semiconductor devices are becoming more sensitive to soft errors since amount of critical charges are decreasing. In this paper, we propose an area/delay efficient dual modular flip-flop, which is tolerant to SEU (Single Event Upset) and SET (Single Event Transient). It is based on a "BISER" (Built-in Soft Error Resilience). The original BISER FF achieves small area but it is vulnerable to an SET pulse on C-elements. The proposed dual modular FF doubles C-elements and weak keepers between master and slave latches, which enhances SET immunity considerably with paying small area-delay product than the conventional delayed TMR FFs.
Yoichi YUYAMA Akira TSUCHIYA Kazutoshi KOBAYASHI Hidetoshi ONODERA
In this paper, we propose alternate self shielding to remove critical transitions of on-chip global interconnect. Our proposed method alternates shield and signal wires cycle by cycle. The conventional self-shielding methods need additional wires to remove critical transition by encoding. The proposed alternate self-shielding, however, requires no additional wires. We evaluate our method by simulating signal transimission with a circuit simulator. As a result, our proposed method is superior in bit rate compared to others from 10% to 75%.
Kazutoshi KOBAYASHI Akihiko HIGUCHI Hidetoshi ONODERA
Sleep transistors such as MTCMOS and SCCMOS drastically reduce leakage current, but their ON resistances cause significant performance degradation. Larger sleep transistors reduce their ON resistances, but increase leakage current in a sleep mode. Decoupling capacitors beside sleep transistors reduce leakage current. Experimental results show that PMOS SCCMOS with a 4 pF decoupling capacitor reduces leakage current by 1/673 on a 64 bit adder in a 90 nm process.
Chikara HAMANAKA Ryosuke YAMAMOTO Jun FURUTA Kanto KUBOTA Kazutoshi KOBAYASHI Hidetoshi ONODERA
We show measurement results of variation-tolerance of an error-hardened dual-modular-redundancy flip-flop fabricated in a 65-nm process. The proposed error-hardened FF called BCDMR is very strong against soft errors and also robust to process variations. We propose a shift-register-based test structure to measure variations. The proposed test structure has features of constant pin count and fast measurement time. A 65 nm chip was fabricated including 40k FFs to measure variations. The variations of the proposed BCDMR FF are 74% and 55% smaller than those of the conventional BISER FF on the twin-well and triple-well structures respectively.
Kazutoshi KOBAYASHI Noritsugu NAKAMURA Kazuhiko TERADA Hidetoshi ONODERA Keikichi TAMARU
We have developed and fabricated an LSI called the FMPP-VQ64. The LSI is a memory-based shared-bus SIMD parallel processor containing 64 PEs, intended for low bit-rate image compression using vector quantization. It accelerates the nearest neighbor search (NNS) during vector quantization. The computation time does not depend on the number of code vectors. The FMPP-VQ64 performs 53,000 NNSs per second, while its power dissipation is 20 mW. It can be applied to the mobile telecommunication system.
Kazutoshi KOBAYASHI Masao ARAMOTO Hidetoshi ONODERA
We propose a low-power resource-shared VLIW processor (RSVP) for future leaky nanometer process technologies. It consists of several single-way independent processor units (IPUs) that share parallel processor resources. Each IPU works as a variable-way VLIW processor sharing the parallel resources according to priorities of given tasks. RSVP allocates shared parallel resources to the IPUs cycle by cycle. It can minimize the number of NOPs that is wasting power. The performance per power (P3) of a 4-parallel 4-way RSVP that corresponds to four 4way VLIWs is 3.7% better than a conventional 4-parallel 4-way VLIW multiprocessor in the current 90 nm process. We estimate that the RSVP achieves 36% less leakage power and 28% better P3 in the future 25 nm process. We have fabricated an RSVP test chip that contains two IPU and a shared resource equivalent to two 2way VLIWs in a 180 nm process. It is functional at 100 MHz clock speed and its power is 130 mW.
Kazutoshi KOBAYASHI Keikichi TAMARU Hiroto YASUURA Hidetoshi ONODERA
We propose a new architecture of Functional Memory type Parallel Processor (FMPP) architectures called bit-parallel block-parallel (BPBP) FMPP. Design details of a prototype BPBP FMPP chip are also shown. FMPP is a massively parallel processor architecture that has a memory-based simple two-dimensional regular array structure suitable for memory VLSI technology. Computation space increases as integration density of memory increases. Computation time does not depend on the number of processors. So far, a bit-serial word-parallel (BSWP) implementation based on a content addressable memory (CAM) is mainly investigated as one of promising architectures of FMPP. In a BSWP FMPP, each word of a CAM works as a processor, and the amount of hardware is minimized by abopting a bit-serial operation, thus maximizing integration scale. The BSWP FMPP, however, does not allow operations between two words, which restriction limits the applicability of the BSWP FMPP. On the other hand, the proposed BPBP FMPP is designed to execute logical and arithmetic operations on two words. These operations are performed simultaneously on every group of words called a block. BPBP FMPP hereby achieves a high performance while maintaining high integration density of the BSWP, and is suitable for various applications.
Takuya KOMAWAKI Michitarou YABUUCHI Ryo KISHIDA Jun FURUTA Takashi MATSUMOTO Kazutoshi KOBAYASHI
As device sizes are downscaled to nanometer, Random Telegraph Noise (RTN) becomes dominant. It is indispensable to accurately estimate the effect of RTN. We propose an RTN simulation method for analog circuits. It is based on the charge trapping model. The RTN-induced threshold voltage fluctuation are replicated to attach a variable DC voltage source to the gate of a MOSFET by using Verilog-AMS. In recent deca-nanometer processes, high-k (HK) materials are used in gate dielectrics to decrease the leakage current. We must consider the defect distribution characteristics both in HK and interface layer (IL). This RTN model can be applied to the bimodal model which includes characteristics of the HK and IL dielectrics. We confirm that the drain current of MOSFETs temporally fluctuates in circuit-level simulations. The fluctuations of RTN are different in MOSFETs. RTN affects the frequency characteristics of ring oscillators (ROs). The distribution of RTN-induced frequency fluctuations has a long-tail in a HK process. The RTN model applied to the bimodal can replicate a long-tail distribution. Our proposed method can estimate the temporal impact of RTN including multiple transistors.
Kazutoshi KOBAYASHI Ryuta NAKANISHI Hidetoshi ONODERA
We propose an efficient motion estimation algorithm to search an additional area according to the motion of a camcorder, which is obtained from a gyro sensor. When the camcorder moves, the background moves in the opposite direction. The proposed algorithm searches three regions, one around the center, another around the predicted region and another in the background around the region associated with the camcorder motion. Compared to conventional algorithms without the last region, the proposed one reduces the amount of computation to 1/5 while maintaining or enhancing the quality.
Kazutoshi KOBAYASHI Makoto EGUCHI Takuya IWAHASHI Takehide SHIBAYAMA Xiang LI Kosuke TAKAI Hidetoshi ONODERA
We propose a vector-pipeline processor VP-DSP for low-rate videophones which can encode and decode 10 frames/sec. of QCIF through a 29.2 kbps low-rate line. We have already fabricated a VP-DSP LSI by a 0.35 µm CMOS process. The area of the VP-DSP core is 4.26 mm2. It works properly at 25 MHz/1.6 V with a power consumption of 49 mW. Its peak performance is up to 400 MOPS, 8.2 GOPS/W.
Akihiko HIGUCHI Kazutoshi KOBAYASHI Hidetoshi ONODERA
This paper proposes an instruction-level power estimation method for an embedded RISC processor, the power consumption of which fluctuates so much by applications and input data. The proposed method estimates the power consumption from the result of ISS (Instruction Set Simulator) and energy tables according to Hamming Distance of Registers (HDR) of all instructions. It is over 105 times faster than the gate-level detailed logic simulation, while the estimated power curves have the same tendency with those from the logic simulation. The proposed method realizes both accurate and fast power estimation of embedded processors.
Michitarou YABUUCHI Ryo KISHIDA Kazutoshi KOBAYASHI
We analyze the correlation between BTI (Bias Temperature Instability) -induced degradations and process variations. Those reliability issues are correlated. BTI is one of the most significant aging-degradations on LSIs. Threshold voltages of MOSFETs increase with time when biases stress their gates. It shows a strong effect of BTI on highly scaled LSIs in the same way as the process variations. The accurate prediction of the combinational effects is indispensable. We should analyze both aging-degradations and process variations of MOSFETs to explain the correlation. We measure frequencies of ROs (Ring Oscillators) of 65-nm process test circuits on two types of LSIs, ASICs and FPGAs. There are 98 and 837 ROs on our ASICs and FPGAs respectively. The frequencies of ROs follow gaussian distributions. We describe the highest frequency group as the “fast” conditon, the average group as the “typical” conditon and the lowest group as the “slow” conditon. We measure the aging-degradations of the ROs of the three conditions on the accelerated test. The degradations can be approximated by logarithmic function of stress time. The degradation at the “fast” condition has a higher impact on the frequency than the “slow” one. The correlation coefficient is 0.338. In this case, we can define a smaller design margin for BTI-induced degradations than that without considering the correlation because the degradation at the “slow” conditon is smaller than the average and the fast.
Haruki MARUOKA Masashi HIFUMI Jun FURUTA Kazutoshi KOBAYASHI
We propose a radiation-hardened Flip-Flop (FF) with stacked transistors based on the Adaptive Coupling Flip-Flop (ACFF) with low power consumption in a 65 nm FDSOI process. The slave latch in ACFF is much weaker against soft errors than the master latch. We design several FFs with stacked transistors in the master or slave latches to mitigate soft errors. We investigate radiation hardness of the proposed FFs by α particle and neutron irradiation tests. The proposed FFs have higher radiation hardness than a conventional DFF and ACFF. Neutron irradiation and α particle tests revealed no error in the proposed AC Slave-Stacked FF (AC_SS FF) which has stacked transistors only in the slave latch. We also investigate radiation hardness of the proposed FFs by heavy ion irradiation. The proposed FFs maintain higher radiation hardness up to 40 MeV-cm2/mg than the conventional DFF. Stacked inverters become more sensitive to soft errors by increasing tilt angles. AC_SS FF achieves higher radiation hardness than ACFF with the performance equivalent to that of ACFF.
Kazutoshi KOBAYASHI Kazuya KATSUKI Manabu KOTANI Yuuri SUGIHARA Yohei KUME Hidetoshi ONODERA
We have fabricated a LUT-based FPGA device with functionalities measuring within-die variations in a 90 nm process. Variations are measured using ring oscillators implemented as a configuration of the FPGA. Random variations are dominant in a 4848 configurable array laid out in a 3 mm3 mm square region. It has a functionality to measure delays on actual signal paths between flip flops by providing two clock pulses. Measured variations are used to maximize the operating frequency of each device by choosing the optimal paths. Optimizations of routing paths using a simple model circuit reveals that performance of the circuit is enhanced by 2.88% in average and a maximum of 9.34%.
Kazutoshi KOBAYASHI Hidetoshi ONODERA
This paper describes a comprehensive simulation and test environment for prototype LSI verification. We develop a Perl package, ST, for simulation & test of digital circuits. A designer can describe a testbench with the Perl syntax, which can be converted to various kinds of simulators and LSI testers. Parameters such as a target simulator/tester, cycle time and voltage levels can be changed very easily just modifying arguments of subroutines. We also develop DUT boards which consist of a tester-dependent mother board and a package-dependent daughter board. Using ST and the DUT boards, a designer can start verification just after getting fabricated LSIs.
Koichiro ISHIBASHI Nobuyuki SUGII Shiro KAMOHARA Kimiyoshi USAMI Hideharu AMANO Kazutoshi KOBAYASHI Cong-Kha PHAM
A 32bit CPU, which can operate more than 15 years with 220mAH Li battery, or eternally operate with an energy harvester of in-door light is presented. The CPU was fabricated by using 65nm SOTB CMOS technology (Silicon on Thin Buried oxide) where gate length is 60nm and BOX layer thickness is 10nm. The threshold voltage was designed to be as low as 0.19V so that the CPU operates at over threshold region, even at lower supply voltages down to 0.22V. Large reverse body bias up to -2.5V can be applied to bodies of SOTB devices without increasing gate induced drain leak current to reduce the sleep current of the CPU. It operated at 14MHz and 0.35V with the lowest energy of 13.4 pJ/cycle. The sleep current of 0.14µA at 0.35V with the body bias voltage of -2.5V was obtained. These characteristics are suitable for such new applications as energy harvesting sensor network systems, and long lasting wearable computers.
Mitsuhiko IGARASHI Yuuki UCHIDA Yoshio TAKAZAWA Makoto YABUUCHI Yasumasa TSUKAMOTO Koji SHIBUTANI Kazutoshi KOBAYASHI
In this paper, we present an analysis of local variability of bias temperature instability (BTI) by measuring Ring-Oscillators (RO) on various processes and its impact on logic circuit and SRAM. The evaluation results based on measuring ROs of a test elementary group (TEG) fabricated in 7nm Fin Field Effect Transistor (FinFET) process, 16/14nm generation FinFET processes and a 28nm planer process show that the standard deviations of Negative BTI (NBTI) Vth degradation (σ(ΔVthp)) are proportional to the square root of the mean value (µ(ΔVthp)) at any stress time, Vth flavors and various recovery conditions. While the amount of local BTI variation depends on the gate length, width and number of fins, the amount of local BTI variation at the 7nm FinFET process is slightly larger than other processes. Based on these measurement results, we present an analysis result of its impact on logic circuit considering measured Vth dependency on global NBTI in the 7nm FinFET process. We also analyse its impact on SRAM minimum operation voltage (Vmin) of static noise margin (SNM) based on sensitivity analysis and shows non-negligible Vmin degradation caused by local NBTI.
Kazutoshi KOBAYASHI Kazuhiko TERADA Hidetoshi ONODERA Keikichi TAMARU
We propose a real-time low-rate video compression algorithm using fixed-rate multi-stage hierarchical vector quantization. Vector quantization is suitable for mobile computing, since it demands small computation on decoding. The proposed algorithm enables transmission of 10 QCIF frames per second over a low-rate 29.2 kbps mobile channel. A frame is hierarchically divided by sub-blocks. A frame of images is compressed in a fixed rate at any video activity. For active frames, large sub-blocks for low resolution are mainly transmitted. For inactive frames, smaller sub-blocks for high resolution can be transmitted successively after a motion-compensated frame. We develop a compression system which consists of a host computer and a memory-based processor for the nearest neighbor search on VQ. Our algorithm guarantees real-time decoding on a poor CPU.