In-Young CHUNG Seong Yeol JEONG Sung Min SEO Myungjin LEE Taesu JANG Seon-Yong CHA Young June PARK
A new concept of CMOS nonvolatile memory is presented with a demonstration of cell implementations. The memory cell, which is basically a comparator, uses the comparator offset as the storage quantity and the FN stress phenomenon for cell programming. We also propose the stress-packet operation, a programming method that finely controls the offset of the memory cell. The memory cell is multiple-time programmable even though it is implemented in a standard CMOS process. We fabricated memory cell arrays of the latch comparator and demonstrated that they can be rewritten several times. We also investigated the reliability of cell data retention by monitoring programmed offsets for several months.
Akio OHTA Yuta GOTO Shingo NISHIGAKI Guobin WEI Hideki MURAKAMI Seiichiro HIGASHI Seiichi MIYAZAKI
We have studied the resistance-switching properties of RF-sputtered Si-rich oxides sandwiched between Pt electrodes. By sweeping the bias applied to the top Pt electrode, non-polar resistance switching was observed after a forming process. Compared with the RF-sputtered TiOx case, significantly smaller current levels were obtained in both the high resistance state (HRS) and the low resistance state (LRS). Even when the SiOx thickness was decreased from 40 nm to 8 nm, an ON/OFF resistance ratio between HRS and LRS as large as 10^3 was maintained. From the analysis of current-voltage characteristics for Pt/SiOx on p-type Si(100) and n-type Si(100), it is suggested that the red-ox (REDuction and OXidation) reaction induced by electron fluence near the Pt/SiOx interface is important for obtaining the resistance-switching behavior.
Se Hwan PARK Yoon KIM Wandong KIM Joo Yun SEO Hyungjin KIM Byung-Gook PARK
We propose a new three-dimensional (3D) NAND flash memory array having a Tied Bit-line and Ground Select Transistor (TiGer) [1]. Channels are stacked in the vertical direction to increase the memory density without device size scaling. To distinguish the stacked channels, a novel operation scheme is introduced instead of adding supplementary control gates. The stacked layers are selected by using the ground select line (GSL) and the common source line (CSL). The device structure and fabrication process are described. The operation scheme and simulation results for program inhibition are also discussed.
In this paper, a 60 nm-thick ferroelectric film of poly(vinylidene fluoride–trifluoroethylene) on a flexible substrate of aluminum foil was fabricated and characterized. Compared to pristine silicon wafer, Al-foil has very large root-mean-square (RMS) roughness, thus presenting challenges for the fabrication of flat and uniform electronic devices on such a rough substrate. In particular, RMS roughness affects the leakage current of dielectrics, the uniformity of devices, and the switching time in ferroelectrics. To avoid these kinds of problems, a new thin film fabrication method adopting a detach-and-transfer technique has been developed. Here, 'detach' means that the ferroelectric film is detached from a flat substrate (sacrificial substrate), and 'transfer' refers to the process of the detached film being moved onto the rough substrate (main substrate). To characterize the dielectric property of the transferred film, polarization and voltage relationships were measured, and the results showed that a hysteresis loop could be obtained with low leakage current.
The fringe field effects of nano-electromechanical (NEM) nonvolatile memory cells have been investigated analytically for the accurate evaluation of NEM memory cells. As the beam width is scaled down, the fringe field effect becomes more severe. It has been observed that the pull-in, release and hysteresis voltages decrease more than predicted. The effect of the fringe field on cell characteristics is also discussed.
Recently, 3-D vertical Floating Gate (FG) type NAND cell arrays with a Sidewall Control Gate (SCG), such as ESCG, DC-SF and S-SCG, have been receiving attention as a way to overcome the reliability issues of Charge Trap (CT) type devices. Using this novel cell structure, highly reliable flash cell operations were successfully implemented without interference effects on the FG type cell. However, the 3-D vertical FG type cell has a cell size that is larger by about 60% for the cylindrical FG structure. From this point of view, we intensively investigate the scalability of the FG width of 3-D vertical FG NAND cells. In the case of the planar FG type NAND cell, the FG height cannot be scaled down because a sufficient coupling ratio and high program speed must be obtained. In contrast, for the 3-D vertical FG NAND with SCG, the FG is formed cylindrically and is fully covered by the surrounding CG, so a very high CG coupling ratio can be achieved. As a result, the scaling of the FG width of the 3-D vertical FG NAND cell with S-SCG is successfully demonstrated in the 10 nm regime, which is almost the same as the CT layer of recent BE-SONOS NAND.
Turbo codes suffer from high decoding latency, which hinders their use in many communication systems. Parallel decodable turbo codes (PDTCs) are suitable for parallel decoding and hence have low latency. In this article, we analyze the worst-case minimum distance of parallel decodable turbo codes with both the S-random interleaver and the memory-collision-free row-column S-random interleaver. The effect of minimum distance on code performance is determined through computer simulations.
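As a rough illustration of the spread property behind an S-random interleaver, the following is a minimal sketch, not the construction analyzed in the article. It assumes the common convention that any two positions at most S apart must map to values more than S apart, and it covers only the plain S-random case, not the memory-collision-free row-column variant. The names `s_random_interleaver` and `satisfies_spread`, and the greedy search with restarts, are illustrative assumptions.

```python
import random


def satisfies_spread(perm, s):
    """Check that positions at most s apart map to values more than s apart."""
    n = len(perm)
    return all(
        abs(perm[i] - perm[j]) > s
        for i in range(n)
        for j in range(i + 1, min(i + s + 1, n))
    )


def s_random_interleaver(n, s, seed=0, max_restarts=1000):
    """Greedy S-random search (illustrative): draw candidates in random order
    and accept the first one that is far enough from the last s picks."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        pool = list(range(n))
        rng.shuffle(pool)
        perm = []
        while pool:
            recent = perm[-s:] if s > 0 else []
            pick = next(
                (c for c in pool if all(abs(c - p) > s for p in recent)),
                None,
            )
            if pick is None:
                break  # dead end: no remaining candidate fits, so restart
            pool.remove(pick)
            perm.append(pick)
        if not pool:
            return perm
    raise RuntimeError("no S-random permutation found; try a smaller s")
```

For example, `s_random_interleaver(64, 3)` returns a permutation of 0..63 that `satisfies_spread(perm, 3)` accepts. Larger values of S relative to the block length make the greedy search fail more often.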
Sang-Youl LEE Seung-Dong YANG Jae-Sub OH Ho-Jin YUN Kwang-Seok JEONG Yu-Mi KIM Hi-Deok LEE Ga-Won LEE
In this paper, we fabricated gate-all-around bandgap-engineered (BE) silicon-oxide-nitride-oxide-silicon (SONOS) and silicon-oxide-high-k-oxide-silicon (SOHOS) flash memory devices with a vertical silicon pillar type structure as a potential solution to scaling down. Silicon nitride (Si3N4) and hafnium oxide (HfO2) were used as trapping layers in the SONOS and SOHOS devices, respectively. The BE-SOHOS device has better electrical characteristics than the BE-SONOS device, such as a lower threshold voltage (VTH) of 0.16 V, a higher gm.max of 0.593 µA/V and an on/off current ratio of 5.76×10^8. The memory characteristics of the BE-SONOS device, such as program/erase speed (P/E speed), endurance, and data retention, were compared with those of the BE-SOHOS device. The measured data show that the BE-SONOS device has good memory characteristics, such as program speed and data retention. Compared with the BE-SONOS device, the erase speed of the BE-SOHOS device is about five times higher, while its program speed and data retention characteristics are slightly worse, which can be explained by the many interface traps between the trapping layer and the tunneling oxide.
The Viterbi algorithm is widely used for decoding convolutional codes. The trace-back method is preferable to the register exchange method because of its lower power consumption, especially for convolutional codes with many states. A drawback of the conventional trace-back is that it generally requires long latency to obtain the decoded data. In this paper, a trace-back method that uses source states instead of decision bits is proposed, which reduces the number of memory accesses. A dedicated memory that supports the proposed trace-back method is also presented. The reduced memory accesses result in lower power consumption and a shorter decode latency than the conventional method.
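To make the idea of tracing back through stored source states concrete, the following is a minimal software sketch under stated assumptions. It uses a small rate-1/2, constraint-length-3 code with generators (7, 5) in octal and hard-decision Hamming metrics, both chosen only for illustration. It stores the predecessor state of each survivor so that trace-back becomes a direct table lookup. It does not model the paper's dedicated memory or its access-count savings, and the function names are hypothetical.

```python
def encode(bits):
    """Rate-1/2, K=3 convolutional encoder with generators (7, 5) octal."""
    state = 0  # state = (previous input, input before that)
    out = []
    for u in bits:
        b1, b2 = state >> 1, state & 1
        out += [u ^ b1 ^ b2, u ^ b2]
        state = (u << 1) | b1
    return out


def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoder that records source states per step."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]  # the encoder starts in state 0
    source = []  # source[t][s] = predecessor of state s at step t
    for t in range(n_bits):
        r0, r1 = received[2 * t], received[2 * t + 1]
        new_metrics = [inf] * 4
        row = [0] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue  # state not reachable yet
            b1, b2 = state >> 1, state & 1
            for u in (0, 1):
                nxt = (u << 1) | b1
                cost = ((u ^ b1 ^ b2) != r0) + ((u ^ b2) != r1)
                m = metrics[state] + cost
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    row[nxt] = state
        metrics = new_metrics
        source.append(row)
    # Trace back: each lookup yields the previous state directly.
    state = min(range(4), key=lambda s: metrics[s])
    decoded = []
    for t in range(n_bits - 1, -1, -1):
        decoded.append(state >> 1)  # the input bit is the MSB of the state
        state = source[t][state]
    return decoded[::-1]
```

In this sketch the decoded bit is read from the state itself, so the trace-back loop needs one table read per step and no separate reconstruction of the predecessor from a decision bit.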
Kousuke MIYAJI Ryoji YAJIMA Teruyoshi HATANAKA Mitsue TAKAHASHI Shigeki SAKAI Ken TAKEUCHI
An initialize and weak-program erasing scheme is proposed to achieve a high-performance and high-reliability Ferroelectric (Fe-) NAND flash solid-state drive (SSD). Bit-by-bit erase VTH control is achieved by the proposed erasing scheme, and history effects in Fe-NAND are also suppressed. History effects are changes in the future erase VTH shift characteristics caused by the past program voltage. The proposed erasing scheme decreases the VTH shift variation due to history effects from ±40% to ±2%, and the erase VTH distribution width is reduced from over 0.4 V to 0.045 V. As a result, the read and VPASS disturbances decrease by 42% and 37%, respectively. The proposed erasing scheme is immune to VTH variations and voltage stress. It also suppresses the power and bandwidth degradation of the SSD.
Yasuyuki OISHI Shigekazu KIMURA Eisuke FUKUDA Takeshi TAKANO Daisuke TAKAGO Yoshimasa DAIDO Kiyomichi ARAKI
This paper describes a method to design a predistorter (PD) for a GaN-FET power amplifier (PA) by using nonlinear parameters extracted from measured IMD, which has asymmetrical peaks peculiar to a memory effect with a second-order lag. Computationally efficient equations have been reported by C. Rey et al. for the memory effect with a first-order lag; their equations are extended here to be applicable to the memory effect with the second-order lag. The extension provides a recursive algorithm for the cancellation signals of the PD, each of which is updated using signals at only two sampling points. The algorithm is equivalent to a memory depth of two in computational efficiency. The numbers of multiplications and additions required to update all the cancellation signals are counted, and it is confirmed that the algorithm reduces the computational intensity to less than half that of a memory polynomial in recent papers. A computer simulation has clarified that the PD improves the adjacent channel leakage power ratio (ACLR) of OFDM signals with several hundred subcarriers corresponding to 4G mobile radio communications. It has been confirmed that a fifth-order PD is effective up to a high power level close to 1 dB compression. The improvement of error vector magnitude (EVM) by the PD is also simulated for OFDM signals whose subcarrier channels are modulated by 16 QAM.
Rong-Long WANG Li-Qing ZHAO Xiao-Fan ZHOU
Ant Colony Optimization (ACO) is one of the most recent techniques for solving combinatorial optimization problems, and has been unexpectedly successful. Therefore, many modifications have been proposed to improve the performance of the ACO algorithm. In this paper, an ant colony optimization with memory is proposed and applied to the classical traveling salesman problem (TSP). In the proposed algorithm, each ant searches for a solution not only according to the pheromone and heuristic information but also based on a memory taken from the solution of the last iteration. A large number of simulation runs are performed, and the simulation results illustrate that the proposed algorithm performs better than the compared algorithms.
Hasitha Muthumala WAIDYASOORIYA Yosuke OHBAYASHI Masanori HARIYAMA Michitaka KAMEYAMA
Accelerator cores in low-power heterogeneous processors have on-chip local memories to enable parallel data access. The capacities of the local memories are very small, so data must be transferred from the global memory to the local memories many times. These data transfers greatly increase the total processing time. A memory allocation technique that increases data sharing is a good solution to this problem. When reconfigurable cores are used, however, the data must be shared among multiple contexts. Conventional context partitioning methods only consider how to reuse limited hardware resources in different time slots and do not consider data sharing. This paper proposes a context partitioning method that shares both the hardware resources and the local memory data. According to the experimental results, the proposed method reduces the processing time by more than 87% compared to conventional context partitioning techniques.
Wei-Neng WANG Kai NI Jian-She MA Zong-Chao WANG Yi ZHAO Long-Fa PAN
Wear leveling is a critical factor that significantly impacts the lifetime and the performance of flash storage systems. To extend lifespan and reduce memory requirements, this paper proposes an efficient wear leveling scheme for huge-capacity flash storage systems. The scheme is based on selective replacement and neither substantially increases overhead nor modifies the Flash Translation Layer (FTL). Experimental results show that our design levels the wear of different physical blocks with limited system overhead compared with previous algorithms.
Memory accesses are a major cause of energy consumption in embedded systems. This paper presents the implementation of a fully software technique which places stack and static data into a scratch-pad memory (SPM) in order to reduce the energy consumed by the processor while accessing them. Since an SPM is usually too small to hold all these data, some of them must be left in the external main memory (MM). Therefore, further energy reduction is achieved by moving some stack data between both memories at run time. The technique employs integer linear programming to find at compile time the optimal placement of static data and management of the stack, and implements it by inserting stack operations inside the code. Experimental results show that with an SPM of only 1 KB, our technique reduces the energy consumption related to the static and stack data accesses by more than 90% for several applications and on average by 57% compared to the case where these data are fully placed into the main memory.
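The static-data part of such a placement problem can be viewed as a 0/1 selection under a capacity limit. The sketch below is a simplified stand-in, not the paper's integer linear programming formulation. It solves the selection with a dynamic-programming knapsack and ignores the run-time stack management entirely. The function name `choose_spm_objects`, the object names, the sizes and the energy-saving figures in the usage note are all hypothetical.

```python
def choose_spm_objects(objects, capacity):
    """Pick static objects for the SPM to maximize total energy saving.

    objects  -- list of (name, size_in_bytes, estimated_saving) tuples
    capacity -- SPM size in bytes
    Returns (best_total_saving, list_of_chosen_names).
    """
    # best[c] holds the best (saving, names) achievable with c bytes.
    best = [(0, [])] * (capacity + 1)
    for name, size, saving in objects:
        # Walk capacities downward so each object is used at most once.
        for cap in range(capacity, size - 1, -1):
            candidate = best[cap - size][0] + saving
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + [name])
    return best[capacity]
```

With the made-up inputs `[("buf", 600, 50), ("table", 500, 40), ("stack_frame", 400, 35)]` and a 1024-byte SPM, the sketch selects `buf` and `stack_frame` for a total saving of 85, because `buf` and `table` together would not fit.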
Shunsuke OKUMURA Yuki KAGIYAMA Yohei NAKATA Shusuke YOSHIMOTO Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper proposes a 7T SRAM that realizes a block-level simultaneous copying feature. The proposed SRAM can be used for data transfer between local memories, for example in checkpoint data storage and transactional memory. The 1-Mb SRAM comprises 32-kb blocks, in which 16-kb data can be copied in 33.3 ns at 1.2 V. The proposed scheme reduces the energy consumption in copying by 92.7% compared to the conventional read-modify-write manner. By applying the proposed scheme to transactional memory, the number of write-back cycles can potentially be reduced by 98.7% compared with the conventional memory system.
Seungjae BAEK Heekwon PARK Jongmoo CHOI
In this paper, we propose three techniques to improve the performance of YAFFS (Yet Another Flash File System), while enhancing the reliability of the system. Specifically, we first propose to manage metadata and user data separately on segregated blocks. This modification not only leads to the reduction of the mount time but also reduces the garbage collection time. Second, we tailor the wear-leveling to the segregated metadata and user data blocks. That is, worn out blocks between the segregated blocks are swapped, which leads to more evenly worn out blocks increasing the lifetime of the system. Finally, we devise an analytic model to predict the expected garbage collection time. By accurately predicting the garbage collection time, the system can perform garbage collection at more opportune times when the user's perceived performance may not be negatively affected. Performance evaluation results based on real implementations show that our modifications enhance performance and reliability without incurring additional overheads. Specifically, the YAFFS with our proposed techniques outperforms the original YAFFS by six times in terms of mount speed and five times in terms of benchmark performance, while reducing the average erase count of blocks by 14%.
Won-young CHUNG Jae-won PARK Seung-Woo LEE Won Woo RO Yong-surk LEE
The message passing interface (MPI) broadcast communication commonly causes a severe performance bottleneck in multicore systems that use distributed memory. Thus, in this paper, we propose a novel algorithm and hardware structure for the MPI broadcast communication to reduce the bottleneck. The transmission order is set based on the state of each processing node in the multicore system, so the novel algorithm minimizes the performance degradation caused by conflicts. The proposed scoreboard MPI unit is evaluated by modeling it with SystemC and is implemented using VerilogHDL. The proposed scoreboard MPI unit occupies less than 1.03% of the whole chip, and it improves performance by up to 75.48% with 16 processing nodes. Hence, with respect to low-cost design and scalability, this scoreboard MPI unit is particularly useful for increasing the overall performance of an embedded MPSoC.
Myeongwoon JEON Kyungchul KIM Sungkyu CHUNG Seungjae CHUNG Beomju SHIN Jungwoo LEE
NAND multilevel cell flash memory devices are gaining popularity because they can increase the memory capacity by storing two or more bits in a single cell. However, as the number of levels of a cell increases, the inter-cell interference, which shifts the threshold voltage, becomes more critical. There are two approaches to alleviating the errors caused by the voltage shift: error correcting codes and signal processing methods. In this paper, we focus on signal processing methods to reduce the inter-cell interference that causes the voltage shift, and propose two algorithms that reduce the voltage shift effects by adjusting the read voltages. The simulation results show that the proposed algorithms are effective for interference mitigation.
This paper presents a low-complexity multi-mode fast Fourier transform (FFT) processor for Digital Video Broadcasting-Terrestrial 2 (DVB-T2) systems. DVB-T2 operations need 1K/2K/4K/8K/16K/32K-point multiple-mode FFT processors. The proposed architecture employs a pipelined shared-memory architecture in which radix-2/2^2/2^3/2^4 FFT algorithms, a multi-path delay commutator (MDC), and a novel data scaling approach are exploited. Based on this architecture, a novel low-cost data scaling unit is proposed to increase area efficiency, and an elaborate memory configuration scheme is designed to use single-port SRAM without degrading the throughput rate. A new twiddle-factor scheduling method is also proposed to reduce the area. The SQNR performance of the 32K-point FFT mode is about 45.3 dB at an 11-bit internal word length for 256QAM modulation. The proposed FFT processor has a lower hardware complexity and memory size compared to conventional FFT processors.
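For readers unfamiliar with the radix-2 decomposition underlying the radix-2^k family mentioned above, the following is a minimal floating-point sketch of a recursive radix-2 decimation-in-time FFT. It is a textbook software illustration only. It does not model the pipelined shared-memory architecture, the MDC data path, the data scaling unit or the fixed-point word lengths of the proposed processor, and the function names are illustrative.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the half-size results with one twiddle factor.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def dft(x):
    """Direct O(n^2) DFT, used here only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

Comparing `fft_radix2(x)` against `dft(x)` on a short input is a quick sanity check. Higher-radix variants regroup several of these butterfly stages so that fewer non-trivial twiddle multiplications are needed per stage.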