1-10hit |
Yoshiki TAKAI Mamoru FUKUCHI Chihiro MATSUI Reika KINOSHITA Ken TAKEUCHI
This paper analyzes the optimal SSD configuration including emerging non-volatile memories such as quadruple-level cell (QLC) NAND flash memory [1] and storage class memories (SCMs). First, SSD performance and SSD endurance lifetime of hybrid SSD are evaluated in four configurations: 1) single-level cell (SLC)/QLC NAND flash, 2) SCM/QLC NAND flash, 3) SCM/triple-level cell (TLC)/QLC NAND flash and 4) SCM/TLC NAND flash. Furthermore, these four configurations are compared in limited cost. In case of cold workloads or high total SSD cost assumption, SCM/TLC NAND flash hybrid configuration is recommended in both SSD performance and endurance lifetime. For hot workloads with low total SSD cost assumption, however, SLC/QLC NAND flash hybrid configuration is recommended with emphasis on SSD endurance lifetime. Under the same conditions as above, SCM/TLC/QLC NAND flash tri-hybrid is the best configuration in SSD performance considering cost. In particular, for prxy_0 (write-hot workload), SCM/TLC/QLC NAND flash tri-hybrid achieves 67% higher IOPS/cost than SCM/TLC NAND flash hybrid. Moreover, the configurations with the highest IOPS/cost in each workload and cost limit are picked up and analyzed with various types of SCMs. For all cases except for the case of prxy_1 with high total SSD cost assumption, middle-end SCM (write latency: 1us, read latency: 1us) is recommended in performance considering cost. However, for prxy_1 (read-hot workload) with high total SSD cost assumption, high-end SCM (write latency: 100ns, read latency: 100ns) achieves the best performance.
Hirofumi TAKISHITA Yutaka ADACHI Chihiro MATSUI Ken TAKECUHI
NAND flash memories used in solid-state drives (SSDs) will be replaced with storage-class memories (SCMs), which are comparable with NAND flash in their cost, and with DRAM in their speed. This paper describes the performance difference of the SCM/NAND flash hybrid SSD and the SCM-based SSD with between sector-unit read (512 Byte) and page-unit read (16 KByte, NAND flash page-size) using synthetic and real workload. Also, effect of the SCM read-unit size on SSD performance are analyzed. When SCM write/read latency is 0.1 us, performance difference of the SCM/NAND flash hybrid SSD with between page- and sector-unit read is about 1% and 6% at most for the write-intensive and read-intensive workloads, respectively. However, performance of the SCM-based SSD is significantly improved when sector-unit read is used because extra read latency does not occur. Especially, the SCM-based SSD IOPS is improved by 131% for proj_3 (read-hot-random), because its read request size is small but its read request ratio is large. This paper also shows IOPS of SCM-based SSD write/read with sector-unit read can be predicted by the average write/read request size of workloads.
Yusuke YAMAGA Chihiro MATSUI Yukiya SAKAKI Ken TAKEUCHI
In order to reduce the memory cell errors in real-usage of NAND flash-based SSD, real usage-based precise reliability test for NAND flash of SSDs has been proposed. Reliability of the NAND flash memories of the SSDs is seriously degraded as the scaling of memory cells. However, conventional simple reliability tests of read-disturb and data-retention cannot give the same result as the real-life VTH shift and memory cell errors. To solve this problem, the proposed reliability test precisely reproduces the real memory cell failures by emulating the complicated read, write, and data-retention with SSD emulator. In this paper, the real-life VTH shift and memory cell errors between two generations of NAND flash memory with different characterized real workloads are provided. Using the proposed test method, 1.6-times BER difference is observed when write-cold and read-hot workload (hm_1) and write-hot and read-hot workload (prxy_1) are compared in 1Ynm MLC NAND flash. In addition, by NAND flash memory scaling from 1Xnm to 1Ynm generations, the discrepancy of error numbers between the conventional reliability test result and actual reliability measured by proposed reliability test is increased by 6.3-times. Finally, guidelines for read reference voltage shifts and strength of ECCs are given to achieve high memory cell reliability for various workloads.
Mamoru FUKUCHI Chihiro MATSUI Ken TAKEUCHI
This paper analyzes the system-level performance of Storage Class Memory (SCM)/NAND flash hybrid solid-state drives (SSDs) and SCM/NAND flash/NAND flash tri-hybrid SSDs in difference types of NAND flash memory. There are several types of NAND flash memory, i.e. 2-dimensional (2D) or 3-dimensional (3D), charge-trap type (CT) and floating-gate type (FG) and multi-level cell (MLC) or triple-level cell (TLC). In this paper, the following four types of NAND flash memory are analyzed: 1) 3D CT TLC, 2) 3D FG TLC, 3) 2D FG TLC, and 4) 2D FG MLC NAND flash. Regardless of read- and write-intensive workloads, SCM/NAND flash hybrid SSD with low cost 3D CT TLC NAND flash achieves the best performance that is 20% higher than that with higher cost 2D FG MLC NAND flash. The performance improvement of 3D CT TLC NAND flash can be obtained by the short write latency. On the other hand, in case of tri-hybrid SSD, SCM/3D CT TLC/3D CT TLC NAND flash tri-hybrid SSD improves the performance 102% compared to SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. In addition, SCM/2D FG MLC/2D FG MLC NAND flash tri-hybrid SSD shows 49% lower performance than SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. Tri-hybrid SSD flash with 3D CT TLC NAND flash is the best performance in tri-hybrid SSD thanks to larger block size and word-line (WL) write. Therefore, in 3D CT TLC NAND flash based SSDs, higher cost MLC NAND flash is not necessary for hybrid SSD and tri-hybrid SSD for data center applications.
This study proposes a heterogeneous integration of precise and approximate storage in data center storage. The storage control engine allocates precise and error-tolerant applications to precise and approximate storage, respectively. The appropriate use of both precise and approximate storage is examined by applying a non-volatile memory capacity algorithm. To respond to the changes in application over time, the non-volatile memory capacity algorithm changes capacity of storage class memories (SCMs), namely the memory-type SCM (M-SCM) and storage-type SCM (S-SCM), in non-volatile memory resource. A three-dimensional triple-level cell (TLC) NAND flash is used as a large capacity memory. The results indicate that precise storage exhibits a high performance when the maximum storage cost is high. By contrast, with a low maximum storage cost, approximate storage exhibits high performance using a low bit cost approximate multiple-level cell (MLC) S-SCM.
Shinsei YOSHIKIYO Naoko MISAWA Kasidit TOPRASERTPONG Shinichi TAKAGI Chihiro MATSUI Ken TAKEUCHI
This paper proposes a layer-wise tunable retraining method for edge FeFET Computation-in-Memory (CiM) to compensate the accuracy degradation of neural network (NN) by FeFET device errors. The proposed retraining can tune the number of layers to be retrained to reduce inference accuracy degradation by errors that occur after retraining. Weights of the original NN model, accurately trained in cloud data center, are written into edge FeFET CiM. The written weights are changed by FeFET device errors in the field. By partially retraining the written NN model, the proposed method combines the error-affected layers of NN model with the retrained layers. The inference accuracy is thus recovered. After retraining, the retrained layers are re-written to CiM and affected by device errors again. In the evaluation, at first, the recovery capability of NN model by partial retraining is analyzed. Then the inference accuracy after re-writing is evaluated. Recovery capability is evaluated with non-volatile memory (NVM) typical errors: normal distribution, uniform shift, and bit-inversion. For all types of errors, more than 50% of the degraded percentage of inference accuracy is recovered by retraining only the final fully-connected (FC) layer of Resnet-32. To simulate FeFET Local-Multiply and Global-accumulate (LM-GA) CiM, recovery capability is also evaluated with FeFET errors modeled based on FeFET measurements. Retraining only FC layer achieves recovery rate of up to 53%, 66%, and 72% for FeFET write variation, read-disturb, and data-retention, respectively. In addition, just adding two more retraining layers improves recovery rate by 20-30%. In order to tune the number of retraining layers, inference accuracy after re-writing is evaluated by simulating the errors that occur after retraining. When NVM typical errors are injected, it is optimal to retrain FC layer and 3-6 convolution layers of Resnet-32. The optimal number of layers can be increased or decreased depending on the balance between the size of errors before retraining and errors after retraining.
Ayumu YAMADA Zhiyuan HUANG Naoko MISAWA Chihiro MATSUI Ken TAKEUCHI
In this work, fluctuation patterns of ReRAM current are classified automatically by proposed fluctuation pattern classifier (FPC). FPC is trained with artificially created dataset to overcome the difficulties of measured current signals, including the annotation cost and imbalanced data amount. Using FPC, fluctuation occurrence under different write conditions is analyzed for both HRS and LRS current. Based on the measurement and classification results, physical models of fluctuations are established.
Yuya ICHIKAWA Ayumu YAMADA Naoko MISAWA Chihiro MATSUI Ken TAKEUCHI
Integrating RGB and event sensors improves object detection accuracy, especially during the night, due to the high-dynamic range of event camera. However, introducing an event sensor leads to an increase in computational resources, which makes the implementation of RGB-event fusion multi-modal AI to CiM difficult. To tackle this issue, this paper proposes RGB-Event fusion Multi-modal analog Computation-in-Memory (CiM), called REM-CiM, for multi-modal edge object detection AI. In REM-CiM, two proposals about multi-modal AI algorithms and circuit implementation are co-designed. First, Memory capacity-Efficient Attentional Feature Pyramid Network (MEA-FPN), the model architecture for RGB-event fusion analog CiM, is proposed for parameter-efficient RGB-event fusion. Convolution-less bi-directional calibration (C-BDC) in MEA-FPN extracts important features of each modality with attention modules, while reducing the number of weight parameters by removing large convolutional operations from conventional BDC. Proposed MEA-FPN w/ C-BDC achieves a 76% reduction of parameters while maintaining mean Average Precision (mAP) degradation to < 2.3% during both day and night, compared with Attentional FPN fusion (A-FPN), a conventional BDC-adopted FPN fusion. Second, the low-bit quantization with clipping (LQC) is proposed to reduce area/energy. Proposed REM-CiM with MEA-FPN and LQC achieves almost the same memory cells, 21% less ADC area, 24% less ADC energy and 0.17% higher mAP than conventional FPN fusion CiM without LQC.
Fuyuki KIHARA Chihiro MATSUI Ken TAKEUCHI
In this work, we propose a 1T1R ReRAM CiM architecture for Hyperdimensional Computing (HDC). The number of Source Lines and Bit Lines is reduced by introducing memory cells that are connected in series, which is especially advantageous when using a 3D implementation. The results of CiM operations contain errors, but HDC is robust against them, so that even if the XNOR operation has an error of 25%, the inference accuracy remains above 90%.
Tomoaki YAMADA Chihiro MATSUI Ken TAKEUCHI
In order to realize solid-state drives (SSDs) with high performance, low energy consumption and high reliability, storage class memory (SCM)/multi-level cell (MLC) NAND flash hybrid SSD has been proposed. Algorithm of the hybrid SSD should be designed according to SCM specifications and workload characteristics. In this paper, SCMs are used as non-volatile cache. Cache operation guidelines and optimal SCM specifications for the hybrid SSD are provided for various workload characteristics. Three kinds of non-volatile cache operation for the hybrid SSD are discussed: i) write cache, ii) read-write cache without space control (RW cache) and iii) read-write cache with space control (RW cache w/ SC). SSD workloads are categorized into eight according to read/write ratio, access frequency and access data size. From evaluation result, the write cache algorithm is suitable for write-intensive workloads and read-cold-sequential workloads, while the RW cache algorithm is suitable for read-cold-random workloads to achieve the highest performance of the hybrid SSD. In contrast, as for read-hot-random workloads, write cache is appropriate when the SCM capacity is less than 3% of the NAND flash capacity. On the other hand, RW cache should be used in case that SCM capacity is more than 5% of NAND flash capacity. The effect of Memory-type SCM (M-SCM) and Storage-type SCM (S-SCM) on the hybrid SSD performance is also analyzed. The M-SCM latency is below 1 us (high speed) but the capacity is only 2% of the NAND flash capacity (small capacity). On the other hand, the S-SCM capacity is assumed to be 5% of the NAND flash capacity (large capacity) but S-SCM speed is larger than 1 us (low speed). If the additional SCM cost is limited to 20% of MLC NAND flash cost, up to 7-times and 8-times performance improvement are achieved in write-hot-random workload and read-hot-random workloads, respectively. Moreover, if the additional SCM cost is the same as MLC NAND flash cost, M-SCM/MLC NAND flash hybrid SSD achieves 24-times performance improvement.