Keyword Search Result

[Keyword] memory (654 hits)

Showing results 21-40 of 654

  • Crosstalk Analysis and Countermeasures of High-Bandwidth 3D-Stacked Memory Using Multi-Hop Inductive Coupling Interface Open Access

    Kota SHIBA  Atsutake KOSUGE  Mototsugu HAMADA  Tadahiro KURODA  

     
    BRIEF PAPER

    Publicized: 2022/09/30   Vol: E106-C No:7   Page(s): 391-394

    This paper describes an in-depth analysis of crosstalk in a high-bandwidth 3D-stacked memory using a multi-hop inductive coupling interface and proposes two countermeasures. This work analyzes the crosstalk among seven stacked chips using a 3D electromagnetic (EM) simulator. The detailed analysis reveals two main crosstalk sources: concentric coils and adjacent coils. To suppress this crosstalk, this paper proposes two corresponding countermeasures: shorted coils and 8-shaped coils. The combination of these coils improves area efficiency by a factor of 4 in simulation. The proposed methods enable an area-efficient inductive coupling interface for high-bandwidth stacked memory.

  • An Efficient Reference Image Sharing Method for the Image-Division Parallel Video Encoding Architecture

    Ken NAKAMURA  Yuya OMORI  Daisuke KOBAYASHI  Koyo NITTA  Kimikazu SANO  Masayuki SATO  Hiroe IWASAKI  Hiroaki KOBAYASHI  

     
    PAPER

    Publicized: 2022/11/29   Vol: E106-C No:6   Page(s): 312-320

    This paper proposes an efficient reference image sharing method for the image-division parallel video encoding architecture. This method efficiently reduces the amount of data transfer by using pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that the data transfer can be reduced to 19.8-35.3% of that of the conventional method on average without major degradation of coding performance. This reduction in data transfer makes it possible to lower the required bandwidth of the inter-chip transfer interface.
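
    The abstract names the two transfer modes but not their bookkeeping; the following is a minimal Python sketch (class and method names are our own assumptions, not the paper's design) of how a transfer management table might gate transfers so that reference blocks already present on a chip are never re-sent.

```python
class ReferenceShareTable:
    """Hypothetical transfer management table: tracks which
    reference-image blocks already sit in this chip's local memory."""
    def __init__(self, link):
        self.link = link        # any object with a send(block_id) method
        self.present = set()    # block ids already transferred

    def pre_transfer(self, predicted_blocks):
        # pre-transfer with area prediction: push blocks likely to be used
        for blk in predicted_blocks:
            self._send(blk)

    def fetch(self, blk):
        # on-demand transfer: only a table miss triggers a transfer
        self._send(blk)
        return blk

    def _send(self, blk):
        if blk not in self.present:   # skip blocks already present
            self.link.send(blk)
            self.present.add(blk)
```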

  • Implementation of Fully-Pipelined CNN Inference Accelerator on FPGA and HBM2 Platform

    Van-Cam NGUYEN  Yasuhiko NAKASHIMA  

     
    PAPER-Computer System

    Publicized: 2023/03/17   Vol: E106-D No:6   Page(s): 1117-1129

    Many deep convolutional neural network (CNN) inference accelerators on the field-programmable gate array (FPGA) platform have been widely adopted due to their low power consumption and high performance. In this paper, we develop the following to improve performance and power efficiency. First, we use a high bandwidth memory (HBM) to expand the bandwidth of data transmission between the off-chip memory and the accelerator. Second, a fully-pipelined manner, which consists of pipelined inter-layer computation and a pipelined computation engine, is implemented to decrease idle time among layers. Third, a multi-core architecture with shared-dual buffers is designed to reduce off-chip memory access and maximize the throughput. We designed the proposed accelerator on the Xilinx Alveo U280 platform in Verilog HDL rather than the high-level synthesis used in previous works, and used the VGG-16 model to verify the system in our experiments. With a similar accelerator architecture, the experimental results demonstrate that the memory bandwidth of HBM is 13.2× better than that of DDR4. In terms of throughput, our accelerator is 1.9× better than an FPGA+HBM2 based accelerator, 1.65× better than a low-batch-size (4) GPGPU, and 11.9× better than a low-batch-size (4) CPU. In terms of power efficiency, our proposed system provides a 1.4-1.7×/1.7-12.6×/6.6-37.1× improvement over previous DDR+FPGA/DDR+GPGPU/DDR+CPU based accelerators with the large-scale CNN model.

  • On Lookaheads in Regular Expressions with Backreferences

    Nariyoshi CHIDA  Tachio TERAUCHI  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2023/02/06   Vol: E106-D No:5   Page(s): 959-975

    Many modern regular expression engines employ various extensions to give more expressive support for real-world usages. Among the major extensions employed by many of the modern regular expression engines are backreferences and lookaheads. A question of interest about these extended regular expressions is their expressive power. Previous works have shown that (i) the extension by lookaheads does not enhance the expressive power, i.e., the expressive power of regular expressions with lookaheads is still regular, and that (ii) the extension by backreferences enhances the expressive power, i.e., the expressive power of regular expressions with backreferences (abbreviated as rewb) is no longer regular. This raises the following natural question: Does the extension of regular expressions with backreferences by lookaheads enhance the expressive power of regular expressions with backreferences? This paper answers the question positively by proving that adding either positive lookaheads or negative lookaheads increases the expressive power of rewb (the former abbreviated as rewblp and the latter as rewbln). A consequence of our result is that neither the class of finite state automata nor that of the memory automata (MFA) of Schmid [2] (which correspond to regular expressions with backreferences but without lookaheads) corresponds to rewblp or rewbln. To fill the void, as a first step toward building such automata, we propose a new class of automata called memory automata with positive lookaheads (PLMFA) that corresponds to rewblp. The key idea of PLMFA is to extend MFA with a new kind of memory, called positive-lookahead memory, that is used to simulate the backtracking behavior of positive lookaheads. Interestingly, our positive-lookahead memories are almost perfectly symmetric to the capturing-group memories of MFA. Therefore, our PLMFA can be seen as a natural extension of MFA that is obtained independently of its original intended purpose of simulating rewblp.
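
    As a concrete illustration (our own example, not taken from the paper), Python's `re` engine supports both extensions, so a rewb and a rewblp-style pattern can be tried directly; note how the group captured inside the positive lookahead is reused by the backreference:

```python
import re

# rewb: a backreference alone -- \1 must repeat whatever (a+) captured
print(re.match(r"(a+)b\1", "aabaa").group())    # -> 'aabaa'

# rewblp: a capturing group *inside* a positive lookahead; the lookahead
# consumes no input, but its capture feeds the backreference afterwards
m = re.match(r"(?=(a+)b)\1", "aaab")
print(m.group(), m.group(1))                    # -> 'aaa' 'aaa'
```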

  • Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions

    Yue XIE  Ruiyu LIANG  Zhenlin LIANG  Xiaoyan ZHAO  Wenhao ZENG  

     
    LETTER-Speech and Hearing

    Publicized: 2023/02/21   Vol: E106-D No:5   Page(s): 1098-1101

    To enhance the emotion feature and improve the performance of speech emotion recognition, an attention mechanism is employed to recognize the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the time-accumulation characteristic of the LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the state-update method of the LSTM, to construct multi-head attention. This means that a nonlinear change replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
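
    A minimal numpy sketch of one such head, with shapes and parameter names of our own choosing: the score uses a tanh (additive attention, in the style of an LSTM state update) instead of the linear scaled dot product of classical multi-head attention.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention_head(H, q, W1, W2, v):
    """H: (T, d) LSTM outputs; q: (d,) query, e.g. the last LSTM state.
    Nonlinear additive scoring replaces the linear dot product."""
    scores = np.tanh(H @ W1 + q @ W2) @ v   # (T,)
    w = softmax(scores)                     # attention weights over time
    return w @ H                            # (d,) context vector

T, d, da = 5, 8, 6
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d))
ctx = additive_attention_head(H, H[-1], rng.normal(size=(d, da)),
                              rng.normal(size=(d, da)), rng.normal(size=da))
```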

  • APVAS: Reducing the Memory Requirement of AS_PATH Validation by Introducing Aggregate Signatures into BGPsec

    Ouyang JUNJIE  Naoto YANAI  Tatsuya TAKEMURA  Masayuki OKADA  Shingo OKAMURA  Jason Paul CRUZ  

     
    PAPER

    Publicized: 2023/01/11   Vol: E106-A No:3   Page(s): 170-184

    The BGPsec protocol, an extension of the border gateway protocol (BGP) for Internet routing, uses digital signatures to guarantee the validity of routing information. However, the use of digital signatures in routing information on BGPsec causes a lack of memory in BGP routers, creating a gaping security hole in today's Internet. This problem hinders the practical realization and implementation of BGPsec. In this paper, we present APVAS (AS path validation based on aggregate signatures), a new protocol that reduces the memory consumption of routers running BGPsec when validating paths in routing information. APVAS relies on a novel aggregate signature scheme that compresses individually generated signatures into a single signature. Furthermore, we implement a prototype of APVAS on the BIRD Internet Routing Daemon and demonstrate its efficiency on actual BGP connections. Our results show that the routing tables of routers running BGPsec with APVAS have 20% lower memory consumption than those running the conventional BGPsec. We also confirm the effectiveness of APVAS in the real world by using 800,000 routes, which are equivalent to the full route information on a global scale.
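
    To make the memory argument concrete, here is a toy sketch of ours (NOT a secure scheme, and not the paper's construction): n per-hop signatures collapse into one constant-size value, which is the storage effect a real pairing-based aggregate signature achieves.

```python
import hashlib

def toy_sign(key: bytes, msg: bytes) -> bytes:
    # stand-in for one AS's signature over its path segment (not secure)
    return hashlib.sha256(key + msg).digest()

def toy_aggregate(sigs):
    # real schemes (e.g. BLS) multiply group elements; XOR of digests is
    # used here only to show the constant-size result
    agg = bytes(32)
    for s in sigs:
        agg = bytes(x ^ y for x, y in zip(agg, s))
    return agg

path = [b"AS65001", b"AS65002", b"AS65003"]
sigs = [toy_sign(b"key%d" % i, hop) for i, hop in enumerate(path)]
print(len(b"".join(sigs)), "bytes per path without aggregation")  # 96
print(len(toy_aggregate(sigs)), "bytes with one aggregate")       # 32
```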

  • Heterogeneous Integration of Precise and Approximate Storage for Error-Tolerant Workloads

    Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

    Publicized: 2022/09/05   Vol: E106-A No:3   Page(s): 491-503

    This study proposes a heterogeneous integration of precise and approximate storage in data center storage. The storage control engine allocates precise and error-tolerant applications to precise and approximate storage, respectively. The appropriate use of both precise and approximate storage is examined by applying a non-volatile memory capacity algorithm. To respond to changes in the applications over time, the non-volatile memory capacity algorithm changes the capacities of the storage class memories (SCMs), namely the memory-type SCM (M-SCM) and the storage-type SCM (S-SCM), in the non-volatile memory resource. A three-dimensional triple-level cell (TLC) NAND flash is used as a large-capacity memory. The results indicate that precise storage exhibits high performance when the maximum storage cost is high. By contrast, with a low maximum storage cost, approximate storage exhibits high performance by using a low-bit-cost approximate multi-level cell (MLC) S-SCM.
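
    A hedged sketch of the cost trade-off the capacity algorithm navigates; the bit costs and the closed-form split are illustrative assumptions of ours, not the paper's algorithm. With a high budget the whole capacity can be fast M-SCM; with a low budget most of it must be cheap S-SCM.

```python
def mscm_share(need_gb, max_cost, c_m=8.0, c_s=2.0):
    """How many GB of fast M-SCM fit in the budget when the remaining
    capacity is bought as cheap S-SCM (illustrative costs per GB)."""
    x = (max_cost - need_gb * c_s) / (c_m - c_s)
    return min(need_gb, max(0.0, x))

print(mscm_share(32, max_cost=256))  # high budget -> 32.0 GB, all M-SCM
print(mscm_share(32, max_cost=96))   # low budget  -> ~5.3 GB M-SCM
```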

  • A Non-Intrusive Speech Quality Evaluation Method Based on the Audiogram and Weighted Frequency Information for Hearing Aid

    Ruxue GUO  Pengxu JIANG  Ruiyu LIANG  Yue XIE  Cairong ZOU  

     
    LETTER-Speech and Hearing

    Publicized: 2022/07/25   Vol: E106-A No:1   Page(s): 64-68

    For a long time, the compensation effect of hearing aids has mainly been evaluated subjectively, and there are few studies on objective evaluation. Furthermore, a pure speech signal is generally required as a reference in the existing objective evaluation methods, which restricts their practicality in real-world environments. Therefore, this paper presents a non-intrusive speech quality evaluation method for hearing aids that combines the audiogram and weighted frequency information. The proposed model mainly includes an audiogram information extraction network, a frequency information extraction network, and a quality score mapping network. The audiogram is the input of the audiogram information extraction network, which helps the system capture the information related to hearing loss. In addition, the low-frequency bands of speech contain loudness information, and the medium- and high-frequency components contribute to semantic comprehension. The information of the two frequency bands is input to the frequency information extraction network to obtain time-frequency information. The high-level features of the different frequency bands and the audiogram are fused into two groups of tensors that distinguish the information of the different frequency bands, and these are used as the input of the attention layer to calculate the corresponding weight distribution. Finally, a dense layer is employed to predict the speech quality score. The experimental results show that it is reasonable to combine the audiogram and the weights of the information from the two frequency bands, which can effectively realize the evaluation of the speech quality of hearing aids.

  • Comparison of Value- and Reference-Based Memory Page Compaction in Virtualized Systems

    Naoki AOYAMA  Hiroshi YAMADA  

     
    PAPER-Software System

    Publicized: 2022/08/31   Vol: E105-D No:12   Page(s): 2075-2084

    The issue of copying values or references has historically been studied for managing memory objects, especially in distributed systems. In this paper, we explore a new topic on copying values vs. references: memory page compaction on virtualized systems. Memory page compaction moves target physical pages to a contiguous memory region at the operating system kernel level to create huge pages. Memory virtualization provides an opportunity to perform memory page compaction by copying the references of the physical pages. That is, instead of copying pages' values, we can move guest physical pages by changing the mappings of guest-physical to machine-physical pages. The goal of this paper is a quantitative comparison between value- and reference-based memory page compaction. To do so, we developed a software mechanism that achieves memory page compaction by appropriately updating the references of guest-physical pages. We prototyped the mechanism on Linux 4.19.29, and the experimental results show that the prototype's page compaction is up to 78% faster and achieves up to 17% higher performance on memory-intensive real-world applications compared to the default value-copy compaction scheme.
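
    A minimal sketch of the two copy disciplines (the data structures are our simplification): value copy moves the full page contents between machine frames, while reference copy only swaps the guest-physical-to-machine-physical (GPA to MPA) mappings.

```python
PAGE = 4096

def move_by_value(machine_mem, gpa_to_mpa, src_gpa, dst_gpa):
    # conventional compaction: copy 4 KiB of data between frames
    src, dst = gpa_to_mpa[src_gpa], gpa_to_mpa[dst_gpa]
    machine_mem[dst] = bytearray(machine_mem[src])

def move_by_reference(gpa_to_mpa, src_gpa, dst_gpa):
    # virtualized compaction: no data moves, only the mapping changes
    gpa_to_mpa[src_gpa], gpa_to_mpa[dst_gpa] = (
        gpa_to_mpa[dst_gpa], gpa_to_mpa[src_gpa])

mem = {0: bytearray(PAGE), 1: bytearray(b"x" * PAGE)}
mapping = {"gpa0": 0, "gpa1": 1}
move_by_reference(mapping, "gpa0", "gpa1")   # O(1) regardless of page size
```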

  • MemFRCN: Few Shot Object Detection with Memorable Faster-RCNN

    TongWei LU  ShiHai JIA  Hao ZHANG  

     
    LETTER-Vision

    Publicized: 2022/05/24   Vol: E105-A No:12   Page(s): 1626-1630

    At this stage, research in the field of few-shot image classification (FSC) has made good progress, but there are still many difficulties in the field of few-shot object detection (FSOD). Almost all current FSOD methods face the catastrophic forgetting problem, which manifests in the accuracy of base class recognition dropping seriously when the model acquires the ability to recognize novel classes. For many methods, the accuracy of the model also falls back as the number of classes increases. To address this problem we propose a new memory-based method called Memorable Faster R-CNN (MemFRCN), which makes the model remember the categories it has already seen. Specifically, we propose a new two-stage object detector consisting of a memory-based classifier (MemCla), a fully connected neural network classifier (FCC) and an adaptive fusion block (AdFus). MemCla stores the embedding vector of each category as memory, which gives the model the memory capability to avoid catastrophic forgetting. AdFus fuses the outputs of FCC and MemCla and can automatically adjust the fusion method as the number of samples increases, so that the model achieves better performance under various conditions. Our method performs well on unseen classes while maintaining the detection accuracy of seen classes. Experimental results demonstrate that our method outperforms other current methods on multiple benchmarks.
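
    A small numpy sketch of the memory-classifier idea as we read it (the names, the cosine-similarity rule, and the fixed blend are our assumptions): MemCla keeps one stored embedding per seen class, so adding a novel class is an append rather than a retrain.

```python
import numpy as np

class MemCla:
    """Stores one normalized embedding per remembered category."""
    def __init__(self):
        self.labels, self.vecs = [], []

    def remember(self, label, emb):
        self.labels.append(label)
        self.vecs.append(emb / np.linalg.norm(emb))

    def scores(self, emb):
        M = np.stack(self.vecs)
        return M @ (emb / np.linalg.norm(emb))   # cosine score per class

def fuse(mem_scores, fcc_logits, alpha=0.5):
    # stand-in for AdFus: a fixed blend; the paper adapts the fusion
    # as the number of samples grows
    return alpha * mem_scores + (1 - alpha) * fcc_logits
```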

  • Holmes: A Hardware-Oriented Optimizer Using Logarithms

    Yoshiharu YAMAGISHI  Tatsuya KANEKO  Megumi AKAI-KASAYA  Tetsuya ASAI  

     
    PAPER

    Publicized: 2022/05/11   Vol: E105-D No:12   Page(s): 2040-2047

    Edge computing, which has been gaining attention in recent years, has many advantages, such as reducing the load on the cloud, not being affected by the communication environment, and providing excellent security. Therefore, many researchers have attempted to implement neural networks, which are representative of machine learning, in edge computing. Neural networks can be divided into inference and learning parts; however, there has been little research on implementing the learning component in edge computing, in contrast to the inference part. This is because learning requires more memory and computation than inference, easily exceeding the limit of resources available for edge computing. To overcome this problem, this research focuses on the optimizer, which is the heart of learning. In this paper, we introduce our new optimizer, hardware-oriented logarithmic momentum estimation (Holmes), which incorporates new perspectives not found in existing optimizers in terms of the characteristics and strengths of hardware. The performance of Holmes was evaluated by comparing it with other optimizers with respect to learning progress and convergence speed. Important aspects of hardware implementation, such as memory and operation requirements, are also discussed. The results show that Holmes is a good match for edge computing, with relatively low resource requirements and fast learning convergence. Holmes will help create an era in which advanced machine learning can be realized on edge computing.
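
    The abstract does not give Holmes' update rule, so the following is only a generic illustration of the hardware angle the name hints at, under our own assumptions: in the log domain a multiplication becomes an addition, which hardware can do with adders instead of multipliers.

```python
import math

def log_mul(a, b):
    # multiply via addition of logarithms (signs tracked separately)
    if a == 0 or b == 0:
        return 0.0
    sign = math.copysign(1, a) * math.copysign(1, b)
    return sign * math.exp(math.log(abs(a)) + math.log(abs(b)))

# e.g. a generic momentum-style step written with log-domain multiplies
m, beta, grad = 0.5, 0.9, 0.12
m = log_mul(beta, m) + log_mul(1 - beta, grad)
```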

  • Sputtering Gas Pressure Dependence on the LaBxNy Insulator Formation for Pentacene-Based Back-Gate Type Floating-Gate Memory with an Amorphous Rubrene Passivation Layer

    Eun-Ki HONG  Kyung Eun PARK  Shun-ichiro OHMI  

     
    PAPER

    Publicized: 2022/06/27   Vol: E105-C No:10   Page(s): 589-595

    In this research, the effect of the Ar/N2-plasma sputtering gas pressure on the LaBxNy tunnel and block layers was investigated for pentacene-based floating-gate memory with an amorphous rubrene (α-rubrene) passivation layer. The influence of the α-rubrene passivation layer on the memory characteristics was examined. The pentacene-based metal/insulator/metal/insulator/semiconductor (MIMIS) diode and organic field-effect transistor (OFET) were fabricated utilizing an N-doped LaB6 metal layer and a LaBxNy insulator with an α-rubrene passivation layer at an annealing temperature of 200°C. In the case of the MIMIS diode, the leakage current density and the equivalent oxide thickness (EOT) were decreased from 1.2×10-2 A/cm2 to 1.1×10-7 A/cm2 and from 3.5 nm to 3.1 nm, respectively, by decreasing the sputtering gas pressure from 0.47 Pa to 0.19 Pa. In the case of the floating-gate type OFET with the α-rubrene passivation layer, a larger memory window of 0.68 V was obtained, with a saturation mobility of 2.2×10-2 cm2/(V·s) and a subthreshold swing of 199 mV/dec, compared to the device without the α-rubrene passivation layer.

  • 28nm Atom-Switch FPGA: Static Timing Analysis and Evaluation

    Xu BAI  Ryusuke NEBASHI  Makoto MIYAMURA  Kazunori FUNAHASHI  Naoki BANNO  Koichiro OKAMOTO  Hideaki NUMATA  Noriyuki IGUCHI  Tadahiko SUGIBAYASHI  Toshitsugu SAKAMOTO  Munehiro TADA  

     
    BRIEF PAPER

    Publicized: 2022/06/27   Vol: E105-C No:10   Page(s): 627-630

    A static timing analysis (STA) tool for a 28nm atom-switch FPGA (AS-FPGA) is introduced to validate the signal delay of an application circuit before implementation. The high accuracy of the STA tool is confirmed by implementing a practical application circuit on the 28nm AS-FPGA. Moreover, a dramatic improvement in delay and power is demonstrated in comparison with a previous 40nm AS-FPGA.

  • Fast Gated Recurrent Network for Speech Synthesis

    Bima PRIHASTO  Tzu-Chiang TAI  Pao-Chi CHANG  Jia-Ching WANG  

     
    LETTER-Speech and Hearing

    Publicized: 2022/06/10   Vol: E105-D No:9   Page(s): 1634-1638

    The recurrent neural network (RNN) has been used in audio and speech processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, the long computing time is still the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture, for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in the MGU. Our MGU-based architecture is about twice as fast as the other MGU-based architectures, with equally good sound quality.
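
    For reference, a numpy sketch of one standard MGU step (the abstract does not specify which equations the paper prunes the history from, so only the baseline is shown; parameter names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One standard MGU step: a single forget gate f controls both the
    candidate state and the blend with the previous hidden state."""
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)
    h_tilde = np.tanh(Wh @ x + Uh @ (f * h_prev) + bh)
    return (1 - f) * h_prev + f * h_tilde
```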

  • A Hierarchical Memory Model for Task-Oriented Dialogue System

    Ya ZENG  Li WAN  Qiuhong LUO  Mao CHEN  

     
    PAPER-Natural Language Processing

    Publicized: 2022/05/16   Vol: E105-D No:8   Page(s): 1481-1489

    Traditional pipeline methods for task-oriented dialogue systems are designed individually and expensively. Existing memory-augmented end-to-end methods directly map the inputs to outputs and achieve promising results. However, most existing end-to-end solutions store the dialogue history and knowledge base (KB) information in the same memory and represent KB information in the form of KB triples, which makes the memory reader's reasoning on the memory more difficult and makes it hard for the system to retrieve the correct information from the memory to generate a response. Some methods introduce many manual annotations to strengthen reasoning. To reduce the use of manual annotations while strengthening reasoning, we propose a hierarchical memory model (HM2Seq) for task-oriented systems. HM2Seq uses a hierarchical memory to separate the dialogue history and KB information into two memories and stores the KB in KB rows; we then use a memory-row pointer combined with an entity decoder to perform hierarchical reasoning over the memory. The experimental results on two publicly available task-oriented dialogue datasets confirm our hypothesis and show the outstanding performance of HM2Seq, which outperforms the baselines.
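
    A hypothetical sketch of the two-memory layout (field names and the overlap heuristic are ours, standing in for the learned row pointer and entity decoder): the dialogue history and the KB live in separate memories, and the KB is stored row-per-entity rather than as flat triples.

```python
dialogue_memory = ["i need a cheap hotel", "which area do you prefer ?"]
kb_memory = [
    {"name": "alpha_hotel", "area": "north", "price": "cheap"},
    {"name": "beta_hotel",  "area": "south", "price": "moderate"},
]

def read_kb(query_terms):
    # hierarchical read: pick the KB row first (row pointer), then an
    # entity decoder would pick a field inside that row
    def overlap(row):
        return sum(term in row.values() for term in query_terms)
    return max(kb_memory, key=overlap)

print(read_kb(["cheap", "north"]))   # -> the alpha_hotel row
```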

  • A Low-Cost Training Method of ReRAM Inference Accelerator Chips for Binarized Neural Networks to Recover Accuracy Degradation due to Statistical Variabilities

    Zian CHEN  Takashi OHSAWA  

     
    PAPER-Integrated Electronics

    Publicized: 2022/01/31   Vol: E105-C No:8   Page(s): 375-384

    A new software-based in-situ training (SBIST) method to achieve high accuracies is proposed for binarized neural network inference accelerator chips, in which measured offsets in the sense amplifiers (activation binarizers) are transformed into biases in the training software. To expedite this individual training, the initial values for the weights are taken from the results of a common forming training process, which is conducted in advance using the offset fluctuation distribution averaged over the fabrication line. SPICE simulation inference results for the accelerator predict that the accuracy recovers to higher than 90% after only a few epochs of the individual training, even when the amplifier offset is as large as 40mV.

  • A Conflict-Aware Capacity Control Mechanism for Deep Cache Hierarchy

    Jiaheng LIU  Ryusuke EGAWA  Hiroyuki TAKIZAWA  

     
    PAPER-Computer System

    Publicized: 2022/03/09   Vol: E105-D No:6   Page(s): 1150-1163

    As the number of cores on a processor increases, cache hierarchies contain more cache levels and a larger last level cache (LLC). Thus, the power and energy consumption of the cache hierarchy becomes non-negligible. Meanwhile, because the cache usage behaviors of individual applications can differ, it is possible to achieve higher energy efficiency of the computing system by determining the appropriate cache configurations for individual applications. This paper proposes a cache control mechanism to improve energy efficiency by adjusting the cache hierarchy to each application. Our mechanism first bypasses and disables a less-significant cache level, then partially disables the LLC, and finally adjusts the associativity if the application suffers from a large number of conflict misses. The mechanism can achieve significant energy savings at the cost of a small performance degradation. The evaluation results show that our mechanism improves energy efficiency by 23.9% and 7.0% on average over the baseline and the cache-level bypassing mechanisms, respectively. In addition, even if LLC resource contention occurs, the proposed mechanism is still effective for improving energy efficiency.
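
    A sketch of the three-step decision order described above; the counters, thresholds, and knob names are illustrative assumptions of ours, not the paper's parameters.

```python
def tune_cache(stats, cfg):
    """stats: per-application counters; cfg: cache configuration knobs."""
    if stats["l2_hit_rate"] < 0.05:
        cfg["l2_enabled"] = False            # bypass a less-significant level
    if stats["llc_utilization"] < 0.5:
        cfg["llc_active_ways"] = max(1, cfg["llc_active_ways"] // 2)
    if stats["conflict_miss_rate"] > 0.2:    # too few ways causes conflicts
        cfg["llc_active_ways"] = min(cfg["llc_max_ways"],
                                     cfg["llc_active_ways"] * 2)
    return cfg
```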

  • A Metadata Prefetching Mechanism for Hybrid Memory Architectures Open Access

    Shunsuke TSUKADA  Hikaru TAKAYASHIKI  Masayuki SATO  Kazuhiko KOMATSU  Hiroaki KOBAYASHI  

     
    PAPER

    Publicized: 2021/12/03   Vol: E105-C No:6   Page(s): 232-243

    A hybrid memory architecture (HMA) that consists of several distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, an HMA needs metadata for data management, since data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for metadata references. However, as the amount of metadata increases in proportion to the size of the HMA, the memory controller needs to handle a large amount of metadata. As a result, the memory controller cannot cache all the metadata, and the number of metadata references increases. This results in an increase in the access latency to reach the target data and degrades performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs. The proposed mechanism loads the metadata needed in the near future by prefetching. Moreover, to increase the effect of the metadata prefetching, the proposed mechanism predicts the metadata used in the near future based on an address difference, i.e., the difference between two consecutive access addresses. The evaluation results show that the proposed metadata prefetching mechanism can improve the instructions per cycle by up to 44%, and by 9% on average.
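
    A minimal sketch of address-difference prediction (the class structure is our simplification): the metadata for the address extrapolated from the last observed delta is prefetched into the controller's metadata cache.

```python
class MetadataPrefetcher:
    """Prefetches metadata at addr + (addr - previous addr)."""
    def __init__(self, backing):
        self.backing = backing          # metadata held in memory devices
        self.cache = {}                 # metadata cached in the controller
        self.prev = None

    def lookup(self, addr):
        delta = addr - self.prev if self.prev is not None else 0
        self.prev = addr
        if addr not in self.cache:      # miss: costly device access
            self.cache[addr] = self.backing[addr]
        pred = addr + delta             # predicted next access address
        if delta and pred in self.backing:
            self.cache.setdefault(pred, self.backing[pred])  # prefetch
        return self.cache[addr]
```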

  • Polarity Classification of Social Media Feeds Using Incremental Learning — A Deep Learning Approach

    Suresh JAGANATHAN  Sathya MADHUSUDHANAN  

     
    PAPER-Neural Networks and Bioengineering

    Publicized: 2021/09/15   Vol: E105-A No:3   Page(s): 584-593

    Online feeds are streamed continuously in batches with varied polarities at varying times. A system handling online feeds must be trained to classify all the varying polarities occurring dynamically. The polarity classification system designed for online feeds must address two significant challenges: (i) stability-plasticity and (ii) category proliferation. These challenges can be addressed using the technique of incremental learning, which learns new classes dynamically while retaining previously learned knowledge. This paper proposes a new incremental learning methodology, ILOF (Incremental Learning of Online Feeds), to classify the feeds by adopting deep learning techniques such as RNN (Recurrent Neural Networks) and LSTM (Long Short Term Memory), as well as ELM (Extreme Learning Machine), to address the above-stated problems. The proposed method creates a separate model for each batch using ELM and incrementally learns from the trained batches. The training of each batch avoids the retraining of old feeds, thus saving training time and memory space. The trained feeds can be discarded when a new batch of feeds arrives. Experiments are carried out using standard datasets comprising long feeds (IMDB, Sentiment140) and short feeds (Twitter, WhatsApp, and Twitter airline sentiment), and the proposed method shows positive results in terms of better performance and accuracy.
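
    Since a separate ELM is trained per batch, the per-batch step is cheap: a random hidden layer plus one closed-form least-squares solve. A minimal numpy sketch (hyperparameters ours):

```python
import numpy as np

def train_elm(X, Y, n_hidden=64, seed=0):
    """Extreme Learning Machine: random input weights stay fixed; only
    the output weights are solved, so old batches need no retraining."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)             # fixed random hidden features
    beta = np.linalg.pinv(H) @ Y       # least-squares output layer
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```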

  • Simulation-Based Understanding of “Charge-Sharing Phenomenon” Induced by Heavy-Ion Incident on a 65nm Bulk CMOS Memory Circuit

    Akifumi MARU  Akifumi MATSUDA  Satoshi KUBOYAMA  Mamoru YOSHIMOTO  

     
    BRIEF PAPER-Electronic Circuits

    Publicized: 2021/08/05   Vol: E105-C No:1   Page(s): 47-50

    In order to predict single-event occurrences in highly integrated CMOS memory circuits, a quantitative evaluation of charge sharing between memory cells is needed. In this study, the charge-sharing area induced by heavy-ion incidence is quantitatively calculated using a device-simulation-based method. The validity of this method is experimentally confirmed using a heavy-ion accelerator.

