Keyword Search Result

[Keyword] decoder(114hit)

1-20hit(114hit)

  • Analysis on Norms of Word Embedding and Hidden Vectors in Neural Conversational Model Based on Encoder-Decoder RNN

    Manaya TOMIOKA  Tsuneo KATO  Akihiro TAMURA  

     
    PAPER-Natural Language Processing

      Publicized:
    2022/06/30
      Vol:
    E105-D No:10
      Page(s):
    1780-1789

    A neural conversational model (NCM) based on an encoder-decoder recurrent neural network (RNN) with an attention mechanism learns different sequence-to-sequence mappings from those that neural machine translation (NMT) learns, even when based on the same technique. In the NCM, we confirmed that target-word-to-source-word mappings captured by the attention mechanism are not as clear and stationary as those for NMT. Considering that vector norms indicate the magnitude of information in the processing, we analyzed the inner workings of an encoder-decoder GRU-based NCM, focusing on the norms of word embedding vectors and hidden vectors. First, we conducted correlation analyses on the norms of word embedding vectors with frequencies in the training set and with conditional entropies of a bi-gram language model to understand what is correlated with the norms in the encoder and decoder. Second, we conducted correlation analyses on norms of change in the hidden vector of the recurrent layer with their input vectors for the encoder and decoder, respectively. These analyses were done to understand how the magnitude of information propagates through the network. The analytical results suggested that the norms of the word embedding vectors are associated with their semantic information in the encoder, whereas in the decoder they are associated with predictability under a language model. The analytical results further revealed how the norms propagate through the recurrent layer in the encoder and decoder.

  • A Reconfigurable 74-140Mbps LDPC Decoding System for CCSDS Standard

    Yun CHEN  Jimin WANG  Shixian LI  Jinfou XIE  Qichen ZHANG  Keshab K. PARHI  Xiaoyang ZENG  

     
    PAPER

      Publicized:
    2021/05/25
      Vol:
    E104-A No:11
      Page(s):
    1509-1515

    Accumulate Repeat-4 Jagged-Accumulate (AR4JA) codes, which are channel codes designed for deep-space communications, are a series of QC-LDPC codes. The structure of these codes' generator matrices can be exploited to design reconfigurable encoders. To make the decoder reconfigurable and achieve shorter convergence time, turbo-like decoding message passing (TDMP) is chosen as the hardware decoder's decoding schedule, and the normalized min-sum algorithm (NMSA) is used as the decoding algorithm to reduce hardware complexity. In this paper, we propose a reconfigurable decoder and present its FPGA implementation results. The decoder can achieve throughput greater than 74 Mbps.
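
    As a rough illustration of the normalized min-sum idea mentioned in this abstract, the following Python sketch computes the outgoing messages of a single check node. It is a generic textbook-style form, not the paper's hardware design; the function name and the normalization factor alpha=0.75 are assumptions chosen for illustration.

```python
def nms_check_node_update(incoming, alpha=0.75):
    """Normalized min-sum update for one check node.

    incoming: variable-to-check LLR messages (at least two values).
    alpha: normalization factor (hypothetical value, not from the paper).
    Returns one outgoing check-to-variable message per edge.
    """
    mags = [abs(v) for v in incoming]
    # Index and value of the smallest magnitude, plus the second smallest.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)

    # Product of the signs of all incoming messages (zero counts as positive).
    total_sign = 1
    for v in incoming:
        if v < 0:
            total_sign = -total_sign

    outgoing = []
    for i, v in enumerate(incoming):
        # Exclude the edge's own message: use min2 on the edge that holds min1.
        mag = min2 if i == min1_idx else min1
        # Dividing out the edge's own sign is the same as multiplying by it.
        sign = -total_sign if v < 0 else total_sign
        outgoing.append(alpha * sign * mag)
    return outgoing
```

    Only two magnitudes and one sign product are needed per check node, which is the usual reason min-sum variants are considered hardware-friendly.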

  • Cross-Domain Energy Consumption Prediction via ED-LSTM Networks

    Ye TAO  Fang KONG  Wenjun JU  Hui LI  Ruichun HOU  

     
    PAPER

      Publicized:
    2021/05/11
      Vol:
    E104-D No:8
      Page(s):
    1204-1213

    As an important type of science and technology service resource, energy consumption data play a vital role in the process of value chain integration between home appliance manufacturers and the state grid. Accurate electricity consumption prediction is essential for demand response programs in smart grid planning. The vast majority of existing prediction algorithms only exploit data belonging to a single domain, i.e., historical electricity load data. However, dependencies and correlations may exist among different domains, such as the regional weather condition and local residential/industrial energy consumption profiles. To take advantage of cross-domain resources, a hybrid energy consumption prediction framework is presented in this paper. This framework combines the long short-term memory model with an encoder-decoder unit (ED-LSTM) to perform sequence-to-sequence forecasting. Extensive experiments are conducted with several of the most commonly used algorithms over integrated cross-domain datasets. The results indicate that the proposed multistep forecasting framework outperforms most of the existing approaches.

  • Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM

    Shan HE  Yuanyao LU  Shengnan CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2021/04/19
      Vol:
    E104-D No:7
      Page(s):
    941-947

    The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods from the two fields. By building an end-to-end encoder-decoder model, description performance can be greatly improved. In this paper, a multi-branch deep convolutional neural network is used as the encoder to extract image features, and a recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on the Flickr8k, Flickr30k and MSCOCO datasets. According to the analysis of the experimental results on evaluation metrics, the model proposed in this paper can effectively perform image captioning, and its performance is better than that of classic image captioning models such as neural image annotation models.

  • Low Delay 4K 120fps HEVC Decoder with Parallel Processing Architecture

    Ken NAKAMURA  Daisuke KOBAYASHI  Yuya OMORI  Tatsuya OSAWA  Takayuki ONISHI  Koyo NITTA  Hiroe IWASAKI  

     
    PAPER

      Vol:
    E103-C No:3
      Page(s):
    77-84

    In this paper, we describe a novel low-delay 4K 120-fps real-time HEVC decoder with a parallel processing architecture that conforms to the HEVC main 4:2:2 10 profile. It supports the hierarchical temporal scalable streams required for Ultra High Definition high-frame-rate broadcasting and also supports low-delay and high-bitrate decoding for video transmission uses. To achieve this support, the decoding processes are parallelized and pipelined at the frame level, slice level, and coding tree unit row level. The proposed decoder was implemented on three FPGAs operated at 133 and 150 MHz, and it achieved 300-Mbps stream decoding and 37-msec end-to-end delay with our concurrently developed 4K 120-fps encoder.

  • An Adaptive Fusion Successive Cancellation List Decoder for Polar Codes with Cyclic Redundancy Check

    Yuhuan WANG  Hang YIN  Zhanxin YANG  Yansong LV  Lu SI  Xinle YU  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2019/07/08
      Vol:
    E103-B No:1
      Page(s):
    43-51

    In this paper, we propose an adaptive fusion successive cancellation list decoder (ADF-SCL) for polar codes with single cyclic redundancy check. The proposed ADF-SCL decoder reasonably avoids unnecessary calculations by selecting the successive cancellation (SC) decoder or the adaptive successive cancellation list (AD-SCL) decoder depending on a log-likelihood ratio (LLR) threshold in the decoding process. Simulation results show that compared to the AD-SCL decoder, the proposed decoder can achieve significant reduction of the average complexity in the low signal-to-noise ratio (SNR) region without degradation of the performance. When Lmax=32 and Eb/N0=0.5dB, the average complexity of the proposed decoder is 14.23% lower than that of the AD-SCL decoder.

  • STBC Based Decoders for Two-User Interference MIMO Channels

    Zhiqiang YI  Meilin HE  Peng PAN  Haiquan WANG  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Publicized:
    2019/03/14
      Vol:
    E102-B No:9
      Page(s):
    1875-1884

    This paper analyzes the performance of various decoders in a two-user interference channel, and some improved decoders based on enhanced utilization of channel state information at the receiver side are presented. Further, new decoders, namely hierarchical constellation based decoders, are proposed. Simulations show that the improved decoders and the proposed decoders have much better performance than existing decoders. Moreover, the proposed decoders have lower decoding complexity than the traditional maximum likelihood decoder.

  • Simplified Iterative Decoder for Polybinary-Shaped Optical Signals in Super-Nyquist Wavelength Division Multiplexed Systems

    Shuai YUAN  Koji IGARASHI  

     
    PAPER-Fiber-Optic Transmission for Communications

      Publicized:
    2018/10/11
      Vol:
    E102-B No:4
      Page(s):
    818-823

    In super-Nyquist wavelength division multiplexed systems, performance of forward error correction (FEC) can be improved by an iterative decoder between a maximum likelihood decoder for polybinary shaping and an FEC decoder. The typical iterative decoder includes not only the iteration between the first and second decoders but also the internal iteration within the FEC decoder. Such two-fold loop configuration would increase the computational complexity for decoding. In this paper, we propose the simplified iterative decoder, where the internal iteration in the FEC decoder is not performed, reducing the computational complexity. We numerically evaluate the bit-error rate performance of polybinary-shaped QPSK signals in the simplified iterative decoder. The numerical results show that the FEC performance can be improved in the simplified scheme, compared with the typical iterative decoder. In addition, the performance of the simplified iterative decoder has been investigated by the extrinsic information transfer (EXIT) chart.

  • A Unified Neural Network for Quality Estimation of Machine Translation

    Maoxi LI  Qingyu XIANG  Zhiming CHEN  Mingwen WANG  

     
    LETTER-Natural Language Processing

      Publicized:
    2018/06/18
      Vol:
    E101-D No:9
      Page(s):
    2417-2421

    The state-of-the-art neural quality estimation (QE) model for machine translation consists of two sub-networks that are tuned separately: a bidirectional recurrent neural network (RNN) encoder-decoder trained for neural machine translation, called the predictor, and an RNN trained for sentence-level QE tasks, called the estimator. We propose to combine the two sub-networks into a single neural network, called the unified neural network. During training, the bidirectional RNN encoder-decoder is initialized and pre-trained with the bilingual parallel corpus, and then the networks are trained jointly to minimize the mean absolute error over the QE training samples. Compared with the predictor and estimator approach, the use of a unified neural network helps to train neural network parameters that are more suitable for the QE task. Experimental results on the benchmark data set of the WMT17 sentence-level QE shared task show that the proposed unified neural network approach consistently outperforms the predictor and estimator approach and significantly outperforms the other baseline QE approaches.

  • An Analysis of Time Domain Reed Solomon Decoder with FPGA Implementation

    Kentaro KATO  Somsak CHOOMCHUAY  

     
    PAPER-Computer System

      Publicized:
    2017/08/23
      Vol:
    E100-D No:12
      Page(s):
    2953-2961

    This paper analyzes the time domain Reed Solomon Decoder with FPGA implementation. Data throughput and area are carefully evaluated in comparison with a typical frequency domain Reed Solomon Decoder. In this analysis, three hardware architectures to enhance the data throughput, namely, the pipelined architecture, the parallel architecture, and the truncated arrays, are also evaluated. The evaluation reveals that the number of consumed resources of RS(255, 239) is about 20% smaller than that of the frequency domain decoder, although the data throughput is less than 10% of that of the frequency domain decoder. The number of consumed resources of the pipelined architecture is 28% smaller than that of the parallel architecture when the data throughput is the same. This is because the pipelined architecture requires less extra logic than the parallel architecture. To get higher data throughput, the pipelined architecture is better than the parallel architecture from the viewpoint of consumed resources.

  • Design of a High-Throughput Sliding Block Viterbi Decoder for IEEE 802.11ac WLAN Systems

    Kai-Feng XIA  Bin WU  Tao XIONG  Cheng-Ying CHEN  

     
    PAPER-Digital Signal Processing

      Vol:
    E100-A No:8
      Page(s):
    1606-1614

    This paper presents a high-throughput sliding block Viterbi decoder for IEEE 802.11ac systems. A 64-state bidirectional sliding block Viterbi method is proposed to meet the speed requirement of the system. The decoder throughput goes up to 640Mbps, which can be further increased by adding block parallelism. Moreover, a modified add-compare-select (ACS) unit is designed to enhance the working frequency. The modified ACS unit obtains nearly 26% speed-up compared to the conventional ACS unit, while the area overhead and power dissipation are almost the same. The decoder is designed in a SMIC 0.13µm technology; it occupies a 1.96mm² core area and consumes 105mW, with an energy efficiency of 0.1641nJ/bit at a 1.2V supply voltage.
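
    For readers unfamiliar with the add-compare-select (ACS) operation referred to in this abstract, the sketch below shows one conventional ACS step of a Viterbi decoder in Python. It illustrates the textbook operation only, not the modified ACS unit proposed in the paper; the function name and the data layout (lists and a dictionary keyed by state pairs) are assumptions.

```python
def acs_step(path_metrics, branch_metrics, predecessors):
    """One conventional add-compare-select step over all trellis states.

    path_metrics[p]: accumulated metric of the survivor path ending in state p.
    branch_metrics[(p, s)]: cost of the transition from state p to state s.
    predecessors[s]: states that have a transition into state s.
    Returns (new_metrics, decisions); decisions[s] is the chosen predecessor.
    """
    new_metrics = []
    decisions = []
    for s, preds in enumerate(predecessors):
        # Add: extend every candidate survivor by its branch metric.
        candidates = [(path_metrics[p] + branch_metrics[(p, s)], p) for p in preds]
        # Compare and select: keep the candidate with the smallest metric.
        best_metric, best_pred = min(candidates)
        new_metrics.append(best_metric)
        decisions.append(best_pred)
    return new_metrics, decisions
```

    Because each new metric depends on the metrics of the previous step, this recursion tends to sit on the critical path of a hardware decoder, which is presumably why the ACS unit is a target for speed-up.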

  • A Novel Procedure for Implementing a Turbo Decoder on a GPU with Coalesced Memory Access

    Heungseop AHN  Seungwon CHOI  

     
    PAPER-Communication Theory and Signals

      Vol:
    E100-A No:5
      Page(s):
    1188-1196

    The sub-blocking algorithm has been known as a core component in implementing a turbo decoder using a Graphics Processing Unit (GPU) to use as many cores in the GPU as possible for parallel processing. However, even though the sub-blocking algorithm allows a large number of threads in a given GPU to be adopted for processing a large number of sub-blocks in parallel, each thread must access the global memory with strided addresses, which results in uncoalesced memory access. Because uncoalesced memory access causes a lot of unnecessary memory transactions, the memory bandwidth efficiency drops significantly, possibly as low as 1/8 in the case of a Long Term Evolution (LTE) turbo decoder, depending upon the compute capability of a GPU. In this paper, we present a novel method for converting uncoalesced memory access into coalesced access in a way that completely recovers the memory bandwidth efficiency to 100% without additional overhead. Our experimental tests, performed with NVIDIA's GeForce GTX 780 Ti GPU, show that the proposed method can enhance the throughput by nearly 30% compared with a conventional turbo decoder that suffers from uncoalesced memory access. Throughput provided by the proposed method has been observed to be 51.4Mbps when the number of iterations and that of sub-blocks are set to 6 and 32, respectively, in our experimental tests, which far exceeds the performance of previous works that implemented the Max-Log-MAP algorithm.
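
    The difference between strided and coalesced access described in this abstract can be pictured with plain index arithmetic. The Python sketch below models which addresses a group of threads touches at one step under two data layouts; it is a generic illustration of the access patterns, not the conversion procedure proposed in the paper, and all function names are hypothetical.

```python
def strided_addresses(num_threads, sub_block_len, step):
    """Sub-blocks stored back to back: thread t reads element `step`
    of its own sub-block, so neighbouring threads are sub_block_len apart."""
    return [t * sub_block_len + step for t in range(num_threads)]


def interleaved_addresses(num_threads, step):
    """Sub-blocks interleaved element by element: at a given step the
    threads read consecutive addresses, which a GPU can coalesce."""
    return [step * num_threads + t for t in range(num_threads)]


def to_interleaved(data, num_threads, sub_block_len):
    """Reorder back-to-back sub-blocks into the interleaved layout."""
    out = [None] * len(data)
    for t in range(num_threads):
        for k in range(sub_block_len):
            out[k * num_threads + t] = data[t * sub_block_len + k]
    return out
```

    With 4 threads and sub-blocks of length 8, the strided layout touches addresses 1, 9, 17, 25 at step 1, whereas the interleaved layout touches 4, 5, 6, 7 and yields the same values.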

  • Reduced Complexity K-Best Decoder via Adaptive Symbol Constellation for Uncoded MIMO Wireless Systems

    Juan Francisco CASTILLO-LEON  Marco CARDENAS-JUAREZ  Victor M. GARCIA-MOLLA  Enrique STEVENS-NAVARRO  Ulises PINEDA-RICO  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2016/08/22
      Vol:
    E100-B No:2
      Page(s):
    336-343

    In this paper, we present a low and variable computation complexity decoder based on K-Best for uncoded detection in spatially multiplexed MIMO systems. In the variable complexity K-Best (VKB), the detection of each symbol is carried out using only a symbol constellation of variable size. This symbol constellation is obtained by considering the channel properties and a given target SNR. Simulations show that the proposed technique almost matches the performance of the original K-Best decoder. Moreover, it is able to reduce the average computation complexity by at least 75% in terms of the number of visited nodes.

  • Performance Analysis Based on Density Evolution on Fault Erasure Belief Propagation Decoder

    Hiroki MORI  Tadashi WADAYAMA  

     
    PAPER-Coding Theory and Techniques

      Vol:
    E99-A No:12
      Page(s):
    2155-2161

    In this paper, we present an analysis of fault erasure BP decoders based on density evolution. In the fault BP decoder, the messages exchanged in a BP process are stochastically corrupted due to unreliable logic gates and flip-flops; i.e., we assume circuit components with transient faults. We derived a set of density evolution equations for the fault erasure BP processes. Our density evolution analysis reveals the asymptotic behaviors of the estimation error probability of the fault erasure BP decoders. In contrast to the fault-free cases, it is observed that the error probabilities of the fault erasure BP decoder converge to positive values, and that there exists a discontinuity in an error curve corresponding to the fault BP threshold. It is also shown that a message encoding technique provides higher fault BP thresholds than those of the original decoders at the cost of increased circuit size.
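
    As background for the density evolution analysis mentioned in this abstract, the sketch below iterates the standard fault-free density evolution recursion for a regular LDPC ensemble on the binary erasure channel, x ← ε(1 − (1 − x)^(dc−1))^(dv−1). It does not model the transient faults studied in the paper; the function name, the (3, 6) default degrees, and the iteration count are assumptions for illustration.

```python
def erasure_density_evolution(eps, dv=3, dc=6, iterations=2000):
    """Fault-free density evolution for a (dv, dc)-regular LDPC ensemble
    on a binary erasure channel with erasure probability eps.

    Returns the erasure probability of variable-to-check messages after
    the given number of iterations (hypothetical default of 2000).
    """
    x = eps  # initially every message is erased with probability eps
    for _ in range(iterations):
        # A check-to-variable message is erased unless all dc-1 other inputs are known.
        check_erased = 1.0 - (1.0 - x) ** (dc - 1)
        # A variable-to-check message is erased if the channel value and
        # all dv-1 other incoming check messages are erased.
        x = eps * check_erased ** (dv - 1)
    return x
```

    In this fault-free setting the erasure probability goes to zero below the BP threshold of the ensemble, which is the behavior the abstract contrasts with the positive limiting values observed for faulty decoders.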

  • Sparse-Graph Codes and Peeling Decoder for Compressed Sensing

    Weijun ZENG  Huali WANG  Xiaofu WU  Hui TIAN  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:9
      Page(s):
    1712-1716

    In this paper, we propose a compressed sensing scheme using sparse-graph codes and a peeling decoder (SGPD). By using a mix method for construction of sensing matrices proposed by Pawar and Ramchandran, it generates local sensing matrices and implements sensing and signal recovery in an adaptive manner. Then, we show how to optimize the construction of local sensing matrices using the theory of sparse-graph codes. Like the existing compressed sensing schemes based on sparse-graph codes with a “good” degree profile, SGPD requires only O(k) measurements to recover a k-sparse signal of dimension n in the noiseless setting. In the presence of noise, SGPD performs better than the existing compressed sensing schemes based on sparse-graph codes, still with a similar implementation cost. Furthermore, the average variable node degree for sensing matrices is empirically minimized for SGPD among various existing CS schemes, which can reduce the sensing computational complexity.

  • Self-Adaptive Scaled Min-Sum Algorithm for LDPC Decoders Based on Delta-Min

    Keol CHO  Ki-Seok CHUNG  

     
    LETTER-Coding Theory

      Vol:
    E99-A No:8
      Page(s):
    1632-1634

    A self-adaptive scaled min-sum algorithm for LDPC decoding based on the difference between the first two minima of the check node messages (Δmin) is proposed. Δmin is utilized for adjusting the scaling factor of the check node messages, and simulation results show that the proposed algorithm improves the error correcting performance compared to existing algorithms.

  • High Performance VLSI Architecture of H.265/HEVC Intra Prediction for 8K UHDTV Video Decoder

    Jianbin ZHOU  Dajiang ZHOU  Shihao WANG  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2519-2527

    8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding could significantly enhance video compression efficiency, at the expense of an increased computational complexity compared with H.264. For intra prediction of 8K UHDTV real-time H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture in this paper, including two techniques, in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT based Reference Sample Fetching Scheme (LUT-RSFS), reducing the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second one is the Hybrid Block Reordering and Data Forwarding (HBRDF), minimizing the idle time and eliminating the dependency between TUs by creating 3 Data Forwarding paths. It achieves the hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operation frequency of 260MHz, with a gate count of 217.8K for 8-bit design, and 251.1K for 10-bit design. The proposed VLSI architecture can support 4320p@120fps H.265 intra decoding (8-bit or 10-bit), with all 35 intra prediction modes and prediction unit sizes ranging from 4×4 to 64×64.

  • Implementation of Viterbi Decoder toward GPU-Based SDR Receiver

    Kosuke TOMITA  Masahide HATANAKA  Takao ONOYE  

     
    PAPER

      Vol:
    E98-A No:11
      Page(s):
    2246-2253

    Viterbi decoding is commonly used for several protocols, but its computational cost is quite high, and thus it is necessary to implement it efficiently. This paper describes a GPU implementation of a Viterbi decoder utilizing the three-point Viterbi decoding algorithm (TVDA), in which the received bits are divided into multiple chunks and several chunks are decoded simultaneously. Coalesced access and Warp Shuffle, a newly introduced instruction, are also utilized in order to improve decoder performance. In addition, iterative execution of parallel chunk decoding reduces the latency of the proposed Viterbi decoder so that the decoder can be used as a part of a GPU-based SDR transceiver. As a result, the throughput of the proposed Viterbi decoder is improved by 23.1%.

  • Unified Parameter Decoder Architecture for H.265/HEVC Motion Vector and Boundary Strength Decoding

    Shihao WANG  Dajiang ZHOU  Jianbin ZHOU  Takeshi YOSHIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1356-1365

    In this paper, a VLSI architecture design of a unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for an 8K UHDTV HEVC decoder is presented. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and memory bandwidth requirement, especially for 8K UHDTV applications. We propose four techniques for these challenges. Firstly, this work unifies the MV and BS parameter decoders for line buffer memory sharing. Secondly, to support high throughput, we propose a top-level CU-adaptive pipeline scheme by trading off between implementation complexity and performance. Thirdly, a PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, a PU-based coding scheme is proposed for 30% DRAM bandwidth reduction. In a 90nm process, our design costs 93.3k logic gates with a 23.0kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps applications at 249MHz in the worst case.

  • A Low Complexity Fixed Sphere Decoder with Statistical Threshold for MIMO Systems

    Jangyong PARK  Yunho JUNG  Jaeseok KIM  

     
    LETTER-Digital Signal Processing

      Vol:
    E98-A No:2
      Page(s):
    735-739

    In this letter, we propose a low complexity fixed sphere decoder (FSD) with a statistical threshold for multiple-input and multiple-output (MIMO) systems. The proposed algorithm is developed by applying two threshold-based pruning algorithms using an initial detection and a statistical noise constraint to the FSD. The proposed FSD algorithm is suitable for a fully pipelined hardware implementation and also has low complexity because the threshold of the proposed pruning algorithm is pre-calculated and independently applied to the path without a sorting operation. Simulation results show that the proposed FSD matches the performance of the original FSD while having lower complexity than the original FSD and other low complexity FSD algorithms.
