Keyword Search Result

[Keyword] turbo decoder(9hit)

1-9hit
  • A Novel Procedure for Implementing a Turbo Decoder on a GPU with Coalesced Memory Access

    Heungseop AHN  Seungwon CHOI  

     
    PAPER-Communication Theory and Signals

    Vol: E100-A  No: 5  Page(s): 1188-1196

    The sub-blocking algorithm is known as a core component in implementing a turbo decoder on a Graphics Processing Unit (GPU), because it lets the decoder use as many GPU cores as possible for parallel processing. However, even though the sub-blocking algorithm allows a large number of threads to process a large number of sub-blocks in parallel, each thread must access the global memory with strided addresses, which results in uncoalesced memory access. Because uncoalesced memory access causes many unnecessary memory transactions, the memory bandwidth efficiency drops significantly, possibly to as low as 1/8 in the case of a Long Term Evolution (LTE) turbo decoder, depending upon the compute capability of the GPU. In this paper, we present a novel method for converting uncoalesced memory access into coalesced access in a way that completely recovers the memory bandwidth efficiency to 100% without additional overhead. Our experimental tests, performed with NVIDIA's GeForce GTX 780 Ti GPU, show that the proposed method can enhance the throughput by nearly 30% compared with a conventional turbo decoder that suffers from uncoalesced memory access. The throughput provided by the proposed method was observed to be 51.4 Mbps when the number of iterations and the number of sub-blocks were set to 6 and 32, respectively, which far exceeds the performance of previous works that implemented the Max-Log-MAP algorithm.
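
    The 1/8 figure quoted above can be illustrated with a small counting model. The sketch below is not the authors' method; it only counts how many memory segments one 32-thread warp touches under two data layouts. The 32-byte segment size, 4-byte values, the sub-block length of 192 and all function names are assumptions chosen for illustration.

```python
WARP = 32            # threads issuing one load together
VALUE_BYTES = 4      # assumed size of one stored value
SEGMENT_BYTES = 32   # assumed size of one memory transaction
SUB_BLOCK_LEN = 192  # hypothetical sub-block length (e.g. 6144 / 32)


def bandwidth_efficiency(byte_addresses):
    """Useful bytes divided by bytes actually fetched for one warp-wide load."""
    segments = {addr // SEGMENT_BYTES for addr in byte_addresses}
    useful = len(byte_addresses) * VALUE_BYTES
    fetched = len(segments) * SEGMENT_BYTES
    return useful / fetched


def strided_addresses(step):
    """Sub-blocks stored back to back: thread t reads element `step` of sub-block t."""
    return [(t * SUB_BLOCK_LEN + step) * VALUE_BYTES for t in range(WARP)]


def coalesced_addresses(step):
    """Interleaved layout: element `step` of every sub-block is stored contiguously."""
    return [(step * WARP + t) * VALUE_BYTES for t in range(WARP)]


if __name__ == "__main__":
    print("strided  :", bandwidth_efficiency(strided_addresses(0)))
    print("coalesced:", bandwidth_efficiency(coalesced_addresses(0)))
```

    Under these assumptions each strided load pulls in a full segment per thread but uses only one value from it, whereas the interleaved layout uses every fetched byte.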

  • An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU

    Rongchun LI  Yong DOU  Jiaqing XU  Xin NIU  Shice NI  

     
    PAPER-Digital Signal Processing

    Vol: E97-A  No: 5  Page(s): 1027-1036

    In this paper, we propose a fully parallel Turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. The Soft Output Viterbi Algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed and the whole codeword is divided into multiple sub-codewords, where the turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is exploited to assure the bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance the throughput, such as memory optimization, a codeword packing scheme, and asynchronous data transfer. The experiment shows that our decoder has BER performance close to Max-Log-MAP and a peak throughput of 127.84 Mbps, which is about two orders of magnitude higher than that of a central processing unit (CPU) implementation and comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder achieves higher throughput than the fastest existing GPU-based implementation.

  • A 1.5 Gb/s Highly Parallel Turbo Decoder for 3GPP LTE/LTE-Advanced

    Yun CHEN  Xubin CHEN  Zhiyuan GUO  Xiaoyang ZENG  Defeng HUANG  

     
    LETTER-Fundamental Theories for Communications

    Vol: E96-B  No: 5  Page(s): 1211-1214

    A highly parallel turbo decoder for 3GPP LTE/LTE-Advanced systems is presented. It consists of 32 radix-4 soft-in/soft-out (SISO) decoders. Each SISO decoder is based on the proposed full-parallel sliding window (SW) schedule. Implemented in a 0.13 µm CMOS technology, the proposed design occupies 12.96 mm² and achieves 1.5 Gb/s while decoding size-6144 blocks with 5.5 iterations. Compared with the conventional SW schedule, the throughput is improved by 30–76% with 19.2% area overhead and negligible energy overhead.

  • High Throughput Turbo Decoding Scheme

    Jaesung CHOI  Joonyoung SHIN  Jeong Woo LEE  

     
    LETTER-Fundamental Theories for Communications

    Vol: E95-B  No: 6  Page(s): 2109-2112

    A new high-throughput turbo decoding scheme adopting double flow, sliding window and shuffled decoding is proposed. Analytical and numerical results show that the proposed scheme requires a low number of clock cycles and a small memory size to achieve a BER performance equivalent to that of existing schemes.

  • High-Speed and Low-Complexity Decoding Architecture for Double Binary Turbo Code

    Kon-Woo KWON  Kwang-Hyun BAEK  Jeong Woo LEE  

     
    LETTER-VLSI Design Technology and CAD

    Vol: E94-A  No: 11  Page(s): 2458-2461

    We propose a high-speed and low-complexity architecture for the very large-scale integration (VLSI) implementation of the maximum a posteriori probability (MAP) algorithm suited to the double binary turbo decoder. For this purpose, equation manipulations on the conventional Linear-Log-MAP algorithm and architectural optimization are proposed. It is shown by synthesized simulations that the proposed architecture improves speed, area and power compared with the state-of-the-art Linear-Log-MAP architecture. It is also observed that the proposed architecture shows good overall performance in terms of error correction capability as well as decoder hardware's speed, complexity and throughput.

  • Development of 100 MHz Bandwidth Testbed toward IMT-Advanced and Experimental Results Including Rotational OFDM and Twin Turbo Decoder Transmission Performances

    Noriaki MIYAZAKI  Yasuyuki HATAKAWA  Toshinori SUZUKI  

     
    PAPER-Broadband Wireless Access System

    Vol: E92-A  No: 9  Page(s): 2209-2217

    Aiming at actual evaluation of IMT-Advanced system performance using field tests, this paper develops an IMT-Advanced testbed system with a transmission bandwidth of 100 MHz. Taking into account recent advances in research and development of IMT-Advanced systems, orthogonal frequency division multiplexing (OFDM) with multiple-input multiple-output (MIMO) is also a promising technology for IMT-Advanced. In addition, in order to meet the requirements for IMT-Advanced, the system is expected to have a bandwidth of about 100 MHz with the aid of MIMO transmission. The developed system is based on these predictions, which are more reliable than those of previous studies, and the goals of this development are to provide a more realistic transmission performance, to provide judgment criteria for operators introducing new air interfaces, and to explore new applications. This paper also presents the experimental results of rotational OFDM (R-OFDM) and a twin turbo (T2) decoder implemented in the testbed and demonstrates that our proposals are better than the conventional schemes in actual radio transmission. Both physical layer technologies were proposed by the authors; however, the previous works were based only on computer simulation. In this paper, the proposals are experimentally evaluated by distorting the transmitted signal on radio waves with a fading simulator and an additional noise generator. When the packet error rate performance is measured, the measurement results are verified to be in good agreement with the simulation results. The experimental results also demonstrate that R-OFDM can reduce the required carrier-to-interference power ratio (CIR) of OFDM by about 1.1 dB in a single-input single-output (SISO) multi-path fading channel. In addition, it becomes clear that the T2 decoder is better than the turbo decoder in error correction, and the required CIR reduction reaches about 0.8 dB in a SISO AWGN channel. The throughput performances are also measured with different modulation and coding conditions, and the measured forward throughput in the SISO AWGN channel reaches up to 373.6 Mbps. In addition, by use of 2×2 MIMO transmission, the measurement results substantiate that a throughput of 512.7 Mbps can be realized even in the multi-path fading condition.

  • Design of Low Power QPP Interleave Address Generator Using the Periodicity of QPP

    Won-Ho LEE  Chong Suck RIM  

     
    LETTER-VLSI Design Technology and CAD

    Vol: E92-A  No: 6  Page(s): 1538-1540

    This paper presents two power-saving designs for a Quadratic Polynomial Permutation (QPP) interleave address generator, one for a fixed interleave length K and one for an unfixed K. These designs are based on our observation that the quadratic term f₂x² % K of f(x) = (f₁x + f₂x²) % K, which is the QPP address generating function, has a short period and is symmetric within the period. Compared with existing designs, power consumption is reduced on average by 27.4% in the fixed-K design and by 5.4% in the unfixed-K design over various values of K.
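
    The periodicity and symmetry of the quadratic term can be checked numerically. The sketch below is not the authors' hardware design; it finds the period by brute force and then generates addresses from a small lookup table of the quadratic term plus an accumulated linear term. The function names are hypothetical, and the (K, f₁, f₂) pairs (40, 3, 10) and (6144, 263, 480) are assumed here to match the LTE QPP parameter table.

```python
def quadratic_term(x, f2, K):
    """Quadratic part of the QPP address function: f2 * x^2 mod K."""
    return (f2 * x * x) % K


def smallest_period(f2, K):
    """Smallest p > 0 with quadratic_term(x + p) == quadratic_term(x) for all x."""
    for p in range(1, K + 1):
        if all(quadratic_term(x + p, f2, K) == quadratic_term(x, f2, K)
               for x in range(K)):
            return p
    return K  # not reached: p == K always satisfies the condition


def is_symmetric(f2, K, p):
    """Check quadratic_term(x) == quadratic_term(p - x) inside one period."""
    return all(quadratic_term(x, f2, K) == quadratic_term(p - x, f2, K)
               for x in range(p + 1))


def qpp_addresses(f1, f2, K):
    """Generate f(x) = (f1*x + f2*x^2) mod K using a period-length table."""
    p = smallest_period(f2, K)
    table = [quadratic_term(x, f2, K) for x in range(p)]
    addresses = []
    linear = 0  # running value of f1 * x mod K
    for x in range(K):
        addresses.append((linear + table[x % p]) % K)
        linear = (linear + f1) % K
    return addresses


if __name__ == "__main__":
    for K, f1, f2 in [(40, 3, 10), (6144, 263, 480)]:
        p = smallest_period(f2, K)
        print(K, "period:", p, "symmetric:", is_symmetric(f2, K, p))
```

    Because the table holds only one period of the quadratic term, it is much shorter than K, which illustrates the kind of saving that a short period makes possible.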

  • A Low-Complexity Stopping Criterion for Iterative Turbo Decoding

    Dong-Soo LEE  In-Cheol PARK  

     
    LETTER-Wireless Communication Technologies

    Vol: E88-B  No: 1  Page(s): 399-401

    This letter proposes an efficient and simple stopping criterion for turbo decoding, which is derived by observing the behavior of log-likelihood ratio (LLR) values. Based on the behavior, the proposed criterion counts the number of absolute LLR values less than a threshold and the number of hard decision 1's in order to complete the iterative decoding procedure. Simulation results show that the proposed approach achieves a reduced number of iterations while maintaining similar BER/FER performance to the previous criteria.
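
    The two quantities named above are simple to compute per iteration. The abstract does not state the exact decision rule, so the sketch below uses a hypothetical one: stop once no LLR magnitude is below the threshold and the count of hard-decision 1's has not changed since the previous iteration. The sign convention (positive LLR decoded as 1), the threshold value and all names are assumptions.

```python
def llr_statistics(llrs, threshold):
    """Return (number of |LLR| values below threshold, number of hard-decision 1's)."""
    unreliable = sum(1 for v in llrs if abs(v) < threshold)
    ones = sum(1 for v in llrs if v > 0)  # assumed convention: LLR > 0 decodes to 1
    return unreliable, ones


def should_stop(prev_stats, curr_stats, max_unreliable=0):
    """Hypothetical rule, not taken from the letter.

    Stop when at most `max_unreliable` LLRs are below the threshold and the
    count of hard-decision 1's matches the previous iteration.
    """
    if prev_stats is None:
        return False  # first iteration: nothing to compare against
    unreliable, ones = curr_stats
    return unreliable <= max_unreliable and ones == prev_stats[1]


if __name__ == "__main__":
    iterations = [
        [0.4, -3.0, 2.5, -0.2],   # early iteration: two weak LLRs
        [5.1, -6.0, 4.2, -3.3],   # later iteration: all LLRs confident
    ]
    prev = None
    for i, llrs in enumerate(iterations, start=1):
        curr = llr_statistics(llrs, threshold=1.0)
        print("iteration", i, "stats", curr, "stop:", should_stop(prev, curr))
        prev = curr
```

    Both statistics need only comparators and counters, which is consistent with the low complexity the letter claims.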

  • Implementation of a Two-Step SOVA Decoder with a Fixed Scaling Factor

    Taek-Won KWON  Jun-Rim CHOI  

     
    PAPER-Wireless Communication Technology

    Vol: E86-B  No: 6  Page(s): 1893-1900

    Two implementation schemes for a two-step SOVA (Soft Output Viterbi Algorithm) decoder are proposed and verified in a chip. One uses the combination of trace-back (TB) logic to find the survivor state and double trace-back logic to find the weighting factor of a two-step SOVA. The other divides the reliability values by a scaling factor in order to compensate for the distortion brought by overestimating those values in SOVA. We introduced a fixed scaling factor of 0.25 or 0.33 for rate 1/3 and designed an 8-state Turbo decoder with a 256-bit frame size to lower the reliability values. The implemented architecture of the two-step SOVA decoder allows important savings in area and high-speed processing compared with the one-step SOVA decoder using the register exchange (RE) or trace-back (TB) method. The chip is fabricated using a 0.65 µm gate array at Samsung Electronics, and its SNR performance at a BER of 10⁻⁴ is 2 dB better than that of a two-step SOVA decoder without a scaling factor.
