IEICE globals.ieice.org Site

Keyword Search Result

[Keyword] turbo decoder (9 hits)

1-9 hits

A Novel Procedure for Implementing a Turbo Decoder on a GPU with Coalesced Memory Access
Heungseop AHN Seungwon CHOI

PAPER-Communication Theory and Signals

Vol: E100-A, No: 5, Page(s): 1188-1196
The sub-blocking algorithm is known as a core component in implementing a turbo decoder on a Graphics Processing Unit (GPU), because it lets the decoder use as many GPU cores as possible for parallel processing. However, even though the sub-blocking algorithm allows a large number of threads in a given GPU to process a large number of sub-blocks in parallel, each thread must access global memory with strided addresses, which results in uncoalesced memory access. Because uncoalesced memory access causes many unnecessary memory transactions, memory bandwidth efficiency drops significantly, possibly to as low as 1/8 for a Long Term Evolution (LTE) turbo decoder, depending on the compute capability of the GPU. In this paper, we present a novel method for converting uncoalesced memory access into coalesced access that fully restores memory bandwidth efficiency to 100% without additional overhead. Our experimental tests, performed with NVIDIA's GeForce GTX 780 Ti GPU, show that the proposed method improves throughput by nearly 30% compared with a conventional turbo decoder that suffers from uncoalesced memory access. With the number of iterations and the number of sub-blocks set to 6 and 32, respectively, the proposed method achieves a throughput of 51.4 Mbps in our experimental tests, which far exceeds the performance of previous implementations of the Max-Log-MAP algorithm.
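To see where a figure like 1/8 can come from, the following Python sketch counts the 32-byte memory segments touched by one 32-thread warp. It assumes one 4-byte LLR per thread, a 32-byte transaction granularity, and a 6144-bit block split into 32 sub-blocks of 192 steps; these parameters and the function names are illustrative assumptions. The step-major (interleaved) layout shown is a generic remedy for strided access, not necessarily the method proposed in the paper.

```python
# Hypothetical model of warp memory efficiency under two data layouts.
SEGMENT_BYTES = 32   # assumed memory transaction granularity in bytes
ELEM_BYTES = 4       # assumed size of one LLR value (32-bit float)
WARP_SIZE = 32       # threads per warp


def sub_block_major(thread: int, step: int, sub_block_len: int) -> int:
    """Byte address when thread t owns elements [t*L, (t+1)*L): stride of L elements."""
    return (thread * sub_block_len + step) * ELEM_BYTES


def step_major(thread: int, step: int, num_threads: int) -> int:
    """Byte address when element k of every sub-block is stored contiguously."""
    return (step * num_threads + thread) * ELEM_BYTES


def bandwidth_efficiency(addresses: list[int]) -> float:
    """Useful bytes divided by bytes moved in whole segments."""
    segments = {addr // SEGMENT_BYTES for addr in addresses}
    useful = len(addresses) * ELEM_BYTES
    moved = len(segments) * SEGMENT_BYTES
    return useful / moved


K, P = 6144, 32
L = K // P  # 192 trellis steps per sub-block
step = 0
strided = [sub_block_major(t, step, L) for t in range(WARP_SIZE)]
coalesced = [step_major(t, step, P) for t in range(WARP_SIZE)]
print(bandwidth_efficiency(strided))    # 0.125: one 32-byte segment per 4-byte read
print(bandwidth_efficiency(coalesced))  # 1.0: 128 contiguous bytes in 4 segments
```

Because the number of sub-blocks equals the warp size in this sketch, each step-major row is exactly 128 bytes and stays segment-aligned at every step.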
An Efficient Parallel SOVA-Based Turbo Decoder for Software Defined Radio on GPU
Rongchun LI Yong DOU Jiaqing XU Xin NIU Shice NI

PAPER-Digital Signal Processing

Vol: E97-A, No: 5, Page(s): 1027-1036
In this paper, we propose a fully parallel turbo decoder for Software-Defined Radio (SDR) on the Graphics Processing Unit (GPU) platform. The Soft Output Viterbi Algorithm (SOVA) is chosen for its low complexity and high throughput. The parallelism of SOVA is fully analyzed, and the whole codeword is divided into multiple sub-codewords, whose turbo-pass decoding procedures are performed in parallel by independent sub-decoders. In each sub-decoder, an efficient initialization method is used to ensure good bit error ratio (BER) performance. The sub-decoders are mapped to numerous blocks on the GPU. Several optimization methods are employed to enhance throughput, such as memory optimization, a codeword packing scheme, and asynchronous data transfer. Experiments show that our decoder has BER performance close to that of Max-Log-MAP and a peak throughput of 127.84 Mbps, about two orders of magnitude faster than a central processing unit (CPU) implementation and comparable to application-specific integrated circuit (ASIC) solutions. The presented decoder achieves higher throughput than the fastest existing GPU-based implementation.
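Dividing one codeword into independently decoded sub-codewords can be sketched as an index partition. This Python sketch assumes each sub-decoder also processes a fixed number of overlapping "guard" trellis steps on either side of its segment, so its recursions start from better-than-uniform state metrics; the guard length, the 48-way split, and the names are hypothetical, and this guard-window scheme is a common alternative rather than the initialization method described in the paper.

```python
from typing import NamedTuple


class SubCodeword(NamedTuple):
    start: int       # first trellis step this sub-decoder outputs
    end: int         # one past the last output step
    load_start: int  # first step actually processed (including leading guard)
    load_end: int    # one past the last processed step (including trailing guard)


def partition(codeword_len: int, num_sub: int, guard: int) -> list[SubCodeword]:
    """Split [0, codeword_len) into num_sub nearly equal segments with guard overlap."""
    base, extra = divmod(codeword_len, num_sub)
    parts, start = [], 0
    for i in range(num_sub):
        # The first `extra` segments take one extra step so lengths differ by at most 1.
        end = start + base + (1 if i < extra else 0)
        parts.append(SubCodeword(start, end,
                                 max(0, start - guard),
                                 min(codeword_len, end + guard)))
        start = end
    return parts


parts = partition(codeword_len=6144, num_sub=48, guard=16)
print(len(parts), parts[0], parts[1])
```

In this sketch each tuple would map to one GPU thread block; only the steps in [start, end) are written back, so the guard steps trade a little redundant computation for BER closer to that of full-block decoding.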
A 1.5 Gb/s Highly Parallel Turbo Decoder for 3GPP LTE/LTE-Advanced
Yun CHEN Xubin CHEN Zhiyuan GUO Xiaoyang ZENG Defeng HUANG

LETTER-Fundamental Theories for Communications

Vol: E96-B, No: 5, Page(s): 1211-1214
A highly parallel turbo decoder for 3GPP LTE/LTE-Advanced systems is presented. It consists of 32 radix-4 soft-in/soft-out (SISO) decoders, each based on the proposed full-parallel sliding window (SW) schedule. Implemented in 0.13 µm CMOS technology, the proposed design occupies 12.96 mm² and achieves 1.5 Gb/s when decoding size-6144 blocks with 5.5 iterations. Compared with the conventional SW schedule, throughput improves by 30–76% with a 19.2% area overhead and negligible energy overhead.
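A first-order model helps relate the numbers quoted above. Assuming each of the P SISO decoders retires r bits per clock cycle (r = 2 for radix-4) in every half-iteration, that one iteration consists of two half-iterations, and ignoring window warm-up and pipeline overhead, throughput is roughly T ≈ f_clk · P · r / (2 · I). The abstract does not give the clock frequency, so this Python sketch solves for the frequency the idealized model would need; the result is a model estimate, not a reported figure, and the function names are hypothetical.

```python
def ideal_throughput_bps(f_clk_hz: float, num_siso: int,
                         bits_per_cycle: int, iterations: float) -> float:
    """Idealized turbo decoder throughput, ignoring all scheduling overhead.

    Each iteration is two half-iterations (one per constituent decoder), and in
    each half-iteration num_siso decoders each process bits_per_cycle trellis
    steps per clock cycle.
    """
    return f_clk_hz * num_siso * bits_per_cycle / (2 * iterations)


def required_clock_hz(target_bps: float, num_siso: int,
                      bits_per_cycle: int, iterations: float) -> float:
    """Invert the idealized model to find the clock needed for a target rate."""
    return target_bps * 2 * iterations / (num_siso * bits_per_cycle)


# 32 radix-4 SISO decoders (2 bits/cycle each), 5.5 iterations, 1.5 Gb/s target
f = required_clock_hz(1.5e9, num_siso=32, bits_per_cycle=2, iterations=5.5)
print(f"{f / 1e6:.1f} MHz")  # 257.8 MHz under the idealized assumptions
```

Real schedules add warm-up and pipeline latency, so under this model the figure is a lower bound on the clock rate an actual design would need.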
High Throughput Turbo Decoding Scheme
Jaesung CHOI Joonyoung SHIN Jeong Woo LEE

LETTER-Fundamental Theories for Communications

Vol: E95-B, No: 6, Page(s): 2109-2112
A new high-throughput turbo decoding scheme that combines double-flow, sliding-window, and shuffled decoding is proposed. Analytical and numerical results show that the proposed scheme requires a small number of clock cycles and a small memory to achieve BER performance equivalent to that of existing schemes.
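The sliding-window part of such a scheme can be illustrated by the order in which a single SISO decoder visits a block. In this Python sketch, which assumes a window length W and a training (warm-up) length equal to W, the backward recursion for each window starts up to W steps past the window and runs back through that training segment first, so only O(W) metrics need to be buffered instead of O(K). The double-flow and shuffled-decoding parts of the proposed scheme are not modeled, and the function name is hypothetical.

```python
from typing import Iterator


def sliding_window_schedule(block_len: int, window: int) -> Iterator[tuple[int, int, int]]:
    """Yield (window_start, window_end, train_end) for each backward recursion.

    The backward recursion for window [start, end) is initialized at train_end
    (at most `window` steps past end) with uniform state metrics, runs back over
    the training steps [end, train_end), and only then yields usable beta
    values for [start, end).
    """
    for start in range(0, block_len, window):
        end = min(start + window, block_len)
        train_end = min(end + window, block_len)
        yield start, end, train_end


schedule = list(sliding_window_schedule(block_len=100, window=32))
for start, end, train_end in schedule:
    print(start, end, train_end)
```

When train_end reaches the end of the block, a terminated trellis provides a known final state, so that window needs no training in practice.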
High-Speed and Low-Complexity Decoding Architecture for Double Binary Turbo Code
Kon-Woo KWON Kwang-Hyun BAEK Jeong Woo LEE

LETTER-VLSI Design Technology and CAD

Vol: E94-A, No: 11, Page(s): 2458-2461
We propose a high-speed, low-complexity architecture for the very large-scale integration (VLSI) implementation of the maximum a posteriori probability (MAP) algorithm for double-binary turbo decoders. To this end, we propose algebraic manipulations of the conventional Linear-Log-MAP algorithm together with architectural optimizations. Simulations of the synthesized designs show that the proposed architecture improves speed, area, and power compared with the state-of-the-art Linear-Log-MAP architecture. The proposed architecture also shows good overall performance in terms of error-correction capability as well as decoder hardware speed, complexity, and throughput.
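Log-MAP decoding is built on the Jacobian logarithm max*(a, b) = max(a, b) + ln(1 + e^(-|a - b|)); Max-Log-MAP drops the correction term, and Linear-Log-MAP replaces it with a clipped straight line. The Python sketch below uses a slope and threshold commonly quoted for the linear approximation (about 0.24904 and 2.50681); treat those constants as assumptions, not as the values used in this letter. Folding max* over more than two arguments is how it is applied to the larger sets of branch terms that arise in double-binary decoding.

```python
import math
from functools import reduce

# Assumed constants for the linear correction term (commonly quoted values).
SLOPE = 0.24904
THRESHOLD = 2.50681


def correction_exact(diff: float) -> float:
    """Exact Jacobian correction ln(1 + exp(-|diff|))."""
    return math.log1p(math.exp(-abs(diff)))


def correction_linear(diff: float) -> float:
    """Linear-Log-MAP correction: a straight line clipped to zero at THRESHOLD."""
    return max(0.0, SLOPE * (THRESHOLD - abs(diff)))


def max_star(a: float, b: float, correction=correction_linear) -> float:
    return max(a, b) + correction(a - b)


def max_star_n(values, correction=correction_linear) -> float:
    """Fold max* over several metrics (e.g. the terms of a double-binary symbol LLR)."""
    return reduce(lambda x, y: max_star(x, y, correction), values)


metrics = [1.2, -0.4, 0.9, 2.3]
exact = math.log(sum(math.exp(m) for m in metrics))
print(max_star_n(metrics, correction_exact), exact)  # Log-MAP: matches log-sum-exp
print(max_star_n(metrics))  # Linear-Log-MAP approximation
print(max(metrics))         # Max-Log-MAP approximation
```

The linear correction needs only a subtraction, a constant multiply, and a clamp, which is why it is attractive in VLSI compared with a lookup table or the exact logarithm.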
Development of 100 MHz Bandwidth Testbed toward IMT-Advanced and Experimental Results Including Rotational OFDM and Twin Turbo Decoder Transmission Performances
Noriaki MIYAZAKI Yasuyuki HATAKAWA Toshinori SUZUKI

PAPER-Broadband Wireless Access System

Vol: E92-A, No: 9, Page(s): 2209-2217
Aiming at an actual evaluation of IMT-Advanced system performance through field tests, this paper develops an IMT-Advanced testbed system with a transmission bandwidth of 100 MHz. Given recent research and development of IMT-Advanced systems, orthogonal frequency division multiplexing (OFDM) combined with multiple-input multiple-output (MIMO) transmission is a promising technology for IMT-Advanced. In addition, to meet the IMT-Advanced requirements, the system is expected to need a bandwidth of about 100 MHz together with MIMO transmission. The developed system is based on these predictions, which are more reliable than those of previous studies, and the goals of this development are to provide more realistic transmission performance, to provide judgment criteria for operators introducing new air interfaces, and to explore new applications. This paper also presents experimental results for rotational OFDM (R-OFDM) and the twin turbo (T2) decoder implemented in the testbed, and demonstrates that these proposals outperform conventional schemes in actual radio transmission. Both physical-layer technologies were proposed by the authors; however, previous work evaluated them only by computer simulation. In this paper, the proposals are evaluated experimentally by distorting the transmitted radio signal with a fading simulator and a noise generator. The measured packet error rate performance agrees well with the simulation results. The experimental results also show that R-OFDM reduces the required carrier-to-interference power ratio (CIR) of OFDM by about 1.1 dB in a single-input single-output (SISO) multipath fading channel. In addition, the T2 decoder outperforms the conventional turbo decoder in error correction, reducing the required CIR by about 0.8 dB in a SISO AWGN channel.
Throughput is also measured under different modulation and coding conditions, and the measured forward throughput in the SISO AWGN channel reaches up to 373.6 Mbps. In addition, with 2×2 MIMO transmission, the measurements show that a throughput of 512.7 Mbps can be achieved even under multipath fading.
Design of Low Power QPP Interleave Address Generator Using the Periodicity of QPP
Won-Ho LEE Chong Suck RIM

LETTER-VLSI Design Technology and CAD

Vol: E92-A, No: 6, Page(s): 1538-1540
This paper presents two power-saving designs for a Quadratic Permutation Polynomial (QPP) interleave address generator, one for a fixed interleave length K and one for a variable K. These designs are based on our observation that the quadratic term $f_2 x^2 \bmod K$ of the QPP address-generating function $f(x) = (f_1 x + f_2 x^2) \bmod K$ has a short period and is symmetric within that period. Compared with existing designs, power consumption is reduced on average, over various values of K, by 27.4% in the fixed-K design and by 5.4% in the variable-K design.
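The QPP function and the periodicity of its quadratic term can be checked directly. The Python sketch below uses K = 40, f1 = 3, f2 = 10, the first entry of the LTE interleaver parameter table in 3GPP TS 36.212 (treat the values as illustrative). It also shows the common incremental form f(x + 1) = f(x) + g(x) mod K with g(x) = f1 + f2 + 2·f2·x, which needs only additions. The function names are hypothetical, and the letter's low-power designs are not modeled.

```python
def qpp(x: int, k: int, f1: int, f2: int) -> int:
    """Direct QPP address: (f1*x + f2*x^2) mod K."""
    return (f1 * x + f2 * x * x) % k


def qpp_incremental(k: int, f1: int, f2: int) -> list[int]:
    """Generate all K addresses with additions and modulo only.

    Uses f(x+1) = f(x) + g(x) and g(x+1) = g(x) + 2*f2 (all mod K),
    where g(x) = f1 + f2 + 2*f2*x.
    """
    addrs, f, g = [], 0, (f1 + f2) % k
    for _ in range(k):
        addrs.append(f)
        f = (f + g) % k
        g = (g + 2 * f2) % k
    return addrs


def quadratic_term_period(k: int, f2: int) -> int:
    """Smallest p > 0 with f2*(x+p)^2 == f2*x^2 (mod K) for every x."""
    for p in range(1, k + 1):
        if all((f2 * (x + p) ** 2 - f2 * x * x) % k == 0 for x in range(k)):
            return p
    return k  # unreachable: p = K always satisfies the condition


K, F1, F2 = 40, 3, 10
addrs = qpp_incremental(K, F1, F2)
print(addrs[:8])
print(sorted(addrs) == list(range(K)))  # a valid QPP is a permutation of 0..K-1
print(quadratic_term_period(K, F2))
```

For these parameters the quadratic term repeats every 2 indices, which illustrates the kind of short period the letter exploits.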
A Low-Complexity Stopping Criterion for Iterative Turbo Decoding
Dong-Soo LEE In-Cheol PARK

LETTER-Wireless Communication Technologies

Vol: E88-B, No: 1, Page(s): 399-401
This letter proposes a simple and efficient stopping criterion for turbo decoding, derived from observing the behavior of log-likelihood ratio (LLR) values. Based on this behavior, the proposed criterion counts the number of LLR values whose magnitudes are below a threshold and the number of hard-decision 1's to decide when to terminate the iterative decoding procedure. Simulation results show that the proposed approach reduces the number of iterations while maintaining BER/FER performance similar to that of previous criteria.
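The two counts the criterion relies on are cheap to compute after each iteration. The abstract does not give the exact decision rule, so the Python sketch below uses a hypothetical one: stop when no LLR magnitude falls below the threshold, or when both counts are unchanged from the previous iteration. The threshold value, the sign convention, the LLR values, and the function names are assumptions for illustration only.

```python
from typing import Optional

Counts = tuple[int, int]


def criterion_counts(llrs: list[float], threshold: float) -> Counts:
    """Return (# of |LLR| below threshold, # of hard-decision 1's).

    Assumed convention: a negative LLR is a hard decision of 1.
    """
    low = sum(1 for v in llrs if abs(v) < threshold)
    ones = sum(1 for v in llrs if v < 0)
    return low, ones


def should_stop(llrs: list[float], threshold: float,
                prev_counts: Optional[Counts]) -> tuple[bool, Counts]:
    """Hypothetical rule combining the two counts; returns (stop, counts)."""
    counts = criterion_counts(llrs, threshold)
    stop = counts[0] == 0 or counts == prev_counts
    return stop, counts


iterations = [
    [3.1, -0.4, 2.2, -5.0, 0.7],  # iteration 1: two unreliable bits
    [4.0, -1.8, 3.5, -6.2, 2.9],  # iteration 2: one still below threshold
    [5.2, -3.0, 4.1, -7.5, 3.3],  # iteration 3: all above threshold
]
prev, stopped_at = None, None
for i, llrs in enumerate(iterations, start=1):
    stop, prev = should_stop(llrs, threshold=2.0, prev_counts=prev)
    if stop:
        stopped_at = i
        break
print(stopped_at)
```

Both counts need only a comparator and a counter per bit, which keeps the hardware cost of this kind of rule low compared with criteria that store and compare whole decoded frames.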
Implementation of a Two-Step SOVA Decoder with a Fixed Scaling Factor
Taek-Won KWON Jun-Rim CHOI

PAPER-Wireless Communication Technology

Vol: E86-B, No: 6, Page(s): 1893-1900
Two implementation schemes for a two-step SOVA (Soft Output Viterbi Algorithm) decoder are proposed and verified in a chip. The first combines trace-back (TB) logic, which finds the survivor state, with double trace-back logic, which finds the weighting factor of the two-step SOVA. The second scales down the reliability values by a fixed factor to compensate for the distortion caused by SOVA's overestimation of those values. We introduced a fixed scaling factor of 0.25 or 0.33 for a rate-1/3 code to lower the reliability values, and designed an 8-state turbo decoder with a 256-bit frame size. The implemented two-step SOVA architecture offers significant area savings and high-speed processing compared with a one-step SOVA decoder using the register exchange (RE) or trace-back (TB) method. The chip is fabricated in a 0.65 µm gate array at Samsung Electronics, and it achieves a 2 dB SNR improvement at a BER of $10^{-4}$ over a two-step SOVA decoder without a scaling factor.
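In fixed-point hardware, a factor of 0.25 reduces to a 2-bit right shift, and 0.33 can be approximated with a few shifts and adds. The Python sketch below applies both to integer reliability values; the shift-add approximation 1/4 + 1/16 + 1/64 = 21/64 ≈ 0.328, the truncation toward zero, and the function names are illustrative assumptions, not the chip's documented datapath.

```python
def scale_quarter(r: int) -> int:
    """Multiply by 0.25 via a 2-bit shift of the magnitude (truncates toward zero)."""
    mag = abs(r) >> 2
    return -mag if r < 0 else mag


def scale_third_approx(r: int) -> int:
    """Approximate r * 0.33 as r/4 + r/16 + r/64 (= 21/64 = 0.328) with shifts only.

    The sign is handled once so positive and negative reliabilities are
    scaled symmetrically.
    """
    mag = abs(r)
    approx = (mag >> 2) + (mag >> 4) + (mag >> 6)
    return -approx if r < 0 else approx


values = [128, -128, 100, -37]
print([scale_quarter(v) for v in values])
print([scale_third_approx(v) for v in values])
```

Operating on the magnitude avoids the bias that a plain arithmetic right shift would introduce, since shifting a negative two's-complement value rounds toward negative infinity.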