This paper proposes an area-efficient decoder architecture for the quasi-cyclic low-density parity-check (QC-LDPC) codes specified in the IEEE 802.16e WiMAX standard. The decoder supports all the code rates and codeword lengths defined in the standard. To keep the area low and maximize hardware utilization, the decoder uses 4 decoding function units, a number equal to the greatest common divisor of the expansion factors. In addition, the decoder adopts a novel scheduling scheme named stride scheduling, which stores the extrinsic messages in non-sequential order so that the conventional complex flexible permutation network can be replaced with simple, small cyclic shifters and the number of memory accesses is minimized. To further reduce complexity, the extrinsic memory instances for the 24 block columns are reduced to 5 banks by identifying independent sets. All the memory instances used in the decoder are single-port memories, which occupy less area and cost less than dual-port ones. Finally, the decoding function units have a partially parallel structure so that the decoding throughput comfortably exceeds the requirement of the WiMAX standard. The proposed decoder is synthesized with 49 K equivalent gates and 54,144 bits of memory, and the implementation occupies 0.40 mm² in a 65 nm CMOS technology.
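The choice of 4 decoding function units can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the IEEE 802.16e expansion factors are z = 24, 28, ..., 96 (codeword length n = 24 z), which the abstract does not list, and the names `expansion_factors`, `num_units`, and `passes_per_block` are hypothetical.

```python
from functools import reduce
from math import gcd

# Assumption (not stated in the abstract): IEEE 802.16e defines 19 expansion
# factors z = 24, 28, ..., 96, each giving a codeword length of 24 * z bits.
BLOCK_COLUMNS = 24
expansion_factors = list(range(24, 97, 4))

# The number of decoding function units is the GCD of all expansion factors,
# so every z is an exact multiple of it and no unit sits idle.
num_units = reduce(gcd, expansion_factors)

codeword_lengths = [BLOCK_COLUMNS * z for z in expansion_factors]

# Hypothetical illustration: how many groups of num_units rows are needed
# to cover one z x z circulant block for each expansion factor.
passes_per_block = {z: z // num_units for z in expansion_factors}

if __name__ == "__main__":
    print("decoding function units:", num_units)
    print("codeword lengths:", codeword_lengths[0], "to", codeword_lengths[-1])
```

Because every expansion factor is an exact multiple of the unit count, each circulant block divides evenly into groups of rows, which is consistent with the abstract's goal of maximizing hardware utilization.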
Bongjin KIM
Korea Advanced Institute of Science and Technology (KAIST)
In-Cheol PARK
Korea Advanced Institute of Science and Technology (KAIST)
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Bongjin KIM, In-Cheol PARK, "Area-Efficient QC-LDPC Decoder Architecture Based on Stride Scheduling and Memory Bank Division" in IEICE TRANSACTIONS on Communications,
vol. E96-B, no. 7, pp. 1772-1779, July 2013, doi: 10.1587/transcom.E96.B.1772.
URL: https://globals.ieice.org/en_transactions/communications/10.1587/transcom.E96.B.1772/_p
@ARTICLE{e96-b_7_1772,
author={Bongjin KIM and In-Cheol PARK},
journal={IEICE TRANSACTIONS on Communications},
title={Area-Efficient QC-LDPC Decoder Architecture Based on Stride Scheduling and Memory Bank Division},
year={2013},
volume={E96-B},
number={7},
pages={1772-1779},
keywords={},
doi={10.1587/transcom.E96.B.1772},
ISSN={1745-1345},
month={July},}
TY - JOUR
TI - Area-Efficient QC-LDPC Decoder Architecture Based on Stride Scheduling and Memory Bank Division
T2 - IEICE TRANSACTIONS on Communications
SP - 1772
EP - 1779
AU - Bongjin KIM
AU - In-Cheol PARK
PY - 2013
DO - 10.1587/transcom.E96.B.1772
JO - IEICE TRANSACTIONS on Communications
SN - 1745-1345
VL - E96-B
IS - 7
JA - IEICE TRANSACTIONS on Communications
Y1 - July 2013
ER -