One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to efficiently map applications. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called ‘Hybrid partitioning’, for mapping a H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining good features of data partitioning and task partitioning, our methodology mainly consists of three levels from top to bottom: (1) hybrid task pipeline based on slice and macroblock (MB) level; (2) MB row-level data parallelism; (3) sub-MB level parallelism method. Further, on the sub-MB level, we propose a few mapping strategies such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4
Gugang GAO
Southeast University,JiangSu Police Institute
Peng CAO
Southeast University
Jun YANG
Southeast University
Longxing SHI
Southeast University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Gugang GAO, Peng CAO, Jun YANG, Longxing SHI, "Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC" in IEICE TRANSACTIONS on Information,
vol. E96-D, no. 8, pp. 1654-1666, August 2013, doi: 10.1587/transinf.E96.D.1654.
Abstract: One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to efficiently map applications. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called ‘Hybrid partitioning’, for mapping a H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining good features of data partitioning and task partitioning, our methodology mainly consists of three levels from top to bottom: (1) hybrid task pipeline based on slice and macroblock (MB) level; (2) MB row-level data parallelism; (3) sub-MB level parallelism method. Further, on the sub-MB level, we propose a few mapping strategies such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E96.D.1654/_p
Copy
@ARTICLE{e96-d_8_1654,
author={Gugang GAO, Peng CAO, Jun YANG, Longxing SHI, },
journal={IEICE TRANSACTIONS on Information},
title={Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC},
year={2013},
volume={E96-D},
number={8},
pages={1654-1666},
abstract={One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to efficiently map applications. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called ‘Hybrid partitioning’, for mapping a H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining good features of data partitioning and task partitioning, our methodology mainly consists of three levels from top to bottom: (1) hybrid task pipeline based on slice and macroblock (MB) level; (2) MB row-level data parallelism; (3) sub-MB level parallelism method. Further, on the sub-MB level, we propose a few mapping strategies such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4
keywords={},
doi={10.1587/transinf.E96.D.1654},
ISSN={1745-1361},
month={August},}
Copy
TY - JOUR
TI - Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC
T2 - IEICE TRANSACTIONS on Information
SP - 1654
EP - 1666
AU - Gugang GAO
AU - Peng CAO
AU - Jun YANG
AU - Longxing SHI
PY - 2013
DO - 10.1587/transinf.E96.D.1654
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2013
AB - One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to efficiently map applications. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called ‘Hybrid partitioning’, for mapping a H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining good features of data partitioning and task partitioning, our methodology mainly consists of three levels from top to bottom: (1) hybrid task pipeline based on slice and macroblock (MB) level; (2) MB row-level data parallelism; (3) sub-MB level parallelism method. Further, on the sub-MB level, we propose a few mapping strategies such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4
ER -