Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

Shuai MU, Dongdong LI, Yubei CHEN, Yangdong DENG, Zhihua WANG

Summary :

By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. Many real-world applications, however, especially those following a stream processing pattern, feature interleaved task-pipelined and data parallelism. Current GPUs are ill-equipped for such applications because they underutilize computing resources, generate excessive off-chip memory traffic, or both. In this paper, we focus on microarchitectural enhancements that enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate task-pipelined and data parallelism in a unified manner. Results from a cycle-accurate simulator on real-world applications show that the proposed GPU microarchitecture improves computing throughput by 18% and reduces overall accesses to off-chip GPU memory by 13%.

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E96-D No.10 pp.2194-2207
Publication Date
2013/10/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E96.D.2194
Type of Manuscript
PAPER
Category
Computer System

Authors

Shuai MU
  Tsinghua University
Dongdong LI
  Tsinghua University
Yubei CHEN
  Tsinghua University
Yangdong DENG
  Tsinghua University
Zhihua WANG
  Tsinghua University
