The Energy-aware Multi-mode Accelerator eXtension (EMAX) [24],[25] is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computations, big data, and image processing while keeping power consumption low. However, before mapping algorithms onto the accelerator, application developers need sufficient knowledge of the hardware organization and its specially designed instructions. When no well-designed compiler or library is available, they must also spend significant effort tuning the code to improve execution efficiency. To address this problem, we focus on library support for stencil (nearest-neighbor) computations, a class of algorithms commonly used in many partial differential equation (PDE) solvers. In this research, we address the following topics: (1) the system configuration, features, and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data to be read from the main memory; and (3) performance evaluation of the library for PDE solvers. Because the library can reuse local data across outer-loop iterations and map many instructions by unrolling the outer loops, the amount of data read from the main memory is reduced to as little as 1/7 of that required by hand-tuned code. In addition, the stencil library reduced the execution time 23% more than a general-purpose processor.
Yoshikazu INAGAKI
Fujitsu Computer Technologies Limited, Nara Institute of Science and Technology
Shinya TAKAMAEDA-YAMAZAKI
Nara Institute of Science and Technology
Jun YAO
Nara Institute of Science and Technology
Yasuhiko NAKASHIMA
Nara Institute of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yoshikazu INAGAKI, Shinya TAKAMAEDA-YAMAZAKI, Jun YAO, Yasuhiko NAKASHIMA, "Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 12, pp. 2141-2149, December 2015, doi: 10.1587/transinf.2015PAP0015.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2015PAP0015/_p
@ARTICLE{e98-d_12_2141,
author={Yoshikazu INAGAKI and Shinya TAKAMAEDA-YAMAZAKI and Jun YAO and Yasuhiko NAKASHIMA},
journal={IEICE TRANSACTIONS on Information},
title={Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators},
year={2015},
volume={E98-D},
number={12},
pages={2141--2149},
abstract={The Energy-aware Multi-mode Accelerator eXtension [24],[25] (EMAX) is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computations, big data, and image processing as well as low-power consumption. However, before mapping algorithms on the accelerator, application developers require sufficient knowledge of the hardware organization and specially designed instructions. They also need significant effort to tune the code for improving execution efficiency when no well-designed compiler or library is available. To address this problem, we focus on library support for stencil (nearest-neighbor) computations that represent a class of algorithms commonly used in many partial differential equation (PDE) solvers. In this research, we address the following topics: (1) system configuration, features, and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data to be read from the main memory; (3) performance evaluation of the library for PDE solvers. With the features of a library that can reuse the local data across the outer loop iterations and map many instructions by unrolling the outer loops, the amount of data to be read from the main memory is significantly reduced to a minimum of 1/7 compared with a hand-tuned code. In addition, the stencil library reduced the execution time 23% more than a general-purpose processor.},
keywords={},
doi={10.1587/transinf.2015PAP0015},
ISSN={1745-1361},
month={December}
}
TY - JOUR
TI - Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators
T2 - IEICE TRANSACTIONS on Information
SP - 2141
EP - 2149
AU - Yoshikazu INAGAKI
AU - Shinya TAKAMAEDA-YAMAZAKI
AU - Jun YAO
AU - Yasuhiko NAKASHIMA
PY - 2015
DO - 10.1587/transinf.2015PAP0015
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2015
AB - The Energy-aware Multi-mode Accelerator eXtension [24],[25] (EMAX) is equipped with distributed single-port local memories and ring-formed interconnections. The accelerator is designed to achieve extremely high throughput for scientific computations, big data, and image processing as well as low-power consumption. However, before mapping algorithms on the accelerator, application developers require sufficient knowledge of the hardware organization and specially designed instructions. They also need significant effort to tune the code for improving execution efficiency when no well-designed compiler or library is available. To address this problem, we focus on library support for stencil (nearest-neighbor) computations that represent a class of algorithms commonly used in many partial differential equation (PDE) solvers. In this research, we address the following topics: (1) system configuration, features, and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data to be read from the main memory; (3) performance evaluation of the library for PDE solvers. With the features of a library that can reuse the local data across the outer loop iterations and map many instructions by unrolling the outer loops, the amount of data to be read from the main memory is significantly reduced to a minimum of 1/7 compared with a hand-tuned code. In addition, the stencil library reduced the execution time 23% more than a general-purpose processor.
ER -