The nondeterminism of message-passing communication poses challenges for program debugging, testing, and fault tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in distributed environments. DMPI is compatible with the standard MPI user interface, and it guarantees the reproducibility of messages with high performance. The basic idea of DMPI is to use logical time to resolve message races and control asynchronous transmissions, thereby eliminating the nondeterministic behaviors of the existing message-passing mechanism. We apply a buffering strategy to alleviate the slowdown caused by the mismatch between logical time and physical time. To avoid deadlocks introduced by the deterministic mechanisms, we also integrate DMPI with a lightweight deadlock checker that dynamically detects and resolves these deadlocks. We have implemented DMPI and evaluated it using the NPB benchmarks. The results show that DMPI guarantees determinism while incurring modest runtime overhead (14% on average).
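The idea of resolving message races with logical time can be illustrated with a small sketch. This is not DMPI's actual algorithm, which the abstract does not specify. It is a minimal illustration under stated assumptions: each sender stamps its messages with strictly increasing logical times, messages from one sender arrive in send order, and ties between senders are broken by rank. The names `Message`, `DeterministicReceiver`, and `run` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    logical_time: int                      # assumed strictly increasing per sender
    sender: int                            # sender rank, used as the tie-breaker
    payload: str = field(compare=False)    # excluded from ordering


class DeterministicReceiver:
    """Delivers messages in (logical_time, sender) order, not arrival order."""

    def __init__(self, senders):
        # Latest logical time observed from each sender.
        self.known_time = {s: 0 for s in senders}
        # Min-heap of arrived messages that are not yet safe to deliver.
        self.buffer = []

    def arrive(self, msg):
        """Record an arrived message; return the messages now safe to deliver."""
        self.known_time[msg.sender] = max(self.known_time[msg.sender],
                                          msg.logical_time)
        heapq.heappush(self.buffer, msg)
        return self._drain()

    def _drain(self):
        # A message is safe once every sender has reached its logical time:
        # under the assumptions above, no earlier message can still arrive.
        horizon = min(self.known_time.values())
        delivered = []
        while self.buffer and self.buffer[0].logical_time <= horizon:
            delivered.append(heapq.heappop(self.buffer))
        return delivered


def run(arrivals, senders=(0, 1)):
    """Feed messages in physical arrival order; return the delivery order."""
    receiver = DeterministicReceiver(senders)
    out = []
    for msg in arrivals:
        out.extend(receiver.arrive(msg))
    return [(m.logical_time, m.sender, m.payload) for m in out]
```

Feeding the same four messages in two different physical arrival orders yields the same delivery order. The last message stays buffered until the other sender's clock passes its timestamp, which hints at why a mismatch between logical and physical time can cost performance.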
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Xu ZHOU, Kai LU, Xiaoping WANG, Wenzhe ZHANG, Kai ZHANG, Xu LI, Gen LI, "Deterministic Message Passing for Distributed Parallel Computing" in IEICE TRANSACTIONS on Information and Systems,
vol. E96-D, no. 5, pp. 1068-1077, May 2013, doi: 10.1587/transinf.E96.D.1068.
Abstract: The nondeterminism of message-passing communication poses challenges for program debugging, testing, and fault tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in distributed environments. DMPI is compatible with the standard MPI user interface, and it guarantees the reproducibility of messages with high performance. The basic idea of DMPI is to use logical time to resolve message races and control asynchronous transmissions, thereby eliminating the nondeterministic behaviors of the existing message-passing mechanism. We apply a buffering strategy to alleviate the slowdown caused by the mismatch between logical time and physical time. To avoid deadlocks introduced by the deterministic mechanisms, we also integrate DMPI with a lightweight deadlock checker that dynamically detects and resolves these deadlocks. We have implemented DMPI and evaluated it using the NPB benchmarks. The results show that DMPI guarantees determinism while incurring modest runtime overhead (14% on average).
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E96.D.1068/_p
@ARTICLE{e96-d_5_1068,
author={Xu ZHOU and Kai LU and Xiaoping WANG and Wenzhe ZHANG and Kai ZHANG and Xu LI and Gen LI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Deterministic Message Passing for Distributed Parallel Computing},
year={2013},
volume={E96-D},
number={5},
pages={1068-1077},
abstract={The nondeterminism of message-passing communication poses challenges for program debugging, testing, and fault tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in distributed environments. DMPI is compatible with the standard MPI user interface, and it guarantees the reproducibility of messages with high performance. The basic idea of DMPI is to use logical time to resolve message races and control asynchronous transmissions, thereby eliminating the nondeterministic behaviors of the existing message-passing mechanism. We apply a buffering strategy to alleviate the slowdown caused by the mismatch between logical time and physical time. To avoid deadlocks introduced by the deterministic mechanisms, we also integrate DMPI with a lightweight deadlock checker that dynamically detects and resolves these deadlocks. We have implemented DMPI and evaluated it using the NPB benchmarks. The results show that DMPI guarantees determinism while incurring modest runtime overhead (14% on average).},
keywords={},
doi={10.1587/transinf.E96.D.1068},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - Deterministic Message Passing for Distributed Parallel Computing
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1068
EP - 1077
AU - Xu ZHOU
AU - Kai LU
AU - Xiaoping WANG
AU - Wenzhe ZHANG
AU - Kai ZHANG
AU - Xu LI
AU - Gen LI
PY - 2013
DO - 10.1587/transinf.E96.D.1068
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E96-D
IS - 5
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - May 2013
AB - The nondeterminism of message-passing communication poses challenges for program debugging, testing, and fault tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in distributed environments. DMPI is compatible with the standard MPI user interface, and it guarantees the reproducibility of messages with high performance. The basic idea of DMPI is to use logical time to resolve message races and control asynchronous transmissions, thereby eliminating the nondeterministic behaviors of the existing message-passing mechanism. We apply a buffering strategy to alleviate the slowdown caused by the mismatch between logical time and physical time. To avoid deadlocks introduced by the deterministic mechanisms, we also integrate DMPI with a lightweight deadlock checker that dynamically detects and resolves these deadlocks. We have implemented DMPI and evaluated it using the NPB benchmarks. The results show that DMPI guarantees determinism while incurring modest runtime overhead (14% on average).
ER -