Optimizing Hash Join with MapReduce on Multi-Core CPUs

Tong YUAN; Zhijing LIU; Hui LIU

doi:10.1587/transinf.2015EDP7306

Optimizing Hash Join with MapReduce on Multi-Core CPUs

Tong YUAN, Zhijing LIU, Hui LIU

Full Text Views

0

Share
Cite this

Summary :

In this paper, we exploit MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including No partition hash join and partition hash join. We first implement hash join algorithms with a shared-memory MapReduce model on multi-core CPUs, including partition phase, build phase, and probe phase. Then we design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations, one for the usage of SIMD instructions, and the other for partition phase. Through experimental result and analysis, we finally find that the partition hash join often outperforms the No partition hash join, and our hash join algorithm is faster than previous work by an average of 30%.

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.5 pp.1316-1325

Publication Date: 2016/05/01

Publicized: 2016/02/04

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7306

Type of Manuscript: PAPER

Category: Data Engineering, Web Information Systems

Authors

Tong YUAN
  Xidian University
Zhijing LIU
  Xidian University
Hui LIU
  Xidian University

Keyword

hash join, database system, MapReduce, multi-core CPU, cuckoo hashing

Cite this

Copy

Tong YUAN, Zhijing LIU, Hui LIU, "Optimizing Hash Join with MapReduce on Multi-Core CPUs" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 5, pp. 1316-1325, May 2016, doi: 10.1587/transinf.2015EDP7306.
Abstract: In this paper, we exploit MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including No partition hash join and partition hash join. We first implement hash join algorithms with a shared-memory MapReduce model on multi-core CPUs, including partition phase, build phase, and probe phase. Then we design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations, one for the usage of SIMD instructions, and the other for partition phase. Through experimental result and analysis, we finally find that the partition hash join often outperforms the No partition hash join, and our hash join algorithm is faster than previous work by an average of 30%.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7306/_p

Copy

@ARTICLE{e99-d_5_1316,
author={Tong YUAN, Zhijing LIU, Hui LIU, },
journal={IEICE TRANSACTIONS on Information},
title={Optimizing Hash Join with MapReduce on Multi-Core CPUs},
year={2016},
volume={E99-D},
number={5},
pages={1316-1325},
abstract={In this paper, we exploit MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including No partition hash join and partition hash join. We first implement hash join algorithms with a shared-memory MapReduce model on multi-core CPUs, including partition phase, build phase, and probe phase. Then we design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations, one for the usage of SIMD instructions, and the other for partition phase. Through experimental result and analysis, we finally find that the partition hash join often outperforms the No partition hash join, and our hash join algorithm is faster than previous work by an average of 30%.},
keywords={},
doi={10.1587/transinf.2015EDP7306},
ISSN={1745-1361},
month={May},}

Copy

TY - JOUR
TI - Optimizing Hash Join with MapReduce on Multi-Core CPUs
T2 - IEICE TRANSACTIONS on Information
SP - 1316
EP - 1325
AU - Tong YUAN
AU - Zhijing LIU
AU - Hui LIU
PY - 2016
DO - 10.1587/transinf.2015EDP7306
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2016
AB - In this paper, we exploit MapReduce framework and other optimizations to improve the performance of hash join algorithms on multi-core CPUs, including No partition hash join and partition hash join. We first implement hash join algorithms with a shared-memory MapReduce model on multi-core CPUs, including partition phase, build phase, and probe phase. Then we design an improved cuckoo hash table for our hash join, which consists of a cuckoo hash table and a chained hash table. Based on our implementation, we also propose two optimizations, one for the usage of SIMD instructions, and the other for partition phase. Through experimental result and analysis, we finally find that the partition hash join often outperforms the No partition hash join, and our hash join algorithm is faster than previous work by an average of 30%.
ER -