Managing the Synchronization in the Lambda Architecture for Optimized Big Data Analysis

Thomas VANHOVE; Gregory VAN SEGHBROECK; Tim WAUTERS; Bruno VOLCKAERT; Filip DE TURCK

doi:10.1587/transcom.2015ITI0001

Open Access

Summary:

In a world of continuously expanding amounts of data, retrieving interesting information from enormous data sets becomes more complex every day. Solutions for precomputing views on these big data sets mostly follow either an offline approach, which is slow but can take into account the entire data set, or a streaming approach, which is fast but only relies on the latest data entries. A hybrid solution was introduced through the Lambda architecture concept. It combines both offline and streaming approaches by analyzing data in a fast speed layer first, and in a slower batch layer later. However, this introduces a new synchronization challenge: once the data is analyzed by the batch layer, the corresponding information needs to be removed in the speed layer without introducing redundancy or loss of data. In this paper we propose a new approach to implement the Lambda architecture concept independent of the technologies used for offline and stream computing. A universal solution is provided to manage the complex synchronization introduced by the Lambda architecture and techniques to provide fault tolerance. The proposed solution is evaluated by means of detailed experimental results.
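The synchronization challenge described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the class `ToyLambda`, its fields, and the sequence-number cutoff are hypothetical names and simplifications chosen for illustration, and a simple key count stands in for the precomputed view. The sketch assumes every event receives a monotonically increasing sequence number, so that a batch run can state exactly which events it covered and the speed layer can drop those events and no others.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyLambda:
    """Toy key-count Lambda architecture (illustrative assumption only)."""
    master: list = field(default_factory=list)       # append-only log of (seq, key)
    batch_view: Counter = field(default_factory=Counter)
    batch_upto: int = 0                               # highest seq covered by batch_view
    speed_log: list = field(default_factory=list)    # recent (seq, key) events

    def ingest(self, key):
        # Every event goes to both the master data set and the speed layer.
        seq = len(self.master) + 1
        self.master.append((seq, key))
        self.speed_log.append((seq, key))

    def run_batch(self, upto=None):
        # Recompute the batch view from the master data set up to a cutoff.
        # A cutoff below the newest event mimics data arriving mid-run.
        upto = len(self.master) if upto is None else upto
        new_view = Counter(k for s, k in self.master if s <= upto)
        # Synchronization step: publish the new view and drop exactly the
        # events it covers from the speed layer (no redundancy, no loss).
        self.batch_view, self.batch_upto = new_view, upto
        self.speed_log = [(s, k) for s, k in self.speed_log if s > upto]

    def query(self, key):
        # Merge both layers; only speed events newer than the cutoff count.
        recent = sum(1 for s, k in self.speed_log
                     if k == key and s > self.batch_upto)
        return self.batch_view[key] + recent
```

In this sketch, a query for a key returns the same count immediately before and after `run_batch`, because each event is counted in exactly one layer at any time. Dropping speed-layer events too early would lose data, and dropping them too late would count them twice.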

Publication: IEICE TRANSACTIONS on Communications Vol.E99-B No.2 pp.297-306

Publication Date: 2016/02/01

Online ISSN: 1745-1345

DOI: 10.1587/transcom.2015ITI0001

Type of Manuscript: Special Section INVITED PAPER (Special Section on Management for the Era of Internet of Things and Big Data)

Authors

Thomas VANHOVE
  Ghent University - iMinds
Gregory VAN SEGHBROECK
  Ghent University - iMinds
Tim WAUTERS
  Ghent University - iMinds
Bruno VOLCKAERT
  Ghent University - iMinds
Filip DE TURCK
  Ghent University - iMinds

Keyword

Lambda architecture, synchronization, big data, Tengu

Cite this

Thomas VANHOVE, Gregory VAN SEGHBROECK, Tim WAUTERS, Bruno VOLCKAERT, Filip DE TURCK, "Managing the Synchronization in the Lambda Architecture for Optimized Big Data Analysis" in IEICE TRANSACTIONS on Communications, vol. E99-B, no. 2, pp. 297-306, February 2016, doi: 10.1587/transcom.2015ITI0001.
Abstract: In a world of continuously expanding amounts of data, retrieving interesting information from enormous data sets becomes more complex every day. Solutions for precomputing views on these big data sets mostly follow either an offline approach, which is slow but can take into account the entire data set, or a streaming approach, which is fast but only relies on the latest data entries. A hybrid solution was introduced through the Lambda architecture concept. It combines both offline and streaming approaches by analyzing data in a fast speed layer first, and in a slower batch layer later. However, this introduces a new synchronization challenge: once the data is analyzed by the batch layer, the corresponding information needs to be removed in the speed layer without introducing redundancy or loss of data. In this paper we propose a new approach to implement the Lambda architecture concept independent of the technologies used for offline and stream computing. A universal solution is provided to manage the complex synchronization introduced by the Lambda architecture and techniques to provide fault tolerance. The proposed solution is evaluated by means of detailed experimental results.
URL: https://globals.ieice.org/en_transactions/communications/10.1587/transcom.2015ITI0001/_p

@ARTICLE{e99-b_2_297,
author={VANHOVE, Thomas and VAN SEGHBROECK, Gregory and WAUTERS, Tim and VOLCKAERT, Bruno and DE TURCK, Filip},
journal={IEICE TRANSACTIONS on Communications},
title={Managing the Synchronization in the Lambda Architecture for Optimized Big Data Analysis},
year={2016},
volume={E99-B},
number={2},
pages={297--306},
abstract={In a world of continuously expanding amounts of data, retrieving interesting information from enormous data sets becomes more complex every day. Solutions for precomputing views on these big data sets mostly follow either an offline approach, which is slow but can take into account the entire data set, or a streaming approach, which is fast but only relies on the latest data entries. A hybrid solution was introduced through the Lambda architecture concept. It combines both offline and streaming approaches by analyzing data in a fast speed layer first, and in a slower batch layer later. However, this introduces a new synchronization challenge: once the data is analyzed by the batch layer, the corresponding information needs to be removed in the speed layer without introducing redundancy or loss of data. In this paper we propose a new approach to implement the Lambda architecture concept independent of the technologies used for offline and stream computing. A universal solution is provided to manage the complex synchronization introduced by the Lambda architecture and techniques to provide fault tolerance. The proposed solution is evaluated by means of detailed experimental results.},
keywords={Lambda architecture, synchronization, big data, Tengu},
doi={10.1587/transcom.2015ITI0001},
ISSN={1745-1345},
month={February}
}

TY  - JOUR
TI  - Managing the Synchronization in the Lambda Architecture for Optimized Big Data Analysis
T2  - IEICE TRANSACTIONS on Communications
SP  - 297
EP  - 306
AU  - Thomas VANHOVE
AU  - Gregory VAN SEGHBROECK
AU  - Tim WAUTERS
AU  - Bruno VOLCKAERT
AU  - Filip DE TURCK
PY  - 2016
DO  - 10.1587/transcom.2015ITI0001
JO  - IEICE TRANSACTIONS on Communications
SN  - 1745-1345
VL  - E99-B
IS  - 2
JA  - IEICE TRANSACTIONS on Communications
Y1  - February 2016
AB  - In a world of continuously expanding amounts of data, retrieving interesting information from enormous data sets becomes more complex every day. Solutions for precomputing views on these big data sets mostly follow either an offline approach, which is slow but can take into account the entire data set, or a streaming approach, which is fast but only relies on the latest data entries. A hybrid solution was introduced through the Lambda architecture concept. It combines both offline and streaming approaches by analyzing data in a fast speed layer first, and in a slower batch layer later. However, this introduces a new synchronization challenge: once the data is analyzed by the batch layer, the corresponding information needs to be removed in the speed layer without introducing redundancy or loss of data. In this paper we propose a new approach to implement the Lambda architecture concept independent of the technologies used for offline and stream computing. A universal solution is provided to manage the complex synchronization introduced by the Lambda architecture and techniques to provide fault tolerance. The proposed solution is evaluated by means of detailed experimental results.
ER  -