Random Forest Algorithm for Linked Data Using a Parallel Processing Environment

Dongkyu JEON; Wooju KIM

doi:10.1587/transinf.2014EDP7171

Summary:

In recent years, data mining of graph-structured data has grown significantly in importance as this kind of data has rapidly increased in both scale and application areas. Many previous studies have investigated decision tree learning on Semantic Web-based linked data to uncover implicit knowledge. In this paper, we propose a new random forest algorithm for linked data that addresses inherent limitations of the decision tree algorithm, such as locally optimal decisions and generalization error. We also design a parallel processing environment for random forest learning to handle large-scale linked data and to generate multiple trees more efficiently. To this end, we modify the candidate feature search method of the earlier decision tree algorithm for linked data to reduce the feature search space of random forest learning, and we develop feature selection methods adapted to linked data. Using a distributed index-based search engine, we design a parallel random forest learning system that lets users generate multiple decision trees simultaneously from linked data stored in a distributed manner. To evaluate the proposed algorithm, we compared its classification accuracy with that of the single decision tree algorithm. The experimental results showed that our random forest algorithm is more accurate than the single decision tree algorithm.
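
The general idea summarized above (many trees, each grown on a bootstrap sample with a random subset of candidate features per split, built concurrently and combined by majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain numeric feature tuples rather than linked data, it omits the paper's feature search over linked data and its distributed index-based search engine, and every function and parameter name (`train_forest`, `n_candidates`, `workers`, and so on) is hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, n_candidates, rng, depth=0, max_depth=5):
    """Grow one tree; every split looks at a random subset of features."""
    if depth >= max_depth or len(set(labels)) == 1:
        return {"label": majority(labels)}
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), min(n_candidates, n_features))
    best = None
    for f in candidates:
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children (lower is better).
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no usable split among the sampled features
        return {"label": majority(labels)}
    _, f, threshold, left, right = best
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           n_candidates, rng, depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            n_candidates, rng, depth + 1, max_depth),
    }


def train_one_tree(rows, labels, n_candidates, seed):
    """Draw a bootstrap sample (with replacement) and grow a tree on it."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(rows)) for _ in rows]
    return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                      n_candidates, rng)


def train_forest(rows, labels, n_trees=25, n_candidates=1, workers=4, seed=0):
    """Build the trees concurrently; each tree gets its own seed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda s: train_one_tree(rows, labels, n_candidates, s),
            range(seed, seed + n_trees)))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while "label" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return majority([predict_tree(t, row) for t in forest])
```

A thread pool is used here only to keep the sketch short and portable; for CPU-bound tree growing in CPython, a process pool or a distributed worker setup would be the more realistic choice. A call such as `train_forest(rows, labels, n_trees=25)` followed by `predict_forest(forest, row)` exercises the whole pipeline.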

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.2 pp.372-380

Publication Date: 2015/02/01

Publicized: 2014/11/12

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2014EDP7171

Type of Manuscript: PAPER

Category: Artificial Intelligence, Data Mining

Authors

Dongkyu JEON
Yonsei University
Wooju KIM
Yonsei University

Keywords

linked data, random forests, parallel processing, Semantic Web

Cite this

Dongkyu JEON, Wooju KIM, "Random Forest Algorithm for Linked Data Using a Parallel Processing Environment" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 2, pp. 372-380, February 2015, doi: 10.1587/transinf.2014EDP7171.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2014EDP7171/_p

@ARTICLE{e98-d_2_372,
author={Dongkyu JEON and Wooju KIM},
journal={IEICE TRANSACTIONS on Information},
title={Random Forest Algorithm for Linked Data Using a Parallel Processing Environment},
year={2015},
volume={E98-D},
number={2},
pages={372--380},
abstract={In recent years, there has been a significant growth in the importance of data mining of graph-structured data due to this technology's rapid increase in both scale and application areas. Many previous studies have investigated decision tree learning on Semantic Web-based linked data to uncover implicit knowledge. In the present paper, we suggest a new random forest algorithm for linked data to overcome the underlying limitations of the decision tree algorithm, such as local optimal decisions and generalization error. Moreover, we designed a parallel processing environment for random forest learning to manage large-size linked data and increase the efficiency of multiple tree generation. For this purpose, we modified the previous candidate feature searching method of the decision tree algorithm for linked data to reduce the feature searching space of random forest learning and developed feature selection methods that are adjusted to linked data. Using a distributed index-based search engine, we designed a parallel random forest learning system for linked data to generate random forests in parallel. Our proposed system enables users to simultaneously generate multiple decision trees from distributed stored linked data. To evaluate the performance of the proposed algorithm, we performed experiments to compare the classification accuracy when using the single decision tree algorithm. The experimental results revealed that our random forest algorithm is more accurate than the single decision tree algorithm.},
keywords={linked data, random forests, parallel processing, semantic Web},
doi={10.1587/transinf.2014EDP7171},
ISSN={1745-1361},
month={February}
}

TY - JOUR
TI - Random Forest Algorithm for Linked Data Using a Parallel Processing Environment
T2 - IEICE TRANSACTIONS on Information
SP - 372
EP - 380
AU - Dongkyu JEON
AU - Wooju KIM
PY - 2015
DO - 10.1587/transinf.2014EDP7171
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2015
AB - In recent years, there has been a significant growth in the importance of data mining of graph-structured data due to this technology's rapid increase in both scale and application areas. Many previous studies have investigated decision tree learning on Semantic Web-based linked data to uncover implicit knowledge. In the present paper, we suggest a new random forest algorithm for linked data to overcome the underlying limitations of the decision tree algorithm, such as local optimal decisions and generalization error. Moreover, we designed a parallel processing environment for random forest learning to manage large-size linked data and increase the efficiency of multiple tree generation. For this purpose, we modified the previous candidate feature searching method of the decision tree algorithm for linked data to reduce the feature searching space of random forest learning and developed feature selection methods that are adjusted to linked data. Using a distributed index-based search engine, we designed a parallel random forest learning system for linked data to generate random forests in parallel. Our proposed system enables users to simultaneously generate multiple decision trees from distributed stored linked data. To evaluate the performance of the proposed algorithm, we performed experiments to compare the classification accuracy when using the single decision tree algorithm. The experimental results revealed that our random forest algorithm is more accurate than the single decision tree algorithm.
ER -