Woo-Lam KANG Hyeon-Gyu KIM Yoon-Joon LEE
This paper presents a method to reduce the I/O cost of MapReduce when online analytical processing (OLAP) queries are used for data analysis. The proposed method is based on two ideas. First, to reduce network transmission cost, mappers are organized to receive only the data needed to perform a map task rather than the entire input data set. Second, to reduce storage consumption, only record IDs, rather than raw records, are stored for checkpointing. Experiments conducted with the TPC-H benchmark show that the proposed method is about 40% faster than Hive, the well-known data warehouse solution for MapReduce, while reducing the size of data stored for checkpointing to about 80%.
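The two ideas above can be illustrated with a toy, single-process sketch. It is not the authors' implementation and does not use Hadoop or Hive APIs; the table layout and the helper names (`project_for_map`, `map_filter`, `checkpoint_ids`, `recover`) are hypothetical. The sketch assumes each input record has a stable record ID, so that a checkpoint holding only IDs can be turned back into the intermediate output by re-reading the source.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical source table: record ID -> full row (column name -> value).
Table = Dict[int, Dict[str, object]]
Record = Tuple[int, Tuple[object, ...]]


def project_for_map(table: Table, columns: List[str]) -> List[Record]:
    """Ship only the columns the map task needs, each tagged with its record ID."""
    return [(rid, tuple(row[c] for c in columns)) for rid, row in table.items()]


def map_filter(records: List[Record], predicate: Callable[[Tuple], bool]) -> List[Record]:
    """A toy map task: keep the projected records that satisfy the predicate."""
    return [(rid, vals) for rid, vals in records if predicate(vals)]


def checkpoint_ids(output: List[Record]) -> List[int]:
    """Checkpoint only the record IDs of the intermediate output, not the raw records."""
    return [rid for rid, _ in output]


def recover(table: Table, columns: List[str], ids: List[int]) -> List[Record]:
    """Rebuild the intermediate output after a failure by re-reading the source by ID."""
    return [(rid, tuple(table[rid][c] for c in columns)) for rid in ids]


if __name__ == "__main__":
    lineitem: Table = {
        1: {"qty": 5, "price": 10.0, "comment": "x" * 40},
        2: {"qty": 30, "price": 99.5, "comment": "y" * 40},
        3: {"qty": 12, "price": 7.25, "comment": "z" * 40},
    }
    cols = ["qty", "price"]
    out = map_filter(project_for_map(lineitem, cols), lambda v: v[0] > 10)
    ids = checkpoint_ids(out)
    print(ids)  # [2, 3]
    assert recover(lineitem, cols, ids) == out
```

The wide `comment` column never leaves the source in this sketch, and the checkpoint is a list of integers whose size does not depend on row width; the trade-off is that recovery must re-read the source.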
Deokmin HAAM Hyeon-Gyu KIM Myoung-Ho KIM
This paper presents a filtering method for efficient face image retrieval over large face image databases. The proposed method employs a new face image descriptor called a cell-orientation vector (COV), which has a simple form: a 72-dimensional vector of integers ranging from 0 to 8. Despite its simplicity, it achieves high accuracy and efficiency. Our experimental results show that the proposed COV-based method outperforms a recent approach based on identity-based quantization in both accuracy and efficiency.
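The abstract does not say how a COV is computed or compared, so the sketch below covers only the generic filter-then-refine pattern that a compact descriptor enables. It assumes, as a hypothetical choice, that L1 distance between COVs serves as the cheap filter and that a separate, more expensive matcher (`refine`, hypothetical) ranks the survivors. The only details taken from the abstract are the shape of the descriptor: 72 integers in the range 0 to 8.

```python
from typing import Callable, List, Sequence

COV_DIM = 72  # dimensionality stated in the abstract
COV_MAX = 8   # entries are integers from 0 to 8


def check_cov(cov: Sequence[int]) -> None:
    """Reject vectors that do not have the COV shape described in the abstract."""
    if len(cov) != COV_DIM or any(not 0 <= x <= COV_MAX for x in cov):
        raise ValueError("a COV must be 72 integers in the range 0..8")


def l1_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Assumed cheap distance between two COVs (sum of absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def filter_then_refine(
    query: Sequence[int],
    database: List[Sequence[int]],
    threshold: int,
    refine: Callable[[int], float],
    k: int,
) -> List[int]:
    """Keep database entries within `threshold` of the query (filter step), then
    rank the survivors with a hypothetical expensive matcher `refine`, where a
    lower score means a better match, and return the indices of the best k."""
    check_cov(query)
    candidates = []
    for i, cov in enumerate(database):
        check_cov(cov)
        if l1_distance(query, cov) <= threshold:
            candidates.append(i)
    candidates.sort(key=refine)
    return candidates[:k]
```

Because every COV fits in 72 small integers, the filter step touches very little data per image, and only the candidates that pass it reach the costly matcher.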
Tae-Hyung KWON Hyeon-Gyu KIM Myoung-Ho KIM Jin-Hyun SON
Joining multiple streams is one of the most important but costly operations in ubiquitous streaming services. In this paper, we propose AMJoin, an improved and practical algorithm for joining multiple streams, which improves multi-way join performance by guaranteeing that join failures are detected in constant time. To achieve this, we first design a new data structure called BiHT (Bit-vector Hash Table) and then describe the overall behavior of AMJoin in detail. We also present experimental results and analyses that demonstrate its efficiency and practicality.
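The abstract gives only the name BiHT (Bit-vector Hash Table), not its layout. One plausible reading, assumed here and not taken from the paper, is a hash table keyed by join-attribute value in which each entry holds a bit per stream that is set while that stream's window contains a tuple with the key. A new tuple on stream `s` can then be rejected with one hash lookup and one mask comparison whenever some other stream lacks the key. The class and method names are hypothetical.

```python
from typing import Dict, Hashable, List


class BitVectorHashTable:
    """Hypothetical BiHT sketch: join key -> bit vector of streams holding that key."""

    def __init__(self, num_streams: int) -> None:
        self.num_streams = num_streams
        self.all_streams = (1 << num_streams) - 1  # mask with one bit per stream
        self.bits: Dict[Hashable, int] = {}
        # Per-key tuple counts, so a bit is cleared only when its stream's
        # window no longer holds any tuple with that key.
        self.counts: Dict[Hashable, List[int]] = {}

    def insert(self, stream: int, key: Hashable) -> None:
        counts = self.counts.setdefault(key, [0] * self.num_streams)
        counts[stream] += 1
        self.bits[key] = self.bits.get(key, 0) | (1 << stream)

    def delete(self, stream: int, key: Hashable) -> None:
        """Remove one tuple (e.g. on window expiry); clear the bit when none remain."""
        counts = self.counts.get(key)
        if counts is None or counts[stream] == 0:
            raise KeyError((stream, key))
        counts[stream] -= 1
        if counts[stream] == 0:
            self.bits[key] &= ~(1 << stream)
            if self.bits[key] == 0:
                del self.bits[key]
                del self.counts[key]

    def may_join(self, stream: int, key: Hashable) -> bool:
        """Constant-time failure test: do all *other* streams hold this key?"""
        others = self.all_streams & ~(1 << stream)
        return (self.bits.get(key, 0) & others) == others
```

When `may_join` returns `False`, the join for that tuple is known to fail without probing any per-stream window; only when it returns `True` would a full probe be needed.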
Hyeon-Gyu KIM Woo-Lam KANG Yoon-Joon LEE Myoung-Ho KIM
In this paper, we propose a predicate indexing method that handles equality and inequality tests separately. Our method uses a hash table for equality tests and a balanced binary search tree for inequality tests. This separation reduces the height of the search tree and the number of comparisons per tree node, as well as the cost of tree rebalancing. We compared our method with the IBS-tree, a popular indexing method suitable for data stream processing. Our experimental results show that the proposed method provides better insertion and search performance than the IBS-tree.
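The split described above can be sketched as follows. Python has no built-in balanced binary search tree, so this sketch swaps in sorted lists searched with the standard `bisect` module as a stand-in. Lookups are still O(log n), but list insertion is O(n), unlike a real balanced tree. The predicate forms (`attr == c`, `attr < c`, `attr > c`) and the class name are assumptions made for illustration.

```python
import bisect
from typing import Dict, List, Set


class PredicateIndex:
    """Equality predicates in a hash table; inequality predicates in sorted
    structures (sorted lists here, standing in for a balanced BST)."""

    def __init__(self) -> None:
        self.eq: Dict[float, Set[str]] = {}  # constant -> ids of "attr == c"
        self.lt_keys: List[float] = []       # constants of "attr < c", sorted
        self.lt_ids: List[str] = []
        self.gt_keys: List[float] = []       # constants of "attr > c", sorted
        self.gt_ids: List[str] = []

    @staticmethod
    def _insert_sorted(keys: List[float], ids: List[str], const: float, pid: str) -> None:
        i = bisect.bisect_right(keys, const)
        keys.insert(i, const)
        ids.insert(i, pid)

    def insert(self, pid: str, op: str, const: float) -> None:
        if op == "==":
            self.eq.setdefault(const, set()).add(pid)
        elif op == "<":
            self._insert_sorted(self.lt_keys, self.lt_ids, const, pid)
        elif op == ">":
            self._insert_sorted(self.gt_keys, self.gt_ids, const, pid)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def search(self, value: float) -> Set[str]:
        """Return the ids of all predicates satisfied by `value`."""
        matched = set(self.eq.get(value, set()))  # one hash probe
        # "attr < c" holds for every c strictly greater than value.
        i = bisect.bisect_right(self.lt_keys, value)
        matched.update(self.lt_ids[i:])
        # "attr > c" holds for every c strictly less than value.
        j = bisect.bisect_left(self.gt_keys, value)
        matched.update(self.gt_ids[:j])
        return matched
```

Keeping equality constants out of the ordered structure is what keeps that structure smaller, which is the effect the paragraph above attributes to the separate design.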
Hyeon-Gyu KIM Woo-Lam KANG Myoung-Ho KIM
Bursty and out-of-order tuple arrivals complicate determining the contents and boundaries of sliding windows. To process windows over such streams efficiently, two issues must be addressed: fast tuple insertion and disorder control. In this paper, we focus on these issues to process sliding windows efficiently over disordered data streams.
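The abstract does not describe the paper's technique, so the sketch below shows a common, generic way to handle the two issues it names: a min-heap for fast insertion of out-of-order tuples, plus a fixed slack bound for disorder control (often called K-slack). It is a stand-in, not the authors' method. The names `window`, `slack`, and `DisorderedSlidingWindow` are hypothetical. A tuple is released to the window once its timestamp is at most the largest timestamp seen minus `slack`, and tuples arriving later than the last released timestamp are dropped.

```python
import heapq
from collections import deque
from typing import Deque, List, Optional, Tuple


class DisorderedSlidingWindow:
    """Generic K-slack sketch: buffer disordered tuples in a heap and release
    them in timestamp order into a time-based sliding window."""

    def __init__(self, window: int, slack: int) -> None:
        self.window = window  # window length in timestamp units
        self.slack = slack    # assumed upper bound on lateness
        self.buffer: List[Tuple[int, int, object]] = []  # (ts, seq, value)
        self.seq = 0  # tie-breaker so values are never compared
        self.max_ts: Optional[int] = None
        self.last_released: Optional[int] = None
        self.dropped = 0
        self.contents: Deque[Tuple[int, object]] = deque()

    def insert(self, ts: int, value: object) -> List[Tuple[int, object]]:
        """Add one tuple; return the tuples released into the window, in order."""
        if self.last_released is not None and ts < self.last_released:
            self.dropped += 1  # too late: its position has already been passed
            return []
        heapq.heappush(self.buffer, (ts, self.seq, value))  # O(log n) insertion
        self.seq += 1
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        watermark = self.max_ts - self.slack
        released = []
        while self.buffer and self.buffer[0][0] <= watermark:
            t, _, v = heapq.heappop(self.buffer)
            self.contents.append((t, v))
            released.append((t, v))
            self.last_released = t
        # Keep tuples with timestamps in (watermark - window, watermark].
        while self.contents and self.contents[0][0] <= watermark - self.window:
            self.contents.popleft()
        return released
```

A larger `slack` tolerates more disorder but delays every result by that amount; choosing it is exactly the disorder-control trade-off the paragraph above refers to.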