Md. Anisuzzaman SIDDIQUE Hao TIAN Yasuhiko MORIMOTO
Filtering uninteresting data is important for utilizing “big data”. The skyline query is a popular technique for filtering uninteresting data: it selects the set of objects that are not dominated by any other object in a given large database. However, a skyline query often retrieves too many objects to analyze intensively, especially for high-dimensional datasets. To solve this problem, k-dominant skyline queries have been introduced. Databases sometimes become too large to process in a centralized environment. Conventional algorithms for computing k-dominant skyline queries are not well suited for parallel and distributed environments such as the MapReduce framework. In this paper, we consider an efficient parallel algorithm for processing k-dominant skyline queries in the MapReduce framework. Extensive experiments demonstrate the scalability of the proposed algorithm on synthetic big datasets under different settings of data distribution, dimensionality, and cardinality.
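The abstract above does not spell out k-dominance, so the following is only a minimal centralized sketch under the commonly used definition: p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them. It assumes that smaller values are better. The names `k_dominates` and `k_dominant_skyline` are hypothetical, and this quadratic scan is not the parallel MapReduce algorithm proposed in the paper.

```python
from typing import List, Sequence

Point = Sequence[float]

def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller is better)."""
    # Number of dimensions in which p is at least as good as q.
    not_worse = sum(a <= b for a, b in zip(p, q))
    # A strictly better dimension is always one of the "not worse" ones,
    # so it can be chosen among the k dimensions.
    strictly_better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and strictly_better

def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) scan: keep every point that no other point k-dominates."""
    return [q for q in points
            if not any(k_dominates(p, q, k) for p in points)]

points = [(1, 1, 5), (2, 2, 1), (3, 3, 3), (5, 5, 0)]
print(k_dominant_skyline(points, 3))  # conventional skyline (k = d)
print(k_dominant_skyline(points, 2))  # stricter filter with k = 2
```

With k equal to the dimensionality the function reduces to the conventional skyline; lowering k makes domination easier and shrinks the result. Because k-dominance for k smaller than the dimensionality can be cyclic, the sketch compares each point against all points rather than only against current skyline candidates.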
Takeshi FUKUDA Yasuhiko MORIMOTO Shinichi MORISHITA Takeshi TOKUYAMA
In this paper, we investigate inverse problems of the interval query problem in application to data mining. Let $\mathcal{I}$ be the set of all intervals on $U = \{1, 2, \ldots, n\}$. Consider an objective function $f(I)$ and conditional functions $u_i(I)$ on $\mathcal{I}$, and define the optimization problem of finding the interval $I$ that maximizes $f(I)$ subject to $u_i(I) > K_i$ for given real numbers $K_i$ ($i = 1, 2, \ldots, h$). We propose efficient algorithms to solve this optimization problem when the objective function is either additive or quotient and the conditional functions are additive. Here, a function $f$ is additive if $f(I) = \sum_{i \in I} \hat{f}(i)$, extending a function $\hat{f}$ on $U$, and quotient if it can be represented as a quotient of two additive functions. We use computational-geometric methods such as convex hull, range searching, and multidimensional divide-and-conquer.
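To make the problem statement concrete, here is a brute-force baseline for the simplest case: an additive objective and a single additive condition (h = 1). It uses prefix sums and enumerates all O(n^2) intervals; it is not one of the computational-geometric algorithms of the paper. The function name `best_interval` and the sample values are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Optional, Tuple

def best_interval(f_hat: List[float], u_hat: List[float],
                  K: float) -> Optional[Tuple[float, int, int]]:
    """Maximize f(I) = sum of f_hat over I subject to u(I) > K.

    Returns (f(I), s, t) for the best interval I = [s, t] using 1-based
    endpoints, or None if no interval satisfies the condition.
    """
    n = len(f_hat)
    # Prefix sums: F[j] = f_hat[0] + ... + f_hat[j-1], likewise for U.
    F = [0, *accumulate(f_hat)]
    U = [0, *accumulate(u_hat)]
    best = None
    for s in range(n):
        for t in range(s, n):
            if U[t + 1] - U[s] > K:          # additive condition u(I) > K
                value = F[t + 1] - F[s]      # additive objective f(I)
                if best is None or value > best[0]:
                    best = (value, s + 1, t + 1)
    return best

f_hat = [3, -5, 4, 2, -1]
u_hat = [1, 1, 1, 1, 1]   # u(I) is the interval length here
print(best_interval(f_hat, u_hat, K=2))  # intervals with more than 2 elements
print(best_interval(f_hat, u_hat, K=0))  # any non-empty interval
```

Setting every entry of `u_hat` to 1 turns the condition into a minimum-length requirement, which shows how the constraint can change the optimal interval.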
Md. Anisuzzaman SIDDIQUE Yasuhiko MORIMOTO
Given a set of objects, a skyline query finds the objects that are not dominated by others. In this paper, we consider a skyline query for sets of objects in a database. Let s be the number of objects in each set and n be the number of objects in the database. The number of sets in the database amounts to ${}_nC_s$. We propose an efficient algorithm to compute the convex skyline of the ${}_nC_s$ sets. We call the retrieved skyline objectsets "convex skyline objectsets". Experimental evaluation using real and synthetic datasets demonstrates that the proposed skyline objectset query is meaningful and is scalable enough to handle large and high-dimensional databases. Recently, we have had to be aware of individuals' privacy. Sometimes we have to hide individual values and are only allowed to disclose aggregated values of objects. In such situations, we cannot use conventional skyline queries. The proposed function can be a promising alternative for decision making in a privacy-aware environment.
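The idea of querying objectsets instead of individual objects can be illustrated with a small brute-force sketch. It assumes that each objectset is represented by the attribute-wise sum of its members (the abstract only says "aggregated values") and that smaller values are better. It enumerates all ${}_nC_s$ sets and computes their ordinary skyline, not the convex skyline, so it is not the efficient algorithm proposed in the paper. The names `dominates` and `objectset_skyline` and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

Vector = Tuple[float, ...]

def dominates(p: Vector, q: Vector) -> bool:
    """Conventional dominance (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def objectset_skyline(objects: Dict[str, Vector],
                      s: int) -> Dict[Tuple[str, ...], Vector]:
    """Skyline over all size-s objectsets, each aggregated by summation."""
    aggregated = {}
    for ids in combinations(sorted(objects), s):
        # Attribute-wise sum of the member objects (assumed aggregate).
        aggregated[ids] = tuple(sum(col)
                                for col in zip(*(objects[i] for i in ids)))
    return {ids: v for ids, v in aggregated.items()
            if not any(dominates(w, v) for w in aggregated.values())}

objects = {"a": (1, 4), "b": (2, 2), "c": (4, 1), "d": (5, 5)}
for ids, total in objectset_skyline(objects, 2).items():
    print(ids, total)
```

Only the aggregated vectors are compared, so individual attribute values never need to be disclosed to whoever inspects the result. The exhaustive enumeration is practical only for very small n and s.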