1-11hit |
The study proposes a personalised session-based recommender system that embeds items by using Word2Vec and sequentially updates the session and user embeddings with the hierarchicalization and aggregation of item embeddings. To process a recommendation request, the system constructs a real-time user embedding that considers users’ general preferences and sequential behaviour to handle short-term changes in user preferences with a low computational cost. The system performance was experimentally evaluated in terms of the accuracy, diversity, and novelty of the ranking of recommended items and the training and prediction times of the system for three different datasets. The results of these evaluations were then compared with those of the five baseline systems. According to the evaluation experiment, the proposed system achieved a relatively high recommendation accuracy compared with baseline systems and the diversity and novelty scores of the proposed system did not fall below 90% for any dataset. Furthermore, the training times of the Word2Vec-based systems, including the proposed system, were shorter than those of FPMC and GRU4Rec. The evaluation results suggest that the proposed recommender system succeeds in keeping the computational cost for training low while maintaining high-level recommendation accuracy, diversity, and novelty.
Ryoto OMACHI Yasuyuki MURAKAMI
The damage cost caused by malware has been increasing in the world. Usually, malwares are packed so that it is not detected. It is a hard task even for professional malware analysts to identify the packers especially when the malwares are multi-layer packed. In this letter, we propose a method to identify the packers for multi-layer packed malwares by using k-nearest neighbor algorithm with entropy-analysis for the malwares.
As the data size of Web-related multi-label classification problems continues to increase, the label space has also grown extremely large. For example, the number of labels appearing in Web page tagging and E-commerce recommendation tasks reaches hundreds of thousands or even millions. In this paper, we propose a graph partitioning tree (GPT), which is a novel approach for extreme multi-label learning. At an internal node of the tree, the GPT learns a linear separator to partition a feature space, considering approximate k-nearest neighbor graph of the label vectors. We also developed a simple sequential optimization procedure for learning the linear binary classifiers. Extensive experiments on large-scale real-world data sets showed that our method achieves better prediction accuracy than state-of-the-art tree-based methods, while maintaining fast prediction.
Vince Jebryl MONTERO Yong-Jin JEONG
This paper presents an approach for developing an algorithm for automatic license plate recognition system (ALPR) on complex scenes. A plate-style classification method is also proposed in this paper to address the inherent challenges for ALPR in a system that uses multiple plate-styles (e.g., different fonts, multiple plate lay-out, variations in character sequences) which is the case in the current Philippine license plate system. Methods are proposed for each ALPR module: plate detection, character segmentation, and character recognition. K-nearest neighbor (KNN) is used as a classifier for character recognition together with a proposed confidence scoring to rate the decision made by the classifier. A small dataset of Philippine license plates but with relevant features of complex scenarios for ALPR is prepared. Using the proposed system on the prepared dataset, the performance of the system is evaluated on different categories of complex scenes. The proposed algorithm structure shows promising results and yielded an overall accuracy higher than the existing ALPR systems on the dataset consisting mostly of complex scenes.
Extreme multi-label classification methods have been widely used in Web-scale classification tasks such as Web page tagging and product recommendation. In this paper, we present a novel graph embedding method called “AnnexML”. At the training step, AnnexML constructs a k-nearest neighbor graph of label vectors and attempts to reproduce the graph structure in the embedding space. The prediction is efficiently performed by using an approximate nearest neighbor search method that efficiently explores the learned k-nearest neighbor graph in the embedding space. We conducted evaluations on several large-scale real-world data sets and compared our method with recent state-of-the-art methods. Experimental results show that our AnnexML can significantly improve prediction accuracy, especially on data sets that have a larger label space. In addition, AnnexML improves the trade-off between prediction time and accuracy. At the same level of accuracy, the prediction time of AnnexML was up to 58 times faster than that of SLEEC, a state-of-the-art embedding-based method.
Li ZHANG Dawei LI Xuecheng ZOU Yu HU Xiaowei XU
With an annual growth of billions of sensor-based devices, it is an urgent need to do stream mining for the massive data streams produced by these devices. Cloud computing is a competitive choice for this, with powerful computational capabilities. However, it sacrifices real-time feature and energy efficiency. Application-specific integrated circuit (ASIC) is with high performance and efficiency, which is not cost-effective for diverse applications. The general-purpose microcontroller is of low performance. Therefore, it is a challenge to do stream mining on these low-cost devices with scalability and efficiency. In this paper, we introduce an FPGA-based scalable and parameterized architecture for stream mining.Particularly, Dynamic Time Warping (DTW) based k-Nearest Neighbor (kNN) is adopted in the architecture. Two processing element (PE) rings for DTW and kNN are designed to achieve parameterization and scalability with high performance. We implement the proposed architecture on an FPGA and perform a comprehensive performance evaluation. The experimental results indicate thatcompared to the multi-core CPU-based implementation, our approach demonstrates over one order of magnitude on speedup and three orders of magnitude on energy-efficiency.
Eiji UCHINO Ryosuke KUBOTA Takanori KOGA Hideaki MISAWA Noriaki SUETAKE
In this paper we propose a novel classification method for the multiple k-nearest neighbor (MkNN) classifier and show its practical application to medical image processing. The proposed method performs fine classification when a pair of the spatial coordinate of the observation data in the observation space and its corresponding feature vector in the feature space is provided. The proposed MkNN classifier uses the continuity of the distribution of features of the same class not only in the feature space but also in the observation space. In order to validate the performance of the present method, it is applied to the tissue characterization problem of coronary plaque. The quantitative and qualitative validity of the proposed MkNN classifier have been confirmed by actual experiments.
Tomohiro WARASHINA Kazuo AOYAMA Hiroshi SAWADA Takashi HATTORI
This paper presents an efficient method using Hadoop MapReduce for constructing a K-nearest neighbor graph (K-NNG) from a large-scale data set. K-NNG has been utilized as a data structure for data analysis techniques in various applications. If we are to apply the techniques to a large-scale data set, it is desirable that we develop an efficient K-NNG construction method. We focus on NN-Descent, which is a recently proposed method that efficiently constructs an approximate K-NNG. NN-Descent is implemented on a shared-memory system with OpenMP-based parallelization, and its extension for the Hadoop MapReduce framework is implied for a larger data set such that the shared-memory system is difficult to deal with. However, a simple extension for the Hadoop MapReduce framework is impractical since it requires extremely high system performance because of the high memory consumption and the low data transmission efficiency of MapReduce jobs. The proposed method relaxes the requirement by improving the MapReduce jobs, which employs an appropriate key-value pair format and an efficient sampling strategy. Experiments on large-scale data sets demonstrate that the proposed method both works efficiently and is scalable in terms of a data size, the number of machine nodes, and the graph structural parameter K.
Kenshi SAHO Takuya SAKAMOTO Toru SATO Kenichi INOUE Takeshi FUKUDA
The classification of human motion is an important aspect of monitoring pedestrian traffic. This requires the development of advanced surveillance and monitoring systems. Methods to achieve this have been proposed using micro-Doppler radars. However, reliable long-term data and/or complicated procedures are needed to classify motion accurately with these conventional methods because their accuracy and real-time capabilities are invariably inadequate. This paper proposes an accurate and real-time method for classifying the movements of pedestrians using ultra wide-band (UWB) Doppler radar to overcome these problems. The classification of various movements is achieved by extracting feature parameters based on UWB Doppler radar images and their radial velocity distributions. Experiments were carried out assuming six types of pedestrian movements (pedestrians swinging both arms, swinging only one arm, swinging no arms, on crutches, pushing wheelchairs, and seated in wheelchairs). We found they could be classified using the proposed feature parameters and a k-nearest neighbor algorithm. A classification accuracy of 96% was achieved with a mean calculation time of 0.55s. Moreover, the classification accuracy was 99% using our proposed method for classifying three groups of pedestrian movements (normal walkers, those on crutches, and those in wheelchairs).
Biao WANG Wenming YANG Weifeng LI Qingmin LIAO
In the task of face recognition, a challenging issue is the one sample problem, namely, there is only one training sample per person. Principal component analysis (PCA) seeks a low-dimensional representation that maximizes the global scatter of the training samples, and thus is suitable for one sample problem. However, standard PCA is sensitive to the outliers and emphasizes more on the relatively distant sample pairs, which implies that the close samples belonging to different classes tend to be merged together. In this paper, we propose two-stage block-based whitened PCA (TS-BWPCA) to address this problem. For a specific probe image, in the first stage, we seek the K-Nearest Neighbors (K-NNs) in the whitened PCA space and thus exclude most of samples which are distant to the probe. In the second stage, we maximize the “local” scatter by performing whitened PCA on the K nearest samples, which could explore the most discriminative information for similar classes. Moreover, block-based scheme is incorporated to address the small sample problem. This two-stage process is actually a coarse-to-fine scheme that can maximize both global and local scatter, and thus overcomes the aforementioned shortcomings of PCA. Experimental results on FERET face database show that our proposed algorithm is better than several representative approaches.
Ruicong ZHI Qiuqi RUAN Jiying WU
This paper proposes a novel algorithm for image feature extraction-the dual two-dimensional fuzzy class preserving projections ((2D)2FCPP). The main advantages of (2D)2FCPP over two-dimensional locality preserving projections (2DLPP) are: (1) utilizing the fuzzy assignation mechanisms to construct the weight matrix, which can improve the classification results; (2) incorporating 2DLPP and alternative 2DLPP to get a more efficient dimensionality reduction method-(2D)2LPP.