Kazuo AOYAMA Kazumi SAITO Tetsuo IKEDA
This paper presents an efficient acceleration algorithm for Lloyd-type k-means clustering, which is suitable for large-scale and high-dimensional data sets with potentially numerous classes. The algorithm employs a novel projection-based filter (PRJ) to avoid unnecessary distance calculations, achieving high speed while producing the same results as the standard Lloyd's algorithm. The PRJ exploits a summable lower bound on a squared distance defined in a lower-dimensional space to which data points are projected. The summable lower bound can be tightened dynamically within each iteration by incrementally adding components in the lower-dimensional space, whereas the existing lower bounds used in other acceleration algorithms work only once as a fixed filter. Experimental results on large-scale and high-dimensional real image data sets demonstrate that the proposed algorithm works at high speed and with low memory consumption when large k values are given, compared with the state-of-the-art algorithms.
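The idea of a summable lower bound can be illustrated with a minimal sketch. It assumes the projection is given by a matrix with orthonormal columns, so that every partial sum of squared projected differences is a lower bound on the full squared distance. The function name `assign_with_projection_filter` and the way the basis is chosen are hypothetical and are not taken from the paper.

```python
import numpy as np

def assign_with_projection_filter(X, C, basis):
    """Exact nearest-centroid assignment with a summable lower-bound filter.

    X: (n, d) data points, C: (k, d) centroids.
    basis: (d, p) matrix with orthonormal columns (an assumption of this sketch).
    Returns (labels, number of exact distance computations).
    """
    Xp, Cp = X @ basis, C @ basis          # coordinates in the projected space
    labels = np.empty(len(X), dtype=int)
    exact_count = 0
    for i in range(len(X)):
        best, best_j = np.inf, -1
        for j in range(len(C)):
            # Add squared differences one projected component at a time.
            # Each partial sum is a valid lower bound on ||x - c||^2,
            # and it only grows as more components are added.
            bound, pruned = 0.0, False
            for t in range(basis.shape[1]):
                bound += (Xp[i, t] - Cp[j, t]) ** 2
                if bound >= best:
                    pruned = True          # centroid j cannot beat the current best
                    break
            if pruned:
                continue
            dist = float(np.sum((X[i] - C[j]) ** 2))   # exact squared distance
            exact_count += 1
            if dist < best:
                best, best_j = dist, j
        labels[i] = best_j
    return labels, exact_count
```

Because a centroid is skipped only when its lower bound already reaches the best distance found so far, the assignment matches a brute-force search; only the number of exact distance computations changes.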
Ensemble learning is widely used in the field of sensor network monitoring and target identification. To improve the generalization ability and classification precision of ensemble learning, we first propose an approximate attribute reduction algorithm based on rough sets in this paper. The reduction algorithm uses mutual information to measure attribute importance and introduces a correction coefficient and an approximation parameter. Based on a random sampling strategy, we use the approximate attribute reduction algorithm to implement the multi-modal sample space perturbation. To further reduce the ensemble size and realize a dynamic subset of base classifiers that best matches the test sample, we define a similarity parameter between the test samples and training sample sets that takes the similarity and number of the training samples into consideration. We then propose a k-means clustering-based dynamic ensemble selection algorithm. Simulations show that the multi-modal perturbation method effectively selects important attributes and reduces the influence of noise on the classification results. The classification precision and runtime of experiments demonstrate the effectiveness of the proposed dynamic ensemble selection algorithm.
Tsuyoshi HIGASHIGUCHI Norimichi UKITA Masayuki KANBARA Norihiro HAGITA
This paper proposes a method for predicting individuality-preserving gait patterns. Physical rehabilitation can be performed using visual and/or physical instructions by physiotherapists or exoskeletal robots. However, a template-based rehabilitation may produce discomfort and pain in a patient because of deviations from the natural gait of each patient. Our work addresses this problem by predicting an individuality-preserving gait pattern for each patient. In this prediction, the transition of the gait patterns is modeled by associating the sequence of a 3D skeleton in gait with its continuous-value gait features (e.g., walking speed or step width). In the space of the prediction model, the arrangement of the gait patterns is optimized so that (1) similar gait patterns are close to each other and (2) the gait feature changes smoothly between neighboring gait patterns. This model allows us to predict individuality-preserving gait patterns of each patient even if his/her various gait patterns are not available for prediction. The effectiveness of the proposed method is demonstrated quantitatively with two datasets.
Koya MITSUZUKA Michihiro KOIBUCHI Hideharu AMANO Hiroki MATSUTANI
In parallel processing applications, a few worker nodes called “stragglers”, which execute their tasks significantly slower than other nodes, increase the execution time of the job. In this paper, we propose a network-switch-based straggler handling system to mitigate the burden on the compute nodes. We also propose a way to offload straggler detection and the computation of their results to the network switch with no additional communications between worker nodes. We introduce some approximation techniques for the proxy computation and response at the switch; thus our switch is called “ApproxSW.” In a simulation experiment, the proposed approximation based on task similarity achieves the best accuracy in terms of the quality of generated Map outputs. We also analyze how to suppress unnecessary proxy computation by the ApproxSW. We implement ApproxSW on a NetFPGA-SUME board that has four 10Gbit Ethernet (10GbE) interfaces and a Virtex-7 FPGA. Experimental results show that the ApproxSW functions do not degrade the original 10GbE switch performance.
Takashi WATANABE Akito MONDEN Zeynep YÜCEL Yasutaka KAMEI Shuji MORISAKI
Association rule mining discovers relationships among variables in a data set, representing them as rules. These rules are often expected to have predictive ability, that is, to be able to predict future events, but commonly used rule interestingness measures, such as support and confidence, do not directly assess their predictive power. This paper proposes a cross-validation-based metric that quantifies the predictive power of such rules for characterizing software defects. An experimental evaluation of this metric using four open-source data sets (Mylyn, NetBeans, Apache Ant and jEdit) shows that it can improve rule prioritization performance over conventional metrics (support, confidence and odds ratio) by 72.8% for Mylyn, 15.0% for NetBeans, 10.5% for Apache Ant and 0% for jEdit in terms of the SumNormPre(100) precision criterion. This suggests that the proposed metric can provide better rule prioritization performance than conventional metrics and can at least provide similar performance even in the worst case.
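For readers unfamiliar with the conventional measures mentioned above, the following sketch computes support and confidence for a single rule. It does not reproduce the proposed cross-validation-based metric, whose definition is not given in the abstract; the function name and the toy item names are hypothetical.

```python
def support_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    rows: list of sets of items; antecedent, consequent: sets of items.
    """
    n_ante = sum(1 for r in rows if antecedent <= r)                 # rows matching the left side
    n_both = sum(1 for r in rows if (antecedent | consequent) <= r)  # rows matching both sides
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy data: each row describes one module (illustrative only).
rows = [
    {"large_change", "defect"},
    {"large_change", "defect"},
    {"large_change"},
    {"small_change"},
]
support, confidence = support_confidence(rows, {"large_change"}, {"defect"})
```

Both values are computed on the same data that produced the rule, which is why they say nothing directly about how the rule behaves on unseen data.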
Yuehua WANG Zhinong ZHONG Anran YANG Ning JING
Review rating prediction is an important problem in machine learning and data mining areas and has attracted much attention in recent years. Most existing methods for review rating prediction on Location-Based Social Networks only capture the semantics of texts, but ignore user information (social links, geolocations, etc.), which makes them less personalized and brings down the prediction accuracy. For example, a user's visit to a venue may be influenced by their friends' suggestions or the travel distance to the venue. To address this problem, we develop a review rating prediction framework named TSG by utilizing users' review Text, Social links and the Geolocation information with machine learning techniques. Experimental results demonstrate the effectiveness of the framework.
Warunya WUNNASRI Jaruwat PAILAI Yusuke HAYASHI Tsukasa HIRASHIMA
Collaborative learning is an active teaching and learning strategy, in which learners who give each other elaborated explanations can learn the most. However, it is difficult for learners to explain their own understanding elaborately in collaborative learning. In this study, we propose a collaborative use of a Kit-Build concept map (KB map) called “Reciprocal KB map”. In a Reciprocal KB map for a pair discussion, the two participants first make their own concept maps expressing their comprehension. Then, they exchange the components of their maps and request each other to reconstruct their maps by using the components. The differences between the original map and the reconstructed map are diagnosed automatically as an advantage of the KB map. Reciprocal KB map is expected to encourage pair discussion to recognize the understanding of each other and to create an effective discussion. In an experiment reported in this paper, Reciprocal KB map was used for supporting a pair discussion and was compared with a pair discussion which was supported by a traditional concept map. Nineteen pairs of university students were requested to use the traditional concept map in their discussion, while 20 pairs of university students used Reciprocal KB map for discussing the same topic. The results of the experiment were analyzed using three metrics: a discussion score, a similarity score, and questionnaires. The discussion score, which investigates the value of talk in discussion, demonstrates that Reciprocal KB map can promote more effective discussion between the partners compared to the traditional concept map. The similarity score, which evaluates the similarity of the concept maps, demonstrates that Reciprocal KB map can encourage the pair of partners to understand each other better compared to the traditional concept map. Last, the questionnaires illustrate that Reciprocal KB map can support the pair of partners to collaborate in the discussion smoothly and that the participants accepted this method for sharing their understanding with each other. These results suggest that Reciprocal KB map is a promising approach for encouraging pairs of partners to understand each other and for promoting effective discussions.
To obtain road information, we propose an information acquisition method that uses an infrared laser radar to detect a 3D reflector code on the roadside. The infrared laser radar mounted on a vehicle scans the 3D reflector code on a guardrail. Through experiments, we show that the proposed method is able to obtain road information by detecting the 3D reflector code on the guardrail.
Takayoshi SHOUDAI Tetsuhiro MIYAHARA Tomoyuki UCHIDA Satoshi MATSUMOTO Yusuke SUZUKI
A term is a connected acyclic graph (unrooted unordered tree) pattern with structured variables, which are ordered lists of one or more distinct vertices. A variable of a term has a variable label and can be replaced with an arbitrary tree by hyperedge replacement according to the variable label. The dimension of a term is the maximum number of vertices in its variables. A term is said to be linear if each variable label in it occurs exactly once. Let T be a tree and t a linear term. In this paper, we study the graph pattern matching problem (GPMP) for T and t, which decides whether or not T is obtained from t by replacing variables in t with some trees. First, we show that GPMP for T and t is NP-complete if the dimension of t is greater than or equal to 4. Next, we give a polynomial time algorithm for solving GPMP for a tree of bounded degree and a linear term of bounded dimension. Finally, we show that GPMP for a tree of arbitrary degree and a linear term of dimension 2 is solvable in polynomial time.
Multisignatures are digital signatures for a group consisting of multiple signers where each signer signs common documents via interaction with its co-signers and the data size of the resultant signatures for the group is independent of the number of signers. In this work, we propose a multisignature scheme, whose security can be tightly reduced to the CDH problem in bilinear groups, in the strongest security model where nothing more is required than that each signer has a public key, i.e., the plain public key model. Loosely speaking, our main idea for a tight reduction is to utilize a three-round interaction in a full-domain hash construction. Namely, we surmise that a full-domain hash construction with three-round interaction will become tightly secure under the CDH problem. In addition, we show that the existing scheme by Zhou et al. (ISC 2011) can be improved to a construction with a tight security reduction as an application of our proof framework.
Geunseok YANG Tao ZHANG Byungjeong LEE
Many software development teams tend to focus on maintenance activities. Recently, many studies on bug severity prediction have been proposed to help a bug reporter determine severity. However, they do not consider the reporter's expression of emotion appearing in the bug report when predicting the bug severity level. In this paper, we propose a novel approach to severity prediction for reported bugs by using emotion similarity. First, we not only compute an emotion-word probability vector by using a smoothed unigram model (UM), but also use the new bug report to find similar-emotion bug reports with Kullback-Leibler divergence (KL-divergence). Then, we introduce a new algorithm, Emotion Similarity (ES)-Multinomial, which modifies the original Naïve Bayes Multinomial algorithm. We train the model with emotion bug reports by using ES-Multinomial. Finally, we predict the bug severity level of the new bug report. To compare the performance in bug severity prediction, we select related studies including Emotion Words-based Dictionary (EWD)-Multinomial, Naïve Bayes Multinomial, and another study as baseline approaches in open source projects (e.g., Eclipse, GNU, JBoss, Mozilla, and WireShark). The results show that our approach outperforms the baselines and can reflect reporters' emotional expressions during bug reporting.
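The first step described above, comparing emotion-word distributions with KL-divergence, can be sketched as follows. The sketch assumes additive (Laplace) smoothing and a tiny hand-made emotion vocabulary; the paper's actual smoothing scheme and dictionary are not stated in the abstract, so the names and parameters here are hypothetical.

```python
import math
from collections import Counter

def smoothed_unigram(tokens, vocab, alpha=1.0):
    """Emotion-word probability vector with additive smoothing (assumed scheme)."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Hypothetical emotion vocabulary and toy bug reports.
vocab = ["angry", "crash", "fine", "hate"]
p_new = smoothed_unigram("crash hate angry crash".split(), vocab)
p_a = smoothed_unigram("hate crash crash".split(), vocab)
p_b = smoothed_unigram("fine fine".split(), vocab)

# A smaller divergence means the emotion profile is closer to the new report.
closest = min([("a", p_a), ("b", p_b)], key=lambda item: kl_divergence(p_new, item[1]))[0]
```

Smoothing keeps every probability strictly positive, so the logarithm in the divergence is always defined.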
Lei ZHANG Guoxing ZHANG Zhizheng LIANG Qingfu FAN Yadong LI
The traditional Markov prediction methods for taxi destinations rely only on the previous 2 to 3 GPS points. They neglect long-term dependencies within a taxi trajectory. We adopt a Recurrent Neural Network (RNN) to explore the long-term dependencies to predict the taxi destination, as the multiple hidden layers of an RNN can store these dependencies. However, the hidden layers of an RNN are very sensitive to small perturbations, which reduces the prediction accuracy as the number of taxi trajectories increases. To improve the prediction accuracy of the taxi destination and reduce the training time, we embed surprisal-driven zoneout (SDZ) into the RNN, yielding a taxi destination prediction method based on a regularized RNN with SDZ (TDPRS). SDZ not only improves the robustness of TDPRS but also reduces the training time by adopting a partial update of parameters instead of a full update. Experiments with a Porto taxi trajectory data set show that TDPRS improves the prediction accuracy by 12% compared to the RNN prediction method in the literature [4]. At the same time, the prediction time is reduced by 7%.
Kenichi FUKUDA Toshimitsu USHIO
A composite system consists of many subsystems, which have interconnections with other subsystems. For such a system, in general, we utilize decentralized control, where each subsystem is controlled by a local controller. On the other hand, event-triggered control is one of the useful approaches to reduce the amount of communication between a controller and a plant. In event-triggered control, an event-triggering mechanism (ETM) monitors the information of the plant and determines the time to transmit the data. In this paper, we propose a design of ETMs for the decentralized event-triggered control of nonlinear composite systems using an M-matrix. We consider the composite system where there is an ETM for each subsystem, and the ETMs monitor local states of the corresponding subsystems. Each ETM is designed so that the composite system is stabilized. Moreover, we deal with the case of linear systems. Finally, we perform a simulation to show that the proposed triggering rules are useful for decentralized control.
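The role of an ETM can be illustrated with a minimal single-subsystem sketch. It assumes a scalar linear plant, a static state-feedback gain, and a commonly used relative-threshold triggering rule; it is not the decentralized M-matrix design proposed in the paper, and all parameter values are hypothetical.

```python
def simulate_event_triggered(a=1.0, b=1.0, k=3.0, sigma=0.2,
                             x0=1.0, dt=0.001, steps=5000):
    """Simulate x' = a*x + b*u with u = -k * (last transmitted state).

    The ETM transmits the state only when the gap between the held state
    and the current state exceeds sigma * |x| (assumed triggering rule).
    Returns (final state, number of transmissions).
    """
    x = x0
    x_sent = x0          # last state transmitted to the controller
    transmissions = 1    # the initial state is sent once
    for _ in range(steps):
        # ETM: transmit only when the held state has drifted too far.
        if abs(x_sent - x) > sigma * abs(x):
            x_sent = x
            transmissions += 1
        u = -k * x_sent                  # controller uses the held state
        x += dt * (a * x + b * u)        # forward-Euler plant update
    return x, transmissions
```

With these values the unstable open-loop plant is driven toward the origin while the state is transmitted at only a small fraction of the simulation steps, which is the communication saving that event-triggered control aims for.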
Youngjun YOO Daesung JUNG Sangchul WON
We propose a weighted subtask controller and sufficient conditions for boundedness of the controller in both the velocity and acceleration domains. Prior to designing the subtask controller, a task controller is designed for global asymptotic stability of task space error and subtask error. Although the subtask error converges to zero by the task controller, the boundedness of the subtask controller is also important; therefore, its boundedness conditions are presented. The weighted pseudo inverse is introduced to relax the constraints of the null-space of the Jacobian. Using the pseudo inverse, we design a subtask controller and propose sufficient conditions for boundedness of the auxiliary signal to show the existence of the inverse kinematic solution. The results of experiments using a 7-DOF WAM show the effectiveness of the proposed controller.
Lei ZHANG Qingfu FAN Guoxing ZHANG Zhizheng LIANG
Existing trajectory prediction methods suffer from “data sparsity” and neglect “time awareness”, which leads to low accuracy. To address this problem, we propose a fast time-aware sparse trajectory prediction method with tensor factorization (TSTP-TF). First, we perform trajectory synthesis based on trajectory entropy and put the synthesized trajectories into the original trajectory space. This resolves the sparsity problem of trajectory data and makes the new trajectory space more reliable. Then, we introduce multidimensional tensor modeling into the Markov model to add the time dimension. Tensor factorization is adopted to infer the missing region transition probabilities to further address data sparsity. Due to the scale of the tensor, we design a divide-and-conquer tensor factorization model to reduce memory consumption and speed up decomposition. Experiments with a real data set show that TSTP-TF improves prediction accuracy by as much as 9% and 2% compared to the Baseline algorithm and the ESTP-MF algorithm, respectively.
Kunihiro NODA Takashi KOBAYASHI Noritoshi ATSUMI
Behaviors of an object-oriented system can be visualized as reverse-engineered sequence diagrams from execution traces. This approach is a valuable tool for program comprehension tasks. However, owing to the massive amount of information contained in an execution trace, a reverse-engineered sequence diagram is often afflicted by a scalability issue. To address this issue, many trace summarization techniques have been proposed. Most of the previous techniques focused on reducing the vertical size of the diagram. To cope with the scalability issue, decreasing the horizontal size of the diagram is also very important. Nonetheless, few studies have addressed this point; thus, there is a strong need for further development of horizontal summarization techniques. We present in this paper a method for identifying core objects for trace summarization by analyzing reference relations and dynamic properties. By visualizing only interactions related to core objects, we can obtain a horizontally compactified reverse-engineered sequence diagram that contains a system's key behaviors. To identify core objects, we first detect and eliminate temporary objects that are trivial for a system by analyzing reference relations and lifetimes of objects. Then, estimating the importance of each non-trivial object based on its dynamic properties, we identify highly important ones (i.e., core objects). We implemented our technique in our tool and evaluated it by using traces from various open-source software systems. The results showed that our technique was much more effective in terms of the horizontal reduction of a reverse-engineered sequence diagram, compared with the state-of-the-art trace summarization technique. The horizontal compression ratio of our technique was 134.6 on average, whereas that of the state-of-the-art technique was 11.5. The runtime overhead imposed by our technique was 167.6% on average. This overhead is relatively small compared with recent scalable dynamic analysis techniques, which shows the practicality of our technique. Overall, our technique can achieve a significant reduction of the horizontal size of a reverse-engineered sequence diagram with a small overhead and is expected to be a valuable tool for program comprehension.
Naohiro TODA Tetsuya NAKAGAMI Yoichi YAMAZAKI Hiroki YOSHIOKA Shuji KOYAMA
In X-ray computed tomography, scattered X-rays are generally removed by using a post-patient collimator located in front of the detector. In this paper, we show that the scattered X-rays have the potential to improve the estimation accuracy of the attenuation coefficient in computed tomography. In order to clarify the problem, we simplified the geometry of the computed tomography into a thin cylinder composed of a homogeneous material so that only one attenuation coefficient needs to be estimated. We then conducted a Monte Carlo numerical experiment on improving the estimation accuracy of the attenuation coefficient by measuring the scattered X-rays with several dedicated toroidal detectors around the cylinder in addition to the primary X-rays. We further present a theoretical analysis to explain the experimental results. We employed a model that uses a T-junction (i.e., T-junction model) to divide the photon transport into primary and scattered components. This division is processed with respect to the attenuation coefficient. Using several T-junction models connected in series, we modeled the case of several scatter detectors. The estimation accuracy was evaluated according to the variance of the efficient estimator, i.e., the Cramér-Rao lower bound. We confirmed that the variance decreases as the number of scatter detectors increases, which implies that using scattered X-rays can reduce the irradiation dose for patients.
Tatsuro KOJO Masashi TAWADA Masao YANAGISAWA Nozomu TOGAWA
Non-volatile memories are a promising alternative in memory design, but data stored in them may still be destroyed due to crosstalk and radiation. The stored data can be restored by using error-correcting codes, but these require extra bits to correct bit errors. One of the largest problems in non-volatile memories is that they consume ten to a hundred times more energy than normal memories in bit-writing. It is therefore necessary to reduce the number of writing bits. Recently, a REC code (bit-write-reducing and error-correcting code) was proposed for non-volatile memories, which can reduce writing bits and has error-correction capability. The REC code is generated from a linear systematic error-correcting code, but it must include the codeword of all 1's, i.e., 11…1. The codeword bit length must be longer in order to satisfy this condition. In this letter, we propose a method to generate a relaxed REC code from a relaxed error-correcting code, which does not necessarily include the codeword of all 1's and thus its codeword bit length can be shorter. We prove that the maximum number of flipping bits of the relaxed REC code is still theoretically bounded. Experimental results show that the relaxed REC code efficiently reduces the number of writing bits.
Tianyi XIE Bin LYU Zhen YANG Feng TIAN
In this letter, we study a wireless powered communication network (WPCN) with non-orthogonal multiple access (NOMA), where a user clustering scheme that groups every two users into a cluster is adopted to guarantee the system performance. The two users in a cluster transmit data simultaneously via NOMA, while time division multiple access (TDMA) is used among clusters. We aim to maximize the system throughput by finding the optimal cluster permutation and the optimal time allocation, which can be obtained by solving the optimization problems corresponding to all cluster permutations. The closed-form solution of each optimization problem is obtained by exploiting its constraint structures. Because the complexity of this exhaustive method is quite high, we further propose a sub-optimal clustering scheme with low complexity. The simulation results demonstrate the superiority of the proposed scheme.
Bo WEI Kenji KANAI Wataru KAWAKAMI Jiro KATTO
Throughput prediction is one of the promising techniques to improve the quality of service (QoS) and quality of experience (QoE) of mobile applications. To address the problem of accurately predicting the future throughput distribution during the whole session, which can exhibit large throughput fluctuations in different scenarios (especially scenarios with moving users), we propose a history-based throughput prediction method that utilizes time series analysis and machine learning techniques for mobile network communication. This method is called the Hybrid Prediction with the Autoregressive Model and Hidden Markov Model (HOAH). Unlike existing methods, HOAH uses a Support Vector Machine (SVM) to classify the throughput transition into two classes, and predicts the transmission control protocol (TCP) throughput by switching between the Autoregressive Model (AR Model) and the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM). We conduct field experiments to evaluate the proposed method in seven different scenarios. The results show that HOAH can predict future throughput effectively and decreases the prediction error by a maximum of 55.95% compared with other methods.
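Of the components named above, the autoregressive part is the simplest to illustrate. The sketch below fits an AR model with an intercept by ordinary least squares and predicts one step ahead; it covers only this one building block, not the SVM-based switching or the GMM-HMM, and the function names and toy series are hypothetical.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of x[t] = c + a1*x[t-1] + ... + ap*x[t-p].

    Returns the coefficient vector [c, a1, ..., ap].
    """
    x = np.asarray(series, dtype=float)
    # The row for target x[t] holds [1, x[t-1], ..., x[t-order]].
    rows = [np.concatenate(([1.0], x[t - order:t][::-1]))
            for t in range(order, len(x))]
    coef, *_ = np.linalg.lstsq(np.vstack(rows), x[order:], rcond=None)
    return coef

def predict_next(series, coef):
    """One-step-ahead prediction from the most recent samples."""
    order = len(coef) - 1
    recent = np.asarray(series[-order:], dtype=float)[::-1]   # newest sample first
    return float(coef[0] + coef[1:] @ recent)


# Toy throughput history generated by x[t] = 1 + 0.5 * x[t-1] (illustrative only).
history = [0.0]
for _ in range(9):
    history.append(1.0 + 0.5 * history[-1])
coef = fit_ar(history, order=1)
forecast = predict_next(history, coef)
```

A linear model of this kind suits stretches where throughput changes smoothly, which is consistent with the abstract's choice to switch to a different model for the other class of transitions.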