Keyword Search Result

[Keyword] hidden Markov model (71 hits)


  • Construction of Ergodic GMM-HMMs for Classification between Healthy Individuals and Patients Suffering from Pulmonary Disease (Open Access)

    Masaru YAMASHITA  

    PAPER-Pattern Recognition

    E107-D No:4

    Lung sounds of patients suffering from pulmonary disease often include abnormal sounds, called adventitious sounds; the objective of this study was therefore to automatically detect abnormal sounds in auscultatory sounds. To this end, we expressed the acoustic features of the normal lung sounds of healthy people and the abnormal lung sounds of patients using Gaussian mixture model (GMM)-hidden Markov models (HMMs), and distinguished between normal and abnormal lung sounds. In our previous study, we constructed left-to-right GMM-HMMs with a limited number of states. Because we expressed abnormal sounds that occur intermittently and repeatedly using limited states, the GMM-HMMs could not express the acoustic features of abnormal sounds. Furthermore, because the analysis frame length and intervals were long, the GMM-HMMs could not express the acoustic features of short time segments, such as heart sounds. Therefore, the classification rate of normal and abnormal respiration was low (86.60%). In this study, we propose the construction of ergodic GMM-HMMs with a repetitive structure for intermittent sounds. Furthermore, we considered a suitable frame length and frame interval to analyze acoustic features. Using the ergodic GMM-HMM, which can express in detail the acoustic features of abnormal sounds and heart sounds that occur repeatedly, the classification rate increased (89.34%). The results obtained in this study demonstrated the effectiveness of the proposed method.
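
The difference between the two topologies named in the abstract above can be sketched with plain transition matrices. This is an illustrative sketch only: the function names, the uniform initial probabilities, and the `stay` parameter are assumptions made here, and the paper's ergodic model with a repetitive structure is not reproduced. The sketch shows one property: a left-to-right chain can never return to a state it has left, whereas an ergodic chain can, which is what allows a sound that recurs intermittently to be modelled by the same state each time.

```python
def left_to_right_transitions(n_states, stay=0.5):
    """Each state either stays put or advances to the next; the last state absorbs."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0
        else:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
    return A


def ergodic_transitions(n_states):
    """Every state can move to every state (uniform initial values, an assumption)."""
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]


def can_revisit(A, start):
    """True if the chain can leave `start` and later come back to it."""
    n = len(A)
    frontier = [j for j in range(n) if j != start and A[start][j] > 0]
    seen = set(frontier)
    while frontier:
        i = frontier.pop()
        for j in range(n):
            if A[i][j] > 0:
                if j == start:
                    return True
                if j not in seen:
                    seen.add(j)
                    frontier.append(j)
    return False


if __name__ == "__main__":
    print(can_revisit(left_to_right_transitions(3), 0))  # False
    print(can_revisit(ergodic_transitions(3), 0))        # True
```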

  • Acoustic HMMs to Detect Abnormal Respiration with Limited Training Data

    Masaru YAMASHITA  

    PAPER-Pattern Recognition

    E106-D No:3

    In many situations, abnormal sounds, called adventitious sounds, are included with the lung sounds of a subject suffering from pulmonary diseases. Thus, a method to automatically detect abnormal sounds in auscultation was proposed. The acoustic features of normal lung sounds for control subjects and abnormal lung sounds for patients are expressed using hidden Markov models (HMMs) to distinguish between normal and abnormal lung sounds. Furthermore, abnormal sounds were detected in a noisy environment, including heart sounds, using a heart-sound model. However, the F1-score obtained in detecting abnormal respiration was low (0.8493). Moreover, the duration and acoustic properties of segments of respiratory, heart, and adventitious sounds varied. In our previous method, the appropriate HMMs for the heart and adventitious sound segments were constructed. Although the properties of the types of adventitious sounds varied, an appropriate topology for each type was not considered. In this study, appropriate HMMs for the segments of each type of adventitious sound and other segments were constructed. The F1-score was increased (0.8726) by selecting a suitable topology for each segment. The results demonstrate the effectiveness of the proposed method.

  • Predicting A Growing Stage of Rice Plants Based on The Cropping Records over 25 Years — A Trial of Feature Engineering Incorporating Hidden Regional Characteristics —

    Hiroshi UEHARA  Yasuhiro IUCHI  Yusuke FUKAZAWA  Yoshihiro KANETA  


    E105-D No:5

    This study attempts to predict the date of ear emergence of rice plants based on cropping records spanning 25 years. Predicting ear emergence is known to be crucial for good harvest quality, and has long depended on veteran farmers who have acquired the skill of intuitive prediction through long experience. As farmers age, data-driven approaches to this prediction have been pursued. Nevertheless, they are not necessarily sufficient for practical use. One issue is that they adopt weather forecasts as features, so predictive performance varies with the accuracy of the forecast. The other issue is that performance varies by region, yet regional characteristics have not been used as features for the prediction. Against this background, we propose a feature engineering method that quantifies hidden regional characteristics as a feature for the prediction. Furthermore, the feature is engineered based only on observational data, without any forecast. Applying our proposal to the cropping-record data resulted in sufficient predictive performance, with an RMSE of ±2.69 days.

  • Proposing High-Smart Approach for Content Authentication and Tampering Detection of Arabic Text Transmitted via Internet

    Fahd N. AL-WESABI  

    PAPER-Information Network

    E103-D No:10

    The security and reliability of Arabic text exchanged via the Internet have become a challenging area for the research community. Arabic text is highly sensitive to modification by malicious attacks, and it is easy to alter diacritics, i.e., Fat-ha, Kasra and Damma, which represent the syntax of the Arabic language and whose alteration can change the meaning. In this paper, a Hybrid of Natural Language Processing and Zero-Watermarking Approach (HNLPZWA) is proposed for the content authentication and tampering detection of Arabic text. The HNLPZWA approach embeds and detects the watermark logically, without altering the original text document to embed a watermark key. A fifth-level order word mechanism based on a hidden Markov model is integrated with digital zero-watermarking techniques to improve on the tampering detection accuracy reported in the previous literature. The fifth-level order Markov model is used as a natural language processing technique to analyze the Arabic text. Moreover, it extracts the features of the interrelationships between contexts of the text, utilizes the extracted features as watermark information, and later validates them against the attacked Arabic text to detect any tampering that has occurred. HNLPZWA has been implemented using PHP with the VS Code IDE. The tampering detection accuracy of HNLPZWA is demonstrated in experiments using four datasets of varying lengths under insertion, reorder and deletion attacks at multiple random locations. The experimental results show that HNLPZWA is more sensitive to all kinds of tampering attacks, with a high level of tampering detection accuracy.

  • HOAH: A Hybrid TCP Throughput Prediction with Autoregressive Model and Hidden Markov Model for Mobile Networks

    Bo WEI  Kenji KANAI  Wataru KAWAKAMI  Jiro KATTO  


    E101-B No:7

    Throughput prediction is one of the promising techniques to improve the quality of service (QoS) and quality of experience (QoE) of mobile applications. To address the problem of accurately predicting the future throughput distribution over a whole session, which can exhibit large throughput fluctuations in different scenarios (especially scenarios with moving users), we propose a history-based throughput prediction method that utilizes time series analysis and machine learning techniques for mobile network communication. This method is called the Hybrid Prediction with the Autoregressive Model and Hidden Markov Model (HOAH). Unlike existing methods, HOAH uses a Support Vector Machine (SVM) to classify the throughput transition into two classes, and predicts the transmission control protocol (TCP) throughput by switching between the Autoregressive Model (AR Model) and the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM). We conduct field experiments to evaluate the proposed method in seven different scenarios. The results show that HOAH can predict future throughput effectively and decreases the prediction error by a maximum of 55.95% compared with other methods.

  • Workflow Extraction for Service Operation Using Multiple Unstructured Trouble Tickets

    Akio WATANABE  Keisuke ISHIBASHI  Tsuyoshi TOYONO  Keishiro WATANABE  Tatsuaki KIMURA  Yoichi MATSUO  Kohei SHIOMOTO  Ryoichi KAWAHARA  


    E101-D No:4

    In current large-scale IT systems, troubleshooting has become more complicated due to the diversification in the causes of failures, which has increased operational costs. Thus, clarifying the troubleshooting process also becomes important, though it is also time-consuming. We propose a method of automatically extracting a workflow, a graph indicating a troubleshooting process, using multiple trouble tickets. Our method extracts an operator's actions from free-format texts and aligns relative sentences between multiple trouble tickets. Our method uses a stochastic model to detect a resolution, a frequent action pattern that helps us understand how to solve a problem. We validated our method using real trouble-ticket data captured from a real network operation and showed that it can extract a workflow to identify the cause of a failure.

  • Dynamic Texture Classification Using Multivariate Hidden Markov Model

    Yu-Long QIAO  Zheng-Yi XING  


    E101-A No:1

    Dynamic textures are sequences of images of moving scenes that exhibit certain stationarity properties in time. The hidden Markov model (HMM) is a statistical model that has been used to model dynamic textures. However, texture is a region property. The traditional HMM models the property of a single pixel over time, and does not consider the dependence between spatially adjacent pixels of the dynamic texture. In this paper, the multivariate hidden Markov model (MHMM) is proposed to characterize and classify dynamic textures. Specifically, spatially adjacent pixels are modeled with a multivariate hidden Markov model, in which the hidden states of those pixels are modeled with a multivariate Markov chain, and the intensity values of those pixels are modeled as the observation variables. The model parameters are then used to describe the dynamic texture, and the classification is based on the maximum likelihood criterion. Experiments on two benchmark datasets demonstrate the effectiveness of the introduced method.

  • HMM-Based Maximum Likelihood Frame Alignment for Voice Conversion from a Nonparallel Corpus

    Ki-Seung LEE  

    LETTER-Speech and Hearing

    E100-D No:12

    One of the problems associated with voice conversion from a nonparallel corpus is how to find the best match or alignment between the source and the target vector sequences without linguistic information. In a previous study, alignment was achieved by minimizing the distance between the source vector and the transformed vector. This method, however, yielded a sequence of feature vectors that were not well matched with the underlying speaker model. In this letter, the vectors were selected from the candidates by maximizing the overall likelihood of the selected vectors with respect to the target model in the HMM context. Both objective and subjective evaluations were carried out using the CMU ARCTIC database to verify the effectiveness of the proposed method.

  • A Bayesian Approach to Image Recognition Based on Separable Lattice Hidden Markov Models

    Kei SAWADA  Akira TAMAMORI  Kei HASHIMOTO  Yoshihiko NANKAKU  Keiichi TOKUDA  

    PAPER-Pattern Recognition

    E99-D No:12

    This paper proposes a Bayesian approach to image recognition based on separable lattice hidden Markov models (SL-HMMs). The geometric variations of the object to be recognized, e.g., size, location, and rotation, are an essential problem in image recognition. SL-HMMs, which have been proposed to reduce the effect of geometric variations, can perform elastic matching both horizontally and vertically. This makes it possible to model not only invariances to the size and location of the object but also nonlinear warping in both dimensions. The maximum likelihood (ML) method has been used in training SL-HMMs. However, in some image recognition tasks, it is difficult to acquire sufficient training data, and the ML method suffers from the over-fitting problem when there is insufficient training data. This study aims to accurately estimate SL-HMMs using the maximum a posteriori (MAP) and variational Bayesian (VB) methods. The MAP and VB methods can utilize prior distributions representing useful prior information, and the VB method is expected to obtain high generalization ability by marginalization of model parameters. Furthermore, to overcome the local maximum problem in the MAP and VB methods, the deterministic annealing expectation maximization algorithm is applied for training SL-HMMs. Face recognition experiments performed on the XM2VTS database indicated that the proposed method offers significantly improved image recognition performance. Additionally, comparative experiment results showed that the proposed method was more robust to geometric variations than convolutional neural networks.

  • Image Recognition Based on Separable Lattice Trajectory 2-D HMMs

    Akira TAMAMORI  Yoshihiko NANKAKU  Keiichi TOKUDA  

    PAPER-Pattern Recognition

    E97-D No:7

    In this paper, a novel statistical model based on 2-D HMMs for image recognition is proposed. Recently, separable lattice 2-D HMMs (SL2D-HMMs) were proposed to model invariance to size and location deformation. However, their modeling accuracy is still insufficient because of the following two assumptions, which are inherited from 1-D HMMs: i) stationary statistics within each state and ii) the conditional independence assumption of state output probabilities. To overcome these shortcomings in 1-D HMMs, trajectory HMMs were proposed and successfully applied to speech recognition and speech synthesis. This paper derives 2-D trajectory HMMs by reformulating the likelihood of SL2D-HMMs through the imposition of explicit relationships between static and dynamic features. The proposed model can efficiently capture dependencies between adjacent observations without increasing the number of model parameters. The effectiveness of the proposed model was evaluated in face recognition experiments on the XM2VTS database.

  • Motion Pattern Study and Analysis from Video Monitoring Trajectory

    Kai KANG  Weibin LIU  Weiwei XING  

    PAPER-Pattern Recognition

    E97-D No:6

    This paper introduces an unsupervised method for motion pattern learning and abnormality detection from video surveillance. In the preprocessing steps, trajectories are segmented based on their locations, and the sub-trajectories are represented as codebooks. Under our framework, Hidden Markov Models (HMMs) are used to characterize the motion pattern feature of the trajectory groups. The state of a trajectory is represented by an HMM and has a probability distribution over the possible output sub-trajectories. The Bayesian Information Criterion (BIC) is introduced to measure the similarity between groups. Based on the pairwise similarity scores, an affinity matrix is constructed which indicates the distance between different trajectory groups. An Adaptable Dynamic Hierarchical Clustering (ADHC) tree is proposed to gradually merge the most similar groups and form the trajectory motion patterns; it implements a simpler and more tractable dynamic clustering procedure that updates the clustering results with lower time complexity and avoids the traditional overfitting problem. By using the HMMs generated for the obtained trajectory motion patterns, we can recognize motion patterns and detect anomalies by computing the likelihood of the given trajectory, where a maximum likelihood for an HMM indicates a pattern, and a small one below a threshold suggests an anomaly. Experiments are performed on EIFPD trajectory datasets from a structureless scene, where pedestrians choose their walking paths randomly. The experimental results show that our method can accurately learn motion patterns and detect anomalies with better performance.
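
The last step described above, scoring a trajectory by its likelihood under an HMM and flagging low scores, can be sketched with the scaled forward algorithm. This is a minimal sketch under stated assumptions: the two-state model, the three-symbol codebook, the per-symbol normalisation, and the threshold of -2.0 are all hypothetical values chosen for illustration and are not taken from the paper.

```python
import math

# Hypothetical discrete HMM: 2 hidden states, codebook of 3 sub-trajectory symbols.
PI = [0.5, 0.5]                      # initial state probabilities
A = [[0.8, 0.2], [0.2, 0.8]]         # state transition probabilities
B = [[0.90, 0.05, 0.05],             # emission probabilities; symbol 2 is rare
     [0.05, 0.90, 0.05]]


def log_likelihood(obs, pi, a, b):
    """Log P(obs | model) via the forward algorithm with per-step scaling."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")     # sequence is impossible under the model
        loglik += math.log(scale)
        alpha = [v / scale for v in alpha]
    return loglik


def is_anomaly(obs, threshold=-2.0):
    """Flag a sequence whose average per-symbol log-likelihood is below threshold."""
    return log_likelihood(obs, PI, A, B) / len(obs) < threshold


if __name__ == "__main__":
    print(is_anomaly([0, 0, 1, 1, 0]))  # False
    print(is_anomaly([2, 2, 2, 2, 2]))  # True
```

Dividing by the sequence length is a design choice made here so that trajectories of different lengths can share one threshold.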

  • Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs

    Chen-Yu YANG  Zhen-Hua LING  Li-Rong DAI  

    PAPER-Speech Synthesis and Related Topics

    E97-D No:6

    In this paper, an automatic and unsupervised method using context-dependent hidden Markov models (CD-HMMs) is proposed for the prosodic labeling of speech synthesis databases. This method consists of three main steps, i.e., initialization, model training and prosodic labeling. The initial prosodic labels are obtained by unsupervised clustering using the acoustic features designed according to the characteristics of the prosodic descriptor to be labeled. Then, CD-HMMs of the spectral parameters, F0s and phone durations are estimated by a means similar to the HMM-based parametric speech synthesis using the initial prosodic labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and prosodic labeling procedures are conducted iteratively until convergence. The performance of the proposed method is evaluated on Mandarin speech synthesis databases and two prosodic descriptors are investigated, i.e., the prosodic phrase boundary and the emphasis expression. In our implementation, the prosodic phrase boundary labels are initialized by clustering the durations of the pauses between every two consecutive prosodic words, and the emphasis expression labels are initialized by examining the differences between the original and the synthetic F0 trajectories. Experimental results show that the proposed method is able to label the prosodic phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the prosodic phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels. Furthermore, the unit selection speech synthesis system constructed using the emphasis expression labels generated by our proposed method can convey the emphasis information effectively while maintaining the naturalness of synthetic speech.
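
The label-update step in the abstract above relies on Viterbi decoding, which finds the single most likely state sequence for an observation sequence. The following is a generic log-domain sketch for a small discrete HMM; the model values and names are hypothetical, and it does not reproduce the paper's context-dependent models or their continuous acoustic features.

```python
import math


def viterbi(obs, log_pi, log_a, log_b):
    """Return the most likely hidden state sequence for `obs` (log-domain Viterbi)."""
    n = len(log_pi)
    delta = [log_pi[i] + log_b[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # For each destination state j, remember the best predecessor state.
        ptr = [max(range(n), key=lambda i: prev[i] + log_a[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] + log_a[ptr[j]][j] + log_b[j][o] for j in range(n)]
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Element-wise natural log of a matrix of strictly positive probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical 2-state model with 2 observation symbols.
LOG_PI = [math.log(0.5), math.log(0.5)]
LOG_A = to_log([[0.8, 0.2], [0.2, 0.8]])
LOG_B = to_log([[0.9, 0.1], [0.1, 0.9]])

if __name__ == "__main__":
    print(viterbi([0, 0, 1, 1, 0], LOG_PI, LOG_A, LOG_B))  # [0, 0, 1, 1, 0]
```

In an iterative scheme like the one described, the decoded path would serve as the new labels for the next round of model training.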

  • A 168-mW 2.4×-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI

    Guangji HE  Takanobu SUGAHARA  Yuki MIYAMOTO  Shintaro IZUMI  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  


    E96-C No:4

    This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm containing 2.52 M transistors for logic and 4.29 Mbit on-chip memory. The measured results show that our implementation achieves 34.2% required frequency reduction (83.3 MHz), 48.5% power consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work while 30% of the area is saved with recognition accuracy of 90.9%. This chip can maximally process 2.4× faster than real-time at 200 MHz and 1.1 V with power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing is increased to 97.4 mW and the max-performance is decreased to 2.08× because of the increased computation workload.
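
The abstract above names a max-approximation GMM calculation without spelling out the arithmetic. A common meaning of that term, assumed here, is replacing the log-sum over mixture components with the score of the single best component, which avoids exponentials and accumulation. The sketch below compares the exact and approximate log-likelihoods for a diagonal-covariance GMM; all function names and parameter values are hypothetical and say nothing about how the chip itself is implemented.

```python
import math


def component_log_scores(x, weights, means, variances):
    """log(w_k) + log N(x | mean_k, diag(var_k)) for every mixture component k."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        score = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            score += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        scores.append(score)
    return scores


def gmm_log_likelihood(x, weights, means, variances):
    """Exact log-likelihood: log-sum-exp over the component scores."""
    s = component_log_scores(x, weights, means, variances)
    m = max(s)
    return m + math.log(sum(math.exp(v - m) for v in s))


def gmm_log_likelihood_max(x, weights, means, variances):
    """Max approximation: keep only the best-scoring component."""
    return max(component_log_scores(x, weights, means, variances))


if __name__ == "__main__":
    # Hypothetical 2-component, 2-dimensional mixture.
    w = [0.6, 0.4]
    mu = [[0.0, 0.0], [3.0, 3.0]]
    var = [[1.0, 1.0], [1.0, 1.0]]
    x = [0.2, -0.1]
    print(gmm_log_likelihood(x, w, mu, var))
    print(gmm_log_likelihood_max(x, w, mu, var))
```

For a mixture with K components, the approximation never exceeds the exact value and falls short of it by at most log K, since the sum of K terms is at most K times the largest term.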

  • An Extension of Separable Lattice 2-D HMMs for Rotational Data Variations

    Akira TAMAMORI  Yoshihiko NANKAKU  Keiichi TOKUDA  

    PAPER-Pattern Recognition

    E95-D No:8

    This paper proposes a new generative model which can deal with rotational data variations by extending Separable Lattice 2-D HMMs (SL2D-HMMs). In image recognition, geometrical variations such as size, location and rotation degrade the performance. Therefore, the appropriate normalization processes for such variations are required. SL2D-HMMs can perform an elastic matching in both horizontal and vertical directions; this makes it possible to model invariance to size and location. To deal with rotational variations, we introduce additional HMM states which represent the shifts of the state alignments among the observation lines in a particular direction. Face recognition experiments show that the proposed method improves the performance significantly for rotational variation data.

  • Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition (Open Access)

    Yasuhisa FUJII  Kazumasa YAMAMOTO  Seiichi NAKAGAWA  

    PAPER-Speech and Hearing

    E95-D No:8

    In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition, which are a combination of Hidden Conditional Random Fields (HCRF) and a Multi-Layer Perceptron (MLP), and inherit their merits, namely, the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF: an objective function that explicitly considers training errors, a hierarchical tandem-style feature, and a deep non-linear feature extractor for the observation function. Using experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set, we show that HCNF can be trained realistically without any initial model and outperforms HCRF and the triphone hidden Markov model trained with the minimum phone error (MPE) criterion.

  • Online Anomaly Prediction for Real-Time Stream Processing

    Yuanqiang HUANG  Zhongzhi LUAN  Depei QIAN  Zhigao DU  Ting CHEN  Yuebin BAI  

    PAPER-Network Management/Operation

    E95-B No:6

    In real-time stream processing, it is important to develop high-availability mechanisms that keep stream-based applications from being disrupted by faults caused by potential anomalies. In this paper, we present a novel online technique for predicting anomalies that may occur in the near future. Concretely, we first present a value prediction method that combines the Hidden Markov Model and the Mixture of Expert Model to predict the values of feature metrics in the near future. Then we employ the Support Vector Machine for anomaly identification, a procedure that identifies the kind of anomaly about which we are about to raise an alarm. The purpose of our approach is to achieve a tradeoff between fault penalty and resource cost. The experimental results show that our approach achieves high accuracy for common anomaly prediction with low runtime overhead.

  • A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition

    Kazuhiro NAKAMURA  Ryo SHIMAZAKI  Masatoshi YAMAMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  


    E95-C No:4

    This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time-consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.

  • Robust Gait-Based Person Identification against Walking Speed Variations

    Muhammad Rasyid AQMAR  Koichi SHINODA  Sadaoki FURUI  

    PAPER-Image Recognition, Computer Vision

    E95-D No:2

    Variations in walking speed have a strong impact on gait-based person identification. We propose a method that is robust against walking-speed variations. It is based on a combination of cubic higher-order local auto-correlation (CHLAC), gait silhouette-based principal component analysis (GSP), and a statistical framework using hidden Markov models (HMMs). The CHLAC features capture the within-phase spatio-temporal characteristics of each individual, the GSP features retain more shape/phase information for better gait sequence alignment, and the HMMs classify the ID of each gait even when walking speed changes nonlinearly. We compared the performance of our method with other conventional methods using five different databases, SOTON, USF-NIST, CMU-MoBo, TokyoTech A and TokyoTech B. The proposed method was equal to or better than the others when the speed did not change greatly, and it was significantly better when the speed varied across and within a gait sequence.

  • HMM-Based Underwater Target Classification with Synthesized Active Sonar Signals

    Taehwan KIM  Keunsung BAE  

    LETTER-Digital Signal Processing

    E94-A No:10

    This paper deals with underwater target classification using synthesized active sonar signals. Firstly, we synthesized active sonar returns from a 3D highlight model of underwater targets using the ray tracing algorithm. Then, we applied a multiaspect target classification scheme based on a hidden Markov model to classify them. For feature extraction from the synthesized sonar signals, a matching pursuit algorithm was used. The experimental results depending on the number of observations and signal-to-noise ratios are presented with our discussions.

  • VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

    Hiroki NOGUCHI  Kazuo MIURA  Tsuyoshi FUJINAGA  Takanobu SUGAHARA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  


    E94-C No:4

    We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74 MHz) and 84.04% memory bandwidth reduction (549.91 MB/s) for real-time 60-k word continuous speech recognition.

