1-11hit |
Shu-Min TSAI Jia-Ching WANG Jar-Ferr YANG Jhing-Fa WANG
In this paper, we propose a speech coding translation scheme by transferring coding parameters between GSM half rate and G.729 coders. Compared to the conventional decode-then-encode (DTE) scheme, the proposed parameter conversions provide speech interoperability between mobile and IP networks with reducing computational complexity and coding delay. Simulation results show that the proposed methods can reduce about 30% computational load and coding delay acquired in the target encoders and achieve almost imperceptible degradation in performance.
Jia-Ching WANG Hsiao-Ping LEE Jhing-Fa WANG Chung-Hsien YANG
In this paper, a new subspace-based speech enhancement algorithm is presented. First, we construct a perceptual filterbank from psycho-acoustic model and incorporate it in the subspace-based enhancement approach. This filterbank is created through a five-level wavelet packet decomposition. The masking properties of the human auditory system are then derived based on the perceptual filterbank. Finally, the prior SNR and the masking threshold of each critical band are taken to decide the attenuation factor of the optimal linear estimator. Five different types of in-car noises in TAICAR database were used in our evaluation. The experimental results demonstrated that our approach outperformed conventional subspace and spectral subtraction methods.
Viet-Hang DUONG Manh-Quan BUI Jian-Jiun DING Bach-Tung PHAM Pham The BAO Jia-Ching WANG
In this work, two new proposed NMF models are developed for facial expression recognition. They are called maximum volume constrained nonnegative matrix factorization (MV_NMF) and maximum volume constrained graph nonnegative matrix factorization (MV_GNMF). They achieve sparseness from a larger simplicial cone constraint and the extracted features preserve the topological structure of the original images.
Seksan MATHULAPRANGSAN Yuan-Shan LEE Jia-Ching WANG
This study presents a joint dictionary learning approach for speech emotion recognition named locality preserved joint nonnegative matrix factorization (LP-JNMF). The learned representations are shared between the learned dictionaries and annotation matrix. Moreover, a locality penalty term is incorporated into the objective function. Thus, the system's discriminability is further improved.
Jhing-Fa WANG Jia-Ching WANG An-Nan SUEN Chung-Hsien WU Fan-Min LI
In this paper, we present an efficient VLSI architecture for the stand-alone application of a speech recognition system based on discriminative Bayesian neural network (DBNN). Regarding the recognition phase, the architecture of the Bayesian distance unit (BDU) is constructed first. In association with the BDU, we propose a template-serial architecture for the path distance accumulation to perform the recognition procedure. A corresponding architecture is also developed to accelerate the discriminative training procedure. It contains the intelligent look-up table for the sigmoid function. In comparison to the traditional one-table method, the memory size reduces drastically with only slight loss of accuracy. Combining the proposed hardware accelerators with the cost efficient programmable core, we took the most out of both programmable and application-specific architectures, including performance, design complexity, and flexibility.
Anand PAUL Jhing-Fa WANG Jia-Ching WANG An-Chao TSAI Jang-Ting CHEN
This paper introduces a block based motion estimation algorithm based on projection with adaptive window size selection. The blocks cannot match well if their corresponding 1D projection does not match well, with this as foundation 2D block matching problem is translated to a simpler 1D matching, which eliminates majority of potential pixel participation. This projection method is combined with adaptive window size selection in which, appropriate search window for each block is determined on the basis of motion vectors and prediction errors obtained for the previous block, which makes this novel method several times faster than exhaustive search with negligible performance degradation. Encoding QCIF size video by the proposed method results in reduction of computational complexity of motion estimation by roughly 45% and over all encoding by 23%, while maintaining image/video quality.
Video coding plays an important role in human life especially in communications. H.264/AVC is a prominent video coding standard that has been used in a variety of applications due to its high efficiency comes from several new coding techniques. However, the extremely high encoding complexity hinders itself from real-time applications. This paper presents a new encoding algorithm that makes use of particle swarm optimization (PSO) to train discriminant functions for classification based fast mode decision. Experimental results show that the proposed algorithm can successfully reduce encoding time at the expense of negligible quality degradation and bitrate increases.
Bima PRIHASTO Tzu-Chiang TAI Pao-Chi CHANG Jia-Ching WANG
The recurrent neural network (RNN) has been used in audio and speech processing, such as language translation and speech recognition. Although RNN-based architecture can be applied to speech synthesis, the long computing time is still the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture, for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in MGU. Our MGU-based architecture is about twice faster, with equally good sound quality than the other MGU-based architectures.
Viet-Hang DUONG Manh-Quan BUI Jian-Jiun DING Yuan-Shan LEE Bach-Tung PHAM Pham The BAO Jia-Ching WANG
This work presents a new approach which derives a learned data representation method through matrix factorization on the complex domain. In particular, we introduce an encoding matrix-a new representation of data-that satisfies the simplicial constraint of the projective basis matrix on the field of complex numbers. A complex optimization framework is provided. It employs the gradient descent method and computes the derivative of the cost function based on Wirtinger's calculus.
Chung-Hsien YANG Jia-Ching WANG Jhing-Fa WANG Chi-Wei CHANG
Two-dimensional discrete wavelet transform (DWT) for processing image is conventionally designed by line-based architectures, which are simple and have low complexity. However, they suffer from two main shortcomings - the memory required for storing intermediate data and the long latency of computing wavelet coefficients. This work presents a new block-based architecture for computing lifting-based 2-D DWT coefficients. This architecture yields a significantly lower buffer size. Additionally, the latency is reduced from N2 down to 3N as compared to the line-based architectures. The proposed architecture supports the JPEG2000 default filters and has been realized in ARM-based ALTERA EPXA10 Development Board at a frequency of 44.33 MHz.
Toan H. VU An DANG Jia-Ching WANG
We develop a deep neural network (DNN) for detecting driver drowsiness in videos. The proposed DNN model that receives driver's faces extracted from video frames as inputs consists of three components - a convolutional neural network (CNN), a convolutional control gate-based recurrent neural network (ConvCGRNN), and a voting layer. The CNN is to learn facial representations from global faces which are then fed to the ConvCGRNN to learn their temporal dependencies. The voting layer works like an ensemble of many sub-classifiers to predict drowsiness state. Experimental results on the NTHU-DDD dataset show that our model not only achieve a competitive accuracy of 84.81% without any post-processing but it can work in real-time with a high speed of about 100 fps.