Author Search Result

[Author] Minsoo HAHN(15hit)

  • Efficient Residual Coding Method of Spatial Audio Object Coding with Two-Step Coding Structure for Interactive Audio Services

    Byonghwa LEE  Kwangki KIM  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/04/08
      Vol:
    E99-D No:7
      Page(s):
    1949-1952

    In interactive audio services, users can render audio objects rather freely to match their desires, and the spatial audio object coding (SAOC) scheme performs fairly well in terms of both bitrate and audio quality. However, rather perceptible audio quality degradation can occur when an object is suppressed or played alone. To address this, the SAOC scheme with Two-Step Coding (SAOC-TSC) was proposed. However, its side information bitrate is twice that of the original SAOC because of the bitrate needed for the residual coding used to enhance the audio quality. In this paper, an efficient residual coding method for the SAOC-TSC is proposed to reduce the side information bitrate without audio quality degradation or increased complexity.

  • A New Speech Enhancement Algorithm for Car Environment Noise Cancellation with MBD and Kalman Filtering

    Seungkwon BEACK  Seung H. NAM  Minsoo HAHN  

     
    LETTER

      Vol:
    E88-A No:3
      Page(s):
    685-689

    We present a new speech enhancement algorithm for a car environment with two microphones. The car audio signals and other background noises are the target noises to be suppressed. Our algorithm is composed of two main parts, i.e., the spatial and the temporal processes. The multi-channel blind deconvolution (MBD) is applied to the spatial process, while the Kalman filter with a second-order high-pass filter is applied to the temporal one. For fast convergence, the MBD is newly expressed in the frequency domain with a normalization matrix. The final performance, evaluated on speech severely corrupted by car noise, shows that our algorithm produces noticeably enhanced speech.

  • Soft Counting Poisson Mixture Model-Based Polling Method for Speech/Nonspeech Classification

    Youngjoo SUH  Hoirin KIM  Minsoo HAHN  Yongju LEE  

     
    LETTER-Speech and Hearing

      Vol:
    E89-D No:12
      Page(s):
    2994-2997

    In this letter, a new segment-level speech/nonspeech classification method based on the Poisson polling technique is proposed. The proposed method makes two modifications from the baseline Poisson polling method to further improve the classification accuracy. One of them is to employ Poisson mixture models to more accurately represent various segmental patterns of the observed frequencies for frame-level input features. The other is the soft counting-based frequency estimation to improve the reliability of the observed frequencies. The effectiveness of the proposed method is confirmed by the experimental results showing the maximum error reduction of 39% compared to the segmentally accumulated log-likelihood ratio-based method.

  • Mastering Signal Processing in MPEG SAOC

    Kwangki KIM  Minsoo HAHN  Jinsul KIM  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:12
      Page(s):
    3053-3059

    MPEG spatial audio object coding (SAOC) is a new audio coding standard which efficiently represents various audio objects as a down-mix signal and spatial parameters. MPEG SAOC is backward compatible with existing playback systems for the down-mix signal. If a mastering signal is used instead of the down-mix signal to provide CD-like sound quality, an output signal decoded with the mastering signal may be easily degraded due to the difference between the down-mix and the mastering signals. To successfully use the mastering signal in MPEG SAOC, the difference between the two signals should be eliminated. As a simple approach, we propose mastering signal processing using the mastering down-mix gain (MDG), which is similar to the arbitrary down-mix gain of MPEG Surround. We also propose enhanced mastering signal processing using the MDG bias in order to reduce quantization errors of the MDG. Experimental results show that the proposed schemes can improve the sound quality of the output signal decoded with the mastering signal. In particular, the enhanced method performs better than the simple method in terms of quantization errors and sound quality.

  • An Efficient Shared Adaptive Packet Loss Concealment Scheme through 1-Port Gateway System for Internet Telephony Service

    Jinsul KIM  Hyunwoo LEE  Won RYU  Byungsun LEE  Minsoo HAHN  

     
    LETTER-QoS Control Mechanism and System

      Vol:
    E91-B No:5
      Page(s):
    1370-1374

    In this letter, we propose a shared adaptive packet loss concealment scheme for a quality-guaranteed Internet telephony service which connects multiple users. In order to recover packet loss efficiently in the all-IP based convergence environment, we provide a robust signal recovery scheme which is based on shared adaptive utilization of both-side information. The scheme operates according to the average magnitude variation across the frames and the pitch period replication on the 1-port gateway (G/W) system. The simulated performance demonstrates that the proposed scheme has the advantages of low processing times and high recovery rates in the all-IP based ubiquitous environment.

  • Pathological Voice Detection Using Efficient Combination of Heterogeneous Features

    Ji-Yeoun LEE  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:2
      Page(s):
    367-370

    Combination of mutually complementary features is necessary to cope with various changes in pattern classification between normal and pathological voices. This paper proposes a method to improve pathological/normal voice classification performance by combining heterogeneous features. Different combinations of auditory-based and higher-order features are investigated. Their performances are measured by Gaussian mixture models (GMMs), linear discriminant analysis (LDA), and a classification and regression tree (CART) method. The proposed classification method by using the CART analysis is shown to be an effective method for pathological voice detection, with a 92.7% classification performance rate. This is a noticeable improvement of 54.32% compared to the MFCC-based GMM algorithm in terms of error reduction.

  • Objective Pathological Voice Quality Assessment Based on HOS Features

    Ji-Yeoun LEE  Sangbae JEONG  Hong-Shik CHOI  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:12
      Page(s):
    2888-2891

    This work proposes new features to improve the pathological voice quality classification performance. They are the means, the variances, and the perturbations of higher-order statistics (HOS) such as the skewness and the kurtosis. The HOS-based features show meaningful differences among normal, grade 1, grade 2, and grade 3 voices classified on the GRBAS scale. The jitter, the shimmer, the harmonic-to-noise ratio (HNR), and the variance of the short-time energy are utilized as the conventional features. The performances are measured by the classification and regression tree (CART) method. Specifically, the CART-based method utilizing both the conventional features and the HOS-based ones is shown to be effective in pathological voice quality measurement, with a classification accuracy of 87.8%.

  • New Variable-Bit-Rate Scheme for Waveform Interpolative Coders

    Heesik YANG  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Digital Signal Processing

      Vol:
    E90-A No:7
      Page(s):
    1469-1472

    In this paper, we propose a new variable-bit-rate speech coder based on the waveform interpolation concept. After the coder extracts all parameters, the distortion between each current parameter and its predicted value, which is estimated by extrapolation from the past two parameters, is measured. A parameter is not transmitted unless the distortion exceeds the preset threshold. At the decoder side, a non-transmitted parameter is reconstructed by extrapolation from the past two parameters used to synthesize signals. In this way, we can reduce the total bit rate by 26% while keeping the speech quality degradation below 0.1 in the perceptual evaluation of speech quality (PESQ) score.
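
    The transmit-or-extrapolate decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' coder: the linear form of the extrapolation, the absolute-difference distortion, scalar parameters, and the names `extrapolate`, `encode`, and `decode` are all assumptions made for illustration.

```python
def extrapolate(prev2, prev1):
    """Linear extrapolation from the two most recent values (assumed form)."""
    return 2.0 * prev1 - prev2


def encode(params, threshold):
    """Return one (sent, value) pair per frame for a scalar parameter track.

    The encoder keeps the same history the decoder will rebuild, so both
    sides extrapolate from identical past values.
    """
    history = []  # values as the decoder will reconstruct them
    stream = []
    for p in params:
        if len(history) < 2:
            # Not enough history to predict yet: always transmit.
            stream.append((True, p))
            history.append(p)
            continue
        predicted = extrapolate(history[-2], history[-1])
        if abs(p - predicted) > threshold:  # assumed distortion measure
            stream.append((True, p))
            history.append(p)
        else:
            stream.append((False, None))
            history.append(predicted)
    return stream


def decode(stream):
    """Rebuild the parameter track, extrapolating wherever nothing was sent."""
    history = []
    for sent, value in stream:
        if sent:
            history.append(value)
        else:
            history.append(extrapolate(history[-2], history[-1]))
    return history
```

    In this sketch a skipped frame is reconstructed with an error no larger than the threshold, because the encoder only skips a parameter when its prediction is already that close.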

  • Probabilistic Adaptation Mode Control Algorithm for GSC-Based Noise Reduction

    Seungho HAN  Jungpyo HONG  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E93-A No:3
      Page(s):
    627-630

    An efficient noise reduction algorithm is proposed to improve speech recognition performance for human-machine interfaces. In the algorithm, a probabilistic adaptation mode controller (AMC) is designed and applied to the generalized sidelobe canceller (GSC). To detect target speech intervals, the proposed AMC calculates the inter-channel correlation and estimates the speech absence probability (SAP). Based on the SAP, the adaptation mode of the adaptive filter in the GSC is decided. Experimental results show that the proposed algorithm significantly improves speech recognition performance and signal-to-noise ratios in real noisy environments.

  • Implementation and Evaluation of an HMM-Based Korean Speech Synthesis System

    Sang-Jin KIM  Jong-Jin KIM  Minsoo HAHN  

     
    LETTER

      Vol:
    E89-D No:3
      Page(s):
    1116-1119

    The development of a hidden Markov model (HMM)-based Korean speech synthesis system and its evaluation are described. Statistical HMM models for Korean speech units are trained with a hand-labeled speech database including contextual information about phoneme, morpheme, word phrase, utterance, and break strength. The developed system produced speech with fairly good prosody. The synthesized speech is evaluated and compared with that of our corpus-based unit concatenating Korean text-to-speech system. The two systems were trained with the same manually labeled speech database.

  • Improved Spectral Envelope Coding Algorithm Using Adaptive Filtering for G.729.1

    Keunseok CHO  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E97-A No:11
      Page(s):
    2254-2257

    This paper proposes a new algorithm to encode the spectral envelope for G.729.1 more accurately. It applies the normalized least-mean-square (NLMS) algorithm to each subband energy of the modified discrete cosine transform (MDCT) in the time-domain alias cancellation (TDAC) of G.729.1. By utilizing the estimation error of subband energies by means of NLMS, allocated bit reduction for spectral envelope coding is achieved. The saved bits are then reused to improve the spectral envelope estimation and thus enhance the sound quality. Experimental results confirm that the proposed algorithm improves the sound quality under both clean and packet loss conditions.
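
    For readers unfamiliar with NLMS, the following is a generic one-step NLMS predictor applied to a plain sequence of values, such as one subband's energy over successive frames. It is only a sketch of the adaptive filter itself, not the G.729.1 scheme from the letter: the function name `nlms_predict`, the filter order, the step size `mu`, and the use of unquantized past values are assumptions.

```python
def nlms_predict(signal, order=2, mu=0.5, eps=1e-8):
    """Return (predictions, errors) for one-step NLMS prediction of a sequence."""
    weights = [0.0] * order
    preds, errors = [], []
    for n, target in enumerate(signal):
        # Input vector: most recent past sample first, zero-padded at the start.
        x = [signal[n - k] if n - k >= 0 else 0.0 for k in range(1, order + 1)]
        y = sum(w * xi for w, xi in zip(weights, x))  # prediction
        e = target - y                                # estimation error
        norm = eps + sum(xi * xi for xi in x)         # input power normalization
        weights = [w + mu * e * xi / norm for w, xi in zip(weights, x)]
        preds.append(y)
        errors.append(e)
    return preds, errors
```

    When the sequence is predictable, the estimation error shrinks as the weights adapt, so a residual of this kind would need fewer bits than the raw values. A real codec would predict from quantized past values so that the decoder can run the same filter.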

  • Improved Noise Reduction with Packet Loss Recovery Based on Post-Filtering over IP Networks

    Jinsul KIM  Hyunwoo LEE  Won RYU  Seungho HAN  Minsoo HAHN  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E91-B No:3
      Page(s):
    975-979

    This letter focuses on improving current noise reduction methods to solve the critical speech distortion problems that arise in robust noise reduction of noisy speech signals for speech enhancement over IP networks. For robust noise reduction with packet loss recovery, we propose a novel optimized Wiener filtering technique that uses the estimated SNR (Signal-to-Noise Ratio) together with a packet loss recovery method, applied as post-filtering over IP networks. Simulation results demonstrate that the proposed scheme provides better reduction and recovery rates than other methods when packet loss and SNR conditions are taken into account.

  • Response Time Reduction of Speech Recognizers Using Single Gaussians

    Sangbae JEONG  Hoirin KIM  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E90-D No:5
      Page(s):
    868-871

    In this paper, we propose a useful algorithm that can be applied to reduce the response time of speech recognizers based on HMMs. In our algorithm, to reduce the response time, promising HMM states are selected by single Gaussians. In speech recognition, HMM state likelihoods are evaluated by the corresponding single Gaussians first, and then likelihoods by the original full Gaussians are computed and substituted only for the HMM states having relatively large likelihoods. By doing so, we can reduce the pattern-matching time for speech recognition significantly without any noticeable loss of the recognition rate. In addition, we cluster the single Gaussians into groups by measuring the distance between Gaussians, which further reduces the extra memory required. In our 10,000-word Korean POI (point-of-interest) recognition task, our proposed algorithm shows a 35.57% reduction of the response time in comparison with that of the baseline system at the cost of a 10% degradation of the WER.
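
    The two-pass scoring idea in this abstract can be sketched as follows. This is a toy illustration rather than the authors' implementation: it assumes one-dimensional features, derives each state's single Gaussian by moment matching its mixture, and rescores a fixed `top_n` states, none of which is specified in the abstract. The clustering of single Gaussians is not covered.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_gmm(x, components):
    """Log density of a 1-D mixture given (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    peak = max(logs)
    return peak + math.log(sum(math.exp(l - peak) for l in logs))


def collapse(components):
    """Moment-matched single Gaussian (mean, var) for a mixture (assumed choice)."""
    mean = sum(w * m for w, m, _ in components)
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in components)
    return mean, var


def state_scores(x, states, top_n):
    """Score all states cheaply, then rescore only the top_n with the full mixture."""
    singles = [collapse(c) for c in states]  # would be precomputed offline
    scores = [log_gauss(x, m, v) for m, v in singles]
    best = sorted(range(len(states)), key=lambda i: scores[i], reverse=True)[:top_n]
    for i in best:
        scores[i] = log_gmm(x, states[i])  # replace the coarse score
    return scores, sorted(best)
```

    The saving comes from evaluating one Gaussian per state in the first pass and the full mixture only for the few states that survive it.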

  • Two-Band Excitation for HMM-Based Speech Synthesis

    Sang-Jin KIM  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E90-D No:1
      Page(s):
    378-381

    This letter describes a two-band excitation model for HMM-based speech synthesis. The HMM-based speech synthesis system generates speech from the HMM training data of the spectral and excitation parameters. Synthesized speech has a typical quality of "vocoded sound" mostly because of the simple excitation model with the voiced/unvoiced selection. In this letter, two-band excitation based on the harmonic plus noise speech model is proposed for generating the mixed excitation source. With this model, we can generate the mixed excitation more accurately and reduce the memory for the trained excitation data as well.

  • An Enhanced Distortion Measure Based VBR for Waveform Interpolative Speech Coders

    Heesik YANG  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing

      Vol:
    E91-A No:4
      Page(s):
    1222-1225

    In our previous study, a distortion measure based variable bit rate (DM-VBR) scheme in waveform interpolation (WI) coders was proposed. In this paper, the repetition method is proposed to estimate non-transmitted parameters instead of the extrapolation method. For the further reduction of slowly evolving waveform (SEW) bit rates, the dimensions of the past parameters, which are different from those of the current parameters, are converted to match the dimension of the current ones. Distortions between interpolated sub-frames and original sub-frames are measured for the reduction of the SEW parameters. The usefulness of several other distortion measures is also investigated as alternatives to the simple log spectral distortion. Experimental results show that the coder adopting the new schemes offers a bit rate reduction of more than 41% with almost unnoticeable output speech degradation.
